7. The core library functions

Python ships with a wealth of functions at your disposal. Some are always available, but many must first be “activated” using an import statement.

This chapter offers only a cursory overview; you should really bookmark the docs.

7.1. Built-in functions

The built-ins are the functions that are always at your disposal, without any import statement. The list is kept short to keep the core of Python lean. The complete list can be found in the docs page on built-in functions.

The core contains “constructor functions” for all built-in data types, which we have already seen: bool(), dict(), float(), int(), list(), set(), str() and tuple(). (Some others were not discussed and will not be listed here either.)

Also already discussed or demonstrated were help(), input(), len(), max(), min(), range() and type().

There are some functions related to object-oriented programming (OOP): getattr(), setattr(), hasattr(), isinstance(), issubclass(), iter(), next() and staticmethod(). These will be discussed in the chapter on OOP.

The list below gives only a very short description of each function. A few other built-ins are discussed in more detail in the following sections.

  • abs() gives the absolute value of a number (i.e. removes the minus sign if present).

  • dir() and vars() help you inspect the attributes available on a class, object or the current environment.

  • sum() gives the sum of a numeric iterable.

  • pow(a, b) calculates a to the power of b (equivalent to a ** b).

  • round() rounds a number to the given number of decimal digits (or to the nearest integer if none is provided).

  • reversed() gives a reverse iterator over a sequence object. It can be passed to a list or tuple constructor, or used in a for loop.

  • zip() yields n-length tuples, where n is the number of iterables passed as positional arguments to zip(). The i-th element in every tuple comes from the i-th iterable argument to zip().
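The one-liners above can be tried out in a single cell. This is a small sketch with arbitrary values; the comment on rounding reflects how Python's round() handles ties (it rounds them to the nearest even integer).

```python
# A quick tour of some built-ins from the list above
distance = abs(-7.5)                      # 7.5
total = sum([3, -1, 4, -1, 5])            # 10
power = pow(2, 10)                        # 1024, the same as 2 ** 10
pi_approx = round(3.14159, 2)             # 3.14
ties = (round(2.5), round(3.5))           # (2, 4): ties go to the nearest even integer
backwards = list(reversed([1, 2, 3]))     # [3, 2, 1]
pairs = list(zip([1, 2, 3], "ab"))        # [(1, 'a'), (2, 'b')]: zip stops at the shortest iterable
has_upper = "upper" in dir(str)           # True: dir() lists the attributes of str

print(distance, total, power, pi_approx, ties, backwards, pairs, has_upper)
```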

7.1.1. Use zip() to create dicts

A common use of zip() is creating dicts, since the dict constructor accepts an iterable of two-element tuples as initialization values.

Here is an example:

tups = zip('abcde', range(5))
for t in tups:
    print(t)
    
my_dict = dict(zip('abcde', range(5)))
print(my_dict)
('a', 0)
('b', 1)
('c', 2)
('d', 3)
('e', 4)
{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}

Note that if you try dict(tups) above, you get an empty dict because the tups iterator has already been exhausted by the for loop!
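This exhausted-iterator behavior is easy to check for yourself. A minimal sketch (the variable names are just for illustration):

```python
# A zip object is an iterator: it can be consumed only once
tups = zip('abc', range(3))
first_pass = list(tups)      # [('a', 0), ('b', 1), ('c', 2)]
second_pass = dict(tups)     # {}: nothing is left to iterate over

# To use the pairs again, create a fresh zip object
letter_numbers = dict(zip('abc', range(3)))   # {'a': 0, 'b': 1, 'c': 2}
print(first_pass, second_pass, letter_numbers)
```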

7.1.2. Get an iteration counter with enumerate()

The enumerate() function is used primarily in for loops to get hold of an iteration counter. It replaces the for (int i = 0; i < length; i++) {...} structure found in other languages. The enumeration is wrapped around an iterable object such as a string, list or other collection.

for (i, c) in enumerate("abcd"):
    print(f'the number {i+1} letter of the alphabet is {c}')
the number 1 letter of the alphabet is a
the number 2 letter of the alphabet is b
the number 3 letter of the alphabet is c
the number 4 letter of the alphabet is d
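enumerate() also accepts a start argument, which removes the need for the i+1 in the example above. A small sketch:

```python
letters = "abcd"
numbered = []
for i, c in enumerate(letters, start=1):   # the counter now starts at 1 instead of 0
    numbered.append(f'{i}: {c}')
print(numbered)          # ['1: a', '2: b', '3: c', '4: d']

index_pairs = list(enumerate("ab"))   # enumerate() yields (counter, element) tuples
print(index_pairs)       # [(0, 'a'), (1, 'b')]
```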

7.1.3. Read and write files with open()

The open() function returns a file object. When the file is opened for reading, you can iterate over this object line by line, which is why it is often used in conjunction with a for loop.

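with open("data/employees.txt") as employees:   # the file is closed automatically afterwards
    for line in employees:
        print(line.strip())                       # strip() removes the trailing newline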
employee_id;last_name;role
1;Jacobs;ict architect
2;Howard;programmer
3;Pierson;data scientist

The open() function has a few more parameters, of which only mode is interesting right now. It takes a string indicating how the file should be opened. The default is "r" (equivalent to "rt"), which means the file is opened read-only, in text mode. When you want to write to a file, these are the options at your disposal:

  • "x": Create - creates the file but raises an error (FileExistsError) if the file already exists

  • "a": Append - creates the file if the specified file does not exist and appends to the end

  • "w": Write - creates the file if the specified file does not exist and overwrites if it does exist

fruits = ["kiwi", "apple", "guava"]
fruits_file = open("data/fruits.txt", "w") # overwrite mode!
for fruit in fruits:
    fruits_file.write(f'this is a fruit: {fruit}\n') # the \n adds a newline
    
fruits_file.close()

The file fruits.txt now has these contents, no matter how often the snippet is run, because mode "w" overwrites the file:

this is a fruit: kiwi
this is a fruit: apple
this is a fruit: guava
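The three write modes can be compared side by side. This sketch does not touch your project folders: it works in a temporary directory created with the tempfile module, and it uses the with statement so that each file is closed automatically (the file name log.txt is hypothetical):

```python
import os
import tempfile

# Work in a temporary directory so that no existing files are touched
with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "log.txt")   # hypothetical file name

    with open(log_path, "w") as log_file:         # "w": create (or overwrite) the file
        log_file.write("first line\n")
    with open(log_path, "a") as log_file:         # "a": add to the end of the file
        log_file.write("second line\n")

    with open(log_path) as log_file:              # default mode "r": read text
        lines = log_file.readlines()

    try:
        with open(log_path, "x") as log_file:     # "x": only create a new file ...
            log_file.write("never written\n")
        exclusive_failed = False
    except FileExistsError:                       # ... so it fails here: the file exists
        exclusive_failed = True

print(lines)              # ['first line\n', 'second line\n']
print(exclusive_failed)   # True
```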

7.1.4. Converting between characters and ASCII/Unicode

Below the surface, characters are just numbers. Originally there were only 128 characters, each encoded by a number that fits in a single byte: the ASCII characters.

(Figure: table of ASCII codes)

The pair of functions ord() and chr() convert a character to its numeric (Unicode) code and vice versa. Python strings are Unicode, so this works for far more than the 128 ASCII characters. For instance,

print(chr(7))

will sound a ‘bell’ (unfortunately not in Jupyter; give it a try in IPython).

The difference between chr() and str() is that str() gives the string representation of a number (or of any object, for that matter), whereas chr() gives the character belonging to a numeric code.
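A few quick checks make the difference between ord(), chr() and str() concrete (the euro sign is just an arbitrary non-ASCII example):

```python
code = ord('A')              # 65: the numeric code of the character 'A'
letter = chr(65)             # 'A': the character belonging to code 65
digits = str(65)             # '65': the text representation of the number 65
euro_code = ord('€')         # 8364: Unicode goes far beyond the 128 ASCII codes
round_trip = chr(ord('z'))   # 'z': chr() undoes ord()
print(code, letter, digits, euro_code, round_trip)
```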

test = "Some Text"
for letter in test:
    # left-align both the letter and its code in a field of width 3
    print('{:<3} is encoded by {:<3}'.format(letter, ord(letter)))
S   is encoded by 83 
o   is encoded by 111
m   is encoded by 109
e   is encoded by 101
    is encoded by 32 
T   is encoded by 84 
e   is encoded by 101
x   is encoded by 120
t   is encoded by 116

7.1.5. Sorting with sorted() and list.sort()

Sorting is ubiquitous in programming: list the top-5 performing employees, sort countries by average income, sort members by last name, etc.

There are two ways to sort a list:

  • The built-in function sorted() returns a new, sorted list and leaves the original untouched; it accepts any iterable, not only lists

  • The list method sort() performs an in-place sort that modifies the original list

Both use the natural ordering of the data: numeric data in ascending order, and text data by character code (alphabetically, except that all uppercase letters come before all lowercase ones). Both provide two customizing parameters: reverse and key.

fruits = ["kiwi", "apple", "guava"]
print(sorted(fruits))
print(fruits)        # unchanged!
['apple', 'guava', 'kiwi']
['kiwi', 'apple', 'guava']
fruits = ["kiwi", "apple", "guava"]
fruits.sort()
print(fruits)        # modified in-place
['apple', 'guava', 'kiwi']

Reversed sorting can be done using the keyword argument reverse=True (the default is reverse=False):

print(sorted([3, 2, 4, 1])) # default is reverse=False
print(sorted([3, 2, 4, 1], reverse=True))
[1, 2, 3, 4]
[4, 3, 2, 1]
numbers = [3, 2, 4, 1]
numbers.sort(reverse=True)
numbers
[4, 3, 2, 1]

7.1.5.1. The key parameter

The key parameter makes it possible to define custom sorting of collections and objects. Its value is a function that is called on each element and returns the property to sort on.

For instance, suppose you want to sort a list of words by their second character:

def second_character_sorter(word):
    return word[1]

sorted(fruits, key=second_character_sorter)
['kiwi', 'apple', 'guava']
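Any one-argument function can serve as key, including existing ones such as str.lower and len. This sketch shows case-insensitive sorting and sorting by length. Uppercase letters have lower character codes than lowercase ones, and Python's sort is stable, so elements with equal keys keep their original order:

```python
words = ["banana", "apple", "Cherry"]
default_order = sorted(words)                     # ['Cherry', 'apple', 'banana']: 'C' < 'a'
case_insensitive = sorted(words, key=str.lower)   # ['apple', 'banana', 'Cherry']

fruits = ["kiwi", "apple", "guava", "fig"]
by_length = sorted(fruits, key=len)               # ['fig', 'kiwi', 'apple', 'guava']
# apple and guava both have length 5; the stable sort keeps their original order
print(default_order, case_insensitive, by_length)
```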

7.1.6. Lambdas (optional)

Lambdas

A lambda function is a small anonymous function that is usually defined locally, at the place where it is needed. It can take any number of arguments but can contain only a single expression, whose value is returned.

The key parameter of sorted() is often used in conjunction with such a lambda. Lambdas have the form

lambda <data>: <return property of data>

The above function could have been written as this lambda:

key=lambda fruit: fruit[1]
fruits = ["kiwi", "apple", "guava"]

sorted(fruits, key=lambda fruit: fruit[1])
['kiwi', 'apple', 'guava']
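To see that a lambda is just a compact function, compare it with an equivalent def. Assigning a lambda to a name is done here only for demonstration (PEP 8 recommends def for named functions). The last lines show that key also works with max() and min():

```python
def second_character(word):
    return word[1]

second_character_lambda = lambda word: word[1]    # anonymous equivalent, named only for this demo
same_result = second_character("kiwi") == second_character_lambda("kiwi")   # True: both give 'i'

multiply = lambda a, b: a * b      # several arguments, but a single expression
product = multiply(3, 4)           # 12

fruits = ["kiwi", "apple", "guava", "fig"]
longest = max(fruits, key=lambda fruit: len(fruit))   # 'apple': the first of the longest names
print(same_result, product, longest)
```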

Here is another example, involving a list of dictionaries.

fruits = [
    {'name': 'apple', 'color': 'green/red', 'origin': 'Europe'},
    {'name': 'kiwi', 'color': 'green', 'origin': 'New Zealand'},
    {'name': 'orange', 'color': 'orange', 'origin': 'Europe'},
    {'name': 'banana', 'color': 'yellow', 'origin': 'Africa'}]

sorted(fruits, key = lambda fruit: fruit['origin'])
[{'name': 'banana', 'color': 'yellow', 'origin': 'Africa'},
 {'name': 'apple', 'color': 'green/red', 'origin': 'Europe'},
 {'name': 'orange', 'color': 'orange', 'origin': 'Europe'},
 {'name': 'kiwi', 'color': 'green', 'origin': 'New Zealand'}]

7.1.6.1. Multi-key sorting

Whenever you need to sort on multiple properties (e.g. first on family name and then on given name), you can use the trick of tuple sorting.
If your key function returns a tuple, standard tuple comparison is performed: elements are compared on the first item of the tuple, ties are broken on the second item, and so on.

persons = [{'first': 'Mark', 'last': 'Adams', 'age': 35},
           {'first': 'Brad', 'last': 'Young', 'age': 64}, 
           {'first': 'Rose', 'last': 'Berg', 'age': 51},
           {'first': 'Julia', 'last': 'Adams', 'age': 28}]

def last_first_sort(person):
    return (person['last'], person['first'])  # returns a tuple with last and first name

sorted(persons, key = last_first_sort)
[{'first': 'Julia', 'last': 'Adams', 'age': 28},
 {'first': 'Mark', 'last': 'Adams', 'age': 35},
 {'first': 'Rose', 'last': 'Berg', 'age': 51},
 {'first': 'Brad', 'last': 'Young', 'age': 64}]
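The tuple trick can also combine sort directions. A sketch, assuming you want last names in ascending order but, within the same last name, the oldest person first. Negating a numeric field reverses its order (this only works for numbers, not for strings):

```python
persons = [{'first': 'Mark', 'last': 'Adams', 'age': 35},
           {'first': 'Brad', 'last': 'Young', 'age': 64},
           {'first': 'Rose', 'last': 'Berg', 'age': 51},
           {'first': 'Julia', 'last': 'Adams', 'age': 28}]

# Last name ascending; within the same last name, highest age first (-35 < -28)
oldest_first = sorted(persons, key=lambda person: (person['last'], -person['age']))
for person in oldest_first:
    print(person['first'], person['last'], person['age'])
# Mark Adams 35
# Julia Adams 28
# Rose Berg 51
# Brad Young 64
```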

7.2. Using filter() and map() (optional)

These functions operate on collections (or any other iterable): filter() keeps only the elements that have some property, and map() transforms each element into something else. They are important players in the filter-map-reduce paradigm of functional programming.

For example, imagine a cupcake production line. There will be a machine taking in a plate of cupcakes and applying frosting to all of them: it maps a cupcake to a frosted cupcake. There will also be a machine taking in a plate of cupcakes, removing the badly formed ones: it filters the cupcakes, only letting the good ones pass.

  • map() applies a function (e.g. frosting) to all members of a collection and returns an iterator over the results, which has the same number of elements as the original

  • filter() applies a function (e.g. scanning for bad cupcakes) to all members of a collection, only keeping those members for which the function returns True

(Figure: Map vs Filter)

Here follow two examples, one of map() and one of filter(). Note that both map() and filter() produce iterator objects that you usually need to wrap in a collection constructor such as list().

fruits = ["kiwi", "orange", "apple", "guava", "banana"]

def capitalize_name(fruit):
    return fruit.capitalize()
    
list(map(capitalize_name, fruits))
['Kiwi', 'Orange', 'Apple', 'Guava', 'Banana']
fruits = ["kiwi", "orange", "apple", "guava", "banana"]

def filter_with_an(fruit):
    return "an" in fruit
    
list(filter(filter_with_an, fruits))
['orange', 'banana']
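The two examples above are separate steps; they can also be chained, and combined with reduce(), which in Python 3 lives in the functools module. A minimal filter-map-reduce sketch that adds up the lengths of the fruit names containing "an" (for plain addition, sum() would do the same job as reduce()):

```python
from functools import reduce

fruits = ["kiwi", "orange", "apple", "guava", "banana"]

with_an = filter(lambda fruit: "an" in fruit, fruits)    # filter: 'orange', 'banana'
lengths = map(len, with_an)                              # map: 6, 6
total_length = reduce(lambda a, b: a + b, lengths)       # reduce: 6 + 6
print(total_length)   # 12
```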

Note that working with these functions is largely superseded by the use of comprehensions, which are outlined in the next chapter. I included these functions here for completeness’ sake, and because not everybody likes comprehensions.

7.3. Working with modules

The core functionality of Python that is available to you once you start coding is rather small. That is because the developers of the language wanted to keep the memory footprint as small as possible. Why load functionality if there is a significant possibility it will not be used?

To solve the footprint problem, most functionality in Python is put inside modules.

Modules

A Python module is a file containing code that can be imported into other modules, scripts or interactive sessions. A module can define functions, classes and variables. A module can also include runnable code.

A module allows you to organize and distribute your Python code. Grouping related code into a module makes the code easier to understand and use.

To make use of functionality (or data) within a module, you need to import that module. Here is a small example to illustrate.

import math
math.sqrt(16)
4.0

When you want to access data or functions within a module, you need to use the module name, followed by the dot operator and then the function or data attribute name, as in math.sqrt().

To prevent having to type the module name all the time you can also specify which attributes of a module you want to import using the from <module> import <attr> syntax:

from math import ceil, floor 
print(ceil(3.1222))
print(floor(3.567))
4
3

Alternatively, use from <module> import <attr> as <name> to make the attribute available under a name of your choice. The same works for a whole module: import <module> as <name>.

from math import floor as fl
fl(3.9999)
3

To import everything from a module you use the asterisk:

from math import *

Be hesitant to do this: it clutters your global namespace and may silently replace names that already exist there.
Besides this, using the math.ceil() syntax makes it clear in which module the function was defined.
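A concrete example of such a clash: the math module has its own pow() function, which always returns a float. A sketch of what happens to the built-in pow() after a wildcard import:

```python
import builtins
from math import *   # imports every public name of math, including math.pow

result = pow(2, 3)                   # this is now math.pow, not the built-in pow
builtin_result = builtins.pow(2, 3)  # the built-in is still reachable via the builtins module
print(result, builtin_result)        # 8.0 8
```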

7.3.1. Using your own modules

Whenever you create functionality that you want to reuse between scripts, you can put those functions in a separate file (a module) and make sure Python can find it on its search path. How the search path works is out of scope for this course; the simplest way is to put your module somewhere in a subfolder of your current project.

See RealPython for a discussion of this topic.

In the example below, the module is defined by file my_module.py located in folder ./scripts, the contents of which are:

'''
a simple module
'''
message = "Programming is for everyone"

def say_hello(name):
    print(f"Hello {name}!")

The module can then be imported and used like this:

from scripts import my_module
my_module.say_hello("Rob")

#or
# from scripts.my_module import say_hello
# say_hello("Mike")

help(my_module)   # use help to get info; help() prints the text itself (and returns None)
Hello Rob!
Help on module scripts.my_module in scripts:

NAME
    scripts.my_module - a simple module

FUNCTIONS
    say_hello(name)

DATA
    message = 'Programming is for everyone'

FILE
    /Users/michielnoback/git_projects/python_intro/scripts/my_module.py

7.3.2. Core modules

Besides the math module we have seen before, Python ships with a wealth of other modules, each with its own purpose and application domain. Here are just a few, with even fewer examples; have a look at the Python docs for a complete listing.

For all modules, importing them and typing help(module) is the best way to get detailed information.

  • time & datetime: time and date manipulations

  • math: math data and functions

  • os & os.path: operating system interface

  • sys: interpreter access

  • itertools: functions creating iterators for efficient looping

  • csv: csv file reading, parsing and writing

  • pickle: storing objects on disk

  • shelve: storing objects in a “dict on disk”

7.3.2.1. Module sys

This module is used for interacting with the interpreter (the Python system), not with the operating system (see module os). Here are its main attributes and functions (copied from help):

Objects:

  • argv – command line arguments; argv[0] is the script pathname if known

  • stdin – standard input file object; used by input()

  • stdout – standard output file object; used by print()

  • stderr – standard error object; used for error messages

By assigning other file objects (or objects that behave like files) to these, it is possible to redirect all of the interpreter’s I/O.

Functions:

  • exit() – exit the interpreter by raising SystemExit
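A common use of argv is reading arguments passed to a script. The sketch below assumes a hypothetical script greet.py; to keep it runnable in a notebook, the command line is simulated with plain lists, and the function only returns the exit status that a real script would pass to sys.exit():

```python
import sys

def greet(argv):
    """Greet the name given as first argument; argv is laid out like sys.argv."""
    if len(argv) < 2:
        # error messages go to sys.stderr, normal output to sys.stdout
        print(f"usage: python {argv[0]} <name>", file=sys.stderr)
        return 1              # a non-zero exit status signals an error
    print(f"Hello {argv[1]}!")
    return 0

# Simulated command lines; a real script would end with: sys.exit(greet(sys.argv))
status_ok = greet(["greet.py", "Rob"])   # prints: Hello Rob!
status_err = greet(["greet.py"])         # prints the usage message to stderr
print(status_ok, status_err)             # 0 1
```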

Example: redirecting the output stream. Any object that has a write() method can take the place of sys.stdout, so that messages sent to the output stream, for instance by print(), end up in that object. First we need an interceptor:

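class MyLogger:
    def __init__(self):
        self.log = ['My Logger\n']
        self.msgCount = 0
    def write(self, message):
        if message == '\n':   # print() writes the line ending separately; skip it
            return  
        self.msgCount += 1
        self.log.append('[{}: {}]\n'.format(self.msgCount, message) )
    def flush(self):          # file-like objects are expected to support flush() too
        pass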

Next we can use this to intercept, store and adjust print messages

import sys
print('start log test\n') ## to regular std out
myLog = MyLogger()
_out = sys.stdout         ## store for later usage
sys.stdout = myLog        ## redirect stdout to myLogger
print('a first message')  ## print to myLogger
print('a second message')
sys.stdout = _out         ## restore print to std out
print(''.join(myLog.log)) ## print myLog to std out
print('end log test')
start log test

My Logger
[1: a first message]
[2: a second message]

end log test
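The standard library also offers contextlib.redirect_stdout(), which does the swapping and restoring of sys.stdout for you, even when an error occurs. A sketch that captures print() output in an in-memory io.StringIO object instead of a custom logger class:

```python
import io
from contextlib import redirect_stdout

buffer = io.StringIO()                # an in-memory, file-like object
with redirect_stdout(buffer):         # inside this block, sys.stdout is buffer
    print('a first message')
    print('a second message')
# here sys.stdout has been restored automatically

captured = buffer.getvalue().splitlines()
print(captured)   # ['a first message', 'a second message']
```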

7.3.2.2. Modules os & os.path

As stated, the os module is used to interact with the operating system, especially the file system. Here are some common use cases.

import os
print(os.getcwd())      # current working directory: where Python was started, not necessarily where the script lives
os.listdir()[:2]        # list contents (only first two items)
/Users/michielnoback/git_projects/python_intro
['13_next_steps.ipynb', '07_comprehensions.ipynb']

Other os functions that are often used:

  • os.chdir(path): Change the current working directory to path

  • os.mkdir(path[, mode]): Create a directory named path (with the given Unix permission mode). You can also create temporary directories; see the tempfile module

  • os.remove(path): Remove (delete) the file path. If path is a directory, an OSError is raised; use os.rmdir(path) to remove an (empty) directory

  • os.rename(src, dst): Rename the file or directory src to dst

  • os.system(command): Execute the command (a string) in a subshell
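Several of these functions can be combined in one small sketch. To avoid touching your own files, it works inside a temporary directory (the names data, draft.txt and final.txt are hypothetical):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:     # deleted again at the end of the block
    data_dir = os.path.join(tmp_dir, "data")
    os.mkdir(data_dir)                              # create a new directory

    old_name = os.path.join(data_dir, "draft.txt")
    with open(old_name, "w") as draft:
        draft.write("some text\n")
    new_name = os.path.join(data_dir, "final.txt")
    os.rename(old_name, new_name)                   # rename the file
    contents_after_rename = os.listdir(data_dir)    # ['final.txt']

    os.remove(new_name)                             # delete the file ...
    os.rmdir(data_dir)                              # ... then the now empty directory
    data_dir_exists = os.path.exists(data_dir)      # False

print(contents_after_rename, data_dir_exists)
```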

Some os.path goodies

  • os.path.sep: The character used by the OS to separate pathname components: ‘/’ (Linux and macOS) or ‘\’ (Windows)

  • os.path.exists(path): True if path refers to an existing path

  • os.path.isfile(path): True if path is an existing regular file

  • os.path.isdir(path): True if path is an existing directory

  • os.path.split(path): Splits the pathname into a pair (head, tail) where tail is the last pathname component and head is everything leading up to it

  • os.path.join(path1[, path2[, ...]]): Joins one or more path components intelligently, using the OS-specific separator

import os.path as path
print(path.split(os.getcwd()))
print(path.join("Downloads", "data", "project1"))
('/Users/michielnoback/git_projects', 'python_intro')
Downloads/data/project1

7.3.2.3. The itertools module

Module itertools provides functions that create iterators for efficient looping. Here are a few:

import itertools as it
print(it.repeat(42, 5))         # prints the type and properties, not the data
print(list(it.repeat(42, 5)))   # need to wrap it in collection type or loop it
print(list(it.product([1,2,3],['a', 'b'])))
print(list(it.accumulate([2, 4, 6])))
repeat(42, 5)
[42, 42, 42, 42, 42]
[(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')]
[2, 6, 12]
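A few more itertools functions that are often handy. This is a sketch; note that count() produces an infinite iterator, so it is limited here with islice():

```python
import itertools as it

first_evens = list(it.islice(it.count(0, 2), 5))   # count from 0 in steps of 2, keep 5 values
chained = list(it.chain("ab", [1, 2]))             # loop over several iterables as one
fruit_pairs = list(it.combinations(["kiwi", "apple", "guava"], 2))   # all unique pairs
print(first_evens)   # [0, 2, 4, 6, 8]
print(chained)       # ['a', 'b', 1, 2]
print(fruit_pairs)   # [('kiwi', 'apple'), ('kiwi', 'guava'), ('apple', 'guava')]
```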

7.3.2.4. Module csv

Text files in csv (comma-separated values) format are ubiquitous in data science.

You can of course write your own parser for every file type, which is not difficult, but you can make it even easier for yourself: just use the csv module! It also takes care of tricky details, such as quoted fields that contain the delimiter.

Here, I present some examples for reading csv files. Remember that writing is just as easy!

As an example, take the input file data/exp_data.csv:

exp,value,message
1,0.567,OK
2,0.334,OK
3,0.325,ND
4,0.766,OK
5,0.455,OK
6,0.421,ERR
7,0.876,OK

Let’s look at how to process this.

import csv
with open('data/exp_data.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        print(row)
['exp', 'value', 'message']
['1', '0.567', 'OK']
['2', '0.334', 'OK']
['3', '0.325', 'ND']
['4', '0.766', 'OK']
['5', '0.455', 'OK']
['6', '0.421', 'ERR']
['7', '0.876', 'OK']

Alternatively, csv.DictReader gives you an iterator of dictionaries, using the first line of the file as a header line that provides the keys:

with open('data/exp_data.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    for row in csv_reader:
        print(row)
{'exp': '1', 'value': '0.567', 'message': 'OK'}
{'exp': '2', 'value': '0.334', 'message': 'OK'}
{'exp': '3', 'value': '0.325', 'message': 'ND'}
{'exp': '4', 'value': '0.766', 'message': 'OK'}
{'exp': '5', 'value': '0.455', 'message': 'OK'}
{'exp': '6', 'value': '0.421', 'message': 'ERR'}
{'exp': '7', 'value': '0.876', 'message': 'OK'}
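Note that the csv module gives you strings only; converting them to numbers is up to you. The sketch below uses an in-memory io.StringIO object holding a few lines of data/exp_data.csv instead of the real file, so it runs anywhere. It also shows writing with csv.writer; when writing to a real file, the csv docs recommend opening it with newline='':

```python
import csv
import io

# In-memory stand-in for a few lines of data/exp_data.csv
exp_data = io.StringIO("exp,value,message\n1,0.567,OK\n3,0.325,ND\n6,0.421,ERR\n")

ok_values = []
for row in csv.DictReader(exp_data):
    if row['message'] == 'OK':
        ok_values.append(float(row['value']))   # convert the string '0.567' to a number
print(ok_values)   # [0.567]

# Writing: csv.writer quotes fields when needed, e.g. a field containing a comma
output = io.StringIO()
writer = csv.writer(output)
writer.writerow(['exp', 'message'])
writer.writerow([8, 'OK, but noisy'])
print(repr(output.getvalue()))   # rows end with '\r\n' by default
```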

7.3.2.5. The pickle module

Python can write most data structures and objects to a file and read them back from that file. This is done using the pickle module.

This is a very useful feature if, for example, you want to store program state for a next run. Only load pickle files that you trust, though: unpickling data can execute arbitrary code. Here is a simple example.

import pickle

my_preferences = {'linewidth': 60, 'verbosity': 'HIGH'}

with open('data/preferences', mode='wb') as dmp:  # write in binary mode
    pickle.dump(my_preferences, dmp)

# don't try to read the file in a text editor: it is binary

with open('data/preferences', mode='rb') as dmp:  # read in binary mode
    loaded_prefs = pickle.load(dmp)

print(loaded_prefs)
{'linewidth': 60, 'verbosity': 'HIGH'}
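pickle can also serialize to a bytes object in memory with dumps() and loads(). A small sketch (the state dict is made up), which also shows that not everything can be pickled, for instance a lambda:

```python
import pickle

state = {'scores': [3, 5, 8], 'players': ('Rob', 'Mike'), 'level': 2}   # made-up program state

as_bytes = pickle.dumps(state)        # serialize to a bytes object in memory
restored = pickle.loads(as_bytes)     # and turn it back into Python objects
print(restored == state, restored is state)   # True False: an equal, but new, object

try:
    pickle.dumps(lambda x: x)         # functions are pickled by name, which fails for a lambda
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
print(lambda_pickled)                 # False
```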

7.3.2.6. The shelve module

Related to the pickle module is the shelve module, which uses pickle under the hood.
It is generally used to store data in a database-like, dict-like structure on disk.
The difference is that you do NOT have to load the entire data structure every time you want to access a single item of it.

Here is a first run of the script: it stores three users. When you call shelve.open() on a non-existing shelf, one is automatically created. With the example below, a binary file named users.db is created (depending on the database backend Python uses, you may instead find a few files with other extensions).

import shelve
users = shelve.open("data/users")
users['Mich'] = dict(name='Michiel', access='ALL')
users['Piet'] = dict(name='Piet', access='EDIT')
users['Sven'] = dict(name='Sven', access='NONE')

print(users['Mich'])
users.close()
{'name': 'Michiel', 'access': 'ALL'}

Now do an update, and a read operation; imagine this is in another shell or script:

new_users = shelve.open("data/users")
new_users['Arne'] = dict(name='Arne Poortinga', access='MIN')
print(new_users['Mich'])
print(len(new_users))    # 4 users in the database now
new_users.close()        # write pending changes to disk and release the file
{'name': 'Michiel', 'access': 'ALL'}
4
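One pitfall with shelve: each lookup gives you a fresh copy of the stored value, so changing a mutable value in place is not saved. The sketch below works on a hypothetical shelf in a temporary directory and uses the with statement to close it automatically; the alternative is to open the shelf with writeback=True, at the cost of extra memory:

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "users")           # hypothetical shelf file name
    with shelve.open(db_path) as users:                 # the shelf is closed automatically
        users['Mich'] = dict(name='Michiel', access='ALL')

        users['Mich']['access'] = 'NONE'                # modifies a temporary copy only!
        after_inplace = users['Mich']['access']         # still 'ALL'

        record = users['Mich']                          # fetch a copy,
        record['access'] = 'NONE'                       # modify it,
        users['Mich'] = record                          # and store it again
        after_reassign = users['Mich']['access']        # 'NONE'

print(after_inplace, after_reassign)   # ALL NONE
```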

7.3.2.7. JSON

Although pickle and shelve are convenient, for serious applications JSON is often the better choice. Many libraries and programming languages have built-in support for JSON, and the stored files are human-readable as well. This is outside the scope of this course.
Have a look at the json module.

7.4. Key concepts

Important

  • lambda: A lambda is an anonymous, locally defined function consisting of a single expression.

  • filter-map-reduce: Map, Filter, and Reduce are building blocks of the functional programming paradigm. See here for more details.

  • module: A module is a file containing code that can be imported into other modules, scripts or interactive sessions. A module can define functions, classes and variables.