2. A quick overview

In this course you will get to know the programming language Python, up to a level where you can write basic scripts and custom modules and classes. You will know how to run Python code as an executable program, in an interactive loop (e.g. the Jupyter QtConsole), or in a notebook environment such as Jupyter.

Since the notebook is the preferred way of working for a majority of data scientists, and a convenient form for educational purposes, we will mainly focus on the notebook environment. The document you are currently reading is a Jupyter notebook rendered into an e-book with Jupyter Book.

2.1. Calculating the surface area of triangles

In this chapter we will walk through an entire coding case to create a standalone program that can be executed in a terminal environment (macOS & Linux: terminal, Windows: Command Prompt).
The case is a trivial example: calculating the surface area of triangles.

2.1.1. Variables

Suppose I want to calculate the surface area of three different equilateral triangles, with sides 3, 4 and 2.21 respectively.
First I define a variable representing the length of the side and assign an initial value to it. To inspect the value of a variable at any given time, you can print() it.

side = 3
print(side) # print the side
3
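As a small illustration of what assignment does, the sketch below re-assigns the same variable and derives a second one from it. The name side_doubled is introduced here purely for illustration.

```python
side = 3                  # bind the name 'side' to the value 3
print(side)               # prints 3

side = 4                  # re-assign: the same name now refers to a new value
print(side)               # prints 4

side_doubled = side * 2   # a variable can be used to compute another variable
print(side_doubled)       # prints 8
```

A variable always holds the value that was assigned to it most recently.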

2.1.2. Functions

The word print is the name of a function. This particular function displays text on the console (or as the output of a code cell).

A function is a re-usable piece of code, accessible by its name. It can receive input data (arguments), specified by its parameters, which are passed to the function between a pair of parentheses ().
Even if there are no arguments to pass, the parentheses are mandatory when calling the function.
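The sketch below shows a few calls to built-in functions with one, two and zero arguments. The functions len() and round() are standard Python built-ins; the variable names are chosen here for illustration only.

```python
word_length = len("triangle")   # one argument: len() returns the number of characters
rounded = round(2.4567, 2)      # two arguments, separated by a comma

print(word_length)              # prints 8
print(rounded)                  # prints 2.46
print()                         # no arguments: prints an empty line, parentheses still required
```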

2.1.3. Comments

Note the use of # text to add human-readable comments to the code. In all Python code (notebook code cell, script and REPL console), everything on a line after # is ignored when the code is evaluated.
An editor with syntax highlighting shows comments clearly in grey or another light color, as in the cell above.

Code comments

Any text occurring on a line after a hash symbol “#” will be ignored by the Python interpreter.
Text after a hash is called a code comment.
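A minimal sketch of the two common comment placements. It also shows one exception worth knowing: a hash inside a string literal is ordinary text. The variable label is hypothetical.

```python
# A full-line comment: the interpreter skips this line entirely.
side = 3  # an inline comment: only the text after the hash is ignored

# A hash inside a string literal is ordinary text, not the start of a comment.
label = "side #1"
print(label)  # prints: side #1
```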

Displaying results of a cell

The result of the last expression in a notebook cell will be displayed even if it is not print()-ed explicitly.

2.1.4. Operators

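Next, I use the side variable to calculate the surface area of a 3-3-3 triangle.

The formula used in this chapter to calculate the surface area is $\frac{1}{2} \cdot side^2$. Strictly speaking, this is the area of a right triangle with two legs of length side; the exact area of an equilateral triangle is $\frac{\sqrt{3}}{4} \cdot side^2$. The examples below keep the simpler formula.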

To calculate this in Python code, you need to combine values using operations such as addition, subtraction, multiplication and division.
The symbols that signal these mathematical operations are called operators.
Below you see two of these: * (multiplication) and ** (exponentiation).

side = 3
area =  0.5 * side**2

f'The area of an equilateral triangle with side {side} is {area}'
'The area of an equilateral triangle with side 3 is 4.5'
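The expression 0.5 * side**2 used above is a simplification. For comparison, the sketch below uses the exact geometric formula for an equilateral triangle, $\frac{\sqrt{3}}{4} \cdot side^2$, with math.sqrt from the standard library. The name exact_area is introduced here for illustration and is not used elsewhere in this chapter.

```python
import math

side = 3
exact_area = math.sqrt(3) / 4 * side**2   # exact area of an equilateral triangle

print(f'The exact area of an equilateral triangle with side {side} is {round(exact_area, 2)}')
```

For side 3 this gives roughly 3.9 instead of 4.5. The rest of the chapter continues with the simpler expression.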

Operators

Operators such as * and ** are symbols (or combinations of symbols) within Python code that operate on (usually) the two values on either side of them, named operands, and generate a new value from these input values. We call the combination of an operator and its operands an expression.

The mathematical operators in Python are not a big surprise: + is the addition operator that takes the values on either side and returns the sum of the two. Other mathematical operators are subtraction (-), multiplication (*), division (/) and exponentiation (**).

There are many more operators, classified as mathematical (like the above), assignment (like the = symbol in the above cells), comparison operators (such as <), and some more categories.

A complete listing of operators can be found here or here
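A short sketch of the operator categories mentioned above, applied to the side variable. The name is_small is hypothetical.

```python
side = 3              # = is the assignment operator

print(side + 1)       # addition: 4
print(side - 1)       # subtraction: 2
print(side * 2)       # multiplication: 6
print(side / 2)       # division always yields a float: 1.5
print(side ** 2)      # exponentiation: 9

is_small = side < 4   # comparison operators produce True or False
print(is_small)       # True
```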

2.1.5. Format strings

In the code cell above you may have noticed the use of this:

f'The area of an equilateral triangle with side {side} is {area}'

The structure f'Some text with {variable}' is a format string.
A format string is a character sequence in which we can insert the values of Python variables or expressions at the location of the curly braces to produce this:

'The area of an equilateral triangle with side 3 is 4.5'

Format strings were introduced in Python 3.6. Before that version we needed to write this:

print("The area of an equilateral triangle with side", side, "is", area)

instead of this:

f'The area of an equilateral triangle with side {side} is {area}'

A big improvement!
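Format strings can do a little more than insert variables. The sketch below places an expression inside the braces and uses a format specifier (.2f, fixed-point notation with two decimals) as one possible way to control the number of digits shown. The variable message is introduced for illustration.

```python
side = 2.21
area = 0.5 * side**2

# An expression can be placed directly inside the braces.
print(f'Twice the side is {side * 2}')

# A format specifier after a colon controls the presentation;
# .2f means: fixed-point notation with two decimals.
message = f'The area of an equilateral triangle with side {side} is {area:.2f}'
print(message)
```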

2.1.6. Custom functions

I want to do the same calculations and result reporting for triangles with other side lengths.
I could just copy-and-paste all code, as in the cell below:

side = 4
area =  0.5 * side**2

print(f'The area of an equilateral triangle with side {side} is {area}') 

side = 2.21
area =  0.5 * side**2

# note the use of round() in the statement below
print(f'The area of an equilateral triangle with side {side} is {round(area, 2)}') 
The area of an equilateral triangle with side 4 is 8.0
The area of an equilateral triangle with side 2.21 is 2.44

This is awful!

Copy-and-paste activities are a real no-no in programming. Whenever you catch yourself using Ctrl+C & Ctrl+V, stop and think of a better way to do it.

In many cases this will result in extracting the copied code into a custom function.

Below, the re-used piece of code is embedded in the function named print_triangle_area. It takes a single argument named side, which is implicitly assumed to be a number.

def print_triangle_area(side):
    area =  0.5 * side**2
    print(f'The area of an equilateral triangle with side {side} is {round(area, 2)}')

With this function defined (and loaded) it is easy to repeat the operation for a whole series of values:

print_triangle_area(3)
print_triangle_area(4)
print_triangle_area(2.21)
The area of an equilateral triangle with side 3 is 4.5
The area of an equilateral triangle with side 4 is 8.0
The area of an equilateral triangle with side 2.21 is 2.44
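A possible variation, not part of the original case: separating the calculation from the reporting by letting a function return its result. The function triangle_area and the variable total are hypothetical names used only in this sketch; the formula is the same simplified one used throughout the chapter.

```python
def triangle_area(side):
    """Return the area, using the chapter's formula 0.5 * side**2."""
    return 0.5 * side**2

def print_triangle_area(side):
    area = triangle_area(side)   # re-use the calculation
    print(f'The area of an equilateral triangle with side {side} is {round(area, 2)}')

print_triangle_area(3)

total = triangle_area(3) + triangle_area(4)   # returned values can be combined
print(total)
```

A function that returns a value can be used in further calculations, whereas a function that only prints cannot.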

2.1.7. Flow control: a “loop”

But wait! There is still copied code. We now take a leap forward in Python programming concepts. We use the for loop to iterate over a collection of values contained within a list, which is an ordered collection of values maintained as a single unit. Here is a solution that does not use copied code at all.

sides = [3, 4, 2.21]        # a list of values
for n in sides:             # iterate using for
    print_triangle_area(n)  # the iterated 'block', as indented line(s)
The area of an equilateral triangle with side 3 is 4.5
The area of an equilateral triangle with side 4 is 8.0
The area of an equilateral triangle with side 2.21 is 2.44

The for loop iterates over an iterable (a collection of values) and executes the given block once for each of the values. We call this type of construct a flow control element because it controls the flow of the program.
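To make the notion of the iterated block more concrete, here is a sketch with a block of two indented lines that collects the results in a new list. The list method append() is standard Python; the names areas and area are chosen for illustration.

```python
sides = [3, 4, 2.21]
areas = []                        # start with an empty list

for n in sides:                   # n takes each value of 'sides' in turn
    area = 0.5 * n**2             # both indented lines belong to the block
    areas.append(round(area, 2))

print(areas)                      # not indented: runs once, after the loop has finished
```

The indentation determines which lines are repeated: the final print() is outside the block and therefore executes only once.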

2.2. Bring it together in a program

By now you may be thinking that this does not look like programming at all. Where is the program? Actually, there is programming code but no standalone program (executable) here.

A program is usually a piece of functionality on a computer or other device that receives some input (keyboard, mouse, touchscreen, sensor, etc.) and generates some output (screen, terminal, file, database).

Given the triangle example above, a standalone terminal program for calculating and reporting triangle surface area would look like the code listing below.

import sys # non-core language functionality needs to be imported

def print_triangle_area(side):
    area =  0.5 * side**2
    print(f'The area of an equilateral triangle with side {side} is {round(area, 2)}')

for arg in sys.argv[1:]:        # loop over the command-line arguments
    side = float(arg)           # convert the text to a number
    print_triangle_area(side)

If this code is stored in a text file on your computer (e.g. ‘triangle_surface.py’), it can be run from a terminal (Linux or macOS) or command prompt (Windows) using the command

> python3 triangle_surface.py 3 4 2.21
The area of an equilateral triangle with side 3.0 is 4.5
The area of an equilateral triangle with side 4.0 is 8.0
The area of an equilateral triangle with side 2.21 is 2.44

The code for this program, which we usually call a script, can be found here.

There are some elements that you probably do not understand yet. For those who cannot proceed without a little premature explanation:

  • import sys says “load the functionality located in module sys and make it available here”. The functionality that is available by default in any Python program is rather limited to keep it lean. Any additional functionality must be loaded from modules using an import statement.

  • sys.argv is the list of arguments entered on the command line, in this case ["triangle_surface.py", "3", "4", "2.21"]. We’ll get to lists in the next chapter.

  • for arg in sys.argv[1:]: this says: “iterate over the command-line arguments but skip the first”. Again, this is the subject of a later chapter.
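Command-line arguments always arrive as text, so float() fails when an argument is not a number. The sketch below shows one possible way to handle that with try/except, a construct not covered in this chapter. The function parse_sides and the list example_arguments are hypothetical; in a real script the input would be sys.argv.

```python
import sys

def parse_sides(arguments):
    """Convert command-line strings to floats, skipping invalid values."""
    sides = []
    for arg in arguments:
        try:
            sides.append(float(arg))
        except ValueError:
            # float() raises ValueError for text that is not a number
            print(f'Skipping {arg!r}: not a number', file=sys.stderr)
    return sides

# A literal list stands in for sys.argv so the example runs anywhere.
example_arguments = ["triangle_surface.py", "3", "four", "2.21"]
sides = parse_sides(example_arguments[1:])   # [1:] skips the script name
print(sides)
```

The invalid argument "four" is reported on the error stream and left out of the result.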

2.3. Key Concepts

Important

  • (computer) program: a computer program is a sequence or set of instructions in a programming language that a computer (or other device) can execute or interpret.

  • flow control: programming elements that control the flow of a program. Flow control elements are used for iteration and conditional execution.

  • function: A function is a chunk of code (usually named) that you can re-use, rather than copying it multiple times. Functions enable programmers to break down a problem into smaller pieces, each of which performs a particular task.

  • import: A statement making functionality available that is not loaded by default.

  • Jupyter: a Notebook platform in which you do interactive literate programming. It supports Julia, Python and R.

  • Markdown: Markdown is a lightweight markup language that you can use to add formatting elements to plain-text documents. It is used in a wide range of settings: Jupyter Notebooks, R Markdown, e-book authoring, etc.

  • operator: a symbol that operates on operands, usually on both sides, together forming an expression.

  • Python: a very popular programming language, praised for its ease of learning and use and applicability in a wide range of programming challenges.

  • script: a text file with computer code that can be executed as a program, usually by the interpreter for the programming language used.

  • variable: a program element that couples a name to a memory location with some contents.
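The flow control entry above mentions conditional execution, which this chapter has not demonstrated. A minimal sketch, assuming we want to ignore non-positive side lengths; the if/else construct is standard Python, and the name valid_sides is hypothetical.

```python
sides = [3, -1, 2.21]
valid_sides = []

for side in sides:                  # iteration
    if side > 0:                    # conditional execution
        valid_sides.append(side)
    else:
        print(f'Ignoring {side}: a side length must be positive')

print(valid_sides)
```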