Chapter 3 Basic R - coding

3.1 First look at vectors, functions and variables

3.1.1 Doing Math in the console

The console is the place where you do quick calculations, tests and analyses that do not need to be saved (yet) or repeated. It is the tab that says “Console” and on first use, R puts it in the lower left panel.

In the console, the prompt is the “greater than” symbol “>”. R waits here for you to enter commands. When the panel has “focus” the cursor is blinking on and off. You can use the console as a calculator. It supports all regular math operations, in the way you would expect them:

+  : ‘plus’, as in 2 + 2 = 4

- : ‘subtract’, as in 2 - 2 = 0

*  : ‘multiply’, as in 2 * 3 = 6

/  : ‘divide’, as in 8 / 4 = 2

^  : ‘exponent’, as in 2^3 = 8. In R, ** is a synonym of ^

For the square root you can use \(n^{0.5}\), typed as n^0.5 or n**0.5, or the function sqrt() (discussed later).

If you press Enter before the mathematical statement is complete, the > symbol is replaced by a + at the start of the new line, indicating the statement is a continuation. Here is an example:

> 1 + 3 + 4 + 
+ 

So the + at the start of line 2 is not a mathematical + but a “continuation symbol”. You can always abort the current statement by pressing Escape.

When a statement is complete, the result will be printed in the next line:

> 31 + 11
[1] 42

The result is of course 42; the leading [1] is the index of the result. We will address this later.

Operator Precedence

All “operators” adhere to the standard mathematical precedence rules (PEMDAS):

    Parentheses (simplify inside these)
    Exponents
    Multiplication and Division (from left to right)
    Addition and Subtraction (from left to right)

With complex statements you should be aware of operator precedence! If you are not sure, or want to make your expression less ambiguous you should simply use parentheses () because they have highest precedence.
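
A few small cases make the precedence rules concrete. This is only an illustrative sketch; the numbers are arbitrary and chosen to show where parentheses change the outcome.

```r
# Multiplication binds tighter than addition
2 + 3 * 4
## [1] 14

# Parentheses are evaluated first
(2 + 3) * 4
## [1] 20

# Exponentiation binds tighter than the unary minus
-2^2
## [1] -4
(-2)^2
## [1] 4
```

The last pair is a common surprise: without parentheses, the minus sign is applied after the exponent.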

Besides math operators, R knows a whole set of other operators. They will be dealt with later in this chapter.

Programming Rule Always place spaces around both sides of an operator, with the exception of ^ and **.

3.1.2 An expression dissected

When you type 21 / 3 this is called an expression. The expression has three parts: an operator (/ in the middle) and two operands. The left operand is 21 and the right operand is 3.
Since there is no assignment, the result of this expression will be sent to the console as output, giving [1] 7.

Because this expression is the sole contents of the current line in the console, it is also called a statement.

Statement vs expression A statement is a complete line of code that performs some action, while an expression is any section of code that evaluates to a value.

Ending statements

In R, the newline (Enter) is an end-of-statement character. Optionally, you can end statements with a semicolon “;”. However, when you have several statements on a single line, semicolons are mandatory, as in this example:

x <- c(1, 2, 3); x; x <- 42; x
## [1] 1 2 3
## [1] 42

Programming Rule: Have one statement per line and don’t use semicolons

Comments

Everything on a line after a hash sign “#” will be ignored by R. Use it to add explanation to your code:

## starting cool analysis
x <- c(T, F, T) # Creating a logical vector
y <- c(TRUE, FALSE, TRUE) # same

3.2 Functions

Simple mathematics is not the core business of R.

Going further than basic math, you will need functions, mostly pre-existing functions but often also custom functions that you write yourself. Here is a definition of a function:

A function is a piece of functionality that you can execute by typing its name, followed by a pair of parentheses. Within these parentheses, you can pass data for the function to work on. Functions often, but not always, return a value.

Function usage -or a function call- has this general form: \[function\_name(arg_1, arg_2, ..., arg_n)\]

Example: Square root with sqrt()

You have already seen that the square root can be calculated as \(n^{0.5}\). However, there is also a function for it: sqrt(). It returns the square root of the number passed as its argument, e.g. sqrt(36):

36^0.5
sqrt(36)
## [1] 6
## [1] 6

Another example: paste()

The paste() function can take any number of arguments and returns them, combined into a single text (character) string. You can also specify a separator using sep="<separator string>":

paste(1, 2, 3, sep = "---")
## [1] "1---2---3"

Note the use of quotes surrounding the dashes: "---"; they indicate it is text, or character, data.
Also note the use of a name for only the last argument. Not all arguments can be specified by name, but when possible this has preference, as in sep = "---".
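
As a small additional sketch of the sep argument: when it is left out, paste() falls back on its default separator, a single space. The words used here are arbitrary examples.

```r
# Default separator is a single space
paste("vector", "based")
## [1] "vector based"

# An empty separator glues the pieces together
paste("vector", "based", sep = "")
## [1] "vectorbased"
```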

3.2.1 Getting help on a function

Type ?function_name or help(function_name) in the console to get help on a function. The function documentation will appear in the panel containing the Help tab; its location depends on your preferences.
For instance, typing ?sqrt will give the help page of the square root function together with the abs() function.
R help pages always have the exact same structure:

  • Name & package (e.g. {base})
  • Short description
  • Description
  • Usage
  • Arguments
  • Details
  • Examples

Scroll down in the help to see example usages of the function. Alternatively, type example(sqrt) in the console to have all examples executed in order, until you press Escape.

3.3 Variables

In math and programming you often use variables to label or name pieces of data, or a function in order to have them reusable, retrievable, changeable.

A variable is a named piece of data stored in memory that can be accessed via its name

For instance, x = 42 is used to define a variable called x, with a value attached to it of 42. Variables are really variable - their value can change! In R you usually assign a value to a variable using “<-”, so “x <- 42” is equivalent to “x = 42”. Both will work in R, but the “arrow” notation is preferred.
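
The following minimal sketch shows a variable being created, changed and reused. The names x and y and the values are arbitrary.

```r
x <- 42      # assign 42 to x
x
## [1] 42

x <- x + 1   # the old value is used to compute the new one
x
## [1] 43

y = 8        # works too, but the arrow is preferred
x * y
## [1] 344
```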

3.4 Vectors

3.4.1 R is completely vector-based

In R, all data lives inside vectors. When you type ‘2 + 4’, R will execute the following series of actions:

  1. create a vector of length 1 with its element having the value 2
  2. create a vector of length 1 with its element having the value 4
  3. add the value of the second vector to ALL the values of vector one, and recycle any shorter vector as many times as needed

Step 3 is a crucial one. It is essential to grasp this aspect in order to understand R. Therefore we’ll revisit it later in more detail.
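
The steps above can be checked in the console. This sketch uses the base functions length() and is.vector(), which are not discussed in the text so far, to show that a single number really is a vector, and that a shorter vector is recycled over a longer one.

```r
# A single number is a vector of length 1
length(2)
## [1] 1
is.vector(2)
## [1] TRUE

# The shorter vector (4) is recycled over every element of the longer one
c(1, 2, 3) + 4
## [1] 5 6 7
```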

3.4.2 Five data types that live in vectors

R knows five basic types of data:

type       description                        examples
numeric    numbers with a decimal part        3.123, 5000.0, 4.1E3
integer    numbers without a decimal part     1, 0, 2999
logical    Boolean values (yes/no)            TRUE, FALSE
character  text, should be put within quotes  'hello R', "A cat!"
factor     nominal and ordinal scales         <dealt with later>

All these types are created in similar ways and can often be converted into other types.

Note 1: If you type a number in the console, it will always be a numeric value, decimal part or not.
Note 2: For character data, single and double quotes are equivalent but double are preferred; type ?Quotes in the console to read more on this topic.
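
As a quick sketch, the class() function (introduced properly later in this chapter) reports the type of a value. Two things here are additions that the text above does not cover: the L suffix, which asks R for an integer instead of a numeric, and the factor() call, which is only used to have a factor to inspect.

```r
class(3.123)
## [1] "numeric"
class(1)          # no decimal part, still numeric (see Note 1)
## [1] "numeric"
class(1L)         # the L suffix requests an integer
## [1] "integer"
class(TRUE)
## [1] "logical"
class("hello R")
## [1] "character"
class(factor(c("low", "high", "low")))
## [1] "factor"
```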

3.4.3 Creating vectors

You will see shortly that there are many ways to create vectors: a custom collection, a series, a repetition of a smaller set, a random sample from a distribution, etc. etc.

The simplest way to create a vector is the first: create a vector from a custom set of elements, using the “Concatenate” function c(). The c() function simply takes all its arguments and puts them behind each other, in the order in which they were passed to it, and returns the resulting vector.

> c(2, 4, 3)
## [1] 2 4 3
> c("a", "b", c("c", "d"))
## [1] "a" "b" "c" "d"
> c(0.1, 0.01, 0.001)
## [1] 0.100 0.010 0.001
> c(T, F, TRUE, FALSE) # There are two ways to write logical values
## [1]  TRUE FALSE  TRUE FALSE

Vectors can hold only one data type

A vector can hold only one type of data. Therefore, if you pass a mixed set of values to the function c(), it will coerce all data into one type: the most general type that is present, in the order logical, numeric, character. Logical values mixed with numbers become numeric, and as soon as one element is character the result is a character vector. In the example below, two numbers and a character value are passed. Since "a" cannot be coerced into a numeric, the returned vector will be a character vector.

c(2, 4, "a") 
## [1] "2" "4" "a"

Here are some more coercion examples.

> c(1, 2, TRUE) # To numeric
## [1] 1 2 1
> c(TRUE, FALSE, "TRUE") # To character
## [1] "TRUE"  "FALSE" "TRUE"
> c(1.3, TRUE, "1") # To character
## [1] "1.3"  "TRUE" "1"

Using the function class(), you can get the data type of any value or variable.

> class(c(2, 4, "a"))
## [1] "character"
> class(1:5)
## [1] "integer"
> class(c(2, 4, 0.3))
## [1] "numeric"
> class(c(2, 4, 3))
## [1] "numeric"

3.4.4 Vector fiddling

Vector arithmetic

Let’s have a look at what it means to work with vectors, as opposed to singular values (also called scalars). An example is probably best to get an idea.

x <- c(2, 4, 3, 5)
y <- c(6, 2)
x + y
## [1] 8 6 9 7

As you can see, R works set-based and will cycle the shorter of the two operands to deal with all elements of the longer operand. What happens when the length of the longer one is not a multiple of the length of the shorter one?

x <- c(2, 4, 3, 5)
z <- c(1, 2, 3)
x - z
## Warning in x - z: longer object length is not a multiple of shorter object
## length
## [1] 1 2 0 4

As you can see this generates a warning that “longer object length is not a multiple of shorter object length”. However, R will proceed anyway, cycling the shorter one.

3.5 Other operators

Here is a complete listing of operators in R. Some operators, such as !, are unary, which means they have a single operand: a single value that they operate on. Binary operators, such as + in 2 + 2, have two operands.

The following unary and binary operators are listed in precedence groups, from highest to lowest. Many of them are still unknown to you of course. We will encounter most of these along the way as the course progresses, starting with a few in this section.

operator          purpose
:: :::            access variables in a namespace
$ @               component / slot extraction
[ [[              indexing
^                 exponentiation (right to left)
- +               unary minus and plus
:                 sequence operator
%any%             special operators (including %% and %/%)
* /               multiply, divide
+ -               (binary) add, subtract
< > <= >= == !=   ordering and comparison
!                 negation
& &&              and
| ||              or
~                 as in formulae
-> ->>            rightwards assignment
<- <<-            assignment (right to left)
=                 assignment (right to left)
?                 help (unary and binary)

3.5.1 Logical operators

Logical operators are used to evaluate and/or combine expressions that result in a single logical value: TRUE or FALSE. The comparison operators compare two values (numeric, character - any type is possible) to get to a logical value, but always set-based! In the following chunk, each of the values in x is considered and if it is smaller than or equal to the value 4, TRUE is returned, else FALSE.

x <- c(1, 5, 4, 3)
x <= 4
## [1]  TRUE FALSE  TRUE  TRUE

The comparison operators are < (less than), <= (less than or equal to), > (greater than), >= (greater than or equal to), == (equal to), and != (not equal to).

Another category of logical operators is the set of boolean operators. These are used to reduce two logical values into one. These are

  • &: logical “AND”; a & b will evaluate to TRUE only if a AND b are TRUE.
  • |: logical “OR”; a | b will evaluate to TRUE if a OR b is TRUE, or both.
  • !: logical -unary- “NOT”; negates the right operand: ! a will evaluate to the “flipped” logical value of a.
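
The three boolean operators listed above can be tried on a small truth table. This is a minimal sketch; the vectors a and b are made up so that every combination of TRUE and FALSE occurs once.

```r
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

a & b   # TRUE only where both are TRUE
## [1]  TRUE FALSE FALSE FALSE
a | b   # TRUE where at least one is TRUE
## [1]  TRUE  TRUE  TRUE FALSE
!a      # flips every value
## [1] FALSE FALSE  TRUE  TRUE
```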

Here is a more elaborate example combining comparison and boolean operators. Suppose you have vectors a and b and you want to know which values in a are greater than in b and also smaller than 3. This is the expression used for answering that question.

a <- c(2, 1, 3, 1, 5, 1)
b <- c(1, 2, 4, 2, 3, 0)
a > b & a < 3 ## returns a logical vector with test results
## [1]  TRUE FALSE FALSE FALSE FALSE  TRUE

Here is a special case. Can you figure out what happens there?

6 - 2 : 5 < 3
## [1] FALSE FALSE  TRUE  TRUE

Calculations with logical vectors

Quite often you want to know how many cases fit some condition. A convenient thing in that case is that logical values have a numeric counterpart or “hidden face”:

  • TRUE == 1
  • FALSE == 0

Use sum() to use this feature:

x <- c(2, 4, 2, 1, 5, 3, 6)
x > 3 ## which values are greater than 3?
## [1] FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE
sum(x > 3) ## how many are greater than 3?
## [1] 3

3.5.2 Modulo: %%

The modulo operator gives the remainder of a division.

10 %% 3
## [1] 1
4 %% 2
## [1] 0
11 %% 3
## [1] 2

The modulo is most often used to establish periodicity: x %% 2 is zero for all even numbers. Likewise, x %% 10 will be zero for every tenth value.
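
A short sketch of this periodicity, using the numbers 1 to 10 as an arbitrary example:

```r
x <- 1:10
x %% 2 == 0        # TRUE for every even number
##  [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
sum(x %% 2 == 0)   # how many even numbers?
## [1] 5
x %% 5             # the remainder cycles with a period of 5
##  [1] 1 2 3 4 0 1 2 3 4 0
```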

3.5.3 Integer division %/% and rounding

Integer division is the complement of modulo and gives the integer part of a division. For positive numbers it simply “chops off” the decimal part.

10 %/% 3
## [1] 3
4 %/% 2
## [1] 2
11 %/% 3
## [1] 3

Note that floor(), applied to the result of an ordinary division, does the same. In the same manner, ceiling() rounds up to the nearest integer, no matter how small the decimal part. Finally, there is the round() function to be used for - well, rounding. Be aware that rounding in R is not the same as rounding your course grade, which always goes up at x.5. R rounds x.5 values to the nearest even number:

x <- c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5)
round(x, 0)
## [1] 0 2 2 4 4 6 6 8
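
To tie integer division, modulo, floor() and ceiling() together, here is a small sketch with arbitrary numbers. The negative case is an addition to the text: it shows that %/% rounds down rather than truncating.

```r
10 %/% 3          # integer part of the division
## [1] 3
floor(10 / 3)     # same result
## [1] 3
ceiling(10 / 3)   # always rounds up
## [1] 4

# Quotient and remainder together reconstruct the original number
(10 %/% 3) * 3 + 10 %% 3
## [1] 10

# With a negative operand the result is rounded down, not truncated
(-7) %/% 2
## [1] -4
```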

3.5.4 The %in% operator

The %in% operator is very handy when you want to know if the elements of one vector are present in another vector. An example explains best, as usual:

a <- c("one", "two", "three")
b <- c("zero", "three", "five", "two")
a %in% b
b %in% a
## [1] FALSE  TRUE  TRUE
## [1] FALSE  TRUE FALSE  TRUE

There is no positional evaluation: for each element of the first vector, it simply reports whether that element is present anywhere in the second.

3.6 Vector creation methods

Since vectors are the bricks with which everything is built in R, there are many, many ways to create them. Here, I will review the most important ones.

Method 1: Constructor functions

Often you want to be specific about what you create: use the class-specific constructor OR one of the conversion functions. Constructor functions have the name of the type. They will create and return a vector of that type, with as its length the number that is passed as constructor argument:

> integer(4)
## [1] 0 0 0 0
> character(4)
## [1] "" "" "" ""
> logical(4)
## [1] FALSE FALSE FALSE FALSE

Method 2: Conversion functions

Conversion functions have the name as.XXX() where XXX is the desired type. They will attempt to coerce the given input vector into the requested type.

x <- c(1, 0, 2, 2.3)
class(x)
## [1] "numeric"
as.logical(x)
## [1]  TRUE FALSE  TRUE  TRUE
as.integer(x)
## [1] 1 0 2 2

But there are limits to coercion: elements that cannot be coerced into the requested type become NA, and R issues a warning.

x <- c(2, 3, "a")
y <- as.integer(x)
## Warning: NAs introduced by coercion
class(y)
## [1] "integer"
y
## [1]  2  3 NA
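
After a conversion like this you will usually want to know whether anything went wrong. The sketch below uses two base functions not covered in the text so far: is.na(), which tests for NA values, and suppressWarnings(), used here only to keep the demonstration output clean.

```r
x <- c("2", "3", "a")
y <- suppressWarnings(as.integer(x))  # silence the coercion warning for this demo
is.na(y)        # which elements could not be converted?
## [1] FALSE FALSE  TRUE
sum(is.na(y))   # how many conversions failed?
## [1] 1
```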

Method 3: The colon operator

The colon operator (:) generates a series of integers from the left operand to -and including- the right operand.

1 : 5
## [1] 1 2 3 4 5
5 : 1
## [1] 5 4 3 2 1
2 : 3.66
## [1] 2 3

Method 4: The rep() function

The rep() function is typically called with three arguments. The first is an input vector. The second, times =, specifies how often the entire input vector should be repeated. The third, each =, specifies how often each individual element from the input vector should be repeated. When both arguments are provided, each = is evaluated first, followed by times =.

rep(1 : 3, times = 3)
## [1] 1 2 3 1 2 3 1 2 3
rep(1 : 3, each = 3)
## [1] 1 1 1 2 2 2 3 3 3
rep(1 : 3, times = 2, each = 3)
##  [1] 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2 3 3 3
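
The evaluation order can be verified by doing the two steps by hand. In this sketch the variable names combined and stepwise are arbitrary; identical() is a base function that tests whether two objects are exactly equal.

```r
combined <- rep(1:3, times = 2, each = 3)
stepwise <- rep(rep(1:3, each = 3), times = 2)  # each first, then times
identical(combined, stepwise)
## [1] TRUE
```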

Method 5: The seq() function

The seq() function is used to create a numeric vector in which subsequent elements show a sequential increment or decrement. You specify a range and a step, which must be negative if the range end (to =) is lower than the range start (from =).

> seq(from = 1, to = 3, by = .2)
##  [1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 2.8 3.0
> seq(1, 2, 0.2) # same idea, with positional arguments
## [1] 1.0 1.2 1.4 1.6 1.8 2.0
> seq(1, 0, length.out = 5)
## [1] 1.00 0.75 0.50 0.25 0.00
> seq(3, 0, by = -1)
## [1] 3 2 1 0

Method 6: Through vector operations

Of course, new vectors, often of different type, are created when two vectors are combined in some operation, or a single vector is processed in some way.

This operation of two numeric vectors results in a logical vector:

1:5 < c(2, 3, 2, 1, 4)
## [1]  TRUE  TRUE FALSE FALSE FALSE

And this paste() call results in a character vector:

paste(0:4, 5:9, sep = "-")
## [1] "0-5" "1-6" "2-7" "3-8" "4-9"

3.7 Selecting vector elements

You often want to get to know things about specific values within a vector:

  • what value is at the third position?
  • what is the highest value?
  • which positions have negative values?
  • what are the last 5 values?

There are two principal ways to do this: through indexing with positional reference (“addresses”) and through logical indexing.

Here is a picture that demonstrates both.

The index is the position of a value within a vector. Indexing in R starts at one (1), so the last index equals the length of the vector. Brackets [] are used to specify one or more indices that should be selected (returned).

Here are two examples of straightforward indexing, selecting a single or a series of elements.

x <- c(2, 4, 6, 3, 5, 1)
x[4] ## fourth element
## [1] 3
x[3:5] ## elements 3 to 5
## [1] 6 3 5

However, the technique is much more versatile. You can use indexing to select elements multiple times and thus create copies of them, or select elements in any order you desire.

x[c(1, 2, 2, 5)] ## elements 1, 2, 2 and 5
## [1] 2 4 4 5
x <- c(2, 4, 6, 3, 5, 1)

Besides integers you can use logicals to perform selections:

x[c(T, F, T, T, T, F)]
## [1] 2 6 3 5

As with all vector operations, shorter vectors are cycled as often as needed to cover the longer one:

x[c(F, T, F)]
## [1] 4 5

In practice you won’t type literal logicals very often; they are usually the result of some comparison operation. Here, all even numbers are selected because their modulo will return zero.

x[x %% 2 == 0]
## [1] 2 4 6

And here, all of the maximum values in a vector are retrieved:

x <- c(2, 3, 3, 2, 1, 3)
x[x == max(x)]
## [1] 3 3 3

There is a caveat in selecting the last n values: the colon operator has higher precedence than + and -! Here, the last two elements are supposed to be selected.

x <- c(2, 4, 6, 3, 5, 1)
x[length(x) - 1 : length(x)] ## wrong: the colon is evaluated first
## [1] 5 3 6 4 2
x[(length(x) - 1) : length(x)] ## parentheses required!
## [1] 5 1
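
To see why the first attempt goes wrong, it helps to print the index vector itself. The sketch below also shows tail(), a function not mentioned in the text, as one possible alternative that avoids the index arithmetic altogether.

```r
x <- c(2, 4, 6, 3, 5, 1)

# The colon is evaluated first, so the index becomes 6 - (1:6)
length(x) - 1:length(x)
## [1] 5 4 3 2 1 0

# tail() returns the last n elements without any index arithmetic
tail(x, 2)
## [1] 5 1
```

The index 0 at the end is silently ignored by R, which is why the wrong version returns five elements instead of six.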

Use which() to get an index instead of value

The function which() returns indices for which the logical test evaluates to true:

which(x >= 2) ## which positions have values 2 or greater?
## [1] 1 2 3 4 5
which(x == max(x)) ## which positions have the maximum value?
## [1] 3
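
The indices returned by which() can be stored and used for selection later on. A minimal sketch, where idx is an arbitrary variable name:

```r
x <- c(2, 4, 6, 3, 5, 1)
idx <- which(x > 3)   # positions of the values greater than 3
idx
## [1] 2 3 5
x[idx]                # the values at those positions
## [1] 4 6 5
x[x > 3]              # logical indexing gives the same values
## [1] 4 6 5
```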

3.8 Some coding style rules

  • Names of variables start with a lower-case letter
  • Words are separated using underscores
  • Be descriptive with names
  • Function names are verbs
  • Write all code and comments in English
  • Preferentially use one statement per line
  • Use spaces on both sides of ALL operators
  • Use a space after a comma
  • Indent code blocks -with {}- with 4 or 2 spaces, but be consistent

Follow Hadley’s style guide http://adv-r.had.co.nz/Style.html

3.9 The best keyboard shortcuts for RStudio

  • ctrl + 1 go to code editor
  • ctrl + 2 go to console
  • ctrl + alt + i insert code chunk (RMarkdown)
  • ctrl + enter run current line
  • ctrl + shift + k knit current document
  • ctrl + alt + c run current code chunk
  • ctrl + shift + o source the current document