When I was a teenager, I didn't mind extremely repetitive, mind-numbing labor. To document my musical tastes, every few months I'd gather data from my iTunes library, some of which involved manually counting the number of songs that I had listened to 0, 1, 2, 3, up to 6 times so I could create a histogram in Excel. I loved office work like stapling papers for hours, and pipetting was one of my favorite aspects of the evolutionary development lab I briefly worked in during college. (Side note: my favorite aspect was feeding the opossums, which made me realize I loved animal behavior.)

While I still don't mind the sort of work where you turn off your brain and do repetitive tasks (*cough cough* fixing fish trajectories by hand...), there are some points where you just *can't* resort to human willpower to get through an analysis. Fortunately, computers are fantastic for work that would otherwise be menial and soul-draining. They don't get bored as far as we can tell, they're much faster, and they don't make mistakes if the code is correct.

**This is the third in a series of posts on R. This post covers:**

- For loops

- Random walks


The first post was an introduction to R, and the second was on using R to understand distributions, plotting, and linear regression. The most important code from those posts is listed below.

```r
c()  # "Concatenate," or combine. This is how you tell R that there are
     # multiple elements
x <- c(5, 10)       # x is now a two-element vector of the numbers 5 and 10
x[x > 7]            # Subset. This shows you only the elements of x that are greater than 7
data <- rnorm(100)  # Draw 100 random values from a normal distribution with mean = 0
                    # and standard deviation = 1
hist(data)          # Plot a histogram of the data object
plot(y ~ x)         # Plot a scatterplot of y as a function of x
```

__________________________________________________________________

**For loops**

For loops are basically a way to **iterate**, or to run a command repeatedly. (Think of the word "reiterate," for example: "I had to *reiterate* to my cat to not use my Gmail without permission. It's the third time this week. I just don't think he's old enough to use the internet unsupervised.")

__Here's the simplest for loop possible__

```r
for(i in 1:10){
  print(i)
}
```

It reads "for i in the sequence 1 to 10, print i." It's the same as if you wrote this:

```r
print(1)
print(2)
print(3)
print(4)
print(5)
print(6)
print(7)
print(8)
print(9)
print(10)
```

A **for loop** finds wherever you placed an "i" within the loop and changes that to the *i*th value in the sequence you give it (here it's 1:10). The first value in the sequence above is 1, the second is 2, etc.

The sequence you give the for loop doesn't have to be linear. You could write whatever bunch of numbers you want. Writing this...

```r
bubbles <- c(1, -80, 4, NA, 100, -Inf, 0)
for(i in bubbles){
  print(i)
}
```

... is the same as writing this:

```r
print(1)
print(-80)
print(4)
print(NA)
print(100)
print(-Inf)
print(0)
```

You'll also get the same thing if you tell R that you want the *i*th value of *bubbles*. This is a subtle distinction. Instead of looking at the actual numbers in *bubbles*, R will go through the 1st value, then the 2nd, then the 3rd, until it reaches the end. I'd recommend doing it this way in the future.

```r
for(i in 1:length(bubbles)){  # "For i in 1 to 7 (the number of values in bubbles)"
  print(bubbles[i])           # "Print the ith value of bubbles"
}
```
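One caveat worth knowing (my addition, not from the earlier posts): `1:length(x)` misfires on an empty vector, because `1:0` counts *down*, giving `c(1, 0)`. Base R's `seq_along()` avoids this edge case:

```r
bubbles <- c(1, -80, 4, NA, 100, -Inf, 0)
for(i in seq_along(bubbles)){  # Identical to 1:length(bubbles) for non-empty vectors
  print(bubbles[i])
}

seq_along(c())  # integer(0), so a loop over it runs zero times
1:length(c())   # c(1, 0), so a loop over it would run twice!
```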

The above code is the same thing as writing:

```r
print(bubbles[1])
print(bubbles[2])
print(bubbles[3])
print(bubbles[4])
print(bubbles[5])
print(bubbles[6])
print(bubbles[7])
```

__Combine for loops with the paste() command for character values__

All of the values in *bubbles* above are numbers, which makes things simple. It's only slightly more challenging if you have a bunch of words, or you're trying to iterate a sentence.

Let's say you want to type out "Today is April 1," "Today is April 2," "Today is April 3," etc., all the way up to "Today is April 30." Maybe you really love April or something. In R, you could literally write:

```r
"Today is April 1"
"Today is April 2"
"Today is April 3"
"Today is April 4"
```

... and so on. That would be 30 lines of code. Manageable, but not great. Alternatively, you could use a for loop combined with the **paste()** command.

```r
for(i in 1:30){
  print(paste("Today is April", i))
}
```

If you wrote **print("Today is April i")**, R would say "Today is April i" instead of replacing the i with whatever number you're on. **paste()** lets you input values into character strings. For example:

```r
names <- c("Matt", "George", "Andre")
for(i in 1:length(names)){
  print(paste("My name is", names[i]))  # print() is needed here: inside a loop,
                                        # paste() alone won't display anything
}
```
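By default, `paste()` glues its pieces together with a space. Two related base R details worth knowing: the `sep` argument changes the separator, and `paste0()` uses no separator at all.

```r
paste("My name is", "Matt")        # "My name is Matt"
paste("fish", 1:3, sep = "_")      # "fish_1" "fish_2" "fish_3" (vectors get recycled)
paste0("Today is April ", 5, "!")  # "Today is April 5!"
```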

__Real-world example__

I work with animal tracking data in my research. I'll film experiments involving fish swimming in a tank. These videos then get fed through tracking software that identifies the positions and orientations of all individuals. These data are in the form of four matrices: the x- and y-coordinates for each fish for each frame, as well as the x- and y-components of the unit vector of their heading. Each row of a matrix is one individual, and each column is a time point.

From these simple matrices, I can calculate more interesting measurements like the speed and distance to nearest neighbor for every fish. Below is the code for calculating the speed time series for each individual.
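If you want to try the code below without real tracking data, you can fake the matrices. This toy version (my own stand-in, not actual tracker output) makes 3 "fish" wander for 500 frames:

```r
# Toy stand-ins for the coordinate matrices: one row per fish, one column per frame
n.fish <- 3
n.frames <- 500
xs <- matrix(0, nrow = n.fish, ncol = n.frames)
ys <- matrix(0, nrow = n.fish, ncol = n.frames)
for(i in 2:n.frames){
  xs[, i] <- xs[, i - 1] + rnorm(n.fish)  # Each fish takes a random step per frame
  ys[, i] <- ys[, i - 1] + rnorm(n.fish)
}
```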

```r
# First create an empty matrix with the same dimensions as the other matrices
# - Here, we just use the x-coordinate matrix, but any of the others would work, too
s <- matrix(NA, ncol = ncol(xs), nrow = nrow(xs))

# Calculate how far a fish moved between time points. Iterate over all fish
for(i in 1:nrow(xs)){                 # For each individual
  s[i, ] <- c(NA,                     # No speed at first time point
              sqrt(diff(xs[i, ])^2 +  # Pythagorean theorem: the sum of the
                   diff(ys[i, ])^2))  # squared x- and y-displacements
  print(i)                            # Say what iteration we're on
}
```
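The nearest-neighbor distance mentioned above can be computed with the same looping pattern. Here's a sketch of one way to do it (not my exact analysis code), using base R's `dist()` on the positions at each frame; tiny toy matrices `xpos` and `ypos` stand in for the real coordinate matrices so the snippet runs on its own:

```r
# Toy positions: 3 fish, 2 frames (swap in your real coordinate matrices)
xpos <- rbind(c(0, 1), c(3, 4), c(6, 7))
ypos <- rbind(c(0, 0), c(4, 4), c(8, 8))

nnd <- matrix(NA, nrow = nrow(xpos), ncol = ncol(xpos))
for(j in 1:ncol(xpos)){
  D <- as.matrix(dist(cbind(xpos[, j], ypos[, j])))  # Pairwise distances at frame j
  for(i in 1:nrow(xpos)){
    nnd[i, j] <- min(D[i, -i])  # Distance to the closest *other* fish
  }
}
```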

```r
# Plot Individual 6's speed from time point 1000 to 2000
focal.indiv <- 6
time.start <- 1000
time.end <- 2000
plot(s[focal.indiv, time.start:time.end], type = 'l',
     main = paste("Individual ", focal.indiv, " speed from t = ", time.start,
                  " to t = ", time.end, sep = ""))
```

__Random walks__
For loops are ideal for when you need to iterate a computation. A simple example of this is a random walk. A random walk is a series of random steps. It's a time series that serves as a null expectation: "what should we expect to see if literally nothing but noise is happening?" It's a baseline that you can then compare to stock market prices, animal movement, gas molecule paths, and more.

Here, these peaks and valleys aren't caused by any extrinsic factor; they're just random fluctuations. The size of these fluctuations will depend on how much noise you add into the random walk:

And finally, remember that "random" means it's not going to be the same every time.
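(If you ever need a "random" result you can reproduce exactly, say for a figure, base R's `set.seed()` restarts the random number generator from a fixed point:)

```r
set.seed(123)  # Any integer works
rnorm(3)       # Three "random" values
set.seed(123)  # Reset the seed...
rnorm(3)       # ...and you get the exact same three values again
```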

There are two steps to building a random walk in R: first, you set the starting point and the rules for the steps. Then, you run a for loop to take each step. Below is the code to build a simple one-dimensional random walk like the plots above.

```r
# Parameters
start <- 0      # The starting point of the random walk
mean.move <- 0  # For each step, is there a bias to move in a certain direction?
                # A positive value means the random walk will tend to move
                # upwards, a negative value will make it tend downward, and
                # zero is unbiased
sd.move <- 1    # The standard deviation of the step size. The higher this is,
                # the more the random walk will bounce around
t <- 100000     # The number of steps you want the walk to take overall

# Begin the walk at the starting point
walk <- start

# Run the random walk
for(i in 2:t){                    # Start at 2 because we already have our starting point
  walk[i] <- walk[i - 1] +        # Take the previous position...
    rnorm(1, mean.move, sd.move)  # ... and add a random value
  print(i)                        # Tell us what iteration we're on
}
```

Finally, to visualize our results, we use the plot() command.

```r
plot(walk, type = 'l')
```

In the above walk, we use a normal distribution as the source of our step sizes. This means that most of our steps will be concentrated around **mean.move**, with large step sizes (in either direction) occurring less frequently. We could easily use a different distribution, like a long-tailed distribution (e.g. for a Lévy flight) or a Poisson. The plot below shows such a skewed step size distribution and what a two-dimensional random walk with steps drawn from this distribution looks like. The code to do this is at the end of this post. *[There's a bit more to mention with the direction of travel as well as the step sizes, but I'll skip the details here.]*

__Don't use for loops :-)__

The first thing you learn about for loops is what they are. The second thing you learn is that they're computationally inefficient. R's vectorized operations process many values at once in fast, compiled code, but a for loop forces R to interpret your code one iteration at a time. This is usually not a big deal: for small computations like the ones above, you'll rarely hit a situation where a for loop won't work well.
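As a taste of the alternative: the one-dimensional random walk above can be built in a single vectorized line, because a walk is just the running sum of its steps. `cumsum()` produces the same result as the for loop, orders of magnitude faster:

```r
start <- 0
mean.move <- 0
sd.move <- 1
t <- 100000
# Draw all t - 1 steps at once, then take their cumulative sum
walk <- start + c(0, cumsum(rnorm(t - 1, mean.move, sd.move)))
```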

For loops become a burden when you're doing thousands of calculations and speed is essential. Imagine an evolutionary model where you're simulating predator-prey population dynamics under various levels of resource availability. You'll want to run the model for long enough to see what the evolutionary trajectory looks like, so maybe you're looking at 100,000 generations. You'll also want to run the model several times to see whether the populations always reach the same outcome, or whether something weird like a premature random extinction occurred.

Your computer will have to perform all the calculations you give it for a generation, for every generation, for every run of the model... *for every combination of parameter values.* You'll need to repeat this process for every level of resource availability you care about, e.g. 5 patches, 10 patches, 15 patches, and 5 units of food/patch, 10 units/patch, 20 units/patch, etc. Without exaggerating, your model could literally require *hundreds of years* of computational time to complete if the code isn't efficient.

So.

That's why R blog post #5 will be on the **apply()** functions, which are one way to avoid for loops! I'll first cover **function()**, though, which is a fantastic tool in R, especially when combined with **apply()**.

Cheers,

-Matt

__________________________________________________________________

**Code for plots:**

__First random walk__

```r
start <- 0         # Starting position
mean.move <- 0     # The step distribution will be unbiased, i.e. centered at zero
sd.move <- 1       # The standard deviation of step sizes
n.steps <- 100000  # The number of steps

walk <- start      # Start the walk at the origin
for(i in 2:n.steps){
  walk[i] <- walk[i - 1] +        # This step is the previous step plus...
    rnorm(1, mean.move, sd.move)  # a randomly drawn step from a normal distribution with
                                  # the mean and SD we specified above
  print(i)                        # Print the iteration we're on
}

plot(walk, type = 'l', col = "darkcyan", lwd = 2, xlab = "Time", ylab = "Position", las = 1,
     main = "A random walk", cex.main = 1.5, cex.lab = 1.2, font.lab = 2)
```

__Random walks with differing noise__

```r
start <- 0
n.steps <- 1e5
mean.step <- 0
sd.step1 <- 0.01
sd.step2 <- 0.5

# Start both walks at zero
walk1 <- walk2 <- start
for(i in 2:n.steps){
  walk1[i] <- walk1[i - 1] + rnorm(1, mean.step, sd.step1)
  walk2[i] <- wal2 <- walk2[i - 1] + rnorm(1, mean.step, sd.step2)
  print(i)
}

# Plot it
par(mfrow = c(1, 2))
par(oma = c(0, 0, 2, 0))  # Create space at the top of the window for a grand title
plot(walk1, type = 'l', col = "dodgerblue", main = paste("SD =", sd.step1), font.lab = 2,
     xlab = "Time", ylab = "Position", ylim = c(min(walk2), max(walk2)))
plot(walk2, type = 'l', col = "dodgerblue", main = paste("SD =", sd.step2), font.lab = 2,
     xlab = "Time", ylab = "Position", ylim = c(min(walk2), max(walk2)))
mtext(outer = T, "Random walks with differing noise", cex = 2, font = 2)
par(oma = array(0, 4))  # Make your graphics window look normal again
```


__Multiple random walks__

```r
start <- 0
mean.move <- 0
sd.move <- 1
n.steps <- 100000
n.indivs <- 10  # The number of random walks you want to run

# Create an empty matrix that we'll fill with the random walk values
WALK <- matrix(NA, nrow = n.indivs, ncol = n.steps)

# Start all the walks at the origin
WALK[, 1] <- start

# Perform the random walks
for(i in 2:n.steps){
  WALK[, i] <- WALK[, i - 1] +           # This column looks at the previous column...
    rnorm(n.indivs, mean.move, sd.move)  # 10 random values are generated, one for each row.
                                         # If you did rnorm(1, ...), you would generate 10
                                         # identical random walks because the same value
                                         # would be added to each walk
  print(i)
}

# First create an empty plot, then add each walk with a for loop
plot(NA, ylim = c(min(WALK), max(WALK)), xlim = c(0, n.steps), cex.main = 1.5, cex.lab = 1.2,
     xlab = "Time", ylab = "Position", las = 1, font.lab = 2, main = "10 random walks")
for(i in 1:nrow(WALK)){
  lines(WALK[i, ], col = rainbow(n.indivs)[i])
}
```

__Two-dimensional random walk__

```r
# Create the distribution of step sizes. A beta distribution works well for
# something strongly skewed
step.dist <- rbeta(10000, 0.1, 1)  # Very likely to take a small step; unlikely to take a big one

# Set the parameters for the (Gaussian) turning angle distribution
mean.turn <- 0
sd.turn <- pi/4
n.steps <- 100000  # The number of steps

# Starting position and heading
xs <- ys <- 0
BO <- runif(1, -pi, pi)  # Could also start at zero. Here, a random orientation is chosen

# Run it
for(i in 2:n.steps){
  # Update the heading and step size
  BO[i] <- BO[i - 1] + rnorm(1, mean.turn, sd.turn)
  step.size <- sample(step.dist, 1)
  # Move
  xs[i] <- xs[i - 1] + cos(BO[i]) * step.size  # cosine gives you the x-component of the unit vector
  ys[i] <- ys[i - 1] + sin(BO[i]) * step.size  # sine gives you the y-component
  print(i)
}
```
