Today’s agenda: Manipulating data frames; practicing iteration; practicing re-writing code; checking how reliable random methods are.
General instructions for labs: Upload an R Markdown file, named with your andrew ID, to Blackboard. You will give the commands to answer each question in its own code block, which will also produce plots that will be automatically embedded in the output file. Each answer must be supported by written statements as well as any code used. Include the name of your lab partner (if you have one) in the file.
R includes a number of pre-specified data objects as part of its default installation. We will load and manipulate one of these, a data frame of 93 cars with model year 1993. Begin by ensuring that you can load this data with the commands
library(MASS)
data(Cars93)
Begin by examining the data frame with the command View(Cars93)
to understand the underlying object. You will need to use functions and other commands to extract elements for this assignment.
Obtain a summary()
of the full data structure. Can you tell from this how many rows are in the data? If so, say how; if not, use another method to obtain the number of rows.
What is the mean price of a car with a rear-wheel drive train?
What is the minimum horsepower of all cars with capacity for 7 passengers? With a capacity of at least 6 passengers?
Assuming that these cars are exactly as fuel efficient as this table indicates, find the cars that have the maximum, minimum and median distance travellable for highway driving. You will need at least two columns to work this out; why those two?
Some of the lectures have included examples of planning production for a factory that turns steel and labor into cars and trucks. Below is a piece of code that optimizes the factory’s output (roughly) given the available resources, using a repeat
loop. It’s embedded in a function to make it easier for you to run.
factory.function <- function (cars.output=1, trucks.output=1) {
factory <- matrix(c(40,1,60,3),nrow=2,
dimnames=list(c("labor","steel"),c("cars","trucks")))
available <- c(1600,70); names(available) <- rownames(factory)
slack <- c(8,1); names(slack) <- rownames(factory)
output <- c(cars.output, trucks.output); names(output) <- colnames(factory)
passes <- 0 # How many times have we been around the loop?
repeat {
passes <- passes + 1
needed <- factory %*% output # What do we need for that output level?
# If we're not using too much, and are within the slack, we're done
if (all(needed <= available) &&
all((available - needed) <= slack)) {
break()
}
# If we're using too much of everything, cut back by 10%
if (all(needed > available)) {
output <- output * 0.9
next()
}
# If we're using too little of everything, increase by 10%
if (all(needed < available)) {
output <- output * 1.1
next()
}
# If we're using too much of some resources but not others, randomly
# tweak the plan by up to 10%
# runif == Random number, UNIFormly distributed, not "run if"
output <- output * (1+runif(length(output),min=-0.1,max=0.1))
}
return(output)
}
factory.function()
## cars trucks
## 10.44 19.65
to obtain a default output value, starting from a very low initial planned output. What is the final output capacity obtained?
Repeat this four more times to obtain new output values. Do these answers differ from each other? If so why? If not, why not?
Right now, the number of passes
is a value held within the function itself and not shared. Change the code so that the number of passes
will be returned at the end of the function, as well as the final output
.
Now, set the initial output levels to 30 cars and 20 trucks and run the code. What is the final output plan (output
)? What is the final demand for resources (needed
)? Is the plan within budget and within the slack? How many iterations did it take to converge (passes
)? For all but output
you will need to either print this message out deliberately, or return an object that contains all the quantities you want.