12 Answer Key
12.1 Chapter 4 - Object Types in R Programming
- This object was an array.
- This object was a vector.
- This would output a vector.
- This would output a data frame.
- To output a factor, you would run the following code:
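The code for this answer is not reproduced here. As a minimal sketch, assuming a hypothetical character vector named sizes (it is not part of the original exercise), factor() converts a vector into a factor:

```r
# Hypothetical input vector; the exercise's own data is not shown.
sizes <- c("small", "large", "medium", "small", "large")

# factor() stores the distinct values as levels.
sizes_factor <- factor(sizes)
sizes_factor
levels(sizes_factor)
```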
12.2 Chapter 5 - How to Filter and Transform Data in Base R
- Filter the following vector to values greater than 2
q1 <- seq(1,20,2)
q1[q1 > 2]
- Filter the following vector to values between 20 and 30, but only for the first three entries that meet those criteria. (Hint: add [n:n] for the range of values after you determine which values meet those criteria.)
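The vector for this question is not shown here, so the following is only a sketch using a hypothetical vector named q2. It filters on both conditions first and then indexes the result with a range:

```r
# Hypothetical vector standing in for the one given in the exercise.
q2 <- seq(10, 40, 2)

# Keep values between 20 and 30, then keep the first three of those.
q2_result <- q2[q2 > 20 & q2 < 30][1:3]
q2_result
```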
- Multiply the following matrices together.
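The matrices for this question are not shown here. A minimal sketch, assuming two hypothetical matrices a and b with compatible dimensions, uses the %*% operator:

```r
# Hypothetical matrices; the exercise's own matrices are not shown.
a <- matrix(1:6, nrow = 2, ncol = 3)   # 2 x 3
b <- matrix(1:6, nrow = 3, ncol = 2)   # 3 x 2

# %*% performs matrix multiplication; * would attempt element-wise
# multiplication, which fails here because the dimensions differ.
product <- a %*% b
product
```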
- Subtract 41 from every entry in the second column of the following matrix. Replace the column with those new values.
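The matrix for this question is not shown here. As a sketch with a hypothetical matrix named q4, you can assign the transformed values back to the same column:

```r
# Hypothetical matrix standing in for the one in the exercise.
q4 <- matrix(seq(10, 120, 10), nrow = 4, ncol = 3)

# Overwrite column 2 with its own values minus 41.
q4[, 2] <- q4[, 2] - 41
q4
```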
- Select the second row of each matrix in the following array. Subtract 5 from those rows.
q5 <- array(data=c(matrix(seq(1,15,1),5,3),
                   matrix(seq(4,60,4),5,3),
                   matrix(seq(2,30,2),5,3)),
            dim=c(5,3,3))
q5
q5[2,,]-5
- Filter the following data frame to Bond films starring Roger Moore.
bond[bond["actor"]=="Roger Moore",]
- Filter the following data frame to Bond films starring Sean Connery made after 1966.
bond[bond["actor"]=="Sean Connery" & bond["year"] > 1966,]
12.3 Chapter 6 - How to Filter and Transform Data with the Dplyr Package
- You would use the %>% notation, the filter() function, and the operators |, ==, and > to accomplish this.
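As a sketch of how those pieces fit together, the following assumes the question uses the mtcars data set with the same conditions that appear in the later transmute() answer. The name filtered_cars is hypothetical.

```r
library(dplyr)
data(mtcars)

# Keep cars with exactly four gears OR more than 115 horsepower.
filtered_cars <- mtcars %>%
  filter(gear == 4 | hp > 115)
filtered_cars
```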
- In addition to the same script as above, you would use the select() function to reduce the columns.
- Instead of using select() in the previous script, you would use transmute(). This function lets you transform columns and keep only the columns you name.
data(mtcars)
library(dplyr)
mtcars %>%
  filter(gear==4 | hp > 115) %>%
  transmute(mpg_log=log(mpg),cyl,gear,hp)
- You would use the filter(), group_by(), and summarize() functions to pull this summary data.
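The exact summary the question asks for is not shown here. The following sketch only illustrates how the three functions chain together, using an assumed filter (hp > 100), an assumed grouping column (cyl), and hypothetical output names:

```r
library(dplyr)
data(mtcars)

# Assumed example: average mpg and car count per cylinder group,
# restricted to cars with more than 100 horsepower.
cyl_summary <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg), cars = n())
cyl_summary
```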
12.4 Chapter 7 - Understanding and Using R Packages
- To install the tidyverse set of packages, run the script install.packages("tidyverse").
- To load the dplyr package, run the script library(dplyr).
12.5 Chapter 8 - How to Write Functions
- Modify the simple standard deviation function we wrote and change it to calculate the mean. Do this without using the built-in mean function.
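The chapter's standard deviation function is not reproduced here, so this is only a sketch of one possible answer. The function name simple.mean is hypothetical.

```r
# Hypothetical name; the chapter's original function is not shown here.
simple.mean <- function(x) {
  # Add up the values and divide by how many there are.
  sum(x) / length(x)
}

simple.mean(c(2, 4, 6, 8))
```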
- Alter the summary.group function to include median, minimum, and maximum values.
summary.group <- function(data,group,field) {
  # Distinct values of the grouping column
  groups <- levels(factor(data[,paste(group)]))
  # Empty data frame that collects one row of statistics per group
  output <- data.frame(group=character(),
                       mean=numeric(),
                       sd=numeric(),
                       median=numeric(),
                       minimum=numeric(),
                       maximum=numeric())
  for(i in seq_along(groups)) {
    # Values of the chosen field for the current group
    subdata <- data[data[,paste(group)]==groups[i],
                    paste(field)]
    output[i,1:6] <- data.frame(groups[i],
                                mean(subdata),
                                sd(subdata),
                                median(subdata),
                                min(subdata),
                                max(subdata))
  }
  output
}
- Write a function for the Fibonacci Sequence, which ends at a number you choose. You’ll need to use a control flow to accomplish this and a default value for the end of the sequence. (Hint: You won’t use the for(var in seq) expr control flow. Execute ?Control to find a different one.)
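One possible answer, offered as a sketch rather than the book's own solution, uses a while loop. The function name fibonacci, the argument name end, the default of 100, and the choice to start the sequence at 0 and 1 are all assumptions.

```r
# Hypothetical function and argument names; end = 100 is an assumed default.
fibonacci <- function(end = 100) {
  fib <- c(0, 1)
  # Keep appending the sum of the last two terms while it stays within end.
  while (sum(tail(fib, 2)) <= end) {
    fib <- c(fib, sum(tail(fib, 2)))
  }
  # Drop starting values that exceed end (only matters when end is below 1).
  fib[fib <= end]
}

fibonacci()     # uses the default end value
fibonacci(20)   # stops before the first term greater than 20
```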
12.6 Chapter 10 - How to Plot Data in R
- Use the ggplot(), aes(), and geom_point() functions to construct a plot.
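As a minimal sketch, assuming the plot uses the same mtcars columns (hp and mpg) as the answers that follow, the three functions combine like this. The name base_plot is hypothetical.

```r
library(ggplot2)
data(mtcars)

# Scatter plot of horsepower against miles per gallon.
base_plot <- ggplot(data = mtcars,
                    mapping = aes(x = hp, y = mpg)) +
  geom_point()
base_plot
```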
- Simply add factor(cyl) to the color argument in the aes() function.
library(ggplot2)
data(mtcars)
ggplot(data=mtcars,
       mapping=aes(x=hp,y=mpg,color=factor(cyl))) +
  geom_point(size=3)
- Use the x, y, and color arguments in the labs() function to give the axes and legend more intuitive labels.
library(ggplot2)
data(mtcars)
ggplot(data=mtcars,
       mapping=aes(x=hp,y=mpg,color=factor(cyl))) +
  geom_point(size=3) +
  labs(x="Horsepower",
       y="Miles Per Gallon",
       color="Cylinders")
- Use the title argument in the labs() function.
library(ggplot2)
data(mtcars)
ggplot(data=mtcars,
       mapping=aes(x=hp,y=mpg,color=factor(cyl))) +
  geom_point(size=3) +
  labs(x="Horsepower",
       y="Miles Per Gallon",
       color="Cylinders",
       title="Car Performance")
- Use the theme_few() theme from the ggthemes package.
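A sketch of this last step, assuming it builds on the titled plot from the previous answer and that the ggthemes package is already installed, adds the theme as one more layer. The name themed_plot is hypothetical.

```r
library(ggplot2)
library(ggthemes)   # assumes ggthemes has been installed
data(mtcars)

# Same plot as the previous answer, with theme_few() added at the end.
themed_plot <- ggplot(data = mtcars,
                      mapping = aes(x = hp, y = mpg, color = factor(cyl))) +
  geom_point(size = 3) +
  labs(x = "Horsepower",
       y = "Miles Per Gallon",
       color = "Cylinders",
       title = "Car Performance") +
  theme_few()
themed_plot
```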
12.7 Chapter 11 - Statistical Functions in R
- Use the summary() function on your model to review measures of model performance, such as p-values.
summary(PracticeModel)
- Use the predict() function to make model predictions on a new data set.
NewData <- data.frame(Sepal.Length=5,Sepal.Width=3.25)
predict(PracticeModel,NewData)
- Use the confint() function to determine the confidence intervals for a model.
confint(PracticeModel)
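The answers above assume PracticeModel already exists from the chapter exercise. To run them end to end, here is a sketch under a stated assumption: the model is a linear regression on the iris data set with Sepal.Length and Sepal.Width as predictors (implied by NewData). The response variable, Petal.Length, is a guess and may differ from the one used in the chapter.

```r
data(iris)

# Assumed model: the response (Petal.Length) is a guess; only the two
# predictors are implied by NewData in the answers above.
PracticeModel <- lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = iris)

# Coefficient table: estimates, standard errors, t values, and p-values.
summary(PracticeModel)$coefficients

# Predict the response for one new observation.
NewData <- data.frame(Sepal.Length = 5, Sepal.Width = 3.25)
prediction <- predict(PracticeModel, NewData)
prediction

# Confidence intervals for each coefficient (95% by default).
intervals <- confint(PracticeModel)
intervals
```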