1: Create a function that returns a data frame with 2 columns (named
“Type” and “Value”) and 50 rows/observations.
- The first column should have “Control” as the first 25 observations
and “Treatment” as the second half of observations.
- The second column should have the first 25 values as random and
normally distributed with a mean of 10 and standard deviation of 1.5.
The next 25 values of the second column should be random and normally
distributed with a mean of 45 and standard deviation of 2. You can do
this all as a single line of code in the function or by breaking it up
into multiple code blocks.
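The prompt allows the whole thing to be written as a single line. One way to do that is to build the data frame in one `data.frame()` call. This is a sketch for comparison, not the submitted answer, and the name `one_line_df` is hypothetical.

```r
# Sketch: the same two-column data frame built in one expression.
# Group sizes (25 each), means (10, 45) and SDs (1.5, 2) come from the prompt;
# the name one_line_df is hypothetical.
set.seed(20)  # fix the random draws so the example is reproducible
one_line_df <- data.frame(
  Type  = rep(c("Control", "Treatment"), each = 25),
  Value = c(rnorm(25, mean = 10, sd = 1.5), rnorm(25, mean = 45, sd = 2))
)
head(one_line_df, 3)
```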
set.seed(20) # set the RNG seed so the random draws are reproducible each time the script is rerun
df_func <- function(means = c(10, 45), sds = c(1.5, 2), n = 25) { # n = number of observations per group
  Type <- rep(c("Control", "Treatment"), each = n) # first column: n "Control" labels followed by n "Treatment" labels
  Value <- c(rnorm(n = n, mean = means[1], sd = sds[1]), rnorm(n = n, mean = means[2], sd = sds[2])) # second column: normal draws for Control, then for Treatment
  typevalue_df <- data.frame(Type, Value, stringsAsFactors = FALSE) # combine into a data frame with "Type" and "Value" columns
  return(typevalue_df) # return the data frame
}
df_func()
## Type Value
## 1 Control 11.744028
## 2 Control 9.121113
## 3 Control 12.678198
## 4 Control 8.001109
## 5 Control 9.330150
## 6 Control 10.854409
## 7 Control 5.665424
## 8 Control 8.696472
## 9 Control 9.307446
## 10 Control 9.166689
## 11 Control 9.969797
## 12 Control 9.774427
## 13 Control 9.057810
## 14 Control 11.984831
## 15 Control 7.717974
## 16 Control 9.343858
## 17 Control 11.455866
## 18 Control 10.042334
## 19 Control 9.871327
## 20 Control 10.583822
## 21 Control 10.355031
## 22 Control 9.783340
## 23 Control 11.083345
## 24 Control 10.554860
## 25 Control 9.636901
## 26 Treatment 42.055873
## 27 Treatment 43.807681
## 28 Treatment 42.706600
## 29 Treatment 40.050727
## 30 Treatment 43.772983
## 31 Treatment 44.567377
## 32 Treatment 48.180292
## 33 Treatment 48.112287
## 34 Treatment 47.216902
## 35 Treatment 42.805316
## 36 Treatment 41.278789
## 37 Treatment 43.172842
## 38 Treatment 47.491138
## 39 Treatment 45.175709
## 40 Treatment 45.846964
## 41 Treatment 43.363034
## 42 Treatment 41.914864
## 43 Treatment 46.111764
## 44 Treatment 44.261942
## 45 Treatment 42.905323
## 46 Treatment 45.036360
## 47 Treatment 46.763755
## 48 Treatment 46.763723
## 49 Treatment 47.052486
## 50 Treatment 44.237382
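A quick way to confirm the simulated groups look right is to compare each group's sample mean and standard deviation with the requested values (10 and 1.5 for Control, 45 and 2 for Treatment). The sketch below rebuilds the data inline so it runs on its own; `sim_df`, `group_means` and `group_sds` are hypothetical names, and with only 25 draws per group the sample statistics will only approximate the targets.

```r
# Sketch: check that each simulated group lands near its requested mean and SD.
# The data are rebuilt inline so the block runs on its own; sim_df is a hypothetical name.
set.seed(20)
sim_df <- data.frame(
  Type  = rep(c("Control", "Treatment"), each = 25),
  Value = c(rnorm(25, mean = 10, sd = 1.5), rnorm(25, mean = 45, sd = 2))
)
group_means <- tapply(sim_df$Value, sim_df$Type, mean)  # per-group sample mean
group_sds   <- tapply(sim_df$Value, sim_df$Type, sd)    # per-group sample SD
round(rbind(mean = group_means, sd = group_sds), 2)     # 2 x 2 summary table
```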
- Assignment 8: Yes, I can run my simulated data function with
new values for the means without error.
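To illustrate the point above, the generator can be called with different means through its `means` argument. The function is restated here, with the group size as an `n` argument defaulting to 25, so the block runs on its own; the new means `c(20, 30)` and the name `new_means_df` are arbitrary choices for illustration.

```r
# Sketch: the generator restated with means, SDs and group size as arguments,
# then called with new means. c(20, 30) and new_means_df are illustrative choices.
df_func <- function(means = c(10, 45), sds = c(1.5, 2), n = 25) {
  Type  <- rep(c("Control", "Treatment"), each = n)
  Value <- c(rnorm(n, mean = means[1], sd = sds[1]),
             rnorm(n, mean = means[2], sd = sds[2]))
  data.frame(Type, Value, stringsAsFactors = FALSE)
}
set.seed(20)
new_means_df <- df_func(means = c(20, 30))  # only the means change; SDs keep their defaults
tapply(new_means_df$Value, new_means_df$Type, mean)  # group means should sit near 20 and 30
```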
Save your new function’s output as a variable, and use a function to
view the first 6 rows of the data frame.
q1df <- df_func() # save the function's output as a variable
head(q1df, 6) # print the first 6 rows of the data frame
## Type Value
## 1 Control 11.649153
## 2 Control 9.953624
## 3 Control 10.285509
## 4 Control 12.002810
## 5 Control 11.095829
## 6 Control 10.084303
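Beyond `head()`, a few base R calls confirm the saved data frame has the shape the prompt asks for: 50 rows, 2 columns, and 25 observations per group. `sim_df` is a hypothetical stand-in for `q1df`, rebuilt here so the block runs on its own.

```r
# Sketch: confirm the data frame's shape and group sizes.
# sim_df is a hypothetical stand-in for q1df, rebuilt so the block runs on its own.
set.seed(20)
sim_df <- data.frame(
  Type  = rep(c("Control", "Treatment"), each = 25),
  Value = c(rnorm(25, mean = 10, sd = 1.5), rnorm(25, mean = 45, sd = 2))
)
dim(sim_df)         # rows and columns: 50 2
table(sim_df$Type)  # observations per group: 25 Control, 25 Treatment
tail(sim_df, 6)     # last 6 rows, all from the Treatment group
```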
Let’s say you would like to know whether there is a statistically
significant difference in “Value” (response variable y) depending on
“Type” (explanatory variable x). Type ?aov in the console to determine
how to run an analysis of variance (ANOVA) on your simulated data. Write
a line of code that displays a summary of your ANOVA.
ANOVA_Test <- aov(Value ~ Type, data = q1df) # one-way ANOVA: does the response y ("Value") differ between levels of the explanatory variable x ("Type")?
summary(ANOVA_Test) # print the ANOVA table (Df, Sum Sq, Mean Sq, F value, Pr(>F))
## Df Sum Sq Mean Sq F value Pr(>F)
## Type 1 15512 15512 5750 <2e-16 ***
## Residuals 48 130 3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
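Because `Type` has only two levels, this one-way ANOVA is equivalent to a two-sample t-test that assumes equal variances: the F statistic equals the squared t statistic, and the p-values match. The sketch below checks that on freshly simulated data (`sim_df`, `fit`, `tab` and `tt` are hypothetical names). It also shows how to pull the F value and p-value out of the `summary()` table instead of reading them off the printout.

```r
# Sketch: with two groups, one-way ANOVA and the pooled-variance t-test agree (F = t^2).
set.seed(20)
sim_df <- data.frame(
  Type  = rep(c("Control", "Treatment"), each = 25),
  Value = c(rnorm(25, mean = 10, sd = 1.5), rnorm(25, mean = 45, sd = 2))
)
fit   <- aov(Value ~ Type, data = sim_df)
tab   <- summary(fit)[[1]]       # ANOVA table as a data frame (Type row, Residuals row)
f_val <- tab[["F value"]][1]     # F statistic for the Type row
p_aov <- tab[["Pr(>F)"]][1]      # p-value for the Type row
tt    <- t.test(Value ~ Type, data = sim_df, var.equal = TRUE)  # pooled-variance t-test
c(F = f_val, t_squared = unname(tt$statistic)^2, p_aov = p_aov, p_t = tt$p.value)
```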