Weekly Assignment #2

Abigail Griffin

2023-01-30

1: Create a function that returns a data frame with 2 columns (named “Type” and “Value”) and 50 rows/observations.

  • The first column should have “Control” as the first 25 observations and “Treatment” as the second half of observations.
  • The second column should have the first 25 values as random and normally distributed with a mean of 10 and standard deviation of 1.5. The next 25 values of the second column should be random and normally distributed with a mean of 45 and standard deviation of 2. You can do this all as a single line of code in the function or by breaking it up into multiple code blocks.
set.seed(20) # set the seed so a rerun reproduces the same random data
df_func<-function(means=c(10,45),sds=c(1.5,2)){ # n=25 is not an argument of function(...); it is hard-coded in the lines below
  Type<-rep(c("Control", "Treatment"), each=25) # create first column: 25 rows of "Control" followed by 25 rows of "Treatment"
  Value<-c(rnorm(n=25, mean=means[1], sd=sds[1]), rnorm(n=25, mean=means[2], sd=sds[2])) # create second column: 25 random normal values per group
  typevalue_df<-data.frame(Type, Value, stringsAsFactors = FALSE) # create df with "Type" and "Value" columns
  return(typevalue_df) # return df please
}
df_func()
##         Type     Value
## 1    Control 11.744028
## 2    Control  9.121113
## 3    Control 12.678198
## 4    Control  8.001109
## 5    Control  9.330150
## 6    Control 10.854409
## 7    Control  5.665424
## 8    Control  8.696472
## 9    Control  9.307446
## 10   Control  9.166689
## 11   Control  9.969797
## 12   Control  9.774427
## 13   Control  9.057810
## 14   Control 11.984831
## 15   Control  7.717974
## 16   Control  9.343858
## 17   Control 11.455866
## 18   Control 10.042334
## 19   Control  9.871327
## 20   Control 10.583822
## 21   Control 10.355031
## 22   Control  9.783340
## 23   Control 11.083345
## 24   Control 10.554860
## 25   Control  9.636901
## 26 Treatment 42.055873
## 27 Treatment 43.807681
## 28 Treatment 42.706600
## 29 Treatment 40.050727
## 30 Treatment 43.772983
## 31 Treatment 44.567377
## 32 Treatment 48.180292
## 33 Treatment 48.112287
## 34 Treatment 47.216902
## 35 Treatment 42.805316
## 36 Treatment 41.278789
## 37 Treatment 43.172842
## 38 Treatment 47.491138
## 39 Treatment 45.175709
## 40 Treatment 45.846964
## 41 Treatment 43.363034
## 42 Treatment 41.914864
## 43 Treatment 46.111764
## 44 Treatment 44.261942
## 45 Treatment 42.905323
## 46 Treatment 45.036360
## 47 Treatment 46.763755
## 48 Treatment 46.763723
## 49 Treatment 47.052486
## 50 Treatment 44.237382
  • Assignment 8: Yes, I can run my simulated data function with new values for the means without error.
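As a rough sketch of what rerunning the simulation with new means could look like, the version below also takes the group size as an argument. The name `simulate_groups`, its `n` argument, and the seed are illustrative assumptions, not part of the assignment code.

```r
# Hypothetical generalization of df_func(): the group size n is an argument
# instead of being hard-coded to 25.
simulate_groups <- function(n = 25, means = c(10, 45), sds = c(1.5, 2)) {
  Type <- rep(c("Control", "Treatment"), each = n)        # n rows per group
  Value <- c(rnorm(n, mean = means[1], sd = sds[1]),      # Control values
             rnorm(n, mean = means[2], sd = sds[2]))      # Treatment values
  data.frame(Type, Value, stringsAsFactors = FALSE)
}

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
new_df <- simulate_groups(n = 10, means = c(20, 30))  # new means, smaller groups
dim(new_df)         # number of rows and columns
table(new_df$Type)  # rows per group
```

Making `n` an argument keeps the defaults identical to the assignment's setup while allowing other group sizes to be tried.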

Save your new function’s output as a variable, and use a function to view the first 6 rows of the data frame.

q1df<-df_func() # save function output as variable 
head(q1df, 6) # print first 6 rows of df
##      Type     Value
## 1 Control 11.649153
## 2 Control  9.953624
## 3 Control 10.285509
## 4 Control 12.002810
## 5 Control 11.095829
## 6 Control 10.084303

Let’s say you would like to know whether there is a statistically significant difference in “Value” (response variable y) depending on “Type” (explanatory variable x). Type ?aov in the console to determine how to run an analysis of variance (ANOVA) on your simulated data. Write a line of code that displays a summary of your ANOVA.

ANOVA_Test<-aov(Value ~ Type, data=q1df) # perform ANOVA to test whether response variable y ("Value") differs by explanatory variable x ("Type")
summary(ANOVA_Test) # produce summary of ANOVA results
##             Df Sum Sq Mean Sq F value Pr(>F)    
## Type         1  15512   15512    5750 <2e-16 ***
## Residuals   48    130       3                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
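With only two groups, a one-way ANOVA is equivalent to an equal-variance two-sample t-test: the F statistic equals the squared t statistic. The sketch below checks this on freshly simulated data; the object names (`sim`, `f_value`, `t_value`) and the seed are assumptions made for illustration.

```r
set.seed(2)  # arbitrary seed for this sketch
sim <- data.frame(
  Type  = rep(c("Control", "Treatment"), each = 25),
  Value = c(rnorm(25, mean = 10, sd = 1.5), rnorm(25, mean = 45, sd = 2))
)

# F statistic from the one-way ANOVA table
f_value <- summary(aov(Value ~ Type, data = sim))[[1]][["F value"]][1]

# t statistic from a pooled-variance (var.equal = TRUE) two-sample t-test
t_value <- unname(t.test(Value ~ Type, data = sim, var.equal = TRUE)$statistic)

all.equal(f_value, t_value^2)  # compares F with t squared up to floating-point tolerance
```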

Finally create a function that uses a data frame as its input and returns only the p-value of an ANOVA summary table (feel free to use Google/Stack Overflow). Write your code in such a way that you can use any simulated data set with two columns as the function’s argument.

pvalue_func<-function(data=NULL){ # initially put ANOVA_Test here, but this wouldn't allow any simulated data set to be used
  fit<-aov(data[[2]] ~ data[[1]]) # run the ANOVA inside the function: second column (response) by first column (group)
  p<-summary(fit)[[1]][["Pr(>F)"]][1] # extract the first element in the first compartment named "Pr(>F)" (the p value)
  return(p) # return p value please
}
pvalue_func(data=q1df)
## [1] 1.237636e-51
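One way to check that a p-value extractor accepts any two-column data set is to refer to the columns by position and try it on data with different column names. The helper `anova_p`, the data frame `other_df`, and the seed below are hypothetical, and the sketch assumes the grouping variable is in column 1 and the numeric response in column 2.

```r
# Hypothetical helper: column 1 is assumed to be the grouping variable,
# column 2 the numeric response.
anova_p <- function(data) {
  fit <- aov(data[[2]] ~ data[[1]])   # fit the ANOVA by column position
  summary(fit)[[1]][["Pr(>F)"]][1]    # p value for the grouping term
}

set.seed(3)  # arbitrary seed for this sketch
other_df <- data.frame(
  Group = rep(c("A", "B"), each = 30),
  Score = c(rnorm(30, mean = 5, sd = 1), rnorm(30, mean = 5.5, sd = 1))
)

p <- anova_p(other_df)
p  # a single p value
```

Indexing by position rather than by name means the function does not depend on the columns being called "Type" and "Value".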