Entering the tidyverse 01-23-2023

Entering the tidyverse

January 23 2023
AGG

tidyverse: collection of packages that share philosophy, grammar (or how the code is structured), and data structures. More intuitive
operators: symbols that tell R to perform different operations (between variables, functions, etc.)
e.g. arithmetic operators: + , - , *, /, ^, ~
assignment operators: <- (how we create all variables)
logical operators: !, &, | (or)
relational operators: ==, != (does not equal), >, <, >= (greater than or equal to), <=
today we will learn about miscellaneous operators: %>% (forward pipe operators), %in%

Tidyverse

Today we will be installing the tidyverse packages. You only need to install packages once. First install.packages("tidyverse) then library(tidyverse)
library(tidyverse) loads in packages
dplyr: new(er) packages provides a set of tools for manipulating data sets.
specifically written to be fast
individual functions that correspond to common operations
, (comma) = both of these conditions have to be met for this to be filtered
& (ampersand) = multiple conditions can be met
(line) = or
5 core verbs:
filter():
arrange():
select():
groupby() and summarize():
mutate():
Built in data set

library(tidyverse)
library(dplyr)
data(starwars)
glimpse(starwars)
class(starwars)

Tibble: modern take on data frames. Keeps great aspects of df’s and drops frustrating ones (change variables)
If we want to clean up data:
Check for NA’s

anyNA(starwars) # are there any NA's? returns a logical # is.na #complete.cases
starwarsClean<-starwars[complete.cases(starwars[, 1:10]),] # removes all rows with NAs in all rows, the first 10 columns # second comma says give me all columns back
# anyNA(starwarsClean[,1:10]) # we have completely removed all NA's in first 10 columns

use **filter()**: picks/subsets observations (ROWS) by their values

filter(starwarsClean, gender=="masculine" & height < 180) # can replace & (ampersand) with a comma which means and - both of these conditions have to be met for this to be filtered
filter(starwarsClean, gender=="masculine" , height < 180 , height > 100) # multiple conditions for the same variable
filter(starwarsClean, gender=="masculine" | gender=="feminine") # gender is masculine or feminine

now working with %in% operator, a matching operator. Similar to the == but you can compare vectors of different length (can’t do with ==)

a<-LETTERS[1:10]
length(a) # how many elements are in this vector

#output of %in% depends of first vector
# a %in% b
# b %in% a

use %in% to subset

eyes<-filter(starwars, eye_color %in% c("blue", "brown")) # subsetting characters with these eye colors
view(eyes) # opens up table in another tab
eyes2<-filter(starwars, eye_color == "blue" | eye_color == "brown")
view(eyes2)

arrange(): reorders rows

sw<-arrange(starwarsClean, by=height) # arranges the whole data frame by the variable you specify

can also use helper function desc()

arrange(starwarsClean, by=desc(height)) # arrange in descending order

put in additional conditions

tail(sw) # missing values are added to the end ??

select(): chooses variables (COLUMNS) by their names

select(starwarsClean, 1:10)
select(starwarsClean)
select(starwarsClean, name: species) # ??
select(starwarsClean, -(films:starships))
starwarsClean[,1:11]

everything(): rearrange columns. a helper function that is useful if you have a couple variables that you want to move to the beginning

select(starwarsClean, name, gender, species, everything()) # reorders this but doesn't keep it reordered.

contains(): another helper function

select(starwarsClean, contains ("color")) # others include: ends_with(), starts with(), num_range()

select() can also rename columns

select(starwarsClean, haircolor=hair_color) # returns only renamed column
rename(starwarsClean, haircolor=hair_color) # returns whole data set

mutate(): creates new variables using functions of existing variables
let’s create a new column that is height divided by mass

mutate(starwarsClean, ratio=height/mass)
starwars_lbs<-mutate(starwarsClean, mass_lbs=mass*2.2)
starwars_lbs<-select(starwars_lbs, 1:3, mass_lbs, everything()) # everything() is everything else. brought it to the front using select
glimpse(starwars_lbs)

transmute():

transmute(starwarsClean, mass_lbs=mass*2.2) # only returns mutated columns
transmute(starwarsClean, mass, mass_lbs=mass*2.2, height)

group_by() and summarize()

summarize(starwarsClean, meanHeight=mean(height)) # throws NA if any NAs are in df - need to use na.rm
summarize(starwarsClean, meanHeight=mean(height), TotalNumber=n())

use group_by() for maximum usefulness

starwarsGenders<-group_by(starwars, gender)
head(starwarsGenders) # lets you view first 6 rows, can also do head(starwarsGender, 10) to see first 10. Can also use tail(starwarsGender) to see last 6

summarize(starwarsGenders, meanHeight=mean(height, na.rm=TRUE), TotalNumber=n()) # TotalNumber=n() counts how many belong to each group

Piping `%>%`

piping is used to emphasize a sequence of actions
allows you to pass an intermediate result onto the next function (uses output of one function as input of the next)
avoid if you need to manipulate more than one object/variable at a time or if a variable is meaningful
formatting: should always have a space before the %>% followed by a new line

starwarsClean %>%
  group_by(gender) %>%
  summarize(meanHeight=mean(height, na.rm = TRUE, TotalNumber=n()))
# na.rm=TRUE removes NAs
# this data frame acts as %>% input
 # of this function
# much cleaner with piping!

case_when(): is useful for multiple if/ifelse statements

starwarsClean %>%
  mutate(sp=case_when(species=="Human"~"Human", TRUE~"Non-Human"))
# if species = Human, put Human. Tilda tells what to put. If species is not true, put Non-Human.
# uses condition, puts "Human" if True in sp column, puts "Non-Human" if it's FALSE

Entering the tidyverse 01-23-2023

Abigail Griffin

2023-01-24

Entering the tidyverse

Tidyverse

Piping `%>%`

Entering the tidyverse

Tidyverse

Piping %>%

Piping `%>%`