Directories/R Projects
Reading/Writing Data
Merging Data
dplyr
Functions
Directories (folders) make up the file system on your computer.
A file path gives the location of a file, often relative to your main (home) folder.
The working directory is the folder where R reads and saves files when no file path is specified.
To get the current working directory, use getwd().
To set the working directory, use setwd().
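A minimal sketch of both calls. The temporary folder from tempdir() is only a stand-in for a real folder on your computer, and example.txt is a hypothetical file name.

```r
# Remember the current working directory so it can be restored later
old_wd <- getwd()
print(old_wd)

# tempdir() stands in for a real folder (illustrative assumption)
new_wd <- tempdir()
setwd(new_wd)

# A file written without a path lands in the working directory
writeLines("hello", "example.txt")
file_found <- file.exists(file.path(new_wd, "example.txt"))
print(file_found)

# Restore the original working directory
setwd(old_wd)
```

Restoring the old working directory at the end keeps the rest of the session unaffected.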
R Projects are a way for RStudio to keep the files for a specific project organized together.
The easiest way to read data is to have RStudio do it for you
Use Base R functions
Use the readr package for tabular/text files
Use the readxl package for Excel files
Use the haven package to read SAS, SPSS, or Stata files
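As a rough illustration of the Base R and readr options, the sketch below writes a tiny CSV to a temporary file and reads it back both ways. The column names and values are made up for the example.

```r
library(readr)

# Write a tiny CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,90.5", "2,78", "3,85.25"), csv_path)

# Base R: read.csv() returns a data.frame
base_df <- read.csv(csv_path)

# readr: read_csv() returns a tibble;
# show_col_types = FALSE silences the column-type message
readr_df <- read_csv(csv_path, show_col_types = FALSE)

print(base_df)
print(readr_df)
```

The other packages follow the same pattern: readxl::read_excel() for Excel files, and haven::read_sas(), haven::read_sav(), and haven::read_dta() for SAS, SPSS, and Stata files.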
Download the following zip file: data
Load data_3_1.csv and data_3_2.csv.
Load the following data: https://m408.inqs.info/files/data/data_3_3.csv
The readr package provides several functions for writing data to files. (readxl only reads Excel files.)
I recommend using the write_csv() function and sharing data as CSV files.
RData is the data file format specific to R.
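A small sketch of both ways of saving data, assuming a made-up data frame called scores and temporary file paths in place of real ones.

```r
library(readr)

# Small illustrative data frame (made-up values)
scores <- data.frame(id = 1:3, score = c(90.5, 78, 85.25))

# Write to CSV: a plain-text format that most software can open
csv_path <- tempfile(fileext = ".csv")
write_csv(scores, csv_path)

# Save to RData: stores the R object together with its name
rdata_path <- tempfile(fileext = ".RData")
save(scores, file = rdata_path)

# load() restores the object under its original name
rm(scores)
load(rdata_path)
print(scores)

# The CSV can be read back with read_csv()
csv_back <- read_csv(csv_path, show_col_types = FALSE)
print(csv_back)
```

CSV files are easier to share with people who do not use R, while RData files are convenient when the data will only be used from R.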
*_join()
The *_join() functions are used to merge two data frames together.
Merge data sets data_3_1.csv and data_3_2.csv using the full_join() function.
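To show how the joins differ, here is a sketch with two made-up data frames that share a key column. The names df_a, df_b, and id are assumptions for illustration, not the contents of the course files.

```r
library(dplyr)

# Two made-up data frames that share the key column "id"
df_a <- data.frame(id = c(1, 2, 3), height = c(150, 160, 170))
df_b <- data.frame(id = c(2, 3, 4), weight = c(55, 65, 75))

# full_join(): keeps every id found in either data frame
full <- full_join(df_a, df_b, by = "id")

# inner_join(): keeps only the ids found in both data frames
inner <- inner_join(df_a, df_b, by = "id")

# left_join(): keeps every id from the first data frame (df_a)
left <- left_join(df_a, df_b, by = "id")

print(full)
print(inner)
print(left)
```

Rows with no match in the other data frame get NA in the columns that come from it.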
dplyr Functions
mutate()
Adds a new variable to a data frame
Example:
mtcars %>%
  mutate(log_mpg = log(mpg)) %>%
  head()
#> mutate: new variable 'log_mpg' (double) with 25 unique values and 0% NA
#> mpg cyl disp hp drat wt qsec vs am gear carb log_mpg
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 3.044522
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 3.044522
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3.126761
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 3.063391
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 2.928524
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 2.895912
mutate()
Each argument adds a new variable to the data frame
Example:
mtcars %>%
  mutate(log_mpg = log(mpg), log_hp = log(hp)) %>%
  head()
#> mutate: new variable 'log_mpg' (double) with 25 unique values and 0% NA
#> new variable 'log_hp' (double) with 22 unique values and 0% NA
#> mpg cyl disp hp drat wt qsec vs am gear carb log_mpg
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 3.044522
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 3.044522
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3.126761
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 3.063391
#> Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 2.928524
#> Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 2.895912
#> log_hp
#> Mazda RX4 4.700480
#> Mazda RX4 Wag 4.700480
#> Datsun 710 4.532599
#> Hornet 4 Drive 4.700480
#> Hornet Sportabout 5.164786
#> Valiant 4.653960
Using the penguins dataset from palmerpenguins, create a new variable that is the natural log (ln) of flipper_length_mm.
select()
Selects the variables to keep in the data frame
Example:
mtcars %>%
  mutate(log_mpg = log(mpg), log_hp = log(hp)) %>%
  select(mpg, log_mpg, hp, log_hp) %>%
  head()
#> mutate: new variable 'log_mpg' (double) with 25 unique values and 0% NA
#> new variable 'log_hp' (double) with 22 unique values and 0% NA
#> select: dropped 9 variables (cyl, disp, drat, wt, qsec, …)
#> mpg log_mpg hp log_hp
#> Mazda RX4 21.0 3.044522 110 4.700480
#> Mazda RX4 Wag 21.0 3.044522 110 4.700480
#> Datsun 710 22.8 3.126761 93 4.532599
#> Hornet 4 Drive 21.4 3.063391 110 4.700480
#> Hornet Sportabout 18.7 2.928524 175 5.164786
#> Valiant 18.1 2.895912 105 4.653960
Using the penguins dataset from palmerpenguins, select only the variables that are continuous.
filter()
Selects observations that satisfy a condition
Example:
mtcars %>%
  mutate(log_mpg = log(mpg), log_hp = log(hp)) %>%
  select(mpg, log_mpg, hp, log_hp) %>%
  filter(log_hp < 5) %>%
  head()
#> mutate: new variable 'log_mpg' (double) with 25 unique values and 0% NA
#> new variable 'log_hp' (double) with 22 unique values and 0% NA
#> select: dropped 9 variables (cyl, disp, drat, wt, qsec, …)
#> filter: removed 15 rows (47%), 17 rows remaining
#> mpg log_mpg hp log_hp
#> Mazda RX4 21.0 3.044522 110 4.700480
#> Mazda RX4 Wag 21.0 3.044522 110 4.700480
#> Datsun 710 22.8 3.126761 93 4.532599
#> Hornet 4 Drive 21.4 3.063391 110 4.700480
#> Valiant 18.1 2.895912 105 4.653960
#> Merc 240D 24.4 3.194583 62 4.127134
Using the penguins dataset from palmerpenguins, filter the data set to keep only penguins of the Gentoo species.
if_else()
Returns one value where a condition is met and another value otherwise; here it returns 1 when the condition is TRUE and 0 when it is FALSE
Example:
mtcars %>%
  mutate(log_mpg = log(mpg), log_hp = log(hp)) %>%
  select(mpg, log_mpg, hp, log_hp) %>%
  filter(log_hp < 5) %>%
  mutate(hilhp = if_else(log_hp > mean(log_hp), 1, 0)) %>%
  head()
#> mutate: new variable 'log_mpg' (double) with 25 unique values and 0% NA
#> new variable 'log_hp' (double) with 22 unique values and 0% NA
#> select: dropped 9 variables (cyl, disp, drat, wt, qsec, …)
#> filter: removed 15 rows (47%), 17 rows remaining
#> mutate: new variable 'hilhp' (double) with 2 unique values and 0% NA
#> mpg log_mpg hp log_hp hilhp
#> Mazda RX4 21.0 3.044522 110 4.700480 1
#> Mazda RX4 Wag 21.0 3.044522 110 4.700480 1
#> Datsun 710 22.8 3.126761 93 4.532599 1
#> Hornet 4 Drive 21.4 3.063391 110 4.700480 1
#> Valiant 18.1 2.895912 105 4.653960 1
#> Merc 240D 24.4 3.194583 62 4.127134 0
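The two return values do not have to be 1 and 0. As a sketch, the variation below returns text labels instead; the variable name hp_level and the label wording are made up for illustration.

```r
library(dplyr)

# Label each car by whether its horsepower is above the overall mean
hp_labels <- mtcars %>%
  mutate(hp_level = if_else(hp > mean(hp), "above average", "average or below")) %>%
  select(hp, hp_level)

print(head(hp_labels))
```

Both return values must be of the same type (here, both are character strings); if_else() gives an error when they are not.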
Using the penguins dataset from palmerpenguins, create a new variable that indicates whether a penguin's bill is longer than the average bill_length_mm.
group_by()
Groups the data frame by one or more variables so that later operations are carried out within each group
Example:
mtcars %>%
  mutate(log_mpg = log(mpg), log_hp = log(hp)) %>%
  select(mpg, log_mpg, hp, log_hp) %>%
  filter(log_hp < 5) %>%
  mutate(hilhp = if_else(log_hp > mean(log_hp), 1, 0)) %>%
  group_by(hilhp) %>%
  head()
#> mutate: new variable 'log_mpg' (double) with 25 unique values and 0% NA
#> new variable 'log_hp' (double) with 22 unique values and 0% NA
#> select: dropped 9 variables (cyl, disp, drat, wt, qsec, …)
#> filter: removed 15 rows (47%), 17 rows remaining
#> mutate: new variable 'hilhp' (double) with 2 unique values and 0% NA
#> group_by: one grouping variable (hilhp)
#> # A tibble: 6 × 5
#> # Groups: hilhp [2]
#> mpg log_mpg hp log_hp hilhp
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 3.04 110 4.70 1
#> 2 21 3.04 110 4.70 1
#> 3 22.8 3.13 93 4.53 1
#> 4 21.4 3.06 110 4.70 1
#> 5 18.1 2.90 105 4.65 1
#> 6 24.4 3.19 62 4.13 0
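Grouping on its own only marks the groups; the effect shows once a later step computes something within each group. The sketch below pairs group_by() with dplyr's summarise(), which is not covered in the example above. The names mean_mpg and n_cars are made up for illustration.

```r
library(dplyr)

# Average mpg and number of cars within each cylinder count
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n_cars = n())

print(mpg_by_cyl)
```

The result has one row per group, with the summary values computed separately for each value of cyl.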
Using the penguins dataset from palmerpenguins, group by species and find the average ln flipper_length_mm.