Data Manipulation

Learning Objectives

  • tidyr Functions

  • Wide to Long Example

tidyr

tidyr Functions

A set of functions that will tidy up a data set such that:

  • Every column is a variable

  • Every row is an observation

  • Every cell is a single value

pivot_longer()

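  • The pivot_longer() function takes the columns that hold repeated measurements for an observation and stacks them into a single variable, with a companion variable recording which column each value came from.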
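As a minimal sketch of that idea, the toy data below (hypothetical values and column names, not taken from the course data) has one row per subject and one column per visit. pivot_longer() stacks the two visit columns into a name column and a value column.

```r
library(tidyr)

# Hypothetical wide data: one row per subject, one column per visit
wide <- tibble::tibble(
  id     = c(1, 2),
  visit1 = c(5.1, 4.8),
  visit2 = c(5.6, 5.0)
)

# Stack visit1 and visit2: column names go to "visit", cell values go to "score"
long <- pivot_longer(wide,
                     cols = visit1:visit2,
                     names_to = "visit",
                     values_to = "score")

long  # 4 rows (2 subjects x 2 visits) and 3 columns: id, visit, score
```

Each subject now contributes one row per visit, which is the layout most modeling and plotting functions expect.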

pivot_wider()

  • The pivot_wider() function converts long data to wide data by spreading the values of one variable across multiple columns.
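A small illustration, again with made-up values: the long table below stores a statistic name in one column and its value in another. pivot_wider() gives each statistic its own column.

```r
library(tidyr)

# Hypothetical long data: one row per subject and statistic
long <- tibble::tibble(
  id    = c(1, 1, 2, 2),
  stat  = c("mean", "sd", "mean", "sd"),
  value = c(10.2, 1.5, 9.8, 1.1)
)

# Column names come from "stat", cell values come from "value"
wide <- pivot_wider(long, names_from = stat, values_from = value)

wide  # 2 rows and 3 columns: id, mean, sd
```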

separate()

  • The separate() function splits one variable into multiple variables.
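For instance, a label such as "v1/mean" carries two pieces of information. The sketch below uses a hypothetical one-column table of such labels and splits each label at the slash into two new columns.

```r
library(tidyr)

# Hypothetical labels that combine a time point and a statistic
labels <- tibble::tibble(measurement = c("v1/mean", "v1/sd", "v2/mean"))

# Split each label at "/" into a "time" column and a "stat" column
parts <- separate(labels,
                  col = measurement,
                  into = c("time", "stat"),
                  sep = "/")

parts  # 3 rows and 2 columns: time, stat
```

By default the original measurement column is dropped once it has been split.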

Example

Wide to Long Data Example

We convert data from wide to long using the functions in the tidyr package. Many statistical analyses require long data.

Load Data

Use read_csv() to read data_3_4.csv into an object called data1:

library(tidyverse)  # provides read_csv(), %>%, and the tidyr functions
data1 <- read_csv(file = "http://www.inqs.info/files/hiss_3/data_3_4.csv")

Wide Data

Long Data

pivot_longer()

  • The pivot_longer() function takes the columns that hold repeated measurements for an observation and stacks them into a single variable:
df1 <- data1 %>% 
  pivot_longer(cols = `v1/mean`:`v4/median`,
               names_to = "measurement",
               values_to = "value")
#> pivot_longer: reorganized (v1/mean, v1/sd, v1/median, v2/mean, v2/sd, …) into (measurement, value) [was 1000x13, now 12000x3]

separate()

  • The separate() function splits one variable into multiple variables:
df2 <- data1 %>% 
  pivot_longer(cols = `v1/mean`:`v4/median`,
               names_to = "measurement",
               values_to = "value") %>%
  separate(col = measurement, into = c("time", "stat"), sep = "/")
#> pivot_longer: reorganized (v1/mean, v1/sd, v1/median, v2/mean, v2/sd, …) into (measurement, value) [was 1000x13, now 12000x3]
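The pivot and the split can also be combined in one call, since pivot_longer() accepts a vector of names in names_to together with a names_sep argument. The sketch below compares the two routes on a hypothetical stand-in for data1 (made-up values, an assumed id column, and only two time points), so it runs without downloading the file.

```r
library(tidyr)

# Hypothetical stand-in for data1: same column naming scheme, made-up values
toy <- tibble::tibble(
  id        = c(1, 2),
  `v1/mean` = c(10, 12),
  `v1/sd`   = c(1, 2),
  `v2/mean` = c(11, 13),
  `v2/sd`   = c(1.5, 2.5)
)

# Route 1: pivot first, then split the label column
long <- pivot_longer(toy,
                     cols = `v1/mean`:`v2/sd`,
                     names_to = "measurement",
                     values_to = "value")
two_step <- separate(long, col = measurement,
                     into = c("time", "stat"), sep = "/")

# Route 2: let pivot_longer() split the column names at "/" itself
one_step <- pivot_longer(toy,
                         cols = `v1/mean`:`v2/sd`,
                         names_to = c("time", "stat"),
                         names_sep = "/",
                         values_to = "value")

# Both routes give the same columns: id, time, stat, value
all.equal(as.data.frame(two_step), as.data.frame(one_step))
```

Note that recent tidyr releases mark separate() as superseded in favor of separate_wider_delim(); separate() still works, so the slides' code runs as written.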

pivot_wider()

  • The pivot_wider() function then converts the long data back to a wider layout, giving each statistic its own column:
df3 <- data1 %>% 
  pivot_longer(`v1/mean`:`v4/median`,
               names_to = "measurement", 
               values_to = "value") %>% 
  separate(measurement, c("time", "stat"), sep = "/") %>%
  pivot_wider(names_from = stat,
              values_from = value)
#> pivot_longer: reorganized (v1/mean, v1/sd, v1/median, v2/mean, v2/sd, …) into (measurement, value) [was 1000x13, now 12000x3]
#> pivot_wider: reorganized (stat, value) into (mean, sd, median) [was 12000x4, now 4000x5]
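The logged dimensions follow from simple counting: 1000 rows times 12 measurement columns gives 12000 long rows, and collecting the 3 statistics for each time point into one row gives 12000 / 3 = 4000 rows. The sketch below traces the same bookkeeping on a hypothetical miniature of data1 (made-up values, an assumed id column, 2 time points, 2 statistics).

```r
library(tidyr)

# Hypothetical miniature of data1: 2 rows, 1 id column, 4 measurement columns
toy <- tibble::tibble(
  id        = c(1, 2),
  `v1/mean` = c(10, 12),
  `v1/sd`   = c(1, 2),
  `v2/mean` = c(11, 13),
  `v2/sd`   = c(1.5, 2.5)
)

long <- pivot_longer(toy, cols = `v1/mean`:`v2/sd`,
                     names_to = "measurement", values_to = "value")
parts <- separate(long, col = measurement,
                  into = c("time", "stat"), sep = "/")
tidy <- pivot_wider(parts, names_from = stat, values_from = value)

dim(toy)    # 2 rows, 5 columns
dim(long)   # 2 * 4 = 8 rows, 3 columns (id, measurement, value)
dim(parts)  # 8 rows, 4 columns (id, time, stat, value)
dim(tidy)   # 8 / 2 = 4 rows, 4 columns (id, time, mean, sd)
```

The final table has one row per subject and time point, which satisfies the three tidy data rules for this layout.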