Data Manipulation

Learning Objectives

tidyr Functions
Wide to Long Example
Plotting with ggplot2

`tidyr`

`tidyr` Functions

A set of functions that tidy a data set so that:

Every column is a variable
Every row is an observation
Every cell is a single value

`pivot_longer()`

The pivot_longer() function takes the variables that are repeated within an observation and stacks them into a single variable, converting wide data to long data.
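A minimal sketch of the idea on a toy data set. The tibble `wide` and its columns (`id`, `visit1`, `visit2`) are hypothetical and are not part of the course data:

```r
library(tidyr)
library(tibble)

# Hypothetical wide data: one row per subject, one column per visit
wide <- tibble(
  id = 1:2,
  visit1 = c(5.1, 4.8),
  visit2 = c(5.6, 5.0)
)

# Stack the repeated visit columns into a name column and a value column
long <- wide %>%
  pivot_longer(cols = visit1:visit2,
               names_to = "visit",
               values_to = "score")

long
```

Each subject now contributes one row per visit, so two rows and three columns become four rows and three columns.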

`pivot_wider()`

The pivot_wider() function does the reverse of pivot_longer(): it converts long data to wide data.
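As a small illustration, assuming a hypothetical long tibble with columns `id`, `stat`, and `value` (made-up numbers, not the course data):

```r
library(tidyr)
library(tibble)

# Hypothetical long data: one row per subject and statistic
long <- tibble(
  id = c(1, 1, 2, 2),
  stat = c("mean", "sd", "mean", "sd"),
  value = c(10.2, 1.5, 9.7, 2.1)
)

# Spread the statistic names into their own columns
wide <- long %>%
  pivot_wider(names_from = stat,
              values_from = value)

wide
```

The distinct values of `stat` become column names, and `value` fills the cells.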

`separate()`

The separate() function splits one variable into multiple variables.
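A short sketch on made-up values. The tibble `df` is hypothetical, and its `measurement` labels only mimic the naming pattern used later in the example:

```r
library(tidyr)
library(tibble)

# Hypothetical data where one column encodes two pieces of information
df <- tibble(
  measurement = c("v1/mean", "v1/sd", "v2/mean"),
  value = c(3.2, 0.8, 3.9)
)

# Split "measurement" at the slash into two new columns
split_df <- df %>%
  separate(col = measurement, into = c("time", "stat"), sep = "/")

split_df
```

The original `measurement` column is replaced by `time` and `stat`.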

Example

Wide to Long Data Example

This example converts data from wide to long using the functions in the tidyr package. Many statistical analyses require long data.

Load Data

Use read_csv() to read data_3_4.csv into an object called data1:

library(tidyverse)  # loads readr (read_csv), tidyr, dplyr, and ggplot2
data1 <- read_csv(file="http://www.inqs.info/files/hiss_3/data_3_4.csv")

Wide Data

Long Data

`pivot_longer()`

The pivot_longer() function takes the variables that are repeated within an observation and stacks them into a single variable:

df1 <- data1 %>% 
  pivot_longer(cols=`v1/mean`:`v4/median`,
               names_to = "measurement",
               values_to = "value")

#> pivot_longer: reorganized (v1/mean, v1/sd, v1/median, v2/mean, v2/sd, …) into (measurement, value) [was 1000x13, now 12000x3]

`separate()`

The separate() function splits one variable into multiple variables:

df2 <- data1 %>% 
  pivot_longer(cols=`v1/mean`:`v4/median`,
               names_to = "measurement",
               values_to = "value") %>% 
  separate(col=measurement,into=c("time","stat"),sep="/")

#> pivot_longer: reorganized (v1/mean, v1/sd, v1/median, v2/mean, v2/sd, …) into (measurement, value) [was 1000x13, now 12000x3]
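In tidyr 1.3.0 and later, separate() is marked as superseded in favor of separate_wider_delim() and related functions. The sketch below shows the same split with the newer function on a hypothetical tibble `df` (made-up values, not data1), and assumes tidyr 1.3.0 or later is installed:

```r
library(tidyr)
library(tibble)

# Hypothetical long data with combined "time/stat" labels
df <- tibble(
  measurement = c("v1/mean", "v1/sd", "v2/mean"),
  value = c(3.2, 0.8, 3.9)
)

# Split on the literal delimiter "/" and name the resulting columns
split_df <- df %>%
  separate_wider_delim(measurement, delim = "/", names = c("time", "stat"))

split_df
```

Unlike separate(), the delimiter here is matched as a fixed string rather than a regular expression.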

`pivot_wider()`

The pivot_wider() function then converts long data to wide data.

df3 <- data1 %>% 
  pivot_longer(`v1/mean`:`v4/median`,
               names_to = "measurement", 
               values_to = "value") %>% 
  separate(measurement,c("time","stat"),sep="/") %>% 
  pivot_wider(names_from = stat,
              values_from = value)

#> pivot_longer: reorganized (v1/mean, v1/sd, v1/median, v2/mean, v2/sd, …) into (measurement, value) [was 1000x13, now 12000x3]

#> pivot_wider: reorganized (stat, value) into (mean, sd, median) [was 12000x4, now 4000x5]
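The three steps above can also be collapsed into a single pivot_longer() call by using `names_sep` together with the special `".value"` name. This is a sketch on a hypothetical tibble `wide` that only imitates the column naming of data1; the `id` column and all numbers are assumptions:

```r
library(tidyr)
library(tibble)

# Hypothetical wide data with "time/stat" column names
wide <- tibble(
  id = 1:2,
  `v1/mean` = c(3.1, 2.9),
  `v1/sd` = c(0.5, 0.4),
  `v2/mean` = c(3.6, 3.3),
  `v2/sd` = c(0.6, 0.7)
)

# The part before "/" goes to "time"; the part after "/" becomes
# the name of an output column because of ".value"
tidy <- wide %>%
  pivot_longer(cols = `v1/mean`:`v2/sd`,
               names_to = c("time", ".value"),
               names_sep = "/")

tidy
```

The result has one row per `id` and `time`, with `mean` and `sd` as separate columns, which matches the shape produced by the pivot_longer(), separate(), pivot_wider() chain.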

ggplot2

ggplot2 is an R package used to create plots. The main idea is to use a data frame and a set of aesthetics (mappings from variables in the data frame to visual properties such as position and color) to create a base plot. Geometries (plot layers) are then layered onto the base plot to create a data visualization.

All new changes to the plot are layered on with the + symbol.
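One way to see the layering is to store the plot in an object and add to it step by step. The object names `base`, `with_points`, and `with_line` are only illustrative:

```r
library(ggplot2)

# Base plot: data and aesthetics only, nothing drawn yet
base <- ggplot(mtcars, aes(x = wt, y = mpg))

# Each + adds a layer on top of what is already there
with_points <- base + geom_point()
with_line <- with_points + geom_smooth(method = "lm", se = FALSE)

with_line
```

Because each step returns a new plot object, `base` stays unchanged and can be reused with different geometries.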

Base Plot

mtcars |> ggplot(aes(x = mpg))

Histogram

mtcars |> ggplot(aes(x = mpg)) +
  geom_histogram()

Box Plot

mtcars |> ggplot(aes(x = mpg)) +
  geom_boxplot()

Density Plot

mtcars |> ggplot(aes(x = mpg)) +
  geom_density()

Box Plot By Category

mtcars |> ggplot(aes(x = mpg, y = as.factor(cyl))) +
  geom_boxplot()

Density Plot By Category

mtcars |> ggplot(aes(x = mpg, color = as.factor(cyl))) +
  geom_density()

Scatter Plot

mtcars |> ggplot(aes(x = wt, y = mpg)) +
  geom_point()

Scatter Plot by Group

mtcars |> ggplot(aes(x = wt, y = mpg, color = as.factor(cyl))) +
  geom_point()

Add Regression Line

mtcars |> ggplot(aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

Smooth Line

mtcars |> ggplot(aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(se = FALSE)

Regression Lines by Group

mtcars |> ggplot(aes(x = wt, y = mpg,
                  color = as.factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

Example

Using the penguins data set from the palmerpenguins package, create any plot and make it publication ready. Use the following resources to customize the plot: R Graphics Cookbook, R Graph Gallery, R Charts, and ggplot2.
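One possible starting point, offered as a sketch rather than a model answer. It assumes the palmerpenguins package is installed; the choice of variables, labels, and theme is arbitrary:

```r
library(ggplot2)
library(palmerpenguins)

# Scatter plot of body mass against flipper length, colored by species
p <- ggplot(penguins,
            aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point(alpha = 0.7, na.rm = TRUE) +  # na.rm drops rows with missing values quietly
  labs(title = "Body mass increases with flipper length",
       x = "Flipper length (mm)",
       y = "Body mass (g)",
       color = "Species") +
  theme_minimal(base_size = 14) +
  theme(legend.position = "bottom")

p
```

Readable axis labels with units, a descriptive title, and a clean theme are common first steps toward a publication-ready figure. The plot can then be written to disk with ggsave(), for example ggsave("penguins.png", p, width = 7, height = 5, dpi = 300).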
