Ex-1-1: Data Wrangling
1 Filter
1.1 Exercise 1
Objective: Filter the mtcars dataset for cars that have a manual transmission (am == 1) and weigh more than 3,000 lbs (wt > 3, since wt is recorded in units of 1,000 lbs).
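One possible solution sketch with dplyr. In the built-in mtcars data, am == 1 codes a manual transmission and am == 0 an automatic one (see ?mtcars). Comma-separated conditions inside filter() are combined with a logical AND. The result name heavy_manual is an illustrative choice.

```r
library(dplyr)

# heavy_manual is an illustrative name; am == 1 means manual, wt is in 1,000 lbs
heavy_manual <- mtcars %>%
  filter(am == 1, wt > 3)

heavy_manual
```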
1.2 Exercise 2
Objective: Filter the iris dataset for flowers of the species setosa where the sepal length (Sepal.Length) exceeds 5 cm.
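A minimal sketch: Species is a factor, so it can be compared directly with the string "setosa". The name setosa_long is hypothetical.

```r
library(dplyr)

# setosa_long is an illustrative name: setosa flowers with sepals longer than 5 cm
setosa_long <- iris %>%
  filter(Species == "setosa", Sepal.Length > 5)

setosa_long
```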
1.3 Exercise 3
Objective: Filter the diamonds dataset (from ggplot2) for diamonds with a cut of "Premium" and a carat size between 1 and 2.
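A sketch assuming the diamonds data from ggplot2. It reads "between 1 and 2" as inclusive, which is what dplyr's between() does; use carat > 1 & carat < 2 instead if the bounds should be exclusive. The name premium_1to2 is hypothetical.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# premium_1to2 is an illustrative name; between() includes both endpoints
premium_1to2 <- diamonds %>%
  filter(cut == "Premium", between(carat, 1, 2))

premium_1to2
```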
1.4 Exercise 4
Objective: Filter the airquality dataset for days in June (Month == 6) where the ozone level (Ozone) exceeded 100 (ignoring NA values).
1.5 Exercise 5
Objective: Filter the ChickWeight dataset for chicks (Chick) numbered 1 to 5 (inclusive), keeping only times (Time) of 10 days or less.
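A sketch assuming the built-in airquality dataset, whose columns match the names used here. filter() already drops rows where a condition evaluates to NA, so the explicit !is.na(Ozone) only makes that intent visible. The name june_high_ozone is hypothetical.

```r
library(dplyr)

# june_high_ozone is an illustrative name; rows with missing Ozone are excluded
june_high_ozone <- airquality %>%
  filter(Month == 6, !is.na(Ozone), Ozone > 100)

june_high_ozone
```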
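A sketch assuming the built-in ChickWeight dataset. Chick is stored as an ordered factor whose levels are not in numeric order, so comparing it with < or > would be misleading; matching against the labels "1" to "5" with %in% avoids that. The name early_chicks is hypothetical.

```r
library(dplyr)

# early_chicks is an illustrative name; Chick is a factor, so match on its labels
early_chicks <- ChickWeight %>%
  filter(Chick %in% as.character(1:5), Time <= 10)

early_chicks
```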
2 Mutate
2.1 Exercise 1
Objective: Add a column named efficiency equal to miles per gallon (mpg) divided by the number of cylinders (cyl).
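A minimal sketch, assuming the mtcars dataset as in the other mpg/cyl exercises. mutate() keeps every existing column and appends the new one. The name mtcars_eff is hypothetical.

```r
library(dplyr)

# mtcars_eff is an illustrative name; efficiency = mpg per cylinder
mtcars_eff <- mtcars %>%
  mutate(efficiency = mpg / cyl)

head(mtcars_eff)
```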
2.2 Exercise 2
Objective: Create a new column named area which multiplies sepal length (Sepal.Length) by sepal width (Sepal.Width).
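A minimal sketch on the iris dataset; the product is a rough "rectangle" area in square centimetres, not a true petal or sepal area. The name iris_area is hypothetical.

```r
library(dplyr)

# iris_area is an illustrative name; area = sepal length x sepal width (cm^2)
iris_area <- iris %>%
  mutate(area = Sepal.Length * Sepal.Width)

head(iris_area)
```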
2.3 Exercise 3
Objective: For the diamonds dataset, calculate the price per carat (price / carat) and name the new column price_per_carat.
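A sketch assuming the diamonds data from ggplot2; price is in US dollars, so the new column is dollars per carat. The name diamonds_ppc is hypothetical.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# diamonds_ppc is an illustrative name
diamonds_ppc <- diamonds %>%
  mutate(price_per_carat = price / carat)

head(diamonds_ppc)
```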
2.4 Exercise 4
Objective: Convert the temperature from Fahrenheit (Temp) to Celsius and name the new column TempC. The formula is C = (F - 32) * 5/9.
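A sketch assuming the airquality dataset, which stores daily maximum temperature in degrees Fahrenheit in Temp. The formula is applied element-wise to the whole column. The name airquality_c is hypothetical.

```r
library(dplyr)

# airquality_c is an illustrative name; TempC = (Temp - 32) * 5/9
airquality_c <- airquality %>%
  mutate(TempC = (Temp - 32) * 5 / 9)

head(airquality_c)
```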
3 Group summary
3.1 Exercise 1
Objective: Group by the number of cylinders (cyl) and compute the average miles-per-gallon (mpg) for each group.
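A minimal sketch on mtcars: group_by() marks the grouping, and summarize() then returns one row per cylinder count. The names mpg_by_cyl and avg_mpg are illustrative.

```r
library(dplyr)

# mpg_by_cyl and avg_mpg are illustrative names
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarize(avg_mpg = mean(mpg))

mpg_by_cyl
```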
3.2 Exercise 2
Objective: Group by flower species (Species) and calculate the average sepal length (Sepal.Length) and sepal width (Sepal.Width) for each species.
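A sketch on iris: one summarize() call can compute several summaries at once, each becoming its own column. The column names mean_sepal_length and mean_sepal_width are hypothetical.

```r
library(dplyr)

# iris_means and its summary column names are illustrative
iris_means <- iris %>%
  group_by(Species) %>%
  summarize(
    mean_sepal_length = mean(Sepal.Length),
    mean_sepal_width  = mean(Sepal.Width)
  )

iris_means
```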
3.3 Exercise 3
Objective: Group by cut and color and compute the median price for each combination.
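A sketch assuming the diamonds data from ggplot2. With two grouping variables, summarize() would otherwise keep the result grouped by cut; .groups = "drop" returns a plain ungrouped table. The names price_medians and median_price are hypothetical.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# price_medians and median_price are illustrative names
price_medians <- diamonds %>%
  group_by(cut, color) %>%
  summarize(median_price = median(price), .groups = "drop")

price_medians
```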
3.4 Exercise 4
Objective: Group by month (Month) and compute the maximum temperature (Temp) and average ozone level (Ozone, omitting NA values) for each month.
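A sketch assuming the airquality dataset. Temp has no missing values there, but Ozone does, so only the mean needs na.rm = TRUE. The summary column names are hypothetical.

```r
library(dplyr)

# monthly, max_temp and avg_ozone are illustrative names
monthly <- airquality %>%
  group_by(Month) %>%
  summarize(
    max_temp  = max(Temp),
    avg_ozone = mean(Ozone, na.rm = TRUE)
  )

monthly
```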
3.5 Exercise 5
Objective: Group by diet (Diet) and chick number (Chick). For each combination, compute the final weight (i.e., weight at the maximum time).
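A sketch assuming the ChickWeight dataset. Within each group, which.max(Time) locates the latest measurement and the matching weight is taken from that position. The names final_weights and final_weight are hypothetical.

```r
library(dplyr)

# final_weights and final_weight are illustrative names
final_weights <- ChickWeight %>%
  group_by(Diet, Chick) %>%
  summarize(final_weight = weight[which.max(Time)], .groups = "drop")

final_weights
```

Not every chick was measured up to day 21, so for those chicks this picks the weight at their own last recorded time rather than at a common end point.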
4 Use all
4.1 Exercise 1: Calculate Average MPG by Cylinder
Task: Filter the dataset to cars with more than 100 horsepower. Then, for these cars, calculate the average miles per gallon (mpg) for each number of cylinders (cyl).
Functions to use: filter(), mutate(), group_by(), summarize()
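One possible pipeline on mtcars. The task as written never creates a new column, so this sketch skips mutate(). The names avg_mpg_hp100 and avg_mpg are illustrative.

```r
library(dplyr)

# avg_mpg_hp100 and avg_mpg are illustrative names
avg_mpg_hp100 <- mtcars %>%
  filter(hp > 100) %>%          # keep cars above 100 horsepower
  group_by(cyl) %>%             # one group per cylinder count
  summarize(avg_mpg = mean(mpg))

avg_mpg_hp100
```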
4.2 Exercise 2: Adjusted Price Calculation
Task: Filter diamonds with an "Ideal" cut and a carat weight less than 1. Calculate an adjusted price equal to 90% of the original price. Finally, calculate the average adjusted price for each clarity level.
Functions to use: filter(), mutate(), group_by(), summarize()
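A sketch assuming the diamonds data from ggplot2. The task does not name the intermediate column, so adjusted_price (like the other result names here) is a hypothetical choice.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# adjusted_price, adj_by_clarity and avg_adjusted_price are illustrative names
adj_by_clarity <- diamonds %>%
  filter(cut == "Ideal", carat < 1) %>%
  mutate(adjusted_price = price * 0.9) %>%   # 90% of the original price
  group_by(clarity) %>%
  summarize(avg_adjusted_price = mean(adjusted_price))

adj_by_clarity
```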
4.3 Exercise 3: Compute Average Displacement by Gear
Task: Filter cars with 4 or 6 cylinders. Create a new column named disp_per_cyl that calculates the engine displacement (disp) per cylinder (disp / cyl). Then compute the average disp_per_cyl for each number of forward gears (gear).
Functions to use: filter(), mutate(), group_by(), summarize()
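A sketch on mtcars, where disp is engine displacement in cubic inches. %in% keeps rows whose cylinder count is either 4 or 6. The names disp_by_gear and avg_disp_per_cyl are illustrative.

```r
library(dplyr)

# disp_by_gear and avg_disp_per_cyl are illustrative names
disp_by_gear <- mtcars %>%
  filter(cyl %in% c(4, 6)) %>%
  mutate(disp_per_cyl = disp / cyl) %>%      # cubic inches per cylinder
  group_by(gear) %>%
  summarize(avg_disp_per_cyl = mean(disp_per_cyl))

disp_by_gear
```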