Ex-1-1: Data Wrangling
1 Filter
1.1 Exercise 1
Objective: Filter the mtcars dataset for cars that have a manual transmission (am == 1; in mtcars, am is 0 for automatic and 1 for manual) and weigh more than 3,000 lbs (wt > 3, since wt is recorded in units of 1,000 lbs).
1.2 Exercise 2
Objective: Filter the iris dataset for flowers of the species setosa where the sepal length (Sepal.Length) exceeds 5 cm.
1.3 Exercise 3
Objective: Filter the diamonds dataset (from the ggplot2 package) for diamonds with a cut of "Premium" and a carat size between 1 and 2.
1.4 Exercise 4
Objective: Filter the airquality dataset for days in June (Month == 6) where the ozone level (Ozone) exceeded 100 (ignoring NA values).
1.5 Exercise 5
Objective: Filter the ChickWeight dataset for records of chicks (Chick) number 1 to 5 (inclusive) and for times (Time) less than or equal to 10 days.
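The filtering patterns these exercises rely on can be sketched as follows. This is a minimal illustration, assuming the dplyr package is installed; it deliberately uses columns other than those in the exercises, and the object names (heavy_v8, mid_range, bright_days) are made up for the example.

```r
library(dplyr)

# Comma-separated conditions inside filter() are combined with AND.
heavy_v8 <- mtcars %>%
  filter(cyl == 8, hp > 200)

# between(x, lo, hi) keeps lo <= x <= hi (both ends inclusive);
# %in% tests membership in a set of allowed values.
mid_range <- mtcars %>%
  filter(between(qsec, 16, 18), gear %in% c(3, 4))

# A comparison against NA evaluates to NA, and filter() drops rows
# whose condition is NA, so missing Solar.R values are excluded here
# without an explicit is.na() check.
bright_days <- airquality %>%
  filter(Solar.R > 300)
```

Writing the conditions as separate arguments is equivalent to joining them with &; use | inside a single argument when either condition should be enough.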
2 Mutate
2.1 Exercise 1
Objective: Add a column to the mtcars dataset named efficiency that calculates miles per gallon (mpg) divided by the number of cylinders (cyl).
2.2 Exercise 2
Objective: Create a new column in the iris dataset named area, which multiplies sepal length (Sepal.Length) by sepal width (Sepal.Width).
2.3 Exercise 3
Objective: In the diamonds dataset, calculate the price per carat (price divided by carat) and name the new column price_per_carat.
2.4 Exercise 4
Objective: In the airquality dataset, convert the temperature from Fahrenheit (Temp) to Celsius and name the new column TempC. The formula is C = (F - 32) * 5/9.
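As a rough sketch of how mutate() adds derived columns, the example below computes a ratio and two unit conversions on columns that the exercises do not use. It assumes dplyr is installed; the new column names (hp_per_wt, wt_lbs, wt_kg, wind_kmh) are invented for illustration.

```r
library(dplyr)

cars_extra <- mtcars %>%
  mutate(
    hp_per_wt = hp / wt,             # horsepower per 1,000 lbs of weight
    wt_lbs    = wt * 1000,           # wt is stored in units of 1,000 lbs
    wt_kg     = wt_lbs * 0.45359237  # a new column can be reused right away
  )

# Unit conversion with a fixed factor: 1 mile = 1.609344 km.
air_metric <- airquality %>%
  mutate(wind_kmh = Wind * 1.609344)
```

mutate() keeps every existing column and appends the new ones, so the original data frame is unchanged unless the result is assigned back to it.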
3 Group summary
3.1 Exercise 1
Objective: Group the mtcars dataset by the number of cylinders (cyl) and compute the average miles per gallon (mpg) for each group.
3.2 Exercise 2
Objective: Group the iris dataset by flower species (Species) and calculate the average sepal length (Sepal.Length) and sepal width (Sepal.Width) for each species.
3.3 Exercise 3
Objective: Group the diamonds dataset by cut and color and compute the median price for each combination.
3.4 Exercise 4
Objective: Group the airquality dataset by month (Month) and compute the maximum temperature (Temp) and average ozone level (Ozone, omitting NA values) for each month.
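Grouped summaries with missing values follow a common shape, sketched below on two airquality columns that this exercise does not ask about. The sketch assumes dplyr is installed; the summary column names (max_wind, mean_solar, n_days) are illustrative.

```r
library(dplyr)

monthly <- airquality %>%
  group_by(Month) %>%
  summarise(
    max_wind   = max(Wind),                    # Wind has no missing values
    mean_solar = mean(Solar.R, na.rm = TRUE),  # Solar.R contains NAs
    n_days     = n()                           # rows per group, NAs included
  )
```

Without na.rm = TRUE, mean() returns NA for any group that contains at least one missing value, so the argument has to be set on each summary function that touches an incomplete column.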
3.5 Exercise 5
Objective: Group the ChickWeight dataset by diet (Diet) and chick number (Chick). For each combination, compute the final weight (i.e., the weight at the maximum time).
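One way to pick "the value at the latest time point" within each group is to index with which.max(). The sketch below shows the idea on the built-in Orange dataset rather than on the exercise data; it assumes dplyr is installed, and the names last_age and final_circumference are made up for the example.

```r
library(dplyr)

# Orange carries extra grouped-data classes; as.data.frame() reduces it
# to a plain data frame before the dplyr verbs are applied.
final_size <- as.data.frame(Orange) %>%
  group_by(Tree) %>%
  summarise(
    last_age            = max(age),
    # which.max(age) is the position of the largest age within the group,
    # so this picks the circumference measured at that age.
    final_circumference = circumference[which.max(age)]
  )
```

which.max() returns the first position of the maximum, so if a group had several rows at the same latest time, only the first of them would be used.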
4 Use all
4.1 Exercise 1: Calculate Average MPG by Cylinder
Task: Filter the mtcars dataset to cars with more than 100 horsepower (hp > 100). Then, for these cars, calculate the average miles per gallon (mpg) for each number of cylinders (cyl).
Functions to use: filter(), mutate(), group_by(), summarize()
4.2 Exercise 2: Adjusted Price Calculation
Task: Filter the diamonds dataset to diamonds that are "Ideal" in cut and have carat less than 1. Calculate an adjusted price that is 90% of the original price. Finally, calculate the average adjusted price for each clarity level.
Functions to use: filter(), mutate(), group_by(), summarize()
4.3 Exercise 3: Compute Average Displacement by Gear
Task: Filter the mtcars dataset to cars with 4 or 6 cylinders. Create a new column named disp_per_cyl that calculates the engine displacement (disp) per cylinder (cyl). Then compute the average disp_per_cyl for each gear (gear) level.
Functions to use: filter(), mutate(), group_by(), summarize()
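The four verbs chain naturally into a single pipeline. The sketch below shows the overall shape on the iris dataset, which none of the exercises in this section use; it assumes dplyr is installed, and petal_ratio and mean_petal_ratio are hypothetical column names chosen for the example.

```r
library(dplyr)

petal_summary <- iris %>%
  filter(Sepal.Length > 5) %>%                        # 1. keep a subset of rows
  mutate(petal_ratio = Petal.Length / Petal.Width) %>% # 2. derive a new column
  group_by(Species) %>%                               # 3. define the groups
  summarise(mean_petal_ratio = mean(petal_ratio))     # 4. one row per group
```

The order matters: filtering first means the derived column and the group means are computed only from the rows that passed the filter. summarise() and summarize() are the same function in dplyr, so either spelling works.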