Coffee Survey Proposal
Proposal
Dataset
library(tidyverse)
coffee_survey <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-05-14/coffee_survey.csv')
glimpse(coffee_survey)
Rows: 4,042
Columns: 57
$ submission_id <chr> "gMR29l", "BkPN0e", "W5G8jj", "4xWgGr", "…
$ age <chr> "18-24 years old", "25-34 years old", "25…
$ cups <chr> NA, NA, NA, NA, NA, NA, NA, NA, "Less tha…
$ where_drink <chr> NA, NA, NA, NA, NA, NA, "At a cafe, At th…
$ brew <chr> NA, "Pod/capsule machine (e.g. Keurig/Nes…
$ brew_other <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ purchase <chr> NA, NA, NA, NA, NA, NA, "National chain (…
$ purchase_other <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ favorite <chr> "Regular drip coffee", "Iced coffee", "Re…
$ favorite_specify <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ additions <chr> "No - just black", "Sugar or sweetener, N…
$ additions_other <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ dairy <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ sweetener <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ style <chr> "Complex", "Light", "Complex", "Complex",…
$ strength <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ roast_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ caffeine <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ expertise <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ coffee_a_bitterness <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_a_acidity <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_a_personal_preference <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_a_notes <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ coffee_b_bitterness <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_b_acidity <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_b_personal_preference <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_b_notes <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ coffee_c_bitterness <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_c_acidity <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_c_personal_preference <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_c_notes <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ coffee_d_bitterness <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_d_acidity <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_d_personal_preference <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_d_notes <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ prefer_abc <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ prefer_ad <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ prefer_overall <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ wfh <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ total_spend <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ why_drink <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ why_drink_other <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ taste <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ know_source <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ most_paid <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ most_willing <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ value_cafe <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ spent_equipment <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ value_equipment <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ gender <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ gender_specify <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ education_level <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ ethnicity_race <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ ethnicity_race_specify <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ employment_status <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ number_children <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ political_affiliation <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
[1] "18-24 years old" "25-34 years old" "35-44 years old" "55-64 years old"
[5] NA "<18 years old" ">65 years old" "45-54 years old"
[1] "Regular drip coffee" "Iced coffee"
[3] "Latte" "Pourover"
[5] NA "Other"
[7] "Cortado" "Cappuccino"
[9] "Espresso" "Cold brew"
[11] "Americano" "Mocha"
[13] "Blended drink (e.g. Frappuccino)"
ggplot(coffee_survey)+geom_bar(aes(favorite,fill=favorite))+
coord_flip(clip = "off")+labs(title='favorite')
[1] NA "Other (please specify)" "Female"
[4] "Male" "Non-binary" "Prefer not to say"
ggplot(coffee_survey)+geom_bar(aes(gender,fill=gender))+
coord_flip(clip = "off")+labs(title='gender')
[1] NA "Other (please specify)"
[3] "White/Caucasian" "Asian/Pacific Islander"
[5] "Black/African American" "Hispanic/Latino"
[7] "Native American/Alaska Native"
ggplot(coffee_survey)+geom_bar(aes(ethnicity_race))+
coord_flip(clip = "off")+labs(title='ethnicity_race')
[1] NA "Bachelor's degree"
[3] "Master's degree" "Less than high school"
[5] "Some college or associate's degree" "Doctorate or professional degree"
[7] "High school graduate"
ggplot(coffee_survey)+geom_bar(aes(education_level,fill=education_level))+
coord_flip(clip = "off")+labs(title='education_level')
[1] NA "Employed full-time" "Unemployed"
[4] "Student" "Employed part-time" "Retired"
[7] "Homemaker"
ggplot(coffee_survey)+geom_bar(aes(employment_status))+
coord_flip(clip = "off")+labs(title='employment_status')
[1] NA ">$100" "$40-$60" "$20-$40" "$60-$80" "<$20" "$80-$100"
ggplot(coffee_survey)+geom_bar(aes(total_spend))+
coord_flip(clip = "off")+labs(title='total_spend')
[1] NA "Democrat" "No affiliation" "Independent"
[5] "Republican"
ggplot(coffee_survey)+geom_bar(aes(political_affiliation))+
coord_flip(clip = "off")+labs(title='political_affiliation')
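The seven exploratory plots above all follow one pattern, so they could be produced by a single function. The sketch below is one possible way to do that, not the code used for this report: `plot_counts` and `explore_vars` are hypothetical names, and a small toy data frame stands in for `coffee_survey` so the example runs on its own.

```r
library(ggplot2)

# Hypothetical helper: horizontal bar chart of counts for one categorical column.
# `column` is the column name as a string.
plot_counts <- function(data, column) {
  ggplot(data) +
    geom_bar(aes(.data[[column]], fill = .data[[column]]), show.legend = FALSE) +
    coord_flip(clip = "off") +
    labs(title = column, x = column)
}

# The columns explored above.
explore_vars <- c("favorite", "gender", "ethnicity_race", "education_level",
                  "employment_status", "total_spend", "political_affiliation")

# Toy stand-in for coffee_survey (made-up rows, for illustration only).
toy <- data.frame(
  favorite = c("Latte", "Latte", "Pourover"),
  gender   = c("Female", NA, "Male")
)

# Build one plot per explored column that exists in the data.
plots <- lapply(intersect(explore_vars, names(toy)),
                function(v) plot_counts(toy, v))
```

With the real data, `lapply(explore_vars, function(v) plot_counts(coffee_survey, v))` would return all seven plots. The helper colors every plot by category and hides the legend, which differs slightly from the plots above, where only some use `fill`.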
Dataset Description
The coffee survey was collected from people who participated in a YouTube event called the “Great American Coffee Taste Test”. The event was primarily a taste test of 4 different coffees, but the full survey collected a wide variety of other information. The data set has a total of 4,042 participants/observations (rows) and 57 variables (columns).
I chose this data set because I am an avid coffee drinker and would like to know what fellow coffee lovers like to drink, and whether they spend as much money on coffee as I do. I also used to work at a local Mexican-style coffee shop named Tierra Mia in the Los Angeles area. Exploring this data set would give me the insight into the coffee community that I would like to have.
Questions
Q1: Which age group spends the most money on coffee per month? The survey defines 7 age groups: five brackets spanning ages 18 to 64 (18-24, 25-34, 35-44, 45-54, 55-64), plus <18 and >65, so respondents of nearly every age are covered.
Q2: What are the favorite kinds of coffee within each group, where the groups are defined by gender, education level, employment status, political affiliation, and ethnicity/race?
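One possible way to set up Q2 is to stack the five grouping variables into long format, so a single count covers all of them. This is a sketch under assumptions rather than the final analysis: `count_favorites_by_group` and `group_vars` are hypothetical names, dropping `NA` responses is an assumed choice, and a made-up toy tibble stands in for `coffee_survey`.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Grouping columns named in Q2.
group_vars <- c("gender", "education_level", "employment_status",
                "political_affiliation", "ethnicity_race")

# Hypothetical helper: one row per (grouping variable, group value, favorite)
# with the number of respondents. Rows with missing answers are dropped.
count_favorites_by_group <- function(survey) {
  survey %>%
    select(favorite, all_of(group_vars)) %>%
    pivot_longer(all_of(group_vars),
                 names_to = "group_var", values_to = "group_value") %>%
    filter(!is.na(favorite), !is.na(group_value)) %>%
    count(group_var, group_value, favorite, name = "n")
}

# Toy stand-in for coffee_survey (made-up rows, for illustration only).
toy <- tibble(
  favorite              = c("Latte", "Latte", "Pourover"),
  gender                = c("Female", "Female", "Male"),
  education_level       = c("Bachelor's degree", NA, "Master's degree"),
  employment_status     = c("Student", "Student", "Retired"),
  political_affiliation = c("Independent", "Democrat", NA),
  ethnicity_race        = c("Hispanic/Latino", "Hispanic/Latino", "White/Caucasian")
)
favorite_counts <- count_favorites_by_group(toy)

# Stacked bars showing each group's share of favorite drinks,
# one panel per grouping variable.
favorite_plot <- ggplot(favorite_counts,
                        aes(x = n, y = group_value, fill = favorite)) +
  geom_col(position = "fill") +
  facet_wrap(~ group_var, scales = "free_y") +
  labs(x = "share of respondents", y = NULL, fill = "favorite")
```

Using `position = "fill"` compares proportions rather than raw counts, which may matter if some groups have far more respondents than others. Swapping it for the default stacking would show counts instead.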
Analysis plan
- I plan to create either a heat-map-style plot, with the count as the color gradient, or stacked bar charts.
- The variables I need to use are listed below along with the question that was asked to participants in the survey:
- total_spend: In total, how much money do you typically spend on coffee in a month?
- favorite: What is your favorite coffee drink?
- age: What is your age?
- employment_status
- education_level
- gender
- political_affiliation
- ethnicity_race
- I will need to count the total_spend responses within each age group, and the favorite drinks within each age group.
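The counting step and the heat map option from the plan above could look roughly like the following for Q1. It is a minimal sketch, not the final analysis: `count_spend_by_age`, `age_levels`, and `spend_levels` are hypothetical names, the level orderings are taken from the category labels printed earlier, and a made-up toy tibble stands in for `coffee_survey`.

```r
library(dplyr)
library(ggplot2)

# Age labels as they appear in the survey, ordered youngest to oldest.
age_levels <- c("<18 years old", "18-24 years old", "25-34 years old",
                "35-44 years old", "45-54 years old", "55-64 years old",
                ">65 years old")

# Spending brackets as they appear in the survey, ordered low to high.
spend_levels <- c("<$20", "$20-$40", "$40-$60", "$60-$80", "$80-$100", ">$100")

# Hypothetical helper: number of respondents in each age x total_spend cell.
# Rows with a missing age or total_spend are dropped (an assumed choice).
count_spend_by_age <- function(survey) {
  survey %>%
    filter(!is.na(age), !is.na(total_spend)) %>%
    mutate(age = factor(age, levels = age_levels),
           total_spend = factor(total_spend, levels = spend_levels)) %>%
    count(age, total_spend, name = "n")
}

# Toy stand-in for coffee_survey (made-up rows, for illustration only).
toy <- tibble(
  age         = c("18-24 years old", "18-24 years old", "25-34 years old", NA),
  total_spend = c("<$20", "<$20", ">$100", "$20-$40")
)
spend_counts <- count_spend_by_age(toy)

# Heat map with the count as the fill gradient.
spend_plot <- ggplot(spend_counts, aes(x = total_spend, y = age, fill = n)) +
  geom_tile() +
  labs(title = "Monthly coffee spend by age group",
       x = "total_spend", y = "age", fill = "count")
```

Converting both columns to ordered factor levels keeps the axes in a meaningful order instead of alphabetical order. Because age groups differ in size, dividing each count by its age group's total before plotting may give a fairer comparison.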
Task List:
Task Name | Status | Assignee | Due | Priority | Summary |
---|---|---|---|---|---|
Import Data | Complete | luis | 05/29 | Low | Use read_csv to import the data from the TidyTuesday GitHub repo |
Explore Data | Complete | luis | 06/01 | Moderate | Plot counts for each of the categorical variables to get a feel for what's in the data |
Q1 | Complete | luis | 06/03 | High | How much money does each age group spend on coffee? |
Q2 | Complete | luis | 06/05 | High | What is each group's favorite coffee? |
Write Up | Complete | luis | 06/07 | High | |
Presentation | Complete | luis | 06/10 | High | |
Final Review | Complete | luis | 06/15 | Moderate | |