Coffee Survey Proposal

Proposal

Project description: How much money do different age groups spend on coffee? How do coffee type preferences differ by age group?
Author: Stats for Stacks

Affiliation: School of Information, University of Arizona

if (!require("pacman")) 
  install.packages("pacman")
# use this line for installing/loading
pacman::p_load(readr,dplyr,ggplot2,scico,here,tidyverse,ggrepel,devtools,
               ggridges,dsbox,fs,janitor)

Dataset

coffee_survey <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-05-14/coffee_survey.csv')

glimpse(coffee_survey)
Rows: 4,042
Columns: 57
$ submission_id                <chr> "gMR29l", "BkPN0e", "W5G8jj", "4xWgGr", "…
$ age                          <chr> "18-24 years old", "25-34 years old", "25…
$ cups                         <chr> NA, NA, NA, NA, NA, NA, NA, NA, "Less tha…
$ where_drink                  <chr> NA, NA, NA, NA, NA, NA, "At a cafe, At th…
$ brew                         <chr> NA, "Pod/capsule machine (e.g. Keurig/Nes…
$ brew_other                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ purchase                     <chr> NA, NA, NA, NA, NA, NA, "National chain (…
$ purchase_other               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ favorite                     <chr> "Regular drip coffee", "Iced coffee", "Re…
$ favorite_specify             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ additions                    <chr> "No - just black", "Sugar or sweetener, N…
$ additions_other              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ dairy                        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ sweetener                    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ style                        <chr> "Complex", "Light", "Complex", "Complex",…
$ strength                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ roast_level                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ caffeine                     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ expertise                    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ coffee_a_bitterness          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_a_acidity             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_a_personal_preference <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_a_notes               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ coffee_b_bitterness          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_b_acidity             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_b_personal_preference <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_b_notes               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ coffee_c_bitterness          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_c_acidity             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_c_personal_preference <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_c_notes               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ coffee_d_bitterness          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_d_acidity             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_d_personal_preference <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA…
$ coffee_d_notes               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ prefer_abc                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ prefer_ad                    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ prefer_overall               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ wfh                          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ total_spend                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ why_drink                    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ why_drink_other              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ taste                        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ know_source                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ most_paid                    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ most_willing                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ value_cafe                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ spent_equipment              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ value_equipment              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ gender                       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ gender_specify               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ education_level              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ ethnicity_race               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ ethnicity_race_specify       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ employment_status            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ number_children              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ political_affiliation        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
coffee_survey |> 
  colnames() |> 
  cat(sep = "\n")
submission_id
age
cups
where_drink
brew
brew_other
purchase
purchase_other
favorite
favorite_specify
additions
additions_other
dairy
sweetener
style
strength
roast_level
caffeine
expertise
coffee_a_bitterness
coffee_a_acidity
coffee_a_personal_preference
coffee_a_notes
coffee_b_bitterness
coffee_b_acidity
coffee_b_personal_preference
coffee_b_notes
coffee_c_bitterness
coffee_c_acidity
coffee_c_personal_preference
coffee_c_notes
coffee_d_bitterness
coffee_d_acidity
coffee_d_personal_preference
coffee_d_notes
prefer_abc
prefer_ad
prefer_overall
wfh
total_spend
why_drink
why_drink_other
taste
know_source
most_paid
most_willing
value_cafe
spent_equipment
value_equipment
gender
gender_specify
education_level
ethnicity_race
ethnicity_race_specify
employment_status
number_children
political_affiliation
coffee_survey <- coffee_survey |>
  janitor::clean_names()

unique(coffee_survey$age)
[1] "18-24 years old" "25-34 years old" "35-44 years old" "55-64 years old"
[5] NA                "<18 years old"   ">65 years old"   "45-54 years old"
ggplot(coffee_survey) +
  geom_bar(aes(age, fill = age)) +
  coord_flip(clip = "off") +
  labs(title = "age")

unique(coffee_survey$favorite)
 [1] "Regular drip coffee"              "Iced coffee"                     
 [3] "Latte"                            "Pourover"                        
 [5] NA                                 "Other"                           
 [7] "Cortado"                          "Cappuccino"                      
 [9] "Espresso"                         "Cold brew"                       
[11] "Americano"                        "Mocha"                           
[13] "Blended drink (e.g. Frappuccino)"
ggplot(coffee_survey) +
  geom_bar(aes(favorite, fill = favorite)) +
  coord_flip(clip = "off") +
  labs(title = "favorite")

unique(coffee_survey$gender)
[1] NA                       "Other (please specify)" "Female"                
[4] "Male"                   "Non-binary"             "Prefer not to say"     
ggplot(coffee_survey) +
  geom_bar(aes(gender, fill = gender)) +
  coord_flip(clip = "off") +
  labs(title = "gender")

unique(coffee_survey$ethnicity_race)
[1] NA                              "Other (please specify)"       
[3] "White/Caucasian"               "Asian/Pacific Islander"       
[5] "Black/African American"        "Hispanic/Latino"              
[7] "Native American/Alaska Native"
ggplot(coffee_survey) +
  geom_bar(aes(ethnicity_race)) +
  coord_flip(clip = "off") +
  labs(title = "ethnicity_race")

unique(coffee_survey$education_level)
[1] NA                                   "Bachelor's degree"                 
[3] "Master's degree"                    "Less than high school"             
[5] "Some college or associate's degree" "Doctorate or professional degree"  
[7] "High school graduate"              
ggplot(coffee_survey) +
  geom_bar(aes(education_level, fill = education_level)) +
  coord_flip(clip = "off") +
  labs(title = "education_level")

unique(coffee_survey$employment_status)
[1] NA                   "Employed full-time" "Unemployed"        
[4] "Student"            "Employed part-time" "Retired"           
[7] "Homemaker"         
ggplot(coffee_survey) +
  geom_bar(aes(employment_status)) +
  coord_flip(clip = "off") +
  labs(title = "employment_status")

unique(coffee_survey$total_spend)
[1] NA         ">$100"    "$40-$60"  "$20-$40"  "$60-$80"  "<$20"     "$80-$100"
ggplot(coffee_survey) +
  geom_bar(aes(total_spend)) +
  coord_flip(clip = "off") +
  labs(title = "total_spend")

unique(coffee_survey$political_affiliation)
[1] NA               "Democrat"       "No affiliation" "Independent"   
[5] "Republican"    
ggplot(coffee_survey) +
  geom_bar(aes(political_affiliation)) +
  coord_flip(clip = "off") +
  labs(title = "political_affiliation")

Dataset Description

  • The coffee survey was collected from people who participated in a YouTube event called the “Great American Coffee Taste Test”. The event was primarily a taste test of 4 different coffees, but the survey also collected a wide variety of other information. The data set has a total of 4,042 participants/observations (rows) and 57 variables (columns).

  • I chose this data set because I am an avid coffee drinker and would like to know what fellow coffee lovers like to drink and whether they spend as much money on coffee as I do. I also used to work at Tierra Mia, a local Mexican-style coffee shop in the Los Angeles area. Exploring this data set would give me the insight into the coffee community that I would like to have.

Questions

Q1: Which age group spends the most money on coffee per month? There are 7 age groups: 5 cover ages 18 to 64 (18-24, 25-34, 35-44, 45-54, and 55-64), one is <18, and one is >65, so the groups together span the full range of ages.
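
One possible way to approach Q1 is sketched below. Because age and total_spend are stored as character strings, the sketch first turns them into ordered factors and then computes the share of each spending bracket within each age group. The level labels come from the unique() output above, but their ordering is my assumption, and toy_survey is a made-up stand-in for coffee_survey used only so the example runs on its own.

```r
library(dplyr)

# Labels copied from the unique() output; the low-to-high order is assumed.
age_levels <- c("<18 years old", "18-24 years old", "25-34 years old",
                "35-44 years old", "45-54 years old", "55-64 years old",
                ">65 years old")
spend_levels <- c("<$20", "$20-$40", "$40-$60", "$60-$80", "$80-$100", ">$100")

# Hypothetical rows for illustration only; not real survey responses.
toy_survey <- tibble::tibble(
  age = c("18-24 years old", "18-24 years old", "25-34 years old",
          "25-34 years old", "25-34 years old", ">65 years old", NA),
  total_spend = c("<$20", "$20-$40", "$40-$60", "$40-$60", ">$100", NA, "<$20")
)

spend_by_age <- toy_survey |>
  # Drop respondents who skipped either question.
  filter(!is.na(age), !is.na(total_spend)) |>
  mutate(
    age = factor(age, levels = age_levels, ordered = TRUE),
    total_spend = factor(total_spend, levels = spend_levels, ordered = TRUE)
  ) |>
  # Count responses per age group and spending bracket.
  count(age, total_spend, name = "responses") |>
  # Convert counts to within-age-group shares so groups of different
  # sizes can be compared.
  group_by(age) |>
  mutate(share = responses / sum(responses)) |>
  ungroup()

spend_by_age
```

Using shares rather than raw counts matters here because the age groups differ a lot in size, so the largest group would otherwise dominate every spending bracket.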

Q2: What are the favorite kinds of coffee within each group defined by gender, education level, employment status, political affiliation, and ethnicity/race?
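
A rough sketch of how Q2 might be tackled for several demographic variables at once: reshape the demographic columns into long form, count favorites within each group, and keep the most common drink(s). This is one option among several, not a settled method. toy_survey is invented for illustration and only two of the five demographic columns are included to keep the example short.

```r
library(dplyr)
library(tidyr)

# Hypothetical rows for illustration only; not real survey responses.
toy_survey <- tibble::tibble(
  favorite = c("Latte", "Latte", "Pourover", "Espresso", "Pourover", "Pourover"),
  gender = c("Female", "Female", "Male", "Male", "Male", NA),
  employment_status = c("Student", "Employed full-time", "Employed full-time",
                        "Student", "Employed full-time", "Retired")
)

top_favorites <- toy_survey |>
  # One row per respondent per demographic variable.
  pivot_longer(c(gender, employment_status),
               names_to = "grouping", values_to = "group") |>
  filter(!is.na(favorite), !is.na(group)) |>
  count(grouping, group, favorite, name = "responses") |>
  group_by(grouping, group) |>
  # Keep the most frequent drink per group; ties are all retained.
  slice_max(responses, n = 1, with_ties = TRUE) |>
  ungroup()

top_favorites
```

With the real data, the column list passed to pivot_longer() would be extended to education_level, political_affiliation, and ethnicity_race.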

Analysis plan

  • I am planning to create either a heat-map-like plot with count as the color gradient or stacked bar charts.
  • The variables I need to use are listed below along with the question that was asked to participants in the survey:
    • total_spend: In total, how much money do you typically spend on coffee in a month?
    • favorite: What is your favorite coffee drink?
    • age: What is your age?
    • employment_status
    • education_level
    • gender
    • political_affiliation
    • ethnicity_race
    • I will have to create a count of each total_spend category per age group and a count of favorite drinks per age group.
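
The heat-map idea in the plan above could look something like the sketch below, which counts favorite drinks per age group and maps the count to the tile fill. It is only a sketch under assumptions: toy_survey is made-up data standing in for coffee_survey, and the viridis color scale is my choice rather than anything fixed by the plan.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical rows for illustration only; not real survey responses.
toy_survey <- tibble::tibble(
  age = c("18-24 years old", "18-24 years old",
          "25-34 years old", "25-34 years old"),
  favorite = c("Latte", "Latte", "Pourover", "Latte")
)

tile_counts <- toy_survey |>
  count(age, favorite, name = "responses") |>
  # Add explicit zero counts so every age/drink cell gets a tile.
  complete(age, favorite, fill = list(responses = 0L))

heat_map <- ggplot(tile_counts, aes(x = favorite, y = age, fill = responses)) +
  geom_tile(color = "white") +
  scale_fill_viridis_c() +
  labs(title = "Favorite drink by age group",
       x = NULL, y = NULL, fill = "Responses")

heat_map
```

A stacked bar chart of the same counts would swap geom_tile() for geom_col() with fill mapped to favorite, which makes within-group proportions easier to read but exact counts harder.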

Task List:

| Task Name | Status | Assignee | Due | Priority | Summary |
|---|---|---|---|---|---|
| Import Data | Complete | luis | 05/29 | Low | Use read_csv to import data from the TidyTuesday GitHub repo |
| Explore Data | Complete | luis | 06/01 | Moderate | Plot counts for each of the categorical variables to get a feel for what's in the data |
| Q1 | Complete | luis | 06/03 | High | How much money does each age group spend on coffee? |
| Q2 | Complete | luis | 06/05 | High | Favorite coffee? |
| Write Up | Complete | luis | 06/07 | High | |
| Presentation | Complete | luis | 06/10 | High | |
| Final Review | Complete | luis | 06/15 | Moderate | |