Great American Coffee Taste Test

INFO 526 - Summer 2024 - Final Project

Project description: How much money do different age groups spend on coffee? How do coffee type preferences differ by age group?
Author: Stats for Stacks - Luis Estrada

Affiliation: School of Information, University of Arizona

Dataset variables: submission_id, age, cups, where_drink, brew, brew_other, purchase, purchase_other, favorite, favorite_specify, additions, additions_other, dairy, sweetener, style, strength, roast_level, caffeine, expertise, coffee_a_bitterness, coffee_a_acidity, coffee_a_personal_preference, coffee_a_notes, coffee_b_bitterness, coffee_b_acidity, coffee_b_personal_preference, coffee_b_notes, coffee_c_bitterness, coffee_c_acidity, coffee_c_personal_preference, coffee_c_notes, coffee_d_bitterness, coffee_d_acidity, coffee_d_personal_preference, coffee_d_notes, prefer_abc, prefer_ad, prefer_overall, wfh, total_spend, why_drink, why_drink_other, taste, know_source, most_paid, most_willing, value_cafe, spent_equipment, value_equipment, gender, gender_specify, education_level, ethnicity_race, ethnicity_race_specify, employment_status, number_children, political_affiliation

Introduction

The coffee survey was collected from people who participated in a YouTube event called the “Great American Coffee Taste Test”. The event was primarily a taste test of four different coffees, but the full survey collected a wide variety of other information as well. It has a total of 4042 participants/observations (rows) and 57 variables (columns).

Based on the data exploration done during the proposal, the respondents are predominantly between the ages of 18 and 64, hold bachelor's degrees, and are male, employed full time, and white/Caucasian. This is worth noting when trying to judge how representative the data is of the population at large.

Question 1: How much money does each age group spend on coffee?

Intro:

The first question I chose to address for this project was: Which age group spends the most money on coffee per month? It then evolved into: How much money does each age group spend on coffee?

The variables needed to explore this question:

  • age: What is your age?
  • total_spend: In total, how much money do you typically spend on coffee in a month?

This interested me because, as an avid coffee drinker, I would like to know if other folks spend as much money on coffee as I do.

Approach:

For this question I chose to create a stacked bar chart in order to look at counts across two categorical variables. This required me to group the data by age and derive a count for each of the reported total_spend ranges, which allowed me to show all of the data together on one chart.
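The counting step described above can be sketched roughly as follows. This is a minimal, library-free illustration and not the code used for the project: the sample rows, the category labels, and the helper names `spend_counts_by_age` and `spend_shares_by_age` are all assumptions made for the example. It also computes the within-group share, since the stacked bars are most comparable across age groups when read as percentages.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows; the labels are illustrative and the real
# survey has 4042 responses.
rows = [
    {"age": "18-24 years old", "total_spend": "$20-$40"},
    {"age": "18-24 years old", "total_spend": ">$100"},
    {"age": "25-34 years old", "total_spend": "$20-$40"},
    {"age": "25-34 years old", "total_spend": "$40-$60"},
    {"age": "25-34 years old", "total_spend": "$20-$40"},
    {"age": "25-34 years old", "total_spend": None},  # question left unanswered
]


def spend_counts_by_age(rows):
    """Count how many respondents in each age group report each spend range."""
    counts = defaultdict(Counter)
    for row in rows:
        # Skip responses that are missing either variable.
        if row["age"] is None or row["total_spend"] is None:
            continue
        counts[row["age"]][row["total_spend"]] += 1
    return counts


def spend_shares_by_age(counts):
    """Convert the raw counts into within-age-group proportions."""
    shares = {}
    for age, counter in counts.items():
        total = sum(counter.values())
        shares[age] = {spend: n / total for spend, n in counter.items()}
    return shares


counts = spend_counts_by_age(rows)
shares = spend_shares_by_age(counts)

for age in sorted(counts):
    print(age, dict(counts[age]), shares[age])
```

Each inner dictionary corresponds to one stacked bar: the keys are the segments and the values are the segment heights. Keeping the raw counts next to the shares makes it easier to spot age groups whose percentages rest on only a few responses.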

Analysis:

Discussion:

At face value, the youngest age group has the highest percentage of people who spend more than $100 per month. This subgroup also has the smallest number of total entries, so the high percentage is likely just outlier data.

Thereafter, the trend appears to be that the older folks get, the more they spend on coffee, until about the 55-64 year age group. Across all groups, the largest share of people spend between $20 and $60 on coffee per month.

Question 2: What are the favorite kinds of coffee for each group: gender, education level, employment status, political affiliation, and ethnicity/race?

Intro:

The second question I chose to address for this project is: What are the favorite kinds of coffee for each group: gender, education level, employment status, political affiliation, and ethnicity/race?

The variables needed to explore this question:

  • favorite: What is your favorite coffee drink?
  • employment_status
  • education_level
  • gender
  • political_affiliation
  • ethnicity_race

Approach:

For this question I chose to create another stacked bar chart, for a similar reason to Question 1: we are dealing with multiple categorical variables. To be able to generate a stacked bar chart, I had to transform the dataframe by pivoting the categorical variables of interest longer into a column called “personal_Info”, grouping by this new column, and taking a count of the favorite coffee drinks. To complete the graph I had to add a facet_wrap layer with independent y-axes. Similar to Q1, all of this allowed me to show all of the data together on one chart.
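As a rough illustration of the reshaping step, the sketch below pivots the five demographic columns into long form and counts favorite drinks per group. It is plain Python rather than the dataframe code used for the project, and several details are assumptions: the sample rows and labels are made up, the helpers `pivot_longer` and `count_favorites` are hypothetical, and it assumes “personal_Info” holds the demographic value while a hypothetical `variable` column records which source column the value came from (the column a facet would split on).

```python
from collections import Counter

DEMOGRAPHIC_COLUMNS = [
    "gender",
    "education_level",
    "employment_status",
    "political_affiliation",
    "ethnicity_race",
]

# Hypothetical sample rows; the labels are illustrative only.
rows = [
    {"favorite": "Pourover", "gender": "Male",
     "education_level": "Bachelor's degree",
     "employment_status": "Employed full-time",
     "political_affiliation": "Democrat",
     "ethnicity_race": "White/Caucasian"},
    {"favorite": "Latte", "gender": "Female",
     "education_level": "Bachelor's degree",
     "employment_status": "Student",
     "political_affiliation": None,  # question left unanswered
     "ethnicity_race": "Asian/Pacific Islander"},
    {"favorite": "Pourover", "gender": "Male",
     "education_level": "Master's degree",
     "employment_status": "Employed full-time",
     "political_affiliation": "Independent",
     "ethnicity_race": "White/Caucasian"},
]


def pivot_longer(rows, columns):
    """Emit one long-format row per (respondent, demographic column) pair."""
    long_rows = []
    for row in rows:
        if row.get("favorite") is None:
            continue
        for column in columns:
            value = row.get(column)
            if value is None:
                continue  # drop missing demographic answers
            long_rows.append({
                "variable": column,      # assumed name for the source column
                "personal_Info": value,  # assumed to hold the group label
                "favorite": row["favorite"],
            })
    return long_rows


def count_favorites(long_rows):
    """Count favorite drinks per (variable, group label, drink)."""
    return Counter(
        (r["variable"], r["personal_Info"], r["favorite"]) for r in long_rows
    )


long_rows = pivot_longer(rows, DEMOGRAPHIC_COLUMNS)
favorite_counts = count_favorites(long_rows)

for key, n in sorted(favorite_counts.items()):
    print(key, n)
```

In this layout each distinct `variable` value would map to one facet panel, each `personal_Info` value to one bar, and each drink to one stacked segment. Independent y-axes matter because the group sizes differ widely from panel to panel.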

Analysis:

Discussion:

During the data exploration phase of the proposal, it was shown that the most popular coffee drinks were regular drip coffee, pour-overs, and lattes. This trend held true overall when broken out across all of the demographic variables of interest. Other popular drinks were cappuccinos and good ole plain espresso, or ole-reliable as I like to call it.