The goal of this small research project is to analyze a survey of young people and produce recommendations for a Slovak start-up company on two main points:
- how to frame the ad campaign?
- who to target with the products?
The company offers a wide range of products; however, the focus here is on the new superfood delivery.
Before examining the data and building a Bayesian network, it is a good idea to search for existing research and see what factors may be connected to a positive attitude towards superfood. Again, I formulated several Google searches to answer the following:
- what exactly is superfood?
- is there any existing research on the superfood?
- who buys superfood?
According to the Harvard Health Blog, superfood is:
Uber Eats also has a “superfood delivery” feature! Smoked salmon open topped sandwich… yum. As the website states, the superfoods they provide are:
Summing up, it seems that an ideal potential customer is someone who follows a healthy lifestyle, loves fruits & vegetables (and has more specific food preferences and requests), and likely follows trends (maybe even in tech, you know: grabbing a smoothie after a long practice and closing activity rings on an Apple Watch...). And, perhaps, a female. But targeting by gender has been criticized lately, so it’s better not to do that, at least with the superfood delivery [4]. Actually, this sounds like Gen Z or millennials.
According to an Australian publication, 22.5% of metrotechs use food delivery services, the highest percentage among the categories identified there.
Metrotechs: “Socially aware, successful, career focussed and culturally diverse, are trend and tech focused. They are committed experience seekers, willing to spend big on the best of city life and thrive on being out and about in the world.”
Well, this is quite consistent with the suggestions from the previous sub-section and my general knowledge. Nice!
The dataset contains 150 variables that may be split into the following groups:
As the aim of this research is to find a target group for the new superfood delivery, music & movie preferences and phobias are unlikely to influence the adoption of such a product and, at least at this stage, can be dropped. From the other categories, I’ve selected questions related to healthy lifestyle, attitude to animals and spending on healthy food. The latter is selected as the “outcome” variable because it directly reflects whether a person is willing to spend money on good & healthy food, i.e. superfood 🍏. It should therefore best predict the adoption of the superfood delivery app, which is all about delivering high-quality & healthy products.
library(readr)  # provides read_csv()
yps_r = read_csv("young-people-survey/columns.csv")    # column descriptions
yps_c = read_csv("young-people-survey/responses.csv")  # survey responses
The selected outcome variable is “I will happily pay more money for good, quality or healthy food”: among the questions available in the survey, it should best predict the adoption of the superfood delivery.
In the next section, I present the hypotheses and, based on them, create a Bayesian network.
group | question | scale |
---|---|---|
health habits | Smoking habits | Never smoked - Tried smoking - Former smoker - Current smoker |
health habits | Drinking | Never - Social drinker - Drink a lot |
health habits | I live a very healthy lifestyle | Strongly disagree 1-2-3-4-5 Strongly agree |
traits, views on life & opinions | I eat because I have to. I don’t enjoy food and eat as fast as I can | Strongly disagree 1-2-3-4-5 Strongly agree |
traits, views on life & opinions | I worry about my health | Strongly disagree 1-2-3-4-5 Strongly agree |
traits, views on life & opinions | I am always full of life and energy | Strongly disagree 1-2-3-4-5 Strongly agree |
outcome, spending habits | I will happily pay more money for good, quality or healthy food | Strongly disagree 1-2-3-4-5 Strongly agree |
demographics | Age | int. |
demographics | Gender | Female-Male |
demographics | Highest education achieved | Currently a Primary school pupil - Primary school - Secondary school - College/Bachelor degree |
The first step before creating the Bayesian network is variable selection. In total, I ended up with 9 variables related to demographic information, bad habits, healthy eating and spending on such eating. I then removed rows with NAs, which left a final dataset of 983 observations & 9 columns.
bn_yps = yps_c %>%
  dplyr::select("Age", "Education", "Smoking", "Alcohol", "Health", "Healthy eating", "Eating to survive", "Energy levels", #"Passive sport", "Active sport",
                "Spending on healthy eating") %>%
  na.omit()
kableExtra::kable(head(bn_yps), format = "markdown")
Age | Education | Smoking | Alcohol | Health | Healthy eating | Eating to survive | Energy levels | Spending on healthy eating |
---|---|---|---|---|---|---|---|---|
20 | college/bachelor degree | never smoked | drink a lot | 1 | 4 | 1 | 5 | 3 |
19 | college/bachelor degree | never smoked | drink a lot | 4 | 3 | 1 | 3 | 2 |
20 | secondary school | tried smoking | drink a lot | 2 | 3 | 5 | 4 | 2 |
22 | college/bachelor degree | former smoker | drink a lot | 1 | 3 | 1 | 2 | 1 |
20 | secondary school | tried smoking | social drinker | 3 | 4 | 1 | 5 | 4 |
20 | secondary school | never smoked | never | 3 | 2 | 2 | 4 | 4 |
Transforming to factors and mapping the levels:
- levels are mapped in accordance with the original specification
- age: split into more or less explainable intervals of 4 years each:
summary(bn_yps$Age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 15.00 19.00 20.00 20.44 22.00 30.00
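The 4-year age bins can be produced with base R’s `cut()`. This is a minimal sketch, not necessarily the exact code used here: the break points and the helper name `bin_age` are assumptions, chosen to reproduce the four intervals that show up in the data summary (15-18, 19-22, 23-26, 27-30).

```r
# Hypothetical helper: bin integer ages into 4-year intervals.
bin_age <- function(age) {
  cut(age,
      breaks = c(14, 18, 22, 26, 30),   # right-closed: (14,18], (18,22], (22,26], (26,30]
      labels = c("15-18", "19-22", "23-26", "27-30"))
}

ages <- c(15, 18, 19, 22, 23, 26, 27, 30)  # boundary values of every interval
binned <- bin_age(ages)
table(binned)                               # two ages fall into each bin
```

Because `cut()` closes intervals on the right by default, starting the breaks at 14 keeps the youngest respondents (15) inside the first bin.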
Next come the basic statistics of the data, showing (1) the number of factor levels and (2) the number of observations in each level. What are the main takeaways here?
## The data contains 983 observations of the following variables:
## - Age: 4 levels: 15-18 (n = 210); 19-22 (n = 605); 23-26 (n = 118) and 27-30 (n = 50)
## - Education: 6 levels: college/bachelor degree (n = 204); currently a primary school pupil (n = 9); doctorate degree (n = 5); masters degree (n = 77); primary school (n = 77) and secondary school (n = 611)
## - Smoking: 4 levels: current smoker (n = 184); former smoker (n = 174); never smoked (n = 201) and tried smoking (n = 424)
## - Alcohol: 3 levels: drink a lot (n = 219); never (n = 118) and social drinker (n = 646)
## - Health: 5 levels: Strongly Disagree (n = 72); Disagree (n = 129); Neutral (n = 390); Agree (n = 261) and Strongly Agree (n = 131)
## - Healthy eating: 5 levels: Strongly Disagree (n = 79); Disagree (n = 132); Neutral (n = 496); Agree (n = 231) and Strongly Agree (n = 45)
## - Eating to survive: 5 levels: Strongly Disagree (n = 350); Disagree (n = 279); Neutral (n = 184); Agree (n = 110) and Strongly Agree (n = 60)
## - Energy levels: 5 levels: Strongly Disagree (n = 31); Disagree (n = 85); Neutral (n = 302); Agree (n = 361) and Strongly Agree (n = 204)
## - Spending on healthy eating: 5 levels: Strongly Disagree (n = 40); Disagree (n = 131); Neutral (n = 279); Agree (n = 317) and Strongly Agree (n = 216)
Mostly, young people do not eat just to survive. However, 170 agreed or strongly agreed with this statement; they probably represent an audience that we do not want to target, as they don’t like food that much. About half (496) of the respondents gave a neutral answer to “I live a very healthy lifestyle”, i.e. neither agreed nor disagreed. Still, 231 agreed and 45 strongly agreed, and these are the ones that are better targeted: healthy lifestylers need healthy food. Good finding: most students do feel energetic.
As for the outcome, the majority agreed that they would happily pay more money for good, quality or healthy food, nice!
The hypotheses behind the network are as follows:
- Age: as stated in the previous sections, younger people adapt faster to new services, and it’s easier for them to get into something new. So, superfood delivery may sound super attractive to them, especially if having a healthy lifestyle is a trend.
- Education: people with higher levels of education likely know more about various vitamins and the benefits of a healthy lifestyle.
- Smoking: smoking is definitely not part of a healthy lifestyle, so, perhaps, current smokers are not the best audience for targeting.
- Alcohol: the same applies to drinking, which is very bad for health.
- Health: being concerned about health means that a person cares about his/her health, which is good.
- Healthy eating: one of the most important variables to take into account. If a person prefers eating healthy foods & in addition finds it difficult to get the wanted product, he/she may be willing to pay for delivery. It directly influences the outcome: if a person strongly agrees on eating healthily, then he/she is probably ready to pay more. It is also influenced by eating to survive and health: just surviving means not eating healthily, while caring about health means eating healthily.
- Eating to survive: if a person does not like food, it’s better not to target them & not make them feel even worse by reminding them of something they don’t like.
- Energy levels: sometimes being energetic means being open to new suggestions, such as superfood delivery, especially if you have a healthy lifestyle! This also has ties to Health, Alcohol and Smoking, as bad habits may badly affect the body & energy.
- Spending on healthy eating: shows whether and how much a person would spend on such food.
## Loading required namespace: Rgraphviz
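Hypotheses like these can be written down as a directed acyclic graph with bnlearn’s `model2network()`. The sketch below is only an illustration under stated assumptions: the arcs into `healthy_eating` and `energy_levels` follow the hypotheses above, while the arcs from `age`, `education` and `energy_levels` into the outcome are my guess at how the remaining hypotheses could be encoded, not necessarily the structure used in this post. The object name `dag` is also hypothetical.

```r
library(bnlearn)

# Hypothetical structure: each [node|parent1:parent2] term lists a node and its parents.
dag_string <- paste0(
  "[age][education][smoking][alcohol][health][eating_to_survive]",
  "[healthy_eating|eating_to_survive:health]",   # stated in the hypotheses
  "[energy_levels|health:alcohol:smoking]",      # stated in the hypotheses
  "[spending_on_healthy_eating|healthy_eating:energy_levels:age:education]"  # partly assumed
)
dag <- model2network(dag_string)

arcs(dag)                                   # two-column matrix of from/to pairs
parents(dag, "spending_on_healthy_eating")  # direct influences on the outcome
```

A structure defined this way can be passed to `bn.fit()` together with the data, provided the node names match the column names.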
For simplicity, split the Likert-scale variables into “rather_yes” (agrees) and “rather_no” (neutral or disagrees):
likert_levels = c("Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree")
binary_levels = c("rather_no", "rather_no", "rather_no", "rather_yes", "rather_yes")
# apply the same 5-to-2 level mapping to every Likert-scale column
for (v in c("spending_on_healthy_eating", "healthy_eating", "energy_levels", "eating_to_survive", "health")) {
  bn_yps[[v]] = plyr::mapvalues(bn_yps[[v]], from = likert_levels, to = binary_levels)
}
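library(bnlearn)
net_yps = bn.fit(bn, data = data.frame(bn_yps))  # bn is the network structure (DAG); bn.fit estimates its parameters from the data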
The probability of adoption (a “rather_yes” outcome) based on our small dataset is about 0.49, i.e. 49%.
set.seed(17)
cpquery(net_yps, event = (spending_on_healthy_eating == "rather_yes"), evidence = TRUE)
## [1] 0.4916
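As a sanity check, the sampled estimate can be compared with the raw share of respondents who answered “Agree” or “Strongly Agree”. The sketch below uses only the level counts printed in the data summary earlier; the variable names are illustrative.

```r
# Level counts for "Spending on healthy eating", copied from the data summary.
spending_counts <- c("Strongly Disagree" = 40, "Disagree" = 131, "Neutral" = 279,
                     "Agree" = 317, "Strongly Agree" = 216)

n_total <- sum(spending_counts)   # total number of respondents
share_yes <- sum(spending_counts[c("Agree", "Strongly Agree")]) / n_total
round(share_yes, 3)
```

The raw share comes out at about 0.54. The `cpquery` value is a Monte Carlo estimate computed from the fitted network, so the two numbers need not coincide exactly, but a large gap is worth a second look.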
Let’s look at several combinations of evidence. Some of them describe profiles that are suitable for targeting, while others describe profiles that should be avoided.
set.seed(17)
cpquery(net_yps, event = (spending_on_healthy_eating == "rather_yes"),
evidence = (eating_to_survive=="rather_no" & health=="rather_yes" & alcohol=="never" & smoking=="never smoked"&
energy_levels=="rather_yes" & education=="college/bachelor degree" & age=="19-22"))
## [1] 0.5714286
set.seed(17)
cpquery(net_yps, event = (spending_on_healthy_eating == "rather_yes"),
evidence = (eating_to_survive=="rather_no" & health=="rather_yes" & alcohol=="social drinker" & smoking=="never smoked"&
energy_levels=="rather_yes" & education=="college/bachelor degree" & age=="19-22"))
## [1] 0.5
set.seed(17)
cpquery(net_yps, event = (spending_on_healthy_eating == "rather_yes"),
evidence = (eating_to_survive=="rather_yes" & health=="rather_no" & alcohol=="social drinker" & smoking=="current smoker"&
energy_levels=="rather_no" & education=="college/bachelor degree" & age=="19-22"))
## [1] 0.4444444
In sum, this means that the more a person cares about health and the fewer bad habits he/she has, the better he/she fits for targeting.
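One caveat on the three estimates above: they condition on seven variables at once. bnlearn’s `cpquery` uses logic sampling by default, which simulates respondents from the network and keeps only those that match all of the evidence, so a value such as 0.5714286 (which equals 4/7) may rest on very few samples. The base R sketch below illustrates the effect with made-up numbers: the sample size, the per-variable match probability and the independence between variables are all assumptions, not properties of the fitted network.

```r
set.seed(17)
n_samples  <- 5000   # assumed number of simulated respondents
n_evidence <- 7      # seven evidence variables, as in the queries above
p_match    <- 0.3    # assumed chance that a single variable matches the evidence

# One row per simulated respondent; TRUE means that variable matches the evidence.
matches <- matrix(runif(n_samples * n_evidence) < p_match, nrow = n_samples)

# Logic sampling keeps only respondents that match on every evidence variable.
kept <- rowSums(matches) == n_evidence

sum(kept)                                  # samples the estimate would rest on
expected_kept <- n_samples * p_match^n_evidence
expected_kept                              # expected number of kept samples
```

With these assumed numbers only about one simulated respondent in 5000 is expected to survive, which is why such estimates can jump around. Passing a larger `n` to `cpquery` is one way to stabilise them.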
How does the education relate to the outcome?
Overall, there might be some relationship between education level and spending on healthy food.
## spending_on_healthy_eating
## education rather_no rather_yes
## college/bachelor degree 0.4614959 0.5385041
## currently a primary school pupil 0.5203279 0.4796721
## doctorate degree 0.4743548 0.5256452
## masters degree 0.4753624 0.5246376
## primary school 0.4235053 0.5764947
## secondary school 0.4658607 0.5341393
Bad habits. Firstly, let’s look at smoking. It seems that it does not influence the outcome that much! Perhaps because it does not involve drinking or eating something unhealthy.
## spending_on_healthy_eating
## smoking rather_no rather_yes
## current smoker 0.4625700 0.5374300
## former smoker 0.4627665 0.5372335
## never smoked 0.4633831 0.5366169
## tried smoking 0.4629230 0.5370770
The same goes for alcohol! This turned out to be quite surprising: not much difference here either. This leads to the conclusion that bad habits do not relate to spending on healthy food that much.
## spending_on_healthy_eating
## alcohol rather_no rather_yes
## drink a lot 0.4629967 0.5370033
## never 0.4632088 0.5367912
## social drinker 0.4628463 0.5371537
Eating healthy? Then the probability increases from 0.48 to 0.69! People who follow healthy diets are willing to pay more for good & healthy food.
## spending_on_healthy_eating
## healthy_eating rather_no rather_yes
## rather_no 0.5212930 0.4787070
## rather_yes 0.3130869 0.6869131
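A table of this shape reads row by row: each row is one level of the conditioning variable, and its entries sum to 1. The sketch below builds such a row-normalised table from raw answers with base R. The data are made up purely for illustration, and the tables in this post presumably come from the fitted network rather than from raw counts.

```r
# Made-up toy answers, for illustration only (not the survey data).
toy <- data.frame(
  healthy_eating = rep(c("rather_no", "rather_yes"), each = 4),
  spending       = c("rather_no", "rather_no", "rather_no", "rather_yes",
                     "rather_no", "rather_yes", "rather_yes", "rather_yes")
)

counts <- table(toy$healthy_eating, toy$spending)  # joint counts
cpt <- prop.table(counts, margin = 1)              # divide each row by its row total
cpt
```

Here `cpt["rather_yes", "rather_yes"]` is the share of healthy eaters who would pay more, which is the same quantity as the bottom-right cell of the table above.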
Those who eat for survival also have a lower probability of adopting the service. If a person rather does consume food in that way, the probability is 0.447, while if not, it is 0.556.
## spending_on_healthy_eating
## eating_to_survive rather_no rather_yes
## rather_no 0.4439951 0.5560049
## rather_yes 0.5534447 0.4465553
What about age? Well, not much difference here either. However, it’s better to focus on people aged 15 to 26.
## spending_on_healthy_eating
## age rather_no rather_yes
## 15-18 0.4424410 0.5575590
## 19-22 0.4629685 0.5370315
## 23-26 0.4659383 0.5340617
## 27-30 0.5412861 0.4587139
This work examined the factors that may lead to the adoption of the superfood delivery service. Overall, the company should target clients aged 15-26 who follow a healthy lifestyle, are energetic, and pursue education. This may also be done regardless of whether a person has bad habits since, according to the conditional probability tables, drinking or smoking does not influence the willingness to spend money on healthy food.
[1] 10 superfoods to boost a healthy diet - Harvard Health Blog
[2] Sizing up ‘superfoods’ for heart health - Harvard Health
[3] Superfood Delivery | Uber Eats
[4] Facebook Axes Age, Gender and Other Targeting for Some Sensitive Ads - WSJ
[5] Metrotechs and Millennials have taken to Uber Eats, Menulog, Deliveroo, Foodora and more
[6] Helix Personas - Roy Morgan Research
guess now i’m closer to being a healthy analyst.