Data
The data come from the cheeses dataset published by Tidy Tuesday, available at https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2024/2024-06-04/cheeses.csv or through the Tidy Tuesday GitHub repository. The original data were scraped from https://www.cheese.com/, a website that lists all kinds of cheeses for purchase. The dataset contains 1187 cheeses from regions all over the world, with 19 variables recorded for each cheese. The following table lists each variable's name, class, and description.
## variable class
## 1 cheese character
## 2 url character
## 3 milk character
## 4 country character
## 5 region character
## 6 family character
## 7 type character
## 8 fat_content character
## 9 calcium_content character
## 10 texture character
## 11 rind character
## 12 color character
## 13 flavor character
## 14 aroma character
## 15 vegetarian logical
## 16 vegan logical
## 17 synonyms character
## 18 alt_spellings character
## 19 producers character
## description
## 1 Name of the cheese.
## 2 Location of the cheese's description at cheese.com
## 3 The type of milk used for the cheese, when known.
## 4 The country or countries of origin of the cheese.
## 5 The region in which the cheese is produced, either within the country of origin, or as a wider description of multiple countries.
## 6 The family to which the cheese belongs, if any.
## 7 The broad type or types to describe the cheese.
## 8 The fat content of the cheese, as a percent or range of percents.
## 9 The calcium content of the cheese, when known. Values include units.
## 10 The texture of the cheese.
## 11 The type of rind used in producing the cheese.
## 12 The color of the cheese.
## 13 Characteristic(s) of the taste of the cheese.
## 14 Characteristic(s) of the smell of the cheese.
## 15 Whether cheese.com considers the cheese to be vegetarian.
## 16 Whether cheese.com considers the cheese to be vegan.
## 17 Alternative names of the cheese.
## 18 Alternative spellings of the name of the cheese (likely overlaps with synonyms).
## 19 Known producers of the cheese.
Are Certain Cheese Textures More Commonly Associated with Specific Colors?
Cheeses come in many different textures and colors. Within our dataset, the textures we consider are soft, hard, and firm, and the colors are ivory, orange, white, and yellow. Before we can ask why certain textures might be related to certain colors, we must first examine whether such relationships exist. Knowing them can help customers make more educated purchasing decisions and can help cheese manufacturers target specific combinations. For example, a customer who prefers a certain texture can favor the cheese colors most often associated with that texture. On the manufacturing side, a producer who wants to create a firm cheese will know which color occurs most often among firm cheeses, can better predict the final appearance of the cheese, and can look into ways of changing the cheese's color ahead of time if need be. Overall, answering this question will better prepare both customers and producers of cheese.
To begin answering our question, we made a heatmap of the counts of each cheese texture and color combination. A heatmap helps us see whether certain combinations are more common than others.
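The counts behind such a heatmap can be sketched as a simple cross-tabulation. The example below is a minimal illustration, not the code used for this report: the data frame `cheeses_toy` and its values are invented, and it assumes texture and color have already been reduced to the single labels used in this section. Base R is used so the sketch runs without extra packages.

```r
# Toy stand-in for the cheeses data; all values are invented for illustration.
cheeses_toy <- data.frame(
  texture = c("soft", "hard", "hard", "soft", "firm", "hard", "soft", NA),
  color   = c("white", "yellow", "yellow", "ivory", "white", "orange", "white", "yellow"),
  stringsAsFactors = FALSE
)

# Keep only rows where both texture and color are known.
complete <- cheeses_toy[!is.na(cheeses_toy$texture) & !is.na(cheeses_toy$color), ]

# Cross-tabulate: rows are textures, columns are colors.
counts <- table(texture = complete$texture, color = complete$color)

# Long format with one row per heatmap tile (texture, color, n).
tiles <- as.data.frame(counts, responseName = "n")

print(counts)
```

The long-format `tiles` table has one row per texture and color pair, which is the shape a tile-based plot (for example ggplot2's `geom_tile()`) would typically take as input.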
The above heatmap shows cheese texture (soft, firm, and hard) on the y-axis and cheese color (ivory, orange, white, and yellow) on the x-axis. The shading and number in each tile give the number of cheeses that fall into that combination: the darker the shade, the more cheeses in that combination. Two patterns stand out immediately. First, yellow and white cheeses are very common overall. Hard yellow cheeses form the largest group, with 251 entries, followed by soft white cheeses with 173. This suggests that both hard and soft cheeses are most often produced in these colors, possibly because of their aging process and milk source. Second, firm cheeses are rare in general, with only 5 to 7 cheeses per cell across all colors. This could mean either that fewer cheeses are described with the word “firm” or that firm cheeses are less common or less well documented in the dataset. Interestingly, orange cheeses are also quite rare. Overall, the heatmap shows how texture and color intersect in cheese classification and highlights a strong skew toward yellow and white cheeses, especially among soft and hard textures. This gives us a high-level view of how the cheeses are distributed.
After seeing which combinations are the most common and by how much, we decided that the next step was to determine whether certain cheese textures make up a higher proportion of certain cheese colors. We therefore created a spine chart, which shows the distribution of textures within each color.
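The within-color proportions that a spine chart displays can be computed by normalizing each column of the count table. This is a minimal sketch under stated assumptions: the counts in the matrix are made up and are not the counts from the dataset.

```r
# Invented texture-by-color counts; NOT the real counts from the dataset.
counts <- matrix(
  c(1, 0,  2,  1,
    3, 6,  4, 12,
    6, 4, 14,  7),
  nrow = 3, byrow = TRUE,
  dimnames = list(texture = c("firm", "hard", "soft"),
                  color   = c("ivory", "orange", "white", "yellow"))
)

# margin = 2 divides each column by its total, so every color sums to 1.
within_color <- prop.table(counts, margin = 2)

# Show the shares as percentages.
print(round(100 * within_color, 1))

# Draw the chart only in an interactive session; color goes on the x-axis.
if (interactive()) {
  spineplot(as.table(t(counts)))
}
```

Normalizing by column rather than by row answers the question "given a color, how are the textures distributed?", which is the view described in this section.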
In the spine chart, cheese color (ivory, orange, white, and yellow) is again on the x-axis, and the proportion of cheeses is on the y-axis. Cheese texture is indicated by fill color: red for “firm”, green for “hard”, and blue for “soft”. Once again, the small or absent red segments within each color show that firm cheeses are very rare. What we can see now that we could not see earlier is the distribution of textures within each color. Among ivory cheeses, soft cheeses are the most common at 65%. Among orange cheeses, hard and soft cheeses are almost equally prevalent at 54% and 46% respectively. Among white cheeses, soft cheeses are the most common at 82%. Among yellow cheeses, hard cheeses are the most common at 62%. From the spine chart, we conclude that soft cheeses are more common among ivory and white cheeses, hard cheeses are more common among yellow cheeses, and the two are roughly equally common among orange cheeses.
In conclusion, yellow cheeses are more often hard, and the combination of yellow and hard is also the most common overall. White and ivory cheeses, on the other hand, are more often soft.
Moving forward, it may be worth investigating why yellow, hard cheeses are the most common. Perhaps ingredient differences are responsible, but that is data we lack. Establishing causation is also difficult from observational data and graphs alone. Still, the question of what gives a cheese a certain texture or color can matter for cheese production and development. One possible approach is to look for additional variables, such as the presence of certain ingredients, that are associated with both texture and color.
How Does Milk Type Correlate with Fat Content?
Cheeses can be made from a single type of milk or from a combination of milk types. Within our dataset, the milk types are: buffalo and cow; cow and sheep; cow, goat, and sheep; cow and water buffalo; cow; sheep; goat; goat and sheep; cow and goat; water buffalo; camel; and moose. Understanding the relationship between milk type and fat content is important for both consumers and producers. With health consciousness on the rise, knowing which types of milk are associated with higher or lower fat content can help consumers decide which options are better suited to their diets. Producers can use the same information to make cheeses that align more closely with their fat content goals. Overall, answering this question can help consumers make more personalized decisions and producers make more targeted cheeses.
First, we created a boxplot to show the distribution of fat content across the different milk types. This gives a broad overview of how fat content varies with milk type and lets us look for trends, such as whether certain milk types correlate with higher or lower fat content.
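Because `fat_content` is stored as text (a percent or a range of percents), it has to be converted to a number before it can be plotted. The sketch below shows one possible conversion. The helper `parse_fat()` is hypothetical, taking the midpoint of a range is an assumption rather than the rule used in this report, and the toy values are invented.

```r
# Hypothetical helper: turn strings such as "45%" or "40-46%" into numbers.
# Assumption: a range is represented by the midpoint of its endpoints.
parse_fat <- function(x) {
  x <- ifelse(is.na(x), "", x)
  # Extract every integer or decimal number from each string.
  nums <- regmatches(x, gregexpr("[0-9]+\\.?[0-9]*", x))
  # One number stays as is; several numbers are averaged; none gives NA.
  vapply(nums, function(v) {
    if (length(v) == 0) NA_real_ else mean(as.numeric(v))
  }, numeric(1))
}

# Toy data; values are invented for illustration.
toy <- data.frame(
  milk        = c("cow", "cow", "goat", "sheep", "goat"),
  fat_content = c("45%", "40-46%", "25%", NA, "30-40%"),
  stringsAsFactors = FALSE
)
toy$fat <- parse_fat(toy$fat_content)

# Median fat per milk type, the value marked by the line inside each box.
medians <- tapply(toy$fat, toy$milk, median, na.rm = TRUE)
print(medians)

# Draw the boxplot only in an interactive session.
if (interactive()) {
  boxplot(fat ~ milk, data = toy, ylab = "Fat content (%)")
}
```

Averaging the endpoints keeps one value per cheese, at the cost of hiding how wide the reported range was.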
In the above boxplot, each box spans the middle 50% of fat values for cheeses made from that milk, and the line inside each box marks the median. The x-axis shows the milk types and milk combinations used to make the cheeses in the dataset, and the y-axis shows the fat content. The gray dots represent individual cheeses, which helps visualize the spread of the data beyond the boxes alone. From the boxes and dots, we can see that cheeses made from cow milk are the most common and also the most variable, although goat cheese is also very variable. Most milk types have a median fat content around 45%. The exceptions include “Buffalo, Cow” cheese, which has the highest fat content, and “Moose” cheese, which has the lowest; “Camel” cheese is also very low. A surprisingly low-fat group is “Cow, Goat” cheese, with a median fat content around 25%, even though cheeses made from goat milk alone or cow milk alone have medians around 45%. “Water Buffalo” cheese also has a relatively low median fat content. From the boxplots, we conclude that most fat contents hover around 45%, with a few outliers. Furthermore, among milk types with more than one data point, the only pair that differs clearly is “Sheep” and “Water Buffalo”: “Sheep” cheese has a noticeably higher fat content than “Water Buffalo” cheese, although “Sheep” cheese itself still sits near the common 45%.
Now that we have a broad understanding of the distribution of fat content among milk types, we would like to see the milk types ranked by their average fat content. This ranking gives more insight into whether certain milk types have higher fat content than others. To do so, we created a heatmap of average fat content by milk type.
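The ranking can be sketched as a group mean followed by a sort. The numbers below are invented and only illustrate the mechanics. The group size `n` is kept alongside each mean because an average based on a single cheese deserves less weight.

```r
# Toy numeric fat values per cheese; invented for illustration.
fat_by_milk <- data.frame(
  milk = c("cow", "cow", "sheep", "sheep",
           "water buffalo", "water buffalo", "moose"),
  fat  = c(45, 47, 50, 48, 30, 34, 12),
  stringsAsFactors = FALSE
)

# Average fat content for each milk type.
avg <- aggregate(fat ~ milk, data = fat_by_milk, FUN = mean)

# Order from highest to lowest average fat content.
avg <- avg[order(avg$fat, decreasing = TRUE), ]

# Attach the number of cheeses behind each average.
group_sizes <- table(fat_by_milk$milk)
avg$n <- as.vector(group_sizes[as.character(avg$milk)])

print(avg)
```

In this toy example the lowest average belongs to a group with a single cheese, which mirrors the caution needed when reading averages for rare milk types.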
On the y-axis we have the milk types, ordered by their average fat content. The shade indicates the average fat content: the darker the shade, the higher the fat content. This graph makes it easier to see which milk types have higher fat contents. The darkest shade belongs to “Buffalo, Cow” cheese, which has the highest average fat content, as we also saw earlier. However, “Buffalo, Cow” cheese has only one data point. “Cow, Sheep” cheese has the next highest average fat content. From then on, the shades are relatively similar and become only slightly lighter until the “Camel” and “Moose” cheeses, which are visibly much lighter because they have the lowest average fat contents. Once again, however, “Camel” and “Moose” each have only one data point. Among milk types with more than one data point, “Water Buffalo” cheese has the lowest average fat content.
In conclusion, most milk types produce cheeses with a median fat content of around 45%, and few milk types differ much from the others. Still, some milk types are clearly lower or higher in fat content. For example, “Sheep” cheese is noticeably higher in fat content than “Water Buffalo” cheese, which is one of the lower-fat cheeses. “Cow, Goat” cheese is also relatively low in fat content.
Moving forward, more data on the more niche cheeses, such as “Camel”, “Moose”, and “Buffalo, Cow”, would support stronger conclusions. Knowing the proportion of each milk used in the blended milk types might also explain why certain blends are fattier than others.
Do Certain Countries Have a Propensity Towards Producing Vegetarian Cheeses?
The ingredients used in manufacturing a cheese determine whether it is vegetarian. The five countries with the most unique cheeses in our dataset are Canada, France, Italy, the United Kingdom, and the United States. Determining whether these countries differ in the share of their unique cheeses that are vegetarian can provide insight into a country's eating habits and priorities. This information can help consumers determine which country's cheeses best fit their dietary habits and can help manufacturers determine which country best matches their target demographic. Overall, answering this question can provide insight into the cheese-making habits of different countries and into how consumer demand varies among them, which is especially useful for manufacturers trying to understand their markets.
We begin by creating a stacked bar chart to see the different distributions of vegetarianism among the different countries.
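The shares shown in a stacked bar chart of this kind can be sketched in two steps: keep the countries with the most cheeses, then compute the vegetarian share within each country. The data below are invented, `top_n` is set to 3 only to keep the toy example small, and each cheese is assumed to carry a single country label.

```r
# Toy data; values are invented for illustration.
toy <- data.frame(
  country = c("France", "France", "France",
              "United States", "United States", "United States",
              "Italy", "Italy", "Chile"),
  vegetarian = c(FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep the countries with the most cheeses (3 here; the report uses the top 5).
top_n <- 3
country_counts <- sort(table(toy$country), decreasing = TRUE)
top_countries <- names(country_counts)[seq_len(top_n)]
top <- toy[toy$country %in% top_countries, ]

# margin = 1 divides each row by its total, so every country sums to 1.
veg_share <- prop.table(
  table(country = top$country, vegetarian = top$vegetarian),
  margin = 1
)

print(round(100 * veg_share, 1))
```

Each row of `veg_share` corresponds to one stacked bar, with the `FALSE` and `TRUE` columns giving the two segments.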
In the above stacked bar graph, the countries (Canada, France, Italy, United Kingdom, and United States) are shown on the x-axis, and the y-axis shows the proportion of cheeses that are vegetarian or not. Red indicates the proportion of non-vegetarian cheeses, and blue indicates the proportion of vegetarian cheeses. From the bar graph, we can see that the United Kingdom and the United States produce a higher proportion of vegetarian cheeses than the other countries, at around 80% and 73% respectively. France and Italy, on the other hand, produce very few vegetarian cheeses, at around 6% and 7% respectively. Canada falls in between, at around 30% vegetarian.
Visually, the countries all produce different proportions of vegetarian and non-vegetarian cheeses, but a chi-square test is needed to determine whether these differences are statistically significant.
##
## Pearson's Chi-squared test
##
## data: cheese_table
## X-squared = 209.39, df = 4, p-value < 2.2e-16
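A test of this form can be sketched with base R's `chisq.test()` applied to a country-by-vegetarian contingency table. The name `cheese_table` is taken from the output above, but the counts and the country labels "A", "B", and "C" below are invented, so this example will not reproduce the reported statistic.

```r
# Invented contingency table: rows are countries, columns are vegetarian status.
cheese_table <- matrix(
  c(30, 10,
    45,  5,
    10, 40),
  nrow = 3, byrow = TRUE,
  dimnames = list(country    = c("A", "B", "C"),
                  vegetarian = c("FALSE", "TRUE"))
)

# Pearson's chi-squared test of independence between country and status.
result <- chisq.test(cheese_table)

# Degrees of freedom are (rows - 1) * (columns - 1); here (3 - 1) * (2 - 1) = 2.
print(result$parameter)

# Expected counts under independence, useful for checking the test's assumptions.
print(round(result$expected, 1))

print(result$p.value)
```

With five countries and two status levels, the same formula gives (5 - 1) * (2 - 1) = 4 degrees of freedom, which matches the `df = 4` in the output above.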
Our chi-squared test returned a p-value < 2.2e-16, so we reject the null hypothesis that vegetarian status is independent of country. We have sufficient evidence to conclude that the countries produce different proportions of vegetarian and non-vegetarian cheeses. Next, we created a heatmap of counts to get a better idea of how vegetarian and non-vegetarian cheeses are distributed across the countries.
In the heatmap, whether or not the cheese is vegetarian is on the x-axis and the country is on the y-axis. The darker the shade, the higher the count in that cell. Vegetarian cheeses from the United States are the most common, and non-vegetarian cheeses from France are the next most common. Something we could not see earlier is that there are still more non-vegetarian cheeses from the United States than from the United Kingdom or Canada. There are also more non-vegetarian cheeses from the United States than there are vegetarian cheeses from Italy, France, or Canada. In short, the United States accounts for a large share of the cheeses in the dataset.
In conclusion, our two graphs and our test show that different countries do tend to produce different proportions of vegetarian and non-vegetarian cheeses. The United States and the United Kingdom produce mostly vegetarian cheeses, while France, Italy, and Canada produce mostly non-vegetarian cheeses.
This paper examines three main research questions about the relationships between the qualities of cheese: (1) Are some colors more frequently linked to particular textures? (2) What is the relationship between milk type and fat content? (3) Are vegetarian cheeses more likely to be produced in some nations than others? Both producers and consumers can benefit from knowing the answers to these questions. Consumers can use this information to inform their diet and purchase decisions based on texture preferences, nutritional objectives, or ethical considerations. Finding trends in texture, fat content, and cultural preferences enables manufacturers to create and market products more precisely. Taken as a whole, these analyses enhance our knowledge of how cheese properties vary with different factors and emphasize the interaction of manufacturing processes, consumer preferences, and culinary tradition.