Over the past few years, food and drink have become an essential part of our social media footprints. This shouldn’t come as a surprise – after all, eating and drinking were social activities long before the first #foodporn hashtag on Instagram. In fact, scientific studies have shown that what we gobble up or gulp down is shaped by social and regional influences, and that we tend to mirror the habits of people with whom we share social connections.
Nowadays, we have an unprecedented opportunity to study eating and drinking habits at scale, as people share more and more of them online, both on popular social networks like Instagram, Twitter, and Facebook and on “dedicated” apps like Yummly or Untappd.
Along these lines is our recent paper, “Of Wines and Reviews: Measuring and Modeling the Vivino Wine Social Network,” presented at the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2018) in Barcelona. The paper – co-authored by former UCL undergraduate student Neema Kotonya, Italian wine journalist Paolo De Cristofaro, and UCL faculty Emiliano De Cristofaro – is a preliminary study showcasing big-data and social network analysis of how users worldwide consume, rate, and review wines. We did so through the lens of Vivino, a popular wine social network. (And, yes, Paolo is Emiliano’s brother! 😊)
What is Vivino?
Vivino.com is an online community for wine enthusiasts, available both as a web and a mobile application. It was founded in 2009 by Heini Zachariassen, with his colleague Theis Sondergaard joining in 2010. In a nutshell, Vivino allows users to review and purchase wines through third-party vendors. The mobile app also provides a “wine scanner” functionality – i.e., users can upload pictures of wine labels and access reviews and details about the wine/winery.
But Vivino is actually a social network, as it allows wine enthusiasts to communicate with and follow each other, as well as share reviews and recommendations. As of September 2018, it had 32 million users, 9.7 million wines covering a multitude of wine styles, grapes, and geographical regions, as well as 103.7 million ratings and almost 35 million reviews.
Vivino reports that its 5-star rating system (with 0.5 increments) correlates well with Robert Parker’s 100-point scale, and that its users are able to produce a greater number of ratings than the seven most prolific wine experts: users rated 1.4 million wines between 2011 and 2015, while only 370 thousand wines received expert ratings.
The Data
Between November 2016 and March 2017, we collected data about wines, wineries, and users from the Vivino website. (Note that, to avoid generating an extensive amount of traffic or causing possible issues for the site operators, we only made one request every five seconds.)
- Wines: We collected data for 1 million wines, including name, type, vintage, price, ratings, region, winery, grape.
- Wineries: We got records for 8,300 wineries, including location, ratings, etc.
- Users: We gathered information from the top 10,000 most “active” users from each of the 15 countries with the highest level of wine consumption according to the OIV; for instance, we got the number of followers, taste profile, etc.
- Reviews: Finally, we retrieved the reviews posted by the “top” users, getting 772,000 reviews for 86,600 wines.
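The "one request every five seconds" policy mentioned above can be sketched in a few lines of Python. This is only an illustration, not the crawler used for the study: the `Throttle` class, the `crawl` helper, and the `fetch` callback are all hypothetical names introduced here.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


class Throttle:
    """Ensure at least `min_interval` seconds between consecutive calls."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        """Block until the minimum interval since the last call has elapsed."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    throttle: Throttle,
) -> Iterator[Tuple[str, str]]:
    """Fetch each URL in turn, pausing so requests stay spaced out."""
    for url in urls:
        throttle.wait()
        yield url, fetch(url)
```

Here `fetch` stands for any function that downloads a page and returns its content; with `Throttle(5.0)`, successive requests are at least five seconds apart. The clock and sleep functions are injectable only so that the pacing can be checked without actually waiting.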
What are the wines on Vivino like?
Our dataset includes 1 million wines from 49 distinct countries, but more than two thirds of them (68%) are actually from either France, Italy, Spain, or the US. Overall, 21 countries[a] account for 98% of the wines, so in the rest of the analysis we only focus on those countries. Here are some highlights:
- Ratings given by Vivino users are generally favorable; in fact, the most common ratings are 3 or 4 stars. Wines from all countries have a mean rating of at least 3.2;
- The highest average ratings are for wines from the US (3.84), Germany (3.79), and France (3.78);
- Lowest mean ratings are for Brazilian (3.36), Chilean (3.46) and Romanian (3.28) wines;
- Countries with the highest number of ratings per wine are Argentina (145.35), Chile (112.57), and Brazil (99.51), possibly due to a lower diversity of wines drunk by Vivino users;
- There is no evidence that ratings provided by Vivino users are influenced by the price of wines, i.e., more expensive wines don’t receive higher average ratings than less expensive wines;
- Vivino users are more likely to review wines from their home countries or neighboring countries; e.g., the highest numbers of ratings for Argentinian wines come from Argentina (173,049) and Brazil (232,271). This is not the case for three countries, China, Russia, and the United Kingdom: users from these countries are most likely to rate European wines.
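For readers who want to reproduce this kind of summary on their own data, here is a minimal sketch of the two computations behind the highlights above: per-country averages, and a Pearson correlation to check whether price and rating move together. The record layout and the sample values are made up for illustration, and averaging per-wine ratings without weighting is an assumption, not necessarily what the paper does.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical records: (country, average_rating, number_of_ratings, price)
wines = [
    ("France", 4.0, 120, 35.0),
    ("France", 3.6, 40, 12.0),
    ("Italy", 3.8, 80, 20.0),
    ("Italy", 3.4, 20, 25.0),
    ("Argentina", 3.5, 300, 10.0),
]


def per_country_stats(rows):
    """Return {country: (mean rating, mean number of ratings per wine)}."""
    ratings = defaultdict(list)
    counts = defaultdict(list)
    for country, rating, n_ratings, _price in rows:
        ratings[country].append(rating)
        counts[country].append(n_ratings)
    return {
        country: (
            sum(ratings[country]) / len(ratings[country]),
            sum(counts[country]) / len(counts[country]),
        )
        for country in ratings
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


stats = per_country_stats(wines)
price_vs_rating = pearson([w[3] for w in wines], [w[1] for w in wines])
```

A coefficient close to zero on real data would be consistent with the observation that pricier wines do not get higher ratings; a value near 1 or -1 would suggest a strong linear relationship.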
Digging a bit deeper?
In order to provide visualizations of rating and vintage statistics, beyond simple averages, we use so-called “Cumulative Distribution Functions”, or CDFs. If you are interested in seeing a bunch of these, please take a look at our paper. Here is an example of an interesting CDF:
In this CDF plot, the x-axis represents vintages (or years) of production, and the y-axis reports the fraction of wines produced before or in that year. Note that we have several lines, in different colors, separating wines based on their average rating.
This allows us to quickly notice that very highly rated wines tend to be “older.” More specifically, 60% (0.6) of wines with ratings above 4 (blue line) were produced in or before 2009, while for wines with ratings between 2 and 3 (yellow line) or between 3 and 4 (red line), the same fraction (0.6) is only reached around 2012. Curiously, however, poorly rated wines (average rating below 2, green line) also tend to be “older,” possibly because users were disappointed or because the wines had simply gone bad over time.
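If you have never built a CDF before, the idea fits in a few lines. The sketch below groups wines into rating bands and returns, for each band, a function giving the fraction of wines produced in or before a given year. The band boundaries (for example, whether a rating of exactly 4 counts as "4 and above") and all function names are assumptions made for this illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def rating_band(avg_rating):
    """Assign an average rating to one of the four bands used in the plot."""
    if avg_rating < 2:
        return "below 2"
    if avg_rating < 3:
        return "2 to 3"
    if avg_rating < 4:
        return "3 to 4"
    return "4 and above"


def ecdf(values):
    """Return F, where F(x) is the fraction of `values` that are <= x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("cannot build a CDF from no data")

    def cdf(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(ordered, x) / n

    return cdf


def vintage_cdfs(wines):
    """wines: iterable of (vintage_year, average_rating) pairs."""
    by_band = defaultdict(list)
    for vintage, rating in wines:
        by_band[rating_band(rating)].append(vintage)
    return {band: ecdf(vintages) for band, vintages in by_band.items()}
```

Calling `vintage_cdfs(data)["4 and above"](2009)` then gives the y-value of the corresponding line at x = 2009, which is how a statement like "60% were produced in or before 2009" is read off the plot.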
Some interesting trends
Expert wine reviews have at times been criticized for a perceived inflation, i.e., that the preferences of suppliers for highly rated wines influence the reviews produced by experts. Naturally, the higher the rating of a wine, the greater the demand, which, according to market forces, also drives up the price. This trend has also somewhat manifested itself in the fast-growing number of wines receiving a perfect score on the Robert Parker scale, which went up from 17 in 2004 to 103 in 2013.
By contrast, we find that – regardless of type, country of origin, varietal, or blend – older wines are preferred to newer wines (i.e., those produced between 1960 and 2000 vs. those produced after 2000). Across all wine variables (type, country, varietal, and blend), average ratings decline between 2000 and 2010, but begin to increase again for 2016 vintages. Grapes (both varietals and blends) and countries also show a similar ratings decline over the years.
Our analysis also shows that wine prices vary substantially across wine variables, in particular vintage: as expected, older wines are more expensive than newer ones. Similarly, wines from specific countries, France in particular but also the United States, are more expensive.
Language on Vivino
Next, we examined the biographies and reviews of users. User biographies are tag-lines and short snippets of text, which appear on the profile page. Reviews vary from single-word comments, expressing either approval or disapproval, to lengthier descriptive texts, which outline a user’s experiences of a wine. These usually detail the taste and appearance of the beverage, suggest dishes which complement the wine, and from time to time describe occasions where the wine could be served.
Only about 16,000 out of 137,500 users have biographies, but we do find some interesting things: 683 biographies contain the word sommelier; 434 biographies contain the abbreviations WSET, N2, N3, N4 or N5 (names of professional wine tasting qualifications); 120 contain the word expert; and 135 contain the word professional. Conversely, 300 users describe themselves as amateur, 159 as learning, and 9 as a learner. Overall, significantly more Vivino biographies contain keywords implying that users are wine experts than keywords implying that they are novices.
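Counts like these come from a simple keyword search over the biography text. Below is a minimal sketch of how such a count could be done; matching whole words case-insensitively is an assumption made here (the exact matching rules used in the study are not described in this post), and the function name is hypothetical.

```python
import re
from collections import Counter

# Keywords taken from the paragraph above, lowercased for matching.
KEYWORDS = [
    "sommelier", "wset", "n2", "n3", "n4", "n5", "expert", "professional",
    "amateur", "learning", "learner",
]


def keyword_counts(bios, keywords=KEYWORDS):
    """Count how many biographies contain each keyword as a whole word."""
    counts = Counter()
    for bio in bios:
        # Lowercase and split into alphanumeric tokens; using a set means
        # each biography is counted at most once per keyword.
        tokens = set(re.findall(r"[a-z0-9]+", bio.lower()))
        for keyword in keywords:
            if keyword in tokens:
                counts[keyword] += 1
    return counts
```

Because a `Counter` returns 0 for missing keys, `keyword_counts(bios)["expert"]` is safe to read even when no biography mentions the word. Note that whole-word matching will miss variants such as plurals or translations, so real counts depend on how such cases are handled.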
Looking forward
Our paper also shows that, given enough historical data obtained from Vivino, we can actually predict the average rating of a wine based on region, country, wine type, regional style, food pairings, and grapes. This can be done using standard machine learning algorithms. Similarly, we can also predict the taste profile of a Vivino user. Please see the paper and/or contact us if you’d like to know more about these experiments.
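To give a flavor of what predicting a rating from categorical attributes looks like, here is a deliberately simple sketch using k-nearest-neighbors regression. This is a stand-in chosen for brevity, not necessarily one of the algorithms evaluated in the paper; the feature names, the distance function, and the sample data are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple

Wine = Dict[str, str]

# Hypothetical feature names; a subset of the attributes listed above.
FEATURES = ("region", "country", "wine_type", "grape")


def mismatch_distance(a: Wine, b: Wine, features: Sequence[str] = FEATURES) -> int:
    """Number of features on which two wines differ (Hamming distance)."""
    return sum(a.get(f) != b.get(f) for f in features)


def knn_predict(train: List[Tuple[Wine, float]], query: Wine, k: int = 3) -> float:
    """Predict a rating as the mean rating of the k most similar wines."""
    if not train:
        raise ValueError("training data must not be empty")
    # sorted() is stable, so ties keep their order in the training data.
    ranked = sorted(train, key=lambda item: mismatch_distance(item[0], query))
    nearest = ranked[:k]
    return sum(rating for _, rating in nearest) / len(nearest)


# Made-up training examples: (wine attributes, average rating)
train = [
    ({"region": "Barolo", "country": "Italy", "wine_type": "red", "grape": "Nebbiolo"}, 4.2),
    ({"region": "Barolo", "country": "Italy", "wine_type": "red", "grape": "Nebbiolo"}, 4.0),
    ({"region": "Mendoza", "country": "Argentina", "wine_type": "red", "grape": "Malbec"}, 3.6),
    ({"region": "Rioja", "country": "Spain", "wine_type": "red", "grape": "Tempranillo"}, 3.7),
]
query = {"region": "Barolo", "country": "Italy", "wine_type": "red", "grape": "Nebbiolo"}
predicted = knn_predict(train, query, k=2)
```

In practice one would hold out part of the data to measure prediction error, and multi-valued attributes such as food pairings would need a different encoding than the single-value comparison used here.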
Overall, our results demonstrate that there is consistency across the ratings given by Vivino users. We also show that wine ratings are not random; rather, the ratings assigned to wines by users are based on informed and considered decisions. Therefore, we believe that the analysis of factors that influence wine ratings and the development of models for predicting wine reviews can help the understanding of how specialist social media platforms influence and shape our eating and drinking habits, and how we can minimize the subjectivity of online food and drink ratings.
Nonetheless, ours was just a first step, showcasing how data science can unlock interesting insights from large social network datasets, this time in the context of wine. In the future, we hope to encourage further studies providing a finer-grained understanding of wine characteristics, as well as how their geographical and historical evolution relates to consumption, rating, and price.
[a] Argentina, Australia, Brazil, Canada, China, France, Germany, Italy, Portugal, Romania, Russia, South Africa, Spain, United Kingdom, and United States. ↩
Is the Vivino dataset that you created available on Github or anywhere else? Sounds like interesting data to practice on!
Hi Emiliano, thanks for this lovely article.
Would you have the csv for the extracted dataset from vivino?
I’m studying data science with python and I’d like to practice with vivino data.
Best regards,
Rik