Unveiling Human Behavior with Data

"Predicting and Measuring Human Behaviour", Suzy Moat, Assistant Prof at Warwick Business School

Summary

In "Predicting and Measuring Human Behaviour," Suzy Moat explores how data from everyday digital activities can provide insights into current and future human behaviors. By analyzing internet data from sources like Google searches and Wikipedia page views, Moat and her team at Warwick Business School's Data Science Lab uncover patterns linked to economic status and future orientation. They develop innovative methods to predict stock market trends and map tourist movements using geotagged photos on Flickr. Moat also highlights a fascinating correlation between the beauty of English landscapes, as rated through a crowdsourcing project, and the self-reported health of locals, suggesting aesthetic environments may contribute to well-being. This research exemplifies the potential of data science in anticipating human actions and enhancing social insights.

Highlights

Suzy Moat explains how data from digital activities provides insights into human behavior 📱.
Google searches for future versus past years correlate with a country's per capita GDP 🌐.
Changes in Wikipedia page views could potentially signal upcoming stock market falls 📉.
Using Flickr photos, researchers can map tourist patterns without expensive surveys 🗺️.
A crowdsourced game reveals a link between scenic beauty and health in England 🌄.

Key Takeaways

Big data from the internet is transforming how we measure and predict human behavior 📊.
Economic success might be linked to future-oriented Google searches 🌎.
Wikipedia page views can signal upcoming stock market changes 📉.
Geotagged Flickr photos help map tourist movements efficiently 📸.
Scenic beauty correlates with better health in English residents 🌳.

Overview

Suzy Moat from Warwick Business School delves into the fusion of computer science and psychology, utilizing data generated from everyday activities to explore human behavior patterns. Moat's unique background allows her to pinpoint meaningful trends from vast data sources such as Google and Wikipedia, shedding light on societal trends and future behaviors.

One finding in Moat's research is a link between a country's economic status and how often its internet users search for the coming year rather than the previous one. In a separate study, increases in Wikipedia page views for financial topics tended to be followed by stock market falls, which suggests that everyday online data may be useful for economic forecasting.
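
As a rough illustration of the future orientation index described in the talk, the index for a country is the number of searches for the next year divided by the number of searches for the previous year. The Python sketch below is a minimal example under stated assumptions: the function name, the country labels, and the search volumes are hypothetical and made up for illustration. They are not data or code from the study.

```python
def future_orientation_index(next_year_searches, previous_year_searches):
    """Ratio of searches for the next year to searches for the previous year.

    A value above 1 means more searches for the next year than for the
    previous year. The function name is an assumption for this sketch.
    """
    if previous_year_searches <= 0:
        raise ValueError("previous-year search volume must be positive")
    return next_year_searches / previous_year_searches


# Hypothetical search volumes, invented for illustration only.
example_volumes = {
    "Country A": {"next_year": 120.0, "previous_year": 100.0},
    "Country B": {"next_year": 60.0, "previous_year": 100.0},
}

indices = {
    country: future_orientation_index(v["next_year"], v["previous_year"])
    for country, v in example_volumes.items()
}

for country, index in indices.items():
    print(country, round(index, 2))
```

In the talk, an index like this is then compared with each country's per capita GDP. The sketch stops at the ratio itself.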

Taking a creative leap, Moat's team analyzes geotagged photos on Flickr to estimate where visitors to the UK come from, offering a low-cost alternative to traditional surveys. Furthermore, using crowdsourced ratings of England's scenic locations, they find a link between environmental beauty and residents' self-reported health, a result that could inform public policy.

Chapters

00:00 - 00:30: Introduction The chapter introduces Suzy Moat, an associate professor of Behavioral Science at Warwick Business School. She describes her background in computer science and psychology, a combination that people told her for many years was strange but that looks very different now that everyday activities increasingly generate data.
00:30 - 01:30: The Changing Landscape of Data The chapter discusses how almost every aspect of our lives now generates data. Whether it's calling a friend on the phone, making purchases with a credit or reward card, or using public transport with a smart card, everyday activities contribute to data production. Moat co-directs the Data Science Lab at Warwick Business School, which focuses on data from the internet.
01:30 - 02:30: Using Internet Data to Measure and Predict Human Behavior This chapter discusses the potential of using internet data, such as search queries from Google, Wikipedia page views, and photo uploads on platforms like Flickr, as a tool to measure and predict human behavior. It explores how this data can provide insights into current activities and potentially forecast future actions.
02:30 - 04:30: Google Search Data and Future Orientation Index Google search data spans the entire world, but comparing countries is difficult because people search in different languages. Moat and her colleagues used the year in Arabic numerals as a language-independent search term: with 2010 data for 45 countries that each had at least 5 million internet users, they compared searches for the next year (2011) with searches for the previous year (2009). The ratio of the two, the future orientation index, tended to be higher in countries with a higher per capita GDP.
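04:30 - 11:30: Wikipedia Data and Stock Market Analysis The chapter describes a hypothetical trading strategy on the Dow Jones Industrial Average driven by weekly Wikipedia page views. Increases in views of pages about the 30 Dow Jones companies, and of 285 pages on general economic concepts, tended to be followed by stock market falls. Trading on page edits, or on views of unrelated pages about actors and filmmakers, performed no better than trading randomly.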
11:30 - 14:00: Flickr Data and Tourism Analysis The chapter discusses using geotagged Flickr photos to estimate the countries that visitors to the UK come from. For 28 countries, with data from 2008 to 2013, the Flickr-based estimates corresponded closely to the survey-based estimates of the Office for National Statistics.
14:00 - 19:00: Scenic Beauty and Health Correlation Study The chapter explores the relationship between scenic beauty and health in England. Crowdsourced ratings from the Scenic or Not website are compared with self-reported health from the census. People who live in more scenic locations report themselves to be healthier, and the relationship holds in urban, suburban, and rural areas, after accounting for income and deprivation, and when satellite measurements of green space are included in the models.
19:00 - 20:00: Conclusion and Summary Moat recaps the four examples: future-oriented searches and per capita GDP, online searches for financial information and subsequent stock market falls, Flickr photos and how people move around, and scenic ratings and self-reported health. She concludes that data from the internet might help measure, and possibly even predict, human behavior.
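
The hypothetical trading strategy described in the talk (transcription, 06:30 - 07:30) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The function name, the input layout (one view count and one start-of-week price per week), and the use of cumulative log returns are assumptions made for this sketch, and the example numbers are invented.

```python
import math


def wikipedia_view_strategy(views, prices, window=3):
    """Cumulative log return of a hypothetical page-view trading rule.

    views[t]  : page views in week t
    prices[t] : index price at the start of week t
    window    : number of previous weeks averaged as the baseline

    Rule, as described in the talk: if views in week t are below the
    average of the previous `window` weeks, buy at the start of week t+1
    and sell a week later; if they are above, sell at the start of week
    t+1 and buy back a week later. Measuring performance with log
    returns is an assumption of this sketch.
    """
    if len(views) != len(prices):
        raise ValueError("views and prices must have the same length")
    total = 0.0
    for t in range(window, len(views) - 2):
        baseline = sum(views[t - window:t]) / window
        change = views[t] - baseline
        weekly = math.log(prices[t + 2]) - math.log(prices[t + 1])
        if change < 0:
            total += weekly   # long position: gains if the index rises
        elif change > 0:
            total -= weekly   # short position: gains if the index falls
    return total


# Invented example: views rise in week 3, so the rule goes short for the
# following week; the index then falls, so the return is positive.
example_views = [10, 10, 10, 20, 5, 5]
example_prices = [100.0, 100.0, 100.0, 100.0, 100.0, 90.0]
example_return = wikipedia_view_strategy(example_views, example_prices)
print(example_return)
```

In the talk, the returns from a rule like this are compared with those from trading randomly each week, with a 50% chance of buying or selling. That comparison is not part of this sketch.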

"Predicting and Measuring Human Behaviour", Suzy Moat, Assistant Prof at Warwick Business School Transcription

00:00 - 00:30 so thank you very much for the invitation to be here my name is Suzy Moat I'm an associate professor of Behavioral Science at Warwick Business School now my background is a mixture of computer science and psychology and for many years people told me this was a really strange combination but I don't have to tell all of you how much things have really
00:30 - 01:00 changed in the past few years increasingly everything we're doing is generating data from calling your friend on your phone for a chat buying some bread in the supermarket with your credit card or reward card or perhaps taking a ride on the tube with your Oyster card now at Warwick Business School I have the great pleasure of co-directing the Data Science Lab and in our lab what we're interested in is data from the
01:00 - 01:30 internet so data on what people are looking for on Google data on what pages people are looking at on Wikipedia or data on photos that people are uploading to services such as Flickr for example and what we want to know is whether we can use this data to measure what people are doing in the world right now and possibly even to predict what they're going to do in the
01:30 - 02:00 future so let me give you some examples we'll start with data from Google so one of the things we were really fascinated about was the breadth of this data it spans the entire world and never before have we had an opportunity to measure what information people all around the world are looking
02:00 - 02:30 for the thing is there's a bit of a challenge in using this data to compare what people in different countries are looking for because people tend to search in different languages now one day my colleagues Tobias Preis Gene Stanley Steven Bishop and I had a moment of inspiration we realized there was one thing which was universal between languages and that's the year in
02:30 - 03:00 Arabic numerals so 2015 2016 2014 for example so what we did is we took data from 2010 for 45 countries where we knew that they had at least 5 million internet users each and we measured how often they were searching for the next year
03:00 - 03:30 2011 and how often they were searching for the previous year 2009 so on this map what you can see is that countries which are colored in blue were looking more for the next year 2011 whereas countries that are colored in red were searching more for the previous year 2009 now if you look at this map you
03:30 - 04:00 might recognize a pattern you can see that some of the countries which are colored in blue such as Germany and Switzerland are countries where we know that on a global scale their citizens are reasonably well off whereas in comparison some of the countries which are colored in red such as India are countries where we know that again on a global scale their citizens are not as well off
04:00 - 04:30 so to investigate this pattern in more detail we created what we called a future orientation index and for each of these countries we divided the number of searches we saw for the next year 2011 by the number of searches we saw for the previous year 2009 and we compared this to the per capita GDP for each of these
04:30 - 05:00 countries and what we saw was that internet users from countries with a higher per capita GDP tended to be searching for more information about the future now there's a number of reasons this might be the case it might be that greater economic success allows you to focus on the future or perhaps that a greater Focus on the future might support economic
05:00 - 05:30 success or another interpretation might just be that what we're measuring here is that internet users in countries with a higher per capita GDP are increasingly reliant on data on the internet to help them make decisions about what they're going to do in the future now this last point is quite important we know that generally when people are looking for information
05:30 - 06:00 online it's possible that they're looking for that information to try and help them make decisions about what they're going to do in the future and there's a number of areas where it would be useful for us to be able to anticipate what people were going to do in the future and one of those is the stock markets now I don't have to tell all of you that the big crash in stock market prices in 2008 had repercussions for people a long way beyond the traders who
06:00 - 06:30 were trading in those markets and so what my collaborators Tobias Preis Gene Stanley and I wondered was whether we might be able to find a link between changes in patterns of people looking for financial information online and subsequent stock market moves so to test this idea we created a hypothetical trading strategy trading on the Dow Jones Industrial Average and
06:30 - 07:00 this strategy worked like this we used data on how often people had been looking at Wikipedia pages so consider a Wikipedia page such as the page about the Bank of America we'd look at how often people had looked at that page in week t and we'd compare it to the number of times that they'd looked at that page in the previous 3 weeks
07:00 - 07:30 if we found that people had looked less at that page than they had on average in the previous few weeks then we'd buy the Dow Jones at the beginning of the next week and sell it back a week later if on the other hand we saw they'd been looking more at that page in this week than in the previous few weeks then we'd sell the Dow Jones at the beginning of the next week and buy it back a week later now we didn't just do this
07:30 - 08:00 with the page about Bank of America we did it for pages about all 30 companies which were listed in the Dow Jones Industrial Average so companies such as JP Morgan or IBM for example and using data we had from the end of 2007 to early 2012 when we carried this analysis out we tried out our trading strategy and what we found was that indeed
08:00 - 08:30 increases in people looking at those Dow Jones company pages tended to be followed by stock market falls now at the time we were very lucky we were working with an excellent PhD student who I'm very happy to say is now Dr Chester Curme and Chester said well we don't just have information on how often people have been looking at those pages we know how often they've been editing them as well and so Chester went off and collected all of that information we put that into the trading strategy as
08:30 - 09:00 well and what we found was that actually if you traded on that information the returns that you'd see were statistically no different to the returns that you'd get just trading randomly each week so either deciding to buy or to sell having a 50% chance of either and making a decision that didn't depend on any decisions that you'd made in the previous few weeks so we're excited to see this work for 30 pages but we did wonder you know
09:00 - 09:30 might it work for a bigger set of pages and so we went off and tested the trading strategy using data from 285 pages that Wikipedia users had labeled as being about general economic concepts so the page on capital the page on wealth the page on macroeconomics for example and what we saw was this again increases in people
09:30 - 10:00 looking at those pages tended to be followed by stock market Falls whereas there was no relationship for data on the edits now there's a number of reasons that this lack of signal for the edits might be there if you think about how you and I use Wikipedia it won't surprise you to know we've got an awful lot more data on people looking at these Pages than we have on people editing these pages and indeed in this case we had so few records of how often people have
10:00 - 10:30 been editing these pages that sometimes the strategy was unable to move alternatively it could of course be the case that there's a complete disconnect between the group of people who have been editing these pages and the group of people who have any influence on stock market moves so it had worked for two sets of pages about financial topics but we wondered would it work for just any old Wikipedia Pages because that wouldn't make an awful lot of sense so we tried
10:30 - 11:00 to trick the trading algorithm by putting in data on pages which weren't financially related at all pages about actors and filmmakers so George Clooney and Sandra Bullock for example and what we found was this absolutely nothing so trading on that information would lead indeed just to what you would have found by trading randomly so we found that increases in
11:00 - 11:30 people searching for financial information on Wikipedia could serve as early warning signs for subsequent changes in stock market prices and the stock markets aren't the only place where we've looked at using online data to anticipate people's behavior in the future using data from Google Wikipedia Google Analytics we've seen that this data can provide insight into future behavior across a range of areas for example helping us estimate
11:30 - 12:00 where songs are going to chart or indeed help us work out how many students we can expect to arrive at our University the next year these are all examples using data on text that people have been looking for online and we know that increasingly online information isn't just textual there's lots of photos there too so consider this picture this is a map we created simply by plotting the location of 32 million
12:00 - 12:30 photos that Flickr users took and uploaded in 2012 we were looking at this map and we thought you know this gives us a lot of information about where those Flickr users are and potentially if they're taking photos on holiday or as they travel around where they're going to as well and this was of interest to us because we know that the government invests a lot of money into trying to work exactly this sort of thing out so if you've flown often
12:30 - 13:00 enough into a British airport you might at some point have been grabbed by an official with a clipboard who wanted to know where you'd spent the last 12 months and this is currently how the British Office for National Statistics tries to estimate the origin of tourists to the UK so we were looking at this map and we wondered could we not just look at where people had taken photos in the UK and
13:00 - 13:30 looking at photos that those people had previously taken estimate the country in which they'd spent the previous 12 months so using data from 28 countries where the ONS had also collected data from 2008 to 2013 that's exactly what we did we created these Flickr estimates and we compared them to the ONS's estimates and we found that indeed there was an excellent
13:30 - 14:00 correspondence interestingly if you dig even further into the data you see that the ONS has spoken to 40,000 people per year with clipboards whereas we've managed to find 15,000 people a year for free just looking at this Flickr data so this was interesting but it was only using metadata relating to the photos you know leaving out the most interesting part the photos themselves so consider this picture this is a picture of the Lake District in the
14:00 - 14:30 UK now I love the Lake District I grew up near the Lake District and I used to really like spending time there because it was beautiful and it would make me feel better recently we were lucky to start working with a new PhD student Chanuki Seresinhe and before Chanuki came to work with us she used to run a design agency and so she's very interested in how things look as well and so we wondered if we might not be able to use the increasing volume of
14:30 - 15:00 geotagged photos online to try and quantify the relationship between what a place looks like and how healthy its inhabitants feel now measuring the health of English inhabitants is thankfully quite straightforward because we ask them how healthy they consider themselves to be every 10 years as part of the census and if you map that data then this is
15:00 - 15:30 what it looks like so you can see that lighter areas are areas where people are less healthy and darker areas are places where people have reported themselves to be more healthy now we normalize for things like age you can imagine that people who are older sadly tend to say that they're less healthy so if you look at this map then you can see that there's areas like London where people aren't reporting themselves to be as healthy as they might do further away
15:30 - 16:00 from the cities so that was straightforward but we had this question how do we get large scale measurements of how beautiful different places are and this has been the barrier for policy makers and others who've been interested in this question to date and thankfully the answer came to us in the shape of a project that mySociety set up in 2009 now mySociety is an excellent organization in the UK and they were
16:00 - 16:30 setting this up for completely different reasons they wanted to help people choose places to live what they did is they created a website called Scenic or Not where they'd show people pictures and you could rate them as scenic or not so if you thought it was scenic you could give it 10 if you didn't think it was very scenic at all you could give it one and they showed people pictures from all over England and so if we plot that data then this is what we get so
16:30 - 17:00 that dark area up in the north is the Lake District that's basically everybody agreeing with me that it's gorgeous whereas down in the Southeast sadly London's not doing so well and if you compare this data to the ratings of Health then you see that indeed people who live in more Scenic locations report themselves to be healthier now the first thing you might wonder is well I've been pointing out the differences with the cities is this just about differences between cities and countries and Suburban areas we wondered that too so we tried
17:00 - 17:30 splitting England up into cities suburban areas and rural areas and we saw that this relationship didn't just hold across England as a whole but in all three different kinds of areas too you might also wonder is this just a question of income perhaps richer people choose to live in more beautiful areas and they're also as a result of their wealth healthier now we have data on income and other measures of deprivation too so we
17:30 - 18:00 put those into the model as well and that's not enough to explain this relationship a final thing you might wonder looking at this picture is whether scenic isn't just about areas being very green now we've been able to measure green space for a long time from satellite pictures you can take a satellite image and estimate how green different parts of the country are and if you do that and you map it out you get something which looks like this if you compare that to the scenicness map you can see there are similarities for
18:00 - 18:30 example the Lake District up in the north is both very green and very beautiful but towards the east of England you'll see an example of differences so we'll see that that's rated as quite green but not necessarily very scenic and so we tried putting both of these types of data into our statistical models of how healthy people were reporting themselves to be and we found that there was very little evidence whatsoever
18:30 - 19:00 that we could ignore these subjective ratings of how beautiful places were across England as a whole and in urban suburban and rural areas the statistical evidence suggests that models that consider these subjective ratings of the beauty of places allow us to build much better estimates of how healthy people report themselves to be so I've given you a few examples
19:00 - 19:30 we've seen that internet users from countries with a higher per capita GDP tend to look for more information about the future increases in people looking for financial information online have been followed historically by stock market falls we've seen that data on photos that people post to the internet can help us estimate how people move around something which can otherwise be quite expensive and we've also seen evidence that data from a
19:30 - 20:00 crowdsourced game on how beautiful different parts of England are can give us insight into how healthy people report themselves to be these are just a few examples of the results that we've been finding in our Data Science Lab at Warwick Business School that data from the internet might help us measure and possibly even predict human behavior thank you very much [Applause]
20:00 - 20:30 thank you