Dr. Steve Pickering, Brunel University London, created as part of the CROP-IT project.
Japanese prime minister Shinzo Abe recently caused a sensation by appearing at the closing ceremony of the Rio Olympics after burrowing through the earth from Tokyo in the guise of the Nintendo character Super Mario. This was a remarkable expression of soft power which clearly draws on lessons from previous Olympic Games (and on which much will surely be written). At the CROP-IT project, funded by the Japan Society for the Promotion of Science and led by Atsushi Tago, we're very interested in soft power projection, and a very quick way of measuring the immediate spread of this soft power is to look at Twitter: the term "abe mario" was very quickly trending in people's tweets.
A full analysis of people's reactions to Abe's appearance would require content analysis across multiple languages: this is ongoing. However, as a quick indicator of the spread and distribution of reaction, we can use Twitter to see how many people tweeted the term "abe mario" during or after the closing ceremony, and where they are in the world. In so doing, not only will we find out more about Japanese soft power, but also about the workings of Twitter: what we can do with it, and what limitations it has.
Back in 2010, the US Library of Congress announced that it was going to archive every tweet ever sent. While a laudable aim, at the time of writing, this has yet to happen, because the Library massively underestimated the scale of such a project. Twitter themselves have for several years used "Taipei 101"s as the unit of measure for the number of tweets sent in a year (if you printed all the tweets on pieces of office paper and stacked them on top of each other, how many Taipei 101s would you reach?) and this growth has been exponential. But this is just the tweets themselves: to make a functioning, searchable archive, the Library would need all of the metadata associated with each tweet as well. Put simply, if Twitter themselves have not been able to make such an archive available, what chance the Library of Congress?
As such, we should be grateful that Twitter are giving us the data they can (unlike some other social media platforms), but recognise that what we are getting is only a sample from a larger population. For political scientists, though, this shouldn't be a problem: we are used to dealing with samples, and have developed elaborate (though not always successful) means of weighting our samples to make them more representative of the wider population.
"Hey, Twitter, send me all of the tweets in the world with the term 'abe mario' in them for the past seven days." (Remember, though, that they won't send all of the tweets in the world: just a sample based on whatever secret sauce runs their algorithm.) We ask for seven days, as that's about as much as we can get: this is another restriction that Twitter impose to save their servers.
Twitter will then send us these tweets, and some of them will be geocoded. The proportion of tweets that are geocoded, though, varies massively from country to country, and from search term to search term; sometimes it can be less than one in a thousand. As such, unless the search term is incredibly popular, and the search is only concerned with one country, we should avoid using the "fishing" approach.
"Hey, Twitter, send me all of the tweets with the term 'abe mario' in them within a radius of x kilometres of latitude y and longitude z." Again, Twitter will send us a sample (also using their secret sauce) of the tweets. These tweets will be based on inferred location, using factors such as the user's IP address, and as such are rather imprecise. Also, as running great circle calculations for each search would be computationally expensive, the search must be based on something simpler (such as a grid).
We can send all of these coordinates to Twitter and ask it how many tweets were received within a 25km radius of each of the capital cities in the twelve hours after Abe appeared in Rio. Figure 3 illustrates the findings.
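As a rough illustration of the two kinds of request described above, the sketch below builds the query parameters for a keyword-only ("fishing") search and for a radius search. It makes no network call. The parameter names (`q`, `geocode`, `count`) follow my understanding of Twitter's standard search API and may since have changed; the function name `build_search_params` and the approximate Tokyo coordinates are illustrative assumptions, not the project's own code.

```python
from typing import Optional, Tuple


def build_search_params(term: str,
                        centre: Optional[Tuple[float, float]] = None,
                        radius_km: Optional[float] = None,
                        count: int = 100) -> dict:
    """Build query parameters for a keyword search.

    If `centre` (latitude, longitude) is given, the search is limited to a
    circle of `radius_km` kilometres around it, written in the
    "lat,lng,radius" form assumed here for the `geocode` parameter.
    """
    params = {"q": term, "count": count}
    if centre is not None:
        if radius_km is None or radius_km <= 0:
            raise ValueError("a positive radius_km is required with a centre")
        lat, lng = centre
        params["geocode"] = f"{lat},{lng},{radius_km}km"
    return params


# "Fishing": no location filter, rely on whichever tweets happen to be geocoded.
fishing = build_search_params("abe mario")

# Radius search: 25km around a capital (approximate coordinates for Tokyo).
tokyo = build_search_params("abe mario", centre=(35.68, 139.69), radius_km=25)
print(tokyo)
```

Repeating the second call for each capital's coordinates would give one request per city, which is the shape of the search described here.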
We can see that the largest number of tweets was from, rather surprisingly, Singapore. This is probably due to Singapore's population density, but it is still a little unexpected to find more tweets from Singapore than from Tokyo.
Rank | State | Capital | Tweets |
1 | Singapore | Singapore | 992 |
2 | Japan | Tokyo | 631 |
3 | United Kingdom | London | 609 |
4 | France | Paris | 558 |
5 | USA | Washington, D.C. | 315 |
6 | Mexico | Mexico City | 135 |
7 | Philippines | Manila | 118 |
8 | Spain | Madrid | 104 |
9 | India | New Delhi | 95 |
10 | Italy | Rome | 66 |
If we run searches for "abe mario" tweets in each of these cities (again, based on a 25km radius), we get the results shown in Figure 5 and Table 2. London and Paris are the leaders (note that the numbers are a bit higher than in the earlier Table 1, as Table 1 was based on 12 hours of data, but Table 2 is based on a few days of data).
Rank | State | City | Tweets |
1 | United Kingdom | London | 793 |
2 | France | Paris | 709 |
3 | Spain | Madrid | 111 |
4 | Italy | Rome | 83 |
5 | Spain | Barcelona | 64 |
6 | Italy | Milan | 46 |
7 | United Kingdom | Leeds | 35 |
8 | Ireland | Dublin | 29 |
9 | The Netherlands | Amsterdam | 21 |
10 | Italy | Bologna | 20 |
The technique used here is to search for all tweets containing the term "t.co": Twitter's URL shortening service. Put simply, if you copy a web address into a tweet, Twitter will usually shorten it for you, using their t.co service.
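Searches for "t.co" in the EU urban areas with a population greater than 500,000, then, give us Figure 6 and Table 3.
London still comes top, but interestingly, Barcelona comes next, not Paris. Paris returned nearly as many "abe mario" tweets as London (709 against 793) despite returning less than a quarter as many "t.co" tweets (1203 against 5157). This suggests that proportionately more "abe mario" tweets came from Paris than from London.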
Rank | State | City | Tweets |
1 | United Kingdom | London | 5157 |
2 | Spain | Barcelona | 3489 |
3 | Germany | Frankfurt | 1424 |
4 | France | Paris | 1203 |
5 | Spain | Madrid | 802 |
6 | United Kingdom | Manchester | 433 |
7 | Sweden | Stockholm | 430 |
8 | Spain | Bilbao | 349 |
9 | United Kingdom | Leeds | 299 |
10 | Greece | Athens | 274 |
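One way to make the proportional comparison explicit is to divide each city's "abe mario" count (Table 2) by its "t.co" count (Table 3). The sketch below does this for the five cities that appear in both tables. It assumes the two searches are comparable samples, which cannot be confirmed given how Twitter samples its results, so the ratios are best read as a crude index rather than a true rate. The variable and function names are illustrative.

```python
# "abe mario" tweets per city, from Table 2.
abe_mario = {"London": 793, "Paris": 709, "Madrid": 111,
             "Barcelona": 64, "Leeds": 35}

# "t.co" tweets per city, from Table 3 (used as a rough baseline of activity).
t_co = {"London": 5157, "Paris": 1203, "Madrid": 802,
        "Barcelona": 3489, "Leeds": 299}


def per_thousand(city: str) -> float:
    """'abe mario' tweets per 1,000 baseline 't.co' tweets for one city."""
    return 1000 * abe_mario[city] / t_co[city]


# Only cities present in both tables can be compared.
ratios = {city: per_thousand(city) for city in abe_mario if city in t_co}

for city, value in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{city:10s} {value:7.1f}")
```

On these figures Paris has by far the highest ratio and Barcelona the lowest, which is the pattern noted above.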
When we send these coordinates to Twitter, we get the results presented in Figure 9. I've excluded national flags on this one, as a grid cell can easily encompass several different states.
As we can see, the United States now has the largest Mario. This makes sense: the grid cell with the largest number of tweets (41.5° lat, -73.5° lng) encompasses not only New York City itself (and parts of New York state), but also parts of several neighbouring states, including Connecticut, Massachusetts and New Jersey.
The next biggest cell (35.5, 138.5) includes the western part of Tokyo, and after that, we have the eastern part of Tokyo.
The next four are especially interesting. The most westerly is a point in Indonesia, in Sumatra (1.5, 100.5), while the most easterly is a point in the South China Sea (1.5, 106.5). The two points are more than 650km apart. Singapore lies between them, but is well beyond the 100km radius of either. Yet the four cells in question contain almost exactly the same tweets. As was discussed above, Twitter must be employing some form of cell structure in their geo-coding, but exactly how this works is unclear.
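The distances quoted here can be checked with a great circle calculation. The sketch below uses the haversine formula on a spherical Earth with a mean radius of 6371km. The coordinates for Singapore (1.35, 103.82) are an approximate city-centre value that I have assumed for illustration, and the function name `haversine_km` is my own.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great circle distance in kilometres between two lat/lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


west_cell = (1.5, 100.5)    # point in Sumatra
east_cell = (1.5, 106.5)    # point in the South China Sea
singapore = (1.35, 103.82)  # approximate, assumed for illustration

cell_gap = haversine_km(*west_cell, *east_cell)
to_west = haversine_km(*singapore, *west_cell)
to_east = haversine_km(*singapore, *east_cell)

print(f"west to east cell: {cell_gap:.0f} km")
print(f"Singapore to west cell: {to_west:.0f} km")
print(f"Singapore to east cell: {to_east:.0f} km")
```

Both distances from Singapore come out at well over 100km, so a strict radius search around either point should not capture tweets sent from the city.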
This is problematic, as we see several identically sized Marios in the Americas, Africa and Asia.
Figure 10 shows the number of "abe mario" tweets sent from the capitals of each of the 47 prefectures. The distribution is a little surprising. The vast majority are in Tokyo and neighbouring Saitama (in the Figure, Saitama obscures Tokyo as they are so close together; indeed, the 1317 tweets returned for Tokyo were virtually identical to the 1302 returned for Saitama). Three other prefectures can also be identified: Osaka, Hyogo and Kyoto. Aside from these, no other prefectures can be identified in the image. Indeed, only 24 prefectures returned any tweets, seven of them returning one tweet each. In total, the prefectural capitals returned 3359 tweets in the few days after the ceremony.
Perhaps, though, the capitals of the prefectures are unrepresentative. What if we could try to capture tweets from the whole prefecture? If we find the centroid of each prefecture, and then draw a circle round it, capturing as much of the prefecture as possible without taking too much from neighbouring prefectures, we might be able to do this. Figure 11 illustrates these zones of varying radius.
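A minimal sketch of the centroid-and-circle idea follows. It is not the method used to produce Figure 11: it assumes each prefecture boundary is available as a simple polygon whose coordinates have already been projected to kilometres, and it picks the radius with one possible heuristic, namely the circle with the same area as the polygon. The function names and the toy square "prefecture" are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area_and_centroid(vertices: List[Point]) -> Tuple[float, Point]:
    """Area and centroid of a simple polygon via the shoelace formula.

    Vertices are (x, y) pairs in planar units (assumed here to be km),
    listed in order around the boundary, without repeating the first point.
    """
    signed_area = 0.0
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        signed_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    signed_area /= 2.0
    if signed_area == 0:
        raise ValueError("degenerate polygon with zero area")
    cx /= 6.0 * signed_area
    cy /= 6.0 * signed_area
    return abs(signed_area), (cx, cy)


def equal_area_radius(area: float) -> float:
    """Radius of the circle whose area equals `area` (a heuristic choice)."""
    return math.sqrt(area / math.pi)


# Toy example: a square "prefecture" with 40km sides.
square = [(0.0, 0.0), (40.0, 0.0), (40.0, 40.0), (0.0, 40.0)]
area, centroid = polygon_area_and_centroid(square)
radius = equal_area_radius(area)
print(centroid, round(radius, 1))
```

An equal-area circle spills over the boundary in some places and misses corners in others, which is one way of trading coverage of the prefecture against overlap with its neighbours; a long, thin or island-heavy prefecture would need a different rule.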
If we send these to Twitter, we get the results presented in Figure 12. The result is similar to that in Figure 10. Tokyo is dominant, obscured here not by Saitama, but by nearby Kanagawa. Osaka is next, Kyoto has all but fallen off the map, and Hyogo has fallen off it entirely. 25 prefectures returned a total of 3030 tweets: similar to just using the capitals, but actually slightly fewer.
Again, this is odd: it is hardly representative of the population distribution in Japan (presented in Figure 13). Why might this be the case? Several possibilities emerge: