#abe #mario: What can Twitter tell us about Japanese soft power, and what can Japanese soft power tell us about Twitter?


Figure 1: Shinzo Abe (or Super Mario) appears in Rio after drilling a tunnel through the earth from Tokyo

Dr. Steve Pickering, Brunel University London, created as part of the CROP-IT project.

Japanese prime minister Shinzo Abe recently caused a sensation by appearing at the closing ceremony of the Rio Olympics after burrowing through the earth from Tokyo in the guise of the Nintendo character Super Mario. This was a remarkable expression of soft power which clearly draws on lessons from previous Olympic Games (and on which much will surely be written). At the CROP-IT project, funded by the Japan Society for the Promotion of Science and led by Atsushi Tago, we're very interested in soft power projection, and a very quick way of measuring the immediate spread of this soft power is to look at Twitter, where the term "abe mario" started trending very quickly.

A full analysis of people's reactions to Abe's appearance would require content analysis across multiple languages: this is ongoing. However, as a quick indicator of the spread and distribution of reaction, we can use Twitter to see how many people tweeted the term "abe mario" during or after the closing ceremony, and where they are in the world. In so doing, not only will we find out more about Japanese soft power, but also about the workings of Twitter: what we can do with it, and what limitations it has.

Using Twitter for research

Twitter is just one of many social media tools that can help political scientists understand public opinion and political communication in ways which were unthinkable a decade ago, and as new platforms and new technologies emerge, and as our relationship with the internet and social media evolves, there will surely be more in the future.

Back in 2010, the US Library of Congress announced that it was going to archive every tweet ever sent. While this is a laudable aim, at the time of writing it has yet to happen, because the Library massively underestimated the scale of such a project. Twitter themselves have for several years used the "Taipei 101" as a unit of measure for the number of tweets sent in a year (if you printed all the tweets on sheets of office paper and stacked them on top of each other, how many Taipei 101s tall would the stack be?), and that number has grown exponentially. But this is just the tweets themselves: to make a functioning, searchable archive, the Library would need all of the metadata associated with each tweet as well. Put simply, if Twitter themselves have not been able to make such an archive available, what chance does the Library of Congress have?

Fire hose, or garden hose with a foot on the pipe?

The "live feed" of Twitter data is known as the "fire hose." This is available internally at Twitter and to a few select external users. For the rest of us, Twitter have created an application programming interface (API). This is more like a "garden hose": not as powerful as the fire hose, and it sometimes feels like somebody is standing on the pipe. The main reason why the fire hose is not more widely available is a simple structural one: their servers just couldn't handle it. Indeed, given the amount of data involved, Twitter have been remarkably open in giving us the API.

As such, we should be grateful that Twitter are giving us the data they can (unlike some other social media platforms), but recognise that what we are getting is only a sample from a larger population. As political scientists, though, this shouldn't be a problem: we are used to dealing with samples, and have developed elaborate (though not always successful) means of weighting our samples to make them more representative of the wider population.

Two ways of searching Twitter

There are essentially two ways of running geocoded searches on Twitter, which I like to think of as "fishing" and "focused."

Geocoded searches: fishing

A fishing expedition for geocoded tweets works basically like this. We send a query to Twitter saying:
"Hey, Twitter, send me all of the tweets in the world with the term 'abe mario' in them for the past seven days."
(Remember, though, that they won't send all of the tweets in the world: just a sample based on whatever secret sauce runs their algorithm.) We ask for seven days because that is about as far back as the API will go: this is another restriction that Twitter impose to save their servers.

Twitter will then send us these tweets, and some of them will be geocoded. The proportion of tweets that are geocoded, though, varies massively from country to country, and from search term to search term; sometimes it can be less than one in a thousand. As such, unless the search term is incredibly popular, and the search is only concerned with one country, we should avoid using the "fishing" approach.
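To get a feel for how thin the geocoded slice of a fishing expedition can be, here is a minimal sketch that counts the geocoded tweets in a batch. It assumes the tweets have already been downloaded and parsed into Python dictionaries with a `coordinates` field, as in Twitter's v1.1 JSON (a GeoJSON point, longitude first, or null when the tweet is not geocoded). The function name and the sample tweets are made up for illustration; they are not the project's code or data.

```python
def geocoded_share(tweets):
    """Return (geocoded_tweets, total, share) for a list of tweet dicts."""
    total = len(tweets)
    # A tweet counts as geocoded only if "coordinates" is present and not null.
    geocoded = [t for t in tweets if t.get("coordinates")]
    share = len(geocoded) / total if total else 0.0
    return geocoded, total, share


# Made-up tweets for illustration only (not real data).
sample = [
    {"id": 1, "text": "abe mario!", "coordinates": None},
    {"id": 2, "text": "abe mario in Rio",
     "coordinates": {"type": "Point", "coordinates": [139.69, 35.69]}},
    {"id": 3, "text": "abe mario", "coordinates": None},
    {"id": 4, "text": "abe mario"},  # field missing altogether
]

geocoded, total, share = geocoded_share(sample)
print(f"{len(geocoded)} of {total} tweets geocoded ({share:.0%})")
```

With real fishing results the share would typically be far lower than in this toy batch, which is exactly why the approach only works for very popular terms.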

Geocoded searches: focused

To run a focused geocoded search, then, we say to Twitter:
"Hey, Twitter, send me all of the tweets with the term 'abe mario' in them within a radius of x kilometres of latitude y and longitude z."
Again, Twitter will send us a sample (also using their secret sauce) of the tweets. The locations of these tweets are inferred from factors such as the user's IP address, and as such are rather imprecise. Also, as running great-circle calculations for each search would be computationally expensive, the search must be based on something simpler (such as a grid).
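As a rough illustration of what a focused request looks like, the sketch below builds the query parameters for one search. It assumes the standard v1.1 search endpoint (`search/tweets`), whose `geocode` parameter takes the form "latitude,longitude,radius" with the radius in `km` or `mi`; the client the project actually used is not stated here. The function name is hypothetical, the Tokyo coordinates are approximate, and no request is sent.

```python
def focused_search_params(term, lat, lng, radius_km, count=100):
    """Build query parameters for one focused (geocoded) search.

    Assumes the v1.1 search/tweets parameter names: q, geocode, count.
    """
    if radius_km <= 0:
        raise ValueError("radius_km must be positive")
    return {
        "q": term,
        # "latitude,longitude,radius" with the unit attached to the radius
        "geocode": f"{lat},{lng},{radius_km}km",
        # the standard search endpoint returns at most 100 tweets per request
        "count": count,
    }


# Approximate coordinates for central Tokyo, with a 25km radius.
params = focused_search_params("abe mario", 35.6895, 139.6917, 25)
print(params)
```

Leaving out the `geocode` entry turns the same dictionary into a "fishing" query.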

Running a search on the geo-location of "abe mario" tweets

So where in the world are people tweeting "abe mario"?

Case study 1: capital cities

Taking the "focused" approach, let's start with all of the capital cities in the world.


Figure 2: World capital cities

We can send all of these coordinates to Twitter and ask it how many tweets were sent from within a 25km radius of each of the capital cities in the twelve hours after Abe appeared in Rio. Figure 3 illustrates the findings.


Figure 3: Proportion of "abe mario" tweets from within a 25km radius of world capitals

We can see that the largest number of tweets was from, rather surprisingly, Singapore. This is probably due to Singapore's population density, but it is still a little unexpected to find more tweets from Singapore than from Tokyo.

Rank  State           Capital           Tweets
1     Singapore       Singapore         992
2     Japan           Tokyo             631
3     United Kingdom  London            609
4     France          Paris             558
5     USA             Washington, D.C.  315
6     Mexico          Mexico City       135
7     Philippines     Manila            118
8     Spain           Madrid            104
9     India           New Delhi         95
10    Italy           Rome              66

Table 1: Top ten "abe mario" tweeting capital cities

Case study 2: EU urban areas

Some capital cities are large and densely populated; others are not. As an alternative, let's try looking at all of the urban areas in the EU with a population greater than 500,000. These can be seen in Figure 4.


Figure 4: Urban areas in the EU with a population greater than 500,000

If we run searches for "abe mario" tweets in each of these cities (again, based on a 25km radius), we get the results shown in Figure 5 and Table 2. London and Paris are the leaders (note that the numbers are a bit higher than in the earlier Table 1, as Table 1 was based on 12 hours of data, but Table 2 is based on a few days of data).


Figure 5: "abe mario" tweets sent from urban areas in the EU with a population greater than 500,000

Rank  State            Urban area  Tweets
1     United Kingdom   London      793
2     France           Paris       709
3     Spain            Madrid      111
4     Italy            Rome        83
5     Spain            Barcelona   64
6     Italy            Milan       46
7     United Kingdom   Leeds       35
8     Ireland          Dublin      29
9     The Netherlands  Amsterdam   21
10    Italy            Bologna     20

Table 2: Top ten "abe mario" tweeting urban areas in the EU with a population over 500,000

Getting a baseline

But can we get a baseline of Twitter usage in each of the areas outlined above? Getting a baseline for Twitter is surprisingly difficult: we need a language-independent search term that is used by everyone, but the term cannot be too popular: if it is, we are unable to download all of the tweets because of Twitter's rate limits.

The technique used here is to search for all tweets containing the term "t.co": Twitter's URL shortening service. Put simply, if you copy a web address into a tweet, Twitter will usually shorten it for you, using their t.co service.

Searches for "t.co" in the EU urban areas with a population greater than 500,000, then, give us Figure 6 and Table 3.

London still comes top, but interestingly, Barcelona comes next, not Paris, which drops to fourth. This suggests that proportionately more "abe mario" tweets came from Paris than from London.


Figure 6: Twitter baseline: tweets containing "t.co" from EU urban areas with population greater than 500,000

Rank  State           Urban area  Tweets
1     United Kingdom  London      5157
2     Spain           Barcelona   3489
3     Germany         Frankfurt   1424
4     France          Paris       1203
5     Spain           Madrid      802
6     United Kingdom  Manchester  433
7     Sweden          Stockholm   430
8     Spain           Bilbao      349
9     United Kingdom  Leeds       299
10    Greece          Athens      274

Table 3: Top ten "t.co" tweeting urban areas in the EU with a population over 500,000

"abe mario" tweets as a proportion of the baseline

As we have the "abe mario" tweets and the "t.co" tweets for the EU urban areas, we can divide the one by the other to get an indication of the number of "abe mario" tweets sent as a proportion of Twitter usage in those areas. The results, presented in Figure 7, show that Mario has returned to his spiritual homeland, Italy (although we shouldn't read too much into this, as the actual number of tweets coming from Italy is very low).
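The division can be illustrated with the numbers published in Tables 2 and 3. This is only a partial sketch: just five urban areas appear in both top tens, so it cannot reproduce Figure 7 (none of the Italian cities are in the Table 3 top ten). It also assumes, without this being stated above, that the two searches cover comparable time windows. The variable names are mine.

```python
# "abe mario" tweets per urban area (Table 2).
abe_mario = {
    "London": 793, "Paris": 709, "Madrid": 111, "Rome": 83, "Barcelona": 64,
    "Milan": 46, "Leeds": 35, "Dublin": 29, "Amsterdam": 21, "Bologna": 20,
}

# "t.co" baseline tweets per urban area (Table 3).
baseline = {
    "London": 5157, "Barcelona": 3489, "Frankfurt": 1424, "Paris": 1203,
    "Madrid": 802, "Manchester": 433, "Stockholm": 430, "Bilbao": 349,
    "Leeds": 299, "Athens": 274,
}

# Only areas present in both tables can be compared.
ratios = {
    city: abe_mario[city] / baseline[city]
    for city in abe_mario
    if city in baseline
}

# Print the areas from the highest to the lowest ratio.
ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
for city, ratio in ranked:
    print(f"{city:<10}{ratio:.3f}")
```

Among these five areas, Paris has by far the highest ratio and Barcelona the lowest, which is consistent with the reading of Table 3 given earlier.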


Figure 7: "abe mario" tweets as a proportion of the "t.co" baseline from EU urban areas with population greater than 500,000

Case study 3: The world as a grid

As was discussed above, the geocoding method being used in this research is the "focused" approach: saying to Twitter, "here are the coordinates, give me the tweets." However, because of the restrictions imposed by Twitter on the number of searches you can make (180 every 15 minutes) or the number of tweets you can receive (18,000 every 15 minutes), coupled with the fact that the tweets will disappear from the API after a week, we can only give Twitter a relatively small number of coordinate pairs. Nevertheless, it's enough to make a grid of the world, albeit at a low resolution. The resolution chosen was 2 degrees per grid cell, which means we need to send 16,200 requests to Twitter (or fewer, if you restrict the grid to cells that contain land).
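The arithmetic behind the grid can be sketched as follows. The code generates one search point per 2 degree cell and works out how long the searches would take at 180 requests per 15 minutes. It assumes a single request per cell (a busy cell with more than 100 matching tweets would need extra requests for paging) and places each point at the centre of a cell whose edges fall on even degrees; that origin is my assumption and need not match the grid actually used for Figure 8.

```python
import math

CELL_DEG = 2                # grid resolution in degrees
REQUESTS_PER_WINDOW = 180   # search requests allowed per rate-limit window
WINDOW_MINUTES = 15         # length of one rate-limit window


def grid_centres(cell_deg=CELL_DEG):
    """Return the (lat, lng) centre of every cell in a global grid."""
    lats = [-90 + cell_deg / 2 + i * cell_deg for i in range(180 // cell_deg)]
    lngs = [-180 + cell_deg / 2 + j * cell_deg for j in range(360 // cell_deg)]
    return [(lat, lng) for lat in lats for lng in lngs]


cells = grid_centres()
windows = math.ceil(len(cells) / REQUESTS_PER_WINDOW)
hours = windows * WINDOW_MINUTES / 60
print(f"{len(cells)} cells, {windows} rate-limit windows, {hours} hours")
```

At one request per cell, a full pass over the world takes the best part of a day, which matters when the tweets only stay in the API for about a week.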


Figure 8: The world divided into 2 degree grid cells

When we send these coordinates to Twitter, we get the results presented in Figure 9. I've excluded national flags on this one, as a grid cell can easily encompass several different states.

As we can see, the United States now has the largest Mario. This makes sense: the grid cell with the largest number of tweets (41.5° lat, -73.5° lng) encompasses not only New York City itself (and parts of New York state), but also parts of several neighbouring states, including Connecticut, Massachusetts and New Jersey.

The next biggest cell (35.5, 138.5) includes the western part of Tokyo, and after that, we have the eastern part of Tokyo.

The next four are especially interesting. The most westerly is a point in Indonesia, in Sumatra (1.5, 100.5), while the most easterly is a point in the South China Sea (1.5, 106.5). The two points are more than 650km apart; Singapore lies between them, but well beyond the 100km radius of either. Yet the four cells in question contain almost exactly the same tweets. As was discussed above, Twitter must be employing some form of cell structure in their geocoding, but exactly how this works is unclear.

This is problematic, as we see several identically sized Marios in the Americas, Africa and Asia.
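The distances quoted for the cells around Singapore can be checked with a great-circle calculation. The sketch below uses the haversine formula on a spherical Earth with a mean radius of 6371km; the Singapore coordinates are approximate and are my own addition, as is the function name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


west = (1.5, 100.5)         # search point in Sumatra
east = (1.5, 106.5)         # search point in the South China Sea
singapore = (1.35, 103.82)  # approximate coordinates (assumption)

print(f"west to east:      {haversine_km(*west, *east):.0f} km")
print(f"Singapore to west: {haversine_km(*singapore, *west):.0f} km")
print(f"Singapore to east: {haversine_km(*singapore, *east):.0f} km")
```

The two search points come out a little under 670km apart, and Singapore is several hundred kilometres from each, so a true 100km radius around either point should not reach it.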


Figure 9: "abe mario" tweets based on two degree grid cells

Case study 4: Japan

While the primary interest of this project is in Japanese soft power projection to the rest of the world, we should of course not rule out the domestic audience in Japan itself.

Figure 10 shows the number of "abe mario" tweets sent from the capitals of each of the 47 prefectures. The distribution is a little surprising. The vast majority are in Tokyo and neighbouring Saitama. In the figure, Saitama obscures Tokyo as they are so close together; indeed, the 1317 tweets returned for Tokyo were virtually identical to the 1302 returned for Saitama. Three other prefectures can also be identified: Osaka, Hyogo and Kyoto. But aside from these, no other prefectures can be identified in the image. Indeed, only 24 prefectures returned any tweets, and seven of those returned just one tweet each. In total, the prefectural capitals returned 3359 tweets for the few days after the ceremony.


Figure 10: "abe mario" tweets sent within 25km of prefecture capitals

Perhaps, though, the capitals of the prefectures are unrepresentative. What if we could try to capture tweets from the whole prefecture? If we find the centroid of each prefecture, and then draw a circle round it, capturing as much of the prefecture as possible without taking too much from neighbouring prefectures, we might be able to do this. Figure 11 illustrates these zones of varying radius.


Figure 11: Zones of varying radius, based on prefecture centroids

If we send these to Twitter, we get the results presented in Figure 12. The result is similar to that in Figure 10. Tokyo is dominant, obscured here not by Saitama, but by nearby Kanagawa. Osaka is next, Kyoto has all but fallen off the map, and Hyogo has fallen off it entirely. 25 prefectures returned a total of 3030 tweets: similar to just using the capitals, but actually slightly fewer.

Again, this is odd: it is hardly representative of the population distribution in Japan (presented in Figure 13). Why might this be the case? Several possibilities emerge:


Figure 12: "abe mario" tweets sent within the zones of varying radius, based on prefecture centroids


Figure 13: Prefecture population

Summary

So where does this leave us? Well, we can make some preliminary findings: For further understanding, we need to turn to content analysis. That comes next for the CROP-IT project!