The problem

As part of our study of Internet usage in Africa, we want to investigate how close popular content is to African users. We therefore need to accurately pinpoint the geographical locations of the servers that we captured in the traces (~20 million IPs). This turned out to be very challenging, as relying on standard GeoIP databases (e.g. MaxMind GeoLite2) would lead to wrong results because:

1. These databases use registration information for location mapping (e.g. IPs registered by Google would be incorrectly located in the U.S.).
2. With anycast routing, traffic is sent to the nearest destination (e.g. DNS queries to 8.8.8.8 are routed to the closest server, which depends on the location of the client).
3. Many sites use content delivery networks (e.g. Akamai), for which the corresponding geolocation also depends on the location of the client.

Example

Take for example one of Google's registered IPs, 216.58.223.46. If we search the MaxMind GeoIP2 City database for this IP, we get its location as "Mountain View, California, USA". Because MaxMind uses registration details to establish this mapping, the reported location does not reflect the actual physical location of 216.58.223.46, which is in South Africa.

MaxMind

If we do a reverse DNS lookup for 216.58.223.46, we get back jnb01s08-in-f14.1e100.net. Notice the location code "jnb" encoded in the hostname, which suggests that this server is probably located in Johannesburg, South Africa.
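The hostname check can be scripted. The following is a minimal sketch, assuming that Google's 1e100.net hostnames start with a three-letter airport code followed by digits, as in the example above. The function names are illustrative, and the pattern is a guess from this one hostname rather than a documented naming scheme.

```python
import re
import socket

# Assumed pattern: a three-letter code, then digits, ending in 1e100.net,
# e.g. "jnb01s08-in-f14.1e100.net". Not an official naming scheme.
_HOSTNAME_RE = re.compile(r"^([a-z]{3})\d+.*\.1e100\.net\.?$")


def airport_code(hostname):
    """Return the leading three-letter code of a 1e100.net hostname, or None."""
    match = _HOSTNAME_RE.match(hostname.lower())
    return match.group(1) if match else None


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None


print(airport_code("jnb01s08-in-f14.1e100.net"))  # jnb
```

The extracted code is only a hint: it still has to be mapped to a city and cross-checked with an active measurement.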

Pinging this IP from a location in Johannesburg (using the Seacom Looking Glass), we get a small RTT. This confirms that 216.58.223.46 is located in South Africa.

216.58.223.46 ping from South Africa
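A small RTT is convincing because it puts a hard upper bound on how far away the server can be. As a rough sketch, assuming signals travel through fibre at about two thirds of the speed of light (roughly 200 km per millisecond), the bound can be computed as follows. The constant and the function name are illustrative assumptions, and real paths are never straight lines, so the true distance is usually much smaller than the bound.

```python
# Assumption: propagation speed in optical fibre is about 2/3 of the speed
# of light in vacuum, i.e. roughly 200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def max_distance_km(rtt_ms):
    """Upper bound on the one-way distance implied by a round-trip time."""
    return (rtt_ms / 2.0) * FIBRE_KM_PER_MS


# Illustrative values, not measurements from the study.
print(max_distance_km(2.0))    # 200.0
print(max_distance_km(150.0))  # 15000.0
```

An RTT of a couple of milliseconds from Johannesburg therefore rules out a server in California, whereas a large RTT on its own says little about where the server is.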

Challenges

1. RTT triangulation should, in theory, solve this problem. However, if we rely on RTT triangulation in regions such as Africa, we get wrong results. This is mainly because ISPs operating in Africa choose to peer at European exchange points for economic reasons, and therefore a connection between two endpoints in the region typically takes a convoluted route. Take for example a route from South Africa to Egypt that goes via India and France!

Traceroute from South Africa to Egypt

2. If we have the full traceroute path, we can inspect where the path reaches a regional ISP. From the ISP name, or even using MaxMind, we can guess where the destination IP is physically located. However, Google has its own network [The tier 1 network that isn't], meaning that the router IPs along the path are registered by Google. Again, using MaxMind, we would get them all located in the U.S.

Our solution (Google IPs)

The brute-force way is to ping the destination IP from different locations until we find a minimal RTT, and then we are done! It looks like a simple task, but it is not scalable unless it can be automated somehow. We need a node in virtually every country from which we can execute a ping request. In fact, the only option available to us is to use ISPs' looking glass web interfaces, which are useful but manual. We therefore have to reduce the number of destination IPs that we need to check for this to be tractable in reasonable time.
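Once the RTTs have been collected, picking the closest vantage point is straightforward. The sketch below assumes the readings were copied by hand from looking glass pages into a dictionary. The function name, the city labels and all RTT values are hypothetical and only illustrate the selection step.

```python
def closest_vantage(rtts_by_vantage):
    """Return (vantage, min_rtt_ms) for the vantage point with the lowest RTT.

    rtts_by_vantage maps a vantage-point label to a list of RTT samples in
    milliseconds; vantage points without any successful sample are ignored.
    Returns None if no vantage point has a sample.
    """
    best = None
    for vantage, samples in rtts_by_vantage.items():
        if not samples:
            continue
        rtt = min(samples)  # the minimum is least affected by queueing delay
        if best is None or rtt < best[1]:
            best = (vantage, rtt)
    return best


# Hypothetical readings for one destination IP.
readings = {
    "Johannesburg": [2.1, 1.9, 2.4],
    "Nairobi": [58.0, 61.2],
    "London": [171.5, 169.8],
    "Cairo": [],  # looking glass timed out
}
print(closest_vantage(readings))  # ('Johannesburg', 1.9)
```

Using the minimum sample per vantage point is a design choice: it is the reading closest to pure propagation delay.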

Let's focus on Google. Google IPs belong to servers hosted in Google's datacentres, and there is a limited set of routes from a specific node to a Google datacentre. We therefore run a traceroute to every Google IP from a node in Africa and group the IPs by the route taken. It is then enough to geolocate one destination IP from each group (by searching for the minimal ping RTT), which is doable by manual inspection using ISPs' looking glass web interfaces. It turned out that in East Africa (where we are doing our case study) there are just tens of distinct routes (to ~200 destination IPs).
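The grouping step could look like the sketch below. It assumes that a "route" is the exact sequence of hop IPs up to, but not including, the destination itself. That definition is an assumption made for illustration, and a looser key (for example only the last few hops) may work better in practice. The addresses are documentation-range placeholders, not Google IPs.

```python
from collections import defaultdict


def group_by_route(traceroutes):
    """Group destination IPs by the sequence of hops leading to them.

    traceroutes maps a destination IP to its list of hop IPs (None for a hop
    that did not answer). The final hop is dropped when it is the destination
    itself, so that destinations behind the same path share a key.
    """
    groups = defaultdict(list)
    for dest, hops in traceroutes.items():
        path = hops[:-1] if hops and hops[-1] == dest else hops
        groups[tuple(path)].append(dest)
    return dict(groups)


def representatives(groups):
    """Pick one destination per route to geolocate by hand."""
    return {route: sorted(dests)[0] for route, dests in groups.items()}


# Placeholder traceroutes (documentation address ranges).
traces = {
    "203.0.113.10": ["192.0.2.1", "198.51.100.1", "203.0.113.10"],
    "203.0.113.11": ["192.0.2.1", "198.51.100.1", "203.0.113.11"],
    "203.0.113.20": ["192.0.2.1", "198.51.100.9", "203.0.113.20"],
}
groups = group_by_route(traces)
print(len(groups))  # 2
```

Only the representatives then need a manual minimal-RTT search, and the location found is applied to every IP in the same group.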

A few Google IPs have geographical locations (airport codes) encoded in their domain names, as shown above. We use this information as well to get a better understanding of the corresponding locations.

Our solution (the rest)

As for non-Google IPs, the task is "easier" because we can look at the traceroute result for each IP and check where the path enters the regional ISP. From this we can get an idea of the location of the end destination. In fact, we use MaxMind to geolocate the IPs of the regional ISPs (this is what it was built for, after all!). We still had to run the traceroutes from a node in Kenya so that we capture the effect of anycast routing. The good news is that, unlike for Google IPs, this process can be automated.
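A rough sketch of that automation is shown below. It assumes each traceroute hop has already been annotated with an owner label (for example from whois), and that the caller supplies the set of owners counted as regional ISPs near the destination, excluding the vantage point's own access provider. The function names, owner labels and the `geolocate` callable are hypothetical. In practice `geolocate` would wrap a MaxMind database lookup, but it is passed in here so that the sketch does not depend on a particular library.

```python
def entry_hop(hops, regional_isps):
    """Return the IP of the first hop owned by a regional ISP, or None.

    hops is a list of (ip, owner) pairs in path order; owner may be None
    when nothing is known about the hop.
    """
    for ip, owner in hops:
        if owner in regional_isps:
            return ip
    return None


def locate_destination(hops, regional_isps, geolocate):
    """Geolocate a destination via the hop where its path enters a regional ISP."""
    ip = entry_hop(hops, regional_isps)
    return geolocate(ip) if ip is not None else None


# Placeholder path and a fake lookup table standing in for a GeoIP database.
hops = [
    ("192.0.2.1", "ExampleAccessISP"),
    ("198.51.100.1", "ExampleTransit"),
    ("203.0.113.1", "ExampleRegionalISP"),
    ("203.0.113.50", None),
]
fake_db = {"203.0.113.1": "Exampleland"}
print(locate_destination(hops, {"ExampleRegionalISP"}, fake_db.get))  # Exampleland
```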

Results (selected)

Here I plot the IPs belonging to Google, Facebook and Amazon (normalised by the corresponding download size seen in the Rwanda traces from February 2015). A picture is worth a thousand words at this point!

Google:

Google

Facebook:

Facebook

Amazon (CloudFront):

Amazon

Moving forward

I think it is a very interesting challenge to attempt to geolocate destination IPs at a global scale. I could not find a tool that geolocates destination/server IPs accurately. As there is a lot of fuzziness involved, a machine learning technique would be useful to narrow down the search area, but at the end of the day an active measurement is probably required.

Contact

Sherif Akoush, sherif [dot] akoush [at] gmail [dot] com
Last modified in July 2016.