Willi Schönborn
24.01.2010 13:13

Free geo data solutions compared: GeoNames.org vs. Yahoo! GeoPlanet

Yahoo! recently (eight months ago) released large parts of its GeoPlanet data. We have successfully integrated the free dataset from GeoNames.org into TheLabelFinder platform and thought the GeoPlanet data was worth reviewing and comparing. This article is the result of our research.

Geo data for free!

Both GeoNames and GeoPlanet provide web service interfaces. GeoNames offers a REST API while GeoPlanet uses SOAP. But relying on third-party web services can be tricky: the connection might be slow, unstable or even completely broken because of a server error. This is why we prefer integrating the data into our own databases and applications.

The actual downloads can be found on GeoNames.org and Yahoo! GeoPlanet. Both data sets are licensed under the Creative Commons Attribution License.
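
To give an idea of what integrating the download can look like, here is a minimal sketch that loads rows of the GeoNames main table into SQLite. It assumes the tab-separated column order described in the GeoNames readme (geonameid, name, asciiname, alternatenames, latitude, longitude, feature class, feature code, country code, ...). The table name `toponym`, the helper `load_toponyms` and the sample row are made up for illustration; this is not the import code we run in production.

```python
import csv
import io
import sqlite3

# First nine columns of the GeoNames dump (assumed layout, see the readme).
COLUMNS = ["geonameid", "name", "asciiname", "alternatenames",
           "latitude", "longitude", "feature_class", "feature_code",
           "country_code"]


def load_toponyms(conn, lines):
    """Insert tab-separated GeoNames rows into a 'toponym' table.

    Returns the number of rows that were inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS toponym ("
        "geonameid INTEGER PRIMARY KEY, name TEXT, latitude REAL, "
        "longitude REAL, feature_code TEXT, country_code TEXT)"
    )
    # QUOTE_NONE: the dump is plain TSV, so quote characters are literal.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    count = 0
    for fields in reader:
        if not fields:  # skip empty lines
            continue
        row = dict(zip(COLUMNS, fields))
        conn.execute(
            "INSERT OR REPLACE INTO toponym VALUES (?, ?, ?, ?, ?, ?)",
            (int(row["geonameid"]), row["name"], float(row["latitude"]),
             float(row["longitude"]), row["feature_code"],
             row["country_code"]),
        )
        count += 1
    conn.commit()
    return count


# Hypothetical sample row, truncated after the country code column.
SAMPLE = "1\tBerlin\tBerlin\tBerlino,Berlín\t52.52\t13.41\tP\tPPLC\tDE\n"

conn = sqlite3.connect(":memory:")
loaded = load_toponyms(conn, io.StringIO(SAMPLE))
print(loaded, conn.execute("SELECT name, country_code FROM toponym").fetchall())
```

For the real dump you would pass an open file handle instead of the `io.StringIO` sample and probably batch the inserts.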

GeoMatrix

GeoNames and GeoPlanet have a lot in common. The following list shows features which are roughly the same or at least comparable.

  • Unique, constant identifier
  • ISO 3166-1 alpha-2 country codes
  • Classification
  • Additional names per language
    • Very useful for localising applications
    • Allows associating multiple names in different languages and of different types with the same place

As you can see, their concepts overlap, but as always: The devil is in the details. The following table shows the most important differences.

                  GeoNames.org   Yahoo! GeoPlanet
Geo coordinates   yes            no
Structure         flat           hierarchical
Neighboring       no             yes

The most obvious disadvantage of the GeoPlanet data is that it doesn't come with geo coordinates. That's really sad, because the web services Yahoo! offers not only support coordinates but also provide the bounding box of a given place, which would be a really handy feature.

But in contrast to GeoNames, GeoPlanet excels at structuring the data. GeoNames' structure is flat. No record (GeoNames calls them toponyms) knows about its surrounding location. For example, Berlin does not know about Germany, which itself doesn't know about Europe, and so forth. GeoPlanet records, or places as Yahoo! calls them, always (with one exception) have a reference to their parent place and therefore offer relations between places like the following:

  • Parent (direct surrounding place)
  • Child (direct sub-places)
  • Siblings (places sharing the same parent and place type)
  • Ancestors (set of all parents)

If you take, for example, our company's district, you get the following family tree:

  • Deutschland (WOEID [1] 23424829)
    • Bundesland Berlin (WOEID 2345496)
      • Stadtkreis Berlin (WOEID 1259838)
        • Berlin (WOEID 638242)
          • Ortsteil Pankow (WOEID 26821868)
          • Ortsteil Prenzlauer Berg (WOEID 26821872)
          • Ortsteil Wedding (WOEID 26821851)
          • Ortsteil Tempelhof (WOEID 26821861)
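
Because every place only stores a reference to its parent, the other relations can be derived from that single field. The following is a minimal sketch using the WOEIDs from the tree above. The dictionary `PARENT` and the function names are made up for illustration and say nothing about how the GeoPlanet files are laid out. Note also that the sibling check here ignores the place type, which this excerpt does not record.

```python
# child WOEID -> parent WOEID, taken from the family tree above.
PARENT = {
    2345496: 23424829,   # Bundesland Berlin -> Deutschland
    1259838: 2345496,    # Stadtkreis Berlin -> Bundesland Berlin
    638242: 1259838,     # Berlin -> Stadtkreis Berlin
    26821868: 638242,    # Ortsteil Pankow -> Berlin
    26821872: 638242,    # Ortsteil Prenzlauer Berg -> Berlin
    26821851: 638242,    # Ortsteil Wedding -> Berlin
    26821861: 638242,    # Ortsteil Tempelhof -> Berlin
}


def ancestors(woeid):
    """Follow the parent references upwards, nearest ancestor first."""
    chain = []
    while woeid in PARENT:
        woeid = PARENT[woeid]
        chain.append(woeid)
    return chain


def children(woeid):
    """All places whose parent is the given place, sorted by WOEID."""
    return sorted(child for child, parent in PARENT.items() if parent == woeid)


def siblings(woeid):
    """Places sharing the same parent (place type is not checked here)."""
    parent = PARENT.get(woeid)
    if parent is None:
        return []
    return [child for child in children(parent) if child != woeid]


prenzlauer_berg = 26821872
print(ancestors(prenzlauer_berg))  # Berlin, Stadtkreis, Bundesland, Deutschland
print(siblings(prenzlauer_berg))   # Wedding, Tempelhof, Pankow
```

In this excerpt the walk stops at Deutschland only because its parent is not part of the example; in the full dataset it would continue further up.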

There is a fifth relationship GeoPlanet offers:

  • Neighbors (adjacent places)

Back to our local district example, this would be:

  • Ortsteil Prenzlauer Berg (WOEID 26821872)
  • Ortsteil Mitte (WOEID 26821864)
  • Ortsteil Tiergarten (WOEID 26821854)
  • Ortsteil Friedrichshain (WOEID 26821877)
  • Ortsteil Weißensee (WOEID 26821880)
  • Ortste… you got the point, right?

In case you didn't already guess which place has no parent place in the GeoPlanet data:

It's Earth (WOEID 1).

Quality

As mentioned before, we are actively using the GeoNames data in a production-critical environment and are very happy with it. The data quality suits our needs, but since we have not yet worked with the GeoPlanet dataset, it would be unfair to judge it in this respect. If anybody already has experience with it, or concrete examples, please let us know.

Quantity

We can't really compare data quality, but what we can do is compare quantity. It might not be useful, but it was easy to collect, so here you go:

                             GeoNames.org   Yahoo! GeoPlanet
Places/Toponyms              7,069,291      5,332,310
Aliases/Alternate Names      2,928,296      1,950,735
Neighbors                    n/a            8,521,075
Size (all files, unzipped)   882M           504M
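
Raw totals are hard to compare because the two datasets differ in size, so a slightly more telling view is the number of aliases and neighbor records per place. The small sketch below only divides the figures from the table; the names `COUNTS` and `per_place` are made up for illustration, and whether a neighbor record counts a pair once or once per direction is not something the totals reveal.

```python
# Figures copied from the table above; None means "not available".
COUNTS = {
    "GeoNames.org": {
        "places": 7_069_291, "aliases": 2_928_296, "neighbors": None,
    },
    "Yahoo! GeoPlanet": {
        "places": 5_332_310, "aliases": 1_950_735, "neighbors": 8_521_075,
    },
}


def per_place(counts, key):
    """Average number of `key` records per place, or None if unavailable."""
    value = counts[key]
    if value is None:
        return None
    return value / counts["places"]


for name, counts in COUNTS.items():
    aliases = per_place(counts, "aliases")
    neighbors = per_place(counts, "neighbors")
    neighbor_text = "n/a" if neighbors is None else f"{neighbors:.2f}"
    print(f"{name}: {aliases:.2f} aliases/place, {neighbor_text} neighbors/place")
```

This works out to roughly 0.41 aliases per place for GeoNames and 0.37 for GeoPlanet, plus about 1.6 neighbor records per GeoPlanet place.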

Ergo?

GeoPlanet's hierarchical structuring looks promising and allows thinking about some really neat features. Then again, the lack of important basic information like geo coordinates really upsets me. If Yahoo! were to add center and bounding box information, the GeoPlanet data would be a real competitor to GeoNames.

GeoNames, on the other hand, is a community-driven project. The data might (who knows?) not be as good as its GeoPlanet counterpart, but you are free to register and change it yourself. Many people (reasonably) fear a dependency on a big company like Yahoo! or Google. In that case GeoNames might be the better choice.

[1] WOEID: Where on Earth ID

Comments

Cd-MaN
2010/01/25 14:40

Hello.

You might also want to check out Nominatim from the Open Street Map project: http://wiki.openstreetmap.org/wiki/Nominatim

Regards.

Dani
2010/01/25 19:42

Yahoo geo provides coordinates:

http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20geo.places%20where%20text%3D%22sfo%22&format=xml

Willi
2010/01/25 21:02

@Dani: That's the web service; my article is about the free download.

Sara
2010/11/23 17:48

The quality of Geoplanet is good but there are plenty of duplicates in geographical areas where English is not the primary language. So it seems that they have combined data from several sources and as we all know there are several ways of spelling the name of a place… when the spelling is different they are identified as different places. So then you need to come up with algo's to remove “quasi duplicates” but that is really hard and it messes up the structure of the database as everything is linked through parent ID's.

About CosmoCode

CosmoCode is a Berlin based IT service provider with a strong emphasis on web applications. We mainly focus on Content Management Systems, Wikis and custom solutions.
