CosmoCode is a Berlin-based IT service provider focusing on CMS, wikis and Web 2.0
Mail: info@cosmocode.de, Tel: +49 (30) 814504070
Yahoo! recently (eight months ago) released large parts of their GeoPlanet data. We had successfully integrated the free dataset from GeoNames.org into the TheLabelFinder platform and thought the GeoPlanet data was worth reviewing and comparing. This article is the result of our research.
Both GeoNames and GeoPlanet provide web service interfaces: GeoNames offers a REST API, while GeoPlanet uses SOAP. But relying on third-party web services can be tricky. The connection might be slow, unstable or completely broken because of a server error. This is why we prefer integrating the data into our own databases and applications.
The actual downloads can be found on GeoNames.org and Yahoo! GeoPlanet. Both data sets are licensed under the Creative Commons Attribution License.
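Keeping the data in your own database is straightforward with the GeoNames dump. Below is a minimal sketch that imports rows into SQLite. It assumes the headerless, tab-separated layout described in the readme of the GeoNames download, with geonameid, name, latitude, longitude and country code in columns 0, 1, 4, 5 and 8. The `toponym` table is a hypothetical schema chosen for illustration.

```python
import csv
import io
import sqlite3

# Assumed column positions in the GeoNames dump (check the readme in the download).
GEONAMEID, NAME, LATITUDE, LONGITUDE, COUNTRY_CODE = 0, 1, 4, 5, 8


def import_geonames(lines, conn):
    """Insert GeoNames dump rows into a hypothetical `toponym` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS toponym ("
        " geonameid INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " latitude REAL,"
        " longitude REAL,"
        " country_code TEXT)"
    )
    # The dump has no header row and does not quote fields, so quoting is disabled.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = (
        (int(r[GEONAMEID]), r[NAME], float(r[LATITUDE]), float(r[LONGITUDE]), r[COUNTRY_CODE])
        for r in reader
        if r  # skip blank lines
    )
    conn.executemany("INSERT OR REPLACE INTO toponym VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    # Illustrative row with made-up values, padded to the dump's 19 columns.
    sample = "\t".join(["1", "Sampletown", "Sampletown", "", "52.5", "13.4", "P", "PPL", "DE"] + [""] * 10)
    conn = sqlite3.connect(":memory:")
    import_geonames(io.StringIO(sample + "\n"), conn)
    print(conn.execute("SELECT name, latitude, longitude FROM toponym").fetchall())
```

For a real import you would pass an opened file, e.g. `open("allCountries.txt", encoding="utf-8", newline="")`, instead of the `StringIO` sample.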
GeoNames and GeoPlanet have a lot in common. The following list shows features which are roughly the same or at least comparable.
As you can see, their concepts overlap, but, as always, the devil is in the details. The following table shows the most important differences.
| | GeoNames.org | Yahoo! GeoPlanet |
|---|---|---|
| Geo coordinates | yes | no |
| Structure | flat | hierarchical |
| Neighbor relations | no | yes |
The most eye-catching disadvantage of the GeoPlanet data is that it doesn't come with geo coordinates. That's a real pity, because the web services they offer not only support coordinates but also provide the bounding box of a given place, which would be a really handy feature.
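A bounding box would be handy because it answers questions like "does this point lie inside this place?" without any further lookups. A minimal sketch of that check follows. The `BoundingBox` class and the sample values are hypothetical and not part of either download or Yahoo!'s API, and the sketch assumes the box does not cross the 180th meridian.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical place extent in decimal degrees (does not handle the 180th meridian)."""

    south: float
    west: float
    north: float
    east: float

    def __post_init__(self) -> None:
        if self.south > self.north or self.west > self.east:
            raise ValueError("expected south <= north and west <= east")

    def contains(self, lat: float, lon: float) -> bool:
        """True if the point lies inside the box or on its edge."""
        return self.south <= lat <= self.north and self.west <= lon <= self.east

    def center(self) -> tuple[float, float]:
        """Midpoint of the box, a rough stand-in for a place's center coordinate."""
        return ((self.south + self.north) / 2, (self.west + self.east) / 2)
```

For example, `BoundingBox(10.0, 20.0, 12.0, 24.0).contains(11.0, 22.0)` returns `True`.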
But in contrast to GeoNames, GeoPlanet excels at structuring the data. GeoNames' structure is flat: no record (GeoNames calls them toponyms) knows about its surrounding location. For example, Berlin does not know about Germany, which in turn doesn't know about Europe, and so forth. GeoPlanet records (or places, as Yahoo! calls them) always have a reference to their parent place, with one exception, and therefore offer relations between places like the following:
Take our company's district, for example, and you get the following family tree:
There is a fifth relationship GeoPlanet offers:
Back to our local district example, this would be:
In case you haven't already guessed which place has no parent in the GeoPlanet data:
It's Earth (WOEID 1).
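All of these relations follow from two pieces of data: each place's parent WOEID and a list of neighbor pairs. A minimal sketch, assuming the rows have already been loaded as `(woeid, name, parent_woeid)` and `(woeid, neighbour_woeid)` tuples. The `PlaceGraph` class is hypothetical, the root (Earth) is marked here with a parent of `None`, which may not match how the download encodes it, and neighbor pairs are assumed to be symmetric.

```python
from collections import defaultdict


class PlaceGraph:
    """GeoPlanet-style relations derived from parent IDs and neighbor pairs."""

    def __init__(self, places, adjacencies=()):
        # places: (woeid, name, parent_woeid) tuples; the root has parent None.
        self.names = {}
        self.parent = {}
        self._children = defaultdict(list)
        for woeid, name, parent_woeid in places:
            self.names[woeid] = name
            self.parent[woeid] = parent_woeid
            if parent_woeid is not None:
                self._children[parent_woeid].append(woeid)
        # adjacencies: (woeid, neighbour_woeid) pairs, stored in both directions.
        self._neighbors = defaultdict(set)
        for a, b in adjacencies:
            if a != b:  # ignore self-references
                self._neighbors[a].add(b)
                self._neighbors[b].add(a)

    def children(self, woeid):
        return list(self._children.get(woeid, []))

    def ancestors(self, woeid):
        """Parent, grandparent, ... up to the root, nearest first."""
        chain, seen = [], {woeid}
        current = self.parent.get(woeid)
        while current is not None and current not in seen:  # guard against cycles
            chain.append(current)
            seen.add(current)
            current = self.parent.get(current)
        return chain

    def siblings(self, woeid):
        """Other places that share this place's parent."""
        parent = self.parent.get(woeid)
        if parent is None:
            return []
        return [c for c in self._children.get(parent, []) if c != woeid]

    def neighbors(self, woeid):
        return sorted(self._neighbors.get(woeid, ()))

    def roots(self):
        """Places without a parent; in GeoPlanet this should only be Earth."""
        return [w for w, p in self.parent.items() if p is None]
```

With made-up sample data (only Earth's WOEID 1 is real), `PlaceGraph([(1, "Earth", None), (10, "Sample Continent", 1), (20, "Sample Country", 10), (30, "Sample Town", 20), (40, "District A", 30), (41, "District B", 30)], [(40, 41)])` gives `ancestors(40) == [30, 20, 10, 1]`, `siblings(40) == [41]`, `neighbors(41) == [40]` and `roots() == [1]`. A real GeoPlanet family tree has more levels than this simplified one.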
As mentioned before, we are actively using the GeoNames data in a production-critical environment and are very happy with it. The data quality suits our needs, but since we haven't worked with the GeoPlanet dataset yet, it would be unfair to judge it on this point. If anybody already has some experience or specific examples, please let us know.
We can't really compare data quality, but we can compare quantity. It might not be very useful, but it was easy to collect, so here you go:
| | GeoNames.org | Yahoo! GeoPlanet |
|---|---|---|
| Places/Toponyms | 7,069,291 | 5,332,310 |
| Aliases/Alternate names | 2,928,296 | 1,950,735 |
| Neighbors | n/a | 8,521,075 |
| Size (all files, unzipped) | 882M | 504M |
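Numbers like these can be collected by counting the non-blank lines in each dump file and summing the file sizes. A minimal sketch; whether a file starts with a header line is something you have to pass in (the GeoNames dump has none), the script name in the comment is hypothetical, and sizes are reported in MiB.

```python
import os


def count_rows(lines, has_header=False):
    """Count non-blank data lines; `lines` can be an open file or any iterable of lines."""
    count = sum(1 for line in lines if line.strip())
    # Drop the header line if the file has one.
    return max(count - 1, 0) if has_header else count


def total_size_mib(paths):
    """Combined size of the given files in MiB (1024 * 1024 bytes)."""
    return sum(os.path.getsize(path) for path in paths) / (1024 * 1024)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python count_dump.py [--header] FILE...
    args = sys.argv[1:]
    has_header = "--header" in args
    paths = [a for a in args if a != "--header"]
    for path in paths:
        with open(path, "rb") as handle:  # binary mode: counting needs no decoding
            print(f"{path}: {count_rows(handle, has_header):,} rows")
    print(f"total: {total_size_mib(paths):.0f} MiB")
```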
GeoPlanet's hierarchical structure looks promising and opens the door to some really neat features. Then again, the lack of basic information like geo coordinates really upsets me. If Yahoo! decides to include center and bounding box information, the GeoPlanet data would be a real competitor to GeoNames.
GeoNames, on the other hand, is a community-driven project. The data might (who knows?) not be as good as its GeoPlanet counterpart, but you are free to register and change it yourself. Many people (reasonably) fear depending on a big company like Yahoo! or Google. In that case, GeoNames might be the better choice.
Yahoo! Geo provides coordinates:
http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20geo.places%20where%20text%3D%22sfo%22&format=xml
@Dani: That's the web service; my article is about the free download.
The quality of GeoPlanet is good, but there are plenty of duplicates in geographical areas where English is not the primary language. It seems they combined data from several sources, and, as we all know, there are several ways of spelling the name of a place. When the spellings differ, they are identified as different places. You then need to come up with algorithms to remove these “quasi duplicates”, but that is really hard, and it messes up the structure of the database because everything is linked through parent IDs.
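One naive starting point for the “quasi duplicates” problem described in this comment is to group places that share a parent and whose names match after case-folding and stripping accents. A minimal sketch, assuming `(woeid, name, parent_woeid)` rows; the function names are hypothetical, and it will not catch transliteration variants such as "München" and "Munich", which is exactly the hard part.

```python
import unicodedata
from collections import defaultdict


def normalize(name):
    """Casefold and strip combining accents, e.g. 'Zürich' -> 'zurich'."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def quasi_duplicates(places):
    """Group (woeid, name, parent_woeid) rows whose names collide after normalization.

    Only places under the same parent are compared, so same-named places in
    different regions are not lumped together. The groups are merge candidates
    for manual review; nothing is deleted here.
    """
    groups = defaultdict(list)
    for woeid, name, parent in places:
        groups[(parent, normalize(name))].append(woeid)
    return [ids for ids in groups.values() if len(ids) > 1]
```

Merging a group still means re-pointing the children of the dropped places to the kept one, which is where the parent-ID structure makes this painful.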
Cd-MaN
2010/01/25 14:40
Hello.
You might also want to check out Nominatim from the OpenStreetMap project: http://wiki.openstreetmap.org/wiki/Nominatim
Regards.