Nearly all of the mobile devices that are available for purchase these days have GPS location services baked into them. In particular, running apps are interesting, since they can plot your position over time, calculate your pace, and show you where you were fast or slow along your route. I’m also particularly fond of what was once called Google Local. Being able to find restaurants near my location has been quite a handy feature when out travelling in new locations.
I have a few ideas surrounding mobile applications and GPS services, so I wanted to check out what kind of data is freely available. While the Google Location API is available for development on the Android platform, I wanted to know more about freely available alternatives, as well as get to know more about GIS in general.
The OpenStreetMap project is a community driven project that attempts to provide location data all over the world. Contributors with GPS devices tag various locations, and can add them to the database. The database is released under the Open Data Commons Open Database License. Warning: I am by no means a lawyer, so you may want to read the text of the license yourself so that you understand it. The license for the OpenStreetMap database allows you to use it for commercial and personal use, but requires you to attribute the source, and release any modified version of the database back to the community. For me this is perfect, since it lets me experiment with various applications and GIS data without having to think about commercial licenses, closed APIs, or commercial data sources.
The first question was: what database system is required to load and access all of that data? Some form of SQL database was favorable, since I know how to run SQL queries, and configuring PostgreSQL or MySQL is in my wheelhouse. As it turns out, there is an extension for PostgreSQL called PostGIS which is designed specifically for storing geographical and spatial objects within a database. Perfect.
Note: most of the installation instructions below were culled from a variety of sources such as here, here, and here. These sources detail how to set up a fully-fledged tile server, which will allow you to draw maps from the data contained within the database. In this post, I’m just interested in the PostGIS database, and loading OpenStreetMap data. I’ll talk about tile servers in a future post.
As it turns out installing PostGIS is incredibly easy under Ubuntu 14.04. Assuming that you don’t have a PostgreSQL server installed yet:
sudo apt-get install postgresql-9.3 postgresql-server-dev-9.3 postgresql-contrib-9.3 postgresql-9.3-postgis-2.1
Once installed, you will need to set up a database, and load some extensions into that database. To do that:
sudo -u postgres -i createuser gis createdb -E UTF8 -O gis gis psql -f /usr/share/postgresql/9.3/contrib/postgis-2.1/postgis.sql -d gis psql -f /usr/share/postgresql/9.3/contrib/postgis-2.1/spatial_ref_sys.sql -d gis psql -f /usr/share/postgresql/9.3/contrib/postgis-2.1/postgis_comments.sql -d gis echo "alter table geometry_columns owner to gis; alter table spatial_ref_sys owner to gis;" | psql -d gis echo "create extension hstore;" | psql -d gis
sudo command switches users to the
postgres user so that we can run the database commands. The
createuser command will create a
gis user in the database, while the
createdb command will create a database named
gis and add the
gis user to it. The commands that follow set up the necessary extensions to store GIS information in the database. It will also enable the
hstore extension, which I’ll talk about a little later on in this post.
Importing OpenStreetMap Data
With the database ready to go, we need to load it with some actual data. You can download the entire world map at a whopping 35 GB from the Planet OSM site. However, if you are like me, you just want to experiment a little with GIS data, and probably only want a small portion of the database to start with. Geofabrik maintains a site with much smaller chunks of the world data. For example, I was able to download British Columbia map data separately, at the cost of 500 MB. You will want to download the data in the
While you are downloading the data for your region, you will also need to download and install the osm2pgsql importer. This open source tool is responsible for importing from the downloaded file format into your new PostGIS database. You can either build the tool from source, or if you are like me, simply install the package:
sudo apt-get install osm2pgsql
With everything downloaded, it’s time to actually run the import:
sudo -u postgres -i osm2pgsql -c -d gis -U gis -W -C 2048 --hstore --slim ~/british-columbia-latest.osm.bz2
This runs the actual import. There are several options specified on the
osm2pgsql command – let’s break down what is happening:
-ctells the tool to clear out the database before running the import.
-dlets us specify the database we want to load into, in this case the database
-Uspecifies what user we should use during the import, in this case, we’re using the
gisuser we created before.
-Ctells the system to use 2 GB of RAM for caching purposes. If you’re importing bigger data sets, you may need to make this larger.
--hstoretells the import to create an additional
tagscolumn (I’ll talk more about this below).
--slimtells the import to use database tables to store temporary data. This slows down the import, but is good for machines with little RAM.
~/british-columbia-latest.osm.bz2is the path to the actual data to load.
The tool will probably take some time to finish. On my Intel Dual Core i5 with 4 GB of RAM and a slow disk, this took about 64 minutes to complete. Once the process is over, I strongly recommend creating indexes for the tags on each of the tables. You can do that with the following commands:
sudo -u postgres -i psql gis CREATE INDEX idx_planet_osm_point_tags ON planet_osm_point USING gist(tags); CREATE INDEX idx_planet_osm_polygon_tags ON planet_osm_polygon USING gist(tags); CREATE INDEX idx_planet_osm_line_tags ON planet_osm_line USING gist(tags); \q
The command simply logs into the
gis database, and creates indexes for the
tags column on the tables
Querying the Database
Alright, with data in the database, now it’s time to actually run some queries on the OpenStreetMap data. The
planet_osm_point table contains useful information relating to various points of interest. The schema for the
planet_osm_point table contains quite a few fields such as:
Each of the fields can have a wide variety of different values associated with it. For example, the
amenity field can take on values such as:
Luckily, the OpenStreetMap project maintains an online wiki with all of this information described in it. For example, they have a description of the amenity field which defines what values will be found on it. We can use this information to write some simple SQL statements. For example, say you wanted to find libraries in the database:
SELECT name, ST_AsText(ST_Transform(way,4326)) AS pt_lonlattext FROM planet_osm_point WHERE amenity='library';
Or maybe you wanted to find restaurants:
SELECT name, ST_AsText(ST_Transform(way,4326)) AS pt_lonlattext FROM planet_osm_point WHERE amenity='restaurant';
This will return records that are considered to be restaurants, showing the name (if it has one), and the coordinates where it can be found. That’s good, but let’s say that you are only interested in finding pizzerias. Trudging through all of the restaurant data looking for what seems to be a pizzeria would take too long. This is where additional tags come into play.
Tags and the HSTORE Column Type
Additional information for each point is stored in the
tags column. Each point may have one or more tags associated with it, describing things such as address, hours of operation, phone number, etc. The
tags of each point are stored as an HSTORE column. The HSTORE column type is simply a set of keys and their associated values normally called a key, value pair. A single record may have only one entry with the specified key, but may have many key, value pairs. A typical key, value pair is specified as follows:
There are several helpful functions in PostgreSQL that help you deal with HSTOREs. For example, to check if the HSTORE column
tags contains a
phone key, regardless of its value:
tags ? 'phone'
To check if the HSTORE column
tags has a key of
cuisine and value of
tags @> 'cuisine=>fast_food'
Queries with Tags
Putting together the
tags column and some common PostgreSQL syntax results in some more selective queries. For example, to find all the restaurants that are pizzerias:
SELECT name, ST_AsText(ST_Transform(way,4326)) AS pt_lonlattext FROM planet_osm_point WHERE amenity='restaurant' AND tags @> 'cuisine=>pizza';
You have to be careful however, since the
restaurant classification is only one of many food related classifications in the
amenity ontology. For example, there is also:
I was surprised that Starbucks was under a different
amenity type –
cafe. At first, I was restricting my search to those elements that had the
cuisine key, with
coffee_shop as the value:
SELECT name, ST_AsText(ST_Transform(way,4326)) AS pt_lonlattext FROM planet_osm_point WHERE amenity='cafe' AND tags @> 'cuisine=>coffee_shop' AND name='Starbucks';
Note however, that not all Starbucks have a
cuisine tag associated with them, so sometimes you may have to modify your query to be less specific. For example, you can see here the difference in number of records returned when we search just for the name, and when we start to add more restrictions on the query:
gis=# SELECT COUNT(*) FROM planet_osm_point WHERE name='Starbucks'; COUNT ------- 122 (1 ROW) gis=# SELECT COUNT(*) FROM planet_osm_point WHERE name='Starbucks' AND amenity='cafe'; COUNT ------- 121 (1 ROW) gis=# SELECT COUNT(*) FROM planet_osm_point WHERE name='Starbucks' AND amenity='cafe' AND tags @> 'cuisine=>coffee_shop'; COUNT ------- 25 (1 ROW)
There is quite a bit of information available as tags. For example, perhaps you’re interested in finding restaurants that have phone numbers associated with them in the database:
SELECT name, tags, ST_AsText(ST_Transform(way,4326)) AS pt_lonlattext FROM planet_osm_point WHERE amenity='restaurant' AND tags ? 'phone';
Or maybe you are looking for a particular restaurant along Yates Street:
SELECT name, tags, ST_AsText(ST_Transform(way,4326)) AS pt_lonlattext FROM planet_osm_point WHERE amenity='restaurant' AND tags @> 'addr:street=>Yates\ Street';
As you can see, it’s really simple to perform some basic queries.
In this post, we looked at how to set up a PostGIS database to store OpenStreetMap data. We also looked at how to perform some basic queries using the provided fields, as well as tags. In a future post, I will discuss tile servers, and how to serve actual map images.