OpenStreetMap and PostGIS
Posted on Thu 03 July 2014 in OpenStreetMap
Nearly all of the mobile devices that are available for purchase these days have GPS location services baked into them. In particular, running apps are interesting, since they can plot your position over time, calculate your pace, and show you where you were fast or slow along your route. I’m also particularly fond of what was once called Google Local. Being able to find restaurants near my location has been quite a handy feature when out travelling in new locations.
I have a few ideas surrounding mobile applications and GPS services, so I wanted to check out what kind of data is freely available. While the Google Location API is available for development on the Android platform, I wanted to know more about freely available alternatives, as well as get to know more about GIS in general.
OpenStreetMap
The OpenStreetMap project is a community driven project that attempts to provide location data all over the world. Contributors with GPS devices tag various locations, and can add them to the database. The database is released under the Open Data Commons Open Database License. Warning: I am by no means a lawyer, so you may want to read the text of the license yourself so that you understand it. The license for the OpenStreetMap database allows you to use it for commercial and personal use, but requires you to attribute the source, and release any modified version of the database back to the community. For me this is perfect, since it lets me experiment with various applications and GIS data without having to think about commercial licenses, closed APIs, or commercial data sources.
Enter PostGIS
The first question was: what database system is required to load and access all of that data? Some form of SQL database was favorable, since I know how to run SQL queries, and configuring PostgreSQL or MySQL is in my wheelhouse. As it turns out, there is an extension for PostgreSQL called PostGIS which is designed specifically for storing geographical and spatial objects within a database. Perfect.
Note: most of the installation instructions below were culled from a variety of sources such as here, here, and here. These sources detail how to set up a fully-fledged tile server, which will allow you to draw maps from the data contained within the database. In this post, I’m just interested in the PostGIS database, and loading OpenStreetMap data. I'll talk about tile servers in a future post.
As it turns out installing PostGIS is incredibly easy under Ubuntu 14.04. Assuming that you don't have a PostgreSQL server installed yet:
sudo apt-get install postgresql-9.3 postgresql-server-dev-9.3 postgresql-contrib-9.3 postgresql-9.3-postgis-2.1
Once installed, you will need to set up a database, and load some extensions into that database. To do that:
sudo -u postgres -i
createuser gis
createdb -E UTF8 -O gis gis
psql -f /usr/share/postgresql/9.3/contrib/postgis-2.1/postgis.sql -d gis
psql -f /usr/share/postgresql/9.3/contrib/postgis-2.1/spatial_ref_sys.sql -d gis
psql -f /usr/share/postgresql/9.3/contrib/postgis-2.1/postgis_comments.sql -d gis
echo "alter table geometry_columns owner to gis; alter table spatial_ref_sys owner to gis;" | psql -d gis
echo "create extension hstore;" | psql -d gis
The sudo
command switches users to the postgres
user so that we can run the database commands. The createuser
command will create a gis user in the database, while the createdb
command will create a database named gis
and add the gis user to it. The commands that follow set up the necessary extensions to store GIS information in the database. It will also enable the hstore
extension, which I'll talk about a little later on in this post.
Importing OpenStreetMap Data
With the database ready to go, we need to load it with some actual data. You can download the entire world map at a whopping 35 GB from the Planet OSM site. However, if you are like me, you just want to experiment a little with GIS data, and probably only want a small portion of the database to start with. Geofabrik maintains a site with much smaller chunks of the world data. For example, I was able to download British Columbia map data separately, at the cost of 500 MB. You will want to download the data in the .osm.bz2
format.
While you are downloading the data for your region, you will also need to download and install the osm2pgsql importer. This open source tool is responsible for importing from the downloaded file format into your new PostGIS database. You can either build the tool from source, or if you are like me, simply install the package:
sudo apt-get install osm2pgsql
With everything downloaded, it’s time to actually run the import:
sudo -u postgres -i
osm2pgsql -c -d gis -U gis -W -C 2048 --hstore --slim ~/british-columbia-latest.osm.bz2
This runs the actual import. There are several options specified on the osm2pgsql
command - let's break down what is happening:
-c
tells the tool to clear out the database before running the import.-d
lets us specify the database we want to load into, in this case the database gis.-U
specifies what user we should use during the import, in this case, we're using the gis user we created before.-C
tells the system to use 2 GB of RAM for caching purposes. If you're importing bigger data sets, you may need to make this larger.--hstore
tells the import to create an additional tags column (I'll talk more about this below).--slim
tells the import to use database tables to store temporary data. This slows down the import, but is good for machines with little RAM.~/british-columbia-latest.osm.bz2
is the path to the actual data to load.
The tool will probably take some time to finish. On my Intel Dual Core i5 with 4 GB of RAM and a slow disk, this took about 64 minutes to complete. Once the process is over, I strongly recommend creating indexes for the tags on each of the tables. You can do that with the following commands:
sudo -u postgres -i psql gis
CREATE INDEX idx_planet_osm_point_tags ON planet_osm_point USING gist(tags);
CREATE INDEX idx_planet_osm_polygon_tags ON planet_osm_polygon USING gist(tags);
CREATE INDEX idx_planet_osm_line_tags ON planet_osm_line USING gist(tags);
\q
The command simply logs into the gis
database, and creates indexes for the tags column on the tables planet_osm_point
, planet_osm_polygon
and planet_osm_line
.
Querying the Database
Alright, with data in the database, now it's time to actually run some queries on the OpenStreetMap data. The planet_osm_point
table contains useful information relating to various points of interest. The schema for the planet_osm_point
table contains quite a few fields such as:
amenity
barrier
bicycle
foot
horse
highway
Each of the fields can have a wide variety of different values associated with it. For example, the amenity field can take on values such as:
restaurant
school
library
ferry_terminal
fuel
dentist
Luckily, the OpenStreetMap project maintains an online wiki with all of this information described in it. For example, they have a description of the amenity field which defines what values will be found on it. We can use this information to write some simple SQL statements. For example, say you wanted to find libraries in the database:
SELECT name, ST_AsText(ST_Transform(way,4326)) AS pt_lonlattext
FROM planet_osm_point
WHERE amenity='library';
Or maybe you wanted to find restaurants:
SELECT name, ST_AsText(ST_Transform(way,4326)) AS pt_lonlattext
FROM planet_osm_point
WHERE amenity='restaurant';
This will return records that are considered to be restaurants, showing the name (if it has one), and the coordinates where it can be found. That's good, but let's say that you are only interested in finding pizzerias. Trudging through all of the restaurant data looking for what seems to be a pizzeria would take too long. This is where additional tags come into play.
Tags and the HSTORE Column Type
Additional information for each point is stored in the tags column. Each point may have one or more tags associated with it, describing things such as address, hours of operation, phone number, etc. The tags of each point are stored as an HSTORE
column. The HSTORE
column type is simply a set of keys and their associated values normally called a key, value pair. A single record may have only one entry with the specified key, but may have many key, value pairs. A typical key, value pair is specified as follows:
KEY=>VALUE
There are several helpful functions in PostgreSQL that help you deal with HSTORE
s. For example, to check if the HSTORE
column tags contains a phone key, regardless of its value:
tags ? 'phone'
To check if the HSTORE
column tags has a key of cuisine
and value of fast_food
:
tags @> 'cuisine=>fast_food'
Queries with Tags
Putting together the tags column and some common PostgreSQL syntax results in some more selective queries. For example, to find all the restaurants that are pizzerias:
SELECT name, ST_AsText(ST_Transform(way,4326)) AS pt_lonlattext
FROM planet_osm_point
WHERE amenity='restaurant' AND tags @> 'cuisine=>pizza';
You have to be careful however, since the restaurant classification is only one of many food related classifications in the amenity ontology. For example, there is also:
pub
ice_cream
cafe
food_court
I was surprised that Starbucks was under a different amenity type - cafe. At first, I was restricting my search to those elements that had the cuisine key, with
coffee_shop
as the value:
SELECT name, ST_AsText(ST_Transform(way,4326)) AS pt_lonlattext
FROM planet_osm_point
WHERE amenity='cafe' AND tags @> 'cuisine=>coffee_shop' AND name='Starbucks';
Note however, that not all Starbucks have a cuisine
tag associated with them, so sometimes you may have to modify your query to be less specific. For example, you can see here the difference in number of records returned when we search just for the name, and when we start to add more restrictions on the query:
gis=# SELECT COUNT(*)
FROM planet_osm_point
WHERE name='Starbucks';
COUNT
-------
122
(1 ROW)
gis=# SELECT COUNT(*)
FROM planet_osm_point
WHERE name='Starbucks' AND amenity='cafe';
COUNT
-------
121
(1 ROW)
gis=# SELECT COUNT(*)
FROM planet_osm_point
WHERE name='Starbucks' AND amenity='cafe' AND tags @> 'cuisine=>coffee_shop';
COUNT
-------
25
(1 ROW)
There is quite a bit of information available as tags. For example, perhaps you're interested in finding restaurants that have phone numbers associated with them in the database:
SELECT name, tags, ST_AsText(ST_Transform(way,4326)) AS pt_lonlattext
FROM planet_osm_point
WHERE amenity='restaurant' AND tags ? 'phone';
Or maybe you are looking for a particular restaurant along Yates Street:
SELECT name, tags, ST_AsText(ST_Transform(way,4326)) AS pt_lonlattext
FROM planet_osm_point
WHERE amenity='restaurant' AND tags @> 'addr:street=>Yates\ Street';
As you can see, it’s really simple to perform some basic queries.
Conclusion
In this post, we looked at how to set up a PostGIS database to store OpenStreetMap data. We also looked at how to perform some basic queries using the provided fields, as well as tags. In a future post, I will discuss tile servers, and how to serve actual map images.