I’ve just completed importing the OpenStreetMap data into SQL/Server 2008 using the new geography data types. Originally I linked the data to Manifold GIS to verify the import process but have just started on my first real project using the data which is address geocoding for a GPS tracking site located in Australia. The OSM data doesn’t contain enough street number information to be useful for most areas but the aim is to provide nearest street names at no charge, versus some local commercial geocoding services that charge around ten cents per address.
Loading the OSM ‘ways’ as a series of around 250,000 linestrings seems to be giving a performance of around 40 geocoding operations per second. As you’d expect performance of the STDistance method depends heavily on the distance, so in the SQL/Server stored procedure I first start by locating streets within 100 meters, followed by 1Km, 10Km and finally 100Km. The series of queries only attempts to find street names not matched by the previous query, and for Australia these numbers seem to give a match rate of about 90%, 9%, 1% and not much for the final distance that results in good performance.
For optimisation of the spatial index I found that changing the default of medium for each level in the index grid to high gave approximately a 100% increase in performance. It makes sense that the finest level of granularity would work best when looking for the nearest distance to points. The following was what I ended up using for the spatial index definition, changing the cells per object didn’t seem to have a great deal of effect for this dataset / application so I left it at the default value of 16.
CREATE SPATIAL INDEX [spatial_IX] ON [dbo].[Ways] ( [LineString] ) USING GEOGRAPHY_GRID WITH ( GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH), CELLS_PER_OBJECT = 16, PAD_INDEX = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]