Geocoding using OSM data and SQL/Server 2008

I’ve just completed importing the OpenStreetMap data into SQL/Server 2008 using the new geography data types. Originally I linked the data to Manifold GIS to verify the import process but have just started on my first real project using the data which is address geocoding for a GPS tracking site located in Australia. The OSM data doesn’t contain enough street number information to be useful for most areas but the aim is to provide nearest street names at no charge, versus some local commercial geocoding services that charge around ten cents per address.

Loading the OSM ‘ways’ as a series of around 250,000 linestrings seems to be giving a performance of around 40 geocoding operations per second. As you’d expect performance of the STDistance method depends heavily on the distance, so in the SQL/Server stored procedure I first start by locating streets within 100 meters, followed by 1Km, 10Km and finally 100Km. The series of queries only attempts to find street names not matched by the previous query, and for Australia these numbers seem to give a match rate of about 90%, 9%, 1% and not much for the final distance that results in good performance.

For optimisation of the spatial index I found that changing the default of medium for each level in the index grid to high gave approximately a 100% increase in performance. It makes sense that the finest level of granularity would work best when looking for the nearest distance to points. The following was what I ended up using for the spatial index definition, changing the cells per object didn’t seem to have a great deal of effect for this dataset / application so I left it at the default value of 16.

CREATE SPATIAL INDEX [spatial_IX] ON [dbo].[Ways] 
(
	[LineString]
) USING  GEOGRAPHY_GRID 
WITH (
    GRIDS =(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH), 
    CELLS_PER_OBJECT = 16, PAD_INDEX  = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, 
    ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) 
ON [PRIMARY]

JScript to automatically retrieve / check web page

Recently I discovered a problem with Manifold WMS where if the page wasn’t accessed for a while it would cause the w3wp.exe IIS worker thread to loop. Until I find out the underlying cause I’ve written the map_keepalive.wsf script below to automatically retrieve a web page and scheduled it to run every 5 minutes. I assume something gets spun down and the concurrent WMS requests cause a problem, the map_test.asp page that this script retrieves is just a default IMS page generated from within Manifold.

<package>
  <job id="js">
     <script language="JScript">
	var finished = false;
	var timeout = 0;
	var timeoutSecs = 30;
	var forAppending = 8;
	var xmlhttp = WScript.CreateObject("MSXML2.XMLHTTP");
	xmlhttp.open("GET", "http://www.satsleuth.com", false);
	xmlhttp.onreadystatechange = function() {
		if(xmlhttp.readyState == 4)
			finished = true;
	}
	try {
		xmlhttp.Send("");
	} catch (e) {
	}
	while ((!finished) && (timeout < timeoutSecs))
	{
		timeout++;
		WScript.Sleep(1000);
	}
	var msg = ""
	if (!finished)
		msg = "Timeout occurred";
	else if (xmlhttp.status != 200)
		msg = "Error retrieving map_test.asp, error code: " + xmlhttp.status;
	if (msg != "") {
		var oFSO = WScript.CreateObject("Scripting.FileSystemObject");
		var oFile = oFSO.OpenTextFile("map_keepalive.log", forAppending, true);
		var dt = new Date();
		oFile.WriteLine(dt.toLocaleString() + ": " + msg);
		oFile.Close();
	}
      </script>
   </job>
</package>