Can we limit the search range of geocode() function in PostGIS Tiger Geocoder?
I found the server with only 2 states data loaded is much faster than the server with all states loaded. My theory is bad formatted address that don't have a exact hit at first will cost much more time when the geocoder checked all states. With only 2 states this search is limited and stopped much early.
There is a
geocodefunction looks promising if it can limit the search range, since I have enough information or reason to believe the state information in my addresses input are correct. I wrote a query trying to use one state's geometry as the limiting parameter:
SELECT geocode('501 Fairmount DR , Annapolis, MD 20137', 1, the_geom) FROM tiger.state WHERE statefp = '24';
and compared the performance with the simple version
SELECT geocode('501 Fairmount DR , Annapolis, MD 20137',1);
EDIT Note the city and zipcode in this input is wrong which is intentional. If the address is perfect there is no point to limit the search range since the geocoder can go to the right table at first try. Only this attempt is meaningful only when the zipcode or city is wrong.
I didn't find performance gain with the parameter. Instead it lost the performance gain from caching, which usually came from running same query immediately again because all the needed data have been cached in RAM.
Maybe my usage is not proper, or this parameter is not intended to work as I expected. However if the search range can be limited, the performance gain could be substantial, since it's the bad formatted addresses took the most time to geocode, and they also often mess up the already cached data because the geocoder need to search for states, even all my input are in one state and all data can be cached in RAM.
No surprise the geometry filter doesn't boost performance. It was more designed to prevent matching against areas where absolutely an address can not exist. You could try simpyfying the geometry with ST_Simplify, but even that might not help much.
What ideally you might want to do is normalize the addresses first especially if your problem is bad normalization and then only filter for ones falling in the right state.
So something like:
WITH addys AS (SELECT original_address, normalize(original_address) As addy FROM your_table LIMIT 100) SELECT geocode(addy,1) FROM addys WHERE (addy).stateAbbrev = 'MD';
If it can determine the address, it should be smart enough to go right away to the right tables. If you are noticing significant slow down, make sure you indexed and vacuum analyzed from the parent tables. That often is culprit for slow geocoding as you add more states. I've forgotten that myself on many occasion and wondered why things were so slow.
SELECT install_missing_indexes(); vacuum analyze verbose tiger.addr; vacuum analyze verbose tiger.edges; vacuum analyze verbose tiger.faces; vacuum analyze verbose tiger.featnames; vacuum analyze verbose tiger.place; vacuum analyze verbose tiger.cousub; vacuum analyze verbose tiger.county; vacuum analyze verbose tiger.state;