A Docker container for geocoding addresses and assigning them a census tract and deprivation index
🛑 cchmc_batch_geocoder is now out of date. Please consider using our updated geocoder instead. 🛑

This DeGAUSS container condenses the sequence of (1) geocoding street addresses with a custom geocoder based on 2015 TIGER/Line address range files, (2) joining the geocodes to a 2010 census tract shapefile from NHGIS using the epsg:5072 projection, and (3) adding census tract level data from the community deprivation index all into a single image.
To run, navigate to the directory containing a CSV file with a column called address and call:
docker run --rm=TRUE -v $PWD:/tmp degauss/cchmc_batch_geocoder my_address_file.csv
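Before calling the container, it can help to confirm that the input file really has a column called address. The following is a minimal sketch, not part of DeGAUSS; the helper name has_address_column is hypothetical, and it assumes the column name must match exactly, including case.

```python
import csv
import io


def has_address_column(csv_text: str) -> bool:
    """Return True if the CSV header row contains a column named exactly 'address'."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    return "address" in header


# Illustrative input only; in practice, read the text of my_address_file.csv.
sample = "id,address\n1,3333 Burnet Ave Cincinnati OH 45229\n"
print(has_address_column(sample))
```

Parsing the header with the csv module rather than splitting on commas keeps the check correct when column names are quoted.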
The container tries to simplify interpretation of the geocoding results with some new columns:

- bad_address: TRUE for Cincinnati foster & institutional addresses, “foreign”, “verify”, “unknown”, and missing addresses
- PO: TRUE if a Post Office (PO) box
- precise_geocode: TRUE if geocoding result had a precision method of “street” or “range” and a score of > 0.5

If precise_geocode is FALSE, this means that the address was geocoded but probably not well enough to accurately place it in a census tract. The lat and lon columns and the corresponding census tract variables (like fips_tract_id, dep_index, etc.) for these are set to missing since we cannot accurately place them at a coordinate and in a census tract.
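The precise_geocode rule and the blanking of imprecise results can be sketched as follows. This is an illustration of the rule as described here, not the container's actual code; the input field names precision and score are assumptions about the raw geocoder output, and the function names are hypothetical.

```python
from typing import Optional

# Precision methods treated as precise, per the description above.
PRECISE_METHODS = {"street", "range"}

# Fields set to missing when the geocode is not precise.
MASKED_FIELDS = ("lat", "lon", "fips_tract_id", "dep_index")


def is_precise(precision: Optional[str], score: Optional[float]) -> bool:
    """TRUE only for a 'street' or 'range' match with a score strictly above 0.5."""
    if precision is None or score is None:
        return False
    return precision in PRECISE_METHODS and score > 0.5


def mask_imprecise(row: dict) -> dict:
    """Return a copy of row with coordinates and tract variables blanked if imprecise."""
    out = dict(row)
    out["precise_geocode"] = is_precise(row.get("precision"), row.get("score"))
    if not out["precise_geocode"]:
        for field in MASKED_FIELDS:
            out[field] = None
    return out
```

Note that a score of exactly 0.5 does not count as precise under this reading, because the comparison is strict.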
The addresses that are not successfully geocoded are still in the output file, but are all moved to the top. This allows for quick examination of these addresses for errors. After edits are made, rerun the container. The successful geocodes are cached locally in a folder called geocoding_cache, so addresses that were already geocoded are read from disk instead of being geocoded again. This makes the process of manually editing problematic addresses and rerunning the edited file through the container very quick.
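The idea behind the cache can be illustrated with a small sketch. This is not how the container implements geocoding_cache; it is a generic on-disk cache under assumed names (geocode_with_cache, a JSON file, and a stand-in fake_geocode function) that shows why a rerun only does work for addresses that failed or were edited.

```python
import json
import os
import tempfile


def geocode_with_cache(addresses, geocode, cache_path):
    """Geocode addresses, reusing successful results stored in a JSON file."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as fh:
            cache = json.load(fh)
    for address in addresses:
        if address not in cache:
            result = geocode(address)
            if result is not None:  # only successful geocodes are cached
                cache[address] = result
    with open(cache_path, "w") as fh:
        json.dump(cache, fh)
    return {address: cache.get(address) for address in addresses}


# Stand-in geocoder that records which addresses it was asked to look up.
calls = []


def fake_geocode(address):
    calls.append(address)
    return None if address == "unknown" else {"lat": 0.0, "lon": 0.0}


cache_file = os.path.join(tempfile.mkdtemp(), "cache.json")
first = geocode_with_cache(["1 Main St", "unknown"], fake_geocode, cache_file)
second = geocode_with_cache(["1 Main St", "unknown"], fake_geocode, cache_file)
```

On the second call, only the failed address is sent to the geocoder again; the successful one is read back from the file.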
If your address components are in different columns, you will need to paste them together into a single string. Below are some tips that will help optimize geocoding accuracy and precision:
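- ZIP codes should be five digits (e.g. 32709) and not “plus four” (e.g. 32709-0000)
- Abbreviations may be used (e.g. St. instead of Street or OH instead of Ohio)
- Use Arabic numerals instead of written numbers (e.g. 13 instead of thirteen)
- Address strings with out-of-order components may fail to geocode (e.g. 3333 Burnet Ave Cincinnati 45229 OH)

To find more information on how to install Docker and use DeGAUSS, see the DeGAUSS README or our publications in JAMIA or JOSS.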
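For address components stored in separate columns, the pasting step can be sketched as below. This is a minimal example under stated assumptions: the function name build_address is hypothetical, the street, city, state, ZIP order is inferred from the out-of-order example in the tips, and trimming a “plus four” ZIP code down to its first five digits is a choice made here rather than something the container does for you.

```python
import re


def build_address(street, city, state, zip_code):
    """Join components as 'street city state zip', keeping only a five-digit ZIP."""
    match = re.match(r"^\s*(\d{5})(?:-\d{4})?\s*$", str(zip_code))
    if match is None:
        raise ValueError(f"not a five-digit or plus-four ZIP code: {zip_code!r}")
    parts = [street, city, state, match.group(1)]
    # Collapse repeated whitespace inside each component before joining.
    return " ".join(" ".join(str(part).split()) for part in parts)


address = build_address("3333  Burnet Ave", "Cincinnati", "OH", "45229-0000")
print(address)
```

Raising an error on a malformed ZIP code, instead of passing it through, makes problem rows visible before the file is sent to the geocoder.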