A geocoder that relies on offline TIGER/Line data, which makes it useful for geocoding private health information.
Note that you can call an older version of the geocoder by specifying its version number in the docker call.
Ideally, the input file includes only the address column and an optional identifier column (e.g., id). Fewer columns will increase geocoding speed.
ZIP codes should be five digits (e.g., 32709) and not “plus four”.
3333 Burnet Ave Cincinnati 45229 OH)
If my_address_file.csv is a file in the current working directory with an address column named address, then the DeGAUSS command:
docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/geocoder:3.0.2 my_address_file.csv
will produce my_address_file_geocoded_v3.0.2.csv with added columns including lat, lon, and geocoding diagnostic information.
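Because the geocoder expects a column named address, it can help to check the file header before starting a container. The following is a minimal sketch and not part of DeGAUSS: the helper names has_address_column and geocode_file and the GEOCODER_IMAGE variable are hypothetical, and the header check assumes a simple comma-separated header with no embedded commas.

```shell
# Sketch only: GEOCODER_IMAGE and both helper names are illustrative,
# not something DeGAUSS provides.
GEOCODER_IMAGE="ghcr.io/degauss-org/geocoder:3.0.2"

# Succeeds if the first line of the CSV has a column named exactly "address".
# Assumes a simple header: comma-separated names, optionally double-quoted.
has_address_column() {
  head -n 1 "$1" | tr -d '"\r' | tr ',' '\n' | grep -qx 'address'
}

# Runs the geocoder on a file in the current working directory,
# but only after the header check passes.
geocode_file() {
  if ! has_address_column "$1"; then
    echo "error: $1 has no column named 'address'" >&2
    return 1
  fi
  docker run --rm -v "$PWD":/tmp "$GEOCODER_IMAGE" "$1"
}
```

For example, geocode_file my_address_file.csv would issue the same docker call as above, and would stop early with an error message if the address column were missing.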
Note: If you are using a Windows machine to run Docker, please review this page for Windows-specific changes that likely need to be made to successfully use DeGAUSS. You can ignore this if you are using macOS or Linux.
The geocoder’s output file includes the following columns:
matched_zip: matched address components (e.g., matched_street is the street the geocoder matched with the input address); can be used to investigate input address misspellings, typos, etc.
precision: The qualitative precision of the geocode. The value will be one of:
range: interpolated based on address ranges from street segments
street: center of the matched street
intersection: intersection of two streets
zip: centroid of the matched zip code
city: centroid of the matched city
score: The percentage of text match between the given address and the geocoded result, expressed as a number between 0 and 1. A higher score indicates a closer match. Note that each score is relative within a precision method (i.e., a score of 0.8 with a precision of range is not the same as a score of 0.8 with a different precision).
lat, lon: geocoded coordinates for matched address
geocode_result: A qualitative summary of the geocoding result. The value will be one of:
po_box: the address was not geocoded because it is a PO Box
cincy_inst_foster_addr: the address was not geocoded because it is a known institutional address, not a residential address
non_address_text: the address was not geocoded because it was blank or listed as “foreign”, “verify”, or “unknown”
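imprecise_geocode: the address was geocoded, but results were suppressed because the precision was too coarse (such as city) and/or the score was less than the threshold (0.5 by default)
geocoded: the address was geocoded with an acceptable precision and a score at or above the threshold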
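To see how many addresses fall into each geocode_result category, the output file can be tallied from the command line. This is a rough sketch under stated assumptions: tally_geocode_result is a hypothetical helper name, the output is assumed to have a header row containing geocode_result, and no field is assumed to contain embedded commas (a real CSV parser would be needed otherwise).

```shell
# Sketch only: tally_geocode_result is an illustrative helper, not part of DeGAUSS.
# Prints "value,count" lines, sorted by value, for the geocode_result column.
# Assumes plain comma-separated fields with no embedded commas.
tally_geocode_result() {
  awk -F',' '
    NR == 1 {
      # Locate the geocode_result column by name in the header row.
      for (i = 1; i <= NF; i++) {
        gsub(/"|\r/, "", $i)
        if ($i == "geocode_result") col = i
      }
      if (!col) {
        print "geocode_result column not found" > "/dev/stderr"
        exit 1
      }
      next
    }
    {
      # Strip quotes and carriage returns, then count each distinct value.
      v = $col
      gsub(/"|\r/, "", v)
      count[v]++
    }
    END { for (k in count) print k "," count[k] }
  ' "$1" | sort
}
```

Calling tally_geocode_result my_address_file_geocoded_v3.0.2.csv would print one line per category, which makes it easy to spot an unexpectedly large share of imprecise_geocode or non_address_text rows.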
Geocodes with a coarse precision (such as city) are returned with a missing lat and lon because they are likely too inaccurate and/or too imprecise to be used for further analysis. lat and lon are also returned as missing if the score is less than 0.5 (regardless of the precision). This threshold can be changed by including an optional argument in the docker call (docker run --rm -v $PWD:/tmp degauss/geocoder:3.0 my_address_file.csv 0.4).
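Since score is a number between 0 and 1, a custom threshold outside that range is probably a typo. The sketch below checks the value before passing it along. The helper names is_valid_threshold and geocode_with_threshold are hypothetical, and the check is deliberately strict: it only accepts values written with a leading digit, such as 0.4.

```shell
# Sketch only: helper names are illustrative and not part of DeGAUSS.

# Succeeds if the argument is a decimal number between 0 and 1 inclusive,
# written with a leading digit (for example 0, 0.4, or 1).
is_valid_threshold() {
  printf '%s\n' "$1" | grep -Eq '^(0(\.[0-9]+)?|1(\.0+)?)$'
}

# Runs the docker call from the text, appending the threshold only if one is given.
geocode_with_threshold() {
  if [ -n "${2:-}" ]; then
    if ! is_valid_threshold "$2"; then
      echo "error: threshold must be between 0 and 1, got '$2'" >&2
      return 1
    fi
    docker run --rm -v "$PWD":/tmp degauss/geocoder:3.0 "$1" "$2"
  else
    docker run --rm -v "$PWD":/tmp degauss/geocoder:3.0 "$1"
  fi
}
```

With these helpers, geocode_with_threshold my_address_file.csv 0.4 would issue the call shown in the text, while a value such as 4 would be rejected before any container starts.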
For detailed documentation on DeGAUSS, including general usage and installation, please see the DeGAUSS homepage.