The DeGAUSS geocoder relies on offline TIGER/Line data useful for geocoding private health information. View the geocoder homepage for more information.
Addresses must be stored as a
CSV file and follow these formatting requirements:
addressand an optional identifier column (e.g.,
id). Fewer columns will increase geocoding speed.
32709) and not “plus four” (i.e.
3333 Burnet Ave Cincinnati 45229 OH)
If using MacOS, open the terminal application. For Windows, use the command prompt or powershell.
Navigate to the working directory where your address file is located. For example, if your file is saved to the Desktop, use
cd Desktop. See here for more information on file navigation.
Enter the DeGAUSS command. See the example below:
my_address_file.csv is a file in the current working directory with an address column named
docker run --rm -v $PWD:/tmp degauss/geocoder:3.0.1 my_address_file.csv
my_address_file_geocoded_v3.0.csv with added columns including
lon, and geocoding diagnostic information.
Note: If you are using a Windows machine to run Docker, please review this page for Windows-specific changes that likely need to be made to successfully use DeGAUSS. You can ignore this if you are using macOS or linux.
The geocoder’s output file includes the following columns:
matched_zip: matched address componets (e.g.,
matched_street is the street the geocoder matched with the input address); can be used to investigate input address misspellings, typos, etc.
precision: The qualitative precision of the geocode. The value will be one of:
range: interpolated based on address ranges from street segments
street: center of the matched street
intersection: intersection of two streets
zip: centroid of the matched zip code
city: centroid of the matched city
score: The percentage of text match between the given address and the geocoded result, expressed as a number between 0 and 1. A higher score indicates a closer match. Note that each score is relative within a precision method (i.e. a
0.8 with a
rangeis not the same as a
0.8 with a
lon: geocoded coordinates for matched address
geocode_result: A qualitative summary of the geocoding result. The value will be one of
po_box: the address was not geocoded because it is a PO Box
cincy_inst_foster_addr: the address was not geocoded because it is a known institutional address, not a residential address
non_address_text: the address was not geocoded because it was blank or listed as “foreign”, “verify”, or “unknown”
imprecise_geocode: the address was geocoded, but results were suppressed because the
city and/or the
score was less than
geocoded: the address was geocoded with a
precision of either
street and a
0.5 or greater.
cityare returned with a missing
lonbecause they are likely too inaccurate and/or too imprecise to be used for further analysis.
lonare also returned as missing if the
scoreis less than
0.5(regardless of the precision). This threshold can be changed by including an optional argument in the docker call (
docker run --rm -v $PWD:/tmp degauss/geocoder:3.0 my_address_file.csv 0.4).