If my_address_file.csv is a file in the current working directory with an address column named address, then the DeGAUSS command:
docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/geocoder:3.3.0 my_address_file.csv
will produce my_address_file_geocoder_3.3.0_score_threshold_0.5.csv with added columns:
matched_street, matched_city, matched_state, matched_zip: matched address components (e.g., matched_street is the street the geocoder matched with the input address); can be used to investigate input address misspellings, typos, etc.
precision: The method/precision of the geocode. The value will be one of:
  range: interpolated based on address ranges from street segments
  street: center of the matched street
  intersection: intersection of two streets
  zip: centroid of the matched zip code
  city: centroid of the matched city
score: The percentage of text match between the given address and the geocoded result, expressed as a number between 0 and 1. A higher score indicates a closer match. Note that each score is relative within a precision method (e.g., a score of 0.8 with a precision of range is not the same as a score of 0.8 with a precision of street).
lat and lon: geocoded coordinates for the matched address
geocode_result: A character string summarizing the geocoding result. The value will be one of:
  geocoded: the address was geocoded with a precision of either range or street and a score of 0.5 or greater.
  imprecise_geocode: the address was geocoded, but results were suppressed because the precision was intersection, zip, or city and/or the score was less than 0.5.
  po_box: the address was not geocoded because it is a PO Box
  cincy_inst_foster_addr: the address was not geocoded because it is a known institutional address, not a residential address
  non_address_text
: the address was not geocoded because it was blank or listed as “foreign”, “verify”, or “unknown”

Geocodes with a precision of intersection, zip, or city are returned with a missing lat and lon because they are likely too inaccurate and/or too imprecise to be used for further analysis.

lat and lon are also returned as missing if the score is less than 0.5 (regardless of the precision). This threshold can be changed by passing an optional argument after the file name (e.g., docker run --rm -v $PWD:/tmp degauss/geocoder:3.2.0 my_address_file.csv 0.6).

Passing all instead of a numeric score_threshold returns all geocodes regardless of score, precision, or the po_box, cincy_inst_foster_addr, and non_address_text filters.

It is recommended to include only address and an optional identifier column (e.g., id). Fewer columns will increase geocoding speed.
Address data must be in a single column named address.
ZIP codes must be five digits (e.g., 32709) and not “plus four” (e.g., 32709-0000)
Abbreviations may be used (e.g., St. instead of Street or OH instead of Ohio)
Use Arabic numerals instead of written numbers (e.g., 13 instead of thirteen)
Address strings with out-of-order components may fail to geocode (e.g., 3333 Burnet Ave Cincinnati 45229 OH, where the ZIP code precedes the state)

geocoder.db is a SQL database prepared following the instructions here using 2021 TIGER/Line Street Range Address files from the Census.
s3://geomarker/geocoder_2021.db
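
The ZIP code rules in the input formatting guidance above lend themselves to a quick pre-check before geocoding. The following is a minimal sketch and not part of DeGAUSS: the function name check_address and its output labels are hypothetical, and it tests only two of the rules (a five-digit ZIP code is present, and no "plus four" ZIP code is used).

```shell
#!/bin/sh
# Hypothetical helper (not part of DeGAUSS): flag address strings that break
# the ZIP code rules described above. Prints a label and returns non-zero
# when a problem is found.
check_address() {
  addr="$1"

  # Reject "plus four" ZIP codes such as 32709-0000.
  if printf '%s\n' "$addr" | grep -Eq '(^|[^0-9])[0-9]{5}-[0-9]{4}($|[^0-9])'; then
    echo "plus_four_zip"
    return 1
  fi

  # Require a standalone five-digit number somewhere in the string.
  # Limitation: a five-digit house number would also satisfy this check.
  if ! printf '%s\n' "$addr" | grep -Eq '(^|[^0-9])[0-9]{5}($|[^0-9])'; then
    echo "missing_zip"
    return 1
  fi

  echo "ok"
}
```

For example, check_address '3333 Burnet Ave Cincinnati OH 45229' prints ok, while the same address ending in 45229-0000 prints plus_four_zip. The check does not detect out-of-order components, so it complements rather than replaces a manual review.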
For detailed documentation on DeGAUSS, including general usage and installation, please see the DeGAUSS homepage.
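
As a recap of the command and output naming described earlier, the sketch below builds the docker command string and the expected output file name for a given input file. Both function names are hypothetical and not part of DeGAUSS. The output naming pattern is taken from the single example shown for version 3.3.0 with the default threshold of 0.5; that other versions or thresholds follow the same pattern is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: predict the output file name from the pattern
# my_address_file_geocoder_3.3.0_score_threshold_0.5.csv shown above.
# Assumption: other versions and thresholds follow the same pattern.
geocoder_output_name() {
  input="$1"
  version="${2:-3.3.0}"
  threshold="${3:-0.5}"
  printf '%s_geocoder_%s_score_threshold_%s.csv\n' "${input%.csv}" "$version" "$threshold"
}

# Hypothetical helper: print (not run) the docker command for an input file.
# The optional third argument is the score threshold (a number or "all").
geocoder_command() {
  input="$1"
  version="${2:-3.3.0}"
  threshold="${3:-}"
  # ${threshold:+ $threshold} appends the argument only when one was given.
  # $PWD is left unexpanded so the printed command matches the documented form.
  printf 'docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/geocoder:%s %s%s\n' \
    "$version" "$input" "${threshold:+ $threshold}"
}
```

Calling geocoder_command my_address_file.csv prints the same command shown at the top of this document, and geocoder_output_name my_address_file.csv prints my_address_file_geocoder_3.3.0_score_threshold_0.5.csv.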