
If my_address_file.csv is a file in the current working directory with an address column named address, then the DeGAUSS command:
docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/postal:0.1.4 my_address_file.csv
will produce my_address_file_postal_0.1.4.csv with added columns:
- cleaned_address: address with non-alphanumeric characters and excess whitespace removed (with dht::clean_address())
- parsed.{address_component}: multiple columns, one for each parsed address component (e.g., parsed.road, parsed.state, parsed.house_number)
- parsed_address: a “parsed” address created by pasting together the available parsed.house_number, parsed.road, parsed.city, parsed.state, and the first five digits of the parsed.postcode address components

After parsing, the parsed addresses can be expanded into several possible normalized addresses using libpostal. This can be useful for matching these addresses with other messy, real-world addresses.
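To make the cleaned_address and parsed_address columns more concrete, here is a minimal Python sketch of the two steps. It is not the container's implementation: clean_address below is only a rough stand-in for dht::clean_address() (it assumes hyphens are kept), build_parsed_address is a hypothetical helper, and joining components with a single space is an assumption. The parsed components are supplied by hand rather than produced by libpostal.

```python
import re

# Order in which components are pasted together, following the column description.
PARSED_ORDER = ["house_number", "road", "city", "state", "postcode"]


def clean_address(address: str) -> str:
    """Rough stand-in for dht::clean_address() (assumed behavior).

    Drops every character that is not a letter, digit, hyphen, or whitespace,
    then collapses runs of whitespace into single spaces.
    """
    kept = re.sub(r"[^A-Za-z0-9\-\s]", "", address)
    return re.sub(r"\s+", " ", kept).strip()


def build_parsed_address(components: dict) -> str:
    """Paste together whichever components are available (hypothetical helper).

    Only the first five digits of the postcode are used; missing or empty
    components are skipped. The space separator is an assumption.
    """
    parts = []
    for name in PARSED_ORDER:
        value = components.get(name)
        if not value:
            continue
        if name == "postcode":
            value = value[:5]
        parts.append(value)
    return " ".join(parts)


cleaned = clean_address("3333  Burnet Ave.,\tCincinnati, OH 45229")

# Hand-written components standing in for libpostal's parser output.
components = {
    "house_number": "3333",
    "road": "burnet ave",
    "city": "cincinnati",
    "state": "oh",
    "postcode": "45229-3026",
}
parsed_address = build_parsed_address(components)
```

In this sketch, a row with no parsed.city would simply yield a shorter parsed_address, since only available components are pasted together.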
If any value is provided as an argument (e.g., “expand”), then the DeGAUSS command:
docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/postal:0.1.4 my_address_file.csv expand
will produce my_address_file_postal_0.1.4_expand.csv with the above columns plus:
- expanded_addresses: the expanded addresses for parsed_address

Because each parsed_address will likely result in more than one expanded address, each input row is duplicated to accommodate them. This means that when expanding addresses, the input CSV file is “expanded” too, so the output file has more rows than the input.
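The row duplication described above can be sketched in Python as follows. This is an illustration only: expand_rows and toy_expand are hypothetical names, and toy_expand is a deliberately tiny stand-in that knows a single abbreviation, not libpostal's expansion logic.

```python
def toy_expand(parsed_address: str) -> list:
    """Toy stand-in for address expansion (NOT libpostal).

    Returns the address itself plus, if it contains the token "ave",
    a variant with "ave" spelled out as "avenue".
    """
    variants = [parsed_address]
    padded = f" {parsed_address} "
    if " ave " in padded:
        variants.append(padded.replace(" ave ", " avenue ").strip())
    return variants


def expand_rows(rows: list, expand) -> list:
    """Duplicate each input row once per expanded address.

    Every output row carries all original columns plus expanded_addresses.
    In this sketch, a row whose expansion list is empty would be dropped.
    """
    out = []
    for row in rows:
        for expanded in expand(row["parsed_address"]):
            out.append({**row, "expanded_addresses": expanded})
    return out


rows = [
    {"id": 1, "parsed_address": "3333 burnet ave cincinnati oh 45229"},
    {"id": 2, "parsed_address": "1 main st springfield oh 45501"},
]
expanded_rows = expand_rows(rows, toy_expand)
```

Here the first input row appears twice in expanded_rows (once per variant) and the second appears once, so two input rows become three output rows. To get back to one row per input address, group on an identifier column from the original file.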
Input addresses are parsed/normalized using libpostal by:
1. removing non-alphanumeric characters (except for -) and excess whitespace (with dht::clean_address())
2. parsing addresses into components with libpostal/src/address_parser (a machine learning model trained on OpenStreetMap and OpenAddresses)

For detailed documentation on DeGAUSS, including general usage and installation, please see the DeGAUSS homepage.