Introduction

This is an example of the workflow a NASH CRN study site might use to add geomarkers to their data with DeGAUSS.

If you have used DeGAUSS, would you mind providing us some feedback and completing a short survey?

Step 0: Install Docker

See the Installing Docker webpage.

One point from the Docker installation instructions is worth repeating here. If your computer is managed by your institution’s IT department, your admin account is likely different from your user account. In this case, someone with admin credentials must add your user account to the docker-users group. To do this, they should run Computer Management as an administrator, navigate to Local Users and Groups > Groups > docker-users, and right-click to add the user to the group. Then log out and log back in for the change to take effect.

Note about Docker Settings:
After installing Docker, but before running any containers, go to Docker Settings > Advanced and change memory to greater than 4000 MB (or 4 GiB).

If you are using a Windows computer, also set CPUs to 1.

Click Apply and wait for Docker to restart.
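As a rough way to confirm the memory setting from the shell, the sketch below compares a byte count against the 4000 MB threshold mentioned above. The helper name has_enough_memory is hypothetical. The usage line assumes that docker info reports the memory available to Docker in its MemTotal field, which may differ slightly from the value entered in the settings dialog.

```shell
# Hypothetical helper: succeeds when the given byte count is above 4000 MB.
has_enough_memory() {
  [ "$1" -gt 4000000000 ]
}

# Usage (needs a running Docker daemon, so it is left commented out):
#   has_enough_memory "$(docker info --format '{{.MemTotal}}')" && echo "memory OK"
```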

Step 1: Preparing Your Input File

The input file must be a CSV file with one column called address containing all address components. Other columns may be present and will be returned in the output file, but should be kept to a minimum to reduce file size.

An example input CSV file (called my_address_file.csv) might look like:

id           address
13100070229  1922 CATALINA AV CINCINNATI, OH 45237
54000600136  5358 LILIBET CT DELHI TOWNSHIP, OH 45238
11200020024  630 GREENWOOD AV CINCINNATI, OH 45229

Refer to the DeGAUSS geocoding webpage for more information about the input file and address string formatting.
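Before geocoding, it can help to confirm that the header row really contains a column named address. The following is a minimal sketch, not part of DeGAUSS. The helper name has_address_column is hypothetical, and it inspects only the first line of the file, splitting on commas and ignoring quotes and Windows line endings.

```shell
# Hypothetical helper: succeeds when the CSV header has a column named exactly "address".
has_address_column() {
  head -n 1 "$1" | tr -d '"\r' | tr ',' '\n' | grep -qx 'address'
}

# Usage:
#   has_address_column my_address_file.csv && echo "address column found"
```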

Step 2: Navigating the Shell

Open a shell (Terminal on macOS or CMD on Windows). We will use this shell for the rest of the steps in this example.

Navigate to the directory where the CSV file to be geocoded is located. See here for help navigating a filesystem using the command line.

For those unfamiliar with the command line, a simple approach is to save the file to be geocoded to the Desktop, then navigate to your Desktop folder with the command cd Desktop.

Step 3: Geocoding

After navigating to your working directory, use the ghcr.io/degauss-org/geocoder container to geocode your addresses.

macOS example call:

docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/geocoder:3.2.0 my_address_file.csv

Replace my_address_file.csv with the name of the CSV file to be geocoded and run the call in the shell.


Note for Windows Users:
In this and all following docker calls in this example, replace "$PWD" with "%cd%". Refer to the DeGAUSS Troubleshooting page for more information.

See here for more information on the anatomy of a DeGAUSS command.

The output file is written to the same directory and, in our example, will be called my_address_file_geocoder_3.2.0_score_threshold_0.5.csv.

Example output:

id           address                                  matched_street  matched_zip  matched_city  matched_state  lat       lon        score  precision  geocode_result
54000600136  5358 LILIBET CT DELHI TOWNSHIP OH 45238  Lilibet Ct      45238        Delhi Hills   OH             39.11552  -84.61902  0.754  range      geocoded
13100070229  1922 CATALINA AV CINCINNATI OH 45237     Catalina Ave    45237        Cincinnati    OH             39.17112  -84.46176  0.922  range      geocoded
11200020024  630 GREENWOOD AV CINCINNATI OH 45229     Greenwood Ave   45229        Cincinnati    OH             39.15321  -84.49236  0.922  range      geocoded
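To get a quick tally of how many rows were geocoded, something like the sketch below may be enough. It assumes that geocode_result is the last column of the geocoder output, as in the example above, and that the last field contains no commas. The helper name count_geocoded is hypothetical.

```shell
# Hypothetical helper: counts data rows whose last column equals "geocoded".
# Assumes geocode_result is the final column of the file.
count_geocoded() {
  awk -F, 'NR > 1 { v = $NF; gsub(/["\r]/, "", v); if (v == "geocoded") n++ }
           END { print n + 0 }' "$1"
}

# Usage:
#   count_geocoded my_address_file_geocoder_3.2.0_score_threshold_0.5.csv
```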

For more information on interpreting geocoder output, see here.
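Each step in this example names its output by appending a suffix to the input file name, so the file names grow with every container. The sketch below shows one way to build the expected name in the shell rather than typing it by hand. The helper name output_name is hypothetical, and the suffix is copied from the geocoding example above.

```shell
# Hypothetical helper: strips a trailing .csv from $1 and appends _<suffix>.csv.
output_name() {
  printf '%s_%s.csv\n' "${1%.csv}" "$2"
}

output_name my_address_file.csv geocoder_3.2.0_score_threshold_0.5
# prints: my_address_file_geocoder_3.2.0_score_threshold_0.5.csv
```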

Step 4: Deprivation Index

macOS example call:

docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/dep_index:0.2.1 my_address_file_geocoder_3.2.0_score_threshold_0.5.csv

Replace my_address_file_geocoder_3.2.0_score_threshold_0.5.csv with the name of the geocoded CSV file created in Step 3 and run.

The output file is written to the same directory and, in our example, will be called my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1.csv.

Example output:

id address matched_street matched_zip matched_city matched_state lat lon score precision geocode_result census_tract_id fraction_assisted_income fraction_high_school_edu median_income fraction_no_health_ins fraction_poverty fraction_vacant_housing dep_index
54000600136 5358 LILIBET CT DELHI TOWNSHIP OH 45238 Lilibet Ct 45238 Delhi Hills OH 39.11552 -84.61902 0.754 range geocoded 39061021303 0.0380034 0.9396114 83385 0.0236515 0.0250104 0.0128779 0.2087159
13100070229 1922 CATALINA AV CINCINNATI OH 45237 Catalina Ave 45237 Cincinnati OH 39.17112 -84.46176 0.922 range geocoded 39061006300 0.1149033 0.8787645 38395 0.0391429 0.1641705 0.1284085 0.3569748
11200020024 630 GREENWOOD AV CINCINNATI OH 45229 Greenwood Ave 45229 Cincinnati OH 39.15321 -84.49236 0.922 range geocoded 39061006800 0.3517316 0.8051400 19783 0.0579212 0.3901274 0.2309613 0.5527528

More information on the dep_index container

More information on the deprivation index

Step 5: Average Annual Daily Traffic

macOS example call:

docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/aadt:0.2.0 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1.csv 

Replace my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1.csv with the name of the CSV file created in Step 4 and run.

The output file is written to the same directory and, in our example, will be called my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer.csv.

Example output:

id address matched_street matched_zip matched_city matched_state lat lon score precision geocode_result census_tract_id fraction_assisted_income fraction_high_school_edu median_income fraction_no_health_ins fraction_poverty fraction_vacant_housing dep_index length_stop_go length_moving vehicle_meters_stop_go vehicle_meters_moving truck_meters_stop_go truck_meters_moving
54000600136 5358 LILIBET CT DELHI TOWNSHIP OH 45238 Lilibet Ct 45238 Delhi Hills OH 39.11552 -84.61902 0.754 range geocoded 39061021303 0.0380034 0.9396114 83385 0.0236515 0.0250104 0.0128779 0.2087159 900 0 12118633 0 0 0
13100070229 1922 CATALINA AV CINCINNATI OH 45237 Catalina Ave 45237 Cincinnati OH 39.17112 -84.46176 0.922 range geocoded 39061006300 0.1149033 0.8787645 38395 0.0391429 0.1641705 0.1284085 0.3569748 350 0 2098249 0 0 0
11200020024 630 GREENWOOD AV CINCINNATI OH 45229 Greenwood Ave 45229 Cincinnati OH 39.15321 -84.49236 0.922 range geocoded 39061006800 0.3517316 0.8051400 19783 0.0579212 0.3901274 0.2309613 0.5527528 0 0 0 0 0 0

More information on aadt

Step 6: Proximity to Roads

macOS example call:

docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/roads:0.2.1 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer.csv

Replace my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer.csv with the name of the CSV file created in Step 5 and run.

The output file is written to the same directory and, in our example, will be called my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer.csv.

Example output:

id address matched_street matched_zip matched_city matched_state lat lon score precision geocode_result census_tract_id fraction_assisted_income fraction_high_school_edu median_income fraction_no_health_ins fraction_poverty fraction_vacant_housing dep_index length_stop_go length_moving vehicle_meters_stop_go vehicle_meters_moving truck_meters_stop_go truck_meters_moving dist_to_1100 dist_to_1200 length_1100 length_1200
54000600136 5358 LILIBET CT DELHI TOWNSHIP OH 45238 Lilibet Ct 45238 Delhi Hills OH 39.11552 -84.61902 0.754 range geocoded 39061021303 0.0380034 0.9396114 83385 0.0236515 0.0250104 0.0128779 0.2087159 900 0 12118633 0 0 0 5793.1013 1654.8006 0 0
13100070229 1922 CATALINA AV CINCINNATI OH 45237 Catalina Ave 45237 Cincinnati OH 39.17112 -84.46176 0.922 range geocoded 39061006300 0.1149033 0.8787645 38395 0.0391429 0.1641705 0.1284085 0.3569748 350 0 2098249 0 0 0 502.7043 534.7928 0 0
11200020024 630 GREENWOOD AV CINCINNATI OH 45229 Greenwood Ave 45229 Cincinnati OH 39.15321 -84.49236 0.922 range geocoded 39061006800 0.3517316 0.8051400 19783 0.0579212 0.3901274 0.2309613 0.5527528 0 0 0 0 0 0 1453.0147 548.5412 0 0

More information on roads

Step 7: Greenspace

macOS example call:

docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/greenspace:0.3.0 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer.csv

Replace my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer.csv with the name of the CSV file created in Step 6 and run.

The output file is written to the same directory and, in our example, will be called my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0.csv.

Example output:

id address matched_street matched_zip matched_city matched_state lat lon score precision geocode_result census_tract_id fraction_assisted_income fraction_high_school_edu median_income fraction_no_health_ins fraction_poverty fraction_vacant_housing dep_index length_stop_go length_moving vehicle_meters_stop_go vehicle_meters_moving truck_meters_stop_go truck_meters_moving dist_to_1100 dist_to_1200 length_1100 length_1200 evi_500 evi_1500 evi_2500
54000600136 5358 LILIBET CT DELHI TOWNSHIP OH 45238 Lilibet Ct 45238 Delhi Hills OH 39.11552 -84.61902 0.754 range geocoded 39061021303 0.0380034 0.9396114 83385 0.0236515 0.0250104 0.0128779 0.2087159 900 0 12118633 0 0 0 5793.1013 1654.8006 0 0 0.4183 0.4350 0.4296
13100070229 1922 CATALINA AV CINCINNATI OH 45237 Catalina Ave 45237 Cincinnati OH 39.17112 -84.46176 0.922 range geocoded 39061006300 0.1149033 0.8787645 38395 0.0391429 0.1641705 0.1284085 0.3569748 350 0 2098249 0 0 0 502.7043 534.7928 0 0 0.3356 0.3556 0.3864
11200020024 630 GREENWOOD AV CINCINNATI OH 45229 Greenwood Ave 45229 Cincinnati OH 39.15321 -84.49236 0.922 range geocoded 39061006800 0.3517316 0.8051400 19783 0.0579212 0.3901274 0.2309613 0.5527528 0 0 0 0 0 0 1453.0147 548.5412 0 0 0.4157 0.4083 0.3774

More information on greenspace

Step 8: Drive Time

macOS example call:

docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/drivetime:1.1.0 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0.csv cchmc

Replace my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0.csv with the name of the CSV file created in Step 7, replace cchmc with the abbreviation for your site, and run.

List of site abbreviations

The output file is written to the same directory and, in our example, will be called my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc.csv.

Example output:

id address matched_street matched_zip matched_city matched_state lat lon score precision geocode_result census_tract_id fraction_assisted_income fraction_high_school_edu median_income fraction_no_health_ins fraction_poverty fraction_vacant_housing dep_index length_stop_go length_moving vehicle_meters_stop_go vehicle_meters_moving truck_meters_stop_go truck_meters_moving dist_to_1100 dist_to_1200 length_1100 length_1200 evi_500 evi_1500 evi_2500 drive_time distance
54000600136 5358 LILIBET CT DELHI TOWNSHIP OH 45238 Lilibet Ct 45238 Delhi Hills OH 39.11552 -84.61902 0.754 range geocoded 39061021303 0.0380034 0.9396114 83385 0.0236515 0.0250104 0.0128779 0.2087159 900 0 12118633 0 0 0 5793.1013 1654.8006 0 0 0.4183 0.4350 0.4296 30 10219.409
13100070229 1922 CATALINA AV CINCINNATI OH 45237 Catalina Ave 45237 Cincinnati OH 39.17112 -84.46176 0.922 range geocoded 39061006300 0.1149033 0.8787645 38395 0.0391429 0.1641705 0.1284085 0.3569748 350 0 2098249 0 0 0 502.7043 534.7928 0 0 0.3356 0.3556 0.3864 18 5004.925
11200020024 630 GREENWOOD AV CINCINNATI OH 45229 Greenwood Ave 45229 Cincinnati OH 39.15321 -84.49236 0.922 range geocoded 39061006800 0.3517316 0.8051400 19783 0.0579212 0.3901274 0.2309613 0.5527528 0 0 0 0 0 0 1453.0147 548.5412 0 0 0.4157 0.4083 0.3774 6 1755.939

More information on drive time

Step 9: Landcover

macOS example call:

docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/nlcd:0.2.3 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc.csv

Replace my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc.csv with the name of the CSV file created in Step 8 and run.

The output file is written to the same directory and, in our example, will be called my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc_nlcd_0.2.3_400m_buffer.csv.

Note that the landcover container will result in multiple rows per patient, due to multiple years of landcover data.

Example output:

More information on landcover
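Because the landcover container returns several rows per patient, it may be useful to see how many rows each id has in the output. The sketch below assumes that id is the first column, as in the example files, and that its values contain no commas. The helper name rows_per_id is hypothetical.

```shell
# Hypothetical helper: prints each value of the first column with its row count.
# Assumes the first column is the patient id and the first line is a header.
rows_per_id() {
  awk -F, 'NR > 1 { n[$1]++ } END { for (id in n) print id, n[id] }' "$1" | sort
}

# Usage (pass the file created in Step 9):
#   rows_per_id landcover_output.csv
```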

Step 10: Census Tract Data

In this step, you will use a container to merge the output of Step 9 with a CSV file containing census tract level data.

The census tract level data file is called nash_crn_census_data_2010.csv and should have been sent to you. If not, you can download the data and accompanying data dictionary here.

Note that Microsoft Excel does not display leading zeros in a CSV. If a CSV is opened in Excel and then saved, the leading zeros are removed (e.g., 01234567891 becomes 1234567891). Avoid opening any CSV in Excel and saving it.
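One way to spot lost leading zeros without opening Excel is to look for tract identifiers that are not 11 characters long, the length of the identifier in the example above. The sketch below is an illustration only. The helper name short_ids is hypothetical, the column name is passed as an argument because the identifier column in your census file may be named differently, and the simple comma splitting assumes no field contains a comma. Missing values will also be listed.

```shell
# Hypothetical helper: prints values in the named column that are not 11 characters long.
# $1 = CSV file, $2 = column name (for example census_tract_id)
short_ids() {
  awk -F, -v col="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        h = $i
        gsub(/["\r]/, "", h)
        if (h == col) c = i
      }
      next
    }
    c {
      v = $c
      gsub(/["\r]/, "", v)
      if (length(v) != 11) print v
    }
  ' "$1"
}

# Usage:
#   short_ids my_file.csv census_tract_id
```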

macOS example call:

docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/census_merger:0.1.2 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc_nlcd_0.2.3_400m_buffer.csv nash_crn_census_data_2010.csv

Replace my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc_nlcd_0.2.3_400m_buffer.csv with the name of the CSV file created in Step 9 and run.

The output file is written to the same directory and, in our example, will be called my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc_nlcd_0.2.3_400m_buffer_nash_crn_census_data_2010_census_merger_0.1.2.csv.

Step 11: Removing PHI

Before sharing your data, remove the following columns from both the air pollution output file and the file created by Step 8:

Code Summary

docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/geocoder:3.2.0 my_address_file.csv
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/dep_index:0.2.1 my_address_file_geocoder_3.2.0_score_threshold_0.5.csv
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/aadt:0.2.0 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1.csv 
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/roads:0.2.1 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer.csv
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/greenspace:0.3.0 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer.csv
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/drivetime:1.1.0 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0.csv cchmc
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/nlcd:0.2.3 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc.csv
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/census_merger:0.1.2 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc_nlcd_0.2.3_400m_buffer.csv nash_crn_census_data_2010.csv
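For macOS users who prefer not to retype the growing file names, the calls could be wrapped in a small script along the following lines. This is a sketch, not an official DeGAUSS tool. The function names degauss and run_pipeline are hypothetical, the image tags and output suffixes are copied from Steps 3 through 10, and it assumes each container exits with a non-zero status on failure so that the chain stops at the first failed step.

```shell
#!/bin/sh
# Sketch: run the DeGAUSS steps in order, stopping at the first failure.

# $1 = image:tag; the remaining arguments are passed to the container.
degauss() {
  image="$1"
  shift
  docker run --rm -v "$PWD":/tmp "ghcr.io/degauss-org/$image" "$@"
}

# $1 = input CSV, $2 = site abbreviation for the drivetime container.
run_pipeline() {
  f="${1%.csv}"   # current file name without the .csv extension
  site="$2"
  census="nash_crn_census_data_2010"

  degauss geocoder:3.2.0 "$f.csv" || return 1
  f="${f}_geocoder_3.2.0_score_threshold_0.5"
  degauss dep_index:0.2.1 "$f.csv" || return 1
  f="${f}_dep_index_0.2.1"
  degauss aadt:0.2.0 "$f.csv" || return 1
  f="${f}_aadt_0.2.0_400m_buffer"
  degauss roads:0.2.1 "$f.csv" || return 1
  f="${f}_roads_0.2.1_400m_buffer"
  degauss greenspace:0.3.0 "$f.csv" || return 1
  f="${f}_greenspace_0.3.0"
  degauss drivetime:1.1.0 "$f.csv" "$site" || return 1
  f="${f}_drivetime_1.1.0_${site}"
  degauss nlcd:0.2.3 "$f.csv" || return 1
  f="${f}_nlcd_0.2.3_400m_buffer"
  degauss census_merger:0.1.2 "$f.csv" "$census.csv" || return 1
}

# Usage (run from the directory that holds both CSV files):
#   run_pipeline my_address_file.csv cchmc
```

Keeping the suffixes next to the image tags means that changing a container version requires updating both values on adjacent lines.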