This is an example of the workflow a NASH CRN study site might use to add geomarkers to their data with DeGAUSS.
See the Installing Docker webpage.
This step is noted as part of the Docker Installation instructions, but is worth mentioning again here. If your computer is managed by your institution’s IT department, your admin account is likely different from your user account. In this case, someone with admin credentials must add your user account to the docker-users group. To do this, they should run Computer Management as an administrator and navigate to Local Users and Groups > Groups > docker-users, then right-click to add the user to the group. Log out and log back in for the changes to take effect.
Note about Docker Settings:
After installing Docker, but before running containers, go to Docker Settings > Advanced and change memory to greater than 4000 MB (or 4 GiB).
If you are using a Windows computer, also set CPUs to 1.
Click Apply and wait for Docker to restart.
The input file must be a CSV file with one column called address containing all address components. Other columns may be present and will be returned in the output file, but should be kept to a minimum to reduce file size.
An example input CSV file (called my_address_file.csv) might look like:
id | address |
---|---|
13100070229 | 1922 CATALINA AV CINCINNATI, OH 45237 |
54000600136 | 5358 LILIBET CT DELHI TOWNSHIP, OH 45238 |
11200020024 | 630 GREENWOOD AV CINCINNATI, OH 45229 |
Refer to the DeGAUSS geocoding webpage for more information about the input file and address string formatting.
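Before geocoding, it can help to confirm that the header row really contains a column named address. The following is a minimal sketch of such a check, not part of DeGAUSS: the function name has_address_column is hypothetical, and the check assumes that column names themselves contain no commas.

```shell
# Hypothetical helper (not part of DeGAUSS): succeeds when the header row of the
# CSV file given as $1 has a column named exactly "address" (case-sensitive).
# Assumption: column names themselves contain no commas.
has_address_column() {
  head -n 1 "$1" |      # look at the header row only
    tr -d '"\r' |       # drop optional quotes and Windows carriage returns
    tr ',' '\n' |       # one column name per line
    grep -qx 'address'  # exact, whole-line match
}

# Only run the check if the example file is present in the working directory.
if [ -f my_address_file.csv ]; then
  if has_address_column my_address_file.csv; then
    echo "address column found"
  else
    echo "no column named address"
  fi
fi
```

Only the header row is inspected, so commas inside the address values do not affect the result.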
After navigating to your working directory, use the ghcr.io/degauss-org/geocoder container to geocode your addresses.
macOS example call:
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/geocoder:3.2.0 my_address_file.csv
Replace my_address_file.csv with the name of the CSV file to be geocoded and run the call in the shell.
Note for Windows Users:
In this and all following docker calls in this example, replace "$PWD" with "%cd%". Refer to the DeGAUSS Troubleshooting page for more information.
See here for more information on the anatomy of a degauss command.
The output file is written to the same directory and, in our example, will be called my_address_file_geocoder_3.2.0_score_threshold_0.5.csv.
Example output:
id | address | matched_street | matched_zip | matched_city | matched_state | lat | lon | score | precision | geocode_result |
---|---|---|---|---|---|---|---|---|---|---|
54000600136 | 5358 LILIBET CT DELHI TOWNSHIP OH 45238 | Lilibet Ct | 45238 | Delhi Hills | OH | 39.11552 | -84.61902 | 0.754 | range | geocoded |
13100070229 | 1922 CATALINA AV CINCINNATI OH 45237 | Catalina Ave | 45237 | Cincinnati | OH | 39.17112 | -84.46176 | 0.922 | range | geocoded |
11200020024 | 630 GREENWOOD AV CINCINNATI OH 45229 | Greenwood Ave | 45229 | Cincinnati | OH | 39.15321 | -84.49236 | 0.922 | range | geocoded |
For more information on interpreting geocoder output, see here.
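In the examples on this page, each output name is the input name without .csv, followed by the container name, its version, and any options. This is why the file names grow with every step. The pattern can be sketched as below; the helper name degauss_output_name is hypothetical, and the suffix is copied from the example above rather than looked up from the container.

```shell
# Hypothetical helper: build an output file name from an input CSV name ($1)
# and the suffix a container appends ($2), following the examples on this page.
degauss_output_name() {
  printf '%s_%s.csv\n' "${1%.csv}" "$2"
}

degauss_output_name my_address_file.csv geocoder_3.2.0_score_threshold_0.5
# prints my_address_file_geocoder_3.2.0_score_threshold_0.5.csv
```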
macOS example call:
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/dep_index:0.2.1 my_address_file_geocoder_3.2.0_score_threshold_0.5.csv
Replace my_address_file_geocoder_3.2.0_score_threshold_0.5.csv with the name of the geocoded CSV file created in Step 3 and run.
The output file is written to the same directory and, in our example, will be called my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1.csv.
Example output:
id | address | matched_street | matched_zip | matched_city | matched_state | lat | lon | score | precision | geocode_result | census_tract_id | fraction_assisted_income | fraction_high_school_edu | median_income | fraction_no_health_ins | fraction_poverty | fraction_vacant_housing | dep_index |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
54000600136 | 5358 LILIBET CT DELHI TOWNSHIP OH 45238 | Lilibet Ct | 45238 | Delhi Hills | OH | 39.11552 | -84.61902 | 0.754 | range | geocoded | 39061021303 | 0.0380034 | 0.9396114 | 83385 | 0.0236515 | 0.0250104 | 0.0128779 | 0.2087159 |
13100070229 | 1922 CATALINA AV CINCINNATI OH 45237 | Catalina Ave | 45237 | Cincinnati | OH | 39.17112 | -84.46176 | 0.922 | range | geocoded | 39061006300 | 0.1149033 | 0.8787645 | 38395 | 0.0391429 | 0.1641705 | 0.1284085 | 0.3569748 |
11200020024 | 630 GREENWOOD AV CINCINNATI OH 45229 | Greenwood Ave | 45229 | Cincinnati | OH | 39.15321 | -84.49236 | 0.922 | range | geocoded | 39061006800 | 0.3517316 | 0.8051400 | 19783 | 0.0579212 | 0.3901274 | 0.2309613 | 0.5527528 |
More information on the dep_index container
More information on the deprivation index
macOS example call:
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/aadt:0.2.0 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1.csv
Replace my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1.csv with the name of the CSV file created in Step 4 and run.
The output file is written to the same directory and, in our example, will be called my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer.csv.
Example output:
id | address | matched_street | matched_zip | matched_city | matched_state | lat | lon | score | precision | geocode_result | census_tract_id | fraction_assisted_income | fraction_high_school_edu | median_income | fraction_no_health_ins | fraction_poverty | fraction_vacant_housing | dep_index | length_stop_go | length_moving | vehicle_meters_stop_go | vehicle_meters_moving | truck_meters_stop_go | truck_meters_moving |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
54000600136 | 5358 LILIBET CT DELHI TOWNSHIP OH 45238 | Lilibet Ct | 45238 | Delhi Hills | OH | 39.11552 | -84.61902 | 0.754 | range | geocoded | 39061021303 | 0.0380034 | 0.9396114 | 83385 | 0.0236515 | 0.0250104 | 0.0128779 | 0.2087159 | 900 | 0 | 12118633 | 0 | 0 | 0 |
13100070229 | 1922 CATALINA AV CINCINNATI OH 45237 | Catalina Ave | 45237 | Cincinnati | OH | 39.17112 | -84.46176 | 0.922 | range | geocoded | 39061006300 | 0.1149033 | 0.8787645 | 38395 | 0.0391429 | 0.1641705 | 0.1284085 | 0.3569748 | 350 | 0 | 2098249 | 0 | 0 | 0 |
11200020024 | 630 GREENWOOD AV CINCINNATI OH 45229 | Greenwood Ave | 45229 | Cincinnati | OH | 39.15321 | -84.49236 | 0.922 | range | geocoded | 39061006800 | 0.3517316 | 0.8051400 | 19783 | 0.0579212 | 0.3901274 | 0.2309613 | 0.5527528 | 0 | 0 | 0 | 0 | 0 | 0 |
More information on aadt
macOS example call:
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/roads:0.2.1 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer.csv
Replace my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer.csv with the name of the CSV file created in Step 5 and run.
The output file is written to the same directory and, in our example, will be called my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer.csv.
Example output:
id | address | matched_street | matched_zip | matched_city | matched_state | lat | lon | score | precision | geocode_result | census_tract_id | fraction_assisted_income | fraction_high_school_edu | median_income | fraction_no_health_ins | fraction_poverty | fraction_vacant_housing | dep_index | length_stop_go | length_moving | vehicle_meters_stop_go | vehicle_meters_moving | truck_meters_stop_go | truck_meters_moving | dist_to_1100 | dist_to_1200 | length_1100 | length_1200 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
54000600136 | 5358 LILIBET CT DELHI TOWNSHIP OH 45238 | Lilibet Ct | 45238 | Delhi Hills | OH | 39.11552 | -84.61902 | 0.754 | range | geocoded | 39061021303 | 0.0380034 | 0.9396114 | 83385 | 0.0236515 | 0.0250104 | 0.0128779 | 0.2087159 | 900 | 0 | 12118633 | 0 | 0 | 0 | 5793.1013 | 1654.8006 | 0 | 0 |
13100070229 | 1922 CATALINA AV CINCINNATI OH 45237 | Catalina Ave | 45237 | Cincinnati | OH | 39.17112 | -84.46176 | 0.922 | range | geocoded | 39061006300 | 0.1149033 | 0.8787645 | 38395 | 0.0391429 | 0.1641705 | 0.1284085 | 0.3569748 | 350 | 0 | 2098249 | 0 | 0 | 0 | 502.7043 | 534.7928 | 0 | 0 |
11200020024 | 630 GREENWOOD AV CINCINNATI OH 45229 | Greenwood Ave | 45229 | Cincinnati | OH | 39.15321 | -84.49236 | 0.922 | range | geocoded | 39061006800 | 0.3517316 | 0.8051400 | 19783 | 0.0579212 | 0.3901274 | 0.2309613 | 0.5527528 | 0 | 0 | 0 | 0 | 0 | 0 | 1453.0147 | 548.5412 | 0 | 0 |
More information on roads
macOS example call:
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/greenspace:0.3.0 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer.csv
Replace my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer.csv with the name of the CSV file created in Step 6 and run.
The output file is written to the same directory and, in our example, will be called my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0.csv.
Example output:
id | address | matched_street | matched_zip | matched_city | matched_state | lat | lon | score | precision | geocode_result | census_tract_id | fraction_assisted_income | fraction_high_school_edu | median_income | fraction_no_health_ins | fraction_poverty | fraction_vacant_housing | dep_index | length_stop_go | length_moving | vehicle_meters_stop_go | vehicle_meters_moving | truck_meters_stop_go | truck_meters_moving | dist_to_1100 | dist_to_1200 | length_1100 | length_1200 | evi_500 | evi_1500 | evi_2500 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
54000600136 | 5358 LILIBET CT DELHI TOWNSHIP OH 45238 | Lilibet Ct | 45238 | Delhi Hills | OH | 39.11552 | -84.61902 | 0.754 | range | geocoded | 39061021303 | 0.0380034 | 0.9396114 | 83385 | 0.0236515 | 0.0250104 | 0.0128779 | 0.2087159 | 900 | 0 | 12118633 | 0 | 0 | 0 | 5793.1013 | 1654.8006 | 0 | 0 | 0.4183 | 0.4350 | 0.4296 |
13100070229 | 1922 CATALINA AV CINCINNATI OH 45237 | Catalina Ave | 45237 | Cincinnati | OH | 39.17112 | -84.46176 | 0.922 | range | geocoded | 39061006300 | 0.1149033 | 0.8787645 | 38395 | 0.0391429 | 0.1641705 | 0.1284085 | 0.3569748 | 350 | 0 | 2098249 | 0 | 0 | 0 | 502.7043 | 534.7928 | 0 | 0 | 0.3356 | 0.3556 | 0.3864 |
11200020024 | 630 GREENWOOD AV CINCINNATI OH 45229 | Greenwood Ave | 45229 | Cincinnati | OH | 39.15321 | -84.49236 | 0.922 | range | geocoded | 39061006800 | 0.3517316 | 0.8051400 | 19783 | 0.0579212 | 0.3901274 | 0.2309613 | 0.5527528 | 0 | 0 | 0 | 0 | 0 | 0 | 1453.0147 | 548.5412 | 0 | 0 | 0.4157 | 0.4083 | 0.3774 |
More information on greenspace
macOS example call:
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/drivetime:1.1.0 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0.csv cchmc
Replace my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0.csv with the name of the CSV file created in Step 7, replace cchmc with the abbreviation for your site, and run.
The output file is written to the same directory and, in our example, will be called my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc.csv.
Example output:
id | address | matched_street | matched_zip | matched_city | matched_state | lat | lon | score | precision | geocode_result | census_tract_id | fraction_assisted_income | fraction_high_school_edu | median_income | fraction_no_health_ins | fraction_poverty | fraction_vacant_housing | dep_index | length_stop_go | length_moving | vehicle_meters_stop_go | vehicle_meters_moving | truck_meters_stop_go | truck_meters_moving | dist_to_1100 | dist_to_1200 | length_1100 | length_1200 | evi_500 | evi_1500 | evi_2500 | drive_time | distance |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
54000600136 | 5358 LILIBET CT DELHI TOWNSHIP OH 45238 | Lilibet Ct | 45238 | Delhi Hills | OH | 39.11552 | -84.61902 | 0.754 | range | geocoded | 39061021303 | 0.0380034 | 0.9396114 | 83385 | 0.0236515 | 0.0250104 | 0.0128779 | 0.2087159 | 900 | 0 | 12118633 | 0 | 0 | 0 | 5793.1013 | 1654.8006 | 0 | 0 | 0.4183 | 0.4350 | 0.4296 | 30 | 10219.409 |
13100070229 | 1922 CATALINA AV CINCINNATI OH 45237 | Catalina Ave | 45237 | Cincinnati | OH | 39.17112 | -84.46176 | 0.922 | range | geocoded | 39061006300 | 0.1149033 | 0.8787645 | 38395 | 0.0391429 | 0.1641705 | 0.1284085 | 0.3569748 | 350 | 0 | 2098249 | 0 | 0 | 0 | 502.7043 | 534.7928 | 0 | 0 | 0.3356 | 0.3556 | 0.3864 | 18 | 5004.925 |
11200020024 | 630 GREENWOOD AV CINCINNATI OH 45229 | Greenwood Ave | 45229 | Cincinnati | OH | 39.15321 | -84.49236 | 0.922 | range | geocoded | 39061006800 | 0.3517316 | 0.8051400 | 19783 | 0.0579212 | 0.3901274 | 0.2309613 | 0.5527528 | 0 | 0 | 0 | 0 | 0 | 0 | 1453.0147 | 548.5412 | 0 | 0 | 0.4157 | 0.4083 | 0.3774 | 6 | 1755.939 |
More information on drive time
macOS example call:
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/nlcd:0.2.3 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc.csv
Replace my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc.csv with the name of the CSV file created in Step 8 and run.
The output file is written to the same directory and, in our example, will be called my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc_nlcd_0.2.3_400m_buffer.csv.
Note that the landcover container will result in multiple rows per patient, due to multiple years of landcover data.
Example output:
More information on landcover
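Because the landcover output holds several rows per patient, the row count of that file no longer equals the number of patients. One way to see how many rows each patient received is sketched below. The function name rows_per_id is hypothetical, and the sketch assumes that id is the first column (as in the examples on this page), that id values contain no commas, and that no field contains a line break.

```shell
# Hypothetical helper: count the data rows for each value of the first column
# of the CSV file given as $1, skipping the header row.
# Assumptions: id is the first column, id values contain no commas,
# and no quoted field spans more than one line.
rows_per_id() {
  awk -F, 'NR > 1 { n[$1]++ } END { for (id in n) print id, n[id] }' "$1" | sort
}
```

Calling rows_per_id with the name of the landcover output file prints one line per id with its row count. Commas inside later quoted fields do not matter here, because only the text before the first comma is used.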
In this step, you will use a container to merge the output of Step 9 with a CSV file containing census tract level data.
The census tract level data is called nash_crn_census_data_2010.csv and should have been sent to you. Otherwise you can download the data and accompanying data dictionary here.
Note that any CSV opened in Microsoft Excel will not show leading zeros. If a CSV is opened in Excel and then saved, the leading zeros will be removed (e.g., 01234567891 will become 1234567891). Avoid opening and then saving any CSV in Excel.
macOS example call:
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/census_merger:0.1.2 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc_nlcd_0.2.3_400m_buffer.csv nash_crn_census_data_2010.csv
Replace my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc_nlcd_0.2.3_400m_buffer.csv with the name of the CSV file created in Step 9 and run.
The output file is written to the same directory and, in our example, will be called my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc_nlcd_0.2.3_400m_buffer_nash_crn_census_data_2010_census_merger_0.1.2.csv.
Before sharing your data, remove the following columns from both the air pollution output file and the file created by Step 8:
address
matched_street
matched_city
matched_zip
matched_state
lat
lon
census_tract_id
nlcd_cell
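After removing these columns with your tool of choice, a quick header check can confirm that none of them remain. The sketch below only checks and does not remove anything. The function name remaining_identifying_columns is hypothetical, and it assumes that column names contain no commas.

```shell
# Hypothetical helper: print any of the columns listed above that are still
# present in the header row of the CSV file given as $1.
# Exit status is 0 when at least one is found and 1 when none remain.
# Assumption: column names contain no commas.
remaining_identifying_columns() {
  head -n 1 "$1" | tr -d '"\r' | tr ',' '\n' |
    grep -xE 'address|matched_street|matched_city|matched_zip|matched_state|lat|lon|census_tract_id|nlcd_cell'
}
```

If the function prints nothing for the file you intend to share, none of the listed column names are left in its header. Matching is exact, so a differently named column is not flagged.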
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/geocoder:3.2.0 my_address_file.csv
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/dep_index:0.2.1 my_address_file_geocoder_3.2.0_score_threshold_0.5.csv
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/aadt:0.2.0 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1.csv
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/roads:0.2.1 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer.csv
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/greenspace:0.3.0 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer.csv
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/drivetime:1.1.0 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0.csv cchmc
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/nlcd:0.2.3 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc.csv
docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/census_merger:0.1.2 my_address_file_geocoder_3.2.0_score_threshold_0.5_dep_index_0.2.1_aadt_0.2.0_400m_buffer_roads_0.2.1_400m_buffer_greenspace_0.3.0_drivetime_1.1.0_cchmc_nlcd_0.2.3_400m_buffer.csv nash_crn_census_data_2010.csv
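For a different input file or site abbreviation, the same sequence of calls can be generated rather than retyped. The following is a sketch only: the function name degauss_commands is hypothetical, the image tags and file name suffixes are copied from the step-by-step calls on this page, and file names are assumed to contain no spaces. It prints the commands and does not run them.

```shell
# Hypothetical helper: print (do not run) the docker calls for the whole workflow.
# $1 = address CSV file, $2 = site abbreviation used by the drivetime step.
# The body runs in a subshell so its variables do not leak into the caller.
degauss_commands() (
  f="${1%.csv}"
  site="$2"
  # Each row below: image:tag, suffix that step appends to the file name,
  # and any extra argument. "-" marks the last step, whose suffix is not needed.
  while read -r image suffix extra; do
    printf 'docker run --rm -v "$PWD":/tmp ghcr.io/degauss-org/%s %s.csv%s\n' \
      "$image" "$f" "${extra:+ $extra}"
    f="${f}_${suffix}"
  done <<EOF
geocoder:3.2.0 geocoder_3.2.0_score_threshold_0.5
dep_index:0.2.1 dep_index_0.2.1
aadt:0.2.0 aadt_0.2.0_400m_buffer
roads:0.2.1 roads_0.2.1_400m_buffer
greenspace:0.3.0 greenspace_0.3.0
drivetime:1.1.0 drivetime_1.1.0_$site $site
nlcd:0.2.3 nlcd_0.2.3_400m_buffer
census_merger:0.1.2 - nash_crn_census_data_2010.csv
EOF
)

degauss_commands my_address_file.csv cchmc
```

Review the printed commands before running them, and on Windows replace "$PWD" with "%cd%" as noted earlier.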