
DeGAUSS Schwartz Guide

This step-by-step guide covers how to add daily spatiotemporal PM25, NO2, and O3 estimates from models created by Joel Schwartz and used within CREW. We assume that the user starts with geocoded data, but DeGAUSS can also be used to geocode address data if needed.

Please note that each of these DeGAUSS commands uses the most recent version of its container, but an older version can be specified for any of them. For detailed documentation on DeGAUSS, including general usage and installation, please see https://degauss.org.


1. Assemble Data

id lat lon start_date end_date index_date
A 39.1967 -84.5826 2000-10-20 2000-10-23 2000-02-14
B 33.9729 -118.2328 2008-11-14 2008-11-16 2008-04-11
C 35.8718 -78.6385 2004-01-30 2004-02-02 2003-05-02
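
Before running the containers, it can help to confirm that the assembled file has the expected columns and parseable values. The following is a minimal sketch, not part of DeGAUSS: it assumes a comma-separated file with the column names shown in the table above, treats index_date as optional, and uses a hypothetical helper name, `check_input_rows`.

```python
import csv
import io
from datetime import date

# Assumed required columns, taken from the example table; index_date is treated as optional.
REQUIRED = ["id", "lat", "lon", "start_date", "end_date"]

def check_input_rows(csv_text):
    """Return a list of problems found in the input text (empty if none were found)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            lat, lon = float(row["lat"]), float(row["lon"])
            start = date.fromisoformat(row["start_date"])
            end = date.fromisoformat(row["end_date"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: unparseable coordinate or date")
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append(f"line {line_no}: coordinates out of range")
        if end < start:
            problems.append(f"line {line_no}: end_date is before start_date")
    return problems
```

To check a file on disk, read it as text and pass the contents to the function, for example `check_input_rows(open("my_address_file_geocoded.csv").read())`.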

2. Use geolocation to find grid identifier

docker run --rm -v $PWD:/tmp degauss/schwartz_grid_lookup my_address_file_geocoded.csv

id lat lon start_date end_date index_date site_index sitecode
A 39.1967 -84.5826 2000-10-20 2000-10-23 2000-02-14 9607238 211050640897
B 33.9729 -118.2328 2008-11-14 2008-11-16 2008-04-11 324003 208050280324
C 35.8718 -78.6385 2004-01-30 2004-02-02 2003-05-02 9784599 211050904096

3. Use grid identifiers and dates to get exposure estimates

docker run --rm -v $PWD:/tmp degauss/schwartz my_address_file_geocoded_schwartz_site_index.csv

id lat lon start_date end_date index_date site_index sitecode date gh6 gh3 year gh3_combined PM25 NO2 O3 days_from_index_date
A 39.1967 -84.5826 2000-10-20 2000-10-23 2000-02-14 9607238 211050640897 2000-10-20 dngz52 dng 2000 dng 22.6 64.1 48.5 249
A 39.1967 -84.5826 2000-10-20 2000-10-23 2000-02-14 9607238 211050640897 2000-10-21 dngz52 dng 2000 dng 37.2 48.4 49.5 250
A 39.1967 -84.5826 2000-10-20 2000-10-23 2000-02-14 9607238 211050640897 2000-10-22 dngz52 dng 2000 dng 28.4 52.9 60.2 251
A 39.1967 -84.5826 2000-10-20 2000-10-23 2000-02-14 9607238 211050640897 2000-10-23 dngz52 dng 2000 dng 42.3 56.6 43.1 252
C 35.8718 -78.6385 2004-01-30 2004-02-02 2003-05-02 9784599 211050904096 2004-01-30 dq2h4d dq2 2004 dq2 7.1 27.6 27.8 273
C 35.8718 -78.6385 2004-01-30 2004-02-02 2003-05-02 9784599 211050904096 2004-01-31 dq2h4d dq2 2004 dq2 6.8 32.8 30.7 274
C 35.8718 -78.6385 2004-01-30 2004-02-02 2003-05-02 9784599 211050904096 2004-02-01 dq2h4d dq2 2004 dq2 13.7 32.1 31.3 275
C 35.8718 -78.6385 2004-01-30 2004-02-02 2003-05-02 9784599 211050904096 2004-02-02 dq2h4d dq2 2004 dq2 11.8 37 33.5 276
B 33.9729 -118.2328 2008-11-14 2008-11-16 2008-04-11 324003 208050280324 2008-11-14 9q5cm2 9q5 2008 9q5 24.2 81.7 28 217
B 33.9729 -118.2328 2008-11-14 2008-11-16 2008-04-11 324003 208050280324 2008-11-15 9q5cm2 9q5 2008 9q5 36.9 66.8 33 218
B 33.9729 -118.2328 2008-11-14 2008-11-16 2008-04-11 324003 208050280324 2008-11-16 9q5cm2 9q5 2008 9q5 85.4 59.5 36.2 219
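
The output above contains one row per day between start_date and end_date (inclusive), and the days_from_index_date values are consistent with the number of days between index_date and each date. The sketch below reproduces that arithmetic for illustration only; it is not the container's implementation, and the function name `expand_days` is hypothetical.

```python
from datetime import date, timedelta

def expand_days(start_date, end_date, index_date=None):
    """Yield one record per day from start_date through end_date (inclusive)."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    index = date.fromisoformat(index_date) if index_date else None
    rows = []
    day = start
    while day <= end:
        row = {"date": day.isoformat()}
        if index is not None:
            # Difference in whole days between this date and the index date
            row["days_from_index_date"] = (day - index).days
        rows.append(row)
        day += timedelta(days=1)
    return rows

# Row A from the example: 2000-10-20 to 2000-10-23 with index date 2000-02-14
for row in expand_days("2000-10-20", "2000-10-23", "2000-02-14"):
    print(row["date"], row["days_from_index_date"])
```

For id A this prints the dates 2000-10-20 through 2000-10-23 with the values 249, 250, 251, and 252, matching the table.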

4. Remove potential identifiers

id PM25 NO2 O3 days_from_index_date
A 22.6 64.1 48.5 249
A 37.2 48.4 49.5 250
A 28.4 52.9 60.2 251
A 42.3 56.6 43.1 252
C 7.1 27.6 27.8 273
C 6.8 32.8 30.7 274
C 13.7 32.1 31.3 275
C 11.8 37 33.5 276
B 24.2 81.7 28 217
B 36.9 66.8 33 218
B 85.4 59.5 36.2 219
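
One way to produce the reduced table above is to keep only the id, the pollutant columns, and days_from_index_date, dropping coordinates, calendar dates, and grid identifiers. This is a minimal sketch under the assumption that the step 3 output is a comma-separated file with the column names shown earlier; `remove_identifiers` and `KEEP` are hypothetical names, not part of DeGAUSS.

```python
import csv
import io

# Columns retained in the de-identified output (as in the table above)
KEEP = ["id", "PM25", "NO2", "O3", "days_from_index_date"]

def remove_identifiers(csv_text, keep=KEEP):
    """Return CSV text containing only the columns listed in `keep`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # extrasaction="ignore" silently drops every column not listed in `keep`
    writer = csv.DictWriter(out, fieldnames=keep, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

Listing the columns to keep, rather than the columns to drop, means that any identifying column added to the output in the future is excluded by default.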

Troubleshooting and Details

Optional index_date

Dates and Sitecode formatting

Methodological Details

Linking to the nearest “grid”

[Figure: example_schwartz_lookup]

Downloading and Extracting Exposure Estimates

Privacy Considerations

degauss/schwartz_exposure_assessment does not come with the full set of spatiotemporal exposure estimates, which total around 200 GB as a compressed zip file. Instead, the container determines which geographic regions and calendar years are necessary based on the input dataset and downloads only those chunks.

Because the container requests these chunks over the internet, the request must not include any protected health information (PHI). The HIPAA Safe Harbor Guidelines and the Revised Common Rule specify that spatial location information is not considered PHI if the geographic identifier contains at least 20,000 people. This is why, for example, “de-identified” datasets contain only 3-digit ZIP codes instead of 5-digit ZIP codes. Similarly, we can conveniently use the geohash we are already using for lookup to “coarsen” our geographic precision by truncating it from a resolution of 6 to a resolution of 3.

The container transmits the calendar year and resolution 3 h2 geohash to Amazon Web Services Simple Storage Service (AWS S3) to retrieve estimates as needed. This avoids downloading spatial and temporal “slices” of data that a given user will never need, reducing the time and resources required to run the software while never sharing any protected health information.
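
The coarsening described above can be illustrated by truncating each 6-character geohash to its first 3 characters and collecting the distinct (year, geohash) pairs. This is a sketch only: how the container actually builds its requests is not documented here, and `download_keys` is a hypothetical name. The gh6 values are taken from the example output in step 3.

```python
def download_keys(rows):
    """Return the sorted, de-duplicated (year, gh3) pairs needed for a set of rows."""
    keys = set()
    for row in rows:
        gh3 = row["gh6"][:3]        # truncate resolution 6 to resolution 3
        year = int(row["date"][:4])  # calendar year from an ISO date
        keys.add((year, gh3))
    return sorted(keys)

rows = [
    {"gh6": "dngz52", "date": "2000-10-20"},
    {"gh6": "dngz52", "date": "2000-10-21"},
    {"gh6": "dq2h4d", "date": "2004-01-30"},
    {"gh6": "9q5cm2", "date": "2008-11-14"},
]
print(download_keys(rows))
# [(2000, 'dng'), (2004, 'dq2'), (2008, '9q5')]
```

Only these coarse pairs, never the coordinates or the full-resolution geohash, would need to leave the local machine under this scheme.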

[Figure: population of each resolution 3 h2 geohash polygon]