Scientific studies often examine the relationships between place-based information and health outcomes; for example, air pollution and asthma, neighborhood crime and mental health, or community greenspace and IQ. To conduct these studies, subjects with location information, most commonly a residential mailing address, are linked to databases of place-based information, or “geomarkers”. Defined formally, a “geomarker” is any objective, contextual, or geographic measure that influences or predicts the incidence of an outcome or disease. “Geocoding” is the process of translating a string of text referring to a location (most often a mailing address) into coordinates on the earth’s surface (most often latitude and longitude). These coordinates are required to link participants to their estimated exposures to geomarkers, a process we call “geomarker assessment”. Two examples of geomarker assessment commonly performed in human health studies are distance to the nearest major roadway (a common measure of estimated exposure to traffic-related air pollution, which is associated with increased risk of asthma) and neighborhood median household income (a common measure of community deprivation, which is associated with increased bed days spent in the hospital).
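To make the idea of geomarker assessment concrete, the following is a minimal Python sketch of the distance-to-nearest-roadway example, assuming the subjects have already been geocoded. It is an illustration only and not the method used by any particular tool: roadways are approximated by a handful of sampled points, distances are great-circle (haversine) distances, and every coordinate, field name, and function name is hypothetical. A real assessment would typically measure distance to road line geometries in a projected coordinate system.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_to_nearest_road_m(lat, lon, road_points):
    """Smallest distance from one location to any sampled roadway point."""
    return min(haversine_m(lat, lon, rlat, rlon) for rlat, rlon in road_points)


# Hypothetical roadway sample points (assumption: roads reduced to points).
road_points = [(39.10, -84.50), (39.20, -84.50)]

# Hypothetical geocoded subjects; real coordinates would come from a geocoder.
subjects = [
    {"id": "A", "lat": 39.10, "lon": -84.51},
    {"id": "B", "lat": 39.19, "lon": -84.50},
]

# Geomarker assessment: attach the estimated exposure to each subject.
for subject in subjects:
    subject["dist_to_road_m"] = distance_to_nearest_road_m(
        subject["lat"], subject["lon"], road_points
    )
```

Each subject record now carries a `dist_to_road_m` value that can be analyzed alongside health outcomes.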
Within these health studies, both geocoding and geomarker assessment involve the use of identifying information (addresses and location coordinates) and therefore must be conducted in a HIPAA- and IRB-compliant manner. These requirements were designed to protect the privacy of study participants by preventing the sharing of protected health information (PHI). While beneficial with respect to privacy, this presents an outstanding challenge for researchers by preventing them from using external third-party software, e.g. Google Maps, to analyze and extract information from study participants’ addresses or locations. Furthermore, it restricts scientists’ ability to collaborate by combining datasets containing any PHI. Standard tools that make geocoding and geomarker assessment easy under these constraints are critically missing.
Our solution is a standalone, container-based application that can produce geocodes and conduct geomarker assessment. A container wraps software into a complete filesystem containing everything it needs to run. For geocoding and geomarker assessment, this includes code, runtime, system tools, system libraries, and data (shapefiles, databases, rasters, etc.). DeGAUSS containers run on PC, Mac, or Linux machines, so researchers can geocode and conduct geomarker assessment without PHI ever leaving their local machine. After geomarkers are attached to subjects’ health information, personal identifiers such as address or location coordinates are removed, so that the dataset is effectively no longer PHI and no longer subject to HIPAA restrictions. This facilitates sharing and collaboration among scientists without privacy concerns over PHI. In addition, the use of containers guarantees the software will always run the same, regardless of its environment, which is a vital requirement for reproducible research, especially within a collaborative, multi-site study.
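The de-identification step described above, in which identifiers are dropped once geomarkers have been attached, can be sketched in a few lines of Python. This is a hedged illustration and not DeGAUSS's actual code: the field names (`address`, `lat`, `lon`, `dist_to_road_m`, `asthma`), the function name `deidentify`, and the example record are all made up for this sketch.

```python
# Assumed names of the columns that hold personal identifiers.
IDENTIFIER_FIELDS = frozenset({"address", "lat", "lon"})


def deidentify(records, identifier_fields=IDENTIFIER_FIELDS):
    """Return copies of the records with the identifier fields removed."""
    return [
        {key: value for key, value in record.items()
         if key not in identifier_fields}
        for record in records
    ]


# Hypothetical record after geomarker assessment has been performed locally.
records = [
    {
        "id": "A",
        "address": "123 Example St",  # made-up address
        "lat": 39.10,
        "lon": -84.51,
        "dist_to_road_m": 863.0,      # geomarker attached earlier
        "asthma": 1,
    },
]

# Only the study ID, the geomarker, and the outcome remain in the shared copy.
shareable = deidentify(records)
```

Because `deidentify` returns new dictionaries, the original records stay intact on the local machine while only the stripped copy is shared with collaborators.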