This guide gives step-by-step instructions for developing a DeGAUSS container using tools available in the dht R package.
Creating all files needed
Within an empty directory, use dht::use_degauss_container() to create all of the files needed for a DeGAUSS container. Note that the name of this initially empty directory will be used as the name of the geomarker in the documentation, code, and image repository; it is a good idea to use only lowercase letters and underscores (e.g., census_block_group) to comply with container naming conventions.
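The naming advice above can be checked before any files are created. The following is a minimal sketch in base R; is_valid_geomarker_name() is a hypothetical helper written for illustration and is not part of {dht}.

```r
# Hypothetical helper (not part of {dht}): TRUE when a name contains
# only lowercase letters and underscores.
is_valid_geomarker_name <- function(name) {
  grepl("^[a-z_]+$", name)
}

is_valid_geomarker_name("census_block_group")  # TRUE
is_valid_geomarker_name("Census-Block-Group")  # FALSE
```

To check the directory you are about to populate, pass basename(getwd()) to the helper before calling dht::use_degauss_container().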
Most of the added files will usually not need to be edited:
Makefile | see Make targets |
test/my_address_file_geocoded.csv |
test input data file |
LICENSE.md |
GPL license |
.github/workflows/build-deploy-pr.yaml |
GitHub Actions continuous integration for pull requests |
.github/workflows/build-deploy-release.yaml |
GitHub Actions continuous integration for releases |
.dockerignore |
edit to include files other than entrypoint.R in the image |
However, some files must be edited:
Dockerfile | degauss_description environment variable |
entrypoint.R |
contains R code to be run in the container |
README.md |
software documentation and example instructions |
Editing files to add R code and documentation
Documentation
Fill out the “Using”, “Geomarker Methods”, and “Geomarker Data” sections in README.md, as applicable. Make sure that the version in the example call matches the version of the latest released container.
Within the Dockerfile, ENV instructions are used to define environment variables that capture metadata about the DeGAUSS container, including the name (degauss_name), version (degauss_version), and a short description (degauss_description). These environment variables will be available to R code running from inside the container and can be used for dht::greeting() as well as other {dht} functions that read and write geomarker data. Outside of a DeGAUSS container, these are also used by get_degauss_env_dockerfile(), get_degauss_env_online(), and get_degauss_core_lib_env().
When creating a new DeGAUSS container, all of these variables except degauss_description are defined automatically. Its value needs to be edited from
ENV degauss_description="insert short description here that finishes the sentence 'This container returns ...'"
to something specific and short (ideally fewer than 50 characters) that finishes the sentence “This container returns …”, such as
ENV degauss_description="proximity and length of major roads"
When releasing a new version of the DeGAUSS container in the future, the environment variable degauss_version can be edited in the Dockerfile, and R code in the container can use it for greetings, writing output files, and other operations that depend on the current version (or name or description). This avoids having to manually change the version number in several different locations for each new release.
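As a rough illustration of how R code can pick up this metadata, the sketch below reads the three variables with base R's Sys.getenv(). The Sys.setenv() call only simulates a container so the example runs anywhere. The name "roads", the version "0.1.0", the helper degauss_env(), and the output file naming pattern are all assumptions made for illustration, not the behavior of {dht}.

```r
# Simulate the variables that the Dockerfile's ENV instructions would set;
# inside a real container they already exist and this call is unnecessary.
# "roads" and "0.1.0" are made-up values for illustration.
Sys.setenv(
  degauss_name = "roads",
  degauss_version = "0.1.0",
  degauss_description = "proximity and length of major roads"
)

# Hypothetical helper: collect the three metadata variables in one list.
degauss_env <- function() {
  list(
    name = Sys.getenv("degauss_name"),
    version = Sys.getenv("degauss_version"),
    description = Sys.getenv("degauss_description")
  )
}

env <- degauss_env()

# A version-stamped output file name; this naming pattern is illustrative only.
out_file <- sprintf("my_address_file_geocoded_%s_%s.csv", env$name, env$version)

message("This container returns ", env$description,
        " (", env$name, " v", env$version, ")")
```

Because the version is read at runtime, bumping degauss_version in the Dockerfile is enough to change both the message and the file name on the next build.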
Using R code and data files inside the container
Edit entrypoint.R by replacing the example R code with R code that completes the specific task to be performed by the container.
When ready to build the container, run renv::init() to initialize the renv framework and create renv.lock. Before subsequent builds, update renv.lock by using renv::snapshot().
Adding required system dependencies
R packages, especially spatial packages, often depend on external system libraries. These will need to be installed using RUN instructions in the Dockerfile. For example, the {sf} package for R requires gdal and other programs, each of which requires a different install process depending on the operating system. Since these R packages will always be running inside of a Docker container running Ubuntu 20.04, we can use remotes::system_requirements() to get the specific required install instructions for {sf}:
remotes::system_requirements("ubuntu-20.04", package = "sf")
# "apt-get install -y libudunits2-dev" "apt-get install -y libssl-dev"
# "apt-get install -y libgdal-dev" "apt-get install -y gdal-bin"
# "apt-get install -y libgeos-dev" "apt-get install -y libproj-dev"
These system requirements could be translated to RUN instructions for the Dockerfile to make sure they are available before the R packages that require them are installed:
RUN apt-get update \
  && apt-get install -yqq --no-install-recommends \
  libudunits2-dev \
  libssl-dev \
  libgdal-dev \
  gdal-bin \
  libgeos-dev \
  libproj-dev \
  && apt-get clean
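The translation from install commands to a single RUN instruction can also be scripted. The sketch below is one possible approach in base R, not a {dht} or {remotes} feature. It starts from the character vector shown in the remotes::system_requirements() output above, pasted in literally so the example runs without network access.

```r
# Output copied from the remotes::system_requirements() call shown earlier
reqs <- c(
  "apt-get install -y libudunits2-dev", "apt-get install -y libssl-dev",
  "apt-get install -y libgdal-dev", "apt-get install -y gdal-bin",
  "apt-get install -y libgeos-dev", "apt-get install -y libproj-dev"
)

# Keep only the package name from each install command
pkgs <- unique(sub("^apt-get install -y ", "", reqs))

# Assemble one RUN instruction with one package per continuation line
run_instruction <- paste0(
  "RUN apt-get update \\\n",
  "  && apt-get install -yqq --no-install-recommends \\\n",
  paste0("  ", pkgs, " \\\n", collapse = ""),
  "  && apt-get clean"
)

writeLines(run_instruction)
```

The printed text can be pasted into the Dockerfile. Collapsing everything into one RUN instruction keeps the package index update and the installs in the same image layer.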
Including files inside the container
By default, only entrypoint.R and renv.lock are copied into the image for use at runtime; everything else in the working directory is ignored, which speeds up builds and keeps images small. If the container requires any other files (e.g., .rds data files), edit Dockerfile and .dockerignore so that the files are copied into the image and not ignored by Docker. For example, if we want to use geomarker_data.rds, we would make the following changes in Dockerfile:
COPY entrypoint.R .
# copy .rds file from host to container when building
COPY geomarker_data.rds .
and in .dockerignore:
# ignore everything
**
# except what we need
!/renv.lock
!/entrypoint.R
# make sure the .rds file is not ignored
!/geomarker_data.rds
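If either change is forgotten, the file will be missing at runtime, so it can help for entrypoint.R to fail early with a pointer to the likely cause. The sketch below uses a hypothetical helper, read_geomarker_data(), and assumes the file sits in the container's working directory next to entrypoint.R. A temporary file stands in for the real data so the example runs anywhere.

```r
# Hypothetical helper for entrypoint.R: read a data file that was copied into
# the image, stopping with a hint if it never made it into the build.
read_geomarker_data <- function(path = "geomarker_data.rds") {
  if (!file.exists(path)) {
    stop(
      "'", path, "' not found; check that it has a COPY instruction in ",
      "Dockerfile and a '!' exception in .dockerignore",
      call. = FALSE
    )
  }
  readRDS(path)
}

# Demonstration with a temporary file standing in for the real data
tmp <- tempfile(fileext = ".rds")
saveRDS(data.frame(id = 1:3, value = c(0.2, 0.5, 0.9)), tmp)
d <- read_geomarker_data(tmp)
nrow(d)
```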
Tests
A test directory is added with an example geocoded address file (test/my_address_file_geocoded.csv) and is useful for interactive development and automated testing (see below).
Using make for interactive development
The Makefile defines several make targets that are useful when developing and testing locally:
- make build will build and name the current DeGAUSS image
- make test will run the container on the included example geocoded CSV file
- make shell will run a DeGAUSS command, but start an interactive shell inside the container for debugging
- make clean is equivalent to docker system prune -f, which cleans up any stopped containers and dangling image layers