DYKONDO: Debloating Container Images for Reduced Attack Surface and Edge Deployments


Containerization is an increasingly popular method for enhancing portability and reliability in commercial and governmental deployments. Containerized environments are often built using baseline images (e.g., common Linux distributions) with required dependencies installed for the target software packages (e.g., Python tooling, database infrastructure, web frontends). Depending on the capabilities required, the resulting container images often contain bloat – software, libraries, and other files that do not support the stated mission for a given deployment. Bloat can result from the need to provide widely applicable baseline containers for varied domains, coupled with the high effort of manually optimizing a container for a given deployment. This bloat creates unnecessary attack surface (e.g., for living-off-the-land attacks), produces spurious static-analysis vulnerability findings that must be triaged, and adds overhead when transmitting, maintaining, and using these images.

To solve this problem, we introduce DYKONDO (DYnamic KONtainer Debloater/Optimizer) and describe our experiences developing and using this tool to transform deployment-specific commercial and governmental containers. DYKONDO takes as input a container and a deployment-specific runtime scenario (e.g., a test suite) and uses dynamic analysis to track exactly the files used during intended deployment executions, removing all other files to reduce bloat. The tooling is designed to be run in continuous integration (CI) environments as a post-build step, rewriting a standard image built by Docker, Podman, or Buildah and automatically filtering files not accessed in the trace data. 
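The core filtering idea can be illustrated with a minimal sketch. This is not DYKONDO's implementation: the function name `split_image_files`, the plain-list inputs, and the choice to retain parent directories of accessed files are all assumptions made for illustration. It assumes the image's file listing and the traced paths have already been collected by some other means.

```python
from pathlib import PurePosixPath


def split_image_files(image_files, accessed_paths):
    """Partition an image's paths into (keep, remove) lists.

    Hypothetical helper: `image_files` is every path in the image and
    `accessed_paths` is every path observed during the traced scenario.
    """
    image = {PurePosixPath(p) for p in image_files}
    keep = set()
    for raw in accessed_paths:
        path = PurePosixPath(raw)
        if path not in image:
            # The trace may mention paths outside the image (e.g. /proc).
            continue
        keep.add(path)
        # Assumption: keep parent directories so retained files stay reachable.
        keep.update(parent for parent in path.parents if parent in image)
    remove = image - keep
    return sorted(map(str, keep)), sorted(map(str, remove))


if __name__ == "__main__":
    image_files = [
        "/usr",
        "/usr/bin",
        "/usr/bin/python3",
        "/usr/bin/apt",
        "/var/cache/apt/pkgcache.bin",
    ]
    accessed = ["/usr/bin/python3", "/proc/self/maps"]
    keep, remove = split_image_files(image_files, accessed)
    print("keep:", keep)
    print("remove:", remove)
```

A real tool would also need to account for details this sketch ignores, such as symbolic links and files read only under rarely exercised code paths, which is why the quality of the supplied runtime scenario matters.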

We have evaluated DYKONDO on a diverse set of popular open-source images. For instance, DYKONDO reduces the size of Grafana OnCall’s Python-based backend image by 87% when using its supplied test suite. For the official images of the popular database PostgreSQL, DYKONDO reduces sizes by up to 44% using the project’s regression test suite; even the pre-optimized Alpine-based version of the image shrinks by 16%. DYKONDO’s runtime is typically dominated by tracing the supplied test suite; in practice, we observed the debloating step itself completing in about a minute.

We found that DYKONDO correctly removes only spurious files: most are utilities included in baseline images for general applicability but not used in individual intended deployments. For example, the set of debloated files often contains package managers’ files that are not strictly required in resource-constrained environments, such as caches, package indexes, and configuration data of considerable size that can be tedious to clean up manually (and thus are often left behind in practice).

Application code comprises the largest remaining portion of our images. As future work, we plan to integrate intra-file debloating strategies for this application code. 

DISTRIBUTION STATEMENT A. Approved for public release: distribution unlimited. Approved, DCN# 0543-1381-24. 

This material is based upon work supported by the Office of Naval Research under Contract No. N00014-21-C-1032. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Office of Naval Research.


Jonathan Dorn is Director of Engineering at GrammaTech. His research includes software evolution, modeling non-functional properties of software, binary analysis and rewriting, and human factors in software engineering. He received a PhD from the University of Virginia. 

Dr. Zachary P Fry is a Senior Scientist at GrammaTech, having received his Ph.D. in Computer Science from the University of Virginia in 2014. As part of the autonomic team at GrammaTech, he has served as PI on several DOD contracts, including most recently "CASPAR: Cyber Attack Sensing for Psychologically-informed Adaptive Response" and "ARTCAT: Autonomic Response to [Disruptive] Cyber-Attacks". 

Adam Seitz is a Software Engineer at GrammaTech. Recently, he has contributed to GrammaTech's binary rewriting toolchain, including the open-source projects GTIRB and DDisasm. He received a B.S. in Computer Engineering from Rose-Hulman Institute of Technology in 2019.

License: CC-3.0
Submitted by Amy Karns on