Session
FOSDEM Schedule 2021
Dependency Management

DepClean: Automatically revealing bloated software dependencies in Maven projects

D.dependency
César Soto Valero
This talk focuses on one specific type of software dependency: bloated dependencies. These are libraries that are packaged with the application's compiled code but are not actually necessary to build and execute the application. In other words, they are libraries declared as dependencies in a build file that can be removed from the file without breaking the build. As a consequence of bloated dependencies, the binary file includes more code than necessary. An artificially large binary is an issue when the application is sent over the network (e.g., web applications) or deployed on small devices (e.g., embedded systems). In addition, bloated dependencies can embed vulnerable code that may be exploited even though it is useless to the application. Overall, bloated dependencies needlessly increase the difficulty of managing and evolving software applications.
The talk introduces DepClean, an open-source tool that we developed to automatically detect bloated dependencies in Maven artifacts. DepClean performs a deep static analysis of the dependency network and suggests direct and transitive dependencies to be removed or excluded. Given an application and its build file, DepClean collects the complete dependency tree (the list of dependencies declared in the pom.xml, as well as the transitive dependencies) and analyzes the bytecode of the artifact and all its dependencies to determine which dependencies are bloated. DepClean also generates a clean variant of the build file in which bloated dependencies are removed.
We present our analysis of 9,639 Java artifacts hosted on Maven Central, including 723,444 dependency relationships. Our key result is as follows: 2.7% of the directly declared dependencies are bloated, 15.4% of the inherited dependencies are bloated, and 57% of the transitive dependencies of the studied artifacts are bloated.
Based on these results, we distilled and discussed two possible causes: the cascade of unwanted transitive dependencies induced by direct dependencies, and the dependency inheritance mechanism of multi-module Maven projects. The qualitative assessment of DepClean involved 30 notable open-source projects. For each project, we used DepClean to generate a pom.xml file without bloated dependencies and submitted the changes as a pull request. Notably, our work yielded 21 pull requests merged by open-source developers, and 140 bloated dependencies were removed. In summary, our results indicate that developers pay attention to their dependencies when they are notified of the problem, which stresses the need to engineer POM files, i.e., to analyze, maintain, and test them.
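The detection idea described in the abstract (collect the dependency tree, then check which dependencies the bytecode actually references) can be illustrated with a small sketch. This is not DepClean's implementation: the `Dependency` record, the `reachable_classes` and `find_bloated` helpers, the class-reference map, and all coordinates are hypothetical names and toy data. The sketch assumes that the class-to-class references have already been extracted from the bytecode.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    coordinate: str      # hypothetical "group:artifact" label
    kind: str            # "direct", "inherited", or "transitive"
    classes: frozenset   # class names packaged in this artifact


def reachable_classes(entry_classes, references):
    """Breadth-first walk over class-to-class references."""
    seen = set(entry_classes)
    queue = deque(entry_classes)
    while queue:
        current = queue.popleft()
        for target in references.get(current, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def find_bloated(dependencies, entry_classes, references):
    """Return dependencies with no class reachable from the project's classes."""
    used = reachable_classes(entry_classes, references)
    return [dep for dep in dependencies if used.isdisjoint(dep.classes)]


# Toy data, invented for illustration only.
deps = [
    Dependency("org.example:json", "direct", frozenset({"json.Parser"})),
    Dependency("org.example:io", "transitive", frozenset({"io.Reader"})),
    Dependency("org.example:xml", "direct", frozenset({"xml.Parser"})),
    Dependency("org.example:log", "transitive", frozenset({"log.Logger"})),
]
refs = {
    "app.Main": ["json.Parser"],      # the application uses the JSON library
    "json.Parser": ["io.Reader"],     # which in turn uses the IO library
    "xml.Parser": ["log.Logger"],     # never reached from app.Main
}

bloated = find_bloated(deps, ["app.Main"], refs)
print([(d.coordinate, d.kind) for d in bloated])
```

In this toy example the XML library is declared directly but never referenced, and the logging library is pulled in only through it, so both are reported as bloated. A real analysis would also have to account for reflection and other dynamic features that static references do not capture.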

Additional information

Type devroom

More sessions

2/7/21
Dependency Management
Paolo Boldi
D.dependency
The goal of the EU project FASTEN is to enable more sophisticated analysis of security-vulnerability propagation, licensing compliance, and dependency risk profiles (among others) by relying on the call-level dependency network of the whole software ecosystem. We outline the purpose and structure of the project, and present some preliminary results.
2/7/21
Dependency Management
Tom Mens
D.dependency
When developing open source end-user applications or reusable software packages, developers depend on packages distributed through package managers such as npm, Packagist, Cargo, and RubyGems. Empirical evidence has shown that these package managers adhere to a large extent to semantic versioning principles. Packages that are still in major version zero are considered unstable according to semantic versioning, as some developers consider such packages as ...
2/7/21
Dependency Management
Rhys Arkins
D.dependency
Despite best intentions, open source releases with regression errors are published every day. In the best-case scenario, a downstream user detects the regression early thanks to good tests and files an issue, and the maintainer can fix it before too many people have upgraded. Other scenarios involve various degrees of brokenness and games of "is it broken for everyone or just me?". Renovate Bot is an open source dependency automation tool that is also run as a free app on github.com, where it is installed ...
2/7/21
Dependency Management
Brendan O'Leary
D.dependency
The SolarWinds breach at the end of 2020 is an event whose breadth and depth we won't truly understand for some time, if ever. But already, several discussions we've been having in the abstract for years have become very concrete. First, the systems we use to develop, code, build and deploy our code are all essential production systems and should be treated as such. Second, securing the software supply chain is one of the most underrated aspects of security and is often ...
2/7/21
Dependency Management
Todd Gamblin
D.dependency
Every software ecosystem seems to have a package manager these days, but reusing software across these ecosystems is still a challenge. Major Linux distributions package software from a wide range of languages, but they restrict the versions you can install, and they make deep assumptions about compilers and runtime libraries to keep everything compatible. If you need a newer libc or a newer Python than the OS offers, you're often on your own. Python packaging supports native libraries, but it ...