Using open source securely

Anyone who builds open source components into a product needs a bill of materials, because 80 to 95 percent of modern software consists of them. Here is what licenses, security vulnerabilities and the Cyber Resilience Act have to do with it.


Using open source software securely in products means meeting four requirements: creating a complete software bill of materials (S-BOM), checking the licenses actually included, complying with license obligations upon delivery and continuously monitoring known security vulnerabilities in installed components.

Key Takeaways

  • Anyone who passes software on to third parties becomes a distributor and is bound by the license terms of the open source components used, even if the software was obtained free of charge from GitHub.
  • A software bill of materials (S-BOM) is already a purchase requirement in many supply chains and is becoming a regulatory obligation for software manufacturers as a result of the EU’s Cyber Resilience Act.
  • There is an iceberg behind every direct dependency in the build system: ten direct dependencies in a module often bring a hundred or more transitive dependencies with them.
  • The license metadata in package managers often does not match the actual license text, which means that the labeling of a package is not a reliable legal basis.
  • For discontinued software without an active maintenance contract, the Cyber Resilience Act suggests that vendors remain responsible for reporting and, if necessary, fixing security vulnerabilities, although the details are not yet settled.

Open source does not equal free availability

Code on GitHub is not automatically open source. Only an open source license turns source code into open source software. Without a license, the code remains the property of the rights holder, and you may not use it without their consent.

Dirk Riehle, Professor of Open Source Software at the University of Erlangen, compares this to an apple tree in the neighbor’s garden: Just because you can see the apple and reach for it doesn’t mean it’s yours. There are also proprietary licensed projects on GitHub. In any case, you need to read what the owner allows.

Nevertheless, the findings are clear: open source software is usually good software, and it’s almost everywhere. 80 to 95 percent of the software in a typical product today is open source. Using open source is therefore not a marginal issue, but the norm in development.

Pure use is harmless, distribution is not

If you only use open source software yourself, you generally have no obligations. This is the most important all-clear for anyone who downloads a tool and uses it in-house. The rights granted by an open source license allow you to use and modify the software free of charge.

The obligations only arise as soon as you pass the software on to third parties. At that point, you become a distributor and the license conditions apply in full. A web server that delivers JavaScript code to the browser is already such a transfer.

This boundary between end user and distributor determines whether you have to read license texts carefully or not. For software manufacturers, the case is clear: they are almost always distributors.

Permissive licenses require attribution, copyleft requires more

Open source licenses fall into two groups: those with a copyleft clause and those without. This distinction determines how complex the use in your own product will be.

Permissive licenses such as the MIT license primarily require attribution. You must clearly declare whose code you are using, i.e. pass on the copyright notices and license texts. You do not have to disclose your own source code, even if you have modified the component.

Copyleft licenses from the GPL family go further. If you use copyleft-licensed code and distribute your product, you must also provide the recipients with your own modifications under the same license. The inbound license must also be the outbound license.

Riehle holds back on the moral question of whether disclosure should be mandatory. His argument is pragmatic: the license expresses what the original developer wants.

There are many developers who say, I don’t want to force my users to disclose their source code, so they take a permissive license. And there are many who say, I want to force those who use my source code to disclose their own source code, then they use a copyleft license.

Dirk Riehle

Community open source and commercial open source follow different logics

There are two fundamentally different motivations behind open source projects. Anyone who wants to understand why a project exists and how reliably it is maintained should be aware of this separation.

Community open source is a collaborative effort. Today, the contributors are mostly paid employees working on behalf of their employers on components that everyone needs, but that are not competitive differentiators. No company benefits from doing this work alone, so the costs are spread across several companies.

Commercial open source pursues a business objective. Riehle distinguishes distributors such as SUSE, Univention and Red Hat from single-vendor companies that develop software themselves and make it available under an open source license.

The classic example is MySQL, from which MariaDB was later forked. The manufacturer made the database available free of charge as open source and sold a commercial license at the same time. As the rights holder, a company can license its software as often as it wishes: the open source license is one, the commercial license is a second.

In the open core model, parts are free of charge, while individual functions are subject to a charge. In the case of MySQL, this concerned a clustering feature that only professional high-end users needed. The open source version is sufficient for a private website. Today, the boundary often shifts to operation: self-hosting is free, the hosted web service is the commercial service.

What a software bill of materials is and why customers demand it

The software bill of materials, or S-BOM, lists all the components that are built into a product. It is the central data structure for the secure use of open source.

In the build file, a developer only sees the first-level dependencies, i.e. the libraries whose interfaces they program against. Below this lies an iceberg. These direct dependencies have their own dependencies, resulting in a deep dependency graph.

The proportions are sobering: for a single custom module, there can be ten direct dependencies and a hundred more underneath. A good build system resolves this graph via package managers, otherwise no binary could be created at all. If you flatten the graph, you get the parts list.
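Flattening the graph can be sketched in a few lines. This is a minimal illustration, not how any particular build system works: the graph, the package names and the helper `flatten_dependencies` are all hypothetical, and a real package manager also has to resolve versions.

```python
from collections import deque

# Hypothetical dependency graph: package -> its direct dependencies.
DEPENDENCY_GRAPH = {
    "my-module": ["web-framework", "json-parser"],
    "web-framework": ["http-core", "template-engine"],
    "json-parser": ["unicode-tables"],
    "http-core": ["tls-lib", "unicode-tables"],
    "template-engine": [],
    "tls-lib": [],
    "unicode-tables": [],
}


def flatten_dependencies(graph, root):
    """Return every package reachable from root, excluding root itself."""
    seen = {root}  # including root up front keeps cycles from re-adding it
    queue = deque(graph.get(root, []))
    while queue:
        package = queue.popleft()
        if package in seen:
            continue  # already visited via another path
        seen.add(package)
        queue.extend(graph.get(package, []))
    return sorted(seen - {root})


direct = DEPENDENCY_GRAPH["my-module"]
parts_list = flatten_dependencies(DEPENDENCY_GRAPH, "my-module")
print(f"{len(direct)} direct, {len(parts_list)} total components")
# prints "2 direct, 6 total components"
```

Even in this toy graph, the build file shows two dependencies while the parts list contains six.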

Today, this parts list is a purchase requirement. Customers want to know what is in the software before they buy it. In the USA, an executive order made delivery of an S-BOM mandatory when selling to federal agencies. In the EU, the Cyber Resilience Act is moving in a similar direction.

Security vulnerabilities are your problem, even if a manufacturer is behind it

If you operate software, you have to monitor what is built into it yourself. The parts list is only sufficient if you constantly check it against newly discovered security vulnerabilities.
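The recurring check can be sketched as a comparison between the parts list and a feed of advisories. Everything here is an assumption made for illustration: the `Component` and `Advisory` types, the component names and the advisory IDs are invented, and real advisories describe affected version ranges rather than a single "fixed in" version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: tuple  # e.g. (2, 14, 1)


@dataclass(frozen=True)
class Advisory:
    advisory_id: str
    component: str
    fixed_in: tuple  # first version that contains the fix


def find_vulnerable(parts_list, advisories):
    """Return (component, advisory) pairs where the installed version predates the fix."""
    by_name = {}
    for advisory in advisories:
        by_name.setdefault(advisory.component, []).append(advisory)
    hits = []
    for component in parts_list:
        for advisory in by_name.get(component.name, []):
            if component.version < advisory.fixed_in:  # tuple comparison, element by element
                hits.append((component, advisory))
    return hits


# Hypothetical parts list and advisory feed.
parts = [
    Component("logging-lib", (2, 14, 1)),
    Component("http-core", (1, 4, 0)),
    Component("tls-lib", (3, 0, 2)),
]
advisories = [
    Advisory("EXAMPLE-0001", "logging-lib", (2, 15, 0)),
    Advisory("EXAMPLE-0002", "tls-lib", (3, 0, 2)),
]

hits = find_vulnerable(parts, advisories)
for component, advisory in hits:
    print(f"{component.name} is affected by {advisory.advisory_id}")
```

The parts list stays the same between runs; only the advisory feed changes, which is why the check has to be repeated continuously.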

Riehle illustrates this using the example of a bank. If it operates a product containing a vulnerable open source component, such as a Log4j version affected by Log4Shell, and someone exploits the gap, then it is the bank’s customer data that ends up on the black market. Pointing the finger at the manufacturer won’t help.

Fixing the problem is the manufacturer’s job. But in a worst-case scenario, you will have to shut down the software yourself if it provides an open gateway. This is why every end user must monitor, not just the manufacturer.

How troubleshooting flows through the supply chain

Software supply chains work like physical ones: software is built into software, which is in turn built into other software. In the event of a security problem, responsibility works its way back through this chain.

One special feature reverses the direction. Security vulnerabilities in open source components are ideally only made public once the bugfix already exists. The corrected component is therefore available at the beginning of the chain before the problem becomes known.

From there, the fix must flow forward. Every user has to update and get the component from upstream. This quickly becomes difficult because the corrected component may no longer be compatible with other dependencies. Several such update steps may be needed before the fix reaches the final commercial product.
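The compatibility problem can be made concrete with a small sketch. It assumes that each dependent declares a half-open version range it accepts for the shared component; the dependents, ranges and the helper `blocking_dependents` are hypothetical, and real package managers use richer constraint languages.

```python
def blocking_dependents(fixed_version, constraints):
    """Return dependents whose accepted range excludes the fixed version.

    constraints maps a dependent to (minimum inclusive, maximum exclusive).
    """
    blocked = []
    for dependent, (minimum, maximum) in constraints.items():
        if not (minimum <= fixed_version < maximum):
            blocked.append(dependent)
    return sorted(blocked)


# Hypothetical: three parts of a product use the same logging component.
constraints = {
    "report-engine": ((2, 0, 0), (3, 0, 0)),
    "legacy-exporter": ((2, 0, 0), (2, 15, 0)),  # pinned below the fixed release
    "web-frontend": ((2, 10, 0), (3, 0, 0)),
}

blocked = blocking_dependents((2, 15, 0), constraints)
print(blocked)  # prints "['legacy-exporter']"
```

In this example the fix cannot be rolled out until `legacy-exporter` itself is updated, which is one of the extra steps the fix may need on its way down the chain.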

This is exacerbated by discontinued software. Many companies operate software without a maintenance contract. The Cyber Resilience Act suggests that manufacturers should remain responsible for such software as well: at the very least, they should provide information and, if necessary, fix vulnerabilities, even if there are no longer any license agreements. Much of this has not yet been clarified in regulatory terms.

Tools help, but the work remains manual

Software Composition Analysis, or SCA for short, is the tool category that analyzes what has been built into a product. Such tools determine the parts list and then monitor it.

Riehle mentions his own SCA tool, available at scar-tool.com, as well as commercial providers such as Black Duck, FOSSA and FossID. Originally, these tools were used to check licenses, but increasingly they are also used for security purposes. The market is on the move because the regulatory requirements from the EU and the German Federal Office for Information Security (BSI) are growing.

Nevertheless, there is no simple solution. No software looks into the data center and automatically creates a complete parts list, especially not at binary level. You first need an inventory of the software you are running, often created manually, and then you have to determine the bill of materials for each piece of software.

The honest assessment: most companies do not yet have anyone responsible for open source.

Why metadata is the real problem

A component’s label doesn’t necessarily match its content. This is where the greatest need for cleanup lies, and AI does not solve this.

A package manager can label a component as MIT-licensed, even though there is GPL-licensed, i.e. copyleft-licensed code in the middle of it. This happens easily. The metadata on licenses is extremely poor.
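Looking beyond the label can be sketched as a scan of the package contents for license markers. This is a deliberately naive illustration, not how SCA tools work internally: it only looks for `SPDX-License-Identifier` tags and one well-known phrase from the GPL notice, and the package contents and helper names are invented.

```python
import re

# SPDX tags are a common convention for marking a file's license in a comment.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")
GPL_PHRASE = "GNU General Public License"


def detect_licenses(files):
    """Map each detected license label to the files where a marker was found."""
    found = {}
    for path, text in files.items():
        labels = set(SPDX_TAG.findall(text))
        if GPL_PHRASE in text:
            labels.add("GPL-family")
        for label in labels:
            found.setdefault(label, []).append(path)
    return found


def undeclared_licenses(declared, files):
    """Return detected licenses that differ from the label declared in the metadata."""
    return {
        label: paths
        for label, paths in detect_licenses(files).items()
        if label != declared
    }


# Hypothetical package that its metadata labels as MIT.
package_files = {
    "src/core.py": "# SPDX-License-Identifier: MIT\nprint('hello')\n",
    "src/vendor/sort.c": (
        "/* This program is free software; you can redistribute it under\n"
        "   the terms of the GNU General Public License. */\n"
    ),
}

print(undeclared_licenses("MIT", package_files))
# prints "{'GPL-family': ['src/vendor/sort.c']}"
```

A scan like this only finds code that still carries its license marker; copied code with the notice stripped stays invisible, which is part of why the metadata problem is so hard.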

Then there is the security problem below the component level. If someone has copied an algorithm with a known bug from a source such as Stack Overflow into a library, the error is in the middle of the code and is no longer recognized as such. It is almost impossible to determine where such known bugs are located until they become public for the specific component.

The industry is trying to fix these problems at the source. So far, every manufacturer has redundantly tested the same open source components and analyzed their licenses, often hundreds or thousands of times in parallel. It would make more sense to do this once in the open source project itself.

Initiatives by the Linux Foundation and the Eclipse Foundation are working on bringing this inventory back into the projects. The work has begun, but it will take many years.

Four requirements that open source places on every manufacturer

Anyone who uses open source professionally must have four things under control. Riehle summarizes them as the core of good open source governance.

| Requirement | What to do |
| --- | --- |
| Create parts list | Only create the S-BOM with tools, as it has become a purchase requirement from customers |
| Check licenses | Look into the packages to see which licenses are really included, beyond the label |
| Ensure governance | Do not install software whose license does not match your own business model |
| Fulfill license obligations | Generate correct legal notices upon delivery, including for delivered JavaScript code |
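The governance requirement can be sketched as an allow-list check over the parts list. The allow-list below is an assumed example of a company policy for a proprietary product, not a recommendation, and the component names are hypothetical. A component without any license is treated as a violation, since unlicensed code may not be used at all.

```python
# Assumption: this company's policy allows only these licenses in shipped products.
ALLOWED_FOR_DISTRIBUTION = {"MIT", "BSD-3-Clause", "Apache-2.0"}


def policy_violations(components, allowed):
    """Return components whose license is missing or not on the allow-list."""
    return sorted(
        name for name, license_id in components.items() if license_id not in allowed
    )


# Hypothetical parts list with the licenses actually found in each package.
components = {
    "http-core": "Apache-2.0",
    "json-parser": "MIT",
    "sort-utils": "GPL-3.0-only",
    "mystery-lib": None,  # no license found
}

print(policy_violations(components, ALLOWED_FOR_DISTRIBUTION))
# prints "['mystery-lib', 'sort-utils']"
```

The check is only as good as its input: it should run on the licenses found inside the packages, not on the labels from the package manager.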

Riehle sees a concrete danger in the last point. A website that delivers JavaScript code is distributing software and needs legal notices for it. Practically nobody provides them. He therefore expects a significant wave of cease-and-desist letters at some point.
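Generating legal notices from the parts list can be sketched as follows. The record fields, the component data and the helper `build_notices` are assumptions for illustration; what a notice file must contain depends on the individual licenses, and the license texts are left as placeholders here.

```python
def build_notices(components):
    """Build one attribution section per component, sorted by name."""
    sections = []
    for component in sorted(components, key=lambda c: c["name"]):
        sections.append(
            f"{component['name']} {component['version']}\n"
            f"{component['copyright']}\n"
            f"License: {component['license']}\n\n"
            f"{component['license_text']}"
        )
    return "\n\n----\n\n".join(sections)


# Hypothetical components shipped to the browser as part of a JavaScript bundle.
shipped = [
    {
        "name": "json-parser",
        "version": "1.2.0",
        "copyright": "Copyright (c) Example Authors",
        "license": "MIT",
        "license_text": "<full MIT license text goes here>",
    },
    {
        "name": "http-core",
        "version": "4.0.1",
        "copyright": "Copyright (c) Example Project",
        "license": "Apache-2.0",
        "license_text": "<full Apache-2.0 license text goes here>",
    },
]

notices = build_notices(shipped)
print(notices)
```

For a website, the resulting text could be served as a separate page or file that is linked from the site, so that the notices accompany the delivered JavaScript.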
