What does dependency confusion in Python look like?
Recently, Alex Birsan shocked a lot of blue teams around the world by bringing up the topic of dependency confusion. If you have not yet read his write-up with the clickbaity title “Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies”, do that now.
To see how this plays out in Python, we first have to understand how Python manages dependencies. It all starts with a requirements.txt file, which can look as simple as this:
requests
In this example, the only dependency we would download is the super useful requests library. To download all the dependencies listed in that file, you would run:
python -m pip install -r requirements.txt
Typically, you would pin a specific, properly tested version of each dependency to ensure that your software runs smoothly. This is done with two equal signs:
requests==2.25.1 (latest version at the time of writing)
However, Python pip allows additional ways to select the version:
requests>2,<3 (version has to be greater than 2 but less than 3)
requests~=2.25.0 (version can be any 2.25.x release from 2.25.0 upward, e.g. 2.25.5; equivalent to >=2.25.0, ==2.25.*)
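The ~= operator ("compatible release") is shorthand for a pair of clauses. A minimal sketch of that rewrite rule, assuming plain numeric versions with at least two components (pre-, post- and dev-releases are ignored); the helper name expand_compatible is made up for this illustration:

```python
from typing import Tuple


def expand_compatible(version: str) -> Tuple[str, str]:
    """Rewrite '~=version' as its two equivalent clauses.

    Hypothetical helper: handles only plain numeric versions such as '2.25.0'.
    """
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError("~= needs a version with at least two components")
    # Drop the last component; everything before it must stay fixed.
    prefix = ".".join(parts[:-1])
    return f">={version}", f"=={prefix}.*"


if __name__ == "__main__":
    print(expand_compatible("2.25.0"))  # ('>=2.25.0', '==2.25.*')
    print(expand_compatible("1.10"))    # ('>=1.10', '==1.*')
```

Note that a two-component version like ~=1.10 only fixes the major version: 1.999 is allowed, 2.0 is not.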
Additionally, we have to understand where pip loads all the dependencies from. Unless configured otherwise, pip obtains the libraries from PyPI (the Python Package Index, the official repository for Python packages).
But requests is a public library, so what can go wrong?
Dependency confusion comes into play when you depend on libraries that you (or your company) do not want to make public. Those are typically built and stored in an artifact repository, such as JFrog Artifactory. Python pip supports two ways to specify an alternate index:
python -m pip install -r requirements.txt --index-url <your internal artifactory URL> (packages can only be downloaded from your internal artifactory)
python -m pip install -r requirements.txt --extra-index-url <your internal artifactory URL> (packages can be downloaded from your internal artifactory or PyPI)
You might already see that the second variant makes things a little more interesting.
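The difference between the two flags can be pictured as a toy model: with --extra-index-url, candidates from both indices end up in one pool, and for an unpinned requirement the highest version wins no matter which index served it. The sketch below assumes plain numeric versions and an internal index that does not itself proxy PyPI; pick_unpinned and the version lists are hypothetical, and pip's real resolver does considerably more.

```python
from typing import List, Optional, Tuple


def as_key(version: str) -> Tuple[int, ...]:
    # Compare numerically so that "1.10" ranks above "1.9".
    return tuple(int(part) for part in version.split("."))


def pick_unpinned(
    internal: List[str], pypi: List[str], extra_index: bool
) -> Optional[Tuple[str, str]]:
    """Toy model of resolving a bare 'hacksplained' requirement line."""
    pool = [(version, "internal") for version in internal]
    if extra_index:
        # --extra-index-url: PyPI candidates join the same pool.
        pool += [(version, "pypi") for version in pypi]
    if not pool:
        return None
    # The highest version wins; its origin plays no role.
    return max(pool, key=lambda candidate: as_key(candidate[0]))


if __name__ == "__main__":
    internal = ["1.0", "1.2"]  # hypothetical versions in the artifactory
    rogue = ["9.99"]           # hypothetical attacker upload on PyPI
    print(pick_unpinned(internal, rogue, extra_index=False))  # ('1.2', 'internal')
    print(pick_unpinned(internal, rogue, extra_index=True))   # ('9.99', 'pypi')
```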
Okay, so how would I get exploited?
Disclaimer: All results listed below were tested on my local system running pip v21.0.1.
We are building an artificial case here. Hacksplained Inc. (a company dealing with extremely delicate information) is running an internal Python application depending on a library called hacksplained.
For some unknown reason, the CI/CD pipeline was configured to download all dependencies with the potentially dangerous command:
python -m pip install -r requirements.txt --extra-index-url <hacksplained.example.artifactory>
The infamous hacksplained library is not yet registered on PyPI.
What happens next depends on which of the following lines the requirements.txt file contains:
hacksplained (You would get exploited if the attacker registers the hacksplained library with a higher version number than the one hosted on your internal package index)
hacksplained~=1.10 (You would get exploited if the attacker registers the hacksplained library with a version number >=1.10, e.g. 1.999; you would not get exploited if the attacker registers 2.999)
hacksplained>2,<3 (Assumption: the latest version in your artifactory is hacksplained 2.27. You would get exploited if the attacker registers the hacksplained library with any version number within the range; even versions lower than the one in your artifactory are chosen over yours. You would not get exploited if the attacker registers, e.g., 3.999)
hacksplained==2.27 (You would only get exploited if the attacker registers 2.27 on PyPI)
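Each line above boils down to a membership test: does the version the attacker registers fall inside the allowed range? A simplified sketch of that test for the operators used in this post, assuming plain numeric versions (no pre-, post-, dev- or local versions); satisfies is a hypothetical helper, and it only answers whether a version is allowed, not which allowed candidate pip finally installs.

```python
import operator
from typing import List, Tuple

# Plain comparison operators; "~=" and "==X.*" are handled separately below.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def release(version: str) -> List[int]:
    return [int(part) for part in version.split(".")]


def key(version: str) -> Tuple[int, ...]:
    # Drop trailing zeros so that "2.27" and "2.27.0" compare as equal.
    parts = release(version)
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def prefix_match(version: str, prefix: List[int]) -> bool:
    # "==2.25.*": the leading components must equal the prefix.
    parts = release(version)
    parts += [0] * (len(prefix) - len(parts))
    return parts[: len(prefix)] == prefix


def clause_ok(version: str, clause: str) -> bool:
    clause = clause.strip()
    if clause.startswith("~="):
        base = clause[2:].strip()
        # "~=V.N" means ">=V.N" together with "==V.*".
        return clause_ok(version, ">=" + base) and prefix_match(version, release(base)[:-1])
    for op in ("==", "!=", ">=", "<=", ">", "<"):  # two-char operators first
        if clause.startswith(op):
            target = clause[len(op):].strip()
            if op in ("==", "!=") and target.endswith(".*"):
                hit = prefix_match(version, release(target[:-2]))
                return hit if op == "==" else not hit
            return OPS[op](key(version), key(target))
    raise ValueError(f"unsupported clause: {clause!r}")


def satisfies(version: str, spec: str) -> bool:
    """True if version meets every comma-separated clause in spec."""
    # An empty spec (a bare "hacksplained" line) accepts any version.
    return all(clause_ok(version, c) for c in spec.split(",") if c.strip())


if __name__ == "__main__":
    for spec, rogue in [("", "99.0"), ("~=1.10", "1.999"), ("~=1.10", "2.999"),
                        (">2,<3", "2.999"), (">2,<3", "3.999"), ("==2.27", "2.27")]:
        print(f"hacksplained{spec} allows {rogue}: {satisfies(rogue, spec)}")
```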
Generally speaking, an attacker would have to:
- Guess or otherwise learn the name of an internally used library
- Register that library name on PyPI
- Choose a version number that would trick Python pip into using the rogue package (according to the limitations listed above)
- Hope that the installation process is set up so that packages are loaded from multiple indices (hybrid package loading)
If all the stars align, the attacker's package would be installed, and its code run, inside Hacksplained Inc.'s pipeline, from where the attacker could go ahead and extract all of the company's confidential information. :(
How can I defend myself?
- Use Python pip with the --index-url parameter if possible
- If hybrid indices (multiple indices) are needed, make sure to study the dependency confusion prevention mechanisms of your artifactory software
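As a starting point for the first recommendation, a hedged sketch that flags places where hybrid index loading might be configured: lines mentioning --extra-index-url in CI scripts or requirements files (pip accepts that option inside requirements files too), the extra-index-url key in a pip configuration file, or the PIP_EXTRA_INDEX_URL environment variable. It is a plain text search, so it can miss indirect configuration; find_extra_index and the script name in the usage comment are made up for this example.

```python
import re
import sys
from typing import List, Tuple

# Matches "--extra-index-url" (CLI or requirements file), the "extra-index-url"
# key of a pip configuration file, and the PIP_EXTRA_INDEX_URL variable.
EXTRA_INDEX = re.compile(r"extra[-_]index[-_]url", re.IGNORECASE)


def find_extra_index(text: str) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs that mention an extra index."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # naive: ignore everything after '#'
        if EXTRA_INDEX.search(code):
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Usage (hypothetical file names): python find_extra_index.py ci.sh requirements.txt pip.conf
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for lineno, line in find_extra_index(handle.read()):
                print(f"{path}:{lineno}: {line}")
```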