How we Boosted Performance by 20% with a Python 3.11 Upgrade

Rob Parker
Engineering at Depop
7 min read · Jun 14, 2023


Photo by David Clode on Unsplash

Recently we upgraded our backend Django monolith to Python 3.11 and the performance improvements were beyond our expectations. It gave us an average reduction in p95 latency of 20%, which isn’t just a statistical victory — the knock-on impact translated to notable CPU usage reductions, yielding significant cost savings. This post explores how we did it and some of the challenges we encountered along the way.

P95 latency dropped the second we deployed to production

Django at Depop

We’re pretty big on Scala microservices at Depop; they make up the majority of our backend services. However, our original Django monolith (based on Django REST framework) is still responsible for many of our core backend functions and has been running for well over a decade.

This service:

  • Handles ~40% of our total traffic.
  • Is the source of truth for many of our core business functions such as product listings and user profiles.
  • Launched with Django 1.3 over 12 years ago.
  • Has accumulated over 20,000 commits since.

We knew that we needed to tackle the technical debt of upgrading both Django and Python at some point, but some of the fabled performance improvements that came with new Python versions convinced us to pick up the work sooner.

Why?

It’s fairly common knowledge that Python 3.11 is faster than its predecessors thanks to changes to interpreter startup and the runtime. There is plenty more to read and listen to on the subject, but the TL;DR is:

Depending on your workload, the overall speedup could be 10–60%.

This is mainly a result of changes to CPython:

  • Faster Startup:
    - Frozen imports and static code objects: This change reduces the steps involved in loading modules, resulting in a faster startup time for the Python interpreter. This improvement benefits programs that have a short execution time.
  • Faster runtime:
    - Cheaper, lazy Python frames: Python frames hold execution information and are created when Python functions are called. In Python 3.11, the creation process for frames is streamlined, and memory allocation is avoided by reusing space on the C stack. Additionally, unnecessary information is removed from frame objects, resulting in faster function calls for most Python code.
    - Inlined Python function calls: Python functions can call other Python functions, and in previous versions, this involved calling an interpreting C function. However, in Python 3.11, a new optimisation allows Python to directly jump to the new code inside a new frame when one Python function calls another. This optimisation reduces the overhead and stack space usage, making function calls faster.
    - Specializing Adaptive Interpreter: Python’s runtime has been improved to recognise patterns and stable types in the code being executed. When these patterns are identified, Python replaces the current operations with more specialised ones that are optimised for specific use cases and types. This specialisation, combined with inline caching, significantly improves the performance of various operations such as binary operations, subscripting (indexing), attribute loading, method calls, and more.
  • Miscellaneous Optimisations:
    - Python objects now require less memory because their namespaces are created lazily. Additionally, the sharing of keys between namespace dictionaries is increased, reducing memory usage.
    - “Zero-cost” exceptions are implemented, which means that try statements incur no performance penalty when no exceptions are raised.
    - The representation of exceptions in the interpreter has been made more concise, leading to faster exception handling.
    - The regular expression matching engine (used in string pattern matching) has been optimised using a technique called ‘computed gotos’, resulting in up to 10% faster regular expression execution.
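A quick way to get a feel for some of these changes is to run the same micro-benchmark under two interpreters, for example 3.9 and 3.11. The script below is only a rough sketch and not the benchmark behind any figure in this post: the function names are made up for illustration, and a micro-benchmark like this says very little about a real web workload. It times a loop of pure-Python function calls (the case targeted by cheaper frames and inlined calls) and a loop containing a try block that never raises (the “zero-cost” exception path).

```python
import sys
import timeit


def add(a, b):
    return a + b


def call_chain(n):
    # Many small pure-Python calls: the case that cheaper frames
    # and inlined Python-to-Python calls are meant to speed up.
    total = 0
    for i in range(n):
        total = add(total, i)
    return total


def guarded_loop(n):
    # A try block that never raises: the "zero-cost" exception path.
    total = 0
    for i in range(n):
        try:
            total += i
        except ValueError:
            total = 0
    return total


def best_time(func, n=10_000, repeat=5, number=20):
    # Keep the fastest of several runs to reduce noise from the machine.
    return min(timeit.repeat(lambda: func(n), repeat=repeat, number=number))


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    for func in (call_chain, guarded_loop):
        print(f"{func.__name__}: {best_time(func):.4f}s")
```

Run the file once per interpreter and compare the printed timings. The absolute numbers depend entirely on the machine, so only the ratio between versions is meaningful.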

We took these numbers with a healthy amount of scepticism when discussing the task. The results of other benchmarks we’d seen looked good, but surely they wouldn’t be as impressive when we applied them to our decade-old monolith. In the end we figured it to be a win-win, given that the work would need to be carried out eventually anyway and if it yielded no meaningful improvements then at least we were up-to-date.

There were a few hurdles along the way

We tackled the work in two steps:

  • Upgrade Django from 3.2 to version 4.1
  • Upgrade Python from 3.9 to 3.11

The problem with this sort of task is that it’s quite difficult to estimate the effort. We anticipated coming across packages that were no longer being maintained, for example, but without actually starting the upgrade it was very hard to know which of our 100+ requirements would cause us the most trouble.
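One cheap first pass at this kind of triage is to ask each installed package whether it advertises support for the target Python version. The sketch below is an illustration rather than the process we followed: the helper names are hypothetical, and it relies only on the standard library’s importlib.metadata. Trove classifiers are advisory, so a missing classifier flags a package worth checking by hand; it does not prove the package is broken on 3.11.

```python
from importlib import metadata


def declares_support(classifiers, version="3.11"):
    """Return True if any trove classifier names the given Python version."""
    wanted = f"Programming Language :: Python :: {version}"
    return any(c.strip() == wanted for c in classifiers)


def packages_without_classifier(version="3.11"):
    """List installed distributions that do not advertise the given version."""
    missing = []
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:
            # Some installs ship no readable metadata; skip them.
            continue
        name = meta.get("Name")
        classifiers = meta.get_all("Classifier") or []
        if name and not declares_support(classifiers, version):
            missing.append(name)
    return sorted(set(missing), key=str.lower)


if __name__ == "__main__":
    for name in packages_without_classifier():
        print(name)
```

Running this inside the service’s virtual environment prints a list of candidates to investigate before the upgrade starts.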

“This package hasn’t been updated in two years, who maintains it?”

After our initial attempt at upgrading Django we found ourselves with hundreds of distinct integration test failures across all of our endpoints. We soon discovered that this was down to our very own Automock library which itself was relying on packages that hadn’t been maintained in years. We were faced with the first of a number of “do we fix it, or remove it” conversations.

In the spirit of leaving the campground cleaner than we found it, we decided to demolish the campground completely. We removed this and a number of other test-related packages in favour of a more standard approach that would not rely on proprietary (and quick to become legacy) packages again, making the whole suite easier to read and maintain.

Some additional tools we found useful during the upgrade process were:

The results

Quantitative

The impact of upgrading to 3.11 was pretty remarkable and was felt the instant we deployed it to production. Here you can see an immediate improvement to p95 latency of about 20%:

Spot the deployment
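For anyone less familiar with the metric, p95 latency is the value below which 95% of request durations fall, and the improvement is the relative drop in that value. The sketch below shows the arithmetic using the standard library. The function names are illustrative and the sample latencies are made up to keep the numbers easy to follow; they are not our production measurements.

```python
from statistics import quantiles


def p95(samples):
    """95th percentile of a list of latencies (any consistent unit)."""
    # quantiles(..., n=100) returns 99 cut points; index 94 is the 95th.
    return quantiles(samples, n=100, method="inclusive")[94]


def relative_reduction(before, after):
    """Fractional drop from `before` to `after`, e.g. 0.2 for a 20% drop."""
    return (before - after) / before


if __name__ == "__main__":
    # Made-up request latencies in milliseconds, purely for illustration.
    before = [100 + i for i in range(101)]
    after = [0.8 * x for x in before]
    drop = relative_reduction(p95(before), p95(after))
    print(f"p95 before: {p95(before):.1f} ms")
    print(f"p95 after:  {p95(after):.1f} ms")
    print(f"reduction:  {drop:.0%}")
```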

Not only did we see substantial drops in latency, we also saw a reduction in CPU usage...

Overall CPU computation saved vs. previous week. The red bar is the deployment event.

...which has an impact on the number of containers:

As a result the need for containers went down as horizontal scaling became less aggressive.

These gains are simply the impact of upgrading to Python 3.11. No additional work has been carried out to improve performance off the back of the upgrade, and the move to Django 4 in isolation didn’t noticeably change anything in terms of speed or CPU. It therefore seems reasonable to expect that anybody running busy services on Flask or FastAPI will also benefit, at least in some small way, from upgrading Python to the latest version.

It was difficult for us to observe performance improvements at the granularity needed to identify which of the Django middleware, across all of our many endpoints, were most impacted. However, we have seen broadly similar results to those which have been documented here.


Qualitative

In addition to the improvements to performance, there were some other inadvertent wins:

  • A major standardisation of our codebase, including the removal of both flexisettings and Automock packages, reducing a lot of the ‘magic’ happening in our test suite.
  • We gave some of our public repos some love by upgrading both Popget and Celery-Message-Consumer to work with Python 3.11.
  • A ton of refactoring, reducing the overall size of our codebase by just over 2000 lines of code.

“One or two lines in a requirements.txt file”

Learnings and next steps

We haven’t even begun to leverage the decrease in latency, adjust container auto-scaling thresholds or anything like that. There are almost certainly further savings to be made as a result of the reduction in CPU usage.

If we were to repeat the process in an ideal scenario with more time, it would be interesting to track the impact of the upgrade at a more granular level within Django. This would perhaps have been more pertinent had we seen only a slight, or even negative, change in our overall latency.
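One way to get that kind of granularity is a small timing middleware that records how long each request path takes, so that before and after figures can be compared per endpoint. The class below is a hypothetical sketch and not something we ran: TimingMiddleware and its in-memory store are made up for illustration, it follows the shape of a Django middleware (constructed with get_response, called with the request), and a real service would send the timings to its metrics system instead of holding them in a dictionary.

```python
import time
from collections import defaultdict


class TimingMiddleware:
    """Record how long each request path takes (illustrative sketch)."""

    # Shared in-memory store, keyed by request path. This is an assumption
    # for the example; a real service would emit to a metrics backend.
    timings = defaultdict(list)

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        start = time.perf_counter()
        try:
            return self.get_response(request)
        finally:
            # Record the duration even if the view raised an exception.
            elapsed = time.perf_counter() - start
            self.timings[request.path].append(elapsed)
```

Placing a class like this first in the middleware list would measure the whole request, while moving it further down would exclude the middleware above it, which gives a crude way to see where the time goes.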

We also discussed whether we could gain anything by moving over to Poetry as a dependency manager. During the upgrade process we didn’t come across any problem that Poetry itself could have made easier, but that’s not to say it wouldn’t help with future upgrades, and we should certainly consider how easy the next upgrade will be.

Some upcoming proposals promise significant improvements to Python’s performance over the next few releases, such as making the Global Interpreter Lock optional (now targeting Python version 3.13). There is also the CPython team’s plan, which Guido van Rossum came out of retirement to work on alongside Python core developers Mark Shannon and Eric Snow after being “bored at home” during the pandemic.

We encountered some hurdles during the upgrade process, but it forced us to take another look at many of our requirements and evaluate whether or not the views that require them (and in some cases the Django ‘applications’ themselves) were even still important to us. As a result, we’ve left the service faster and leaner, and given it a spring clean while we were at it.

Interested in working for the home of sustainable fashion? We’re hiring!
