Implementing Python Black on a Legacy Codebase
Code formatting is a contentious but important subject. The readability of a codebase can affect the long-term productivity of a team, which is why organisations often agree on standards amongst developers to ensure consistency. However, agreeing on and enforcing these standards can be a productivity drain when writing code, during code reviews, and when on-boarding new staff. We wanted to reduce the time spent correcting each other and discussing code formatting while continuing to achieve a consistent, quality code style.
“Black is the uncompromising Python code formatter. By using it, you agree to cede control over minutiae of hand-formatting. In return, Black gives you speed, determinism, and freedom from pycodestyle nagging about formatting. You will save time and mental energy for more important matters.”
This introductory paragraph from the Black repo sums up perfectly why we wanted to implement such a tool: “speed”, “freedom from nagging”, and “saving time and mental energy”. If you write software, or work with people who write software, then this is hopefully starting to sound appealing already.
The workflow is very straightforward. In essence you write whatever code you want with any formatting you like, and when you commit, a pre-commit hook kicks in and re-formats your code automatically to a consistent style. The main benefits are the time saved not having to think about how to format your code, and the time saved in pull requests, where colleagues no longer need to suggest their own formatting preferences.
Here’s how it works:
Easy. The hook has removed the additional blank line and the unnecessary spacing around the assertion statement, formatting it neatly onto a single line.
We went with Black because it was the most effective tool with the least setup overhead. Our Black config file is 5 lines long:
And the accompanying pre-commit config file is 6 lines long:
That’s it! The best part about this was not having to agree any standards as a team, no discussing optimal line length, no single quotes vs. double quotes. Python Black has a set of rules and it implements them without question. “Uncompromising” is right. Once we agreed to use Black, it was a case of formatting the existing codebase, and then running with it.
Implementing Black on an existing codebase was the most time-consuming part of this task. Our core backend Python API codebase is a 200,000 line monolith. Running the tool generated a 40,000 line diff! There are two ways to run Black, --fast or --safe. In safe mode Black will check that the reformatted code produces a valid AST that is equivalent to the original, so you can be sure it hasn’t inadvertently changed the behaviour of your code. Black notes that this is a temporary measure, so we normally run with it off. Safe mode was, however, very reassuring when implementing this on the existing codebase. One caveat here is that Black uses Python 3 to do this, so we came across areas where, due to Python 2 breaking syntax (think print), Black was unable to confirm the code was safely reformatted. This ended up being about 25% of our codebase, so the bulk reformatting was run in two stages: first running in safe mode over everything it could, and then running fast on the rest. We manually reviewed all the changes on the fast code, and were confident that combined with our comprehensive test suite and eagle-eyed QA team we’d be ok.
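The idea behind the safe-mode check can be approximated with Python’s standard ast module. The sketch below is not Black’s actual implementation, which is more involved; the helper name same_ast is hypothetical and exists only for illustration. It parses both versions of a source string and compares the dumped trees, which ignore whitespace, line breaks and quote style. It also shows why Python 2 only syntax cannot be verified this way from Python 3.

```python
import ast


def same_ast(original: str, reformatted: str) -> bool:
    """Return True if both sources parse to the same syntax tree.

    Hypothetical helper for illustration; not part of Black.
    Raises SyntaxError if either source is not valid Python 3.
    """
    # ast.dump() leaves out line and column attributes by default,
    # so pure layout differences do not affect the comparison.
    return ast.dump(ast.parse(original)) == ast.dump(ast.parse(reformatted))


# Layout and quote style differ, but the syntax tree is identical.
before = "x = {  'a':1,\n 'b':2 }\n"
after = 'x = {"a": 1, "b": 2}\n'
print(same_ast(before, after))  # True

# A Python 2 print statement cannot be parsed by Python 3 at all,
# so no equivalence check is possible for such code.
try:
    same_ast("print 'hello'\n", 'print "hello"\n')
except SyntaxError:
    print("cannot verify: not valid Python 3")
```

A check like this only says that the two versions mean the same thing to the parser, which is why a manual review and a test suite are still worth having for code that could not be verified.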
Merging strategy
Our develop branch is protected and requires that all feature branches are up-to-date with the latest commit before merging. When implementing far-reaching changes such as with Python Black, this can cause a lot of merge conflicts for developers working on the project simultaneously. For example, Black may have reformatted a function call that previously spanned two lines onto one line, while a developer has updated that same call to pass in a newly named variable. Thankfully, git had us covered using the recursive merge strategy with the ‘ours’ option. (Confusingly, this is not the same as the ‘ours’ merge strategy.)
git merge develop -s recursive -X ours
This isn’t a post about git merge strategies, but the above command merges the now blackened develop branch into the developer’s local feature branch, taking all the blackened code, but favouring the developer’s local changes in the case of any conflict. This allows the developer to keep all the feature changes they have made, which they can subsequently re-format with pre-commit run --all-files. The result is a reformatted, up-to-date feature branch ready for merging.
Bug catching
We started to see the benefits of Black before we had even completed the integration.
The missing comma at the end of 'mentioned_ids' in this list of excluded fields results in Python implicitly concatenating it with the string on the next line, causing the string 'mentioned_idsranking' to be the final field name passed to the exclusion mechanism.
Black reformatted the above code to the following:
The behaviour here is intended to suggest to the developer that the two strings should be merged into one. However, in our case the implicit single string should have been two strings, and by formatting the code in this way, the bug was immediately noticeable during review and we were able to correct this (and the unit tests that failed to pick this up!).
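The implicit concatenation described here is easy to reproduce. In the sketch below the variable names buggy and fixed and the 'id' entry are hypothetical placeholders; only 'mentioned_ids' and 'ranking' come from the example in this post. Python joins adjacent string literals at compile time, so the missing comma silently produces one field name instead of two.

```python
# Hypothetical lists for illustration; only 'mentioned_ids' and
# 'ranking' are taken from the post.
buggy = [
    'id',
    'mentioned_ids'  # <- missing comma: joined with the next literal
    'ranking',
]

fixed = [
    'id',
    'mentioned_ids',
    'ranking',
]

print(buggy)  # ['id', 'mentioned_idsranking']
print(len(buggy), len(fixed))  # 2 3
```

Spread over two lines, the two literals look like separate list entries. Once they sit side by side on one line, the missing comma is much harder to overlook.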
Conclusion
As promised, there has been no quibbling about code formatting since adding Black to our codebase. Deployment and integration were seamless. While not every single developer agrees with every single decision made by Black, on the whole it has been a worthwhile addition to the codebase and has had a positive effect on our productivity and code quality.