Committing for a team

February 10, 2025

In a world where machines write increasing amounts of code, and also help review it; in codebases where patterns emerge and are reused, leading to little need to reinvent the wheel; the primary consideration of a reviewer should be to find what decisions had to be made.

One first decision that can't be escaped is the existence of the change. Why is this change needed?

Another decision is the approach to make the change happen. Why this approach? What other options were discarded? In ideal conditions, an obvious established pattern is followed. In other cases, there is more nuance, or even a detailed explanation may be needed.

These are questions engineers should be asking as they implement changes. Engineers should be able to explain why. An engineer is not someone who can put some code together; machines can do that increasingly well. An engineer is someone who can ask the right questions and make decisions using information that machines won't have (product vision, technical direction, feature roadmap, practical limitations, timing or human resource issues...). The code is then written to serve that exact vision.

Code contributions (PRs) should therefore include these questions with their answers. Only with that can a reviewer evaluate (joke may be intended) the adequacy of the change and, ultimately, of the code itself. Without this information, reviewers are left to guess, ask, or, in the worst case, assume. They will have to review the code out of context, and the review will likely be flawed.

In contrast, a process that requires code contributions to address these questions alongside the relevant code helps the team and its code review process scale, and helps maintain the overall quality of the codebase and the product. This process also fosters the growth of engineers, as they are forced to consider these key aspects of the engineering process (as both authors and reviewers).

Contributing in commits

While resolving those questions in PR descriptions is helpful, it is often insufficient, because a reviewer will have to map those explanations to a considerable number of changed files and lines of code (the whole PR's diff). This makes it hard to scale the review process.

The unit of change where these questions are most useful is the commit, in that the explanations in the commit message map directly to the commit's code. It is this purpose of improving the review process, together with maintainability and debuggability, that should determine the granularity of a commit.
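As an illustration, a commit message that answers both questions might look like the sketch below. The repository, the file name and the wording of the message are all hypothetical; only the git commands themselves are real.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical change: one small, focused edit.
echo "retry_limit = 3" > settings.conf
git add settings.conf

# The message records why the change exists and why this approach was chosen.
git commit -q -F - <<'EOF'
Limit upload retries to 3

Why: unbounded retries kept failed uploads alive for hours and
masked the underlying errors.

Approach: a fixed limit in settings.conf, following the pattern
already used for timeouts. Exponential backoff was considered and
discarded as more than this fix needs.
EOF

subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
```

Because the message travels with the diff, a reviewer reading this one commit sees the decision and the code it produced side by side.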

Commits that merely record the engineer's journey toward answering key questions or arriving at the final outcome are not helpful. In fact, they can be harmful, as they run counter to the principles of reviewability (more diffs to evaluate), maintainability (more conflicts to resolve) and debuggability (more difficult bisects), and should be avoided. These "working" or interim commits may be useful during development, but they should be reorganized as per the guidance above before the contribution is shared with the team. As Linus put it: "Don't expose your crap".

It's OK to be pragmatic, though. Reviewability, maintainability and debuggability are guiding principles that can sometimes be in conflict, or may have to be bent for practical reasons.

For example, a mechanical (automated) change might deserve its own commit, so that a reviewer can effectively evaluate its large number of changes. In this case reviewability trumps debuggability.

Or a commit might include a couple of closely related changes when the overall scope is kept small and this ensures that the commit will build. In this case, reviewability is bent in favor of debuggability.

Or perhaps, occasionally, the effort to rebase and reorder commits is hard to justify. Just make sure this is not because you don't know the tools. Keep reading for some pointers, and always feel free to ask a team member for assistance if LLMs and documentation are not helpful.

Either way, there is room for the engineer to ultimately decide the best way to organize code. Like other decisions relevant to the reviewer, these should be justified in the commit messages.

Regarding the order of the commits, it is better to optimize for minimal diffs. For example, suppose that while working on a feature it becomes clear that a certain refactor of existing code would be beneficial. It is preferable for the refactor to occur first. With the refactor in place, it is easier for a reviewer to understand the main change independently, because the refactor has made it less complex. However, I'll concede that sometimes the reward of reordering commits is questionable. As before, be pragmatic, be practical. But also don't just ignore the advice!

There is an expectation that commits should build. This makes bisect operations a lot easier to perform successfully. If a commit is not expected to build, its message should be prefixed somehow, so that future engineers may have correct expectations.
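One way to check this expectation before sharing a branch is git rebase --exec, which runs a command after each replayed commit and stops at the first failure. The sketch below is only an illustration: the repository is a throwaway, and `sh -n app.sh` (a shell syntax check) is a stand-in for whatever your project's real build command is.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
log=$(mktemp)                           # records one line per successful "build"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" whatever the git default is
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo 'echo base' > app.sh
git add app.sh && git commit -q -m "Add app.sh"

git checkout -q -b topic
echo 'echo one' >> app.sh && git commit -q -am "Print one"
echo 'echo two' >> app.sh && git commit -q -am "Print two"

# Run the stand-in build after each of the two topic commits.
git rebase -q --exec "sh -n app.sh && echo built >> '$log'" main

builds=$(wc -l < "$log" | tr -d ' ')
```

If the command fails on some commit, the rebase pauses there, so the commit can be fixed (or marked as not expected to build) before continuing.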

It is important to remember that the long-term benefits of good git history hygiene often outweigh the relatively small effort of organizing commits. And it only gets easier with knowledge of the tools, and with experience.

As a result of this process the git history is better organized. This is good for humans, but also for machines. I expect LLMs to be able to use this information in the future as well, similarly to how projects with good documentation can now leverage it with AI tools.

Rebase

With such practice established, the squash and merge strategy in a PR is to be avoided, because it destroys the post-review value of those commits (maintainability, debuggability).

In fact, merge commits in general should be avoided. Merge commits are weird, in that they have two parent commits.

          A---B---C topic
         /         \
    D---E---F---G---H main

History becomes non-linear, and, as a result, it is hard to diff between an arbitrary pair of commits. Additionally, a merge commit will force you to resolve conflicts without the adequate context of the responsible commit.

In contrast, a rebase operation results in linear history:

                  A'--B'--C' topic
                 /
    D---E---F---G main

You can diff from F to B' with no surprises.

So, do not merge or pull. But what should we do instead? We should rebase.

When you rebase, git will try to apply the commits in your current branch on top of some other commits. For example, if you rebase my_feature onto main, git will try to apply the commits in my_feature on top of main's HEAD.
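The effect can be seen in a throwaway repository. This is a minimal sketch: the branch names follow the example above, while the file names and commit messages are made up for illustration.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" whatever the git default is
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "Base"

# my_feature branches off, then main moves ahead independently.
git checkout -q -b my_feature
echo feature > feature.txt && git add feature.txt && git commit -q -m "Add feature"
git checkout -q main
echo more > main.txt && git add main.txt && git commit -q -m "Advance main"

# Replay my_feature's commits on top of main's HEAD.
git checkout -q my_feature
git rebase -q main

parent=$(git rev-parse my_feature^)     # parent of the replayed commit
main_head=$(git rev-parse main)
merges=$(git rev-list --merges --count main..my_feature)
```

After the rebase, the feature commit sits directly on top of main's HEAD and the branch contains no merge commits, so the history stays linear.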

If any conflicts are found, you will have to resolve them. This will be easier than with a merge operation, because you'll have the context of the exact commit at fault. When conflicts are resolved, git add the changed files, and git rebase --continue, so that git may continue applying other commits.
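The sketch below walks through that conflict loop in a throwaway repository. The file and its values are hypothetical, and GIT_EDITOR=true is used only so the example runs unattended; normally git opens your editor to confirm the commit message when you continue.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" whatever the git default is
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "timeout = 10" > settings.conf
git add settings.conf && git commit -q -m "Add settings"

# Both branches change the same line, which guarantees a conflict.
git checkout -q -b my_feature
echo "timeout = 30" > settings.conf && git commit -q -am "Raise timeout for slow links"
git checkout -q main
echo "timeout = 20" > settings.conf && git commit -q -am "Raise timeout"

git checkout -q my_feature
# The rebase stops at the conflicting commit and exits non-zero.
git rebase main >/dev/null 2>&1 || echo "conflict in: $(git diff --name-only --diff-filter=U)"

# Resolve by hand (here: keep the feature's value), stage, and continue.
echo "timeout = 30" > settings.conf
git add settings.conf
GIT_EDITOR=true git rebase --continue >/dev/null

result=$(cat settings.conf)
ahead=$(git rev-list --count main..my_feature)
```

The rebase stops on "Raise timeout for slow links" specifically, so the resolution is made with that commit's intent in view rather than the whole branch's.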

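So, if you want to update your feature branch with the latest from main, you should: git fetch origin && git rebase origin/main. Note that git fetch updates the remote-tracking branch origin/main, not your local main, so rebasing onto a local main that has not been updated would miss the latest changes.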
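To make the fetch-then-rebase step concrete, here is a sketch that simulates a remote with a local directory. The directory names (upstream, mine), file names and commit messages are assumptions for the example; in practice origin would be your hosted repository.

```shell
set -eu
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# "upstream" stands in for the shared remote repository.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/main
echo base > upstream/base.txt
git -C upstream add base.txt
git -C upstream commit -q -m "Base"

# Your clone: start a feature branch from main.
git clone -q upstream mine
cd mine
git checkout -q -b my_feature
echo feature > feature.txt && git add feature.txt && git commit -q -m "Add feature"

# Meanwhile, someone else lands a commit on the remote's main.
echo more > ../upstream/more.txt
git -C ../upstream add more.txt
git -C ../upstream commit -q -m "Advance main"

# Fetch updates origin/main (not your local main), then rebase onto it.
git fetch -q origin
git rebase -q origin/main

base=$(git merge-base my_feature origin/main)
remote_head=$(git rev-parse origin/main)
local_main=$(git rev-parse main)
```

At the end, my_feature is based on the remote's latest commit, while the local main branch is still one commit behind because fetch does not move it.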

To prepare your feature branch to be shared with others for review, git rebase --interactive (git rebase -i) is your friend. It is a really powerful tool that allows you to perform operations (edit, drop, reword, reorder, add, fixup, squash, and more) on a series of commits.

First, you'll pick the earliest commit you want to affect (e.g. SOME_COMMIT) and start the rebase (git rebase -i SOME_COMMIT^). Then you'll tell git what operations to perform on each commit, and then git will go ahead and perform the operations in order.
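As a sketch of that flow, the example below folds an interim fix into the commit it belongs to. It uses git commit --fixup together with --autosquash so that git pre-fills the list of operations, and sets GIT_SEQUENCE_EDITOR=true only so the example runs unattended; normally you would review and edit the list in your editor. The files and commit messages are hypothetical.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" whatever the git default is
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "Base"
echo "parse v1" > parser.txt && git add parser.txt && git commit -q -m "Add parser"
target=$(git rev-parse HEAD)            # the earliest commit we want to affect
echo "test parse" > test.txt && git add test.txt && git commit -q -m "Add parser test"

# An interim fix that belongs in "Add parser", recorded as a fixup commit.
echo "parse v2" > parser.txt
git commit -q -a --fixup "$target"

# Start from the parent of the earliest commit to affect. --autosquash
# moves the fixup under its target in the todo list; "true" accepts the
# list unchanged instead of opening an editor.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$target^"

count=$(git rev-list --count HEAD)
fixed=$(git show HEAD~1:parser.txt)
```

The interim commit disappears from the history, and "Add parser" now contains the corrected content, which is what a reviewer should see.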

If git finds conflicts while performing the operations, you will have to resolve them just as in a regular rebase: git add the changed files, then git rebase --continue, so that git may continue with the remaining operations.

At any point, you may abort the rebase (git rebase --abort), which will restore your branch to the state it was in right before you ran git rebase -i. You may also git rebase --edit-todo if you change your mind about the upcoming rebase operations.

If you want to just add some changes to your latest commit, you may instead git add your changes, then git commit --amend.
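A minimal sketch of amending, in a throwaway repository with a made-up file. The --no-edit flag keeps the existing commit message instead of opening an editor.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "first" > notes.txt && git add notes.txt && git commit -q -m "Add notes"
before=$(git rev-parse HEAD)

# Fold a follow-up edit into the latest commit, keeping its message.
echo "second" >> notes.txt
git add notes.txt
git commit -q --amend --no-edit

after=$(git rev-parse HEAD)
count=$(git rev-list --count HEAD)
```

There is still a single commit, but its hash has changed: amending rewrites history just like a rebase does, which matters once the branch has been pushed.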

Remotes & GitHub

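To push your changes after reorganizing your branch, use git push --force-with-lease. This overwrites the remote branch with your rewritten history, but refuses to do so if the remote branch has changed since you last fetched it, so you won't silently discard someone else's pushes.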
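The sketch below shows the push after a history rewrite, using a local bare repository as a stand-in for the real remote. The paths, file names and commit messages are assumptions for the example.

```shell
set -eu
work=$(mktemp -d) && cd "$work"
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# A bare repository stands in for the hosted remote.
git init -q --bare origin.git
git init -q mine && cd mine
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/origin.git"
echo base > base.txt && git add base.txt && git commit -q -m "Base"
git push -q origin main

# Share a first version of the feature branch.
git checkout -q -b my_feature
echo draft > feature.txt && git add feature.txt && git commit -q -m "WIP"
git push -q -u origin my_feature

# Rewrite history locally (here: amend the commit and its message).
echo final > feature.txt && git add feature.txt
git commit -q --amend -m "Add feature"

# A plain push would be rejected as non-fast-forward. --force-with-lease
# overwrites the remote branch only if it still matches what we last saw.
git push -q --force-with-lease origin my_feature

remote=$(git ls-remote origin refs/heads/my_feature | cut -f1)
local_head=$(git rev-parse my_feature)
```

Had someone else pushed to my_feature in the meantime, the lease check would fail and the push would be refused, prompting a fetch and another look before overwriting.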

Unfortunately, GitHub doesn't have a good view for PR revisions. It has an Activity view that shows pushes (and distinguishes force pushes), filterable by branch. But there is no direct link to that view from the PR, and the user has to copy a commit hash manually in order to compare revisions. The comparison will also include commits from main that may have been pulled in by a rebase.

GitLab supports this workflow better, with a Revisions page for each MR.

Conclusion

I am sure I will keep thinking about this and encountering cases worth bringing up. For now, I can say I have seen this method work remarkably well and I can strongly recommend it. The apparent friction of this process disappears with familiarity. Consistency in commit and review expectations, as with many other aspects of engineering, also has subtle benefits that I may try to explain in the future.