* How I review patches
@ 2017-02-04 7:27 Junio C Hamano
0 siblings, 0 replies; 1+ messages in thread
From: Junio C Hamano @ 2017-02-04 7:27 UTC (permalink / raw)
Here is a write-up on how I review patches posted on the list, in
the hope that knowing what to expect may help contributors [*0*].
But before going into exactly how I do my reviews, here is a short
list of the goals of doing reviews. I review a patch (or a set of
patches) to ensure that including it to our codebase will NOT:
* hurt users of existing versions of Git. [*1*]
* hurt users of the version of Git the change appears in. [*2*]
* make life harder for developers [*3*] of Git in the future. [*4*]
The remaining paragraphs of this message are not to be taken as me
telling other reviewers how they must conduct their reviews. I will
queue, and I have queued, patches reviewed by other reviewers
without doing detailed reviews myself, as long as I trust that the
reviewers share the same goals in their reviews, and as long as I
trust their competence and taste. How they exactly review may be
different from how I would have reviewed the patches, and that is
1. Description of the problem being solved.
When I see a patch (or a set of patches), I first read the proposed
log message, documentation update and new in-code comments.
These are places where the contributor can (and is expected to)
explain the motivation behind the change. I read them to make sure
that they clearly state what problem is being solved, why a
particular solution was chosen, what exactly that solution is, and
what other solutions were considered but discarded.
Just like "X is broken" in a bug report is not clear enough, "Fix X"
is often not enough in the proposed log message, as it is not clear
which part of what X does is wrong in the contributor's mind, and
why the contributor thinks it is broken. Saying "X currently does Y
but it should do Z instead to help such and such use case." would
help reviewers (and future developers who will read it) understand
the motivation behind the change.
A new feature or enhancement that is worth adding by default needs
to be explained how that new thing works and why it is there to the
end users, so a lack of documentation update is noticed at this
stage as well.
When I find that the explanation is lacking after reading the cover
letter, proposed log message, and documentation updates, I often ask
the contributor to elaborate, before going into the actual diff. I
often suggest "perhaps you meant this?", and I end up reading the
actual diff to base my best guess on in order to do so. This is
also where I notice a tricky code whose "why" is under-explained in
in-code comments [*5*].
To such review comments, I do not want the contributor to just say
"yes, you now understand what I wanted to do with this change". I
want to see the log message, in-code comments and documentation
updated so that other reviewers and future developers will not have
to ask the same question as I asked again.
2. Design of the solution.
After clarifying the original motivation of the contributor, it
sometimes becomes apparent that the patch aims too low and attempts
to solve too narrow an issue. I would point out that, within the
context of the patches, they can and should solve a wider range of
problems of the same class [*6*]. Or the patch may hurt users in
use cases that the contributor did not consider, and the solution
may need to be designed to cover these cases [*7*]. This design
review may cause us to iterate until we have a good description of
the problem and design of the solution.
3. "Code" review.
Once we made sure that the motivation is made clear, the scope of
the change is refined, and the design of the solution described
clearly, I dive into the code changes. What I look for primarily
during this phase is to see what the code does matches the desired
behaviour we established before this phase for correctness. This is
what many people think of as "code review", and it ranges from
spotting style issues and typoes, finding and fixing stupid
off-by-one errors, to noticing future maintainance issues caused by
using a wrong abstration or a misdesigned API.
During the review of the actual code change, I may discover that
some common corner cases are not handled properly, which I would
point out. Or the contributor may have thought about tricky corner
cases and handled them correctly in the patches, but did not explain
the "what" and "why" in the log message. Recording what cases were
considered and decided based on what reasoning in the log message is
important to help future developers and sets the course of evolution
of the codebase, and this may result in updates in the "explanation
of the changes" reviewed early in the cycle.
It is not like I never look at the code until log message and
explanation is perfect; to reduce the number of back-and-forth, I do
comment on the code even before it becomes clear if the design is
sound and clearly described. But at the conceptual level, because
the motivation guides the design and the design guides the
implementation, I tend to review the patches in this order.
Hope the above helps current and future contributors when they are
preparing and reviewing their patch series before submitting.
*0* If disclosing this to contributors turns out to be a good idea,
we may want to add a polished version of this to somewhere in
Documentation/ next to SubmittingPatches.
*1* For example, we need to be very careful when changing the
on-disk or on-wire data. We as developers may always run the
bleeding-edge version of Git, but how well do users with older
version of Git interact with our new shiny toys? Backward
compatibility issues can hurt users of existing versions Git
that do not have the change the patch introduces.
*2* That's called a "bug" in general, but can take different forms
(i.e. implementation bug, documentation bug, design bug, etc.).
Maybe the documentation promises to (or it can be misread to
promise to) do A when the actual code does B instead. Maybe the
code does A most of the time but silently does something else in
a corner case, without documenting it. Maybe the problem the
patches wanted to solve have two different ways to solve,
and the patches chose one way to solve and implemented it
correctly as it documented the feature, but the approach taken
may be a way that forces users to use Git in an awkward way or
encourages them to employ a bad workflow, when there is another
way that helps users better.
*3* Future users of Git cannot be hurt more than the patches
themselves hurts them immediately. Patches may make a design
mistake that makes it hard or impossible to extend a part of the
system further, and users can be robbed their opportunity to use
even better Git in the future---but the "even better Git" does
not exist yet when the patches are accepted anyway, and their
user experience at least does not get worse, at least. The
developers will find a way to work it around and transition
existing users to allow such an extension into existence and
their work may be made harder by such a design mistake, though.
*4* A set of patches may add a helper function that is useful for any
NUL-terminated string, but may make the helper take a pointer to
a strbuf because the callers to it in the patches happen to all
have the string in a strbuf. That forces future developers who
want to call the helper to either wrap their string in a strbuf
or to fix the misdesigned API to the helper function to take a
simple "char *".
A set of patches may introduce a new helper function to split
fields in a string whose format is well known throughout the
system, for which there already exists a helper function to do
the splitting. This doubles the amount of work required when
the format of the string needs to be extended in the future.
A set of patches may conflate two semantically distinct sets of
things and try to parse elements of both set with a single
helper function, only because the elements in these two sets
happen to be the same in the version immediately after applying
the patches. This forces future developers who need to add
more elements to one set without affecting others to split the
helper into two.
All of the above make it harder for developers that need to
enhance the system in the future, but they will NOT hurt users
of existing version or the version immediately after the patches
land. Such a patch series may happen to work correctly
at that version, and no amount of tests or field trials will
reveal these maintainance issues.
*5* In-code comments that do not explain why the code does things in
a particular way but just translates what it does from C to
English add to maintenance burden of having to keep it in sync
with the code, without adding much value to the readers of the
code. They instead should explain why the code does what it
*6* For example, a contributor may originally wants to solve
something only for tags, but it is not uncommon that the issue
that exists for tags are shared by other kinds of refs, and it
may be better to solve it uniformly across tags, branches, etc.
*7* "cover"ing other uses cases does not necessarily have to be done
perfectly. In less likely situations, it may be OK to punt and
say "my code cannot handle it" and die() with a message, and
that is much much better than not considering these situations
and doing wrong things.
^ permalink raw reply [flat|nested] 1+ messages in thread
only message in thread, back to index
Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-04 7:27 How I review patches Junio C Hamano
email@example.com list mirror (unofficial, one of many)
Archives are clonable:
git clone --mirror https://public-inbox.org/git
git clone --mirror http://ou63pmih66umazou.onion/git
git clone --mirror http://czquwvybam4bgbro.onion/git
git clone --mirror http://hjrcffqmbrq6wope.onion/git
Newsgroups are available over NNTP:
note: .onion URLs require Tor: https://www.torproject.org/
or Tor2web: https://www.tor2web.org/
AGPL code for this site: git clone https://public-inbox.org/ public-inbox