On Mon, Jan 11, 2021 at 11:47:08AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt writes:
>
> >> It would be much more preferable to see this series go like so:
> >>
> >>   [1/4] create append_fetch_head() that writes out to
> >>         fetch_head->fp
> >>
> >>   [1.5/4] convert append_fetch_head() to ALWAYS accumulate in
> >>           fetch_head->buf, and ALWAYS flush what is accumulated
> >>           at the end.
> >
> > This is a change I'm hesitant to make. The thing is that FETCH_HEAD is
> > quite weird, as the current design allows different `git fetch --append`
> > processes to write to FETCH_HEAD at the same time.
>
> Hmph, do you mean
>
>     git fetch --append remote-a & git fetch --append remote-b
>
> or something else [*1*]?

That's what I mean.

> With the current write-out codepath, there is no guarantee that ...
>
> > > +	fprintf(fetch_head->fp, "%s\t%s\t%s",
> > > +		old_oid, merge_status_marker, note);
> > > +	for (i = 0; i < url_len; ++i)
> > > +		if ('\n' == url[i])
> > > +			fputs("\\n", fetch_head->fp);
> > > +		else
> > > +			fputc(url[i], fetch_head->fp);
> > > +	fputc('\n', fetch_head->fp);
>
> ... these stdio operations for a single record would result in a
> single atomic write() that would not compete with writes by other
> processes.  So I wouldn't call it "the current design allows ... at
> the same time."  It has been broken for years and nobody complained.

Fair enough. I somehow blindly assumed that `git fetch --all`, which
does invoke `git fetch --append`, performed the fetches in parallel. If
that's not the case: all the better.

> > If we change to
> > always accumulate first and flush once we're done, then we essentially
> > have a change in behaviour, as FETCH_HEADs aren't written in an
> > interleaving fashion anymore, but only at the end.
>
> I view it as a good thing, actually, for another reason [*2*], but
> that would take a follow-up fix that is much more involved, so let's
> not go there (yet).
> At least buffering a single line's worth of output
> in a buffer and writing it out in a single write() would get us much
> closer to solving the "mixed up output" from multiple processes
> problem the current code seems to already have.

The buffering of a single line is exactly what we're doing now in the
non-atomic case. Previously there had been multiple writes; now we only
call `strbuf_write()` once on the buffer for each reference.

> > Also, if there is any
> > unexpected error in between updating references which causes us to die,
> > then we wouldn't have written the already updated references to
> > FETCH_HEAD now.
>
> That one may be a valid concern.
>
> Thanks.

Just to be explicit: are you fine with the changes in this patch as-is?
They don't solve concurrent writes to FETCH_HEAD, but they should make
it easier to solve that if we ever need to.

Patrick

> [Footnote]
>
> *1* "git fetch --all/--multiple" was a strictly serial operation to
>     avoid such a mixed-up output problem, but perhaps we weren't
>     careful enough when we introduced parallel fetch and broke it?
>
> *2* When fetching from a single remote, the code makes sure that a
>     record that is not marked as 'not-for-merge' is listed first by
>     reordering the records, but it does not work well when fetching
>     from multiple remotes.  Queuing all in the buffer before showing
>     them would give us an opportunity to sort, but as I said, it
>     takes coordination among the processes---instead of each process
>     writing to FETCH_HEAD on their own, somebody has to collect the
>     records from them, reorder as needed and write them all out.
>
> cf. https://lore.kernel.org/git/X8oL190Vl03B0cQ%2F@coredump.intra.peff.net/