git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, Jonathan Nieder <jrnieder@gmail.com>,
	Sergey Organov <sorganov@gmail.com>,
	Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 5/7] tmp-objdir: new API for creating and removing primary object dirs
Date: Thu, 30 Sep 2021 03:33:48 -0400	[thread overview]
Message-ID: <YVVoXJo3DlPQd1A3@coredump.intra.peff.net> (raw)
In-Reply-To: <xmqqsfxof2hr.fsf@gitster.g>

On Tue, Sep 28, 2021 at 09:08:00PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >   Side note: The pretend_object_file() approach is actually even better,
> >   because we know the object is fake. So it does not confuse
> >   write_object_file()'s "do we already have this object" freshening
> >   check.
> >
> >   I suspect it could even be made faster than the tmp_objdir approach.
> >   From our perspective, these objects really are tempfiles. So we could
> >   write them as such, not worrying about things like fsyncing them,
> >   naming them into place, etc. We could just write them out, then mmap
> >   the results, and put the pointers into cached_objects (currently it
> >   insists on malloc-ing a copy of the input buffer, but that seems like
> >   an easy extension to add).
> >
> >   In fact, I think you could get away with just _one_ tempfile per
> >   merge. Open up one tempfile. Write out all of the objects you want to
> >   "store" into it in sequence, and record the lseek() offsets before and
> >   after for each object. Then mmap the whole result, and stuff the
> >   appropriate pointers (based on arithmetic with the offsets) into the
> >   cached_objects list.
> 
> Cute.  The remerge diff code path creates a full tree that records
> the mechanical merge result.  By hooking into the lowest layer of
> write_object() interface, we'd serialize all objects in such a tree
> in the order they are computed (bottom up from the leaf level, I'd
> presume) into a single flat file ;-)

I do still like this approach, but just two possible gotchas I was
thinking of:

 - This side-steps all of our usual code for getting object data into
   memory. In general, I'd expect this content to not be too enormous,
   but it _could_ be if there are many / large blobs in the result. So
   we may end up with large maps. Probably not a big deal on modern
   64-bit systems. Maybe an issue on 32-bit systems, just because of
   virtual address space.

   Likewise, we do support systems with NO_MMAP. They'd work here, but
   it would probably mean putting all that object data into the heap. I
   could live with that, given how rare such systems are these days, and
   that it only matters if you're using --remerge-diff with big blobs.

 - I wonder to what degree --remerge-diff benefits from omitting writes
   for objects we already have. I.e., if you are writing out a whole
   tree representing the conflicted state, then you don't want to write
   all of the trees that aren't interesting. Hopefully the code is
   already figuring out which paths the merge even touched, and ignoring
   the rest. It probably benefits performance-wise from
   write_object_file() deciding to skip some object writes, as well
   (e.g., for resolutions which the final tree already took, as they'd
   be in the merge commit). The whole pretend-we-have-this-object thing
   may want to likewise make sure we don't write out objects that we
   already have in the real odb.

-Peff

  reply	other threads:[~2021-09-30  7:33 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-31  2:26 [PATCH 0/7] Add a new --remerge-diff capability to show & log Elijah Newren via GitGitGadget
2021-08-31  2:26 ` [PATCH 1/7] merge-ort: mark a few more conflict messages as omittable Elijah Newren via GitGitGadget
2021-08-31 21:06   ` Junio C Hamano
2021-09-01  0:03     ` Elijah Newren
2021-09-01 17:19       ` Junio C Hamano
2021-08-31  2:26 ` [PATCH 2/7] merge-ort: add ability to record conflict messages in a file Elijah Newren via GitGitGadget
2021-09-28 22:29   ` Jeff King
2021-09-29  6:25     ` Elijah Newren
2021-09-29 16:14       ` Junio C Hamano
2021-09-29 16:31         ` Elijah Newren
2021-09-30  7:58       ` Jeff King
2021-09-30  8:09         ` Ævar Arnfjörð Bjarmason
2021-10-01  2:07         ` Elijah Newren
2021-10-01  5:28           ` Jeff King
2021-08-31  2:26 ` [PATCH 3/7] ll-merge: add API for capturing warnings in a strbuf instead of stderr Elijah Newren via GitGitGadget
2021-09-28 22:37   ` Jeff King
2021-09-28 23:49     ` Junio C Hamano
2021-09-29  4:03     ` Elijah Newren
2021-08-31  2:26 ` [PATCH 4/7] merge-ort: capture and print ll-merge warnings in our preferred fashion Elijah Newren via GitGitGadget
2021-09-28 22:39   ` Jeff King
2021-09-30 16:53   ` Ævar Arnfjörð Bjarmason
2021-10-01  1:54     ` Elijah Newren
2021-10-01  7:23       ` Ævar Arnfjörð Bjarmason
2021-08-31  2:26 ` [PATCH 5/7] tmp-objdir: new API for creating and removing primary object dirs Elijah Newren via GitGitGadget
2021-09-28  7:55   ` Ævar Arnfjörð Bjarmason
2021-09-29  4:22     ` Elijah Newren
2021-09-30  7:41       ` Jeff King
2021-09-30 14:17       ` Ævar Arnfjörð Bjarmason
2021-10-01  3:55         ` Elijah Newren
2021-09-28 23:17   ` Jeff King
2021-09-29  4:08     ` Junio C Hamano
2021-09-30  7:33       ` Jeff King [this message]
2021-09-30 13:16         ` Ævar Arnfjörð Bjarmason
2021-09-30 21:00           ` Jeff King
2021-10-01  3:11           ` Elijah Newren
2021-10-01  7:30             ` Ævar Arnfjörð Bjarmason
2021-10-01  8:03               ` Elijah Newren
2021-10-01  4:26         ` Elijah Newren
2021-10-01  5:27           ` Jeff King
2021-10-01  7:43             ` Ævar Arnfjörð Bjarmason
2021-09-29  5:05     ` Elijah Newren
2021-09-30  7:26       ` Jeff King
2021-09-30  7:46         ` Jeff King
2021-09-30 20:06           ` Junio C Hamano
2021-10-01  3:59             ` Elijah Newren
2021-10-01 16:36               ` Junio C Hamano
2021-10-01  2:31           ` Elijah Newren
2021-10-01  5:21             ` Jeff King
2021-09-30 19:25         ` Neeraj Singh
2021-09-30 20:19           ` Junio C Hamano
2021-09-30 20:00         ` Junio C Hamano
2021-10-01  2:23         ` Elijah Newren
2021-08-31  2:26 ` [PATCH 6/7] show, log: provide a --remerge-diff capability Elijah Newren via GitGitGadget
2021-08-31  9:19   ` Sergey Organov
2021-09-28  8:01   ` Ævar Arnfjörð Bjarmason
2021-09-29  4:00     ` Elijah Newren
2021-08-31  2:26 ` [PATCH 7/7] doc/diff-options: explain the new --remerge-diff option Elijah Newren via GitGitGadget
2021-08-31 11:05 ` [PATCH 0/7] Add a new --remerge-diff capability to show & log Bagas Sanjaya
2021-08-31 16:16   ` Elijah Newren
2021-08-31 20:03 ` Junio C Hamano
2021-08-31 20:23   ` Elijah Newren
2021-09-01 21:07 ` Junio C Hamano
2021-09-01 21:42   ` Elijah Newren
2021-09-01 21:55     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YVVoXJo3DlPQd1A3@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=newren@gmail.com \
    --cc=sorganov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).