git@vger.kernel.org mailing list mirror (one of many)
From: "Jakub Narębski" <jnareb@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
	Stefan Beller <sbeller@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Marc Strapetz <marc.strapetz@syntevo.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: topological index field for commit objects
Date: Fri, 1 Jul 2016 11:59:28 +0200
Message-ID: <57763F00.4070409@gmail.com>
In-Reply-To: <20160701065452.GE5358@sigill.intra.peff.net>

On 2016-07-01 at 08:54, Jeff King wrote:
> On Thu, Jun 30, 2016 at 12:30:31PM +0200, Jakub Narębski wrote:
> 
>>> This is one of the open questions. My older patches turned them off when
>>> replacements and grafts are in effect.
>>
>> Well, if you store the cache of generation numbers in the packfile, or in
>> the index of the packfile, or in the bitmap file, or in separate bitmap-like
>> file, generating them on repack, then of course any grafts or replacements
>> invalidate them... though for low level commands (like object counting)
>> replacements are transparent -- or rather they are (and can be) treated as
>> any other ref for reachability analysis.
>>
>> Well, if there are no grafts, you could still use them for doing
>> "git --no-replace-objects log ...", isn't it?
> 
> Yes, replace refs don't invalidate the concept of a cache. It just
> means that you invalidate the invariants of the cache for a specific
> view, so you need a cache which matches that view.
> 
> It has been several years, but I remember at one point having patches
> that summarized the graft/replace state as a single hash, and only used
> the cache if it matched that state. So you could actually keep a cache
> for some set of replace-refs that you have, as well as a cache for the
> case that you've turned them off, etc.
> 
> I don't think that level of complexity is really worth it, though.

Well, you could always update the reachability-helper cache when running
the `git replace` command, and when fetching into the 'refs/replace'
namespace...

...but this wouldn't take into account the fact that you can change
replace refs "by hand", and that the grafts file [1] can only be edited
by hand. So at query time Git would need to check (e.g. via the hash of
the grafts file and the hash of the packed-refs refs/replace namespace,
concatenated) that said cache is still valid for the replace-respecting
view, and perhaps update said cache.
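
A minimal sketch of such a query-time validity check, under stated
assumptions: the function names (view_key, cache_is_valid) and the exact
key layout are hypothetical, not taken from any existing patch. The
caller is assumed to have already read the grafts file contents and the
refs/replace namespace into memory.

```python
from __future__ import annotations

import hashlib


def view_key(grafts_bytes: bytes, replace_refs: dict[str, str]) -> str:
    """Summarize the graft/replace state as a single hex digest.

    grafts_bytes: raw contents of the grafts file (b"" if there is none).
    replace_refs: mapping of refs/replace/* ref name -> object id.
    Both the names and the layout are assumptions made for illustration.
    """
    h = hashlib.sha1()
    # Length prefix keeps the grafts part and the refs part from running
    # into each other when the two inputs are concatenated.
    h.update(len(grafts_bytes).to_bytes(8, "big"))
    h.update(grafts_bytes)
    # Sort by ref name so the key does not depend on enumeration order.
    for name in sorted(replace_refs):
        h.update(f"{name} {replace_refs[name]}\n".encode())
    return h.hexdigest()


def cache_is_valid(stored_key: str, grafts_bytes: bytes,
                   replace_refs: dict[str, str]) -> bool:
    """True if a cache written under stored_key still matches this view."""
    return stored_key == view_key(grafts_bytes, replace_refs)
```

A cache written with the key of the current view would then be ignored
(or rebuilt) as soon as a replace ref or the grafts file changes, no
matter whether the change was made with `git replace` or by hand.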

Though if we limit ourselves to the replacements mechanism, we could
have a configuration variable saying "I will manipulate replacements
only using git-replace, and I want faster reachability", couldn't we?


[1] Can we deprecate and remove the grafts mechanism, now that we have
a superior solution and a migration mechanism?
 
>>>>> I have patches that generate and store the numbers at pack time, similar
>>>>> to the way we do the reachability bitmaps.
>>
>> Ah, so those cached generation numbers are generated and stored at pack
>> time. Where you store them: is it a separate file? Bitmap file? Packfile?
> 
> There were a few iterations of the concept over the years, but the
> pack-time one uses a separate file with the same name prefix as a pack
> (similar to the way bitmaps are stored). The big advantage there is that
> we can piggy-back on the pack .idx to avoid having to write each sha1
> again (20 bytes per commit, whereas the actual data we're caching is
> only 4 bytes).

Does it use any lightweight compression mechanism, or is it not needed?
What does the format of this file look like?
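
For illustration only, here is one uncompressed layout that would be
consistent with the description quoted above: a flat array of 4-byte
values, where entry N belongs to the N-th object name in a sorted list.
This is a sketch under assumptions, not the format used by the patches
being discussed. The sorted list stands in for the pack .idx and is
simplified to contain commits only, and the function names are
hypothetical.

```python
import bisect
import struct


def write_generations(sorted_ids, generation_of):
    """Serialize one big-endian 4-byte generation number per object id.

    sorted_ids: sorted list of 20-byte object names (stand-in for the
    pack .idx, assumed here to list commits only).
    generation_of: dict mapping object name -> generation number.
    """
    return b"".join(struct.pack(">I", generation_of[oid])
                    for oid in sorted_ids)


def lookup_generation(sorted_ids, data, oid):
    """Return the cached generation number of oid, or None if absent."""
    # Binary search gives the position in the sorted list; the same
    # position indexes the fixed-width array, so no name is stored twice.
    pos = bisect.bisect_left(sorted_ids, oid)
    if pos == len(sorted_ids) or sorted_ids[pos] != oid:
        return None
    return struct.unpack_from(">I", data, 4 * pos)[0]
```

With fixed-width entries a lookup is one binary search plus one array
read, which is the saving described above: 4 bytes per commit instead of
20 + 4.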
 
>>> At GitHub we are using them for --contains analysis, along with mass
>>> ahead/behind (e.g., as in https://github.com/gitster/git/branches). My
>>> plan is to send patches upstream, but they need some cleanup first.
>>
>> That would be nice to have, please.
>>
>> Er, is mass ahead/behind something that can be plugged into Git
>> (e.g. for "git branch -v -v"), or is it something GitHub-specific?
> 
> We have a custom command, "git ahead-behind", where you can specify
> arbitrary pairs of commits on stdin. But it's all backed by a function
> which, yes, could be plugged into "branch -v -v". It caches any bitmaps
> it needs, so if you are doing 100 ahead/behind comparisons against
> "master", for example, it only has to find the bitmap for "master" once
> (remember that we sometimes have to traverse to complete a bitmap when
> a branch has been updated since the last repack).

That would be nice to have (perhaps invoked only if the number of
branches is high enough; that excludes using it for the ahead/behind
information that `git checkout` prints).
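
The caching idea described above can be sketched as follows. This is not
the GitHub implementation: plain Python sets stand in for reachability
bitmaps, the history is a small dict of commit -> parents, and all names
are hypothetical.

```python
def reachable(parents, tip, cache):
    """Set of commits reachable from tip, computed once per tip."""
    if tip not in cache:
        seen, stack = set(), [tip]
        while stack:
            commit = stack.pop()
            if commit not in seen:
                seen.add(commit)
                stack.extend(parents.get(commit, ()))
        cache[tip] = frozenset(seen)
    return cache[tip]


def ahead_behind(parents, branch, base, cache):
    """Return (ahead, behind) of branch relative to base."""
    b = reachable(parents, branch, cache)
    m = reachable(parents, base, cache)
    # ahead: commits only on branch; behind: commits only on base.
    return len(b - m), len(m - b)


# Toy history: A <- B <- C (master), B <- D <- E (topic)
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}
cache = {}
print(ahead_behind(history, "E", "C", cache))  # (2, 1)
```

When many branches are compared against the same base, the set for the
base is computed on the first call and reused from the cache afterwards.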

-- 
Jakub Narębski


