git@vger.kernel.org mailing list mirror (one of many)
* standalone library/tool to query commit-graph?
@ 2019-05-22 18:49 Karl Ostmo
  2019-05-22 18:59 ` Derrick Stolee
  0 siblings, 1 reply; 12+ messages in thread
From: Karl Ostmo @ 2019-05-22 18:49 UTC (permalink / raw)
  To: git

After producing the file ".git/objects/info/commit-graph" with the
command "git commit-graph write", is there a way to answer queries
like "git merge-base --is-ancestor" without having a .git directory?
E.g. is there a library that will operate on the "commit-graph" file
all by itself?

Thanks,
Karl

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: standalone library/tool to query commit-graph?
  2019-05-22 18:49 standalone library/tool to query commit-graph? Karl Ostmo
@ 2019-05-22 18:59 ` Derrick Stolee
  2019-05-23 19:29   ` Jakub Narebski
  0 siblings, 1 reply; 12+ messages in thread
From: Derrick Stolee @ 2019-05-22 18:59 UTC (permalink / raw)
  To: Karl Ostmo, git

On 5/22/2019 2:49 PM, Karl Ostmo wrote:
> After producing the file ".git/objects/info/commit-graph" with the
> command "git commit-graph write", is there a way to answer queries
> like "git merge-base --is-ancestor" without having a .git directory?
> E.g. is there a library that will operate on the "commit-graph" file
> all by itself?

You could certainly build such a tool, assuming your merge-base parameters are
full-length commit ids. If you try to start at ref names, you'll need the .git
directory.
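A minimal sketch of the lookup step such a tool would need, in Python. It assumes a version-1 commit-graph using SHA-1 (20-byte ids), laid out as described in Git's Documentation/technical/commit-graph-format.txt: an 8-byte header starting with "CGPH", a chunk table of 4-byte ids with 8-byte big-endian offsets (plus a terminating entry), a 256-entry OIDF fanout and a sorted OIDL list. The function names are hypothetical, and the sketch ignores split or chained graphs and the trailing checksum.

```python
import struct

HASH_LEN = 20  # assumption: SHA-1 (hash version 1)

def read_chunk_table(data: bytes) -> dict:
    """Map each chunk id to its (start, end) byte offsets."""
    if data[:4] != b"CGPH":
        raise ValueError("not a commit-graph file")
    version, hash_version, num_chunks = data[4], data[5], data[6]
    if version != 1 or hash_version != 1:
        raise ValueError("this sketch only handles version 1 with SHA-1")
    entries = []
    # num_chunks + 1 entries of (4-byte id, 8-byte big-endian offset); the
    # extra terminating entry gives the end offset of the last real chunk.
    for i in range(num_chunks + 1):
        pos = 8 + 12 * i
        (offset,) = struct.unpack(">Q", data[pos + 4:pos + 12])
        entries.append((data[pos:pos + 4], offset))
    return {entries[i][0]: (entries[i][1], entries[i + 1][1])
            for i in range(num_chunks)}

def find_commit(data: bytes, hex_oid: str):
    """Return the graph position of a full-length commit id, or None."""
    oid = bytes.fromhex(hex_oid)
    if len(oid) != HASH_LEN:
        raise ValueError("a full-length commit id is required")
    chunks = read_chunk_table(data)
    fanout_start = chunks[b"OIDF"][0]
    lookup_start = chunks[b"OIDL"][0]

    def fanout(i: int) -> int:
        # F[i] = number of commits whose first byte is <= i
        start = fanout_start + 4 * i
        return struct.unpack(">I", data[start:start + 4])[0]

    lo = fanout(oid[0] - 1) if oid[0] > 0 else 0
    hi = fanout(oid[0])
    while lo < hi:  # binary search inside the bucket for the first byte
        mid = (lo + hi) // 2
        start = lookup_start + mid * HASH_LEN
        current = data[start:start + HASH_LEN]
        if current < oid:
            lo = mid + 1
        elif current > oid:
            hi = mid
        else:
            return mid
    return None
```

Abbreviated ids or ref names are rejected on purpose: resolving those is exactly the part that needs the .git directory.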

I would not expect such a tool to ever exist in the Git codebase. Instead, you
would need a new project, say "graph-analyzer --graph=<path> --is-ancestor <id1> <id2>"
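Once parent positions and generation numbers have been read (as we understand the version-1 format, both are stored per commit in the CDAT chunk), an --is-ancestor query is a reachability walk. A minimal sketch, assuming `parents` and `generation` are plain Python lists indexed by graph position (hypothetical names, not a Git API). Generation numbers let the walk skip any commit whose generation is below the candidate ancestor's, since such a commit cannot reach it.

```python
def is_ancestor(parents, generation, a, b):
    """Return True if commit position `a` is reachable from `b` (or a == b).

    parents[i] lists the parent positions of commit i; generation[i] is its
    generation number (1 for root commits, otherwise 1 + max over parents).
    """
    target = generation[a]
    stack, seen = [b], {b}
    while stack:
        commit = stack.pop()
        if commit == a:
            return True
        for parent in parents[commit]:
            # Every ancestor of `parent` has generation <= generation[parent],
            # so if that is below `target`, `a` cannot be reached through it.
            if parent not in seen and generation[parent] >= target:
                seen.add(parent)
                stack.append(parent)
    return False
```

For example, with history 0 <- 1 <- 2, a branch 3 off 1, and 4 merging 2 and 3, `is_ancestor(parents, generation, 3, 2)` stops immediately because commit 1 has a lower generation than commit 3.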

Thanks,
-Stolee


* Re: standalone library/tool to query commit-graph?
  2019-05-22 18:59 ` Derrick Stolee
@ 2019-05-23 19:29   ` Jakub Narebski
  2019-05-23 21:54     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Narebski @ 2019-05-23 19:29 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Karl Ostmo, git

Derrick Stolee <stolee@gmail.com> writes:
> On 5/22/2019 2:49 PM, Karl Ostmo wrote:

>> After producing the file ".git/objects/info/commit-graph" with the
>> command "git commit-graph write", is there a way to answer queries
>> like "git merge-base --is-ancestor" without having a .git directory?
>> E.g. is there a library that will operate on the "commit-graph" file
>> all by itself?
>
> You could certainly build such a tool, assuming your merge-base parameters are
> full-length commit ids. If you try to start at ref names, you'll need the .git
> directory.
>
> I would not expect such a tool to ever exist in the Git codebase. Instead, you
> would need a new project, say "graph-analyzer --graph=<path> --is-ancestor <id1> <id2>"

It would be nice if such a tool could convert the commit-graph into other
commonly used augmented graph storage formats, like GEXF (Graph Exchange
XML Format), GraphML, GML (Graph Modelling Language), Pajek format or
Graphviz .dot format.
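Of the formats listed, Graphviz is the simplest to emit. A minimal sketch, assuming the commit ids and parent positions have already been read from the commit-graph into Python lists (`to_dot` is a hypothetical name); edges point from child to parent, matching how the parent links are stored.

```python
def to_dot(oids, parents, abbrev=7):
    """Render commits as a Graphviz digraph with child -> parent edges.

    oids[i] is the hex id of the commit at graph position i, and
    parents[i] lists the positions of its parents.
    """
    lines = ["digraph commits {"]
    for pos, oid in enumerate(oids):
        lines.append(f'  n{pos} [label="{oid[:abbrev]}"];')
    for child, parent_positions in enumerate(parents):
        for parent in parent_positions:
            lines.append(f"  n{child} -> n{parent};")
    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be fed to Graphviz, e.g. `dot -Tsvg`; nodes are keyed by graph position so that labels can be abbreviated freely.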

Wishfully thinking,
--
Jakub Narębski


* Re: standalone library/tool to query commit-graph?
  2019-05-23 19:29   ` Jakub Narebski
@ 2019-05-23 21:54     ` Ævar Arnfjörð Bjarmason
  2019-05-23 22:20       ` SZEDER Gábor
  0 siblings, 1 reply; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-23 21:54 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Derrick Stolee, Karl Ostmo, git


On Thu, May 23 2019, Jakub Narebski wrote:

> Derrick Stolee <stolee@gmail.com> writes:
>> On 5/22/2019 2:49 PM, Karl Ostmo wrote:
>
>>> After producing the file ".git/objects/info/commit-graph" with the
>>> command "git commit-graph write", is there a way to answer queries
>>> like "git merge-base --is-ancestor" without having a .git directory?
>>> E.g. is there a library that will operate on the "commit-graph" file
>>> all by itself?
>>
>> You could certainly build such a tool, assuming your merge-base parameters are
>> full-length commit ids. If you try to start at ref names, you'll need the .git
>> directory.
>>
>> I would not expect such a tool to ever exist in the Git codebase. Instead, you
>> would need a new project, say "graph-analyzer --graph=<path> --is-ancestor <id1> <id2>"
>
> It would be nice if such a tool could convert the commit-graph into other
> commonly used augmented graph storage formats, like GEXF (Graph Exchange
> XML Format), GraphML, GML (Graph Modelling Language), Pajek format or
> Graphviz .dot format.

Wouldn't that make more sense as a hypothetical output format for "log
--graph" rather than something you'd want to emit from the commit-graph?
Presumably you'd want to export in such a format to see the shape of the
repo, and since the commit graph doesn't include any commits outside of
packs you'd miss any loose commits.
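Until such an output format exists (it is only hypothetical here), a rough approximation is to post-process `git log --format='%H %P'`, where `%H` is the commit id and `%P` the space-separated parent ids. A sketch that turns that text into GraphML with Python's standard xml.etree module; the function name is hypothetical, and any history-limiting options given to `git log` are applied before this step.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

def log_to_graphml(log_text: str) -> str:
    """Turn `git log --format='%H %P'` output into a GraphML document."""
    ET.register_namespace("", GRAPHML_NS)  # emit GraphML as the default namespace
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    graph = ET.SubElement(root, f"{{{GRAPHML_NS}}}graph",
                          id="G", edgedefault="directed")
    seen = {}   # dict keeps the first-seen order of commit ids
    edges = []
    for line in log_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        commit, parents = fields[0], fields[1:]
        for oid in (commit, *parents):
            seen.setdefault(oid, None)
        edges.extend((commit, parent) for parent in parents)
    # Parents outside the listed range still get a node, so every edge resolves.
    for oid in seen:
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", id=oid)
    for child, parent in edges:
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}edge", source=child, target=parent)
    return ET.tostring(root, encoding="unicode")
```

Typical input would come from something like `git log --all --format='%H %P'`.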


* Re: standalone library/tool to query commit-graph?
  2019-05-23 21:54     ` Ævar Arnfjörð Bjarmason
@ 2019-05-23 22:20       ` SZEDER Gábor
  2019-05-23 23:48         ` Derrick Stolee
  0 siblings, 1 reply; 12+ messages in thread
From: SZEDER Gábor @ 2019-05-23 22:20 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jakub Narebski, Derrick Stolee, Karl Ostmo, git

On Thu, May 23, 2019 at 11:54:22PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, May 23 2019, Jakub Narebski wrote:
> 
> > Derrick Stolee <stolee@gmail.com> writes:
> >> On 5/22/2019 2:49 PM, Karl Ostmo wrote:
> >
> >>> After producing the file ".git/objects/info/commit-graph" with the
> >>> command "git commit-graph write", is there a way to answer queries
> >>> like "git merge-base --is-ancestor" without having a .git directory?
> >>> E.g. is there a library that will operate on the "commit-graph" file
> >>> all by itself?
> >>
> >> You could certainly build such a tool, assuming your merge-base parameters are
> >> full-length commit ids. If you try to start at ref names, you'll need the .git
> >> directory.
> >>
> >> I would not expect such a tool to ever exist in the Git codebase. Instead, you
> >> would need a new project, say "graph-analyzer --graph=<path> --is-ancestor <id1> <id2>"
> >
> > It would be nice if such a tool could convert the commit-graph into other
> > commonly used augmented graph storage formats, like GEXF (Graph Exchange
> > XML Format), GraphML, GML (Graph Modelling Language), Pajek format or
> > Graphviz .dot format.
> 
> Wouldn't that make more sense as a hypothetical output format for "log
> --graph" rather than something you'd want to emit from the commit-graph?
> Presumably you'd want to export in such a format to see the shape of the
> repo, and since the commit graph doesn't include any commits outside of
> packs you'd miss any loose commits.

No, the commit-graph includes loose commits as well.



* Re: standalone library/tool to query commit-graph?
  2019-05-23 22:20       ` SZEDER Gábor
@ 2019-05-23 23:48         ` Derrick Stolee
  2019-05-24  9:34           ` SZEDER Gábor
  0 siblings, 1 reply; 12+ messages in thread
From: Derrick Stolee @ 2019-05-23 23:48 UTC (permalink / raw)
  To: SZEDER Gábor, Ævar Arnfjörð Bjarmason
  Cc: Jakub Narebski, Karl Ostmo, git

On 5/23/2019 6:20 PM, SZEDER Gábor wrote:
> On Thu, May 23, 2019 at 11:54:22PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Thu, May 23 2019, Jakub Narebski wrote:
>>
>>> Derrick Stolee <stolee@gmail.com> writes:
>>>> On 5/22/2019 2:49 PM, Karl Ostmo wrote:
>>>
>>>>> After producing the file ".git/objects/info/commit-graph" with the
>>>>> command "git commit-graph write", is there a way to answer queries
>>>>> like "git merge-base --is-ancestor" without having a .git directory?
>>>>> E.g. is there a library that will operate on the "commit-graph" file
>>>>> all by itself?
>>>>
>>>> You could certainly build such a tool, assuming your merge-base parameters are
>>>> full-length commit ids. If you try to start at ref names, you'll need the .git
>>>> directory.
>>>>
>>>> I would not expect such a tool to ever exist in the Git codebase. Instead, you
>>>> would need a new project, say "graph-analyzer --graph=<path> --is-ancestor <id1> <id2>"
>>>
>>> It would be nice if such a tool could convert the commit-graph into other
>>> commonly used augmented graph storage formats, like GEXF (Graph Exchange
>>> XML Format), GraphML, GML (Graph Modelling Language), Pajek format or
>>> Graphviz .dot format.
>>
>> Wouldn't that make more sense as a hypothetical output format for "log
>> --graph" rather than something you'd want to emit from the commit-graph?
>> Presumably you'd want to export in such a format to see the shape of the
>> repo, and since the commit graph doesn't include any commits outside of
>> packs you'd miss any loose commits.
> 
> No, the commit-graph includes loose commits as well.

Depends on how you build the commit-graph.

	git commit-graph write
	git commit-graph write --stdin-packs

These options build the graph from the commits in packs (and close that set under reachability).

	git commit-graph write --reachable
	git commit-graph write --stdin-commits

These options build the graph from a set of starting commits: either the refs (--reachable)
or the commit ids read from stdin (--stdin-commits).
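To make "closes under reachability" concrete, here is a toy model in Python. The names are hypothetical, and the packed commits and ref tips are passed in as plain sets rather than read from a repository; Git's real implementation works on the object store. Seeding from packed commits still pulls in loose ancestors of those commits, but a loose commit sitting on top of them is only included when the walk starts from the refs, as with --reachable.

```python
def graph_commit_set(seeds, parents):
    """Close a set of starting commits under the parent relation.

    seeds: commits the graph is built from (packed commits, or ref tips).
    parents: dict mapping each commit to the list of its parents.
    """
    included = set()
    stack = list(seeds)
    while stack:
        commit = stack.pop()
        if commit in included:
            continue
        included.add(commit)
        stack.extend(parents.get(commit, ()))
    return included

# A <- B <- C <- D: A and C are packed, B and D are loose objects.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
from_packs = graph_commit_set({"A", "C"}, history)  # loose B pulled in, D missed
from_refs = graph_commit_set({"D"}, history)        # ref tip D reaches everything
```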

But I do like the flexibility of `git log --graph`, as you could export the graph
after reparenting (with options like `--simplify-merges -- <path>`). You would also
not include commits from random topic branches you have sitting around.

-Stolee


* Re: standalone library/tool to query commit-graph?
  2019-05-23 23:48         ` Derrick Stolee
@ 2019-05-24  9:34           ` SZEDER Gábor
  2019-05-24  9:49             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 12+ messages in thread
From: SZEDER Gábor @ 2019-05-24  9:34 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Ævar Arnfjörð Bjarmason, Jakub Narebski,
	Karl Ostmo, git

On Thu, May 23, 2019 at 07:48:33PM -0400, Derrick Stolee wrote:
> On 5/23/2019 6:20 PM, SZEDER Gábor wrote:
> > On Thu, May 23, 2019 at 11:54:22PM +0200, Ævar Arnfjörð Bjarmason wrote:

> >> and since the commit graph doesn't include any commits outside of
> >> packs you'd miss any loose commits.
> > 
> > No, the commit-graph includes loose commits as well.
> 
> Depends on how you build the commit-graph.

Yeah; I just didn't want to go into details, hoping that this short
reply will be enough to jog Ævar's memory to recall our earlier
discussion about this :)



* Re: standalone library/tool to query commit-graph?
  2019-05-24  9:34           ` SZEDER Gábor
@ 2019-05-24  9:49             ` Ævar Arnfjörð Bjarmason
  2019-05-24 10:06               ` SZEDER Gábor
  2019-06-25 18:27               ` Jakub Narebski
  0 siblings, 2 replies; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-24  9:49 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Derrick Stolee, Jakub Narebski, Karl Ostmo, git


On Fri, May 24 2019, SZEDER Gábor wrote:

> On Thu, May 23, 2019 at 07:48:33PM -0400, Derrick Stolee wrote:
>> On 5/23/2019 6:20 PM, SZEDER Gábor wrote:
>> > On Thu, May 23, 2019 at 11:54:22PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> >> and since the commit graph doesn't include any commits outside of
>> >> packs you'd miss any loose commits.
>> >
>> > No, the commit-graph includes loose commits as well.
>>
>> Depends on how you build the commit-graph.
>
> Yeah; I just didn't want to go into details, hoping that this short
> reply will be enough to jog Ævar's memory to recall our earlier
> discussion about this :)

To clarify (and I should have said) I meant it'll include only packed
commits in the mode Karl Ostmo invoked it in, as Derrick points out.

But yeah, you can of course give it arbitrary starting points. Still,
needing to deal with those sorts of caveats makes it rather useless in
practice for the sort of use-case Jakub mused about. More importantly,
a full XML dump of the graph isn't going to get much of a benefit from
the commit graph, which helps algorithms that want to avoid those sorts
of full walks.


* Re: standalone library/tool to query commit-graph?
  2019-05-24  9:49             ` Ævar Arnfjörð Bjarmason
@ 2019-05-24 10:06               ` SZEDER Gábor
  2019-05-24 10:49                 ` Ævar Arnfjörð Bjarmason
  2019-06-25 18:27               ` Jakub Narebski
  1 sibling, 1 reply; 12+ messages in thread
From: SZEDER Gábor @ 2019-05-24 10:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee, Jakub Narebski, Karl Ostmo, git

On Fri, May 24, 2019 at 11:49:28AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, May 24 2019, SZEDER Gábor wrote:
> 
> > On Thu, May 23, 2019 at 07:48:33PM -0400, Derrick Stolee wrote:
> >> On 5/23/2019 6:20 PM, SZEDER Gábor wrote:
> >> > On Thu, May 23, 2019 at 11:54:22PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >
> >> >> and since the commit graph doesn't include any commits outside of
> >> >> packs you'd miss any loose commits.
> >> >
> >> > No, the commit-graph includes loose commits as well.
> >>
> >> Depends on how you build the commit-graph.
> >
> > Yeah; I just didn't want to go into details, hoping that this short
> > reply will be enough to jog Ævar's memory to recall our earlier
> > discussion about this :)
> 
> To clarify (and I should have said) I meant it'll include only packed
> commits in the mode Karl Ostmo invoked it in, as Derrick points out.

No, even in that mode it will include loose objects as well, if it has
to; that's what the "and closes under reachability" part of Derrick's
reply means and that's what I showed in our earlier discussion at:

  https://public-inbox.org/git/20190322154943.GF22459@szeder.dev/




* Re: standalone library/tool to query commit-graph?
  2019-05-24 10:06               ` SZEDER Gábor
@ 2019-05-24 10:49                 ` Ævar Arnfjörð Bjarmason
  2019-05-24 11:37                   ` SZEDER Gábor
  0 siblings, 1 reply; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-05-24 10:49 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Derrick Stolee, Jakub Narebski, Karl Ostmo, git


On Fri, May 24 2019, SZEDER Gábor wrote:

> On Fri, May 24, 2019 at 11:49:28AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Fri, May 24 2019, SZEDER Gábor wrote:
>>
>> > On Thu, May 23, 2019 at 07:48:33PM -0400, Derrick Stolee wrote:
>> >> On 5/23/2019 6:20 PM, SZEDER Gábor wrote:
>> >> > On Thu, May 23, 2019 at 11:54:22PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> >
>> >> >> and since the commit graph doesn't include any commits outside of
>> >> >> packs you'd miss any loose commits.
>> >> >
>> >> > No, the commit-graph includes loose commits as well.
>> >>
>> >> Depends on how you build the commit-graph.
>> >
>> > Yeah; I just didn't want to go into details, hoping that this short
>> > reply will be enough to jog Ævar's memory to recall our earlier
>> > discussion about this :)
>>
>> To clarify (and I should have said) I meant it'll include only packed
>> commits in the mode Karl Ostmo invoked it in, as Derrick points out.
>
> No, even in that mode it will include loose objects as well, if it has
> to; that's what the "and closes under reachability" part of Derrick's
> reply means and that's what I showed in our earlier discussion at:
>
>   https://public-inbox.org/git/20190322154943.GF22459@szeder.dev/

I should have said "include any commits outside of packs [to seed the
revision walk]".

As you correctly point out, there *are* caveats to that: e.g. it's
possible to have both packs and loose commits and still include
everything because of reachability.

For the purposes of the discussion Jakub started upthread, the
not-quite-correct mental model that we generally tend to accumulate
loose objects that later coalesce into packs is close enough.

That is why, for most users, "git commit-graph write" won't produce a
graph with all reachable commits: try cloning git.git, "git am"-ing a
patch on top, and generating the graph again; it'll be the same
(unless you picked a humongous patch).

Similarly, it'll be incomplete for most users who have
gc.writeCommitGraph=true, since they rely on "gc --auto" and are
likely in an in-between state with a semi-stale graph.

So building tools directly on top of it shouldn't be anyone's first
choice. Instead, walk the DAG, and see if the walking code can, as an
optimization, optimistically consult the commit-graph.
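A minimal sketch of that approach, assuming a parents table loaded from a possibly stale commit-graph and a caller-supplied fallback that parses the commit object itself (both hypothetical here; Git's own C implementation differs in detail). Commits missing from the graph, such as ones created after the last "gc --auto", are still walked correctly, just more slowly.

```python
def reachable_commits(tips, graph_parents, load_parents):
    """Return every commit reachable from `tips`.

    graph_parents: dict of commit -> parents read from a (possibly stale)
    commit-graph; consulted first because it is cheap.
    load_parents: slower fallback, e.g. parsing the commit object itself;
    here it is simply a callable supplied by the caller.
    """
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        parents = graph_parents.get(commit)
        if parents is None:  # not in the graph, so ask the object store
            parents = load_parents(commit)
        stack.extend(parents)
    return seen
```

The result never depends on how complete the graph is; only the number of fallback calls does.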


* Re: standalone library/tool to query commit-graph?
  2019-05-24 10:49                 ` Ævar Arnfjörð Bjarmason
@ 2019-05-24 11:37                   ` SZEDER Gábor
  0 siblings, 0 replies; 12+ messages in thread
From: SZEDER Gábor @ 2019-05-24 11:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee, Jakub Narebski, Karl Ostmo, git

On Fri, May 24, 2019 at 12:49:12PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> > On Thu, May 23, 2019 at 07:48:33PM -0400, Derrick Stolee wrote:
> >> >> On 5/23/2019 6:20 PM, SZEDER Gábor wrote:
> >> >> > On Thu, May 23, 2019 at 11:54:22PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> >
> >> >> >> and since the commit graph doesn't include any commits outside of
> >> >> >> packs you'd miss any loose commits.
> >> >> >
> >> >> > No, the commit-graph includes loose commits as well.
> >> >>
> >> >> Depends on how you build the commit-graph.
> >> >
> >> > Yeah; I just didn't want to go into details, hoping that this short
> >> > reply will be enough to jog Ævar's memory to recall our earlier
> >> > discussion about this :)
> >>
> >> To clarify (and I should have said) I meant it'll include only packed
> >> commits in the mode Karl Ostmo invoked it in, as Derrick points out.
> >
> > No, even in that mode it will include loose objects as well, if it has
> > to; that's what the "and closes under reachability" part of Derrick's
> > reply means and that's what I showed in our earlier discussion at:
> >
> >   https://public-inbox.org/git/20190322154943.GF22459@szeder.dev/
> 
> I should have said "include any commits outside of packs [to seed the
> revision walk]".
> 
> As you correctly point out, there *are* caveats to that: e.g. it's
> possible to have both packs and loose commits and still include
> everything because of reachability.
> 
> For the purposes of the discussion Jakub started upthread, the
> not-quite-correct mental model that we generally tend to accumulate
> loose objects that later coalesce into packs is close enough.
> 
> That is why, for most users, "git commit-graph write" won't produce a
> graph with all reachable commits: try cloning git.git, "git am"-ing a
> patch on top, and generating the graph again; it'll be the same
> (unless you picked a humongous patch).

Ok, with this I finally understand what you meant.

And it just reinforces my long-held belief that '--reachable' should be
the default for 'git commit-graph write'...



* Re: standalone library/tool to query commit-graph?
  2019-05-24  9:49             ` Ævar Arnfjörð Bjarmason
  2019-05-24 10:06               ` SZEDER Gábor
@ 2019-06-25 18:27               ` Jakub Narebski
  1 sibling, 0 replies; 12+ messages in thread
From: Jakub Narebski @ 2019-06-25 18:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: SZEDER Gábor, Derrick Stolee, Karl Ostmo, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
[...]

> To clarify (and I should have said) I meant it'll include only packed
> commits in the mode Karl Ostmo invoked it in, as Derrick points out.
>
> But yeah, you can of course give it arbitrary starting points. Still,
> needing to deal with those sorts of caveats makes it rather useless in
> practice for the sort of use-case Jakub mused about. More importantly,
> a full XML dump of the graph isn't going to get much of a benefit from
> the commit graph, which helps algorithms that want to avoid those sorts
> of full walks.

Actually, for an "XML dump" of a graph of revisions (assuming that you
can give nodes and edges in arbitrary order in this graph output format),
doing it from the serialized commit-graph should be faster: you only need
to read one file and convert it to the other format (perhaps even in a
streaming manner).  There is no need to delta-unpack, decompress and parse
commit objects.
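A streaming conversion could look roughly like this sketch, which yields (child, parent) position pairs directly from the raw CDAT and EDGE chunk bytes without building any commit objects. It assumes the version-1 layout with 20-byte SHA-1 ids, as we understand it: each CDAT record is the root-tree id, then two 4-byte big-endian parent positions (0x70000000 meaning "no parent", and the most-significant bit on the second one meaning "index into the EDGE list" for octopus merges), then 8 bytes of generation number and commit date. The function name is hypothetical.

```python
import struct

GRAPH_PARENT_NONE = 0x70000000  # "no parent in this slot"
EDGE_FLAG = 0x80000000          # most-significant bit
POSITION_MASK = 0x7FFFFFFF

def iter_edges(cdat: bytes, edge: bytes = b"", hash_len: int = 20):
    """Yield (child_pos, parent_pos) pairs from raw CDAT/EDGE chunk bytes."""
    record_len = hash_len + 16
    for child in range(len(cdat) // record_len):
        start = child * record_len + hash_len  # skip the root-tree id
        first, second = struct.unpack(">II", cdat[start:start + 8])
        if first != GRAPH_PARENT_NONE:
            yield child, first
        if second == GRAPH_PARENT_NONE:
            continue
        if not (second & EDGE_FLAG):
            yield child, second  # ordinary two-parent merge
            continue
        # Octopus merge: read the EDGE list until an entry has the MSB set.
        index = second & POSITION_MASK
        while True:
            (value,) = struct.unpack(">I", edge[4 * index:4 * index + 4])
            yield child, value & POSITION_MASK
            if value & EDGE_FLAG:
                break
            index += 1
```

Because it is a generator, each edge can be written out as soon as it is decoded.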

On the other hand, you are right: if "git log --graph" uses the
serialized commit-graph, and it is used for the XML / JSON dump, it should
also be fast.  And if there is no serialized commit-graph, you can still
generate the XML dump.

Best,
--
Jakub Narębski


Thread overview: 12+ messages
2019-05-22 18:49 standalone library/tool to query commit-graph? Karl Ostmo
2019-05-22 18:59 ` Derrick Stolee
2019-05-23 19:29   ` Jakub Narebski
2019-05-23 21:54     ` Ævar Arnfjörð Bjarmason
2019-05-23 22:20       ` SZEDER Gábor
2019-05-23 23:48         ` Derrick Stolee
2019-05-24  9:34           ` SZEDER Gábor
2019-05-24  9:49             ` Ævar Arnfjörð Bjarmason
2019-05-24 10:06               ` SZEDER Gábor
2019-05-24 10:49                 ` Ævar Arnfjörð Bjarmason
2019-05-24 11:37                   ` SZEDER Gábor
2019-06-25 18:27               ` Jakub Narebski

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git
