git@vger.kernel.org mailing list mirror (one of many)
From: Stefan Beller <sbeller@google.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	git <git@vger.kernel.org>, Jacob Keller <jacob.keller@gmail.com>,
	Kevin Daudt <me@ikke.info>,
	Andreas Schwab <schwab@linux-m68k.org>
Subject: Re: [PATCHv3 7/7] builtin/describe.c: describe a blob
Date: Tue, 14 Nov 2017 12:40:03 -0800
Message-ID: <CAGZ79kZXxrDKr5PfJ2xx_3hhzscUiQvqOnCGURXCbJSL118trw@mail.gmail.com>
In-Reply-To: <20171114120208.d0570f20672f117bcf8e5396@google.com>

On Tue, Nov 14, 2017 at 12:02 PM, Jonathan Tan <jonathantanmy@google.com> wrote:
> On Thu,  2 Nov 2017 12:41:48 -0700
> Stefan Beller <sbeller@google.com> wrote:
>
>> Sometimes users are given a hash of an object and they want to
>> identify it further (ex.: Use verify-pack to find the largest blobs,
>> but what are these? or [1])
>>
>> "This is an interesting endeavor, because describing things is hard."
>>   -- me, upon writing this patch.
>>
>> When describing commits, we try to anchor them to tags or refs, as these
>> are conceptually on a higher level than the commit. And if there is no ref
>> or tag that matches exactly, we're out of luck.  So we employ a heuristic
>> to make up a name for the commit. These names are ambiguous: there might
>> be different tags or refs to anchor to, and there might be different
>> paths in the DAG to travel to arrive at the commit precisely.
>>
>> When describing a blob, we want to describe the blob from a higher layer
>> as well, which is a tuple of (commit, deep/path) as the tree objects
>> involved are rather uninteresting.  The same blob can be referenced by
>> multiple commits, so how do we decide which commit to use?  This patch
>> implements a rather naive approach to this: as there are no back pointers
>> from blobs to the commits in which the blob occurs, we'll start walking
>> from any tips available, listing the blobs in the order of the commits,
>> and once we have found the blob, we'll take the first commit that listed
>> it.  For source code this is likely not the first commit that introduced
>> the blob, but rather the latest commit that contained the blob.  For
>> example:
>>
>>   git describe v0.99:Makefile
>>   v0.99-5-gab6625e06a:Makefile
>>
>> tells us the latest commit that contained the Makefile as it was in tag
>> v0.99 is commit v0.99-5-gab6625e06a (and at the same path), as the next
>> commit on top v0.99-6-gb1de9de2b9 ([PATCH] Bootstrap "make dist",
>> 2005-07-11) touches the Makefile.
>>
>> Let's see how this description turns out and whether it is useful in
>> day-to-day use, as I have the intuition that we'd rather want to see the
>> *first* commit in which this blob was introduced to the repository (which
>> can be achieved easily by giving the `--reverse` flag in the describe_blob
>> rev walk).
>
> The method of your intuition indeed seems better - could we just have
> this from the start?

Thanks for the review!

This series was written with the mindset that a user would only ever
want to describe bad blobs (bad in terms of file size, unwanted content, etc.).

With --reverse you only see the *first* introduction of said blob,
so finding out whether it was re-introduced is still not as easy, whereas
"when was this blob last used", which is what the current algorithm
answers, covers that case better.
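
To make the difference concrete, here is a minimal Python sketch over a toy
linear history. It is not the describe_blob code from the patch: the history,
the commit names, the blob ids and the helper function are all made up for
illustration, and a real walk operates on a DAG rather than a list.

```python
# Toy linear history, oldest first. Each entry is (commit name, {path: blob id}).
# All names and blob ids here are hypothetical.
HISTORY = [
    ("c1", {"Makefile": "aaa"}),
    ("c2", {"Makefile": "aaa", "big.bin": "bad"}),  # blob introduced
    ("c3", {"Makefile": "bbb"}),                    # blob removed
    ("c4", {"Makefile": "bbb", "big.bin": "bad"}),  # blob re-introduced
    ("c5", {"Makefile": "ccc", "big.bin": "bad"}),
    ("c6", {"Makefile": "ccc"}),                    # blob removed again
]


def describe_blob(history, blob, reverse=False):
    """Return 'commit:path' for the first commit in walk order that lists blob.

    The default walk goes newest to oldest (like starting from the tips);
    reverse=True walks oldest to newest instead.
    """
    order = history if reverse else list(reversed(history))
    for name, tree in order:
        for path, oid in tree.items():
            if oid == blob:
                return f"{name}:{path}"
    return None


last_used = describe_blob(HISTORY, "bad")
first_seen = describe_blob(HISTORY, "bad", reverse=True)
print(last_used)   # c5:big.bin
print(first_seen)  # c2:big.bin
```

In this toy history the reversed walk reports c2 and gives no hint that the
blob came back in c4, while the default walk at least shows that it was still
around as late as c5.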

> Alternatively, to me, it seems that listing commits that *introduce*
> the blob (that is, commits that reference the blob while none of their
> parents do) would be the best way. That would then be independent of
> traversal order (and we would no longer need to find a tag etc. to tie
> the blob to).

What if it is introduced multiple times? (Either in multiple competing
side branches, or introduced, reverted and re-introduced?)
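
As a rough illustration of that question, the following Python sketch applies
the "contains the blob, but none of its parents do" rule to a toy DAG. The
DAG, the commit names and the function are hypothetical and only model the
idea; nothing here is taken from the patch series.

```python
# Hypothetical toy DAG: commit -> (parents, set of blob ids in its tree).
COMMITS = {
    "a": ([], set()),
    "b": (["a"], {"bad"}),       # introduced on one side branch
    "c": (["a"], {"bad"}),       # introduced independently on another
    "d": (["b", "c"], {"bad"}),  # merge: both parents already have it
    "e": (["d"], set()),         # reverted
    "f": (["e"], {"bad"}),       # re-introduced
}


def introductions(commits, blob):
    """Return the commits that contain blob while none of their parents do."""
    return sorted(
        name
        for name, (parents, blobs) in commits.items()
        if blob in blobs and all(blob not in commits[p][1] for p in parents)
    )


intro = introductions(COMMITS, "bad")
print(intro)  # ['b', 'c', 'f']
```

The rule is indeed independent of traversal order, but it yields a list
rather than a single answer, so some policy for picking or presenting the
entries would still be needed.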

> If we do that, it seems to me that there is a future optimization that
> could get the first commit to the user more quickly - once a commit
> without the blob and a descendant commit with the blob are found, that
> interval can be bisected, so that the first commit is found in O(log
> number of commits) instead of O(commits). But this can be done later.

Bisection assumes that there is only one "event" going from good to
bad, which doesn't hold true here, as the blob can be present at several
different points in the history.
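
A small Python sketch of why that matters, again on a made-up linear history
(the presence list and the function name are assumptions for illustration,
not part of any Git code):

```python
# Hypothetical linear history, oldest first: True means the blob is present.
PRESENT = [False, True, True, False, False, True, True, False]


def bisect_first_present(present, lo, hi):
    """Classic bisection between a commit without the blob (lo) and a
    descendant with it (hi). Only correct if there is exactly one
    False -> True transition between lo and hi."""
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if present[mid]:
            hi = mid
        else:
            lo = mid
    return hi


found = bisect_first_present(PRESENT, 0, 6)
transitions = [
    i for i in range(1, len(PRESENT)) if PRESENT[i] and not PRESENT[i - 1]
]
print(found)        # 5
print(transitions)  # [1, 5]
```

Bisection lands on index 5, which is an introduction, but it silently skips
the earlier one at index 1, so it cannot be relied on to find the *first*
commit once the blob comes and goes.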

