git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] diff_unique_abbrev(): document its assumtion and limitation
@ 2016-09-30 17:54 Junio C Hamano
  2016-09-30 18:09 ` Jeff King
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2016-09-30 17:54 UTC (permalink / raw)
  To: git

This function is used to add "..." to displayed object names in
"diff --raw --abbrev[=<n>]" output.  It bases its behaviour on an
untold assumption that the abbreviation length requested by the
caller is "reasonble", i.e. most of the objects will abbreviate
within the requested length and the resulting length would never
exceed it by more than a few hexdigits (otherwise the resulting
columns would not align).  Explain that in a comment.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * I had to scratch my head wondering what impact Linus's
   auto-abbrev change will have on this code, which I wrote many
   years ago in 47dd0d59 ("diff: --abbrev option", 2005-12-13).

 diff.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/diff.c b/diff.c
index cefc13eb8e..428ed4f4c9 100644
--- a/diff.c
+++ b/diff.c
@@ -4108,7 +4108,8 @@ void diff_free_filepair(struct diff_filepair *p)
 	free(p);
 }
 
-/* This is different from find_unique_abbrev() in that
+/*
+ * This is different from find_unique_abbrev() in that
  * it stuffs the result with dots for alignment.
  */
 const char *diff_unique_abbrev(const unsigned char *sha1, int len)
@@ -4120,6 +4121,26 @@ const char *diff_unique_abbrev(const unsigned char *sha1, int len)
 
 	abbrev = find_unique_abbrev(sha1, len);
 	abblen = strlen(abbrev);
+
+	/*
+	 * In well-behaved cases, where the abbbreviated result is the
+	 * same as the requested length, append three dots after the
+	 * abbreviation (hence the whole logic is limited to the case
+	 * where abblen < 37); when the actual abbreviated result is a
+	 * bit longer than the requested length, we reduce the number
+	 * of dots so that they match the well-behaved ones.  However,
+	 * if the actual abbreviation is longer than the requested
+	 * length by more than three, we give up on aligning, and add
+	 * three dots anyway, to indicate that the output is not the
+	 * full object name.  Yes, this may be suboptimal, but this
+	 * appears only in "diff --raw --abbrev" output and it is not
+	 * worth the effort to change it now.  Note that this would
+	 * likely to work fine when the automatic sizing of default
+	 * abbreviation length is used--we would be fed -1 in "len" in
+	 * that case, and will end up always appending three-dots, but
+	 * the automatic sizing is supposed to give abblen that ensures
+	 * uniqueness across all objects (statistically speaking).
+	 */
 	if (abblen < 37) {
 		static char hex[41];
 		if (len < abblen && abblen <= len + 2)
-- 
2.10.0-612-g22341905f2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] diff_unique_abbrev(): document its assumtion and limitation
  2016-09-30 17:54 [PATCH] diff_unique_abbrev(): document its assumtion and limitation Junio C Hamano
@ 2016-09-30 18:09 ` Jeff King
  2016-09-30 19:19   ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff King @ 2016-09-30 18:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Sep 30, 2016 at 10:54:53AM -0700, Junio C Hamano wrote:

> This function is used to add "..." to displayed object names in
> "diff --raw --abbrev[=<n>]" output.  It bases its behaviour on an
> untold assumption that the abbreviation length requested by the
> caller is "reasonble", i.e. most of the objects will abbreviate
> within the requested length and the resulting length would never
> exceed it by more than a few hexdigits (otherwise the resulting
> columns would not align).  Explain that in a comment.

Heh, I have actually have a similar patch that renames it to
diff_aligned_abbrev(). Because I wanted to add another function:

  static const char *diff_abbrev_oid(const struct object_id *oid,
                                     int abbrev)
  {
        if (startup_info->have-repository)
                return find_unique_abbrev(oid->hash, abbrev);
        else {
                char *hex = oid_to_hex(oid);
                if (abbrev < 0) || abbrev > GIT_SHA1_HEXSZ)
                        die("BUG: oid abbreviation out of range: %d", abbrev);
                hex[abbrev] = '\0';
                return hex;
        }
  }

and I didn't want people to confuse the two. Now that function _would_
want to be updated as a result of the other conversation (it would need
to do something sensible with "-1", like turning it into "7", or
whatever else is deemed reasonable outside of a repository).

Anyway. I just wonder if you want to give it a better name while you are
at it.

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] diff_unique_abbrev(): document its assumtion and limitation
  2016-09-30 18:09 ` Jeff King
@ 2016-09-30 19:19   ` Junio C Hamano
  2016-10-01  9:15     ` Jeff King
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2016-09-30 19:19 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Linus Torvalds

Jeff King <peff@peff.net> writes:

> ... Now that function _would_
> want to be updated as a result of the other conversation (it would need
> to do something sensible with "-1", like turning it into "7", or
> whatever else is deemed reasonable outside of a repository).
>
> Anyway. I just wonder if you want to give it a better name while you are
> at it.

I'd say the patch to introduce the new function that makes the old
name potentially confusing is a good one to do the rename.  Until
then I do not think there is no need to rename the existing one ;-)

Related tangent about "like turning it into", I am thinking adding
something like this as a preparatory step to Linus's auto-sizing
serires.  That way, we do not have to spell "7"

Having to spell FALLBACK_DEFAULT_ABBREV over and over again would be
more irritating than having to spell "7" often, but I think it would
be a sign of a deeper problem if it turns out we have to repeat this
constant in many places, so an irritatingly long name may serve as a
canary in the coalmine ;-)

-- >8 --
Subject: abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing

We'll be introducing a new way to decide the default abbreviation
length by initialising DEFAULT_ABBREV to -1 to signal the first call
to "find unique abbreviation" codepath to compute a reasonable value
based on the number of objects we have to avoid collisions.

We have long relied on DEFAULT_ABBREV being a positive concrete
value that is used as the abbreviation length when no extra
configuration or command line option has overridden it.  Some
codepaths wants to use such a positive concrete default value
even before making their first request to actually trigger the
computation for the auto sized default.

Introduce FALLBACK_DEFAULT_ABBREV and use it to the code that
attempts to align the report from "git fetch".  For now, this
macro is also used to initialize the default_abbrev variable,
but the auto-sizing code will use -1 and then use the value of
FALLBACK_DEFAULT_ABBREV as the starting point of auto-sizing.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 builtin/fetch.c | 3 +++
 cache.h         | 3 +++
 environment.c   | 2 +-
 transport.h     | 3 +--
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e4639d8eb1..5d6994d8e7 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -16,6 +16,9 @@
 #include "connected.h"
 #include "argv-array.h"
 
+#define TRANSPORT_SUMMARY(x) \
+	(int)(TRANSPORT_SUMMARY_WIDTH + strlen(x) - gettext_width(x)), (x)
+
 static const char * const builtin_fetch_usage[] = {
 	N_("git fetch [<options>] [<repository> [<refspec>...]]"),
 	N_("git fetch [<options>] <group>"),
diff --git a/cache.h b/cache.h
index 4ff196c259..677554c59f 100644
--- a/cache.h
+++ b/cache.h
@@ -1133,6 +1133,9 @@ static inline unsigned int hexval(unsigned char c)
 #define MINIMUM_ABBREV minimum_abbrev
 #define DEFAULT_ABBREV default_abbrev
 
+/* used when the code does not know or care what the default abbrev is */
+#define FALLBACK_DEFAULT_ABBREV 7
+
 struct object_context {
 	unsigned char tree[20];
 	char path[PATH_MAX];
diff --git a/environment.c b/environment.c
index 96160a75a5..c8860f722d 100644
--- a/environment.c
+++ b/environment.c
@@ -16,7 +16,7 @@ int trust_executable_bit = 1;
 int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
-int minimum_abbrev = 4, default_abbrev = 7;
+int minimum_abbrev = 4, default_abbrev = FALLBACK_DEFAULT_ABBREV;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/transport.h b/transport.h
index c68140892c..ea25e42317 100644
--- a/transport.h
+++ b/transport.h
@@ -135,8 +135,7 @@ struct transport {
 #define TRANSPORT_PUSH_CERT_IF_ASKED 4096
 #define TRANSPORT_PUSH_ATOMIC 8192
 
-#define TRANSPORT_SUMMARY_WIDTH (2 * DEFAULT_ABBREV + 3)
-#define TRANSPORT_SUMMARY(x) (int)(TRANSPORT_SUMMARY_WIDTH + strlen(x) - gettext_width(x)), (x)
+#define TRANSPORT_SUMMARY_WIDTH (2 * FALLBACK_DEFAULT_ABBREV + 3)
 
 /* Returns a transport suitable for the url */
 struct transport *transport_get(struct remote *, const char *);

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] diff_unique_abbrev(): document its assumtion and limitation
  2016-09-30 19:19   ` Junio C Hamano
@ 2016-10-01  9:15     ` Jeff King
  2016-10-03 17:08       ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff King @ 2016-10-01  9:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

On Fri, Sep 30, 2016 at 12:19:51PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > ... Now that function _would_
> > want to be updated as a result of the other conversation (it would need
> > to do something sensible with "-1", like turning it into "7", or
> > whatever else is deemed reasonable outside of a repository).
> >
> > Anyway. I just wonder if you want to give it a better name while you are
> > at it.
> 
> I'd say the patch to introduce the new function that makes the old
> name potentially confusing is a good one to do the rename.  Until
> then I do not think there is no need to rename the existing one ;-)

I guess my point was that the poor name may have contributed to the need
to explain it. But I'm happy to deal with it in my series (I also
updated it to use "struct oid"; I'll probably rebase mine on top of your
comment to save you dealing with the nasty merge).

> Related tangent about "like turning it into", I am thinking adding
> something like this as a preparatory step to Linus's auto-sizing
> serires.  That way, we do not have to spell "7"
> [...]
> -- >8 --
> Subject: abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing

Yep, looks like a good idea.

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] diff_unique_abbrev(): document its assumtion and limitation
  2016-10-01  9:15     ` Jeff King
@ 2016-10-03 17:08       ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2016-10-03 17:08 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Linus Torvalds

Jeff King <peff@peff.net> writes:

> I guess my point was that the poor name may have contributed to the need
> to explain it.

The comment was not about "it may not be obvious but this tries to
pad and align", but to say "the way this tries to pad and align is
based on an unsaid assumption that leads to this limitation".  I do
agree it is a good idea to rename it to a name that has 'pad' or
'align' in addition to 'unique', but I doubt renaming alone would
reduce the need for the new comment.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-10-03 17:08 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-30 17:54 [PATCH] diff_unique_abbrev(): document its assumtion and limitation Junio C Hamano
2016-09-30 18:09 ` Jeff King
2016-09-30 19:19   ` Junio C Hamano
2016-10-01  9:15     ` Jeff King
2016-10-03 17:08       ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).