git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/4] Fix issues with rename detection limits
@ 2017-11-10 17:39 Elijah Newren
  2017-11-10 17:39 ` [PATCH 1/4] sequencer: Warn when internal merge may be suboptimal due to renameLimit Elijah Newren
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Elijah Newren @ 2017-11-10 17:39 UTC (permalink / raw)
  To: git

In a repo with around 60k files with deep directory hierarchies (due to
being predominantly java code) and which had a few high level directories
moved around (making it appear to git as dozens of thousands of individual
file renames), attempts to cherry-pick commits across product versions
resulted in non-sensical delete/modify conflicts (rather than
rename/modify as expected).  We had a few teams who were in the unenviable
position of having to backport hundreds or thousands of commits across
such product versions, and the result was months of pain, which could have
been alleviated were it not for a few small git bugs:

  * When you try to cherry-pick changes, unlike when you merge two
    branches, git will not notify you when you need to set a higher
    merge.renameLimit.

  * If you know git internals well enough, you can try to trick git into
    telling you the correct renameLimit by doing a merge instead of a
    cherry-pick.  If you do that, and have a renameLimit that is too
    small, git will let you know the value you need.  You can then undo
    the merge and proceed with the cherry-pick.  Except that...

  * If you are performing a merge and specify a large renameLimit that
    isn't large enough, and the necessary renameLimit you need is greater
    than 32767, then git tells you nothing, leading you to believe that
    the limit you specified is high enough, but only to watch it still
    mess up the merge badly.

  * If you happen to specify a merge.renameLimit that *is* high enough,
    but just happens to be greater than 32767, then git will silently
    pretend you specified 32767, determine that 32767 is not high enough,
    not tell you anything about it's silent clamping, and then proceed to
    mess up the merge or cherry-pick badly.  Not only do you get no
    feedback, the clamping to 32767 isn't documented anywhere even if you
    tried to read every manual page.

Folks did discover the merge.renameLimit and tried setting it to various
values, spread over the spectrum from 1000 (the default) up to 999999999
(or maybe more, that's just the highest number I heard), completely
unaware that most their attempts were ignored and wondering why git
couldn't handle things and couldn't explain why either.

Eventually folks wrote scripts that would run the output of format-patch
through a bunch of sed commands to pretend patches were against the
filename on the other side of history and then pipe back through git-am.
Such scripts grew as more and more rename rules were added.

I eventually learned of one of these scripts and said something close to,
"You don't need these pile of rename rules; just set merge.renameLimit to
something higher."  When they responded that merge.renameLimit didn't
work, I didn't believe them.  This patch series, along with two others
that I will be sending shortly, represent my attempt to continue to not
believe them.  :-)

Elijah Newren (4):
  sequencer: Warn when internal merge may be suboptimal due to
    renameLimit
  Remove silent clamp of renameLimit
  progress: Fix progress meters when dealing with lots of work
  sequencer: Show rename progress during cherry picks

 diff.c            |  2 +-
 diffcore-rename.c | 15 ++++++---------
 progress.c        | 22 +++++++++++-----------
 progress.h        |  8 ++++----
 sequencer.c       |  2 ++
 5 files changed, 24 insertions(+), 25 deletions(-)

-- 
2.15.0.5.g9567be9905


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/4] sequencer: Warn when internal merge may be suboptimal due to renameLimit
  2017-11-10 17:39 [PATCH 0/4] Fix issues with rename detection limits Elijah Newren
@ 2017-11-10 17:39 ` Elijah Newren
  2017-11-13  5:16   ` Junio C Hamano
  2017-11-10 17:39 ` [PATCH 2/4] Remove silent clamp of renameLimit Elijah Newren
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Elijah Newren @ 2017-11-10 17:39 UTC (permalink / raw)
  To: git

Having renamed files silently treated as deleted/modified conflicts
instead of correctly resolving the renamed/modified case has caused lots
of pain for some users.  Eventually, some of them figured out to set
merge.renameLimit to something higher, but without feedback about what
value it should have, they were just repeatedly guessing and retrying.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 sequencer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sequencer.c b/sequencer.c
index f2a10cc4f2..2b4cecb617 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -462,6 +462,7 @@ static int do_recursive_merge(struct commit *base, struct commit *next,
 	if (is_rebase_i(opts) && clean <= 0)
 		fputs(o.obuf.buf, stdout);
 	strbuf_release(&o.obuf);
+	diff_warn_rename_limit("merge.renamelimit", o.needed_rename_limit, 0);
 	if (clean < 0)
 		return clean;
 
-- 
2.15.0.5.g9567be9905


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/4] Remove silent clamp of renameLimit
  2017-11-10 17:39 [PATCH 0/4] Fix issues with rename detection limits Elijah Newren
  2017-11-10 17:39 ` [PATCH 1/4] sequencer: Warn when internal merge may be suboptimal due to renameLimit Elijah Newren
@ 2017-11-10 17:39 ` Elijah Newren
  2017-11-10 18:26   ` Stefan Beller
  2017-11-10 17:39 ` [PATCH 3/4] progress: Fix progress meters when dealing with lots of work Elijah Newren
  2017-11-10 17:39 ` [PATCH 4/4] sequencer: Show rename progress during cherry picks Elijah Newren
  3 siblings, 1 reply; 15+ messages in thread
From: Elijah Newren @ 2017-11-10 17:39 UTC (permalink / raw)
  To: git

In commit 0024a5492 (Fix the rename detection limit checking; 2007-09-14),
the renameLimit was clamped to 32767.  This appears to have been to simply
avoid integer overflow in the following computation:

   num_create * num_src <= rename_limit * rename_limit

although it also could be viewed as a hardcoded bound on the amount of CPU
time we're willing to allow users to tell git to spend on handling
renames.  An upper bound may make sense, particularly as the computation
is O(rename_limit^2), but only if the bound is documented and communicated
to the user -- neither of which were true.

In fact, the silent clamping of the renameLimit to a smaller value and
lack of reporting of the needed renameLimit when it was too large made it
appear to the user as though they had used a high enough value; however,
git would proceed to mess up the merge or cherry-pick badly based on the
lack of rename detection.  Some hardy folks, despite the lack of feedback
on the correct limit to choose, were desperate enough to repeatedly retry
their cherry-picks with increasingly larger renameLimit values (going
orders of magnitude beyond the built-in limit of 32767), but were
consistently met with the same failure.

Although large limits can make things slow, we have users who would be
ecstatic to have a small five file change be correctly cherry picked even
if they have to manually specify a large limit and it took git ten minutes
to compute it.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diff.c            |  2 +-
 diffcore-rename.c | 11 ++++-------
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 6fd288420b..c6597e3231 100644
--- a/diff.c
+++ b/diff.c
@@ -5524,7 +5524,7 @@ void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc)
 		warning(_(rename_limit_warning));
 	else
 		return;
-	if (0 < needed && needed < 32767)
+	if (0 < needed)
 		warning(_(rename_limit_advice), varname, needed);
 }
 
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 0d8c3d2ee4..7f9a463f5a 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -391,14 +391,10 @@ static int too_many_rename_candidates(int num_create,
 	 * growing larger than a "rename_limit" square matrix, ie:
 	 *
 	 *    num_create * num_src > rename_limit * rename_limit
-	 *
-	 * but handles the potential overflow case specially (and we
-	 * assume at least 32-bit integers)
 	 */
-	if (rename_limit <= 0 || rename_limit > 32767)
-		rename_limit = 32767;
 	if ((num_create <= rename_limit || num_src <= rename_limit) &&
-	    (num_create * num_src <= rename_limit * rename_limit))
+	    ((double)num_create * (double)num_src
+	     <= (double)rename_limit * (double)rename_limit))
 		return 0;
 
 	options->needed_rename_limit =
@@ -415,7 +411,8 @@ static int too_many_rename_candidates(int num_create,
 		num_src++;
 	}
 	if ((num_create <= rename_limit || num_src <= rename_limit) &&
-	    (num_create * num_src <= rename_limit * rename_limit))
+	    ((double)num_create * (double)num_src
+	     <= (double)rename_limit * (double)rename_limit))
 		return 2;
 	return 1;
 }
-- 
2.15.0.5.g9567be9905


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] progress: Fix progress meters when dealing with lots of work
  2017-11-10 17:39 [PATCH 0/4] Fix issues with rename detection limits Elijah Newren
  2017-11-10 17:39 ` [PATCH 1/4] sequencer: Warn when internal merge may be suboptimal due to renameLimit Elijah Newren
  2017-11-10 17:39 ` [PATCH 2/4] Remove silent clamp of renameLimit Elijah Newren
@ 2017-11-10 17:39 ` Elijah Newren
  2017-11-13  5:24   ` Junio C Hamano
  2017-11-10 17:39 ` [PATCH 4/4] sequencer: Show rename progress during cherry picks Elijah Newren
  3 siblings, 1 reply; 15+ messages in thread
From: Elijah Newren @ 2017-11-10 17:39 UTC (permalink / raw)
  To: git

The possibility of setting merge.renameLimit beyond 2^16 raises the
possibility that the values passed to progress can exceed 2^32.  For my
usecase of interest, I only needed to pass a value a little over 2^31.  If
I were only interested in fixing my usecase, I could have simply changed
last_value from int to unsigned, and casted each of rename_dst_nr and
rename_src_nr (in merge-recursive.c) from int to unsigned just before
multiplying them.  However, as long as we're making changes to allow
larger progress meters, we may as well make a little more room in general.
Use uint64_t, because it "ought to be enough for anybody".  :-)

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diffcore-rename.c |  4 ++--
 progress.c        | 22 +++++++++++-----------
 progress.h        |  8 ++++----
 3 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 7f9a463f5a..6ba6157c61 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -531,7 +531,7 @@ void diffcore_rename(struct diff_options *options)
 	if (options->show_rename_progress) {
 		progress = start_delayed_progress(
 				_("Performing inexact rename detection"),
-				rename_dst_nr * rename_src_nr);
+				(uint64_t)rename_dst_nr * (uint64_t)rename_src_nr);
 	}
 
 	mx = xcalloc(st_mult(NUM_CANDIDATE_PER_DST, num_create), sizeof(*mx));
@@ -568,7 +568,7 @@ void diffcore_rename(struct diff_options *options)
 			diff_free_filespec_blob(two);
 		}
 		dst_cnt++;
-		display_progress(progress, (i+1)*rename_src_nr);
+		display_progress(progress, (uint64_t)(i+1)*(uint64_t)rename_src_nr);
 	}
 	stop_progress(&progress);
 
diff --git a/progress.c b/progress.c
index 289678d43d..7e4a2f9532 100644
--- a/progress.c
+++ b/progress.c
@@ -30,8 +30,8 @@ struct throughput {
 
 struct progress {
 	const char *title;
-	int last_value;
-	unsigned total;
+	uint64_t last_value;
+	uint64_t total;
 	unsigned last_percent;
 	unsigned delay;
 	unsigned delayed_percent_threshold;
@@ -79,7 +79,7 @@ static int is_foreground_fd(int fd)
 	return tpgrp < 0 || tpgrp == getpgid(0);
 }
 
-static int display(struct progress *progress, unsigned n, const char *done)
+static int display(struct progress *progress, uint64_t n, const char *done)
 {
 	const char *eol, *tp;
 
@@ -106,7 +106,7 @@ static int display(struct progress *progress, unsigned n, const char *done)
 		if (percent != progress->last_percent || progress_update) {
 			progress->last_percent = percent;
 			if (is_foreground_fd(fileno(stderr)) || done) {
-				fprintf(stderr, "%s: %3u%% (%u/%u)%s%s",
+				fprintf(stderr, "%s: %3u%% (%lu/%lu)%s%s",
 					progress->title, percent, n,
 					progress->total, tp, eol);
 				fflush(stderr);
@@ -116,7 +116,7 @@ static int display(struct progress *progress, unsigned n, const char *done)
 		}
 	} else if (progress_update) {
 		if (is_foreground_fd(fileno(stderr)) || done) {
-			fprintf(stderr, "%s: %u%s%s",
+			fprintf(stderr, "%s: %lu%s%s",
 				progress->title, n, tp, eol);
 			fflush(stderr);
 		}
@@ -127,7 +127,7 @@ static int display(struct progress *progress, unsigned n, const char *done)
 	return 0;
 }
 
-static void throughput_string(struct strbuf *buf, off_t total,
+static void throughput_string(struct strbuf *buf, uint64_t total,
 			      unsigned int rate)
 {
 	strbuf_reset(buf);
@@ -138,7 +138,7 @@ static void throughput_string(struct strbuf *buf, off_t total,
 	strbuf_addstr(buf, "/s");
 }
 
-void display_throughput(struct progress *progress, off_t total)
+void display_throughput(struct progress *progress, uint64_t total)
 {
 	struct throughput *tp;
 	uint64_t now_ns;
@@ -200,12 +200,12 @@ void display_throughput(struct progress *progress, off_t total)
 		display(progress, progress->last_value, NULL);
 }
 
-int display_progress(struct progress *progress, unsigned n)
+int display_progress(struct progress *progress, uint64_t n)
 {
 	return progress ? display(progress, n, NULL) : 0;
 }
 
-static struct progress *start_progress_delay(const char *title, unsigned total,
+static struct progress *start_progress_delay(const char *title, uint64_t total,
 					     unsigned percent_threshold, unsigned delay)
 {
 	struct progress *progress = malloc(sizeof(*progress));
@@ -227,12 +227,12 @@ static struct progress *start_progress_delay(const char *title, unsigned total,
 	return progress;
 }
 
-struct progress *start_delayed_progress(const char *title, unsigned total)
+struct progress *start_delayed_progress(const char *title, uint64_t total)
 {
 	return start_progress_delay(title, total, 0, 2);
 }
 
-struct progress *start_progress(const char *title, unsigned total)
+struct progress *start_progress(const char *title, uint64_t total)
 {
 	return start_progress_delay(title, total, 0, 0);
 }
diff --git a/progress.h b/progress.h
index 6392b63371..70a4d4a0d6 100644
--- a/progress.h
+++ b/progress.h
@@ -3,10 +3,10 @@
 
 struct progress;
 
-void display_throughput(struct progress *progress, off_t total);
-int display_progress(struct progress *progress, unsigned n);
-struct progress *start_progress(const char *title, unsigned total);
-struct progress *start_delayed_progress(const char *title, unsigned total);
+void display_throughput(struct progress *progress, uint64_t total);
+int display_progress(struct progress *progress, uint64_t n);
+struct progress *start_progress(const char *title, uint64_t total);
+struct progress *start_delayed_progress(const char *title, uint64_t total);
 void stop_progress(struct progress **progress);
 void stop_progress_msg(struct progress **progress, const char *msg);
 
-- 
2.15.0.5.g9567be9905


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/4] sequencer: Show rename progress during cherry picks
  2017-11-10 17:39 [PATCH 0/4] Fix issues with rename detection limits Elijah Newren
                   ` (2 preceding siblings ...)
  2017-11-10 17:39 ` [PATCH 3/4] progress: Fix progress meters when dealing with lots of work Elijah Newren
@ 2017-11-10 17:39 ` Elijah Newren
  2017-11-13  5:25   ` Junio C Hamano
  3 siblings, 1 reply; 15+ messages in thread
From: Elijah Newren @ 2017-11-10 17:39 UTC (permalink / raw)
  To: git

When trying to cherry-pick a change that has lots of renames, it is
somewhat unsettling to wait a really long time without any feedback.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 sequencer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sequencer.c b/sequencer.c
index 2b4cecb617..247d93f363 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -448,6 +448,7 @@ static int do_recursive_merge(struct commit *base, struct commit *next,
 	o.branch2 = next ? next_label : "(empty tree)";
 	if (is_rebase_i(opts))
 		o.buffer_output = 2;
+	o.show_rename_progress = 1;
 
 	head_tree = parse_tree_indirect(head);
 	next_tree = next ? next->tree : empty_tree();
-- 
2.15.0.5.g9567be9905


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] Remove silent clamp of renameLimit
  2017-11-10 17:39 ` [PATCH 2/4] Remove silent clamp of renameLimit Elijah Newren
@ 2017-11-10 18:26   ` Stefan Beller
  2017-11-10 18:36     ` Elijah Newren
  0 siblings, 1 reply; 15+ messages in thread
From: Stefan Beller @ 2017-11-10 18:26 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git

On Fri, Nov 10, 2017 at 9:39 AM, Elijah Newren <newren@gmail.com> wrote:
> In commit 0024a5492 (Fix the rename detection limit checking; 2007-09-14),
> the renameLimit was clamped to 32767.  This appears to have been to simply
> avoid integer overflow in the following computation:
>
>    num_create * num_src <= rename_limit * rename_limit
>
> although it also could be viewed as a hardcoded bound on the amount of CPU
> time we're willing to allow users to tell git to spend on handling
> renames.  An upper bound may make sense, particularly as the computation
> is O(rename_limit^2), but only if the bound is documented and communicated
> to the user -- neither of which were true.
>
> In fact, the silent clamping of the renameLimit to a smaller value and
> lack of reporting of the needed renameLimit when it was too large made it
> appear to the user as though they had used a high enough value; however,
> git would proceed to mess up the merge or cherry-pick badly based on the
> lack of rename detection.  Some hardy folks, despite the lack of feedback
> on the correct limit to choose, were desperate enough to repeatedly retry
> their cherry-picks with increasingly larger renameLimit values (going
> orders of magnitude beyond the built-in limit of 32767), but were
> consistently met with the same failure.
>
> Although large limits can make things slow, we have users who would be
> ecstatic to have a small five file change be correctly cherry picked even
> if they have to manually specify a large limit and it took git ten minutes
> to compute it.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  diff.c            |  2 +-
>  diffcore-rename.c | 11 ++++-------
>  2 files changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 6fd288420b..c6597e3231 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -5524,7 +5524,7 @@ void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc)
>                 warning(_(rename_limit_warning));
>         else
>                 return;
> -       if (0 < needed && needed < 32767)
> +       if (0 < needed)
>                 warning(_(rename_limit_advice), varname, needed);
>  }
>
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index 0d8c3d2ee4..7f9a463f5a 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -391,14 +391,10 @@ static int too_many_rename_candidates(int num_create,
>          * growing larger than a "rename_limit" square matrix, ie:
>          *
>          *    num_create * num_src > rename_limit * rename_limit
> -        *
> -        * but handles the potential overflow case specially (and we
> -        * assume at least 32-bit integers)
>          */
> -       if (rename_limit <= 0 || rename_limit > 32767)
> -               rename_limit = 32767;
>         if ((num_create <= rename_limit || num_src <= rename_limit) &&
> -           (num_create * num_src <= rename_limit * rename_limit))
> +           ((double)num_create * (double)num_src
> +            <= (double)rename_limit * (double)rename_limit))
>                 return 0;

From a technical perspective, I would think that if
(num_create <= rename_limit || num_src <= rename_limit)
holds true, that the double-cast condition would also be always true?
Could we just remove that last check?

Or phrased differently, if we can cast to double and extend the check
here, do we have to adapt code at other places as well?

>
>         options->needed_rename_limit =
> @@ -415,7 +411,8 @@ static int too_many_rename_candidates(int num_create,
>                 num_src++;
>         }
>         if ((num_create <= rename_limit || num_src <= rename_limit) &&
> -           (num_create * num_src <= rename_limit * rename_limit))
> +           ((double)num_create * (double)num_src
> +            <= (double)rename_limit * (double)rename_limit))
>                 return 2;
>         return 1;
>  }
> --
> 2.15.0.5.g9567be9905
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] Remove silent clamp of renameLimit
  2017-11-10 18:26   ` Stefan Beller
@ 2017-11-10 18:36     ` Elijah Newren
  2017-11-10 23:42       ` brian m. carlson
  0 siblings, 1 reply; 15+ messages in thread
From: Elijah Newren @ 2017-11-10 18:36 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Thanks for taking a look!

On Fri, Nov 10, 2017 at 10:26 AM, Stefan Beller <sbeller@google.com> wrote:
<snip>
>> -       if (rename_limit <= 0 || rename_limit > 32767)
>> -               rename_limit = 32767;
>>         if ((num_create <= rename_limit || num_src <= rename_limit) &&
>> -           (num_create * num_src <= rename_limit * rename_limit))
>> +           ((double)num_create * (double)num_src
>> +            <= (double)rename_limit * (double)rename_limit))
>>                 return 0;
>
> From a technical perspective, I would think that if
> (num_create <= rename_limit || num_src <= rename_limit)
> holds true, that the double-cast condition would also be always true?
> Could we just remove that last check?

Not necessarily.  For example, if num_create = rename_limit-1 and
num_src = rename_limit+2, then the first condition will be satisfied
but the second won't.  If it was && rather than ||, then the second
condition would be superfluous.

> Or phrased differently, if we can cast to double and extend the check
> here, do we have to adapt code at other places as well?

Good point, and yes.  Perhaps I should have re-ordered my patch series
because I came back to it later and realized that the progress code
was broken due to overflow/wraparound, and a patch 3 fixed that.

Further, the later patch used uint64_t instead of double.  While
double works, perhaps I should change the double here to uint64_t for
consistency?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] Remove silent clamp of renameLimit
  2017-11-10 18:36     ` Elijah Newren
@ 2017-11-10 23:42       ` brian m. carlson
  2017-11-11 16:39         ` Elijah Newren
  0 siblings, 1 reply; 15+ messages in thread
From: brian m. carlson @ 2017-11-10 23:42 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Stefan Beller, git

[-- Attachment #1: Type: text/plain, Size: 1522 bytes --]

On Fri, Nov 10, 2017 at 10:36:17AM -0800, Elijah Newren wrote:
> Thanks for taking a look!
> 
> On Fri, Nov 10, 2017 at 10:26 AM, Stefan Beller <sbeller@google.com> wrote:
> > From a technical perspective, I would think that if
> > (num_create <= rename_limit || num_src <= rename_limit)
> > holds true, that the double-cast condition would also be always true?
> > Could we just remove that last check?
> 
> Not necessarily.  For example, if num_create = rename_limit-1 and
> num_src = rename_limit+2, then the first condition will be satisfied
> but the second won't.  If it was && rather than ||, then the second
> condition would be superfluous.
> 
> > Or phrased differently, if we can cast to double and extend the check
> > here, do we have to adapt code at other places as well?
> 
> Good point, and yes.  Perhaps I should have re-ordered my patch series
> because I came back to it later and realized that the progress code
> was broken due to overflow/wraparound, and a patch 3 fixed that.
> 
> Further, the later patch used uint64_t instead of double.  While
> double works, perhaps I should change the double here to uint64_t for
> consistency?

I'm wondering if maybe you want to use size_t.  If you end up using an
unsigned type, you might be able to leverage unsigned_mult_overflows to
avoid having to write this by hand.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 867 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] Remove silent clamp of renameLimit
  2017-11-10 23:42       ` brian m. carlson
@ 2017-11-11 16:39         ` Elijah Newren
  2017-11-11 17:32           ` brian m. carlson
  0 siblings, 1 reply; 15+ messages in thread
From: Elijah Newren @ 2017-11-11 16:39 UTC (permalink / raw)
  To: brian m. carlson, Elijah Newren, Stefan Beller, git

Hi,

On Fri, Nov 10, 2017 at 3:42 PM, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> On Fri, Nov 10, 2017 at 10:36:17AM -0800, Elijah Newren wrote:
<snip>
>> Further, the later patch used uint64_t instead of double.  While
>> double works, perhaps I should change the double here to uint64_t for
>> consistency?
>
> I'm wondering if maybe you want to use size_t.  If you end up using an
> unsigned type, you might be able to leverage unsigned_mult_overflows to
> avoid having to write this by hand.

Thanks for pointing out unsigned_mult_overflows; I was unaware of it.
I think I'd prefer to not use it though; the fact that I had a case
that genuinely needed a value greater than 2^31 (at least before my
performance patches) meant that a slightly bigger repo is going to
eventually need a value over 2^32, so I think we should just cast to a
type that can hold it.

I'm curious why you suggest size_t, though.  I have always associated
that with an amount of memory that will be used, and there's no
allocation based on this result.  Was I wrong to make that
association, or is there another good reason here?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] Remove silent clamp of renameLimit
  2017-11-11 16:39         ` Elijah Newren
@ 2017-11-11 17:32           ` brian m. carlson
  0 siblings, 0 replies; 15+ messages in thread
From: brian m. carlson @ 2017-11-11 17:32 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Stefan Beller, git

[-- Attachment #1: Type: text/plain, Size: 1353 bytes --]

On Sat, Nov 11, 2017 at 08:39:11AM -0800, Elijah Newren wrote:
> Thanks for pointing out unsigned_mult_overflows; I was unaware of it.
> I think I'd prefer to not use it though; the fact that I had a case
> that genuinely needed a value greater than 2^31 (at least before my
> performance patches) meant that a slightly bigger repo is going to
> eventually need a value over 2^32, so I think we should just cast to a
> type that can hold it.

That's fine.  I had considered this in the context of 64-bit values, but
I suppose that the likelihood of us hitting 2**64 iterations (and
performing reasonably as well) is unlikely.

> I'm curious why you suggest size_t, though.  I have always associated
> that with an amount of memory that will be used, and there's no
> allocation based on this result.  Was I wrong to make that
> association, or is there another good reason here?

I usually like size_t for values that are unsigned and don't need to be
fixed size because it's usually the largest efficient unsigned type.
However, if you want something to handle items larger than 2**32, then I
agree that it's maybe not a great idea if we want it to work on 32-bit
systems.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 867 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] sequencer: Warn when internal merge may be suboptimal due to renameLimit
  2017-11-10 17:39 ` [PATCH 1/4] sequencer: Warn when internal merge may be suboptimal due to renameLimit Elijah Newren
@ 2017-11-13  5:16   ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2017-11-13  5:16 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git

Elijah Newren <newren@gmail.com> writes:

> Subject: Re: [PATCH 1/4] sequencer: Warn when internal merge may be suboptimal due to renameLimit

Style: please downcase s/Warn/warn/ to fit this better in "shortlog --no-merges"
output.

> Having renamed files silently treated as deleted/modified conflicts
> instead of correctly resolving the renamed/modified case has caused lots
> of pain for some users.  Eventually, some of them figured out to set
> merge.renameLimit to something higher, but without feedback about what
> value it should have, they were just repeatedly guessing and retrying.

It would have been _much_ easier to read if you refrained from
stating the gripes more prominently than the source of the gripe
that you are fixing.  E.g.

	If one side of a merge have renamed more paths than
	merge.renamelimit since the sides diverged, the recursive
	merge strategy stops detecting renames and instead leaves
	these many paths as delete/modify conflicts, without warning
	what is going on and giving hints on how to tell Git that it
	is allowed to spend more cycles to detect renames.

perhaps.

The addition of a call to diff_warn_rename_limit looks quite
sensible.

Thanks.

> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  sequencer.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/sequencer.c b/sequencer.c
> index f2a10cc4f2..2b4cecb617 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -462,6 +462,7 @@ static int do_recursive_merge(struct commit *base, struct commit *next,
>  	if (is_rebase_i(opts) && clean <= 0)
>  		fputs(o.obuf.buf, stdout);
>  	strbuf_release(&o.obuf);
> +	diff_warn_rename_limit("merge.renamelimit", o.needed_rename_limit, 0);
>  	if (clean < 0)
>  		return clean;

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] progress: Fix progress meters when dealing with lots of work
  2017-11-10 17:39 ` [PATCH 3/4] progress: Fix progress meters when dealing with lots of work Elijah Newren
@ 2017-11-13  5:24   ` Junio C Hamano
  2017-11-13 20:05     ` Elijah Newren
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2017-11-13  5:24 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git

Elijah Newren <newren@gmail.com> writes:

> Subject: Re: [PATCH 3/4] progress: Fix progress meters when dealing with lots of work

Style: s/Fix/fix/;

> The possibility of setting merge.renameLimit beyond 2^16 raises the
> possibility that the values passed to progress can exceed 2^32.  For my
> usecase of interest, I only needed to pass a value a little over 2^31.  If
> I were only interested in fixing my usecase, I could have simply changed
> last_value from int to unsigned, and casted each of rename_dst_nr and
> rename_src_nr (in merge-recursive.c) from int to unsigned just before
> multiplying them.  However, as long as we're making changes to allow
> larger progress meters, we may as well make a little more room in general.
> Use uint64_t, because it "ought to be enough for anybody".  :-)

The middle part of the log message may waste more mental bandwidth
of readers than it is worth.  It might have gave you satisfaction to
be able to vent, but don't (the place to do so is after the three
dash lines).

I am not sure if we want all codepaths to do 64-bit math for
progress meter, but let's see what others would think.

> -static int display(struct progress *progress, unsigned n, const char *done)
> +static int display(struct progress *progress, uint64_t n, const char *done)
>  {
>  	const char *eol, *tp;
>  
> @@ -106,7 +106,7 @@ static int display(struct progress *progress, unsigned n, const char *done)
>  		if (percent != progress->last_percent || progress_update) {
>  			progress->last_percent = percent;
>  			if (is_foreground_fd(fileno(stderr)) || done) {
> -				fprintf(stderr, "%s: %3u%% (%u/%u)%s%s",
> +				fprintf(stderr, "%s: %3u%% (%lu/%lu)%s%s",

Are these (and there are probably other instances in this patch) %lu
correct?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] sequencer: Show rename progress during cherry picks
  2017-11-10 17:39 ` [PATCH 4/4] sequencer: Show rename progress during cherry picks Elijah Newren
@ 2017-11-13  5:25   ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2017-11-13  5:25 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git

Elijah Newren <newren@gmail.com> writes:

> Subject: Re: [PATCH 4/4] sequencer: Show rename progress during cherry picks

Style: s/Show/show/

> When trying to cherry-pick a change that has lots of renames, it is
> somewhat unsettling to wait a really long time without any feedback.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  sequencer.c | 1 +
>  1 file changed, 1 insertion(+)

Makes sense.


> diff --git a/sequencer.c b/sequencer.c
> index 2b4cecb617..247d93f363 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -448,6 +448,7 @@ static int do_recursive_merge(struct commit *base, struct commit *next,
>  	o.branch2 = next ? next_label : "(empty tree)";
>  	if (is_rebase_i(opts))
>  		o.buffer_output = 2;
> +	o.show_rename_progress = 1;
>  
>  	head_tree = parse_tree_indirect(head);
>  	next_tree = next ? next->tree : empty_tree();

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] progress: Fix progress meters when dealing with lots of work
  2017-11-13  5:24   ` Junio C Hamano
@ 2017-11-13 20:05     ` Elijah Newren
  2017-11-14  1:18       ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Elijah Newren @ 2017-11-13 20:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

Thanks for the reviews and suggestions!

On Sun, Nov 12, 2017 at 9:24 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Elijah Newren <newren@gmail.com> writes:
>
>> Subject: Re: [PATCH 3/4] progress: Fix progress meters when dealing with lots of work
>
> Style: s/Fix/fix/;

I also messed this up in a lot of my patches in my other patch series.
I've fixed them all up, but I'll wait to resubmit those other series
until I get some other reviews.

> The middle part of the log message may waste more mental bandwidth
> of readers than it is worth.  It might have gave you satisfaction to
> be able to vent, but don't (the place to do so is after the three
> dash lines).

Cleaned it up, along with the other commit message you pointed out;
I'll resubmit shortly.

> I am not sure if we want all codepaths to do 64-bit math for
> progress meter, but let's see what others would think.

If others don't want to do 64-bit math for the progress meter, what
would they like to see done instead?  I can see a few options:
  1) Have two separate progress codepaths, one for 32-bith math and
one for 64-bit math.
  2) Instead of counting pairs of source/dest files compared, just
count number of dest paths completed.  (Thus, we wouldn't need a value
big enough to hold rename_dst_nr * rename_src_nr, just big enough to
hold rename_dst_nr).
  3) just let the progress meter overflow and show nonsensical values
  4) don't show the progress meter if overflow would happen
  5) something else I'm not thinking of.

>> -                             fprintf(stderr, "%s: %3u%% (%u/%u)%s%s",
>> +                             fprintf(stderr, "%s: %3u%% (%lu/%lu)%s%s",
>
> Are these (and there are probably other instances in this patch) %lu
> correct?

Oops, no.  I think %llu is right, though looking around the code it
appears folks use PRIuMAX and avoid %llu due to possible issues with
old windows compilers.  Not sure if that's still relevant, but I'll
try to remain consistent with what I see elsewhere and include that
fix in my re-roll.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] progress: Fix progress meters when dealing with lots of work
  2017-11-13 20:05     ` Elijah Newren
@ 2017-11-14  1:18       ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2017-11-14  1:18 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List

Elijah Newren <newren@gmail.com> writes:

>   2) Instead of counting pairs of source/dest files compared, just
> count number of dest paths completed.  (Thus, we wouldn't need a value
> big enough to hold rename_dst_nr * rename_src_nr, just big enough to
> hold rename_dst_nr).

This sounds like the most sensible thing to do even if we do switch
to larger integer size, but that is orthogonal to the main point of
this series.



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-11-14  1:18 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-10 17:39 [PATCH 0/4] Fix issues with rename detection limits Elijah Newren
2017-11-10 17:39 ` [PATCH 1/4] sequencer: Warn when internal merge may be suboptimal due to renameLimit Elijah Newren
2017-11-13  5:16   ` Junio C Hamano
2017-11-10 17:39 ` [PATCH 2/4] Remove silent clamp of renameLimit Elijah Newren
2017-11-10 18:26   ` Stefan Beller
2017-11-10 18:36     ` Elijah Newren
2017-11-10 23:42       ` brian m. carlson
2017-11-11 16:39         ` Elijah Newren
2017-11-11 17:32           ` brian m. carlson
2017-11-10 17:39 ` [PATCH 3/4] progress: Fix progress meters when dealing with lots of work Elijah Newren
2017-11-13  5:24   ` Junio C Hamano
2017-11-13 20:05     ` Elijah Newren
2017-11-14  1:18       ` Junio C Hamano
2017-11-10 17:39 ` [PATCH 4/4] sequencer: Show rename progress during cherry picks Elijah Newren
2017-11-13  5:25   ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).