git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Ben Peart <peartben@gmail.com>
To: "Eckhard Maaß" <eckhard.s.maass@googlemail.com>,
	"Ben Peart" <Ben.Peart@microsoft.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	"newren@gmail.com" <newren@gmail.com>,
	"gitster@pobox.com" <gitster@pobox.com>,
	"pclouds@gmail.com" <pclouds@gmail.com>,
	"vmiklos@frugalware.org" <vmiklos@frugalware.org>,
	Alejandro Pauly <alpauly@microsoft.com>,
	"Johannes.Schindelin@gmx.de" <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH v3] add status config and command line options for rename detection
Date: Mon, 14 May 2018 08:57:45 -0400	[thread overview]
Message-ID: <d306e8bf-1847-ca81-16ac-fa99ee3f6755@gmail.com> (raw)
In-Reply-To: <20180512080437.GA16679@esm>



On 5/12/2018 4:04 AM, Eckhard Maaß wrote:
> On Fri, May 11, 2018 at 12:56:39PM +0000, Ben Peart wrote:
>> After performing a merge that has conflicts git status will, by default,
>> attempt to detect renames which causes many objects to be examined.  In a
>> virtualized repo, those objects do not exist locally so the rename logic
>> triggers them to be fetched from the server. This results in the status call
>> taking hours to complete on very large repos vs seconds with this patch.
> 
> I see where your need comes from, but as you based this on my little
> patch one can achieve this already with tweaking diff.renames itself. I
> do wonder why there is a special need for the status command here

The rename detection feature is nice and we'd like to leave it on 
whenever possible.  The performance issues only occur when in the middle 
of a merge - normal status commands behave reasonably.  As a result, we 
don't want to just turn it off completely by setting diff.renames.

Until we come up with a more elegant solution, we currently turn it off 
completely for merge via the new merge settings and then intercept calls 
to status and if there is a MERGE_HEAD we turn it off for status just 
for that specific call.  I view this as a temporary solution so would 
not want to put that logic into git proper as it is quite specific to 
when running git on a virtualized repo.

> if there is, I personally would like it more in a style that you could
> take all the options provided by diff.*-configuration and prefix that
> with status, eg status.diff.renames = true. What do you think? If you
> really only need this for merges, maybe a more specialised option is
> called for that only kicks in when there is a merge going on?
> 
> I would like that status behaves as similar as possible to
> diff/show/log. Special options will pull away from that again - passing
> -m to show or log will lead to the same performance issues, correct?
> Could it be feasible to impose an overall time limit on the detection?
> 

I agree that they should behave as similar as possible which is why all 
the new settings default to the diff setting when not explicitly set.  I 
believe this is a good model - if you don't do anything special you get 
the default/same behavior but if you know and need special behavior, you 
now have that option.

> And after writing this I wonder what were your experience with just
> tweaking renameLimit - setting it very low should have helped the
> fetching from server part already, shouldn't it?
> 
>> Add --no-renames command line option to status that enables overriding the
>> config setting from the command line. Add --find-renames[=<n>] command line
>> option to status that enables detecting renames and optionally setting the
>> similarity index.
> 
> Would it be reasonable to extend this so that we just use the same
> machinery for parsing command line options for the diffcore options and
> pass this along? It seems to me that git status wants the same init as
> diff/show/log has anyway. But I like the direction towards passing more
> command line options to the git status command.
> 

I agree that it is unfortunate that diff/merge/status all parse and deal 
with config settings differently.  I'd be happy to see someone tackle 
that and move the code to a single, coherent model but that is beyond 
the scope of this patch.

>>   static void wt_longstatus_print_unmerged_header(struct wt_status *s)
>> @@ -592,6 +595,9 @@ static void wt_status_collect_changes_worktree(struct wt_status *s)
>>   	}
>>   	rev.diffopt.format_callback = wt_status_collect_changed_cb;
>>   	rev.diffopt.format_callback_data = s;
>> +	rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : rev.diffopt.detect_rename;
>> +	rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit;
>> +	rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
>>   	copy_pathspec(&rev.prune_data, &s->pathspec);
>>   	run_diff_files(&rev, 0);
>>   }
>> @@ -625,6 +631,9 @@ static void wt_status_collect_changes_index(struct wt_status *s)
>>   	rev.diffopt.output_format |= DIFF_FORMAT_CALLBACK;
>>   	rev.diffopt.format_callback = wt_status_collect_updated_cb;
>>   	rev.diffopt.format_callback_data = s;
>> +	rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : rev.diffopt.detect_rename;
>> +	rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit;
>> +	rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
>>   	copy_pathspec(&rev.prune_data, &s->pathspec);
>>   	run_diff_index(&rev, 1);
>>   }
>> @@ -982,6 +991,9 @@ static void wt_longstatus_print_verbose(struct wt_status *s)
>>   	setup_revisions(0, NULL, &rev, &opt);
>>   
>>   	rev.diffopt.output_format |= DIFF_FORMAT_PATCH;
>> +	rev.diffopt.detect_rename = s->detect_rename >= 0 ? s->detect_rename : rev.diffopt.detect_rename;
>> +	rev.diffopt.rename_limit = s->rename_limit >= 0 ? s->rename_limit : rev.diffopt.rename_limit;
>> +	rev.diffopt.rename_score = s->rename_score >= 0 ? s->rename_score : rev.diffopt.rename_score;
>>   	rev.diffopt.file = s->fp;
>>   	rev.diffopt.close_file = 0;
>>   	/*
> 
> Somehow I am inclined that those should be factored out to a common
> method if the rest of the patch stays as it is.
> 

I debated that as well but given the logic is so simple, opted to stick 
with this.  I also debated whether it would be clearer in the form:

if (s->detect_rename >= 0)
	rev.diffopt.detect_rename = s->detect_rename;

But decided git contributes are used to seeing dense code :) and this 
style better matched what I saw in the merge settings.

> Greetings,
> Eckhard
> 

  reply	other threads:[~2018-05-14 12:57 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-09 14:42 [PATCH v1] add status config and command line options for rename detection Ben Peart
2018-05-09 15:59 ` Duy Nguyen
2018-05-09 17:04   ` Ben Peart
2018-05-09 16:56 ` Elijah Newren
2018-05-09 19:54   ` Ben Peart
2018-05-10 14:16 ` [PATCH v2] " Ben Peart
2018-05-10 16:19   ` Elijah Newren
2018-05-10 19:09     ` Ben Peart
2018-05-10 22:31       ` Elijah Newren
2018-05-11 12:50         ` Ben Peart
2018-05-11  1:57     ` Junio C Hamano
2018-05-11  6:39   ` Junio C Hamano
2018-05-11 12:56 ` [PATCH v3] " Ben Peart
2018-05-11 14:33   ` Elijah Newren
2018-05-12  8:04   ` Eckhard Maaß
2018-05-14 12:57     ` Ben Peart [this message]
2018-05-11 15:38 ` [PATCH v4] " Ben Peart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d306e8bf-1847-ca81-16ac-fa99ee3f6755@gmail.com \
    --to=peartben@gmail.com \
    --cc=Ben.Peart@microsoft.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=alpauly@microsoft.com \
    --cc=eckhard.s.maass@googlemail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=vmiklos@frugalware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).