leftoverbits - search results

git@vger.kernel.org mailing list mirror (one of many)

* Re: [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default
  2021-07-21 17:23  6%         ` Taylor Blau
@ 2021-07-23  7:39  0%           ` Jeff King
  0 siblings, 0 replies; 200+ results
From: Jeff King @ 2021-07-23  7:39 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 01:23:34PM -0400, Taylor Blau wrote:

> > I don't know if it's better to have a poorly-formatted HTML file, or
> > none at all. :)
> >
> > Personally, I would just read the source. And I have a slight concern
> > that if we start "cleaning it up" to render as asciidoc, the source
> > might end up a lot less readable (though I'd reserve judgement until
> > actually seeing it).
> 
> Yeah, the actual source is pretty readable (and it's what I had been
> looking at, although it is sometimes convenient to have a version I can
> read in my web browser). But it's definitely not good Asciidoc.
> 
> I briefly considered cleaning it up, but decided against it. Usually I
> would opt to clean it up, but this series is already so large that I
> figured it would make a negative impact on the reviewer experience to
> read a clean-up patch here.
> 
> I wouldn't be opposed to coming back to it in the future, once the dust
> settles. I guess we can consider this #leftoverbits until then.

Yeah, I definitely don't want to see that cleanup as a dependency for
this series. It's already long enough as it is. Coming back to it later
is just fine with me.

The question here is: should we continue to omit it from the html build,
since it does not render well (i.e., should we simply drop this patch)?

-Peff

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default
  @ 2021-07-21 17:23  6%         ` Taylor Blau
  2021-07-23  7:39  0%           ` Jeff King
  0 siblings, 1 reply; 200+ results
From: Taylor Blau @ 2021-07-21 17:23 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jonathantanmy

On Wed, Jul 21, 2021 at 06:08:18AM -0400, Jeff King wrote:
> On Wed, Jul 21, 2021 at 05:58:41AM -0400, Jeff King wrote:
>
> > On Mon, Jun 21, 2021 at 06:25:07PM -0400, Taylor Blau wrote:
> >
> > > Even though the 'TECH_DOCS' variable was introduced all the way back in
> > > 5e00439f0a (Documentation: build html for all files in technical and
> > > howto, 2012-10-23), the 'bitmap-format' document was never added to that
> > > list when it was created.
> > >
> > > Prepare for changes to this file by including it in the list of
> > > technical documentation that 'make doc' will build by default.
> >
> > OK. I don't care that much about being able to format this as html, but
> > I agree it's good to be consistent with the other stuff in technical/.
> >
> > The big question is whether it looks OK rendered by asciidoc, and the
> > answer seems to be "yes" (from a cursory look I gave it).
>
> Actually, I take it back. After looking more carefully, it renders quite
> poorly. There's a lot of structural indentation that ends up being
> confused as code blocks.
>
> I don't know if it's better to have a poorly-formatted HTML file, or
> none at all. :)
>
> Personally, I would just read the source. And I have a slight concern
> that if we start "cleaning it up" to render as asciidoc, the source
> might end up a lot less readable (though I'd reserve judgement until
> actually seeing it).

Yeah, the actual source is pretty readable (and it's what I had been
looking at, although it is sometimes convenient to have a version I can
read in my web browser). But it's definitely not good Asciidoc.

I briefly considered cleaning it up, but decided against it. Usually I
would opt to clean it up, but this series is already so large that I
figured it would make a negative impact on the reviewer experience to
read a clean-up patch here.

I wouldn't be opposed to coming back to it in the future, once the dust
settles. I guess we can consider this #leftoverbits until then.

Thanks,
Taylor

^ permalink raw reply	[relevance 6%]

* Re: [PATCH] refs file backend: remove dead "errno == EISDIR" code
  2021-07-14 19:07  4%   ` Ævar Arnfjörð Bjarmason
@ 2021-07-14 23:15  0%     ` Jeff King
  0 siblings, 0 replies; 200+ results
From: Jeff King @ 2021-07-14 23:15 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Han-Wen Nienhuys, Michael Haggerty

On Wed, Jul 14, 2021 at 09:07:41PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Isn't that pseudo-code missing a conditional that's there in the real
> > code? In refs_resolve_ref_unsafe(), I see:
> >
> >        if (refs_read_raw_ref(refs, refname,
> >                              oid, &sb_refname, &read_flags)) {
> >                *flags |= read_flags;
> >
> >                /* In reading mode, refs must eventually resolve */
> >                if (resolve_flags & RESOLVE_REF_READING)
> >                        return NULL;
> >
> >                /*
> >                 * Otherwise a missing ref is OK. But the files backend
> >                 * may show errors besides ENOENT if there are
> >                 * similarly-named refs.
> >                 */
> >                if (errno != ENOENT &&
> >                    errno != EISDIR &&
> >                    errno != ENOTDIR)
> >                        return NULL;
> >
> > So if RESOLVE_REF_READING is set, we can return NULL immediately, with
> > errno set to EISDIR. Which contradicts this:
> 
> I opted (perhaps unwisely) to elide that since as you note above we
> don't take that path in relation to the removed code. I.e. I'm
> describing the relevant codepath we take nowadays given the code & its
> callers.

It's not clear to me that we don't take that path, though. The call in
files_reflog_expire() looks like it violates the assertion in your
commit message (that we would never return NULL with errno as EISDIR).

So I'm not entirely sure that the code is in fact dead (though I
couldn't find an easy way to trigger it from the command line). I do
think it probably can't do anything useful, and it is probably still OK
to delete. But in my mind that is quite a different argument.

Maybe that is splitting hairs, but I definitely try to err on the side
of caution and over-analysis when touching tricky code (and the
ref-backend code is in my experience one of the trickiest spots for
corner cases, races, etc).

> > So when is RESOLVE_REF_READING set? The resolve_flags parameter is
> > passed in by the caller. In lock_ref_oid_basic(), it comes from this:
> >
> >     int mustexist = (old_oid && !is_null_oid(old_oid));
> >     [...]
> >     if (mustexist)
> >             resolve_flags |= RESOLVE_REF_READING;
> >
> > So do any callers pass in old_oid? Surprisingly few. It used to be
> > called from other locking functions, but these days it looks like it is
> > only files_reflog_expire().
> 
> In general (and not being too familiar with this area) and per:
> 
>     7521cc4611 (refs.c: make delete_ref use a transaction, 2014-04-30)
>     92b1551b1d (refs: resolve symbolic refs first, 2016-04-25)
>     029cdb4ab2 (refs.c: make prune_ref use a transaction to delete the ref, 2014-04-30)
> 
> And:
> 
>     https://lore.kernel.org/git/20140902205841.GA18279@google.com/    
> 
> I wonder if these remaining cases can be migrated over to lock_raw_ref()
> or the transaction API, as many other similar callers have been already.
> 
> But that's a bigger change, I won't be doing that now, just wondering if
> these are some #leftoverbits or if there's a good reason they were left.

Quite possibly. It's been a while since I've looked this deep at the ref
code. It is weird that only one remaining caller passes old_oid. If even
that one could be converted, the whole lock_ref_oid_basic() could be
simplified a bit.

I agree that's a bigger change, so it might make sense to do smaller
cleanups in the interim.

> > So...I think it's fine? But the argument in your commit message seems to
> > have missed this case entirely.
> 
> Perhaps more succinctly: If we have a directory in the way, it's going
> to be impossible for the "old_oid" condition to be satisfied in any case
> in the file backend.
> 
> Even if we still had a caller that did "care" about that what could they
> hope to get from an "old_oid=<some-OID>" for a lock on "foo/bar" where
> "foo" is an empty directory?
> 
> Except of course for the case where it's not a directory but packed, but
> as you noted that's handled in another case.

Yeah, I think that's reasonably compelling. It's possible there are some
races unaccounted for here (like somebody else creating and deleting
shared-prefix loose refs at the same time), but it may be OK to just
accept those. The code is "we saw a failure, see if deleting stale
directories helps". And if the worst case is that this doesn't kick in
during an obscure race (where we'd probably end up failing the whole
operation anyway), that's OK.

-Peff

^ permalink raw reply	[relevance 0%]
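
The conditional Jeff quotes from refs_resolve_ref_unsafe() can be modeled
in isolation. The sketch below is not Git's code: SKETCH_REF_READING and
sketch_after_failed_read() are hypothetical names, and the raw read is
assumed to have already failed with the given errno value. It only
illustrates why a caller in reading mode can get NULL back while errno is
still EISDIR.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for Git's RESOLVE_REF_READING flag. */
#define SKETCH_REF_READING 0x01

/*
 * Model of the quoted branch, entered once the raw read has failed
 * with error code `err`. Returns NULL when resolution gives up, or
 * `refname` when the failure is tolerated.
 */
const char *sketch_after_failed_read(const char *refname,
				     unsigned resolve_flags, int err)
{
	/* In reading mode every failure is fatal, EISDIR included. */
	if (resolve_flags & SKETCH_REF_READING)
		return NULL;

	/* Otherwise only "missing or shadowed" errors are tolerated. */
	if (err != ENOENT && err != EISDIR && err != ENOTDIR)
		return NULL;

	return refname;
}
```

With the reading flag set, an EISDIR failure yields NULL; without it, the
same failure is tolerated and the refname is handed back to the caller.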

* Re: [PATCH] refs file backend: remove dead "errno == EISDIR" code
  @ 2021-07-14 19:07  4%   ` Ævar Arnfjörð Bjarmason
  2021-07-14 23:15  0%     ` Jeff King
  0 siblings, 1 reply; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2021-07-14 19:07 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Han-Wen Nienhuys, Michael Haggerty


On Wed, Jul 14 2021, Jeff King wrote:

> On Wed, Jul 14, 2021 at 01:17:14PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Since a1c1d8170d (refs_resolve_ref_unsafe: handle d/f conflicts for
>> writes, 2017-10-06) we don't, because our callstack will look
>> something like:
>> 
>>     files_copy_or_rename_ref() -> lock_ref_oid_basic() -> refs_resolve_ref_unsafe()
>> 
>> And then the refs_resolve_ref_unsafe() call here will in turn (in the
>> code added in a1c1d8170d) do the equivalent of this (via a call to
>> refs_read_raw_ref()):
>> 
>> 	/* Via refs_read_raw_ref() */
>> 	fd = open(path, O_RDONLY);
>> 	if (fd < 0)
>> 		/* get errno == EISDIR */
>> 	/* later, in refs_resolve_ref_unsafe() */
>> 	if ([...] && errno != EISDIR)
>> 		return NULL;
>> 	[...]
>> 	/* returns the refs/heads/foo to the caller, even though it's a directory */
>> 	return refname;
>
> Isn't that pseudo-code missing a conditional that's there in the real
> code? In refs_resolve_ref_unsafe(), I see:
>
>        if (refs_read_raw_ref(refs, refname,
>                              oid, &sb_refname, &read_flags)) {
>                *flags |= read_flags;
>
>                /* In reading mode, refs must eventually resolve */
>                if (resolve_flags & RESOLVE_REF_READING)
>                        return NULL;
>
>                /*
>                 * Otherwise a missing ref is OK. But the files backend
>                 * may show errors besides ENOENT if there are
>                 * similarly-named refs.
>                 */
>                if (errno != ENOENT &&
>                    errno != EISDIR &&
>                    errno != ENOTDIR)
>                        return NULL;
>
> So if RESOLVE_REF_READING is set, we can return NULL immediately, with
> errno set to EISDIR. Which contradicts this:

I opted (perhaps unwisely) to elide that since as you note above we
don't take that path in relation to the removed code. I.e. I'm
describing the relevant codepath we take nowadays given the code & its
callers.

But will reword etc., thanks.

>> I.e. even though we got an "errno == EISDIR" we won't take this
>> branch, since in cases of EISDIR "resolved" is always
>> non-NULL. I.e. we pretend at this point as though everything's OK and
>> there is no "foo" directory.
>
> So when is RESOLVE_REF_READING set? The resolve_flags parameter is
> passed in by the caller. In lock_ref_oid_basic(), it comes from this:
>
>     int mustexist = (old_oid && !is_null_oid(old_oid));
>     [...]
>     if (mustexist)
>             resolve_flags |= RESOLVE_REF_READING;
>
> So do any callers pass in old_oid? Surprisingly few. It used to be
> called from other locking functions, but these days it looks like it is
> only files_reflog_expire().

In general (and not being too familiar with this area) and per:

    7521cc4611 (refs.c: make delete_ref use a transaction, 2014-04-30)
    92b1551b1d (refs: resolve symbolic refs first, 2016-04-25)
    029cdb4ab2 (refs.c: make prune_ref use a transaction to delete the ref, 2014-04-30)

And:

    https://lore.kernel.org/git/20140902205841.GA18279@google.com/    

I wonder if these remaining cases can be migrated over to lock_raw_ref()
or the transaction API, as many other similar callers have been already.

But that's a bigger change, I won't be doing that now, just wondering if
these are some #leftoverbits or if there's a good reason they were left.

> I'm not sure if this case is important or not. If we're expecting the
> ref to exist, then an in-the-way directory is going to mean failure
> either way. It could still exist within the packed-refs file, but then
> refs_read_raw_ref() would not return failure.
>
> So...I think it's fine? But the argument in your commit message seems to
> have missed this case entirely.

Perhaps more succinctly: If we have a directory in the way, it's going
to be impossible for the "old_oid" condition to be satisfied in any case
in the file backend.

Even if we still had a caller that did "care" about that what could they
hope to get from an "old_oid=<some-OID>" for a lock on "foo/bar" where
"foo" is an empty directory?

Except of course for the case where it's not a directory but packed, but
as you noted that's handled in another case.

Perhaps it's informative that the below diff-on-top also passes all
tests, i.e. that we have largely the same
"refs_read_raw_ref(refs->packed_ref_store" copy/pasted in
files_read_raw_ref() in two adjacent places, we're just changing what
errno we pass upwards.

It thoroughly tramples on Han-Wen's series, and it's easier to deal with
(if at all) once his lands, just thought it might be interesting:

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 7e4963fd07..4a97cd48d9 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -356,6 +356,8 @@ static int files_read_raw_ref(struct ref_store *ref_store,
 	int ret = -1;
 	int save_errno;
 	int remaining_retries = 3;
+	int lstat_bad_or_not_file = 0;
+	int lstat_errno = 0;
 
 	*type = 0;
 	strbuf_reset(&sb_path);
@@ -382,11 +384,28 @@ static int files_read_raw_ref(struct ref_store *ref_store,
 		goto out;
 
 	if (lstat(path, &st) < 0) {
-		if (errno != ENOENT)
+		lstat_bad_or_not_file = 1;
+		lstat_errno = errno;
+	} else if (S_ISDIR(st.st_mode)) {
+		/*
+		 * Maybe it's an empty directory, maybe it's not, in
+		 * either case this ref does not exist in the files
+		 * backend (but may be packed), later code will handle
+		 * the "create and maybe remove_empty_directories()"
+		 * case if needed, or die otherwise.
+		 */
+		lstat_bad_or_not_file = 1;
+	}
+
+	if (lstat_bad_or_not_file) {
+		if (lstat_errno && lstat_errno != ENOENT)
 			goto out;
 		if (refs_read_raw_ref(refs->packed_ref_store, refname,
 				      oid, referent, type)) {
-			errno = ENOENT;
+			if (lstat_errno)
+				errno = ENOENT;
+			else
+				errno = EISDIR;
 			goto out;
 		}
 		ret = 0;
@@ -417,22 +436,6 @@ static int files_read_raw_ref(struct ref_store *ref_store,
 		 */
 	}
 
-	/* Is it a directory? */
-	if (S_ISDIR(st.st_mode)) {
-		/*
-		 * Even though there is a directory where the loose
-		 * ref is supposed to be, there could still be a
-		 * packed ref:
-		 */
-		if (refs_read_raw_ref(refs->packed_ref_store, refname,
-				      oid, referent, type)) {
-			errno = EISDIR;
-			goto out;
-		}
-		ret = 0;
-		goto out;
-	}
-
 	/*
 	 * Anything else, just open it and try to use it as
 	 * a ref

^ permalink raw reply related	[relevance 4%]
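
The diff above folds the "directory in the way" case into the same path
as a failed lstat(), and differs only in which errno is passed upwards.
A minimal standalone sketch of that classification, using nothing but
POSIX lstat(), might look as follows. sketch_loose_ref_status() is a
hypothetical helper, and it deliberately leaves out the packed-refs
fallback that the real function performs.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: report how a loose-ref path looks on disk.
 * Returns 0 if something other than a directory exists at `path`,
 * EISDIR if a directory is in the way, or the errno left by lstat()
 * otherwise (ENOENT for a missing path).
 */
int sketch_loose_ref_status(const char *path)
{
	struct stat st;

	if (lstat(path, &st) < 0)
		return errno;
	if (S_ISDIR(st.st_mode))
		return EISDIR;
	return 0;
}
```

A caller could treat both ENOENT and EISDIR as "not a loose ref, try the
packed store", which is roughly the merged branch the diff introduces.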

* Re: [PATCH 00/12] Fix all leaks in tests t0002-t0099: Part 2
  @ 2021-06-21 21:54  6% ` Elijah Newren
  0 siblings, 0 replies; 200+ results
From: Elijah Newren @ 2021-06-21 21:54 UTC (permalink / raw)
  To: Andrzej Hunt; +Cc: Git Mailing List, Christian Couder

On Sun, Jun 20, 2021 at 8:14 AM <andrzej@ahunt.org> wrote:
>
> From: Andrzej Hunt <andrzej@ahunt.org>
>
> This series plugs more of the leaks that were found while running
> t0002-t0099 with LSAN.
>
> See also the first series (already merged) at [1]. I'm currently
> expecting at least another 2 series before t0002-t0099 run leak free.
> I'm not being particularly systematic about the order of patches -
> although I am trying to send out "real" (if mostly small) leaks first,
> before sending out the more boring patches that add free()/UNLEAK() to
> cmd_* and direct helpers thereof.

I've read over the series.  It provides some good clear fixes.  I
noted on patches 2, 6, and 12 that some greps suggested that leaks
similar to the ones being fixed likely also affect other places of the
codebase.  Those other places don't need to be fixed as part of this
series, but they might be good items for #leftoverbits or GSoC early
tasks (cc: Christian in case he wants to record those somewhere).

I cc'ed Stolee on patch 4 because he suggested he wanted to read it in
an earlier discussion.

Phillip noted some issues with patch 11, and I added a couple more.
The ownership of opts->strategy appears to be pretty messy and in need
of cleanup.

All the patches other than 11 look good to me.

^ permalink raw reply	[relevance 6%]

* Re: git-sh-prompt: bash: GIT_PS1_COMPRESSSPARSESTATE: unbound variable
  @ 2021-05-20  0:09  6%             ` Elijah Newren
  0 siblings, 0 replies; 200+ results
From: Elijah Newren @ 2021-05-20  0:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christoph Anton Mitterer, Git Mailing List, ville.skytta

On Wed, May 19, 2021 at 4:29 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christoph Anton Mitterer <calestyo@scientia.net> writes:
>
> > Hey there.
> >
> > I think I found another case of an unbound variable:
> >
> > Completing e.g.:
> > git commit --[press TAB]
> >
> > gives:
> > $ git commit  --bash: GIT_COMPLETION_SHOW_ALL: unbound variable

That particular case was fixed by Ville Skyttä in commit c5c0548d793e
(completion: audit and guard $GIT_* against unset use, 2021-04-08).

> It seems that OMIT_SPARSESTATE would have the same issue.
>
>         if [ -z "${GIT_PS1_COMPRESSSPARSESTATE}" ] &&
>            [ -z "${GIT_PS1_OMITSPARSESTATE}" ] &&
>
> all coming from afda36db (git-prompt: include sparsity state as
> well, 2020-06-21).
>
> But I think we have already seen the fix in 5c0cbdb1 (git-prompt:
> work under set -u, 2021-05-13), which may or may not appear in the
> upcoming release.

Yeah, I fixed the ones I introduced in git-prompt.sh --
GIT_PS1_COMPRESSSPARSESTATE and GIT_PS1_OMITSPARSESTATE.

> There still are unprotected mentions of GIT_PS1_SHOWUPSTREAM even
> with that fix, though.

Yeah, neither my fix (which was only trying to fix the problems I
introduced in git-prompt.sh) nor Ville's fix (which was focused on
git-completion.bash) caught that one.

Do you want to make a patch for that, Christoph?  If not, #leftoverbits?

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v2 1/1] repack: avoid loosening promisor objects in partial clones
  2021-04-19 23:09  6%     ` Junio C Hamano
@ 2021-04-21 19:25  0%       ` Rafael Silva
  0 siblings, 0 replies; 200+ results
From: Rafael Silva @ 2021-04-21 19:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Jonathan Tan, SZEDER Gábor


Junio C Hamano <gitster@pobox.com> writes:

> Rafael Silva <rafaeloliveira.cs@gmail.com> writes:
>
>> When `git repack -A -d` is run in a partial clone, `pack-objects`
>> is invoked twice: once to repack all promisor objects, and once to
>> repack all non-promisor objects. The latter `pack-objects` invocation
>> is with --exclude-promisor-objects and --unpack-unreachable, which
>> loosens all unused objects. Unfortunately, this includes promisor
>> objects.
>>
>> Because the -d argument to `git repack` subsequently deletes all loose
>> objects also in packs, these just-loosened promisor objects will be
>> immediately deleted. However, this extra disk churn is unnecessary in
>> the first place.  For example, a newly-clone partial repo that filters
>
> "in a newly-cloned partial repo", I'd think.
>

Thanks, will fix on the next revision.

>> For testing, we need to validate whether any object was loosened.
>> However, the "evidence" (loosened objects) is deleted during the
>> process which prevents us from inspecting the object directory.
>> Instead, let's teach `pack-objects` to count loosened objects and
>> emit via trace2 thus allowing inspecting the debug events after the
>> process is finished. This new event is used on the added regression
>> test.
>
> Nicely designed.
>

Thanks :)

>> +	uint32_t loosened_objects_nr = 0;
>>  	struct object_id oid;
>>  
>>  	for (p = get_all_packs(the_repository); p; p = p->next) {
>> @@ -3492,11 +3493,16 @@ static void loosen_unused_packed_objects(void)
>>  			nth_packed_object_id(&oid, p, i);
>>  			if (!packlist_find(&to_pack, &oid) &&
>>  			    !has_sha1_pack_kept_or_nonlocal(&oid) &&
>> -			    !loosened_object_can_be_discarded(&oid, p->mtime))
>> +			    !loosened_object_can_be_discarded(&oid, p->mtime)) {
>>  				if (force_object_loose(&oid, p->mtime))
>>  					die(_("unable to force loose object"));
>> +				loosened_objects_nr++;
>> +			}
>>  		}
>>  	}
>> +
>> +	trace2_data_intmax("pack-objects", the_repository,
>> +			   "loosen_unused_packed_objects/loosened", loosened_objects_nr);
>>  }
>
> OK, so this is just the "stats".
>
>> diff --git a/builtin/repack.c b/builtin/repack.c
>> index 2847fdfbab..5f9bc74adc 100644
>> --- a/builtin/repack.c
>> +++ b/builtin/repack.c
>> @@ -20,7 +20,7 @@ static int delta_base_offset = 1;
>>  static int pack_kept_objects = -1;
>>  static int write_bitmaps = -1;
>>  static int use_delta_islands;
>> -static char *packdir, *packtmp;
>> +static char *packdir, *packtmp_name, *packtmp;
>>  
>>  static const char *const git_repack_usage[] = {
>>  	N_("git repack [<options>]"),
>> @@ -530,7 +530,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>>  	}
>>  
>>  	packdir = mkpathdup("%s/pack", get_object_directory());
>> -	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
>> +	packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid());
>> +	packtmp = mkpathdup("%s/%s", packdir, packtmp_name);
>
> Just a mental note, but we should move away from ".tmp-$$" that is a
> remnant from the days back when this was a shell script, and use the
> tempfile.h API (#leftoverbits).  Such a change must not be part of
> this topic, of course.
>

Indeed. This should be moved to the tempfile.h API.

>
> Thanks.  Will queue and see what others say.

Thanks for reviewing it.

-- 
Thanks
Rafael

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 1/1] repack: avoid loosening promisor objects in partial clones
  @ 2021-04-19 23:09  6%     ` Junio C Hamano
  2021-04-21 19:25  0%       ` Rafael Silva
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2021-04-19 23:09 UTC (permalink / raw)
  To: Rafael Silva; +Cc: git, Jeff King, Jonathan Tan, SZEDER Gábor

Rafael Silva <rafaeloliveira.cs@gmail.com> writes:

> When `git repack -A -d` is run in a partial clone, `pack-objects`
> is invoked twice: once to repack all promisor objects, and once to
> repack all non-promisor objects. The latter `pack-objects` invocation
> is with --exclude-promisor-objects and --unpack-unreachable, which
> loosens all unused objects. Unfortunately, this includes promisor
> objects.
>
> Because the -d argument to `git repack` subsequently deletes all loose
> objects also in packs, these just-loosened promisor objects will be
> immediately deleted. However, this extra disk churn is unnecessary in
> the first place.  For example, a newly-clone partial repo that filters

"in a newly-cloned partial repo", I'd think.

> For testing, we need to validate whether any object was loosened.
> However, the "evidence" (loosened objects) is deleted during the
> process which prevents us from inspecting the object directory.
> Instead, let's teach `pack-objects` to count loosened objects and
> emit via trace2 thus allowing inspecting the debug events after the
> process is finished. This new event is used on the added regression
> test.

Nicely designed.

> +	uint32_t loosened_objects_nr = 0;
>  	struct object_id oid;
>  
>  	for (p = get_all_packs(the_repository); p; p = p->next) {
> @@ -3492,11 +3493,16 @@ static void loosen_unused_packed_objects(void)
>  			nth_packed_object_id(&oid, p, i);
>  			if (!packlist_find(&to_pack, &oid) &&
>  			    !has_sha1_pack_kept_or_nonlocal(&oid) &&
> -			    !loosened_object_can_be_discarded(&oid, p->mtime))
> +			    !loosened_object_can_be_discarded(&oid, p->mtime)) {
>  				if (force_object_loose(&oid, p->mtime))
>  					die(_("unable to force loose object"));
> +				loosened_objects_nr++;
> +			}
>  		}
>  	}
> +
> +	trace2_data_intmax("pack-objects", the_repository,
> +			   "loosen_unused_packed_objects/loosened", loosened_objects_nr);
>  }

OK, so this is just the "stats".

> diff --git a/builtin/repack.c b/builtin/repack.c
> index 2847fdfbab..5f9bc74adc 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -20,7 +20,7 @@ static int delta_base_offset = 1;
>  static int pack_kept_objects = -1;
>  static int write_bitmaps = -1;
>  static int use_delta_islands;
> -static char *packdir, *packtmp;
> +static char *packdir, *packtmp_name, *packtmp;
>  
>  static const char *const git_repack_usage[] = {
>  	N_("git repack [<options>]"),
> @@ -530,7 +530,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	}
>  
>  	packdir = mkpathdup("%s/pack", get_object_directory());
> -	packtmp = mkpathdup("%s/.tmp-%d-pack", packdir, (int)getpid());
> +	packtmp_name = xstrfmt(".tmp-%d-pack", (int)getpid());
> +	packtmp = mkpathdup("%s/%s", packdir, packtmp_name);

Just a mental note, but we should move away from ".tmp-$$" that is a
remnant from the days back when this was a shell script, and use the
tempfile.h API (#leftoverbits).  Such a change must not be part of
this topic, of course.

Thanks.  Will queue and see what others say.


^ permalink raw reply	[relevance 6%]
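
As an illustration of the concern with ".tmp-$$" style names, here is a
sketch that lets the C library pick a unique name instead of deriving it
from the process ID. It uses plain POSIX mkstemp() as a stand-in and is
not Git's tempfile.h API; sketch_create_tmp_pack() and the
".tmp-pack-XXXXXX" template are assumptions made for this example only.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a uniquely named temporary file inside
 * `dir`. On success the chosen path is left in `path` and an open file
 * descriptor is returned; on failure -1 is returned.
 */
int sketch_create_tmp_pack(const char *dir, char *path, size_t pathlen)
{
	int n = snprintf(path, pathlen, "%s/.tmp-pack-XXXXXX", dir);

	if (n < 0 || (size_t)n >= pathlen)
		return -1; /* the template did not fit in the buffer */

	/* mkstemp() fills in the X's and creates the file exclusively. */
	return mkstemp(path);
}
```

Unlike a PID-based name, the file is created atomically under a name that
did not exist before. The caller still has to close and unlink it, so any
cleanup on exit or on a signal is outside the scope of this sketch.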

* Re: [PATCH] transport: respect verbosity when setting upstream
  @ 2021-04-16 18:48  6%     ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2021-04-16 18:48 UTC (permalink / raw)
  To: Øystein Walle; +Cc: Eric Sunshine, Git List

Øystein Walle <oystwa@gmail.com> writes:

> On Thu, 15 Apr 2021 at 17:29, Eric Sunshine <sunshine@sunshineco.com> wrote:
>
>> I wondered why you used `tee` here since it adds no value (as far as I
>> can tell), but I see that you copied it from the test preceding this
>> one. So... [intentionally left blank]
>
> Indeed, I wondered about that too; it seems a plain redirection will do
> the trick. But a mix of laziness and not second-guessing others' work made
> me leave it as it is.

Let's agree to mark it as #leftoverbits then?

Thanks for a fix, additional tests, and a good review.


^ permalink raw reply	[relevance 6%]

* Re: Pain points in Git's patch flow
  @ 2021-04-14  8:02  8%   ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2021-04-14  8:02 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: Jonathan Nieder, Raxel Gutierrez, mricon, patchwork, Taylor Blau,
	Emily Shaffer, Git Users

Bagas Sanjaya <bagasdotme@gmail.com> writes:

> There are no lists of "beginner-friendly" issues that can be worked on by
> new contributors. They have to search this ML archive for bug report
> issues and determine for themselves which are beginner-friendly.

Yeah, looking for "#leftoverbits" or "low-hanging" on the list
archive is often cited as a way, and it does seem easy enough to
do.  You go to https://lore.kernel.org/git/, type "leftoverbits"
or "low-hanging" in the text input and press SEARCH.

But that is only half of the story.

Anybody can throw random ideas and label them "#leftoverbits" or
"low-hanging fruit", but some of these ideas might turn out to be
ill-conceived or outright nonsense.  Limiting search to the
utterances by those with known good taste does help, but as a
newbie, you do not know who these people with good taste are.

It might help to have a curated list of starter tasks, but I suspect
that they tend to get depleted rather quickly---by definition the
ones on the list are easy to do and there is nothing to stop an
eager newbie from eating all of them in one sitting X-(.

So, I dunno.  We seem to suffer from the same lack of good starter
tasks before each GSoC begins.

^ permalink raw reply	[relevance 8%]

* Re: [PATCH v2] [GSOC] ref-filter: use single strbuf for all output
  2021-04-07 20:31  5%   ` Junio C Hamano
@ 2021-04-08 12:05  0%     ` ZheNing Hu
  0 siblings, 0 replies; 200+ results
From: ZheNing Hu @ 2021-04-08 12:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: ZheNing Hu via GitGitGadget, Git List, Jeff King,
	Christian Couder, Hariom Verma, Eric Sunshine, Derrick Stolee,
	René Scharfe

Junio C Hamano <gitster@pobox.com> wrote on Thu, Apr 8, 2021 at 4:31 AM:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Subject: Re: [PATCH v2] [GSOC] ref-filter: use single strbuf for all output
>
> The implementation changed so much from the initial attempt, for
> which the above title may have been appropriate, that reusing single
> strbuf over and over is not the most important part of the change
> anymore, I am afraid.  Besides, it uses TWO strbufs ;-)
>
> Subject: [PATCH] ref-filter: introduce show_ref_array_items() helper
>
> or something like that?
>

Yep, I still think its core is reusing the strbufs, but
"introduce show_ref_array_items()" would be more accurate.

> > From: ZheNing Hu <adlternative@gmail.com>
> >
> > When we use `git for-each-ref`, every ref will call
> > `show_ref_array_item()` and allocate its own final strbuf
> > and error strbuf. Instead, we can reuse these two strbuf
> > for each step ref's output.
> >
> > The performance for `git for-each-ref` on the Git repository
> > itself with performance testing tool `hyperfine` changes from
> > 18.7 ms ± 0.4 ms to 18.2 ms ± 0.3 ms.
> >
> > This approach is similar to the one used by 79ed0a5
> > (cat-file: use a single strbuf for all output, 2018-08-14)
> > to speed up the cat-file builtin.
> >
> > Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> > ---
> >     [GSOC] ref-filter: use single strbuf for all output
> >
> >     Now git for-each-ref can reuse two buffers for all refs output, the
> >     performance is slightly improved.
> >
> >     Now there may be a question : Should the original interface
> >     show_ref_array_items be retained?
> > ...
> >        /*  Callback function for parsing the sort option */
>
> Again, not a very useful range-diff as the implementation changed so much.
>

This makes me wonder whether I should give up on GGG in the future.
I also don't want a range-diff with such a big difference.
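
The buffer reuse described in the quoted commit message can be sketched
in plain C without any of Git's headers. Everything below is
hypothetical: struct sketch_buf only imitates the idea of a strbuf, and
sketch_show_items() is not the helper from the patch. The point is that
the allocation survives a reset, so later items usually need no new
malloc, and that the array and its element count travel as adjacent
parameters.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, loosely imitating a strbuf. */
struct sketch_buf {
	char *data;
	size_t len, alloc;
};

/* Forget the contents but keep the allocation for the next item. */
void sketch_buf_reset(struct sketch_buf *b)
{
	b->len = 0;
}

/* Append a NUL-terminated string, growing the allocation if needed. */
void sketch_buf_add(struct sketch_buf *b, const char *s)
{
	size_t n = strlen(s);

	if (b->len + n + 1 > b->alloc) {
		size_t want = (b->len + n + 1) * 2;
		char *p = realloc(b->data, want);

		if (!p)
			abort();
		b->data = p;
		b->alloc = want;
	}
	memcpy(b->data + b->len, s, n + 1);
	b->len += n;
}

/* Write each item on its own line, reusing one buffer throughout. */
void sketch_show_items(FILE *out, const char *items[], size_t n)
{
	struct sketch_buf buf = { NULL, 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		sketch_buf_add(&buf, items[i]);
		sketch_buf_add(&buf, "\n");
		fwrite(buf.data, 1, buf.len, out);
		sketch_buf_reset(&buf);
	}
	free(buf.data);
}
```

Whether this kind of reuse pays off depends on how expensive formatting
is relative to allocation; the commit message itself reports only a
small improvement.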

>
> >  builtin/for-each-ref.c |  4 +---
> >  ref-filter.c           | 20 ++++++++++++++++++++
> >  ref-filter.h           |  5 +++++
> >  3 files changed, 26 insertions(+), 3 deletions(-)
> >
> > diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> > index cb9c81a04606..d630402230f3 100644
> > --- a/builtin/for-each-ref.c
> > +++ b/builtin/for-each-ref.c
> > @@ -16,7 +16,6 @@ static char const * const for_each_ref_usage[] = {
> >
> >  int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
> >  {
> > -     int i;
> >       struct ref_sorting *sorting = NULL, **sorting_tail = &sorting;
> >       int maxcount = 0, icase = 0;
> >       struct ref_array array;
> > @@ -80,8 +79,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
> >
> >       if (!maxcount || array.nr < maxcount)
> >               maxcount = array.nr;
> > -     for (i = 0; i < maxcount; i++)
> > -             show_ref_array_item(array.items[i], &format);
> > +     show_ref_array_items(array.items, &format, maxcount);
>
> The intention of this call is to pass an array and the number of
> elements in the array as a pair to the function, right?  When you
> design the API for a new helper function, do not split them apart by
> inserting an unrelated parameter in the middle.
>

Eh, are you saying that `maxcount` is irrelevant here? There should be
`maxcount`, because we need to limit the number of iterations here.
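
For reference, the clamp that the caller applies before the loop can be written as a small standalone function. This is only a sketch: `effective_count()` is a hypothetical name that does not exist in Git, and it uses `size_t` where `cmd_for_each_ref()` uses `int`.

```c
#include <stddef.h>

/*
 * Hypothetical helper mirroring the clamp in cmd_for_each_ref():
 * a maxcount of 0 means "no limit", and a limit larger than the
 * number of available refs is reduced to that number.
 */
static size_t effective_count(size_t maxcount, size_t nr)
{
	if (!maxcount || nr < maxcount)
		return nr;
	return maxcount;
}
```

The value this returns is the element count that gets passed along with the array, so the limit and the array size end up as a single number.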

> > diff --git a/ref-filter.c b/ref-filter.c
> > index f0bd32f71416..27bbf9b6c8ac 100644
> > --- a/ref-filter.c
> > +++ b/ref-filter.c
> > @@ -2435,6 +2435,26 @@ int format_ref_array_item(struct ref_array_item *info,
> >       return 0;
> >  }
> >
> > +void show_ref_array_items(struct ref_array_item **info,
> > +                      const struct ref_format *format,
> > +                      size_t n)
>
> IOW,
>
>         void show_ref_array_items(const struct ref_format *format,
>                                   struct ref_array_item *info[], size_t n)
>

Yes, it will be more obvious in the form of an array.

> > +{
> > +     struct strbuf final_buf = STRBUF_INIT;
> > +     struct strbuf error_buf = STRBUF_INIT;
> > +     size_t i;
> > +
> > +     for (i = 0; i < n; i++) {
> > +             if (format_ref_array_item(info[i], format, &final_buf, &error_buf))
> > +                     die("%s", error_buf.buf);
>
> OK, the contents of error_buf is already localized, so it is correct
> not to have _() around the "%s" here.
>
> > +             fwrite(final_buf.buf, 1, final_buf.len, stdout);
> > +             strbuf_reset(&error_buf);
> > +             strbuf_reset(&final_buf);
> > +             putchar('\n');
>
> This is inherited code, but splitting fwrite() and putchar() apart
> like this makes the code hard to follow.  Perhaps clean it up later
> when nothing else is going on in the code as leftoverbits, outside
> the topic.
>

Ok, swap the position of reset and putchar.

> > +     }
> > +     strbuf_release(&error_buf);
> > +     strbuf_release(&final_buf);
> > +}
> > +
> >  void show_ref_array_item(struct ref_array_item *info,
> >                        const struct ref_format *format)
> >  {
>
> Isn't the point of the new helper function so that this can become a
> thin wrapper around it, i.e.
>
>         void show_ref_array_item(...)
>         {
>                 show_ref_array_items(format, &info, 1);
>         }
>

Maybe it makes sense. But as Peff said, maybe we can just delete it.

> > diff --git a/ref-filter.h b/ref-filter.h
> > index 19ea4c413409..eb7e79a6676d 100644
> > --- a/ref-filter.h
> > +++ b/ref-filter.h
> > @@ -121,6 +121,11 @@ int format_ref_array_item(struct ref_array_item *info,
> >                         struct strbuf *error_buf);
> >  /*  Print the ref using the given format and quote_style */
> >  void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
> > +/*  Print the refs using the given format and quote_style and maxcount */
> > +void show_ref_array_items(struct ref_array_item **info,
> > +                      const struct ref_format *format,
> > +                      size_t n);
>
> The inconsistency between "maxcount" vs "n" is irritating.  Calling
> the parameter with a name that has the word "info" (because the new
> parameter is about that array) and a word like "nelem" to hint that
> it is the number of elements in the array) would be sensible.
>
> void show_ref_array_items(const struct ref_format *format,
>                           struct ref_array_item *info[], size_t info_count);
>
> or something along the line, perhaps?
>

Aha, I guess this is the reason for the misunderstanding above.
Yes, `info_count` is the correct meaning and the meaning of `n` is
wrong.

Thanks.
--
ZheNing Hu

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2] [GSOC] ref-filter: use single strbuf for all output
  @ 2021-04-07 20:31  5%   ` Junio C Hamano
  2021-04-08 12:05  0%     ` ZheNing Hu
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2021-04-07 20:31 UTC (permalink / raw)
  To: ZheNing Hu via GitGitGadget
  Cc: git, Jeff King, Christian Couder, Hariom Verma, Eric Sunshine,
	Derrick Stolee, René Scharfe, ZheNing Hu

"ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Subject: Re: [PATCH v2] [GSOC] ref-filter: use single strbuf for all output

The implementation changed so much from the initial attempt, for
which the above title may have been appropriate, that reusing single
strbuf over and over is not the most important part of the change
anymore, I am afraid.  Besides, it uses TWO strbufs ;-)

Subject: [PATCH] ref-filter: introduce show_ref_array_items() helper

or something like that?

> From: ZheNing Hu <adlternative@gmail.com>
>
> When we use `git for-each-ref`, every ref will call
> `show_ref_array_item()` and allocate its own final strbuf
> and error strbuf. Instead, we can reuse these two strbuf
> for each step ref's output.
>
> The performance for `git for-each-ref` on the Git repository
> itself with performance testing tool `hyperfine` changes from
> 18.7 ms ± 0.4 ms to 18.2 ms ± 0.3 ms.
>
> This approach is similar to the one used by 79ed0a5
> (cat-file: use a single strbuf for all output, 2018-08-14)
> to speed up the cat-file builtin.
>
> Signed-off-by: ZheNing Hu <adlternative@gmail.com>
> ---
>     [GSOC] ref-filter: use single strbuf for all output
>     
>     Now git for-each-ref can reuse two buffers for all refs output, the
>     performance is slightly improved.
>     
>     Now there may be a question : Should the original interface
>     show_ref_array_items be retained?
> ...
>        /*  Callback function for parsing the sort option */

Again, not a very useful range-diff as the implementation changed so much.


>  builtin/for-each-ref.c |  4 +---
>  ref-filter.c           | 20 ++++++++++++++++++++
>  ref-filter.h           |  5 +++++
>  3 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
> index cb9c81a04606..d630402230f3 100644
> --- a/builtin/for-each-ref.c
> +++ b/builtin/for-each-ref.c
> @@ -16,7 +16,6 @@ static char const * const for_each_ref_usage[] = {
>  
>  int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>  {
> -	int i;
>  	struct ref_sorting *sorting = NULL, **sorting_tail = &sorting;
>  	int maxcount = 0, icase = 0;
>  	struct ref_array array;
> @@ -80,8 +79,7 @@ int cmd_for_each_ref(int argc, const char **argv, const char *prefix)
>  
>  	if (!maxcount || array.nr < maxcount)
>  		maxcount = array.nr;
> -	for (i = 0; i < maxcount; i++)
> -		show_ref_array_item(array.items[i], &format);
> +	show_ref_array_items(array.items, &format, maxcount);

The intention of this call is to pass an array and the number of
elements in the array as a pair to the function, right?  When you
design the API for a new helper function, do not split them apart by
inserting an unrelated parameter in the middle.

> diff --git a/ref-filter.c b/ref-filter.c
> index f0bd32f71416..27bbf9b6c8ac 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -2435,6 +2435,26 @@ int format_ref_array_item(struct ref_array_item *info,
>  	return 0;
>  }
>  
> +void show_ref_array_items(struct ref_array_item **info,
> +			 const struct ref_format *format,
> +			 size_t n)

IOW,

	void show_ref_array_items(const struct ref_format *format,
				  struct ref_array_item *info[], size_t n)

> +{
> +	struct strbuf final_buf = STRBUF_INIT;
> +	struct strbuf error_buf = STRBUF_INIT;
> +	size_t i;
> +
> +	for (i = 0; i < n; i++) {
> +		if (format_ref_array_item(info[i], format, &final_buf, &error_buf))
> +			die("%s", error_buf.buf);

OK, the contents of error_buf is already localized, so it is correct
not to have _() around the "%s" here.

> +		fwrite(final_buf.buf, 1, final_buf.len, stdout);
> +		strbuf_reset(&error_buf);
> +		strbuf_reset(&final_buf);
> +		putchar('\n');

This is inherited code, but splitting fwrite() and putchar() apart
like this makes the code hard to follow.  Perhaps clean it up later
when nothing else is going on in the code as leftoverbits, outside
the topic.

> +	}
> +	strbuf_release(&error_buf);
> +	strbuf_release(&final_buf);
> +}
> +
>  void show_ref_array_item(struct ref_array_item *info,
>  			 const struct ref_format *format)
>  {

Isn't the point of the new helper function so that this can become a
thin wrapper around it, i.e.

	void show_ref_array_item(...)
	{
		show_ref_array_items(format, &info, 1);
	}

> diff --git a/ref-filter.h b/ref-filter.h
> index 19ea4c413409..eb7e79a6676d 100644
> --- a/ref-filter.h
> +++ b/ref-filter.h
> @@ -121,6 +121,11 @@ int format_ref_array_item(struct ref_array_item *info,
>  			  struct strbuf *error_buf);
>  /*  Print the ref using the given format and quote_style */
>  void show_ref_array_item(struct ref_array_item *info, const struct ref_format *format);
> +/*  Print the refs using the given format and quote_style and maxcount */
> +void show_ref_array_items(struct ref_array_item **info,
> +			 const struct ref_format *format,
> +			 size_t n);

The inconsistency between "maxcount" vs "n" is irritating.  Calling
the parameter with a name that has the word "info" (because the new
parameter is about that array) and a word like "nelem" (to hint that
it is the number of elements in the array) would be sensible.

void show_ref_array_items(const struct ref_format *format,
			  struct ref_array_item *info[], size_t info_count);

or something along the line, perhaps?
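
To make the shape concrete outside of Git's tree, here is a minimal standalone sketch of the points raised in this review: the array and its element count are adjacent parameters, one buffer is reused for every item, the newline is written together with the rest of the line, and the single-item function is a thin wrapper. None of this is Git's code. `struct buf`, `buf_add()`, `buf_reset()`, `buf_release()`, `show_items()` and `show_item()` are hypothetical stand-ins (for strbuf and the ref-filter helpers), and plain strings take the place of `struct ref_array_item`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Git's strbuf; not the real API. */
struct buf {
	char *data;
	size_t len, alloc;
};

/* Append a NUL-terminated string, growing the allocation as needed. */
static void buf_add(struct buf *b, const char *s)
{
	size_t n = strlen(s);

	if (b->len + n + 1 > b->alloc) {
		b->alloc = (b->len + n + 1) * 2;
		b->data = realloc(b->data, b->alloc);
		if (!b->data)
			abort();
	}
	memcpy(b->data + b->len, s, n + 1);
	b->len += n;
}

/* Drop the contents but keep the allocation for the next item. */
static void buf_reset(struct buf *b)
{
	b->len = 0;
	if (b->data)
		b->data[0] = '\0';
}

/* Free the allocation once, after the whole loop. */
static void buf_release(struct buf *b)
{
	free(b->data);
	b->data = NULL;
	b->len = b->alloc = 0;
}

/*
 * The array and its element count sit next to each other in the
 * parameter list. Returns the number of items written.
 */
static size_t show_items(FILE *out, const char *prefix,
			 const char *items[], size_t items_count)
{
	struct buf line = { NULL, 0, 0 };
	size_t i;

	for (i = 0; i < items_count; i++) {
		buf_add(&line, prefix);
		buf_add(&line, items[i]);
		buf_add(&line, "\n");	/* newline goes out with the line */
		fwrite(line.data, 1, line.len, out);
		buf_reset(&line);
	}
	buf_release(&line);
	return i;
}

/* The single-item variant becomes a thin wrapper. */
static size_t show_item(FILE *out, const char *prefix, const char *item)
{
	return show_items(out, prefix, &item, 1);
}
```

Whether the buffer reuse is measurable depends on the workload. The timings quoted in the commit message suggest the gain is small, so the cleaner calling convention is arguably the main benefit.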


^ permalink raw reply	[relevance 5%]

* Re: [PATCH v8 1/2] [GSOC] commit: add --trailer option
  2021-03-17  8:08  0%         ` Ævar Arnfjörð Bjarmason
@ 2021-03-17 13:54  0%           ` ZheNing Hu
  0 siblings, 0 replies; 200+ results
From: ZheNing Hu @ 2021-03-17 13:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: ZheNing Hu via GitGitGadget, Git List, Bradley M. Kuhn,
	Junio C Hamano, Brandon Casey, Shourya Shukla, Christian Couder,
	Rafael Silva

On Wed, Mar 17, 2021 at 4:08 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> > Logically speaking, `interpret_trailers` should be dedicated to `commit`
> > or other sub-commands that require trailers.
> >
> > But I think that in the later stage, the parse_options of the `cmd_commit`
> > can keep the unrecognized options, and then these choices can be directly
> > passed to the `interpret_trailers` backend.
>
> We have this interaction with e.g. range-diff and "log", it's often
> surprising. You add an option to one command and it appears in the
> other.
>

All right, I was wrong; I was probably drawing on a misleading
experience with `difftool` --> `diff`.

> >>    It seems to me to be a good idea to (at least for testing) convert
> >>    the --signoff trailer to your implementation. We have plenty of tests
> >>    for it, does migrating it over pass or fail those?
> >>
> > I don’t know how to migrating yet, it may take a long time.
> > Even I think I can leave it as #leftoverbit later.
>
> Sure, I mean (having looked at it) that at least for your own local
> testing it would make sense to change it (even if just search-replacing
> the --signoff in the test suite) to see if it behaves as you
> expect. I.e. does the --trailer behavior mirror --signoff?
>
> >>  * I also agree with Junio that we shouldn't have a --fixed-by or
> >>    whatever and wouldn't add --signoff today, but it seems very useful
> >>    to me to have a shortcut like:
> >>
> >>        --trailer "Signed-off-by"
> >>
> >>    I.e. omitting the value, or:
> >>
> >>       --trailer "Signed-off-by="
> >>
> >>    Or some other thing we deem sufficiently useful/sane
> >>    syntax/unambiguous.n
> >>
> >>    Then the value would be provided by fmt_name(WANT_COMMITTER_IDENT)
> >>    just as we do in append_signoff() now. I think a *very common* case
> >>    for this would be something like:
> >>
> >>        git commit --amend -v --trailer "Reviewed-by"
> >>
> >>    And it would be useful to help that along and not have to do:
> >>
> >>        git commit --amend -v --trailer "Reviewed-by=$(git config user.name) <$(git config user.email)>"
> >>
> >>    Or worse yet, manually typo your name/e-mail address, as I'm sure I
> >>    and many others will inevitably do when using this option...
> >>

Well, here is what I think:

Now we can run:

$ git -c trailer.signoff.key="Signed-off-by" commit \
 --trailer "signoff = committer <email>"

to get the trailer "Signed-off-by: committer <email>". This means we
can't just do simple string matching in `cmd_commit` to replace a
value-less `--trailer="Signed-off-by"` or `--trailer="Reviewed-by"`
with the user's own identity. But I think we can provide a new option
to `commit` which mandates that trailers with no value are filled in
with the identity of the user.

e.g.

$ git -c trailer.signoff.key="Signed-off-by" commit --trailer "signoff" \
 --trailer "Helped-by" --trailer "Helped-by = C <E>" --own_ident

would produce output like this:

Signed-off-by: $(git config user.name) <$(git config user.email)>
Signed-off-by: $(git config user.name) <$(git config user.email)>
Helped-by: $(git config user.name) <$(git config user.email)>
Helped-by: C <E>

I don't know if this idea is good; I will try implementing it first.

Thanks.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v8 1/2] [GSOC] commit: add --trailer option
  2021-03-17  2:01  5%       ` ZheNing Hu
@ 2021-03-17  8:08  0%         ` Ævar Arnfjörð Bjarmason
  2021-03-17 13:54  0%           ` ZheNing Hu
  0 siblings, 1 reply; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2021-03-17  8:08 UTC (permalink / raw)
  To: ZheNing Hu
  Cc: ZheNing Hu via GitGitGadget, Git List, Bradley M. Kuhn,
	Junio C Hamano, Brandon Casey, Shourya Shukla, Christian Couder,
	Rafael Silva


On Wed, Mar 17 2021, ZheNing Hu wrote:

> On Tue, Mar 16, 2021 at 8:52 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> > +             if (run_command(&run_trailer))
>> > +                     strvec_clear(&run_trailer.args);
>>
>> This is git-commit, shouldn't we die() here instead of ignoring errors
>> in sub-processes?
>
> After thinking about it carefully, your opinion is more
> reasonable, because if the user uses the wrong `--trailer`
> and does not get the information he needs, I think he will
> have to use `--amend` to modify, and `die()` can exit
> this commit directly.

Yeah, we don't want to silently lose data.

>>
>> > +             strvec_clear(&trailer_args);
>> > +     }
>> > +
>> >       /*
>> >        * Reject an attempt to record a non-merge empty commit without
>> >        * explicit --allow-empty. In the cherry-pick case, it may be
>> > @@ -1507,6 +1529,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
>> >               OPT_STRING(0, "fixup", &fixup_message, N_("commit"), N_("use autosquash formatted message to fixup specified commit")),
>> >               OPT_STRING(0, "squash", &squash_message, N_("commit"), N_("use autosquash formatted message to squash specified commit")),
>> >               OPT_BOOL(0, "reset-author", &renew_authorship, N_("the commit is authored by me now (used with -C/-c/--amend)")),
>> > +             OPT_CALLBACK_F(0, "trailer", NULL, N_("trailer"), N_("trailer(s) to add"), PARSE_OPT_NONEG, opt_pass_trailer),
>> >               OPT_BOOL('s', "signoff", &signoff, N_("add a Signed-off-by trailer")),
>>
>> Not required for this change, but perhaps a change here to N_() (if we
>> can get it to fit) + doc update saying that we prefer
>> --trailer="Signed-Off-By: to --signoff"? More on that later.
>>
>> >               OPT_FILENAME('t', "template", &template_file, N_("use specified template file")),
>> >               OPT_BOOL('e', "edit", &edit_flag, N_("force edit of commit")),
>> > diff --git a/t/t7502-commit-porcelain.sh b/t/t7502-commit-porcelain.sh
>> > index 6396897cc818..0acf23799931 100755
>> > --- a/t/t7502-commit-porcelain.sh
>> > +++ b/t/t7502-commit-porcelain.sh
>> > @@ -154,6 +154,26 @@ test_expect_success 'sign off' '
>> >
>> >  '
>> >
>> > +test_expect_success 'trailer' '
>> > +     >file1 &&
>> > +     git add file1 &&
>> > +     git commit -s --trailer "Signed-off-by:C O Mitter1 <committer1@example.com>" \
>> > +             --trailer "Helped-by:C O Mitter2 <committer2@example.com>"  \
>> > +             --trailer "Reported-by:C O Mitter3 <committer3@example.com>" \
>> > +             --trailer "Mentored-by:C O Mitter4 <committer4@example.com>" \
>> > +             -m "hello" &&
>> > +     git cat-file commit HEAD >commit.msg &&
>> > +     sed -e "1,7d" commit.msg >actual &&
>> > +     cat >expected <<-\EOF &&
>> > +     Signed-off-by: C O Mitter <committer@example.com>
>> > +     Signed-off-by: C O Mitter1 <committer1@example.com>
>> > +     Helped-by: C O Mitter2 <committer2@example.com>
>> > +     Reported-by: C O Mitter3 <committer3@example.com>
>> > +     Mentored-by: C O Mitter4 <committer4@example.com>
>> > +     EOF
>> > +     test_cmp expected actual
>> > +'
>> > +
>>
>> How does this interact with cases where the user has configured
>> "trailer.separators" to have a value that doesn't contain ":"?  I
>> haven't tested, but my reading of git-interpret-trailers(1) is that if
>> you supplied "=" instead that case would just work:
>>
>>     By default only : is recognized as a trailer separator, except that
>>     = is always accepted on the command line for compatibility with
>>     other git commands.
>>
> But interpret_trailers interface allow us use "=" instead of other separators.
>
> I did a simple test and modified the configuration "trailer.separators"
> and it still works. Now things are good here:
>
> $ git -c trailer.separators="@" commit --trailer="Signed-off-by=C O <email>"
>
> or
>
> $ git -c trailer.separators="@" commit --trailer="Signed-off-by@C O <email>"
>
> Both can work normally,
>
> --trailer="Signed-off-by@ C O <email>"
>
> will output in the commit message.
>
>> I don't know if that does the right thing in the presence of
>> --if-exists=add.
>>
>
> Yesterday, Christian Couder and I had already discussed this issue:
> Your idea is correct, I should not add "--if-exists = add",  this will destroy
> the user's rights to configure by using `git -c trailer.if-exist`.
>
>> So it would be good to update these tests so you test:
>>
>>  * For the --if-exists=add case at all, there's no tests for it
>>    now. I.e. add some trailers manually to the commit (via -F or
>>    whatever) and then see if they get added to, replacet etc.
>>
>>  * Ditto but for the user having configured trailer.separators (see the
>>    test_config helper for how to set config in a test). I.e. if it's "="
>>    does adding trailers work, how about if it's "=" on the CLI but the
>>    config/commit message has ";" instead of ":" or something?
>>
>
> As mentioned above, it works normally.
>
>>  * Hrm, actually I think tweaking "-c trailer.ifexists" won't work at
>>    all, since the CLI switch would override it. I honestly don't know,
>>    but why not not supply it and keep the addIfDifferentNeighbor
>>    default?
>>
>>    If it's essential that seems like a good test / documentation
>>    addition...
>>
>>  * For the above -c ... case I can't think of a good way to deal with it
>>    that doesn't involve pulling in git_trailer_config() into
>>    git_commit_config(), but perhaps the least nasty way is to just set a
>>    flag in git_commit_config() if we see a "trailer.ifexists" flag, and
>>    if so don't provide "--if-exists=add", if there's no config (this
>>    will include "git -c ... commit" we set provide "--if-exists=add" )
>>    or as noted above, maybe we can skip the whole thing and use the
>>    addIfDifferentNeighbor default.
>>
>
> Has been restored to the default settings.

To clarify: What I really mean is for all these things you've tested:
let's add those to the tests as part of the patch.

>> And, not needed for this patch but worth thinking about:
>>
>>  * We pass through --trailer to git-interpret-trailers, what should we
>>    do about the other options? Should git-commit eventually support
>>    --trailer-where and pass it along as --where to
>>    git-interpret-trailers, or is "git -c trailer.where=... commit" good
>>    enough?
>>
> Logically speaking, `interpret_trailers` should be dedicated to `commit`
> or other sub-commands that require trailers.
>
> But I think that in the later stage, the parse_options of the `cmd_commit`
> can keep the unrecognized options, and then these choices can be directly
> passed to the `interpret_trailers` backend.

We have this interaction with e.g. range-diff and "log", it's often
surprising. You add an option to one command and it appears in the
other.

>>  * It would be good to test for and document if that "-c trailer.*"
>>    trick works (no reason it shouldn't). I.e. to add something like this
>>    after what you have (along with tests, and check if it's even true):
>>
>
> I haven't tested them for the time being, but I will do it.
>
>>        Only the `--trailer` argument to
>>        linkgit:git-interpret-trailers[1] is supported. Other
>>        pass-through switches may be added in the future, but currently
>>        you'll need to pass arguments to
>>        linkgit:git-interpret-trailers[1] along as config, e.g. `git -c
>>        trailer.where=start commit [...] --trailer=[...]`.
>>
>
> I think this is worth writing in the documentation.
>
>>  * We have a longer-term goal of having the .mailmap apply to trailers,
>>    it would be nice if git-interpret-trailers had some fuzzy-matching to
>>    check if the RHS of a trailer is a name/E-Mail pair, and if so did
>>    stricter validation on it with the ident functions we use for fsck
>>    etc. (that's copied & subtly different in several different places in
>>    the codebase, unfortunately[1]).
>>
>
> I may not know much about fuzzy-matching, which may be worth studying later.
>
>> More thoughts:
>>
>>  * Having written all the above I checked how --signoff is implemented.
>>
>>    It seems to me to be a good idea to (at least for testing) convert
>>    the --signoff trailer to your implementation. We have plenty of tests
>>    for it, does migrating it over pass or fail those?
>>
> I don’t know how to migrating yet, it may take a long time.
> Even I think I can leave it as #leftoverbit later.

Sure, I mean (having looked at it) that at least for your own local
testing it would make sense to change it (even if just search-replacing
the --signoff in the test suite) to see if it behaves as you
expect. I.e. does the --trailer behavior mirror --signoff?

>>  * I also agree with Junio that we shouldn't have a --fixed-by or
>>    whatever and wouldn't add --signoff today, but it seems very useful
>>    to me to have a shortcut like:
>>
>>        --trailer "Signed-off-by"
>>
>>    I.e. omitting the value, or:
>>
>>       --trailer "Signed-off-by="
>>
>>    Or some other thing we deem sufficiently useful/sane
>>    syntax/unambiguous.n
>>
>>    Then the value would be provided by fmt_name(WANT_COMMITTER_IDENT)
>>    just as we do in append_signoff() now. I think a *very common* case
>>    for this would be something like:
>>
>>        git commit --amend -v --trailer "Reviewed-by"
>>
>>    And it would be useful to help that along and not have to do:
>>
>>        git commit --amend -v --trailer "Reviewed-by=$(git config user.name) <$(git config user.email)>"
>>
>>    Or worse yet, manually typo your name/e-mail address, as I'm sure I
>>    and many others will inevitably do when using this option...
>>
> I think this idea is very good and easy to implement.
> We only need to do a simple string match when we get the "trailer" string,
> If it can be completed, it can indeed bring great convenience to users.
>
>> 1. https://lore.kernel.org/git/87bld8ov9q.fsf@evledraar.gmail.com/
>
> Thanks, Ævar Arnfjörð Bjarmason!

And thanks for working on this.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v8 1/2] [GSOC] commit: add --trailer option
  @ 2021-03-17  2:01  5%       ` ZheNing Hu
  2021-03-17  8:08  0%         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 200+ results
From: ZheNing Hu @ 2021-03-17  2:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: ZheNing Hu via GitGitGadget, Git List, Bradley M. Kuhn,
	Junio C Hamano, Brandon Casey, Shourya Shukla, Christian Couder,
	Rafael Silva

On Tue, Mar 16, 2021 at 8:52 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> > +             if (run_command(&run_trailer))
> > +                     strvec_clear(&run_trailer.args);
>
> This is git-commit, shouldn't we die() here instead of ignoring errors
> in sub-processes?

After thinking about it carefully, I agree your suggestion is more
reasonable: if the user passes a bad `--trailer` and does not get
the trailers they need, they would have to use `--amend` to fix
the commit, whereas `die()` aborts the commit right away.

>
> > +             strvec_clear(&trailer_args);
> > +     }
> > +
> >       /*
> >        * Reject an attempt to record a non-merge empty commit without
> >        * explicit --allow-empty. In the cherry-pick case, it may be
> > @@ -1507,6 +1529,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
> >               OPT_STRING(0, "fixup", &fixup_message, N_("commit"), N_("use autosquash formatted message to fixup specified commit")),
> >               OPT_STRING(0, "squash", &squash_message, N_("commit"), N_("use autosquash formatted message to squash specified commit")),
> >               OPT_BOOL(0, "reset-author", &renew_authorship, N_("the commit is authored by me now (used with -C/-c/--amend)")),
> > +             OPT_CALLBACK_F(0, "trailer", NULL, N_("trailer"), N_("trailer(s) to add"), PARSE_OPT_NONEG, opt_pass_trailer),
> >               OPT_BOOL('s', "signoff", &signoff, N_("add a Signed-off-by trailer")),
>
> Not required for this change, but perhaps a change here to N_() (if we
> can get it to fit) + doc update saying that we prefer
> --trailer="Signed-Off-By: to --signoff"? More on that later.
>
> >               OPT_FILENAME('t', "template", &template_file, N_("use specified template file")),
> >               OPT_BOOL('e', "edit", &edit_flag, N_("force edit of commit")),
> > diff --git a/t/t7502-commit-porcelain.sh b/t/t7502-commit-porcelain.sh
> > index 6396897cc818..0acf23799931 100755
> > --- a/t/t7502-commit-porcelain.sh
> > +++ b/t/t7502-commit-porcelain.sh
> > @@ -154,6 +154,26 @@ test_expect_success 'sign off' '
> >
> >  '
> >
> > +test_expect_success 'trailer' '
> > +     >file1 &&
> > +     git add file1 &&
> > +     git commit -s --trailer "Signed-off-by:C O Mitter1 <committer1@example.com>" \
> > +             --trailer "Helped-by:C O Mitter2 <committer2@example.com>"  \
> > +             --trailer "Reported-by:C O Mitter3 <committer3@example.com>" \
> > +             --trailer "Mentored-by:C O Mitter4 <committer4@example.com>" \
> > +             -m "hello" &&
> > +     git cat-file commit HEAD >commit.msg &&
> > +     sed -e "1,7d" commit.msg >actual &&
> > +     cat >expected <<-\EOF &&
> > +     Signed-off-by: C O Mitter <committer@example.com>
> > +     Signed-off-by: C O Mitter1 <committer1@example.com>
> > +     Helped-by: C O Mitter2 <committer2@example.com>
> > +     Reported-by: C O Mitter3 <committer3@example.com>
> > +     Mentored-by: C O Mitter4 <committer4@example.com>
> > +     EOF
> > +     test_cmp expected actual
> > +'
> > +
>
> How does this interact with cases where the user has configured
> "trailer.separators" to have a value that doesn't contain ":"?  I
> haven't tested, but my reading of git-interpret-trailers(1) is that if
> you supplied "=" instead that case would just work:
>
>     By default only : is recognized as a trailer separator, except that
>     = is always accepted on the command line for compatibility with
>     other git commands.
>
But the interpret_trailers interface allows us to use "=" in place of other separators.

I did a simple test and modified the configuration "trailer.separators"
and it still works. Now things are good here:

$ git -c trailer.separators="@" commit --trailer="Signed-off-by=C O <email>"

or

$ git -c trailer.separators="@" commit --trailer="Signed-off-by@C O <email>"

Both work normally; the trailer

Signed-off-by@ C O <email>

ends up in the commit message.
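
The behavior I observed is consistent with the command-line argument being split at the first character that is either "=" or one of the configured separators. The following is only a standalone sketch of that rule, not the code in trailer.c; `find_cli_separator()` is a hypothetical name, and the `separators` argument stands for the value of "trailer.separators".

```c
#include <string.h>

/*
 * Hypothetical helper: return the offset of the first character in
 * `arg` that separates the trailer token from its value, or -1 if
 * there is none. On the command line "=" is always accepted, in
 * addition to whatever `separators` contains.
 */
static long find_cli_separator(const char *arg, const char *separators)
{
	const char *p;

	for (p = arg; *p; p++) {
		if (*p == '=' || strchr(separators, *p))
			return (long)(p - arg);
	}
	return -1;
}
```

With separators set to "@", both "Signed-off-by=C O <email>" and "Signed-off-by@C O <email>" split after the 13-character token, while an argument using ":" would not split at all under this rule.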

> I don't know if that does the right thing in the presence of
> --if-exists=add.
>

Christian Couder and I already discussed this issue yesterday.
You are right: I should not add "--if-exists=add", since that would take
away the user's ability to configure the behavior with `git -c trailer.ifexists`.

> So it would be good to update these tests so you test:
>
>  * For the --if-exists=add case at all, there's no tests for it
>    now. I.e. add some trailers manually to the commit (via -F or
>    whatever) and then see if they get added to, replacet etc.
>
>  * Ditto but for the user having configured trailer.separators (see the
>    test_config helper for how to set config in a test). I.e. if it's "="
>    does adding trailers work, how about if it's "=" on the CLI but the
>    config/commit message has ";" instead of ":" or something?
>

As mentioned above, it works normally.

>  * Hrm, actually I think tweaking "-c trailer.ifexists" won't work at
>    all, since the CLI switch would override it. I honestly don't know,
>    but why not not supply it and keep the addIfDifferentNeighbor
>    default?
>
>    If it's essential that seems like a good test / documentation
>    addition...
>
>  * For the above -c ... case I can't think of a good way to deal with it
>    that doesn't involve pulling in git_trailer_config() into
>    git_commit_config(), but perhaps the least nasty way is to just set a
>    flag in git_commit_config() if we see a "trailer.ifexists" flag, and
>    if so don't provide "--if-exists=add", if there's no config (this
>    will include "git -c ... commit" we set provide "--if-exists=add" )
>    or as noted above, maybe we can skip the whole thing and use the
>    addIfDifferentNeighbor default.
>

It has been restored to the default setting.

> And, not needed for this patch but worth thinking about:
>
>  * We pass through --trailer to git-interpret-trailers, what should we
>    do about the other options? Should git-commit eventually support
>    --trailer-where and pass it along as --where to
>    git-interpret-trailers, or is "git -c trailer.where=... commit" good
>    enough?
>
Logically speaking, `interpret_trailers` should be dedicated to `commit`
or other sub-commands that require trailers.

But I think that at a later stage, the parse_options of `cmd_commit`
can keep the unrecognized options, which can then be passed directly
to the `interpret_trailers` backend.

>  * It would be good to test for and document if that "-c trailer.*"
>    trick works (no reason it shouldn't). I.e. to add something like this
>    after what you have (along with tests, and check if it's even true):
>

I haven't tested them yet, but I will.

>        Only the `--trailer` argument to
>        linkgit:git-interpret-trailers[1] is supported. Other
>        pass-through switches may be added in the future, but currently
>        you'll need to pass arguments to
>        linkgit:git-interpret-trailers[1] along as config, e.g. `git -c
>        trailer.where=start commit [...] --trailer=[...]`.
>

I think this is worth writing in the documentation.

>  * We have a longer-term goal of having the .mailmap apply to trailers,
>    it would be nice if git-interpret-trailers had some fuzzy-matching to
>    check if the RHS of a trailer is a name/E-Mail pair, and if so did
>    stricter validation on it with the ident functions we use for fsck
>    etc. (that's copied & subtly different in several different places in
>    the codebase, unfortunately[1]).
>

I don't know much about fuzzy-matching; it may be worth studying later.

> More thoughts:
>
>  * Having written all the above I checked how --signoff is implemented.
>
>    It seems to me to be a good idea to (at least for testing) convert
>    the --signoff trailer to your implementation. We have plenty of tests
>    for it, does migrating it over pass or fail those?
>
I don't know how to migrate it yet; it may take a long time.
I think I can leave it as a #leftoverbits item for later.

>  * I also agree with Junio that we shouldn't have a --fixed-by or
>    whatever and wouldn't add --signoff today, but it seems very useful
>    to me to have a shortcut like:
>
>        --trailer "Signed-off-by"
>
>    I.e. omitting the value, or:
>
>       --trailer "Signed-off-by="
>
>    Or some other thing we deem sufficiently useful/sane
>    syntax/unambiguous.
>
>    Then the value would be provided by fmt_name(WANT_COMMITTER_IDENT)
>    just as we do in append_signoff() now. I think a *very common* case
>    for this would be something like:
>
>        git commit --amend -v --trailer "Reviewed-by"
>
>    And it would be useful to help that along and not have to do:
>
>        git commit --amend -v --trailer "Reviewed-by=$(git config user.name) <$(git config user.email)>"
>
>    Or worse yet, manually typo your name/e-mail address, as I'm sure I
>    and many others will inevitably do when using this option...
>
I think this idea is very good and easy to implement.
We only need to do a simple string match when we get the "trailer" string.
If it can be completed, it would indeed be a great convenience for users.
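
A minimal sketch of such a string match, assuming a hypothetical helper
`expand_trailer_arg()` (not an existing Git function) that receives the
committer name and e-mail from the caller. It treats both `=` and `:` as
key/value separators, which is an assumption about the accepted syntax.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: if `arg` is a bare trailer key ("Reviewed-by")
 * or a key with an empty value ("Reviewed-by="), write
 * "Key=Name <email>" into `out`; otherwise copy `arg` unchanged.
 * Returns 0 on success, -1 if `out` is too small.
 */
static int expand_trailer_arg(const char *arg, const char *name,
			      const char *email, char *out, size_t outlen)
{
	const char *sep = strpbrk(arg, "=:");
	int n;

	if (!sep)			/* "Reviewed-by" */
		n = snprintf(out, outlen, "%s=%s <%s>", arg, name, email);
	else if (sep[1] == '\0')	/* "Reviewed-by=" */
		n = snprintf(out, outlen, "%.*s=%s <%s>",
			     (int)(sep - arg), arg, name, email);
	else				/* a value was already given */
		n = snprintf(out, outlen, "%s", arg);

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

In a real implementation the name and e-mail would presumably come from
the same ident machinery that --signoff uses, rather than being passed
in as plain strings.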

> 1. https://lore.kernel.org/git/87bld8ov9q.fsf@evledraar.gmail.com/

Thanks, Ævar Arnfjörð Bjarmason!

--
ZheNing Hu

^ permalink raw reply	[relevance 5%]

* Re: Regarding GSoC Project 2021
  @ 2021-03-14  1:26  6% ` ZheNing Hu
  0 siblings, 0 replies; 200+ results
From: ZheNing Hu @ 2021-03-14  1:26 UTC (permalink / raw)
  To: Shubham Verma; +Cc: Git List, Christian Couder, Hariom verma

Hi, Shubham Verma,

On Sun, Mar 14, 2021 at 2:41 AM Shubham Verma <shubhunic@gmail.com> wrote:
>
> Hello Everyone,
>
> I am interested in working on the project "Use ref-filter formats in
> git cat-file" this summer as a GSoC student.
>
> But I saw ZheNing Hu already started contributing to this project. So,
> it is possible that more than one student can work on the same project
> or I have to choose another project?
>
> Otherwise, I will start working on the project "Finish convert git
> submodule script to builtin".
>
> Thank You! & Regards,
> Shubham

I haven't started sending patches for `git cat-file` yet.
For now I'm still focused on solving some GGG issues and on
adding small options to subcommands that might be fun or useful,
so you can still choose `git cat-file` to apply for.
But I recommend that you pick and complete a micro-project
before applying for "git cat-file" or "git submodule"; these are usually listed at
https://github.com/gitgitgadget/git/labels/leftoverbits or
https://github.com/gitgitgadget/git/labels/good%20first%20issue

Thanks.
--
ZheNing Hu

^ permalink raw reply	[relevance 6%]

* Re: [PATCH 10/11] merge-ort: write $GIT_DIR/AUTO_MERGE whenever we hit a conflict
  @ 2021-03-08 21:51  6%     ` Elijah Newren
  0 siblings, 0 replies; 200+ results
From: Elijah Newren @ 2021-03-08 21:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Jonathan Nieder,
	Derrick Stolee, Junio C Hamano

On Mon, Mar 8, 2021 at 5:11 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
>
> On Fri, Mar 05 2021, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > There are a variety of questions users might ask while resolving
> > conflicts:
> >   * What changes have been made since the previous (first) parent?
> >   * What changes are staged?
> >   * What is still unstaged? (or what is still conflicted?)
> >   * What changes did I make to resolve conflicts so far?
> > The first three of these have simple answers:
> >   * git diff HEAD
> >   * git diff --cached
> >   * git diff
> > There was no way to answer the final question previously.  Adding one
> > is trivial in merge-ort, since it works by creating a tree representing
> > what should be written to the working copy complete with conflict
> > markers.  Simply write that tree to .git/AUTO_MERGE, allowing users to
> > answer the fourth question with
> >   * git diff AUTO_MERGE
> >
> > I avoided using a name like "MERGE_AUTO", because that would be
> > merge-specific (much like MERGE_HEAD, REBASE_HEAD, REVERT_HEAD,
> > CHERRY_PICK_HEAD) and I wanted a name that didn't change depending on
> > which type of operation the merge was part of.
>
> That's a really cool feature. I'm starting to like this "ort" thing :)
>
> (After knowing almost nothing about it until a few days ago...)
>
> > Ensure that paths which clean out other temporary operation-specific
> > files (e.g. CHERRY_PICK_HEAD, MERGE_MSG, rebase-merge/ state directory)
> > also clean out this AUTO_MERGE file.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  branch.c         |  1 +
> >  builtin/rebase.c |  1 +
> >  merge-ort.c      | 10 ++++++++++
> >  path.c           |  1 +
> >  path.h           |  2 ++
> >  sequencer.c      |  5 +++++
> >  6 files changed, 20 insertions(+)
> >
> > diff --git a/branch.c b/branch.c
> > index 9c9dae1eae32..b71a2de29dbe 100644
> > --- a/branch.c
> > +++ b/branch.c
> > @@ -344,6 +344,7 @@ void remove_merge_branch_state(struct repository *r)
> >       unlink(git_path_merge_rr(r));
> >       unlink(git_path_merge_msg(r));
> >       unlink(git_path_merge_mode(r));
> > +     unlink(git_path_auto_merge(r));
> >       save_autostash(git_path_merge_autostash(r));
> >  }
> >
> > diff --git a/builtin/rebase.c b/builtin/rebase.c
> > index de400f9a1973..7d9afe118fd4 100644
> > --- a/builtin/rebase.c
> > +++ b/builtin/rebase.c
> > @@ -739,6 +739,7 @@ static int finish_rebase(struct rebase_options *opts)
> >       int ret = 0;
> >
> >       delete_ref(NULL, "REBASE_HEAD", NULL, REF_NO_DEREF);
> > +     unlink(git_path_auto_merge(the_repository));
> >       apply_autostash(state_dir_path("autostash", opts));
> >       close_object_store(the_repository->objects);
> >       /*
> > diff --git a/merge-ort.c b/merge-ort.c
> > index 37b69cbe0f9a..cf927cd160e1 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -3362,6 +3362,9 @@ void merge_switch_to_result(struct merge_options *opt,
> >  {
> >       assert(opt->priv == NULL);
> >       if (result->clean >= 0 && update_worktree_and_index) {
> > +             const char *filename;
> > +             FILE *fp;
> > +
> >               trace2_region_enter("merge", "checkout", opt->repo);
> >               if (checkout(opt, head, result->tree)) {
> >                       /* failure to function */
> > @@ -3380,6 +3383,13 @@ void merge_switch_to_result(struct merge_options *opt,
> >               }
> >               opt->priv = NULL;
> >               trace2_region_leave("merge", "record_conflicted", opt->repo);
> > +
> > +             trace2_region_enter("merge", "write_auto_merge", opt->repo);
> > +             filename = git_path_auto_merge(opt->repo);
> > +             fp = xfopen(filename, "w");
> > +             fprintf(fp, "%s\n", oid_to_hex(&result->tree->object.oid));
> > +             fclose(fp);
> > +             trace2_region_leave("merge", "write_auto_merge", opt->repo);
>
> > This isn't a new problem since you're just following an existing pattern,
> but here you (rightly) do xopen()< and the:n

Looks like your comment got garbled/truncated.  Do you remember the
rest of what you were going to say here?

> >       }
> >
> >       if (display_update_msgs) {
> > diff --git a/path.c b/path.c
> > index 7b385e5eb282..9e883eb52446 100644
> > --- a/path.c
> > +++ b/path.c
> > @@ -1534,5 +1534,6 @@ REPO_GIT_PATH_FUNC(merge_rr, "MERGE_RR")
> >  REPO_GIT_PATH_FUNC(merge_mode, "MERGE_MODE")
> >  REPO_GIT_PATH_FUNC(merge_head, "MERGE_HEAD")
> >  REPO_GIT_PATH_FUNC(merge_autostash, "MERGE_AUTOSTASH")
> > +REPO_GIT_PATH_FUNC(auto_merge, "AUTO_MERGE")
> >  REPO_GIT_PATH_FUNC(fetch_head, "FETCH_HEAD")
> >  REPO_GIT_PATH_FUNC(shallow, "shallow")
> > diff --git a/path.h b/path.h
> > index e7e77da6aaa5..251c78d98000 100644
> > --- a/path.h
> > +++ b/path.h
> > @@ -176,6 +176,7 @@ struct path_cache {
> >       const char *merge_mode;
> >       const char *merge_head;
> >       const char *merge_autostash;
> > +     const char *auto_merge;
> >       const char *fetch_head;
> >       const char *shallow;
> >  };
> > @@ -191,6 +192,7 @@ const char *git_path_merge_rr(struct repository *r);
> >  const char *git_path_merge_mode(struct repository *r);
> >  const char *git_path_merge_head(struct repository *r);
> >  const char *git_path_merge_autostash(struct repository *r);
> > +const char *git_path_auto_merge(struct repository *r);
> >  const char *git_path_fetch_head(struct repository *r);
> >  const char *git_path_shallow(struct repository *r);
> >
> > diff --git a/sequencer.c b/sequencer.c
> > index d2332d3e1787..472cdd8c620d 100644
> > --- a/sequencer.c
> > +++ b/sequencer.c
> > @@ -2096,6 +2096,7 @@ static int do_pick_commit(struct repository *r,
> >               refs_delete_ref(get_main_ref_store(r), "", "CHERRY_PICK_HEAD",
> >                               NULL, 0);
> >               unlink(git_path_merge_msg(r));
> > +             unlink(git_path_auto_merge(r));
>
> Shouldn't this & the rest ideally be at least unlink_or_warn()?

Perhaps, but I think that should be a follow-on series or
#leftoverbits.  I'm having enough trouble getting reviews (I think I'm
burning Stolee out after the last half year) without making my series
longer for tangential cleanups.  :-)
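
For readers who have not seen the pattern, the idea behind an
unlink-or-warn wrapper can be sketched as below. This is a standalone
illustration written for this note, not Git's own implementation; the
name `unlink_or_warn_sketch` and the message wording are assumptions.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Standalone sketch: remove `path`, treating "already gone" as
 * success and warning on any other failure.  Returns 0 on success
 * (including ENOENT) and -1 otherwise, with errno preserved.
 */
static int unlink_or_warn_sketch(const char *path)
{
	int saved_errno;

	if (!unlink(path) || errno == ENOENT)
		return 0;

	saved_errno = errno;
	fprintf(stderr, "warning: unable to unlink '%s': %s\n",
		path, strerror(saved_errno));
	errno = saved_errno;	/* fprintf() may have clobbered it */
	return -1;
}
```

The point of the wrapper is that a missing state file stays silent,
while a permissions problem or similar failure is at least reported.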

I'm starting to worry that despite having all the remaining patches
ready (and most have been ready for months), that we won't be able to
get merge-ort done before git-2.32 -- the -rc0 for it is now just
under three months away.

> >               fprintf(stderr,
> >                       _("dropping %s %s -- patch contents already upstream\n"),
> >                       oid_to_hex(&commit->object.oid), msg.subject);
> > @@ -2451,6 +2452,8 @@ void sequencer_post_commit_cleanup(struct repository *r, int verbose)
> >               need_cleanup = 1;
> >       }
> >
> > +     unlink(git_path_auto_merge(r));
> > +
> >       if (!need_cleanup)
> >               return;
> >
> > @@ -4111,6 +4114,7 @@ static int pick_commits(struct repository *r,
> >                       unlink(rebase_path_stopped_sha());
> >                       unlink(rebase_path_amend());
> >                       unlink(git_path_merge_head(r));
> > +                     unlink(git_path_auto_merge(r));
> >                       delete_ref(NULL, "REBASE_HEAD", NULL, REF_NO_DEREF);
> >
> >                       if (item->command == TODO_BREAK) {
> > @@ -4505,6 +4509,7 @@ static int commit_staged_changes(struct repository *r,
> >               return error(_("could not commit staged changes."));
> >       unlink(rebase_path_amend());
> >       unlink(git_path_merge_head(r));
> > +     unlink(git_path_auto_merge(r));
> >       if (final_fixup) {
> >               unlink(rebase_path_fixup_msg());
> >               unlink(rebase_path_squash_msg());

^ permalink raw reply	[relevance 6%]

* Re: [PATCH 08/11] merge-ort: implement CE_SKIP_WORKTREE handling with conflicted entries
  @ 2021-03-08 20:54  6%     ` Elijah Newren
  0 siblings, 0 replies; 200+ results
From: Elijah Newren @ 2021-03-08 20:54 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Jonathan Nieder,
	Derrick Stolee, Junio C Hamano

On Mon, Mar 8, 2021 at 5:06 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
>
> On Fri, Mar 05 2021, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > When merge conflicts occur in paths removed by a sparse-checkout, we
> > need to unsparsify those paths (clear the SKIP_WORKTREE bit), and write
> > out the conflicted file to the working copy.  In the very unlikely case
> > that someone manually put a file into the working copy at the location
> > of the SKIP_WORKTREE file, we need to avoid overwriting whatever edits
> > they have made and move that file to a different location first.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  merge-ort.c                       | 43 +++++++++++++++++++++----------
> >  t/t6428-merge-conflicts-sparse.sh |  4 +--
> >  2 files changed, 32 insertions(+), 15 deletions(-)
> >
> > diff --git a/merge-ort.c b/merge-ort.c
> > index a998f843a1da..37b69cbe0f9a 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -3235,23 +3235,27 @@ static int checkout(struct merge_options *opt,
> >       return ret;
> >  }
> >
> > -static int record_conflicted_index_entries(struct merge_options *opt,
> > -                                        struct index_state *index,
> > -                                        struct strmap *paths,
> > -                                        struct strmap *conflicted)
> > +static int record_conflicted_index_entries(struct merge_options *opt)
> >  {
> >       struct hashmap_iter iter;
> >       struct strmap_entry *e;
> > +     struct index_state *index = opt->repo->index;
> > +     struct checkout state = CHECKOUT_INIT;
> >       int errs = 0;
> >       int original_cache_nr;
> >
> > -     if (strmap_empty(conflicted))
> > +     if (strmap_empty(&opt->priv->conflicted))
> >               return 0;
> >
> > +     /* If any entries have skip_worktree set, we'll have to check 'em out */
> > +     state.force = 1;
> > +     state.quiet = 1;
> > +     state.refresh_cache = 1;
> > +     state.istate = index;
> >       original_cache_nr = index->cache_nr;
> >
> >       /* Put every entry from paths into plist, then sort */
> > -     strmap_for_each_entry(conflicted, &iter, e) {
> > +     strmap_for_each_entry(&opt->priv->conflicted, &iter, e) {
> >               const char *path = e->key;
> >               struct conflict_info *ci = e->value;
> >               int pos;
> > @@ -3292,9 +3296,23 @@ static int record_conflicted_index_entries(struct merge_options *opt,
> >                        * the higher order stages.  Thus, we need override
> >                        * the CE_SKIP_WORKTREE bit and manually write those
> >                        * files to the working disk here.
> > -                      *
> > -                      * TODO: Implement this CE_SKIP_WORKTREE fixup.
> >                        */
> > +                     if (ce_skip_worktree(ce)) {
> > +                             struct stat st;
> > +
> > +                             if (!lstat(path, &st)) {
> > +                                     char *new_name = unique_path(&opt->priv->paths,
> > +                                                                  path,
> > +                                                                  "cruft");
> > +
> > +                                     path_msg(opt, path, 1,
> > +                                              _("Note: %s not up to date and in way of checking out conflicted version; old copy renamed to %s"),
> > +                                              path, new_name);
>
> I see this follows existing uses in merge-ort.c, but I wonder if this
> won't be quite unreadable on long paths, i.e.:
>
>     <long x> renamed to <long x.new>
>
> As opposed to:
>
>     We had to rename your thing:
>         from: <long x>
>           to: <long x.new>

Makes sense, but it seems like something we'd want to do to a lot of
messages rather than just this one.  Given that I expect this
particular message to be *very* rare, I think I'll leave it as-is for
now; we can address this idea in a subsequent series or as
#leftoverbits.
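
To make the suggested layout concrete, a multi-line variant of such a
message could be produced along these lines. The function name
`format_rename_note` and the exact wording are hypothetical; merge-ort's
real messages go through path_msg() and translation, which this sketch
leaves out.

```c
#include <stdio.h>

/*
 * Hypothetical formatter for a multi-line rename note, putting each
 * (possibly long) path on its own line.
 * Returns 0 on success, -1 if `out` is too small.
 */
static int format_rename_note(char *out, size_t outlen,
			      const char *old_path, const char *new_path)
{
	int n = snprintf(out, outlen,
			 "Note: old copy was in the way and had to be renamed:\n"
			 "    from: %s\n"
			 "      to: %s\n",
			 old_path, new_path);

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```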

^ permalink raw reply	[relevance 6%]

* Re: [PATCH 05/11] merge-ort: let renormalization change modify/delete into clean delete
  @ 2021-03-08 12:55  6%   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2021-03-08 12:55 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Jonathan Nieder, Derrick Stolee, Junio C Hamano,
	Elijah Newren


On Fri, Mar 05 2021, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> When we have a modify/delete conflict, but the only change to the
> modification is e.g. change of line endings, then if renormalization is
> requested then we should be able to recognize such a case as a
> not-modified/delete and resolve the conflict automatically.
>
> This fixes t6418.10 under GIT_TEST_MERGE_ALGORITHM=ort.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  merge-ort.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 61 insertions(+), 2 deletions(-)
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 87c553c0882c..c4bd88b9d3db 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -2416,6 +2416,60 @@ static int string_list_df_name_compare(const char *one, const char *two)
>  	return onelen - twolen;
>  }
>  
> +static int read_oid_strbuf(struct merge_options *opt,
> +			   const struct object_id *oid,
> +			   struct strbuf *dst)
> +{
> +	void *buf;
> +	enum object_type type;
> +	unsigned long size;
> +	buf = read_object_file(oid, &type, &size);
> +	if (!buf)
> +		return err(opt, _("cannot read object %s"), oid_to_hex(oid));
> +	if (type != OBJ_BLOB) {
> +		free(buf);
> +		return err(opt, _("object %s is not a blob"), oid_to_hex(oid));

As an aside I've got another series I'll submit soon which refactors all
these "object is not xyz" calls to a utility function, so in this case
we'd also say what it was other than a blob.

Fine to keep this here, just a #leftoverbits note to myself to
eventually migrate this.
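
A rough sketch of what such a utility might look like, so that the
message names the type that was actually found. Everything here
(`enum obj_kind`, `kind_name`, `format_type_mismatch`) is a hypothetical
stand-in written for illustration and is not the API of the series
mentioned above.

```c
#include <stdio.h>

/* Stand-in for Git's object types; only the four basic kinds. */
enum obj_kind { KIND_COMMIT, KIND_TREE, KIND_BLOB, KIND_TAG };

static const char *kind_name(enum obj_kind kind)
{
	static const char *names[] = { "commit", "tree", "blob", "tag" };

	return names[kind];
}

/*
 * Hypothetical helper: describe a type mismatch, naming both the
 * type that was found and the type that was expected.
 * Returns 0 on success, -1 if `out` is too small.
 */
static int format_type_mismatch(char *out, size_t outlen, const char *hex,
				enum obj_kind actual, enum obj_kind expected)
{
	int n = snprintf(out, outlen, "object %s is a %s, not a %s",
			 hex, kind_name(actual), kind_name(expected));

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```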

> +	}
> +	strbuf_attach(dst, buf, size, size + 1);
> +	return 0;
> +}
> +
> +static int blob_unchanged(struct merge_options *opt,
> +			  const struct version_info *base,
> +			  const struct version_info *side,
> +			  const char *path)
> +{
> +	struct strbuf basebuf = STRBUF_INIT;
> +	struct strbuf sidebuf = STRBUF_INIT;
> +	int ret = 0; /* assume changed for safety */
> +	const struct index_state *idx = &opt->priv->attr_index;
> +
> +	initialize_attr_index(opt);
> +
> +	if (base->mode != side->mode)
> +		return 0;
> +	if (oideq(&base->oid, &side->oid))
> +		return 1;
> +
> +	if (read_oid_strbuf(opt, &base->oid, &basebuf) ||
> +	    read_oid_strbuf(opt, &side->oid, &sidebuf))
> +		goto error_return;
> +	/*
> +	 * Note: binary | is used so that both renormalizations are
> +	 * performed.  Comparison can be skipped if both files are
> +	 * unchanged since their sha1s have already been compared.
> +	 */
> +	if (renormalize_buffer(idx, path, basebuf.buf, basebuf.len, &basebuf) |
> +	    renormalize_buffer(idx, path, sidebuf.buf, sidebuf.len, &sidebuf))
> +		ret = (basebuf.len == sidebuf.len &&
> +		       !memcmp(basebuf.buf, sidebuf.buf, basebuf.len));
> +
> +error_return:
> +	strbuf_release(&basebuf);
> +	strbuf_release(&sidebuf);
> +	return ret;
> +}
> +
>
>  struct directory_versions {
>  	/*
>  	 * versions: list of (basename -> version_info)
> @@ -3003,8 +3057,13 @@ static void process_entry(struct merge_options *opt,
>  		modify_branch = (side == 1) ? opt->branch1 : opt->branch2;
>  		delete_branch = (side == 1) ? opt->branch2 : opt->branch1;
>  
> -		if (ci->path_conflict &&
> -		    oideq(&ci->stages[0].oid, &ci->stages[side].oid)) {
> +		if (opt->renormalize &&
> +		    blob_unchanged(opt, &ci->stages[0], &ci->stages[side],
> +				   path)) {
> +			ci->merged.is_null = 1;
> +			ci->merged.clean = 1;
> +		} else if (ci->path_conflict &&
> +			   oideq(&ci->stages[0].oid, &ci->stages[side].oid)) {

Small note (no need for re-roll or whatever) on having read a bit of
merge-ort.c code recently: I'd find this thing a bit easier on the eyes
if ci->stages[0] and ci->stages[side] were split into a variable before
the if/else, i.e. used as "side_0.oid and side_n.oid" and "side_0 and
side_n" in this case.

That would also avoid the wrapping of at least one argument list here.

^ permalink raw reply	[relevance 6%]

* Re: [PATCH 1/2] remote: add camel-cased *.tagOpt key, like clone
  2021-02-25  1:21  5% [PATCH 1/2] remote: add camel-cased *.tagOpt key, like clone Ævar Arnfjörð Bjarmason
@ 2021-02-25  3:02  0% ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2021-02-25  3:02 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Bert Wesarg

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Change "git remote add" so that it adds a *.tagOpt key, and not the
> lower-cased *.tagopt on "git remote add --no-tags", just as "git clone
> --no-tags" would do.
>
> This doesn't matter for anything that reads the config. It's just
> prettier if we write config keys in their documented camelCase form to
> user-readable config files.
>
> When I added support for "clone --no-tags" in 0dab2468ee5 (clone: add a
> --no-tags option to clone without tags, 2017-04-26) I made it use
> the *.tagOpt form, but the older "git remote add" added in
> 111fb858654 (remote add: add a --[no-]tags option, 2010-04-20) has
> been using *.tagopt all this time.
>
> It's easy enough to add a test for this, so let's do that. We can't
> use "git config -l" there, because it'll normalize the keys to their
> lower-cased form. Let's add the test for "git clone" too for good
> measure, not just to the "git remote" codepath we're fixing.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>
> I also noticed that we write e.g. init.objectformat instead of
> init.objectFormat, and core.logallrefupdates etc. If anyone's got an
> even worse case of OCD there's an interesting #leftoverbits
> project there of scouring the code for more cases of this sort of
> thing...
>
>  builtin/remote.c         | 2 +-
>  t/t5505-remote.sh        | 1 +
>  t/t5612-clone-refspec.sh | 1 +
>  3 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/remote.c b/builtin/remote.c
> index d11a5589e49..f286ae97538 100644
> --- a/builtin/remote.c
> +++ b/builtin/remote.c
> @@ -221,7 +221,7 @@ static int add(int argc, const char **argv)
>  
>  	if (fetch_tags != TAGS_DEFAULT) {
>  		strbuf_reset(&buf);
> -		strbuf_addf(&buf, "remote.%s.tagopt", name);
> +		strbuf_addf(&buf, "remote.%s.tagOpt", name);

Good find.

A general rule for a name used to refer to a configuration variable
in the C code ought to be

 - if it is used to match what the system gave us, make sure we use
   all lowercase for the first and the last component and match with
   strcmp(), not with strcasecmp().

 - if it is used to update, make sure we use the canonical spelling,
   if only for the documentation value.
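
The first rule can be illustrated with a small standalone sketch that
lowercases the first and last components of a key while leaving the
middle (subsection) part alone, so that the result can be compared with
strcmp(). The function name `lowercase_config_key` is made up for this
example and is not Git's config API.

```c
#include <ctype.h>
#include <string.h>

/*
 * Sketch: lowercase the first and last dot-separated components of
 * a config key, leaving any middle part (the subsection) untouched.
 * Returns 0 on success, -1 if `key` has no dot or `out` is too small.
 */
static int lowercase_config_key(const char *key, char *out, size_t outlen)
{
	const char *first_dot = strchr(key, '.');
	const char *last_dot = strrchr(key, '.');
	size_t len = strlen(key);

	if (!first_dot || len + 1 > outlen)
		return -1;

	for (size_t i = 0; i < len; i++) {
		const char *p = key + i;
		int in_subsection = p > first_dot && p < last_dot;

		out[i] = in_subsection ? *p : (char)tolower((unsigned char)*p);
	}
	out[len] = '\0';
	return 0;
}
```

Under this sketch, "remote.Origin.tagOpt" becomes
"remote.Origin.tagopt": the remote name keeps its case, and the variable
name can then be matched case-sensitively against "tagopt".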

> diff --git a/t/t5505-remote.sh b/t/t5505-remote.sh
> index 045398b94e6..2a7b5cd00a0 100755
> --- a/t/t5505-remote.sh
> +++ b/t/t5505-remote.sh
> @@ -594,6 +594,7 @@ test_expect_success 'add --no-tags' '
>  		cd add-no-tags &&
>  		git init &&
>  		git remote add -f --no-tags origin ../one &&
> +		grep tagOpt .git/config &&
>  		git tag -l some-tag >../test/output &&
>  		git tag -l foobar-tag >../test/output &&
>  		git config remote.origin.tagopt >>../test/output
> diff --git a/t/t5612-clone-refspec.sh b/t/t5612-clone-refspec.sh
> index 6a6af7449ca..3126cfd7e9d 100755
> --- a/t/t5612-clone-refspec.sh
> +++ b/t/t5612-clone-refspec.sh
> @@ -97,6 +97,7 @@ test_expect_success 'by default no tags will be kept updated' '
>  test_expect_success 'clone with --no-tags' '
>  	(
>  		cd dir_all_no_tags &&
> +		grep tagOpt .git/config &&
>  		git fetch &&
>  		git for-each-ref refs/tags >../actual
>  	) &&

^ permalink raw reply	[relevance 0%]

* [PATCH 1/2] remote: add camel-cased *.tagOpt key, like clone
@ 2021-02-25  1:21  5% Ævar Arnfjörð Bjarmason
  2021-02-25  3:02  0% ` Junio C Hamano
  0 siblings, 1 reply; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2021-02-25  1:21 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Bert Wesarg,
	Ævar Arnfjörð Bjarmason

Change "git remote add" so that it adds a *.tagOpt key, and not the
lower-cased *.tagopt on "git remote add --no-tags", just as "git clone
--no-tags" would do.

This doesn't matter for anything that reads the config. It's just
prettier if we write config keys in their documented camelCase form to
user-readable config files.

When I added support for "clone --no-tags" in 0dab2468ee5 (clone: add a
--no-tags option to clone without tags, 2017-04-26) I made it use
the *.tagOpt form, but the older "git remote add" added in
111fb858654 (remote add: add a --[no-]tags option, 2010-04-20) has
been using *.tagopt all this time.

It's easy enough to add a test for this, so let's do that. We can't
use "git config -l" there, because it'll normalize the keys to their
lower-cased form. Let's add the test for "git clone" too for good
measure, not just to the "git remote" codepath we're fixing.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

I also noticed that we write e.g. init.objectformat instead of
init.objectFormat, and core.logallrefupdates etc. If anyone's got an
even worse case of OCD there's an interesting #leftoverbits
project there of scouring the code for more cases of this sort of
thing...

 builtin/remote.c         | 2 +-
 t/t5505-remote.sh        | 1 +
 t/t5612-clone-refspec.sh | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index d11a5589e49..f286ae97538 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -221,7 +221,7 @@ static int add(int argc, const char **argv)
 
 	if (fetch_tags != TAGS_DEFAULT) {
 		strbuf_reset(&buf);
-		strbuf_addf(&buf, "remote.%s.tagopt", name);
+		strbuf_addf(&buf, "remote.%s.tagOpt", name);
 		git_config_set(buf.buf,
 			       fetch_tags == TAGS_SET ? "--tags" : "--no-tags");
 	}
diff --git a/t/t5505-remote.sh b/t/t5505-remote.sh
index 045398b94e6..2a7b5cd00a0 100755
--- a/t/t5505-remote.sh
+++ b/t/t5505-remote.sh
@@ -594,6 +594,7 @@ test_expect_success 'add --no-tags' '
 		cd add-no-tags &&
 		git init &&
 		git remote add -f --no-tags origin ../one &&
+		grep tagOpt .git/config &&
 		git tag -l some-tag >../test/output &&
 		git tag -l foobar-tag >../test/output &&
 		git config remote.origin.tagopt >>../test/output
diff --git a/t/t5612-clone-refspec.sh b/t/t5612-clone-refspec.sh
index 6a6af7449ca..3126cfd7e9d 100755
--- a/t/t5612-clone-refspec.sh
+++ b/t/t5612-clone-refspec.sh
@@ -97,6 +97,7 @@ test_expect_success 'by default no tags will be kept updated' '
 test_expect_success 'clone with --no-tags' '
 	(
 		cd dir_all_no_tags &&
+		grep tagOpt .git/config &&
 		git fetch &&
 		git for-each-ref refs/tags >../actual
 	) &&
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply related	[relevance 5%]

* Re: [PATCH 3/9] t3905: move all commands into test cases
  @ 2021-02-02 21:41  6%   ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2021-02-02 21:41 UTC (permalink / raw)
  To: Denton Liu; +Cc: Git Mailing List

Denton Liu <liu.denton@gmail.com> writes:

>  test_expect_success 'stash save --include-untracked stashed the untracked files' '
> +	tracked=$(git rev-parse --short $(echo 1 | git hash-object --stdin)) &&
> +	untracked=$(git rev-parse --short $(echo untracked | git hash-object --stdin)) &&

Not a new issue introduced by this patch, but

 * these will fail if blobs that record "1\n" and "untracked\n" do
   not exist in the repository already, because the hash-object
   command lacks the "-w" option.

 * the reason why they do not fail is because there are these blobs
   already; grabbing them using extended SHA-1 expression may be
   simpler to read, e.g.

	tracked=$(git rev-parse --short HEAD:file)

 * even if it is not trivial to get to such a blob object, it
   probably is easier to read the test if a file that has the
   desired contents in it is used, not an "echo", e.g.

	untracked=$(git rev-parse --short $(git hash-object -w untracked/untracked))

We may want to clean these up someday, but it does not have to be
part of this topic (#leftoverbits).

> +	cat >expect.diff <<-EOF &&
> +	diff --git a/HEAD b/HEAD
> +	new file mode 100644
> +	index 0000000..$tracked
> +	--- /dev/null
> +	+++ b/HEAD
> +	@@ -0,0 +1 @@
> +	+1
> +	diff --git a/file2 b/file2
> +	new file mode 100644
> +	index 0000000..$tracked
> +	--- /dev/null
> +	+++ b/file2
> +	@@ -0,0 +1 @@
> +	+1
> +	diff --git a/untracked/untracked b/untracked/untracked
> +	new file mode 100644
> +	index 0000000..$untracked
> +	--- /dev/null
> +	+++ b/untracked/untracked
> +	@@ -0,0 +1 @@
> +	+untracked
> +	EOF

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files
  @ 2021-01-25 17:44  5%       ` Taylor Blau
  0 siblings, 0 replies; 200+ results
From: Taylor Blau @ 2021-01-25 17:44 UTC (permalink / raw)
  To: Jeff King; +Cc: git, dstolee, gitster, jrnieder

On Fri, Jan 22, 2021 at 05:54:18PM -0500, Jeff King wrote:
> All of which is a really verbose way of saying: you might want to add a
> few words after the comma:
>
>   In real-world usage, Git is often performing many operations in the
>   revindex (i.e., rather than asking about a single object, we'd
>   generally ask about a range of history).
>
> :) But hopefully it shows that including the offsets is not really
> making things better for the cold cache anyway.

Thanks for including a compelling argument in favor of the approach that
I took in this patch.

I added something along the lines of what you suggested to the final
paragraph, so now it concludes nicely instead of ending in a comma. I
briefly considered whether I should add something about how these
operations scale and how the warming efforts are really amortized across
all of the objects, but I decided against it.

I think that this argument is already documented here, and that there's
no way to concisely state it in an already long patch. Interested
readers will easily be able to find our discussion here, which is good.

> >  Documentation/technical/pack-format.txt |  17 ++++
> >  builtin/repack.c                        |   1 +
> >  object-store.h                          |   3 +
> >  pack-revindex.c                         | 112 +++++++++++++++++++++---
> >  pack-revindex.h                         |   7 +-
> >  packfile.c                              |  13 ++-
> >  packfile.h                              |   1 +
> >  tmp-objdir.c                            |   4 +-
> >  8 files changed, 145 insertions(+), 13 deletions(-)
>
> Oh, there's a patch here, too. :)

:-).

> It mostly looks good to me. I agree with Junio that "compute" is a
> better verb than "load" for generating the in-memory revindex.

Yeah, I settled on load_pack_revindex() either calling
"create_pack_revindex_in_memory()" or "load_pack_revindex_from_disk()".

> > +static int load_pack_revindex_from_disk(struct packed_git *p)
> > +{
> > +	char *revindex_name;
> > +	int ret;
> > +	if (open_pack_index(p))
> > +		return -1;
> > +
> > +	revindex_name = pack_revindex_filename(p);
> > +
> > +	ret = load_revindex_from_disk(revindex_name,
> > +				      p->num_objects,
> > +				      &p->revindex_map,
> > +				      &p->revindex_size);
> > +	if (ret)
> > +		goto cleanup;
> > +
> > +	p->revindex_data = (char *)p->revindex_map + 12;
>
> Junio mentioned one spot where we lose constness through a cast. This
> is another. I wonder if revindex_map should just be a "char *" to make
> pointer arithmetic easier without having to cast.
>
> But also...
>
> > +	if (p->revindex)
> > +		return p->revindex[pos].nr;
> > +	else
> > +		return get_be32((char *)p->revindex_data + (pos * sizeof(uint32_t)));
>
> If p->revindex_data were "const uint32_t *", then this line would just
> be:
>
>   return get_be32(p->revindex_data + pos);
>
> Not a huge deal either way since the whole point is to abstract this
> behind a function where it only has to be written once. I don't think
> there is any downside from the compiler's view (and we already use this
> trick for the bitmap name-hash cache).

Honestly, I'm not a huge fan of implicitly scaling pos by
sizeof(*p->revindex_data), but I can understand why it reads more
clearly here. I don't really feel strongly either way, so I'm happy to
change it in favor of your suggestion.

Of course, since RIDX_HEADER_SIZE is in bytes, not uint32_t's (and it
has to be, since it's also used in the RIDX_MIN_SIZE macro, which is
compared against the st_size of stating the .rev file), you have to do
gross stuff like:

  p->revindex_data = (const uint32_t *)((const char *)p->revindex_map + RIDX_HEADER_SIZE);

But I guess the tradeoff is worth it, since the readers are easier to
parse.
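
For what it's worth, the two spellings of the lookup can be checked
against each other in isolation. What follows is a minimal sketch, not
the code from the patch: read_be32() is a local stand-in for get_be32(),
HEADER_SIZE is the 12 bytes used in the hunk quoted above, and
lookup_bytes(), lookup_words(), and make_fake_map() are made-up names
that exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADER_SIZE 12 /* bytes; the "+ 12" from the quoted hunk */

/* Local stand-in for get_be32(): decode 4 big-endian bytes. */
static uint32_t read_be32(const void *ptr)
{
	const unsigned char *p = ptr;
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Byte-pointer form: the caller scales pos explicitly. */
static uint32_t lookup_bytes(const void *map, uint32_t pos)
{
	const char *data = (const char *)map + HEADER_SIZE;
	return read_be32(data + (size_t)pos * sizeof(uint32_t));
}

/* uint32_t-pointer form: pointer arithmetic scales pos implicitly. */
static uint32_t lookup_words(const void *map, uint32_t pos)
{
	const uint32_t *data =
		(const uint32_t *)((const char *)map + HEADER_SIZE);
	return read_be32(data + pos);
}

/*
 * Fake mapping: a 12-byte header followed by 4 big-endian entries,
 * where entry i holds 100 + i.  Backing it with uint32_t keeps the
 * entries 4-byte aligned, as they would be in a real mmap.
 */
static uint32_t fake_rev_map[3 + 4];

static const void *make_fake_map(void)
{
	unsigned char *bytes = (unsigned char *)fake_rev_map;
	uint32_t i;

	memset(bytes, 0, sizeof(fake_rev_map));
	for (i = 0; i < 4; i++) {
		uint32_t v = 100 + i;
		unsigned char *e = bytes + HEADER_SIZE + i * 4;
		e[0] = (unsigned char)(v >> 24);
		e[1] = (unsigned char)(v >> 16);
		e[2] = (unsigned char)(v >> 8);
		e[3] = (unsigned char)v;
	}
	return fake_rev_map;
}
```

Both lookups return the same entry for every position; the only
difference is whether the multiplication by sizeof(uint32_t) is written
out or left to the pointer type.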

> Thinking out loud a bit: a .rev file means we're spending an extra map
> per pack (but not a descriptor, since we close after mmap). And like the
> .idx files (but unlike .pack file maps), we don't keep track of these
> and try to close them when under memory pressure. I think that's
> probably OK in terms of bytes. It may mean running up against operating
> system number-of-mmap limits more quickly when you have a very large
> number of packs, as mentioned in:
>
>   https://lore.kernel.org/git/20200601044511.GA2529317@coredump.intra.peff.net/
>
> But this is probably bumping the number of problematic packs from 30k to
> 20k. Both are sufficiently ridiculous that I don't think it matters in
> practice.

Agreed.

> > diff --git a/tmp-objdir.c b/tmp-objdir.c
> > index 42ed4db5d3..da414df14f 100644
> > --- a/tmp-objdir.c
> > +++ b/tmp-objdir.c
> > @@ -187,7 +187,9 @@ static int pack_copy_priority(const char *name)
> >  		return 2;
> >  	if (ends_with(name, ".idx"))
> >  		return 3;
> > -	return 4;
> > +	if (ends_with(name, ".rev"))
> > +		return 4;
> > +	return 5;
> >  }
>
> Probably not super important, but: should the .idx file still come last
> here? Simultaneous readers won't start using the pack until the .idx
> file is present. We'd probably prefer they see the whole thing
> atomically, than see a .idx missing its .rev (they won't ever produce a
> wrong answer, but they'll generate the in-core revindex on the fly when
> they don't need to).
>
> I guess one could argue that .bitmap files should get similar treatment,
> but we'd not generally see those in the quarantine objdir anyway, so
> nobody ever gave it much thought.

Yeah, you're right: .idx files should come last. There is probably an
argument for including .bitmap files here, too, but I'll leave that as
#leftoverbits.
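
A rough sketch of the reordering, so that a ".rev" file is copied before
its ".idx". This is not the final patch: has_suffix() is a local
stand-in for ends_with(), copy_priority_sketch() is a made-up name, and
the ".pack" case is an assumption added only so the example has
something that sorts earlier.

```c
#include <string.h>

/* Local stand-in for the ends_with() helper used in the quoted hunk. */
static int has_suffix(const char *str, const char *suffix)
{
	size_t len = strlen(str), suflen = strlen(suffix);
	return len >= suflen && !strcmp(str + len - suflen, suffix);
}

/*
 * Lower values are copied first.  Readers only start using a pack once
 * its .idx is present, so the .idx is given the highest priority value
 * among the known suffixes and therefore goes last among them.
 */
static int copy_priority_sketch(const char *name)
{
	if (has_suffix(name, ".pack")) /* assumed case, not in the hunk */
		return 2;
	if (has_suffix(name, ".rev"))
		return 3;
	if (has_suffix(name, ".idx"))
		return 4;
	return 5;
}
```

Note that in this sketch a ".bitmap" file still falls through to the
final return and so sorts after the ".idx", which is the leftover bit
mentioned above.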

> -Peff

Thanks,
Taylor

* Re: [PATCH v5 3/3] ls-files.c: add --deduplicate option
  @ 2021-01-21 11:00  6%       ` 胡哲宁
  0 siblings, 0 replies; 200+ results
From: 胡哲宁 @ 2021-01-21 11:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: ZheNing Hu via GitGitGadget, Git List, Eric Sunshine

Junio C Hamano <gitster@pobox.com> wrote on Thu, Jan 21, 2021 at 5:26 AM:
>
> "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > @@ -321,30 +324,46 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
> >
> >               construct_fullname(&fullname, repo, ce);
> >
> > +             if (skipping_duplicates && last_shown_ce &&
> > +                     !strcmp(last_shown_ce->name,ce->name))
> > +                             continue;
>
> Style.  Missing SP after comma.
Got it.
>
> >               if ((dir->flags & DIR_SHOW_IGNORED) &&
> >                       !ce_excluded(dir, repo->index, fullname.buf, ce))
> >                       continue;
> >               if (ce->ce_flags & CE_UPDATE)
> >                       continue;
> >               if (show_cached || show_stage) {
> > +                     if (skipping_duplicates && last_shown_ce &&
> > +                             !strcmp(last_shown_ce->name,ce->name))
> > +                                     continue;
>
> OK.  When show_stage is set, skipping_duplicates is automatically
> turned off (and show_unmerged is automatically covered as it turns
> show_stage on automatically).  So this feature has really become
> "are we showing only names, and if so, did we show an entry of the
> same name before?".
Yeah, showing only names, which is why I asked that question yesterday :)
>
> >                       if (!show_unmerged || ce_stage(ce))
> >                               show_ce(repo, dir, ce, fullname.buf,
> >                                       ce_stage(ce) ? tag_unmerged :
> >                                       (ce_skip_worktree(ce) ? tag_skip_worktree :
> >                                               tag_cached));
> > +                     if (show_cached && skipping_duplicates)
> > +                             last_shown_ce = ce;
>
> The code that calls show_ce() belonging to a totally separate if()
> statement makes my stomach hurt---how are we going to guarantee that
> "last shown" really will keep track of what was shown last?
>
> Shouldn't the above be more like this?
>
> -                       if (!show_unmerged || ce_stage(ce))
> +                       if (!show_unmerged || ce_stage(ce)) {
>                                 show_ce(repo, dir, ce, fullname.buf,
>                                         ce_stage(ce) ? tag_unmerged :
>                                         (ce_skip_worktree(ce) ? tag_skip_worktree :
>                                                 tag_cached));
> +                               last_shown_ce = ce;
> +                       }
>
Well, I have been thinking about this question too: "last_shown_ce" is not
truly the last shown ce. But if "last_shown_ce" really tracked every last
shown ce, we might need more cumbersome logic to keep the program correct.
I tried the approach in your code above before, but found that it seemed
to cause some errors.
> It does maintain last_shown_ce even when skipping_duplicates is not
> set, but I think that is overall win.  Assigning unconditionally
> would be cheaper than making a conditional jump on the variable and
> make assignment (or not).
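
The idea of updating "last shown" at the exact point where an entry is
shown can be tried out on its own. This is only a sketch under
simplifying assumptions, not the ls-files code: struct entry and
show_deduplicated() are made-up names, the entries are assumed to be
sorted by name as in the index, and "showing" just means recording the
name in an output array.

```c
#include <stddef.h>
#include <string.h>

struct entry {
	const char *name;
	int stage; /* 0 for a merged entry, 1..3 for unmerged stages */
};

/*
 * Walk entries sorted by name and "show" each name once.  The names
 * that would be printed are stored in out[], which must have room for
 * nr pointers; the return value is how many were stored.
 */
static size_t show_deduplicated(const struct entry *entries, size_t nr,
				const char **out)
{
	const struct entry *last_shown = NULL;
	size_t shown = 0, i;

	for (i = 0; i < nr; i++) {
		const struct entry *e = &entries[i];

		if (last_shown && !strcmp(last_shown->name, e->name))
			continue;
		out[shown++] = e->name;
		last_shown = e; /* assigned right where the entry is shown */
	}
	return shown;
}
```

Because the assignment sits next to the only place that emits a name,
there is no path on which an entry is shown without being remembered.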
>
> >               }
> >               if (ce_skip_worktree(ce))
> >                       continue;
> > +             if (skipping_duplicates && last_shown_ce &&
> > +                     !strcmp(last_shown_ce->name,ce->name))
> > +                             continue;
>
> Style.  Missing SP after comma.
>
> OK, if we've shown an entry of the same name under skip-duplicates
> mode, and the code that follows will show the same entry (if they
> decide to show it), so we can go to the next entry early.
>
> >               err = lstat(fullname.buf, &st);
> >               if (err) {
> > -                     if (errno != ENOENT && errno != ENOTDIR)
> > -                             error_errno("cannot lstat '%s'", fullname.buf);
> > -                     if (show_deleted)
> > +                     if (skipping_duplicates && show_deleted && show_modified)
> >                               show_ce(repo, dir, ce, fullname.buf, tag_removed);
> > -                     if (show_modified)
> > -                             show_ce(repo, dir, ce, fullname.buf, tag_modified);
> > +                     else {
> > +                             if (errno != ENOENT && errno != ENOTDIR)
> > +                                     error_errno("cannot lstat '%s'", fullname.buf);
> > +                             if (show_deleted)
> > +                                     show_ce(repo, dir, ce, fullname.buf, tag_removed);
> > +                             if (show_modified)
> > +                                     show_ce(repo, dir, ce, fullname.buf, tag_modified);
> > +                     }
> >               } else if (show_modified && ie_modified(repo->index, ce, &st, 0))
> >                       show_ce(repo, dir, ce, fullname.buf, tag_modified);
>
> This part will change shape quite a bit when we follow the
> suggestion I made on 1/3, so I won't analyze how correct this
> version is.
>
Fine...
> > +             last_shown_ce = ce;
> >       }
> >
> >       strbuf_release(&fullname);
> > @@ -571,6 +590,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
> >                       N_("pretend that paths removed since <tree-ish> are still present")),
> >               OPT__ABBREV(&abbrev),
> >               OPT_BOOL(0, "debug", &debug_mode, N_("show debugging data")),
> > +             OPT_BOOL(0,"deduplicate",&skipping_duplicates,N_("suppress duplicate entries")),
> >               OPT_END()
> >       };
> >
> > @@ -610,6 +630,8 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
> >                * you also show the stage information.
> >                */
> >               show_stage = 1;
> > +     if (show_tag || show_stage)
> > +             skipping_duplicates = 0;
>
> OK.
>
> >       if (dir.exclude_per_dir)
> >               exc_given = 1;
> >
>
> Thanks.

Thanks, Junio. I see that my PR on gitgitgadget has been accepted.
By the way, I found that the "leftoverbit" and "good first issue" lists
on gitgitgadget may not have been updated for a long time, and most of
the items there may already have been resolved.

Should they be updated?
Then we can happily be "bounty hunters" in the git community, haha!

* Re: [PATCH v2] use delete_refs when deleting tags or branches
  2021-01-15 18:43  4%   ` Elijah Newren
@ 2021-01-16  2:27  0%     ` Phil Hord
  0 siblings, 0 replies; 200+ results
From: Phil Hord @ 2021-01-16  2:27 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git, Martin Ågren, Junio C Hamano

On Fri, Jan 15, 2021 at 10:43 AM Elijah Newren <newren@gmail.com> wrote:
>
> Hi,
>
> On Thu, Jan 14, 2021 at 6:00 PM Phil Hord <phil.hord@gmail.com> wrote:
> >
> > I noticed this is still only in my local branch.   Can I get an ACK/NAK?
>
> Sorry for missing this when you posted in August.  Thanks for sending
> in the update from v1.
>
> For other reviewers: v1 is over here:
> https://lore.kernel.org/git/20190808035935.30023-1-phil.hord@gmail.com/,
> and has review comments from Martin, me, Peff, and Junio.
>
> > On Tue, Aug 13, 2019 at 7:49 PM Phil Hord <phil.hord@gmail.com> wrote:
> >>
> >> From: Phil Hord <phil.hord@gmail.com>
> >>
> >> 'git tag -d' and 'git branch -d' both accept one or more refs to
> >> delete, but each deletion is done by calling `delete_ref` on each argv.
> >> This is very slow when removing from packed refs as packed-refs is
> >> locked and rewritten each time. Use delete_refs instead so all the
> >> removals can be done inside a single transaction with a single update.
>
> Awesome, thanks for also fixing up git branch with v2.
>
> >> Since delete_refs performs all the packed-refs delete operations
> >> inside a single transaction, if any of the deletes fail then all
> >> of them will be skipped. In practice, none of them should fail since
> >> we verify the hash of each one before calling delete_refs, but some
> >> network error or odd permissions problem could have different results
> >> after this change.
> >>
> >> Also, since the file-backed deletions are not performed in the same
> >> transaction, those could succeed even when the packed-refs transaction
> >> fails.
> >>
> >> After deleting refs, report the deletion's success only if the ref was
> >> actually deleted. For branch deletion, remove the branch config only
> >> if the branch ref is actually removed.
> >>
> >> A manual test deleting 24,000 tags took about 30 minutes using
> >> delete_ref.  It takes about 5 seconds using delete_refs.
>
> As I said on v1, it's really nice to have this fixed.  Thanks for doing it.
>
> >>
> >> Signed-off-by: Phil Hord <phil.hord@gmail.com>
> >> ---
> >> This reroll adds the same delete_refs change to 'git branch'. It checks
> >> individual refs after the operation to report correctly on each whether
> >> it was successfully deleted or not. Maybe this is an unnecessary step,
> >> though. This handles the weird case where some file system error
> >> prevented us from deleting refs, leaving us with an error from
> >> delete_refs but without any idea which refs might have been affected.
> >>
> >>  builtin/branch.c | 50 +++++++++++++++++++++++++++++-------------------
> >>  builtin/tag.c    | 45 +++++++++++++++++++++++++++++++++----------
> >>  2 files changed, 65 insertions(+), 30 deletions(-)
> >>
> >> diff --git a/builtin/branch.c b/builtin/branch.c
> >> index 2ef214632f..2273239f41 100644
> >> --- a/builtin/branch.c
> >> +++ b/builtin/branch.c
> >> @@ -202,6 +202,9 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
> >>         int remote_branch = 0;
> >>         struct strbuf bname = STRBUF_INIT;
> >>         unsigned allowed_interpret;
> >> +       struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
> >> +       struct string_list_item *item;
> >> +       int refname_pos = 0;
> >>
> >>         switch (kinds) {
> >>         case FILTER_REFS_REMOTES:
> >> @@ -209,12 +212,13 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
> >>                 /* For subsequent UI messages */
> >>                 remote_branch = 1;
> >>                 allowed_interpret = INTERPRET_BRANCH_REMOTE;
> >> -
> >> +               refname_pos = 13;
> >>                 force = 1;
> >>                 break;
> >>         case FILTER_REFS_BRANCHES:
> >>                 fmt = "refs/heads/%s";
> >>                 allowed_interpret = INTERPRET_BRANCH_LOCAL;
> >> +               refname_pos = 11;
> >>                 break;
> >>         default:
> >>                 die(_("cannot use -a with -d"));
> >> @@ -265,30 +269,36 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
> >>                         goto next;
> >>                 }
> >>
> >> -               if (delete_ref(NULL, name, is_null_oid(&oid) ? NULL : &oid,
> >> -                              REF_NO_DEREF)) {
> >> -                       error(remote_branch
> >> -                             ? _("Error deleting remote-tracking branch '%s'")
> >> -                             : _("Error deleting branch '%s'"),
> >> -                             bname.buf);
> >> -                       ret = 1;
> >> -                       goto next;
>
> The code used to set the return code to 1 if it failed to delete a branch.
>
> >> -               }
> >> -               if (!quiet) {
> >> -                       printf(remote_branch
> >> -                              ? _("Deleted remote-tracking branch %s (was %s).\n")
> >> -                              : _("Deleted branch %s (was %s).\n"),
> >> -                              bname.buf,
> >> -                              (flags & REF_ISBROKEN) ? "broken"
> >> -                              : (flags & REF_ISSYMREF) ? target
> >> -                              : find_unique_abbrev(&oid, DEFAULT_ABBREV));
> >> -               }
> >> -               delete_branch_config(bname.buf);
> >> +               item = string_list_append(&refs_to_delete, name);
> >> +               item->util = xstrdup((flags & REF_ISBROKEN) ? "broken"
> >> +                                   : (flags & REF_ISSYMREF) ? target
> >> +                                   : find_unique_abbrev(&oid, DEFAULT_ABBREV));
> >>
> >>         next:
> >>                 free(target);
> >>         }
> >>
> >> +       delete_refs(NULL, &refs_to_delete, REF_NO_DEREF);
> >> +
> >> +       for_each_string_list_item(item, &refs_to_delete) {
> >> +               char * describe_ref = item->util;
> >> +               char * name = item->string;
> >> +               if (ref_exists(name))
> >> +                       ret = 1;
>
> Now it sets the return code if the branch still exists after trying to
> delete.  I thought that was subtly different...but I tried doing a
> branch deletion of a non-existent branch since I thought that would be
> the only difference -- however, that errors out earlier in the
> codepath before even getting to the stage of deleting refs.  So I
> think these are effectively the same.
>
> >> +               else {
> >> +                       char * refname = name + refname_pos;
> >> +                       if (!quiet)
> >> +                               printf(remote_branch
> >> +                                       ? _("Deleted remote-tracking branch %s (was %s).\n")
> >> +                                       : _("Deleted branch %s (was %s).\n"),
> >> +                                       name + refname_pos, describe_ref);
>
> Neither remote_branch nor refname_pos are changing throughout this
> loop, which I at first thought was in error, but it looks like git
> branch only allows you to delete one type or the other -- not a
> mixture.  So this is correct.
>
> >> +
> >> +                       delete_branch_config(refname);
> >> +               }
> >> +               free(describe_ref);
> >> +       }
> >> +       string_list_clear(&refs_to_delete, 0);
> >> +
> >>         free(name);
> >>         strbuf_release(&bname);
> >>
> >> diff --git a/builtin/tag.c b/builtin/tag.c
> >> index e0a4c25382..0d11ffcd04 100644
> >> --- a/builtin/tag.c
> >> +++ b/builtin/tag.c
> >> @@ -72,10 +72,10 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
> >>  }
> >>
> >>  typedef int (*each_tag_name_fn)(const char *name, const char *ref,
> >> -                               const struct object_id *oid, const void *cb_data);
> >> +                               const struct object_id *oid, void *cb_data);
> >>
> >>  static int for_each_tag_name(const char **argv, each_tag_name_fn fn,
> >> -                            const void *cb_data)
> >> +                            void *cb_data)
> >>  {
> >>         const char **p;
> >>         struct strbuf ref = STRBUF_INIT;
> >> @@ -97,18 +97,43 @@ static int for_each_tag_name(const char **argv, each_tag_name_fn fn,
> >>         return had_error;
> >>  }
> >>
> >> -static int delete_tag(const char *name, const char *ref,
> >> -                     const struct object_id *oid, const void *cb_data)
> >> +static int collect_tags(const char *name, const char *ref,
> >> +                       const struct object_id *oid, void *cb_data)
> >>  {
> >> -       if (delete_ref(NULL, ref, oid, 0))
> >> -               return 1;
>
> This used to return 1 if it failed to delete a ref.
>
> >> -       printf(_("Deleted tag '%s' (was %s)\n"), name,
> >> -              find_unique_abbrev(oid, DEFAULT_ABBREV));
> >> +       struct string_list *ref_list = cb_data;
> >> +
> >> +       string_list_append(ref_list, ref);
> >> +       ref_list->items[ref_list->nr - 1].util = oiddup(oid);
> >>         return 0;
>
> Now it unconditionally returns 0.
>
> >>  }
> >>
> >> +static int delete_tags(const char **argv)
> >> +{
> >> +       int result;
> >> +       struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
> >> +       struct string_list_item *item;
> >> +
> >> +       result = for_each_tag_name(argv, collect_tags, (void *)&refs_to_delete);
> >> +       delete_refs(NULL, &refs_to_delete, REF_NO_DEREF);
>
> You now only look at the result of collecting the tags, and ignore the
> result of trying to delete them...
>
> >> +
> >> +       for_each_string_list_item(item, &refs_to_delete) {
> >> +               const char * name = item->string;
> >> +               struct object_id * oid = item->util;
> >> +               if (ref_exists(name))
> >> +                       result = 1;
>
> ...except that you check if the refs still exist afterward and set the
> return code based on it.  Like with the branch case, I can't come up
> with a case where the difference matters.  I suspect there's a race
> condition there somewhere, but once you start going down that road I
> think the old code may have had a bunch of races too.  It might be
> nice to document with a comment that there's a small race condition
> with someone else trying to forcibly re-create the ref at the same
> time you are trying to delete, but I don't think it's a big deal.
>
> If you did use the result of delete_refs(), you might have to double
> check that the callers (git.c:handle_builtin() -> git.c:run_builtin()
> -> builtin/tag.c:cmd_tag() -> builtin/tag.c:delete_tags()) are all
> okay with the return code; it looks like handle_builtin() would pass
> the return code to exit() and the git-tag manpage doesn't document the
> return status, so you've at least got some leeway in terms of what
> values are acceptable.  Or you could just normalize the return value
> of delete_refs() down to 0 or 1.  But you'd only need to worry about
> that if the race condition is something we're worried enough to
> tackle.

Interesting. I was worried about imposing a requirement on delete_refs
that any non-zero return must mean that something which should have been
deleted was not. Maybe that's not such a worry, though, and it would be
acceptable to return 1 when all the refs were deleted but some error
occurred further down the line.

I tried normalizing the value and then also verifying each ref was
removed, but that seemed wrong.  Maybe it's ok to just normalize it
and not react to still-existing refs.
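
The "just normalize it" option could look roughly like the following.
This is a sketch only: normalize_status() and combined_exit_status() are
made-up helpers, and the two arguments are plain ints standing in for
the results of for_each_tag_name() and delete_refs(). No assumption is
made about which non-zero values delete_refs() can return, only that
zero means success.

```c
/* Collapse any non-zero status into 1. */
static int normalize_status(int status)
{
	return !!status;
}

/*
 * Combine the status of the collection pass with the status of the
 * batched deletion, so that a failure in either one yields exit code 1
 * and nothing else ever reaches exit().
 */
static int combined_exit_status(int collect_status, int delete_status)
{
	return normalize_status(collect_status) |
	       normalize_status(delete_status);
}
```

With this shape the command never reports anything other than 0 or 1,
whatever the batched deletion hands back.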

> >> +               else
> >> +                       printf(_("Deleted tag '%s' (was %s)\n"),
> >> +                               item->string + 10,
> >> +                               find_unique_abbrev(oid, DEFAULT_ABBREV));
> >> +
> >> +               free(oid);
> >> +       }
> >> +       string_list_clear(&refs_to_delete, 0);
> >> +       return result;
> >> +}
> >> +
> >>  static int verify_tag(const char *name, const char *ref,
> >> -                     const struct object_id *oid, const void *cb_data)
> >> +                     const struct object_id *oid, void *cb_data)
> >>  {
> >>         int flags;
> >>         const struct ref_format *format = cb_data;
> >> @@ -511,7 +536,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
> >>         if (filter.merge_commit)
> >>                 die(_("--merged and --no-merged options are only allowed in list mode"));
> >>         if (cmdmode == 'd')
> >> -               return for_each_tag_name(argv, delete_tag, NULL);
> >> +               return delete_tags(argv);
> >>         if (cmdmode == 'v') {
> >>                 if (format.format && verify_ref_format(&format))
> >>                         usage_with_options(git_tag_usage, options);
> >> --
> >> 2.23.0.rc1.174.g4cc1b04b4c
>
> Overall, I like the patch.  Peff commented on v1 that the basic idea
> (use the part of the refs API that batches operations) is the right
> thing to do.  I'm not that familiar with refs-touching code, but your
> patch makes sense to me.  I think I spotted a minor issue (you ignore
> the return status of delete_refs(), then later check the existence of
> the refs afterwards to determine success, which I believe is a minor
> and unlikely race condition), but I'm not sure it's worth fixing;
> perhaps just mark it with #leftoverbits and move on -- the faster
> branch and tag deletion is a very nice improvement.
>
> I notice Martin said on v1 that there was a testcase that had problems
> with your patch; I tested v2 and it looks like you fixed any such
> issues.  I think you also addressed the feedback from Junio, though
> his comments about the return code and the minor race condition I
> noticed around it might mean it'd be good to get his comments.
>
> Anyway,
> Acked-by: Elijah Newren <newren@gmail.com>
>
> I would say Reviewed-by, but I'd like to get Junio's comments on the
> return code and minor race.

Thanks for the detailed review and thoughts.

* Re: [PATCH v2] use delete_refs when deleting tags or branches
       [not found]     ` <CABURp0rtkmo7MSSCVrdNXT0UzV9XqV_kXOGkC23C+_vMENNJUg@mail.gmail.com>
@ 2021-01-15 18:43  4%   ` Elijah Newren
  2021-01-16  2:27  0%     ` Phil Hord
  0 siblings, 1 reply; 200+ results
From: Elijah Newren @ 2021-01-15 18:43 UTC (permalink / raw)
  To: Phil Hord; +Cc: Git, Martin Ågren, Junio C Hamano

Hi,

On Thu, Jan 14, 2021 at 6:00 PM Phil Hord <phil.hord@gmail.com> wrote:
>
> I noticed this is still only in my local branch.   Can I get an ACK/NAK?

Sorry for missing this when you posted in August.  Thanks for sending
in the update from v1.

For other reviewers: v1 is over here:
https://lore.kernel.org/git/20190808035935.30023-1-phil.hord@gmail.com/,
and has review comments from Martin, me, Peff, and Junio.

> On Tue, Aug 13, 2019 at 7:49 PM Phil Hord <phil.hord@gmail.com> wrote:
>>
>> From: Phil Hord <phil.hord@gmail.com>
>>
>> 'git tag -d' and 'git branch -d' both accept one or more refs to
>> delete, but each deletion is done by calling `delete_ref` on each argv.
>> This is very slow when removing from packed refs as packed-refs is
>> locked and rewritten each time. Use delete_refs instead so all the
>> removals can be done inside a single transaction with a single update.

Awesome, thanks for also fixing up git branch with v2.
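
To put a rough number on why batching helps, here is a toy cost model.
It is only a sketch under a stated assumption: every rewrite of
packed-refs writes out each remaining entry once, and locking, syncing,
and loose refs are ignored. The function names are made up for this
example and nothing here calls into git.

```c
#include <stdint.h>

/* Entries written when each deletion rewrites the whole file. */
static uint64_t lines_written_one_by_one(uint64_t total, uint64_t to_delete)
{
	uint64_t written = 0, i;

	if (to_delete > total)
		to_delete = total;
	for (i = 1; i <= to_delete; i++)
		written += total - i; /* entries left after the i-th deletion */
	return written;
}

/* Entries written when all deletions share a single rewrite. */
static uint64_t lines_written_batched(uint64_t total, uint64_t to_delete)
{
	if (to_delete > total)
		to_delete = total;
	return total - to_delete;
}
```

Under this model, deleting all 24,000 of 24,000 packed tags one at a
time writes 23,999 + 23,998 + ... + 0 = 287,988,000 entries in total,
while the batched form does a single rewrite. The quadratic growth is
consistent with the minutes-versus-seconds difference reported above,
although the model says nothing about absolute timings.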

>> Since delete_refs performs all the packed-refs delete operations
>> inside a single transaction, if any of the deletes fail then all
>> of them will be skipped. In practice, none of them should fail since
>> we verify the hash of each one before calling delete_refs, but some
>> network error or odd permissions problem could have different results
>> after this change.
>>
>> Also, since the file-backed deletions are not performed in the same
>> transaction, those could succeed even when the packed-refs transaction
>> fails.
>>
>> After deleting refs, report the deletion's success only if the ref was
>> actually deleted. For branch deletion, remove the branch config only
>> if the branch ref is actually removed.
>>
>> A manual test deleting 24,000 tags took about 30 minutes using
>> delete_ref.  It takes about 5 seconds using delete_refs.

As I said on v1, it's really nice to have this fixed.  Thanks for doing it.

>>
>> Signed-off-by: Phil Hord <phil.hord@gmail.com>
>> ---
>> This reroll adds the same delete_refs change to 'git branch'. It checks
>> individual refs after the operation to report correctly on each whether
>> it was successfully deleted or not. Maybe this is an unnecessary step,
>> though. This handles the weird case where some file system error
>> prevented us from deleting refs, leaving us with an error from
>> delete_refs but without any idea which refs might have been affected.
>>
>>  builtin/branch.c | 50 +++++++++++++++++++++++++++++-------------------
>>  builtin/tag.c    | 45 +++++++++++++++++++++++++++++++++----------
>>  2 files changed, 65 insertions(+), 30 deletions(-)
>>
>> diff --git a/builtin/branch.c b/builtin/branch.c
>> index 2ef214632f..2273239f41 100644
>> --- a/builtin/branch.c
>> +++ b/builtin/branch.c
>> @@ -202,6 +202,9 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
>>         int remote_branch = 0;
>>         struct strbuf bname = STRBUF_INIT;
>>         unsigned allowed_interpret;
>> +       struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
>> +       struct string_list_item *item;
>> +       int refname_pos = 0;
>>
>>         switch (kinds) {
>>         case FILTER_REFS_REMOTES:
>> @@ -209,12 +212,13 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
>>                 /* For subsequent UI messages */
>>                 remote_branch = 1;
>>                 allowed_interpret = INTERPRET_BRANCH_REMOTE;
>> -
>> +               refname_pos = 13;
>>                 force = 1;
>>                 break;
>>         case FILTER_REFS_BRANCHES:
>>                 fmt = "refs/heads/%s";
>>                 allowed_interpret = INTERPRET_BRANCH_LOCAL;
>> +               refname_pos = 11;
>>                 break;
>>         default:
>>                 die(_("cannot use -a with -d"));
>> @@ -265,30 +269,36 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
>>                         goto next;
>>                 }
>>
>> -               if (delete_ref(NULL, name, is_null_oid(&oid) ? NULL : &oid,
>> -                              REF_NO_DEREF)) {
>> -                       error(remote_branch
>> -                             ? _("Error deleting remote-tracking branch '%s'")
>> -                             : _("Error deleting branch '%s'"),
>> -                             bname.buf);
>> -                       ret = 1;
>> -                       goto next;

The code used to set the return code to 1 if it failed to delete a branch.

>> -               }
>> -               if (!quiet) {
>> -                       printf(remote_branch
>> -                              ? _("Deleted remote-tracking branch %s (was %s).\n")
>> -                              : _("Deleted branch %s (was %s).\n"),
>> -                              bname.buf,
>> -                              (flags & REF_ISBROKEN) ? "broken"
>> -                              : (flags & REF_ISSYMREF) ? target
>> -                              : find_unique_abbrev(&oid, DEFAULT_ABBREV));
>> -               }
>> -               delete_branch_config(bname.buf);
>> +               item = string_list_append(&refs_to_delete, name);
>> +               item->util = xstrdup((flags & REF_ISBROKEN) ? "broken"
>> +                                   : (flags & REF_ISSYMREF) ? target
>> +                                   : find_unique_abbrev(&oid, DEFAULT_ABBREV));
>>
>>         next:
>>                 free(target);
>>         }
>>
>> +       delete_refs(NULL, &refs_to_delete, REF_NO_DEREF);
>> +
>> +       for_each_string_list_item(item, &refs_to_delete) {
>> +               char * describe_ref = item->util;
>> +               char * name = item->string;
>> +               if (ref_exists(name))
>> +                       ret = 1;

Now it sets the return code if the branch still exists after trying to
delete.  I thought that was subtly different...but I tried doing a
branch deletion of a non-existent branch since I thought that would be
the only difference -- however, that errors out earlier in the
codepath before even getting to the stage of deleting refs.  So I
think these are effectively the same.

>> +               else {
>> +                       char * refname = name + refname_pos;
>> +                       if (!quiet)
>> +                               printf(remote_branch
>> +                                       ? _("Deleted remote-tracking branch %s (was %s).\n")
>> +                                       : _("Deleted branch %s (was %s).\n"),
>> +                                       name + refname_pos, describe_ref);

Neither remote_branch nor refname_pos are changing throughout this
loop, which I at first thought was in error, but it looks like git
branch only allows you to delete one type or the other -- not a
mixture.  So this is correct.
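
As a side note on the magic numbers: 11 is the length of "refs/heads/",
13 would be the length of "refs/remotes/" (the remote format string is
not visible in the hunk, so that prefix is an assumption), and the 10
used in the tag hunk further down is the length of "refs/tags/". A small
sketch of deriving the offset from the prefix instead of hard-coding it
follows; strip_ref_prefix() and the three prefix arrays are made-up
names for this example, not existing helpers.

```c
#include <stddef.h>
#include <string.h>

static const char heads_prefix[] = "refs/heads/";
static const char remotes_prefix[] = "refs/remotes/"; /* assumed prefix */
static const char tags_prefix[] = "refs/tags/";

/*
 * Return the part of refname after prefix, or NULL if refname does not
 * start with prefix.
 */
static const char *strip_ref_prefix(const char *refname, const char *prefix)
{
	size_t len = strlen(prefix);
	return strncmp(refname, prefix, len) ? NULL : refname + len;
}
```

Calling strip_ref_prefix(name, heads_prefix) gives the same pointer as
name + 11 for a local branch ref, but returns NULL instead of pointing
into the middle of the string if the ref is not what was expected.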

>> +
>> +                       delete_branch_config(refname);
>> +               }
>> +               free(describe_ref);
>> +       }
>> +       string_list_clear(&refs_to_delete, 0);
>> +
>>         free(name);
>>         strbuf_release(&bname);
>>
>> diff --git a/builtin/tag.c b/builtin/tag.c
>> index e0a4c25382..0d11ffcd04 100644
>> --- a/builtin/tag.c
>> +++ b/builtin/tag.c
>> @@ -72,10 +72,10 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
>>  }
>>
>>  typedef int (*each_tag_name_fn)(const char *name, const char *ref,
>> -                               const struct object_id *oid, const void *cb_data);
>> +                               const struct object_id *oid, void *cb_data);
>>
>>  static int for_each_tag_name(const char **argv, each_tag_name_fn fn,
>> -                            const void *cb_data)
>> +                            void *cb_data)
>>  {
>>         const char **p;
>>         struct strbuf ref = STRBUF_INIT;
>> @@ -97,18 +97,43 @@ static int for_each_tag_name(const char **argv, each_tag_name_fn fn,
>>         return had_error;
>>  }
>>
>> -static int delete_tag(const char *name, const char *ref,
>> -                     const struct object_id *oid, const void *cb_data)
>> +static int collect_tags(const char *name, const char *ref,
>> +                       const struct object_id *oid, void *cb_data)
>>  {
>> -       if (delete_ref(NULL, ref, oid, 0))
>> -               return 1;

This used to return 1 if it failed to delete a ref.

>> -       printf(_("Deleted tag '%s' (was %s)\n"), name,
>> -              find_unique_abbrev(oid, DEFAULT_ABBREV));
>> +       struct string_list *ref_list = cb_data;
>> +
>> +       string_list_append(ref_list, ref);
>> +       ref_list->items[ref_list->nr - 1].util = oiddup(oid);
>>         return 0;

Now it unconditionally returns 0.

>>  }
>>
>> +static int delete_tags(const char **argv)
>> +{
>> +       int result;
>> +       struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
>> +       struct string_list_item *item;
>> +
>> +       result = for_each_tag_name(argv, collect_tags, (void *)&refs_to_delete);
>> +       delete_refs(NULL, &refs_to_delete, REF_NO_DEREF);

You now only look at the result of collecting the tags, and ignore the
result of trying to delete them...

>> +
>> +       for_each_string_list_item(item, &refs_to_delete) {
>> +               const char * name = item->string;
>> +               struct object_id * oid = item->util;
>> +               if (ref_exists(name))
>> +                       result = 1;

...except that you check if the refs still exist afterward and set the
return code based on it.  Like with the branch case, I can't come up
with a case where the difference matters.  I suspect there's a race
condition there somewhere, but once you start going down that road I
think the old code may have had a bunch of races too.  It might be
nice to document with a comment that there's a small race condition
with someone else trying to forcibly re-create the ref at the same
time you are trying to delete, but I don't think it's a big deal.

If you did use the result of delete_refs(), you might have to double
check that the callers (git.c:handle_builtin() -> git.c:run_builtin()
-> builtin/tag.c:cmd_tag() -> builtin/tag.c:delete_tags()) are all
okay with the return code; it looks like handle_builtin() would pass
the return code to exit() and the git-tag manpage doesn't document the
return status, so you've at least got some leeway in terms of what
values are acceptable.  Or you could just normalize the return value
of delete_refs() down to 0 or 1.  But you'd only need to worry about
that if the race condition is something we're worried enough to
tackle.

>> +               else
>> +                       printf(_("Deleted tag '%s' (was %s)\n"),
>> +                               item->string + 10,
>> +                               find_unique_abbrev(oid, DEFAULT_ABBREV));
>> +
>> +               free(oid);
>> +       }
>> +       string_list_clear(&refs_to_delete, 0);
>> +       return result;
>> +}
>> +
>>  static int verify_tag(const char *name, const char *ref,
>> -                     const struct object_id *oid, const void *cb_data)
>> +                     const struct object_id *oid, void *cb_data)
>>  {
>>         int flags;
>>         const struct ref_format *format = cb_data;
>> @@ -511,7 +536,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
>>         if (filter.merge_commit)
>>                 die(_("--merged and --no-merged options are only allowed in list mode"));
>>         if (cmdmode == 'd')
>> -               return for_each_tag_name(argv, delete_tag, NULL);
>> +               return delete_tags(argv);
>>         if (cmdmode == 'v') {
>>                 if (format.format && verify_ref_format(&format))
>>                         usage_with_options(git_tag_usage, options);
>> --
>> 2.23.0.rc1.174.g4cc1b04b4c

Overall, I like the patch.  Peff commented on v1 that the basic idea
(use the part of the refs API that batches operations) is the right
thing to do.  I'm not that familiar with refs-touching code, but your
patch makes sense to me.  I think I spotted a minor issue (you ignore
the return status of delete_refs(), then later check the existence of
the refs afterwards to determine success, which I believe is a minor
and unlikely race condition), but I'm not sure it's worth fixing;
perhaps just mark it with #leftoverbits and move on -- the faster
branch and tag deletion is a very nice improvement.

I notice Martin said on v1 that there was a testcase that had problems
with your patch; I tested v2 and it looks like you fixed any such
issues.  I think you also addressed the feedback from Junio, though
his comments about the return code and the minor race condition I
noticed around it might mean it'd be good to get his comments.

Anyway,
Acked-by: Elijah Newren <newren@gmail.com>

I would say Reviewed-by, but I'd like to get Junio's comments on the
return code and minor race.

^ permalink raw reply	[relevance 4%]

* Re: [PATCH v5 00/11] [GSoC] Implement Corrected Commit Date
  2020-12-30  4:35  4%   ` Derrick Stolee
@ 2021-01-10 14:06  0%     ` Abhishek Kumar
  0 siblings, 0 replies; 200+ results
From: Abhishek Kumar @ 2021-01-10 14:06 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: abhishekkumar8222, git, gitgitgadget, jnareb, me

On Tue, Dec 29, 2020 at 11:35:56PM -0500, Derrick Stolee wrote:
> On 12/28/2020 6:15 AM, Abhishek Kumar via GitGitGadget wrote:
> > This patch series implements the corrected commit date offsets as generation
> > number v2, along with other pre-requisites.
> 
> Abhishek,
> 
> Thank you for this version. I appreciate your hard work on this topic,
> especially after GSoC ended and you returned to being a full-time student.
> 
> My hope was that I could completely approve this series and only provide
> forward-fixes from here on out, as necessary. I think there are a few minor
> typos that you might want to address, but I was also able to understand your
> intention.
> 
> I did make a particular case about a SEGFAULT I hit that I have been unable
> to replicate. I saw it both in my copy of torvalds/linux and of
> chromium/chromium. I have the file for chromium/chromium that is in a bad
> state where a GDAT value includes the bit saying it should be in the long
> offsets chunk, but that chunk doesn't exist. Further, that chunk doesn't
> exist in a from-scratch write.

I hope that also validating the mixed generation chain while writing
was enough to fix the SEGFAULT.

>
> I'm now taking backups of my existing commit-graph files before any later
> test, but it doesn't repro for my Git repository or any other repo I try on
> purpose.
> 
> However, I did some performance testing to double-check your numbers. I sent
> a patch [1] that helps with some of the hard numbers.
> 
> [1] https://lore.kernel.org/git/pull.828.git.1609302714183.gitgitgadget@gmail.com/
> 
> The big question is whether the overhead from using a slab to store the
> generation values is worth it. I still think it is, for these reasons:
> 
> 1. Generation number v2 is measurably better than v1 in most use cases.
> 
> 2. Generation number v2 is slower than using committer date due to the
>    overhead, but _guarantees correctness_.
> 
> I like to use "git log --graph -<N>" to compare against topological levels
> (v1), for various levels of <N>. When <N> is small, we hope to minimize
> the amount we need to walk using the extra commit-date information as an
> assistance. Repos like git/git and torvalds/linux use the philosophy of
> "base your changes on oldest applicable commit" enough that v1 struggles
> sometimes.
> 
> git/git: N=1000
> 
> 	Benchmark #1: baseline
> 	Time (mean ± σ):     100.3 ms ±   4.2 ms    [User: 89.0 ms, System: 11.3 ms]
> 	Range (min … max):    94.5 ms … 105.1 ms    28 runs
> 	
> 	Benchmark #2: test
> 	Time (mean ± σ):      35.8 ms ±   3.1 ms    [User: 29.6 ms, System: 6.2 ms]
> 	Range (min … max):    29.8 ms …  40.6 ms    81 runs
> 	
> 	Summary
> 	'test' ran
> 	2.80 ± 0.27 times faster than 'baseline'
> 
> This is a dramatic improvement! Using my topo-walk stats commit, I see that
> v1 walks 58,805 commits as part of the in-degree walk while v2 only walks
> 4,335 commits!
> 
> torvalds/linux: N=1000 (starting at v5.10)
> 
> 	Benchmark #1: baseline
> 	Time (mean ± σ):      90.8 ms ±   3.7 ms    [User: 75.2 ms, System: 15.6 ms]
> 	Range (min … max):    85.2 ms …  96.2 ms    31 runs
> 	
> 	Benchmark #2: test
> 	Time (mean ± σ):      49.2 ms ±   3.5 ms    [User: 36.9 ms, System: 12.3 ms]
> 	Range (min … max):    42.9 ms …  54.0 ms    61 runs
> 	
> 	Summary
> 	'test' ran
> 	1.85 ± 0.15 times faster than 'baseline'
> 
> Similarly, v1 walked 38,161 commits compared to 4,340 by v2.
> 
> If I increase N to something like 10,000, then usually these values get
> washed out due to the width of the parallel topics.

That's not too bad, as large N would be needed rather infrequently.

> 
> The place we were still using commit-date as a heuristic was paint_down_to_common
> which caused a regression the first time we used v1, at least for certain cases.
> 
> Specifically, computing the merge-base in torvalds/linux between v4.8 and v4.9
> hit a strangeness about a pair of recent commits both based on a very old commit,
> but the generation numbers forced walking farther than necessary. This doesn't
> happen with v2, but we see the overhead cost of the slabs:
> 
> 	Benchmark #1: baseline
> 	Time (mean ± σ):     112.9 ms ±   2.8 ms    [User: 96.5 ms, System: 16.3 ms]
> 	Range (min … max):   107.7 ms … 118.0 ms    26 runs
> 	
> 	Benchmark #2: test
> 	Time (mean ± σ):     147.1 ms ±   5.2 ms    [User: 132.7 ms, System: 14.3 ms]
> 	Range (min … max):   141.4 ms … 162.2 ms    18 runs
> 	
> 	Summary
> 	'baseline' ran
> 	1.30 ± 0.06 times faster than 'test'
> 
> The overhead still exists for a more recent pair of versions (v5.0 and v5.1):
> 
> 	Benchmark #1: baseline
> 	Time (mean ± σ):      25.1 ms ±   3.2 ms    [User: 18.6 ms, System: 6.5 ms]
> 	Range (min … max):    19.0 ms …  32.8 ms    99 runs
> 	
> 	Benchmark #2: test
> 	Time (mean ± σ):      33.3 ms ±   3.3 ms    [User: 26.5 ms, System: 6.9 ms]
> 	Range (min … max):    27.0 ms …  38.4 ms    105 runs
> 	
> 	Summary
> 	'baseline' ran
> 	1.33 ± 0.22 times faster than 'test'
> 
> I still think this overhead is worth it. In case not everyone agrees, it _might_
> be worth a command-line option to skip the GDAT chunk. That would also prevent us
> from eventually weaning entirely off generation number v1 and allowing the commit
> date to take the full 64-bit column (instead of only 34 bits, saving 30 for
> topo-levels).

Thank you for the detailed benchmarking and discussion.

I don't think there is any disagreement on the utility of corrected
commit dates so far.

We will run out of 34 bits for the commit date by the year 2514, so I
am not exactly worried about weaning off generation number v1 anytime
soon.

> 
> Again, such a modification should not be considered required for this series.
> 
> > ----------------------------------------------------------------------------
> > 
> > Improvements left for a future series:
> > 
> >  * Save commits with generation data overflow and extra edge commits instead
> >    of looping over all commits. cf. 858sbel67n.fsf@gmail.com
> >  * Verify both topological levels and corrected commit dates when present.
> >    cf. 85pn4tnk8u.fsf@gmail.com
> 
> These seem like reasonable things to delay for a later series
> or for #leftoverbits
> 
> Thanks,
> -Stolee
> 

Thanks
- Abhishek

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v5 00/11] [GSoC] Implement Corrected Commit Date
  @ 2020-12-30  4:35  4%   ` Derrick Stolee
  2021-01-10 14:06  0%     ` Abhishek Kumar
  0 siblings, 1 reply; 200+ results
From: Derrick Stolee @ 2020-12-30  4:35 UTC (permalink / raw)
  To: Abhishek Kumar via GitGitGadget, git
  Cc: Jakub Narębski, Taylor Blau, Abhishek Kumar

On 12/28/2020 6:15 AM, Abhishek Kumar via GitGitGadget wrote:
> This patch series implements the corrected commit date offsets as generation
> number v2, along with other pre-requisites.

Abhishek,

Thank you for this version. I appreciate your hard work on this topic,
especially after GSoC ended and you returned to being a full-time student.

My hope was that I could completely approve this series and only provide
forward-fixes from here on out, as necessary. I think there are a few minor
typos that you might want to address, but I was also able to understand your
intention.

I did make a particular case about a SEGFAULT I hit that I have been unable
to replicate. I saw it both in my copy of torvalds/linux and of
chromium/chromium. I have the file for chromium/chromium that is in a bad
state where a GDAT value includes the bit saying it should be in the long
offsets chunk, but that chunk doesn't exist. Further, that chunk doesn't
exist in a from-scratch write.

I'm now taking backups of my existing commit-graph files before any later
test, but it doesn't repro for my Git repository or any other repo I try on
purpose.

However, I did some performance testing to double-check your numbers. I sent
a patch [1] that helps with some of the hard numbers.

[1] https://lore.kernel.org/git/pull.828.git.1609302714183.gitgitgadget@gmail.com/

The big question is whether the overhead from using a slab to store the
generation values is worth it. I still think it is, for these reasons:

1. Generation number v2 is measurably better than v1 in most use cases.

2. Generation number v2 is slower than using committer date due to the
   overhead, but _guarantees correctness_.

I like to use "git log --graph -<N>" to compare against topological levels
(v1), for various levels of <N>. When <N> is small, we hope to minimize
the amount we need to walk using the extra commit-date information as an
assistance. Repos like git/git and torvalds/linux use the philosophy of
"base your changes on oldest applicable commit" enough that v1 struggles
sometimes.

git/git: N=1000

	Benchmark #1: baseline
	Time (mean ± σ):     100.3 ms ±   4.2 ms    [User: 89.0 ms, System: 11.3 ms]
	Range (min … max):    94.5 ms … 105.1 ms    28 runs

	Benchmark #2: test
	Time (mean ± σ):      35.8 ms ±   3.1 ms    [User: 29.6 ms, System: 6.2 ms]
	Range (min … max):    29.8 ms …  40.6 ms    81 runs

	Summary
	'test' ran
	2.80 ± 0.27 times faster than 'baseline'

This is a dramatic improvement! Using my topo-walk stats commit, I see that
v1 walks 58,805 commits as part of the in-degree walk while v2 only walks
4,335 commits!

torvalds/linux: N=1000 (starting at v5.10)

	Benchmark #1: baseline
	Time (mean ± σ):      90.8 ms ±   3.7 ms    [User: 75.2 ms, System: 15.6 ms]
	Range (min … max):    85.2 ms …  96.2 ms    31 runs

	Benchmark #2: test
	Time (mean ± σ):      49.2 ms ±   3.5 ms    [User: 36.9 ms, System: 12.3 ms]
	Range (min … max):    42.9 ms …  54.0 ms    61 runs

	Summary
	'test' ran
	1.85 ± 0.15 times faster than 'baseline'

Similarly, v1 walked 38,161 commits compared to 4,340 by v2.

If I increase N to something like 10,000, then usually these values get
washed out due to the width of the parallel topics.

The place we were still using commit-date as a heuristic was paint_down_to_common
which caused a regression the first time we used v1, at least for certain cases.

Specifically, computing the merge-base in torvalds/linux between v4.8 and v4.9
hit a strangeness about a pair of recent commits both based on a very old commit,
but the generation numbers forced walking farther than necessary. This doesn't
happen with v2, but we see the overhead cost of the slabs:

	Benchmark #1: baseline
	Time (mean ± σ):     112.9 ms ±   2.8 ms    [User: 96.5 ms, System: 16.3 ms]
	Range (min … max):   107.7 ms … 118.0 ms    26 runs

	Benchmark #2: test
	Time (mean ± σ):     147.1 ms ±   5.2 ms    [User: 132.7 ms, System: 14.3 ms]
	Range (min … max):   141.4 ms … 162.2 ms    18 runs

	Summary
	'baseline' ran
	1.30 ± 0.06 times faster than 'test'

The overhead still exists for a more recent pair of versions (v5.0 and v5.1):

	Benchmark #1: baseline
	Time (mean ± σ):      25.1 ms ±   3.2 ms    [User: 18.6 ms, System: 6.5 ms]
	Range (min … max):    19.0 ms …  32.8 ms    99 runs

	Benchmark #2: test
	Time (mean ± σ):      33.3 ms ±   3.3 ms    [User: 26.5 ms, System: 6.9 ms]
	Range (min … max):    27.0 ms …  38.4 ms    105 runs

	Summary
	'baseline' ran
	1.33 ± 0.22 times faster than 'test'

I still think this overhead is worth it. In case not everyone agrees, it _might_
be worth a command-line option to skip the GDAT chunk. That would also prevent us
from eventually weaning entirely off generation number v1 and allowing the commit
date to take the full 64-bit column (instead of only 34 bits, saving 30 for
topo-levels).

Again, such a modification should not be considered required for this series.

> ----------------------------------------------------------------------------
> 
> Improvements left for a future series:
> 
>  * Save commits with generation data overflow and extra edge commits instead
>    of looping over all commits. cf. 858sbel67n.fsf@gmail.com
>  * Verify both topological levels and corrected commit dates when present.
>    cf. 85pn4tnk8u.fsf@gmail.com

These seem like reasonable things to delay for a later series
or for #leftoverbits

Thanks,
-Stolee

^ permalink raw reply	[relevance 4%]

* Re: [PATCH 1/2] pack-format.txt: define "varint" format
  @ 2020-12-29 22:41  6%       ` Martin Ågren
  0 siblings, 0 replies; 200+ results
From: Martin Ågren @ 2020-12-29 22:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ross Light, Git Mailing List

On Mon, 21 Dec 2020 at 22:40, Junio C Hamano <gitster@pobox.com> wrote:
>
> Martin Ågren <martin.agren@gmail.com> writes:
>
> > We define our varint format pretty much on the fly as we describe a pack
> > file entry. In preparation for referring to it in more places in this
> > document, define "varint" and refer to it.

> We need to be careful when using a generic "varint" to mean the
> older variant as it would confuse readers of OFS_DELTA section.
>
>         ... goes and looks ...
>
> The phrase "offset encoding" is used in the document to talk about
> OFS_DELTA offset.  It is actually what the rest of the code thinks
> is the canonical varint (defined in varint.[ch]).
>
> A way to avoid confusion would be to refrain from using "varint" as
> the primary way to describe this size field; instead explain it as
> the "size encoding", to match "offset encoding" used for OFS_DELTA.

Thank you very much for these comments. I will post a v2 soon, which
will do exactly this: avoid "varint" in favor of "size encoding".

> It may also help if we added to the description of "offset encoding"
> that it is what other parts of the system consider the canonical
> "varint" encoding.

I will leave this as #leftoverbits, though. I'll "only" fix the omission
reported by Ross.

Martin

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v5 1/1] mergetool: add automerge configuration
  2020-12-24  0:32  6%           ` Junio C Hamano
@ 2020-12-24  1:36  0%             ` Felipe Contreras
  0 siblings, 0 replies; 200+ results
From: Felipe Contreras @ 2020-12-24  1:36 UTC (permalink / raw)
  To: Junio C Hamano, Felipe Contreras
  Cc: git, David Aguilar, Johannes Sixt, Seth House

Junio C Hamano wrote:
> Felipe Contreras <felipe.contreras@gmail.com> writes:
> 
> >> Ah, I forgot about that one.  I think "the number of conflicts" was
> >> a UI mistake (the original that it mimics is "merge" from RCS suite,
> >> which uses 1 and 2 for "conflicts" and "trouble") but we know we
> >> will get conflicts, so it is wrong to expect success from the
> >> command.  Deliberately ignoring the return status is the right thing
> >> to do.
> >
> > I agree. My bet is that nobody is checking the return status of "git
> > merge-file" to find out the number of conflicts. Plus, how can you check
> > the difference between 255 conflicts and error -1?
> 
> Yup, I already mentioned UI mistake so you do not have to repeat

You said it was a UI mistake, not me. I am a different mind than yours.

This [1] is the first time *you* communicated it was a UI mistake.

This [2] is the first time *I* communicated it was a UI mistake.

I communicated that fact after you, so I did not repeat anything,
because I hadn't said that before. *You* did, not *me*.

> it to consume more bandwidth.

This is what is consuming bandwidth.

Not me stating *for the first time* that I agree what you just stated.

You could have skipped what I said *for the first time*, if you didn't
find it particularly interesting, and that would have saved bandwidth.

> > We could do something like --marker-size=13 to minimize the chances of
> > that happening.
> >
> > In that case I would prefer '/^<\{13\} /' (to avoid too many
> > characters). I see those regexes used elsewhere in git, but I don't know
> > how portable that is.
> 
> If it is used elsewhere with "sed", then that would be OK, but if it
> is not with "sed" but with "grep", that's quite a different story.

In t/t3427-rebase-subtree.sh there is:

  sed -e "s%\([0-9a-f]\{40\} \)files_subtree/%\1%"

Not sure if that counts. There are other places in the tests.

However, I don't see the point if the marker-size is a low enough number, like 7.

> > So, do we want those three things?
> >
> >  1. A non-standard marker-size
> >  2. Check beforehand the existence of those markers and disable
> >     automerge
> >  3. Check afterwards the existence of those markers and disable
> >     automerge
> 
> I do not think 3 is needed if we do 2 and I do not think 1 would
> particularly be useful *UNLESS* the code consults with the attribute
> system to see what marker size the path uses to avoid crashing with
> the non-standard marker-size the path already uses.

But what is more likely? a) That the marker-size is 7 (the default), or
b) that the marker-size is not the default, but that there's a
marker-size attribute *and* the value is precisely 13?

I think a) is way more likely than b).

> So the easiest would be not to do anything for now, with a note
> about known limitations in the doc.  The second easiest would be to
> do 2. alone.  We could do 1. to be more complete but I tend to think
> that it is better to leave it as #leftoverbits.

OK. I think 1. is low-hanging fruit, but I'm fine with not doing
anything, or trying 2.

I don't think 2. would be that hard, so I will try that before
re-rolling the series.

(unless somebody replies to my other pending arguments)

Cheers.

[1] https://lore.kernel.org/git/xmqqblek8e94.fsf@gitster.c.googlers.com/
[2] https://lore.kernel.org/git/5fe3dd62e12f8_7855a2081f@natae.notmuch/

-- 
Felipe Contreras

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v5 1/1] mergetool: add automerge configuration
  @ 2020-12-24  0:32  6%           ` Junio C Hamano
  2020-12-24  1:36  0%             ` Felipe Contreras
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2020-12-24  0:32 UTC (permalink / raw)
  To: Felipe Contreras; +Cc: git, David Aguilar, Johannes Sixt, Seth House

Felipe Contreras <felipe.contreras@gmail.com> writes:

>> Ah, I forgot about that one.  I think "the number of conflicts" was
>> a UI mistake (the original that it mimics is "merge" from RCS suite,
>> which uses 1 and 2 for "conflicts" and "trouble") but we know we
>> will get conflicts, so it is wrong to expect success from the
>> command.  Deliberately ignoring the return status is the right thing
>> to do.
>
> I agree. My bet is that nobody is checking the return status of "git
> merge-file" to find out the number of conflicts. Plus, how can you check
> the difference between 255 conflicts and error -1?

Yup, I already mentioned UI mistake so you do not have to repeat it
to consume more bandwidth.  We're in agreement already.

> We could do something like --marker-size=13 to minimize the chances of
> that happening.
>
> In that case I would prefer '/^<\{13\} /' (to avoid too many
> characters). I see those regexes used elsewhere in git, but I don't know
> how portable that is.

If it is used elsewhere with "sed", then that would be OK, but if it
is not with "sed" but with "grep", that's quite a different story.

> So, do we want those three things?
>
>  1. A non-standard marker-size
>  2. Check beforehand the existence of those markers and disable
>     automerge
>  3. Check afterwards the existence of those markers and disable
>     automerge

I do not think 3 is needed if we do 2 and I do not think 1 would
particularly be useful *UNLESS* the code consults with the attribute
system to see what marker size the path uses to avoid crashing with
the non-standard marker-size the path already uses.

So the easiest would be not to do anything for now, with a note
about known limitations in the doc.  The second easiest would be to
do 2. alone.  We could do 1. to be more complete but I tend to think
that it is better to leave it as #leftoverbits.

^ permalink raw reply	[relevance 6%]

* [PATCH v4 00/20] make "mktag" use fsck_tag() & more
  2020-12-09 20:01  2% ` [PATCH v3 " Ævar Arnfjörð Bjarmason
  2020-12-09 22:30  0%   ` Junio C Hamano
@ 2020-12-23  1:35  2%   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2020-12-23  1:35 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, brian m . carlson, Eric Sunshine,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

So, when re-rolling this with Junio's small fixup this grew in scope a
bit, but should paradoxically be easier to deal with even though it's
2x the size now. Read on:

Ævar Arnfjörð Bjarmason (20):
  mktag doc: say <hash> not <sha1>
  mktag doc: grammar fix, when exists -> when it exists
  mktag doc: update to explain why to use this
  mktag tests: don't needlessly use a subshell
  mktag tests: remove needless SHA-1 hardcoding
  mktag tests: improve verify_object() test coverage
  mktag tests: don't pipe to stderr needlessly
  mktag tests: don't create "mytag" twice
  mktag tests: stress test whitespace handling
  mktag tests: test "hash-object" compatibility

I re-arranged this series so the doc/test patches for existing
behavior all come first now. There are some new patches there (see
range-diff), but all are rather easy-to-review fixes or tests for
existing behavior.

  mktag: use default strbuf_read() hint
  mktag: remove redundant braces in one-line body "if"
  mktag: use puts(str) instead of printf("%s\n", str)

Trivial coding style changes, the puts() patch is new.

  mktag: use fsck instead of custom verify_tag()

Still the real meat of the series, unchanged in any meaningful way,
except in (as seen in the range-diff) carrying forward doc/test
changes made earlier.

  fsck: make fsck_config() re-usable
  mktag: allow turning off fsck.extraHeaderEntry

ditto unchanged.

  mktag: allow omitting the header/body \n separator

I discovered a regression in mktag in git since 2008 where it refuses
to accept input without an empty newline separating the body & message
in cases where there's no message.

Now we again accept the same input as hash-object, and with the new
"hash-object" test integration earlier in the series we're confident
that mktag & hash-object do the same thing in all these cases.

  mktag: convert to parse-options
  mktag: mark strings for translation
  mktag: add a --no-strict option

The #leftoverbits I suggested in v3 of converting to parse-options &
doing i18n for mktag, and finally supporting --no-strict so you can
make it behave like "fsck" does in its default mode.

 Documentation/git-hash-object.txt |   4 +
 Documentation/git-mktag.txt       |  42 +++++-
 builtin/fsck.c                    |  20 +--
 builtin/mktag.c                   | 235 +++++++++++-------------------
 fsck.c                            |  59 +++++++-
 fsck.h                            |  16 ++
 parse-options.h                   |   1 +
 t/t1006-cat-file.sh               |   2 +-
 t/t3800-mktag.sh                  | 211 +++++++++++++++++++++------
 9 files changed, 361 insertions(+), 229 deletions(-)

Range-diff:
 1:  aee3f52a47 =  1:  a31c305cfc mktag doc: say <hash> not <sha1>
 -:  ---------- >  2:  81cb4cba5c mktag doc: grammar fix, when exists -> when it exists
 8:  fa04664f7f !  3:  b4bc6f894c mktag doc: update to explain why to use this
    @@ Commit message
         documentation wouldn't have much of an idea what the difference
         was.
     
    -    Let's make it clear that it's to do with slightly different fsck
    -    validation logic, and cross-link the "mktag" and "hash-object"
    -    documentation to aid discover-ability.
    +    Let's allude to our own validation logic, and cross-link the "mktag"
    +    and "hash-object" documentation to aid discover-ability. A follow-up
    +    change to migrate "mktag" to use "fsck" validation will make the part
    +    about validation logic clearer.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ Documentation/git-mktag.txt: SYNOPSIS
     +    git hash-object -t tag -w --stdin <my-tag
     +
     +The difference is that mktag will die before writing the tag if the
    -+tag doesn't pass a linkgit:git-fsck[1] check.
    -+
    -+The "fsck" check done mktag is is stricter than what
    -+linkgit:git-fsck[1] would run by default in that all `fsck.<msg-id>`
    -+messages are promoted from warnings to errors (so e.g. a missing
    -+"tagger" line is an error). Extra headers in the object are also an
    -+error under mktag, but ignored by linkgit:git-fsck[1].
    ++tag doesn't pass a sanity check.
      
      Tag Format
      ----------
 4:  1f06b9c0cf =  4:  acb94e0289 mktag tests: don't needlessly use a subshell
 5:  5d1cb73ca3 =  5:  4ae76ec5e3 mktag tests: remove needless SHA-1 hardcoding
 6:  cf86f4ca37 =  6:  9effb4532b mktag tests: improve verify_object() test coverage
 -:  ---------- >  7:  b81d31a917 mktag tests: don't pipe to stderr needlessly
 -:  ---------- >  8:  11f59718b4 mktag tests: don't create "mytag" twice
 -:  ---------- >  9:  dd6b012b0c mktag tests: stress test whitespace handling
 -:  ---------- > 10:  56c6b562fd mktag tests: test "hash-object" compatibility
 2:  6e98557709 = 11:  1e2e4ec269 mktag: use default strbuf_read() hint
 3:  8e5fe08f15 = 12:  be2ab3edab mktag: remove redundant braces in one-line body "if"
 -:  ---------- > 13:  d8514df970 mktag: use puts(str) instead of printf("%s\n", str)
 7:  5812ee53c9 ! 14:  346d73cc97 mktag: use fsck instead of custom verify_tag()
    @@ Commit message
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    + ## Documentation/git-mktag.txt ##
    +@@ Documentation/git-mktag.txt: write a tag found in `my-tag`:
    +     git hash-object -t tag -w --stdin <my-tag
    + 
    + The difference is that mktag will die before writing the tag if the
    +-tag doesn't pass a sanity check.
    ++tag doesn't pass a linkgit:git-fsck[1] check.
    ++
    ++The "fsck" check done mktag is stricter than what linkgit:git-fsck[1]
    ++would run by default in that all `fsck.<msg-id>` messages are promoted
    ++from warnings to errors (so e.g. a missing "tagger" line is an error).
    ++
    ++Extra headers in the object are also an error under mktag, but ignored
    ++by linkgit:git-fsck[1]
    + 
    + Tag Format
    + ----------
    +
      ## builtin/mktag.c ##
     @@
      #include "tag.h"
    @@ builtin/mktag.c
     +
     +	buffer = read_object_file(tagged_oid, &type, &size);
     +	if (!buffer)
    -+		die("could not read tagged object '%s'\n",
    ++		die("could not read tagged object '%s'",
     +		    oid_to_hex(tagged_oid));
     +	if (type != *tagged_type)
    -+		die("object '%s' tagged as '%s', but is a '%s' type\n",
    ++		die("object '%s' tagged as '%s', but is a '%s' type",
     +		    oid_to_hex(tagged_oid),
     +		    type_name(*tagged_type), type_name(type));
     +
    @@ t/t3800-mktag.sh: tagger  <> 0 +0000
      
     -check_verify_failure 'disallow missing tag author name' \
     -	'^error: char.*: missing tagger name$'
    -+test_expect_success 'allow missing tag author name' '
    -+	git mktag <tag.sig
    -+'
    ++test_expect_mktag_success 'allow missing tag author name'
      
      ############################################################
      # 14. disallow missing tag author name
    @@ t/t3800-mktag.sh: tagger T A Gger <
      
      ############################################################
      # 15. allow empty tag email
    -@@ t/t3800-mktag.sh: test_expect_success \
    -     'git mktag <tag.sig >.git/refs/tags/mytag 2>message'
    +@@ t/t3800-mktag.sh: EOF
    + test_expect_mktag_success 'allow empty tag email'
      
      ############################################################
     -# 16. disallow spaces in tag email
    @@ t/t3800-mktag.sh: tagger T A Gger <tag ger@example.com> 0 +0000
      
     -check_verify_failure 'disallow spaces in tag email' \
     -	'^error: char.*: malformed tagger field$'
    -+test_expect_success 'allow spaces in tag email like fsck' '
    -+	git mktag <tag.sig
    -+'
    ++test_expect_mktag_success 'allow spaces in tag email like fsck'
      
      ############################################################
      # 17. disallow missing tag timestamp
    @@ t/t3800-mktag.sh: tagger T A Gger <tagger@example.com> 1206478233 -1430
      
     -check_verify_failure 'detect invalid tag timezone3' \
     -	'^error: char.*: malformed tag timezone$'
    -+test_expect_success 'allow invalid tag timezone' '
    -+	git mktag <tag.sig
    -+'
    ++test_expect_mktag_success 'allow invalid tag timezone'
      
      ############################################################
      # 23. detect invalid header entry
    @@ t/t3800-mktag.sh: this line should not be here
      check_verify_failure 'detect invalid header entry' \
     -	'^error: char.*: trailing garbage in tag header$'
     +	'^error:.* extraHeaderEntry:'
    -+
    -+cat >tag.sig <<EOF
    -+object $head
    -+type commit
    -+tag mytag
    -+tagger T A Gger <tagger@example.com> 1206478233 -0500
    -+
    -+
    -+this line comes after an extra newline
    -+EOF
    -+
    -+test_expect_success \
    -+    'allow extra newlines at start of body' \
    -+    'git mktag <tag.sig >.git/refs/tags/mytag 2>message'
    + 
    + cat >tag.sig <<EOF
    + object $head
    +@@ t/t3800-mktag.sh: tagger T A Gger <tagger@example.com> 1206478233 -0500$space
    + EOF
    + 
    + check_verify_failure 'extra whitespace at end of headers' \
    +-	'^error: char.*: malformed tag timezone$'
    ++	'^error:.* badTimezone:'
    + 
    + cat >tag.sig <<EOF
    + object $head
    +@@ t/t3800-mktag.sh: tagger T A Gger <tagger@example.com> 1206478233 -0500
    + EOF
    + 
    + check_verify_failure 'disallow no header / body newline separator' \
    +-	'^error: char.*: trailing garbage in tag header$'
    ++	'^error:.* extraHeaderEntry:'
      
      ############################################################
      # 24. create valid tag
 9:  30eff9170f = 15:  0e7994d8fc fsck: make fsck_config() re-usable
10:  11139ec2b8 ! 16:  5e8046022b mktag: allow turning off fsck.extraHeaderEntry
    @@ Commit message
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Documentation/git-mktag.txt ##
    -@@ Documentation/git-mktag.txt: tag doesn't pass a linkgit:git-fsck[1] check.
    - The "fsck" check done mktag is is stricter than what
    - linkgit:git-fsck[1] would run by default in that all `fsck.<msg-id>`
    - messages are promoted from warnings to errors (so e.g. a missing
    --"tagger" line is an error). Extra headers in the object are also an
    --error under mktag, but ignored by linkgit:git-fsck[1].
    -+"tagger" line is an error).
    -+
    -+Extra headers in the object are also an error under mktag, but ignored
    +@@ Documentation/git-mktag.txt: would run by default in that all `fsck.<msg-id>` messages are promoted
    + from warnings to errors (so e.g. a missing "tagger" line is an error).
    + 
    + Extra headers in the object are also an error under mktag, but ignored
    +-by linkgit:git-fsck[1]
     +by linkgit:git-fsck[1]. This extra check can be turned off by setting
     +the appropriate `fsck.<msg-id>` varible:
     +
 -:  ---------- > 17:  32698e1d00 mktag: allow omitting the header/body \n separator
 -:  ---------- > 18:  b6a22f2f99 mktag: convert to parse-options
 -:  ---------- > 19:  7fc0b81df7 mktag: mark strings for translation
 -:  ---------- > 20:  6fa443d528 mktag: add a --no-strict option
-- 
2.29.2.222.g5d2a92d10f8


^ permalink raw reply	[relevance 2%]

* Re: [PATCH v3 00/10] make "mktag" use fsck_tag()
  2020-12-09 20:01  2% ` [PATCH v3 " Ævar Arnfjörð Bjarmason
@ 2020-12-09 22:30  0%   ` Junio C Hamano
  2020-12-23  1:35  2%   ` [PATCH v4 00/20] make "mktag" use fsck_tag() & more Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 200+ results
From: Junio C Hamano @ 2020-12-09 22:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, brian m . carlson, Eric Sunshine,
	Johannes Schindelin

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> This version should address all the comments Junio made on v2. Changes:
>
>  * The whole "extra" fsck option is gone, I just didn't realize I
>    could set the new check to "ignore", and then manually promote it.
>
>  * Ejected "mktag: reword write_object_file() error". It was the same
>    phrasing as "git tag" uses, let's just keep it.
>
>  * Clarifications in docs/commit messages
>
>  * There's 2 extra patches at the end now which take the first steps
>    into making "git mktag" more of a normal builtin. It reads fsck.*
>    config variables, so you can turn off that "no extra headers" check
>    through the normal fsck.<msg-id>=ignore config.
>
>    It should also be moved to getopts, and we could make it support
>    --no-strict to have the same idea of error/warning as fsck itself,
>    but that's #leftoverbits, along with moving it to i18n.
>
>    It would be nice to have patches 1-8 merged down if they're deemed
>    ready, and if 9-10 aren't deemed wanted just discard them. I think
>    it makes sense though...

Thanks.  I haven't read the individual patches, but spotted an
obvious "is is" typo in the doc while scanning through the end
result of applying all of them.

 Documentation/git-mktag.txt | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git c/Documentation/git-mktag.txt w/Documentation/git-mktag.txt
index e1506dde56..2c1afedef6 100644
--- c/Documentation/git-mktag.txt
+++ w/Documentation/git-mktag.txt
@@ -27,10 +27,9 @@ write a tag found in `my-tag`:
 The difference is that mktag will die before writing the tag if the
 tag doesn't pass a linkgit:git-fsck[1] check.
 
-The "fsck" check done mktag is is stricter than what
-linkgit:git-fsck[1] would run by default in that all `fsck.<msg-id>`
-messages are promoted from warnings to errors (so e.g. a missing
-"tagger" line is an error).
+The "fsck" check done mktag is stricter than what linkgit:git-fsck[1]
+would run by default in that all `fsck.<msg-id>` messages are promoted
+from warnings to errors (so e.g. a missing "tagger" line is an error).
 
 Extra headers in the object are also an error under mktag, but ignored
 by linkgit:git-fsck[1]. This extra check can be turned off by setting

^ permalink raw reply related	[relevance 0%]

* [PATCH v3 00/10] make "mktag" use fsck_tag()
  @ 2020-12-09 20:01  2% ` Ævar Arnfjörð Bjarmason
  2020-12-09 22:30  0%   ` Junio C Hamano
  2020-12-23  1:35  2%   ` [PATCH v4 00/20] make "mktag" use fsck_tag() & more Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2020-12-09 20:01 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, brian m . carlson, Eric Sunshine,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

This version should address all the comments Junio made on v2. Changes:

 * The whole "extra" fsck option is gone, I just didn't realize I
   could set the new check to "ignore", and then manually promote it.

 * Ejected "mktag: reword write_object_file() error". It was the same
   phrasing as "git tag" uses, let's just keep it.

 * Clarifications in docs/commit messages

 * There's 2 extra patches at the end now which take the first steps
   into making "git mktag" more of a normal builtin. It reads fsck.*
   config variables, so you can turn off that "no extra headers" check
   through the normal fsck.<msg-id>=ignore config.

   It should also be moved to getopts, and we could make it support
   --no-strict to have the same idea of error/warning as fsck itself,
   but that's #leftoverbits, along with moving it to i18n.

   It would be nice to have patches 1-8 merged down if they're deemed
   ready, and if 9-10 aren't deemed wanted just discard them. I think
   it makes sense though...
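
As an illustration of the fsck.<msg-id>=ignore point above, here is a
minimal sketch, assuming the series is applied as described. The helper
name make_tag_with_extra_header and the tag contents are made up for
this example; the extra header line mirrors the one used in
t/t3800-mktag.sh.

```shell
# Print a tag object with one unknown header line after "tagger".
# $1 is the object id to tag; every other value is illustrative only.
make_tag_with_extra_header () {
	printf 'object %s\n' "$1"
	printf 'type commit\n'
	printf 'tag mytag\n'
	printf 'tagger T A Gger <tagger@example.com> 1206478233 -0500\n'
	printf 'this line should not be here\n'
	printf '\n'
	printf 'message\n'
}

# Intended use, inside a repository with at least one commit
# (not run here, and the outcomes are what the cover letter describes):
#   make_tag_with_extra_header "$(git rev-parse HEAD)" >tag.sig
#   git mktag <tag.sig                                  # rejected: extraHeaderEntry
#   git -c fsck.extraHeaderEntry=ignore mktag <tag.sig  # accepted
```

The second invocation relies on "git mktag" reading fsck.* config, which
is what patches 9-10 add.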

Ævar Arnfjörð Bjarmason (10):
  mktag doc: say <hash> not <sha1>
  mktag: use default strbuf_read() hint
  mktag: remove redundant braces in one-line body "if"
  mktag tests: don't needlessly use a subshell
  mktag tests: remove needless SHA-1 hardcoding
  mktag tests: improve verify_object() test coverage
  mktag: use fsck instead of custom verify_tag()
  mktag doc: update to explain why to use this
  fsck: make fsck_config() re-usable
  mktag: allow turning off fsck.extraHeaderEntry

 Documentation/git-hash-object.txt |   4 +
 Documentation/git-mktag.txt       |  34 ++++-
 builtin/fsck.c                    |  20 +--
 builtin/mktag.c                   | 204 +++++++++---------------------
 fsck.c                            |  57 ++++++++-
 fsck.h                            |  16 +++
 t/t1006-cat-file.sh               |   2 +-
 t/t3800-mktag.sh                  | 132 ++++++++++++++-----
 8 files changed, 261 insertions(+), 208 deletions(-)

Range-diff:
 1:  f46abb37df9 =  1:  aee3f52a478 mktag doc: say <hash> not <sha1>
 2:  1b4d9a53302 =  2:  6e98557709a mktag: use default strbuf_read() hint
 3:  83f4af6013e <  -:  ----------- mktag: reword write_object_file() error
 4:  bca1484ed96 =  3:  8e5fe08f155 mktag: remove redundant braces in one-line body "if"
 5:  ac7c4097c90 =  4:  1f06b9c0cf9 mktag tests: don't needlessly use a subshell
 6:  5e076659e45 !  5:  5d1cb73ca35 mktag tests: remove needless SHA-1 hardcoding
    @@ t/t3800-mktag.sh: EOF
      
      ############################################################
     -#  3. object line SHA1 check
    -+#  3. object line SHA check
    ++#  3. object line hash check
      
      cat >tag.sig <<EOF
     -object zz9e9b33986b1c2670fff52c5067603117b3e895
 7:  a048c3e6401 !  6:  cf86f4ca37d mktag tests: improve verify_object() test coverage
    @@ t/t3800-mktag.sh: check_verify_failure '"type" line type-name length check' \
      
      ############################################################
     -#  9. verify object (SHA1/type) check
    -+#  9. verify object (SHA/type) check
    ++#  9. verify object (hash/type) check
      
      cat >tag.sig <<EOF
      object $(test_oid deadbeef)
    @@ t/t3800-mktag.sh: check_verify_failure '"type" line type-name length check' \
     +
     +EOF
     +
    -+check_verify_failure 'verify object (SHA/type) check -- correct type, nonexisting object' \
    ++check_verify_failure 'verify object (hash/type) check -- correct type, nonexisting object' \
     +	'^error: char7: could not verify object.*$'
     +
     +cat >tag.sig <<EOF
    @@ t/t3800-mktag.sh: check_verify_failure '"type" line type-name length check' \
      EOF
      
     -check_verify_failure 'verify object (SHA1/type) check' \
    -+check_verify_failure 'verify object (SHA/type) check -- made-up type, nonexisting object' \
    ++check_verify_failure 'verify object (hash/type) check -- made-up type, nonexisting object' \
     +	'^fatal: invalid object type'
     +
     +cat >tag.sig <<EOF
    @@ t/t3800-mktag.sh: check_verify_failure '"type" line type-name length check' \
     +
     +EOF
     +
    -+check_verify_failure 'verify object (SHA/type) check -- incorrect type, valid object' \
    ++check_verify_failure 'verify object (hash/type) check -- incorrect type, valid object' \
      	'^error: char7: could not verify object.*$'
      
     +cat >tag.sig <<EOF
    @@ t/t3800-mktag.sh: check_verify_failure '"type" line type-name length check' \
     +
     +EOF
     +
    -+check_verify_failure 'verify object (SHA/type) check -- incorrect type, valid object' \
    ++check_verify_failure 'verify object (hash/type) check -- incorrect type, valid object' \
     +	'^error: char7: could not verify object'
     +
      ############################################################
 8:  dab44d32359 <  -:  ----------- fsck: add new "extra" checks for "mktag"
 9:  8ff853caeea !  7:  5812ee53c97 mktag: use fsck instead of custom verify_tag()
    @@ Commit message
         back to the same commit[1]. Let's unify them so we're not maintaining
         two sets functions to verify that a tag is OK.
     
    -    Moving to fsck_tag() required teaching it to optionally use some
    -    validations that only the old mktag code could perform. That was done
    -    in an earlier commit, the "extraHeaderEntry" and
    -    "extraHeaderBodyNewline" tests being added here make use of that
    -    logic.
    +    The behavior of fsck_tag() and the old "mktag" code being removed here
    +    is different in few aspects.
     
    -    There was other "mktag" validation logic that I think makes sense to
    -    just remove. Namely:
    +    I think it makes sense to remove some of those checks, namely:
     
          A. fsck only cares that the timezone matches [-+][0-9]{4}. The mktag
             code disallowed values larger than 1400.
    @@ Commit message
          C. Like B, but "mktag" disallowed spaces in the <email> part, fsck
             allows it.
     
    -    We didn't only lose obscure validation logic, we also gained some:
    +    In some ways fsck_tag() is stricter than "mktag" was, namely:
     
          D. fsck disallows zero-padded dates, but mktag didn't care. So
             e.g. the timestamp "0000000000 +0000" produces an error now. A
             test in "t1006-cat-file.sh" relied on this, it's been changed to
             use "hash-object" (without fsck) instead.
     
    +    There was one check I deemed worth keeping by porting it over to
    +    fsck_tag():
    +
    +     E. "mktag" did not allow any custom headers, and by extension (as an
    +        empty commit is allowed) also forbade an extra stray trailing
    +        newline after the headers it knew about.
    +
    +        Add a new check in the "ignore" category to fsck and use it. This
    +        somewhat abuses the facility added in efaba7cc77f (fsck:
    +        optionally ignore specific fsck issues completely, 2015-06-22).
    +
    +        This is somewhat of hack, but probably the least invasive change
    +        we can make here. The fsck command will shuffle these categories
    +        around, e.g. under --strict the "info" becomes a "warn" and "warn"
    +        becomes "error". Existing users of fsck's (and others,
    +        e.g. index-pack) --strict option rely on this.
    +
    +        So we need to put something into a category that'll be ignored by
    +        all existing users of the API. Pretending that
    +        fsck.extraHeaderEntry=error ("ignore" by default) was set serves
    +        to do this for us.
    +
         1. ec4465adb38 (Add "tag" objects that can be used to sign other
            objects., 2005-04-25)
     
    @@ builtin/mktag.c
     +	switch (msg_type) {
     +	case FSCK_WARN:
     +	case FSCK_ERROR:
    -+	case FSCK_EXTRA:
     +		/*
     +		 * We treat both warnings and errors as errors, things
     +		 * like missing "tagger" lines are "only" warnings
    @@ builtin/mktag.c: int cmd_mktag(int argc, const char **argv, const char *prefix)
     -	   "object <sha1>\ntype\ntagger " */
     -	if (verify_tag(buf.buf, buf.len) < 0)
     -		die("invalid tag signature file");
    -+	fsck_options.extra = 1;
     +	fsck_options.error_func = mktag_fsck_error_func;
    ++	fsck_set_msg_type(&fsck_options, "extraheaderentry", "warn");
     +	if (fsck_tag_standalone(NULL, buf.buf, buf.len, &fsck_options,
     +				&tagged_oid, &tagged_type))
     +		die("tag on stdin did not pass our strict fsck check");
    @@ builtin/mktag.c: int cmd_mktag(int argc, const char **argv, const char *prefix)
     +		die("tag on stdin did not refer to a valid object");
      
      	if (write_object_file(buf.buf, buf.len, tag_type, &result) < 0)
    - 		die("unable to write annotated tag object");
    + 		die("unable to write tag file");
     
      ## fsck.c ##
    +@@ fsck.c: static struct oidset gitmodules_done = OIDSET_INIT;
    + 	/* infos (reported as warnings, but ignored by default) */ \
    + 	FUNC(GITMODULES_PARSE, INFO) \
    + 	FUNC(BAD_TAG_NAME, INFO) \
    +-	FUNC(MISSING_TAGGER_ENTRY, INFO)
    ++	FUNC(MISSING_TAGGER_ENTRY, INFO) \
    ++	/* ignored (elevated when requested) */ \
    ++	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
    + 
    + #define MSG_ID(id, msg_type) FSCK_MSG_##id,
    + enum fsck_msg_id {
     @@ fsck.c: static int fsck_tag(const struct object_id *oid, const char *buffer,
      		    unsigned long size, struct fsck_options *options)
      {
    @@ fsck.c: static int fsck_tag(const struct object_id *oid, const char *buffer,
      		ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_TYPE, "invalid 'type' value");
      	if (ret)
      		goto done;
    +@@ fsck.c: static int fsck_tag(const struct object_id *oid, const char *buffer,
    + 	else
    + 		ret = fsck_ident(&buffer, oid, OBJ_TAG, options);
    + 
    ++	if (!starts_with(buffer, "\n")) {
    ++		/*
    ++		 * The verify_headers() check will allow
    ++		 * e.g. "[...]tagger <tagger>\nsome
    ++		 * garbage\n\nmessage" to pass, thinking "some
    ++		 * garbage" could be a custom header. E.g. "mktag"
    ++		 * doesn't want any unknown headers.
    ++		 */
    ++		ret = report(options, oid, OBJ_TAG, FSCK_MSG_EXTRA_HEADER_ENTRY, "invalid format - extra header(s) after 'tagger'");
    ++		if (ret)
    ++			goto done;
    ++	}
    ++
    + done:
    + 	strbuf_release(&sb);
    + 	return ret;
     
      ## fsck.h ##
     @@ fsck.h: int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
    @@ t/t3800-mktag.sh: tagger . <> 0 +0000
     +check_verify_failure '"object" line label check' '^error:.* missingObject:'
      
      ############################################################
    - #  3. object line SHA check
    + #  3. object line hash check
     @@ t/t3800-mktag.sh: tagger . <> 0 +0000
      
      EOF
    @@ t/t3800-mktag.sh: tag mytag
     +	'^error:.* badType:'
      
      ############################################################
    - #  9. verify object (SHA/type) check
    + #  9. verify object (hash/type) check
     @@ t/t3800-mktag.sh: tagger . <> 0 +0000
      EOF
      
    - check_verify_failure 'verify object (SHA/type) check -- correct type, nonexisting object' \
    + check_verify_failure 'verify object (hash/type) check -- correct type, nonexisting object' \
     -	'^error: char7: could not verify object.*$'
     +	'^fatal: could not read tagged object'
      
    @@ t/t3800-mktag.sh: tagger . <> 0 +0000
     @@ t/t3800-mktag.sh: tagger . <> 0 +0000
      EOF
      
    - check_verify_failure 'verify object (SHA/type) check -- made-up type, nonexisting object' \
    + check_verify_failure 'verify object (hash/type) check -- made-up type, nonexisting object' \
     -	'^fatal: invalid object type'
     +	'^error:.* badType:'
      
    @@ t/t3800-mktag.sh: tagger . <> 0 +0000
     @@ t/t3800-mktag.sh: tagger . <> 0 +0000
      EOF
      
    - check_verify_failure 'verify object (SHA/type) check -- incorrect type, valid object' \
    + check_verify_failure 'verify object (hash/type) check -- incorrect type, valid object' \
     -	'^error: char7: could not verify object.*$'
     +	'^error:.* badType:'
      
    @@ t/t3800-mktag.sh: tagger . <> 0 +0000
     @@ t/t3800-mktag.sh: tagger . <> 0 +0000
      EOF
      
    - check_verify_failure 'verify object (SHA/type) check -- incorrect type, valid object' \
    + check_verify_failure 'verify object (hash/type) check -- incorrect type, valid object' \
     -	'^error: char7: could not verify object'
     +	'^fatal: object.*tagged as.*tree.*but is.*commit'
      
    @@ t/t3800-mktag.sh: this line should not be here
     +tagger T A Gger <tagger@example.com> 1206478233 -0500
     +
     +
    -+this line should be one line up
    ++this line comes after an extra newline
     +EOF
     +
    -+check_verify_failure 'detect invalid header entry' \
    -+	'^error:.* extraHeaderBodyNewline:'
    ++test_expect_success \
    ++    'allow extra newlines at start of body' \
    ++    'git mktag <tag.sig >.git/refs/tags/mytag 2>message'
      
      ############################################################
      # 24. create valid tag
10:  e38feefd3f8 !  8:  fa04664f7f1 mktag doc: update to explain why to use this
    @@ Documentation/git-mktag.txt: SYNOPSIS
     +Reads a tag contents on standard input and creates a tag object. The
     +output is the new tag's <object> identifier.
     +
    -+This command accepts a subset of what linkgit:git-hash-object[1] would
    -+accept with `-t tag --stdin`. I.e. both of these work:
    ++This command is mostly equivalent to linkgit:git-hash-object[1]
    ++invoked with `-t tag -w --stdin`. I.e. both of these will create and
    ++write a tag found in `my-tag`:
     +
     +    git mktag <my-tag
    -+    git hash-object -t tag --stdin <my-tag
    ++    git hash-object -t tag -w --stdin <my-tag
     +
    -+The difference between the two is that mktag does the equivalent of a
    -+linkgit:git-fsck(1) check on its input, and furthermore disallows some
    -+thing linkgit:git-hash-object[1] would pass, e.g. extra headers in the
    -+object before the message.
    ++The difference is that mktag will die before writing the tag if the
    ++tag doesn't pass a linkgit:git-fsck[1] check.
    ++
    ++The "fsck" check done mktag is is stricter than what
    ++linkgit:git-fsck[1] would run by default in that all `fsck.<msg-id>`
    ++messages are promoted from warnings to errors (so e.g. a missing
    ++"tagger" line is an error). Extra headers in the object are also an
    ++error under mktag, but ignored by linkgit:git-fsck[1].
      
      Tag Format
      ----------
    @@ Documentation/git-mktag.txt: exists, is separated by a blank line from the heade
      message part may contain a signature that Git itself doesn't
      care about, but that can be verified with gpg.
      
    -+HISTORY
    -+-------
    -+
    -+In versions of Git before v2.30.0 the "mktag" command's validation
    -+logic was subtly different than that of linkgit:git-fsck[1]. It is now
    -+a strict superset of linkgit:git-fsck[1]'s validation logic.
    -+
     +SEE ALSO
     +--------
     +linkgit:git-hash-object[1],
 -:  ----------- >  9:  30eff9170fb fsck: make fsck_config() re-usable
 -:  ----------- > 10:  11139ec2b8d mktag: allow turning off fsck.extraHeaderEntry
-- 
2.29.2.222.g5d2a92d10f8


^ permalink raw reply	[relevance 2%]

* Re: Unexpected behavior with branch.*.{remote,pushremote,merge}
  2020-12-04 21:00  6%       ` Jeff King
@ 2020-12-04 22:20  0%         ` Ben Denhartog
  0 siblings, 0 replies; 200+ results
From: Ben Denhartog @ 2020-12-04 22:20 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: git

I guess from my perspective, for these repositories, my fork _is_ the "origin"; I tend to mirror the repositories I contribute to (e.g. use the "fork" feature on Git{Hub,Lab}/etc), then clone my mirror, which lends itself to that mental model (origin is "mine"). 

-- 
  Ben Denhartog
  ben@sudoforge.com

On Fri, Dec 4, 2020, at 14:00, Jeff King wrote:
> On Fri, Dec 04, 2020 at 11:57:23AM -0800, Junio C Hamano wrote:
> 
> > > * Refactor away from usage of FETCH_HEAD
> > 
> > Yes, "fetch --all" is about updating the remote-tracking branches
> > and in retrospect, perhaps we might have avoided confusion if we
> > made it not to touch FETCH_HEAD, but it is not going to change now.
> 
> I think its behavior of appending all of the entries is sensible (or at
> least is the least-surprising thing). The only weird part is that it
> does not keep the "make sure heads for merging come before not-for-merge
> entries" property that individual ones have.
> 
> It could take a final pass after all of the sub-fetches have run and do
> that. I don't have any plans to work on it, but I'm tempted to call it a
> #leftoverbits candidate.
> 
> > > * Set `remote.pushdefault = origin`
> > > * Set `push.default = current` (instead of `simple`, and is what
> > > my global config sets this to)
> > 
> > I have a feeling that simple vs current does not make a difference
> > if you are pushing main to main, and if so, push.default could be
> > left to the default settings of 'simple'.  But the key to successful
> > use of the triangular workflow is to configure so that "fetch/pull"
> > goes to one place (i.e. your upstream) and "push" goes to another
> > (i.e. your publishing repository), and "remote.pushdefault" is a
> > good ingredient to do so.
> 
> I think my advice is just out-of-date (by quite a lot). In the early
> days, I remember being bitten by (or at least confused by) simple and
> how its use of upstream could work with multiple remotes. But we long
> ago fixed that, with ed2b18292b (push: change `simple` to accommodate
> triangular workflows, 2013-06-19), and these days it is explicitly
> documented to work the same as "current" when pushing to another remote.
> 
> > It is however more common to use 'origin' as the name of your
> > upstream repository (so that "git fetch" and "git pull" would grab
> > things from there by default) and set remote.pushdefault to the
> > remote you push into, though (iow, I found remote.pushdefault
> > pointing at 'origin' a bit unusual).  Doing so may make your
> > triangular workflow work smoother.
> 
> Yeah, I wasn't going to nitpick his remote names, but that's the same
> convention I use. :) If people have custom forks of a repository that I
> access, I usually just name the remote for them after their username
> (including my own).
> 
> -Peff
>

^ permalink raw reply	[relevance 0%]

* Re: Unexpected behavior with branch.*.{remote,pushremote,merge}
  @ 2020-12-04 21:00  6%       ` Jeff King
  2020-12-04 22:20  0%         ` Ben Denhartog
  0 siblings, 1 reply; 200+ results
From: Jeff King @ 2020-12-04 21:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ben Denhartog, git

On Fri, Dec 04, 2020 at 11:57:23AM -0800, Junio C Hamano wrote:

> > * Refactor away from usage of FETCH_HEAD
> 
> Yes, "fetch --all" is about updating the remote-tracking branches
> and in retrospect, perhaps we might have avoided confusion if we
> made it not to touch FETCH_HEAD, but it is not going to change now.

I think its behavior of appending all of the entries is sensible (or at
least is the least-surprising thing). The only weird part is that it
does not keep the "make sure heads for merging come before not-for-merge
entries" property that individual ones have.

It could take a final pass after all of the sub-fetches have run and do
that. I don't have any plans to work on it, but I'm tempted to call it a
#leftoverbits candidate.
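
A rough sketch of what such a final pass could look like, done from the
outside in shell rather than in fetch itself. It assumes the usual
FETCH_HEAD layout of three tab-separated fields, with "not-for-merge"
in the second field for entries that are not meant to be merged; the
function name reorder_fetch_head is hypothetical.

```shell
# Print the FETCH_HEAD-style file $1 with for-merge entries first.
# Relative order within each group is preserved (a stable partition).
reorder_fetch_head () {
	awk -F '\t' '$2 != "not-for-merge"' "$1" &&
	awk -F '\t' '$2 == "not-for-merge"' "$1"
}
```

This reads the file twice instead of sorting, so entries never move
relative to others in the same group.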

> > * Set `remote.pushdefault = origin`
> > * Set `push.default = current` (instead of `simple`, and is what
> > my global config sets this to)
> 
> I have a feeling that simple vs current does not make a difference
> if you are pushing main to main, and if so, push.default could be
> left to the default settings of 'simple'.  But the key to successful
> use of the triangular workflow is to configure so that "fetch/pull"
> goes to one place (i.e. your upstream) and "push" goes to another
> (i.e. your publishing repository), and "remote.pushdefault" is a
> good ingredient to do so.

I think my advice is just out-of-date (by quite a lot). In the early
days, I remember being bitten by (or at least confused by) simple and
how its use of upstream could work with multiple remotes. But we long
ago fixed that, with ed2b18292b (push: change `simple` to accommodate
triangular workflows, 2013-06-19), and these days it is explicitly
documented to work the same as "current" when pushing to another remote.

> It is however more common to use 'origin' as the name of your
> upstream repository (so that "git fetch" and "git pull" would grab
> things from there by default) and set remote.pushdefault to the
> remote you push into, though (iow, I found remote.pushdefault
> pointing at 'origin' a bit unusual).  Doing so may make your
> triangular workflow work smoother.

Yeah, I wasn't going to nitpick his remote names, but that's the same
convention I use. :) If people have custom forks of a repository that I
access, I usually just name the remote for them after their username
(including my own).
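
For reference, the convention discussed above could be set up roughly
like this. It is only a sketch: the function name setup_triangular, the
URLs and the fork remote name are placeholders, and it assumes a
repository that has no "origin" remote yet.

```shell
# $1: upstream URL (fetched from as "origin")
# $2: URL of your publishing repository (your fork)
# $3: remote name for the fork, e.g. your username
setup_triangular () {
	git remote add origin "$1" &&
	git remote add "$3" "$2" &&
	# "git push" without arguments goes to the fork ...
	git config remote.pushdefault "$3" &&
	# ... and 'simple' acts like 'current' when pushing to a
	# remote other than the one you pull from.
	git config push.default simple
}
```

With that in place, "git fetch" and "git pull" talk to origin while
"git push" goes to the fork.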

-Peff

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v3] usage: add trace2 entry upon warning()
  @ 2020-11-24 22:15  6%   ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2020-11-24 22:15 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> Emit a trace2 error event whenever warning() is called, just like when
> die(), error(), or usage() is called.
>
> This helps debugging issues that would trigger warnings but not errors.
> In particular, this might have helped debugging an issue I encountered
> with commit graphs at $DAYJOB [1].
>
> There is a tradeoff between including potentially relevant messages and
> cluttering up the trace output produced. I think that warning() messages
> should be included in traces, because by its nature, Git is used over
> multiple invocations of the Git tool, and a failure (currently traced)
> in a Git invocation might be caused by an unexpected interaction in a
> previous Git invocation that only has a warning (currently untraced) as
> a symptom - as is the case in [1].
>
> [1] https://lore.kernel.org/git/20200629220744.1054093-1-jonathantanmy@google.com/
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> Thanks, Junio. That comment looks good. Here is the version with Junio's
> suggested comment included, for everyone's reference.

Heh, I meant it as more of a #leftoverbit, not directly applicable to
this particular patch, but would be a good follow-up topic, as I would
have expected that die/warn/error should lose their own comments where
they call trace2_cmd_error_va() in the same patch that adds the comment
for callers near the function.

Let's use v2 if the difference between v2 and v3 is only the
addition of the comment before trace2_cmd_error_va() function decl
to help the callers.


^ permalink raw reply	[relevance 6%]

* Re: [PATCH 1/7] t1300: test "set all" mode with value_regex
  @ 2020-11-22  3:31  4%         ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2020-11-22  3:31 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Jeff King, Derrick Stolee via GitGitGadget, git, Jonathan Nieder,
	Emily Shaffer, Johannes Schindelin, Derrick Stolee,
	Derrick Stolee

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>> So that got a bit off-track, but I think:
>> 
>>   - t1300 already is very much like this, so it's not a new thing
>> 
>>   - but I would be happy not to see it go further in that direction,
>>     even if it means inconsistency with the rest of the script
>
> I agree we shouldn't make things worse.

I started looking at early parts of t1300 and here is how far I
managed to get before I could no longer keep staring at the existing
tests without vomiting.

I am reasonably happy with the "let's keep the vanilla untouched one
in .git/config-initial, refrain from using [core] and other sections
that MUST be in the initial configuration for testing, and use a
wrapper that reads expected addition to the initial one from the
standard input for validation" approach I came up with, but I am not
happy with the name 'compare_expect'; 'validate_config_result' might
be a better name.

In any case, the reason I am sending this out early is to see if
people find this approach to cleaning things up a sensible one.  If we
can find consensus, perhaps I (or somebody else---hint, hint) can find
time to do the #leftoverbits following the approach after the
ds/config-literal-value and ds/maintenance-part-3 topics graduate
to 'master'.



 t/t1300-config.sh | 139 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 71 insertions(+), 68 deletions(-)

diff --git c/t/t1300-config.sh w/t/t1300-config.sh
index df13afaffd..c33520d7fa 100755
--- c/t/t1300-config.sh
+++ w/t/t1300-config.sh
@@ -7,80 +7,84 @@ test_description='Test git config in different settings'
 
 . ./test-lib.sh
 
-test_expect_success 'clear default config' '
-	rm -f .git/config
+test_expect_success 'save away default config' '
+	cp .git/config .git/config-initial
 '
 
-cat > expect << EOF
-[core]
-	penguin = little blue
-EOF
-test_expect_success 'initial' '
-	git config core.penguin "little blue" &&
+compare_expect () {
+	{
+		cat .git/config-initial &&
+		sed -e 's/^[|]//'
+	} >expect &&
 	test_cmp expect .git/config
+}
+
+test_expect_success 'initial' '
+	git config configtest.penguin "little blue" &&
+	compare_expect <<-\EOF
+	[configtest]
+	|	penguin = little blue
+	EOF
 '
 
-cat > expect << EOF
-[core]
-	penguin = little blue
-	Movie = BadPhysics
-EOF
 test_expect_success 'mixed case' '
-	git config Core.Movie BadPhysics &&
-	test_cmp expect .git/config
+	git config ConfigTest.Movie BadPhysics &&
+	compare_expect <<-\EOF
+	[configtest]
+	|	penguin = little blue
+	|	Movie = BadPhysics
+	EOF
 '
 
-cat > expect << EOF
-[core]
-	penguin = little blue
-	Movie = BadPhysics
-[Cores]
-	WhatEver = Second
-EOF
 test_expect_success 'similar section' '
-	git config Cores.WhatEver Second &&
-	test_cmp expect .git/config
+	git config ConfigTests.WhatEver Second &&
+	compare_expect <<-\EOF
+	[configtest]
+	|	penguin = little blue
+	|	Movie = BadPhysics
+	[ConfigTests]
+	|	WhatEver = Second
+	EOF
 '
 
-cat > expect << EOF
-[core]
-	penguin = little blue
-	Movie = BadPhysics
-	UPPERCASE = true
-[Cores]
-	WhatEver = Second
-EOF
 test_expect_success 'uppercase section' '
-	git config CORE.UPPERCASE true &&
-	test_cmp expect .git/config
+	git config CONFIGTEST.UPPERCASE true &&
+	compare_expect <<-\EOF
+	[configtest]
+	|	penguin = little blue
+	|	Movie = BadPhysics
+	|	UPPERCASE = true
+	[ConfigTests]
+	|	WhatEver = Second
+	EOF
 '
 
 test_expect_success 'replace with non-match' '
-	git config core.penguin kingpin !blue
+	git config configtest.penguin kingpin !blue
 '
 
 test_expect_success 'replace with non-match (actually matching)' '
-	git config core.penguin "very blue" !kingpin
+	git config configtest.penguin "very blue" !kingpin
 '
 
-cat > expect << EOF
-[core]
-	penguin = very blue
-	Movie = BadPhysics
-	UPPERCASE = true
-	penguin = kingpin
-[Cores]
-	WhatEver = Second
-EOF
-
-test_expect_success 'non-match result' 'test_cmp expect .git/config'
+test_expect_success 'non-match result' '
+	compare_expect <<-\EOF
+	[configtest]
+	|	penguin = very blue
+	|	Movie = BadPhysics
+	|	UPPERCASE = true
+	|	penguin = kingpin
+	[ConfigTests]
+	|	WhatEver = Second
+	EOF
+'
 
 test_expect_success 'find mixed-case key by canonical name' '
-	test_cmp_config Second cores.whatever
+	test_cmp_config Second configtests.whatever
 '
 
 test_expect_success 'find mixed-case key by non-canonical name' '
-	test_cmp_config Second CoReS.WhAtEvEr
+	test_cmp_config Second CoNfIgTeSts.WhAtEvEr
 '
 
 test_expect_success 'subsections are not canonicalized by git-config' '
@@ -94,28 +98,27 @@ test_expect_success 'subsections are not canonicalized by git-config' '
 	test_cmp_config two section.SubSection.key
 '
 
-cat > .git/config <<\EOF
-[alpha]
-bar = foo
-[beta]
-baz = multiple \
-lines
-foo = bar
-EOF
-
 test_expect_success 'unset with cont. lines' '
-	git config --unset beta.baz
+	{
+		cat .git/config-initial &&
+		cat <<-\EOF
+		[alpha]
+		bar = foo
+		[beta]
+		baz = multiple \
+		lines
+		foo = bar
+		EOF
+	} >.git/config &&
+	git config --unset beta.baz &&
+	compare_expect <<-\EOF
+	[alpha]
+	bar = foo
+	[beta]
+	foo = bar
+	EOF
 '
 
-cat > expect <<\EOF
-[alpha]
-bar = foo
-[beta]
-foo = bar
-EOF
-
-test_expect_success 'unset with cont. lines is correct' 'test_cmp expect .git/config'
-
 cat > .git/config << EOF
 [beta] ; silly comment # another comment
 noIndent= sillyValue ; 'nother silly comment

^ permalink raw reply related	[relevance 4%]

* Re: [PATCH v3] help.c: configurable suggestions
  @ 2020-11-18 17:16  5% ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2020-11-18 17:16 UTC (permalink / raw)
  To: Drew DeVault; +Cc: git, lanodan

Drew DeVault <sir@cmpwn.com> writes:

> This allows users to disable guessing the commands or options that they
> meant to use.

It is unclear from this description alone why this is needed.  The
seller of this change needs to emphasize how this is better than
setting the variable to "0" (do not autocorrect).  My guess is that
some users do not even need the suggestion of correct spelling when
they made a typo?

>  help.autoCorrect::
> -	Automatically correct and execute mistyped commands after
> -	waiting for the given number of deciseconds (0.1 sec). If more
> -	than one command can be deduced from the entered text, nothing
> -	will be executed.  If the value of this option is negative,
> -	the corrected command will be executed immediately. If the
> -	value is 0 - the command will be just shown but not executed.
> -	This is the default.
> +	If git detects typos and can identify exactly one valid command similar
> +	to the error, git will automatically run the intended command after
> +	waiting a duration of time defined by this configuration value in
> +	deciseconds (0.1 sec).  If this value is 0, the suggested corrections
> +	will be shown, but not executed. If "immediate", the suggested command
> +	is run immediately. If "never", suggestions are not shown at all. The
> +	default value is zero.

Imagine an existing user who set the variable to -1 a long time ago
and now wonders what it is doing in their ~/.gitconfig file.  This
paragraph still needs to describe how a negative value is treated.
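
To make the compatibility concern concrete, here is a minimal
standalone sketch of the value mapping that the documentation would
have to describe.  It is only an illustration: parse_autocorrect() is a
hypothetical name, and strtol() stands in for git_config_int(), which
behaves differently on malformed input.  Only the two macros are taken
from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define AUTOCORRECT_NEVER (-2)
#define AUTOCORRECT_IMMEDIATELY (-1)

/*
 * Hypothetical stand-in for the help.autocorrect branch of
 * git_unknown_cmd_config(); strtol() replaces git_config_int().
 */
static int parse_autocorrect(const char *value)
{
	long v;

	/* The real code rejects a valueless key before getting here. */
	assert(value != NULL);

	if (!strcmp(value, "never"))
		return AUTOCORRECT_NEVER;
	if (!strcmp(value, "immediate"))
		return AUTOCORRECT_IMMEDIATELY;
	v = strtol(value, NULL, 10);
	/* Any negative number keeps its old meaning: run immediately. */
	return v < 0 ? AUTOCORRECT_IMMEDIATELY : (int)v;
}
```

Under this mapping an old "-1" (or any other negative number) still
means "run the corrected command immediately", which is the behaviour
the help text should spell out.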

> diff --git a/help.c b/help.c
> index 919cbb9206..61a7c1ea17 100644
> --- a/help.c
> +++ b/help.c
> @@ -472,12 +472,26 @@ int is_in_cmdlist(struct cmdnames *c, const char *s)
>  static int autocorrect;
>  static struct cmdnames aliases;
>  
> +#define AUTOCORRECT_NEVER (-2)
> +#define AUTOCORRECT_IMMEDIATELY (-1)
> +
>  static int git_unknown_cmd_config(const char *var, const char *value, void *cb)
>  {
>  	const char *p;
>  
> -	if (!strcmp(var, "help.autocorrect"))
> -		autocorrect = git_config_int(var,value);
> +	if (!strcmp(var, "help.autocorrect")) {
> +		if (!value)
> +			return config_error_nonbool(var);
> +		if (!strcmp(value, "never")) {
> +			autocorrect = AUTOCORRECT_NEVER;
> +		} else if (!strcmp(value, "immediate")) {
> +			autocorrect = AUTOCORRECT_IMMEDIATELY;
> +		} else {
> +			int v = git_config_int(var, value);
> +			autocorrect = (v < 0)
> +				? AUTOCORRECT_IMMEDIATELY : v;
> +		}
> +	}
>  	/* Also use aliases for command lookup */
>  	if (skip_prefix(var, "alias.", &p))
>  		add_cmdname(&aliases, p, strlen(p));
> @@ -525,6 +539,11 @@ const char *help_unknown_cmd(const char *cmd)
>  
>  	read_early_config(git_unknown_cmd_config, NULL);
>  
> +	if (autocorrect == AUTOCORRECT_NEVER) {
> +		fprintf_ln(stderr, _("git: '%s' is not a git command. See 'git --help'."), cmd);
> +		exit(1);

OK, so when we encounter an unknown word that follows "git" on the
command line, we immediately exit without doing the guesswork.  This
is needed because we are skipping not just the guessing and
suggesting, but showing this error message by exiting early here.

Makes sense.

> +	}
> +
>  	load_command_list("git-", &main_cmds, &other_cmds);
>  
>  	add_cmd_list(&main_cmds, &aliases);
> @@ -594,7 +613,7 @@ const char *help_unknown_cmd(const char *cmd)
>  			   _("WARNING: You called a Git command named '%s', "
>  			     "which does not exist."),
>  			   cmd);
> -		if (autocorrect < 0)
> +		if (autocorrect == AUTOCORRECT_IMMEDIATELY)
>  			fprintf_ln(stderr,
>  				   _("Continuing under the assumption that "
>  				     "you meant '%s'."),
> @@ -706,10 +725,16 @@ NORETURN void help_unknown_ref(const char *ref, const char *cmd,
>  			       const char *error)
>  {
>  	int i;
> -	struct string_list suggested_refs = guess_refs(ref);
> +	struct string_list suggested_refs;
>  
>  	fprintf_ln(stderr, _("%s: %s - %s"), cmd, ref, error);
>  
> +	if (autocorrect == AUTOCORRECT_NEVER) {
> +		exit(1);
> +	}

I am not sure how the change in this hunk is justifiable.

The auto correction is about guessing mistyped commands and
optionally going forward to execute it.  Some users (like me) who
set it to 0 currently may want to set it to "never", but I'd imagine
that not all of these users would want to lose the local branch vs
remote-tracking branch correction (certainly not me).  It might be
convenient for some folks if both of these can be squelched with a
single knob, but it must be possible to enable them independently.

This is a tangent, but it seems to me that help_unknown_ref() is way
under-used and lacks usefulness too much---probably these contribute
to each other.

It only gets used by "git merge $name" and only when the missing
$name branch exists as remote-tracking branches of some remotes,
i.e.  "you said $name, which does not exist, but you could have
typed $remote/$name to refer to the remote-tracking branch").

I would imagine that it would help users who type "git checkout
mext" to suggest "you may have typoed 'next'", for example, and "git
checkout -b my-topic next" to offer "you said 'next' that does not
exist; did you mean 'origin/next'?"  It might be a good exercise to
first make inventory of other possible places that would benefit,
which may suggest what other sources guess_refs() should grab its
candidates from (#leftoverbits).

> +
> +	suggested_refs = guess_refs(ref);
> +
>  	if (suggested_refs.nr > 0) {
>  		fprintf_ln(stderr,
>  			   Q_("\nDid you mean this?",

^ permalink raw reply	[relevance 5%]

* Re: Specify resume point with git difftool?
  @ 2020-11-16 19:26  5% ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2020-11-16 19:26 UTC (permalink / raw)
  To: Ryan Zoeller; +Cc: git@vger.kernel.org

Ryan Zoeller <rtzoeller@rtzoeller.com> writes:

> Is there a way to tell git "resume the difftool process at file n"?
> The difftool prompt counts which file I'm on ("Viewing (10/20):
> 'filename'"), so it seems like I ought to be able to jump ahead by
> specifying a starting index (or range to view).

There is no such support in the code.

diff.c::run_external_diff() maintains and increments the counters
used to show the prompt in the form of a pair of environment
variables, GIT_DIFF_PATH_TOTAL and GIT_DIFF_PATH_COUNTER, and they
are used in git-difftool--helper::launch_merge_tool() when asking
you if you want to run the difftool backend on that 10th file out of
the 20 files.

Right now, you can only say Yes or No to that prompt, but it
shouldn't be too hard to add another choice to the response to the
prompt, saying "skip to 15th file", for example, and record that
"15" in a temporary file in $GIT_DIR/ and exit without running the
difftool backend on the 10th file, so that later invocations of the
git-difftool--helper script can skip without prompting you until it
is the turn of the 15th file.

The launch_merge_tool() function needs to be modified in the
following way to do so:

 - At the beginning, see if $GIT_DIR/difftool-skip-to file exists.

   - If exists, read its contents.

   - See if the value is larger than $GIT_DIFF_PATH_COUNTER.  If so,
     just 'return' without doing anything else.

   - Remove that file (we are at the 15th path and done skipping).

 - Update the "Viewing .../ Launch?" prompt and offer another choice
   "Skip to?".

 - Update the if/then/fi statement that processes the answer to the
   prompt (right now, it takes n as a sign to skip the file).  When
   the user says "skip to 15th", create $GIT_DIR/difftool-skip-to
   file and record "15" in it and 'return'.
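
The steps above can be modeled as follows.  The real helper is a shell
script, so this C sketch only mirrors the decision logic and is not a
proposed implementation.  should_skip() and request_skip() are
hypothetical names, and the "skip_file" argument plays the role of
$GIT_DIR/difftool-skip-to.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Decide whether the path numbered "counter" (the value of
 * GIT_DIFF_PATH_COUNTER) should be skipped.
 * Returns 1 to skip silently, 0 to go on and prompt as usual.
 */
static int should_skip(const char *skip_file, int counter)
{
	FILE *fp;
	int target = 0;

	assert(skip_file && counter > 0);

	fp = fopen(skip_file, "r");
	if (!fp)
		return 0;		/* no skip request pending */
	if (fscanf(fp, "%d", &target) != 1)
		target = 0;		/* unreadable request: stop skipping */
	fclose(fp);
	if (target > counter)
		return 1;		/* not there yet */
	remove(skip_file);		/* reached the target; done skipping */
	return 0;
}

/* Record the user's "skip to N" answer; returns 0 on success. */
static int request_skip(const char *skip_file, int target)
{
	FILE *fp = fopen(skip_file, "w");

	if (!fp)
		return -1;
	fprintf(fp, "%d\n", target);
	return fclose(fp);
}
```

With a request for "15" recorded while viewing the 10th file, calls for
paths 11 through 14 return 1, and the call for path 15 removes the file
and returns 0 so that prompting resumes there.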

#leftoverbits.

^ permalink raw reply	[relevance 5%]

* Re: git-log: documenting pathspec usage
  @ 2020-11-16 12:37  6% ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2020-11-16 12:37 UTC (permalink / raw)
  To: Adam Spiers; +Cc: git mailing list


On Mon, Nov 16 2020, Adam Spiers wrote:

> Hi all,
>
> I just noticed that git-log.txt has: 
>
>     SYNOPSIS
>     --------
>     [verse]
>     'git log' [<options>] [<revision range>] [[--] <path>...]
>
> and builtin/log.c has: 
>
>     static const char * const builtin_log_usage[] = {
>             N_("git log [<options>] [<revision-range>] [[--] <path>...]"),
>
> IIUC, the references to <path> should actually be <pathspec> instead,
> as seen with other pathspec-supporting commands such as git add/rm
> whose man pages are extra helpful in explicitly calling out how
> pathspecs can be used, e.g.:
>
>     OPTIONS
>     -------
>     <pathspec>...::
>             Files to add content from.  Fileglobs (e.g. `*.c`) can
>             be given to add all matching files.  Also a
>             leading directory name (e.g. `dir` to add `dir/file1`
>             and `dir/file2`) can be given to update the index to
>             match the current state of the directory as a whole (e.g.
>             specifying `dir` will record not just a file `dir/file1`
>             modified in the working tree, a file `dir/file2` added to
>             the working tree, but also a file `dir/file3` removed from
>             the working tree). Note that older versions of Git used
>             to ignore removed files; use `--no-all` option if you want
>             to add modified or new files but ignore removed ones.
>     +
>     For more details about the <pathspec> syntax, see the 'pathspec' entry
>     in linkgit:gitglossary[7].
>
> Would it be fair to say the git-log usage syntax and man page should
> be updated to match?  If so perhaps I can volunteer for that.

It seems like a good idea to make these consistent, if you're feeling
more ambitious than just git-log's manpage then:
    
    $ git grep '<pathspec>' -- Documentation/git-*.txt|wc -l
    54
    $ git grep '<path>' -- Documentation/git-*.txt|wc -l
    161

Most/all of these should probably be changed to one or the other.

I've also long wanted (but haven't come up with a patch for) that part
of gitglossary to be ripped out into its own manual page,
e.g. "gitpathspec(5)". And if possible for "PATTERN FORMAT" in
"gitignore" to be unified with that/other docs that describe how our
wildmatch.c works.

There's also the "Conditional includes" section in git-config(1) that
repeats some of that, and probably other stuff I'm forgetting
#leftoverbits.

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v4] diff: make diff_free_filespec_data accept NULL
  @ 2020-11-11 16:28  6%             ` Johannes Schindelin
  0 siblings, 0 replies; 200+ results
From: Johannes Schindelin @ 2020-11-11 16:28 UTC (permalink / raw)
  To: Jinoh Kang; +Cc: Junio C Hamano, git

Hi Jinoh,

On Wed, 11 Nov 2020, Jinoh Kang wrote:

> On 11/10/20 3:38 PM, Johannes Schindelin wrote:
> >
> >> +	git checkout -B conflict-a &&
> >> +	git checkout -B conflict-b &&
> >> +	git checkout conflict-a &&
> >> +	echo conflict-a >>file &&
> >> +	git add file &&
> >> +	git commit -m conflict-a &&
> >> +	git checkout conflict-b &&
> >> +	echo conflict-b >>file &&
> >> +	git add file &&
> >> +	git commit -m conflict-b &&
> >> +	git checkout master &&
> >> +	git merge conflict-a &&
> >> +	test_must_fail git merge conflict-b &&
> >> +	: >expect &&
> >> +	git difftool --cached --no-prompt >actual &&
> >> +	test_cmp expect actual
> >
> > Shouldn't this use the `test_must_be_empty` function instead?
> >
> > How about writing the test case this way:
> >
> > test_expect_success 'difftool --cached with unmerged files' '
> > 	test_when_finished git reset --hard &&
> >
> > 	test_commit conflicting &&
> > 	test_commit conflict-a a conflicting.t &&
> > 	git reset --hard conflicting &&
> > 	test_commit conflict-b b conflicting.t &&
> > 	test_must_fail git merge conflict-a &&
> >
> > 	git difftool --cached --no-prompt >out &&
> > 	test_must_be_empty out
> > '
>
> The original test code was copied from the "difftool --dir-diff with
> unmerged files" case above.
>
> It might be worth cleaning it up too, but let's leave it for another
> time.

Indeed. #leftoverbits

Thanks,
Dscho

^ permalink raw reply	[relevance 6%]

* Re: [PATCH 1/2] update-ref: Allow creation of multiple transactions
  @ 2020-11-06 19:30  6%         ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2020-11-06 19:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, git

Jeff King <peff@peff.net> writes:

> On Thu, Nov 05, 2020 at 01:34:20PM -0800, Junio C Hamano wrote:
>
>> > The tests all look quite reasonable to me. Touching .git/refs like this
>> > is a bit gross (and something we may have to deal with if we introduce
>> > reftables, etc). But it's pretty pervasive in this file, so matching
>> > the existing style is the best option for now.
>>  ...
> Yeah, I agree completely that we could be using rev-parse in this
> instance. But it's definitely not alone there:
> ...

Yup, this morning I was reviewing what we said in the previous day's
exchanges and noticed that you weren't advocating but merely saying
it is not making things worse, and I agree with the assessment.

Perhaps two #leftoverbits are to 

 (1) clean up this test to create refs using "update-ref", and
     verify refs using "show-ref --verify".

 (2) If (1) had to leave some direct filesystem access due to the
     built-in safety that cannot be circumvented, decide which is
     more appropriate between a test-update-ref test helper only to
     be used in tests, or a "--force" option usable to corrupt
     repositories with "update-ref", implement it, and use it to
     finish cleaning up tests.

Thanks.






^ permalink raw reply	[relevance 6%]

* Re: [PATCH v9 1/3] push: add reflog check for "--force-if-includes"
  @ 2020-10-02 16:22  7%         ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2020-10-02 16:22 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Srinidhi Kaushik, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Having said that, the change I suggested (to use `get_reachable_subset()`
> instead of repeated `in_merge_bases_many()`) is _still_ the right thing to
> do: we are not actually interested in the merge bases at all, but in
> reachability, and in the future there might be more efficient ways to
> determine that than painting down all the way to merge bases.

I agree with you that the age-old implementation has obvious room
for optimization.  I think I already pointed out a #leftoverbit that
we can invent a version of paint_down_to_common() that can
short-circuit and return immediately after one side (the "commit"
side) gets painted, so that in_merge_bases_many() can stop
immediately after finding out that the answer is "true".
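
As a rough illustration of the short-circuit idea, here is a sketch on
a toy history.  It is not Git's paint_down_to_common(): the graph type,
the function name and the fixed-size arrays are all assumptions made
for the example, and it ignores commit dates and generation numbers
that the real walk relies on.

```c
#include <assert.h>
#include <string.h>

#define MAX_COMMITS 16
#define MAX_PARENTS 2

/* Toy history: parent[c][k] is the index of a parent of c, or -1. */
struct toy_graph {
	int nr;
	int parent[MAX_COMMITS][MAX_PARENTS];
};

/*
 * Return 1 if "commit" is reachable from any of refs[0..nr_refs),
 * stopping the walk the moment it is reached.
 */
static int reachable_from_any(const struct toy_graph *g, int commit,
			      const int *refs, int nr_refs)
{
	int stack[MAX_COMMITS], top = 0, i, k;
	char seen[MAX_COMMITS];

	assert(g->nr <= MAX_COMMITS);
	assert(commit >= 0 && commit < g->nr);

	memset(seen, 0, sizeof(seen));
	for (i = 0; i < nr_refs; i++) {
		if (!seen[refs[i]]) {
			seen[refs[i]] = 1;
			stack[top++] = refs[i];
		}
	}
	while (top) {
		int c = stack[--top];

		if (c == commit)
			return 1;	/* short-circuit: answer is known */
		for (k = 0; k < MAX_PARENTS; k++) {
			int p = g->parent[c][k];

			/* Each commit is pushed at most once. */
			if (p >= 0 && !seen[p]) {
				seen[p] = 1;
				stack[top++] = p;
			}
		}
	}
	return 0;
}
```

The point of the sketch is only the early "return 1": nothing older
than the commit being asked about needs to be visited once it has been
found.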

The function is *not* about computing the merge base across the
commits on the "reference" side but finding out if "commit" is
reachable from any in the "reference" side, so (1) it has a wrong
name and more importantly (2) it wants to do something quite similar
to get_reachable_subset(), but it is much less ambitious.

get_reachable_subset() is capable of doing a lot more.  Unlike the
older in_merge_bases_many() that allowed only one commit on the
candidate for an ancestor side, it can throw a set and ask "which
ones among these are reachable from the other set".

So from the "semantics" point of view, get_reachable_subset() is
overkill and less suitable than in_merge_bases_many() for this
particular application.  We know we have only one candidate, and we
want to ask "is this reachable, or not?" a single bit question.  In
any case, they should yield the right answer from correctness point
of view ;-)

Having said that.

I do not think in the longer term we should keep both.  Clearly the
get_reachable_subset() function can handle more general cases, so it
would make a lot of sense to make in_merge_bases_many() into a thin
wrapper that feeds just a single commit array on one side to be
filtered while feeding the "reference" commits to the other side, as
long as we can demonstrate that the result is just as correct as,
and it is not slower than, the current implementation.  That may be
a bit larger than a typical #leftoverbit but would be a good clean-up
project.

^ permalink raw reply	[relevance 7%]

* Re: [PATCH v7 1/3] push: add reflog check for "--force-if-includes"
  @ 2020-09-27 12:27  6%       ` Srinidhi Kaushik
  0 siblings, 0 replies; 200+ results
From: Srinidhi Kaushik @ 2020-09-27 12:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi Junio,

On 09/26/2020 16:42, Junio C Hamano wrote:
> Srinidhi Kaushik <shrinidhi.kaushik@gmail.com> writes:
> 
> > @@ -2252,11 +2263,11 @@ int is_empty_cas(const struct push_cas_option *cas)
> >  /*
> >   * Look at remote.fetch refspec and see if we have a remote
> >   * tracking branch for the refname there.  Fill its current
> > - * value in sha1[].
> > + * value in sha1[], and as a string.
> 
> I think the array being referred to was renamed to oid[] sometime
> ago.  "and as a string" makes it sound as if sha1[] gets the value
> as 40-hex object name text, but that is not what is being done.
> 
>     Fill the name of the remote-tracking branch in *dst_refname,
>     and the name of the commit object at its tip in oid[].
> 
> perhaps?

Of course, that sounds better; will update.
 
> > + * The struct "reflog_commit_list" and related helper functions
> > + * for list manipulation are used for collecting commits into a
> > + * list during reflog traversals in "if_exists_or_grab_until()".
> 
> Has the name of that function changed since this comment was
> written?

Heh, it sure has. It should have been "check_and_collect_until()".
 
> > + */
> > +struct reflog_commit_list {
> > +	struct commit **items;
> 
> Name an array in singular when its primary use is to work on an
> element at a time---that will let you say item[4] to call the 4-th
> item, instead of items[4] that smells awkward.
> 
> An array that is used mostly to pass around a collection as a whole
> is easier to think about when given a plural name, though.

Yup.

> > +
> > +/* Get the timestamp of the latest entry. */
> > +static int peek_reflog(struct object_id *o_oid, struct object_id *n_oid,
> > +		       const char *ident, timestamp_t timestamp,
> > +		       int tz, const char *message, void *cb_data)
> > +{
> > +	timestamp_t *ts = cb_data;
> > +	*ts = timestamp;
> > +	return 1;
> > +}
> 
> The idea is to use a callback that immediately says "no more" to
> grab the data from the first item in the iteration.  It feels
> somewhat awkward but because there is no "give us the Nth entry" API
> function, it is the cleanest way we can do this.

I considered using "grab_1st_entry_timestamp()" briefly, but
"peek_reflog" is shorter compared to that.

> > +	/* Look-up the commit and append it to the list. */
> > +	if ((commit = lookup_commit_reference(the_repository, n_oid)))
> > +		add_commit(cb->local_commits, commit);
> 
> This is merely a minor naming thing, but if you rename add_commit()
> to append_commit(), you probably do not even need the comment before
> this statement.

Will do.

> >  	return 0;
> >  }
> >  
> > +#define MERGE_BASES_BATCH_SIZE 8
> 
> Hmph.  Do we still need batching?
> 
> > +/*
> > + * Iterate through the reflog of the local ref to check if there is an entry
> > + * for the given remote-tracking ref; runs until the timestamp of an entry is
> > + * older than latest timestamp of remote-tracking ref's reflog. Any commits
> > + * are that seen along the way are collected into a list to check if the
> > + * remote-tracking ref is reachable from any of them.
> > + */
> > +static int is_reachable_in_reflog(const char *local, const struct ref *remote)
> > +{
> > +	timestamp_t date;
> > +	struct commit *commit;
> > +	struct commit **chunk;
> > +	struct check_and_collect_until_cb_data cb;
> > +	struct reflog_commit_list list = { NULL, 0, 0 };
> > +	size_t count = 0, batch_size = 0;
> > +	int ret = 0;
> > +
> > +	commit = lookup_commit_reference(the_repository, &remote->old_oid);
> > +	if (!commit)
> > +		goto cleanup_return;
> > +
> > +	/*
> > +	 * Get the timestamp from the latest entry
> > +	 * of the remote-tracking ref's reflog.
> > +	 */
> > +	for_each_reflog_ent_reverse(remote->tracking_ref, peek_reflog, &date);
> > +
> > +	cb.remote_commit = commit;
> > +	cb.local_commits = &list;
> > +	cb.remote_reflog_timestamp = date;
> > +	ret = for_each_reflog_ent_reverse(local, check_and_collect_until, &cb);
> > +
> > +	/* We found an entry in the reflog. */
> > +	if (ret > 0)
> > +		goto cleanup_return;
> 
> Good.  So '1' from the callback is "we found one, no need to look
> further and no need to do merge-base", and '-1' from the callback is
> "we looked at all entries that are young enough to matter and we
> didn't find exact match".  Makes sense.
> 
> > +	/*
> > +	 * Check if the remote commit is reachable from any
> > +	 * of the commits in the collected list, in batches.
> > +	 */
> 
> I do not know if batching would help (have you measured it?), but if
> we were to batch, it is more common to arrange the loop like this:
> 
> 	for (chunk = list.items;
>              chunk < list.items + list.nr;
> 	     chunk += size) {
>              	size = list.items + list.nr - chunk;
>                 if (MERGE_BASES_BATCH_SIZE < size)
> 			size = MERGE_BASES_BATCH_SIZE;
> 		... use chunk[0..size] ...
> 		chunk += size;
> 	}
> 
> That is, assume that we can grab everything during this round, and
> if that bites off too many, clamp it to the maximum value.  If you
> are not comfortable with pointer arithmetic, it is also fine to use
> an auxiliary variable 'count', but ...

Actually, the "for" version looks much cleaner and avoids the use
of "count". However, I think ...

>               chunk += size;

... should be skipped because "for ( ... ; chunk += size)" is already
doing it for us; otherwise we would offset 16 entries instead of 8
per iteration, no?
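
For reference, a standalone sketch of the loop with the duplicate
increment dropped.  A plain int array stands in for the commit list and
visit_in_batches() is a hypothetical name; incrementing each element
replaces the call to in_merge_bases_many() so that the coverage can be
checked.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 8	/* mirrors MERGE_BASES_BATCH_SIZE in the patch */

/*
 * Walk items[0..nr) in chunks of at most BATCH_SIZE and return how
 * many chunks were visited.  Each element is bumped once so a caller
 * can verify that nothing is skipped or visited twice.
 */
static size_t visit_in_batches(int *items, size_t nr)
{
	int *chunk;
	size_t size, i, batches = 0;

	assert(items || !nr);

	/* "chunk += size" appears only here, not in the loop body. */
	for (chunk = items; chunk < items + nr; chunk += size) {
		/* Assume we can take the rest, then clamp to the maximum. */
		size = (size_t)(items + nr - chunk);
		if (BATCH_SIZE < size)
			size = BATCH_SIZE;
		for (i = 0; i < size; i++)
			chunk[i]++;	/* stand-in for in_merge_bases_many() */
		batches++;
	}
	return batches;
}
```

With 20 entries this visits chunks of 8, 8 and 4, touching every entry
exactly once.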

> > +	chunk = list.items;
> > +	while (count < list.nr) {
> > +		batch_size = MERGE_BASES_BATCH_SIZE;
> > +
> > +		/* For any leftover entries. */
> > +		if ((count + MERGE_BASES_BATCH_SIZE) > list.nr)
> > +			batch_size = list.nr - count;
> > +
> > +		if ((ret = in_merge_bases_many(commit, batch_size, chunk)))
> > +			break;
> > +
> > +		chunk += batch_size;
> > +		count += MERGE_BASES_BATCH_SIZE;
> 
> ... you are risking chunk and count to go out of sync here.
> 
> It does not matter within this loop (count will point beyond the end
> of list.item[] while chunk will never go past the array), but future
> developers can be confused into thinking that they can use chunk and
> count interchangeably after this loop exits, and at that point the
> discrepancy may start to matter.

I agree, it should have been "count += batch_size;". But, I think the
"for" version looks cleaner; I will change it to that in the next set.
 
> But all of the above matters if it is a good idea to batch.  Does it
> make a difference?
> 
>     ... goes and looks at in_merge_bases_many() ...
> 
> Ah, it probably would.  
> 
> I thought in_merge_bases_many() would stop early as soon as any of
> the traversal from chunk[] reaches commit, but it uses a rather more
> generic paint_down_to_common() so extra items in chunk[] that are
> topologically older than commit would result in additional traversal
> from commit down to them, which would not contribute much to the end
> result.  It may be a good #leftoverbit idea for future improvement to
> teach in_merge_bases_many() to use a custom replacement for
> paint_down_to_common() that stops early as soon as we find the
> answer is true.

If we consider the amount of time it takes when "in_merge_bases_many()"
has to be run for all the entries, there isn't much of a difference in
performance between batching and non-batching -- they took about the
same. But, as you said if the remote is reachable in the first few
entries, batching would help with returning early if a descendant is
found.

Making the function stop early when a descendant is found
does sound like a good #leftoverbits idea. :)

Thanks again, for a detailed review.
-- 
Srinidhi Kaushik

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v2] bisect: don't use invalid oid as rev when starting
  @ 2020-09-24 20:53  4%       ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2020-09-24 20:53 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Christian Couder, Miriam Rubio, Johannes Schindelin

Junio C Hamano <gitster@pobox.com> writes:

> I didn't audit the following hits of get_oid_committish().  There
> might be a similar mistake as you made in v2, or there may not be.
>
> I am undecided if I should just move on, marking them as
> left-over-bits ;-)
>
>
>
> builtin/blame.c:		if (get_oid_committish(i->string, &oid))

This one throws the object name of revs to be skipped to a list, and
because revision traversal works on commit objects, if the user
gives an annotated tag and expects the underlying commit is ignored,
it may appear as a bug.  But in the same function a list of revs to
be ignored is read from file using oidset_parse_file() that in turn
uses parse_oid_hex() without even validating if the named object
exists, I would say it is OK---after all, if it hurts, the user can
refrain from doing so ;-)

But it would be nice to fix all issues around this caller.  After
collecting the object names to an oidset, somebody should go through
the list, peel them down to commit and make sure they exist, or
something like that.  A possible #leftoverbits.

> builtin/checkout.c:		repo_get_oid_committish(the_repository, branch->name, &branch->oid);

This one is probably OK as branch refs are supposed to point at
commits and not annotated tags that point at commits.

> builtin/rev-parse.c:	if (!get_oid_committish(start, &start_oid) && !get_oid_committish(end, &end_oid)) {

This one handles "rev-parse v1.0..v2.0" which gives "^v1.0 v2.0" but
using the (unpeeled) object name.  It is fine and should not be
changed to auto-peel.

> builtin/rev-parse.c:	if (get_oid_committish(arg, &oid) ||

This is immediately followed by lookup_commit_reference() to peel as
needed.  OK.

> commit.c:	if (get_oid_committish(name, &oid))

This is part of lookup_commit_reference_by_name(), which peels and
parses it down to an in-core commit object instance.  OK.

> revision.c:	if (get_oid_committish(arg, &oid))

This is followed by a loop to peel it as needed.  OK.

> sequencer.c:		    !get_oid_committish(buf.buf, &oid))

This feeds the contents of rebase-merge/stopped-sha file.  I presume
that the contents of this file (which is not directly shown to the
end users) is always a commit object name, so this is OK.  Use of
_committish() may probably be overkill for this internal bookkeeping
file.  If we stop make_patch() from shortening then probably we can
change it to parse_oid_hex() to expect and read the full object
name.

> sha1-name.c:		st = repo_get_oid_committish(r, sb.buf, &oid_tmp);
> sha1-name.c:	if (repo_get_oid_committish(r, dots[3] ? (dots + 3) : "HEAD", &oid_tmp))

Since I know those who wrote this old part of the codebase knew what
they were doing, I do not have to comment, but these are fine.  They
are all peeled to commit as appropriate by calling
lookup_commit_reference_gently() before feeding the result to
get_merge_bases().

> sha1-name.c:int repo_get_oid_committish(struct repository *r,

This is the implementation ;-)

> t/helper/test-reach.c:		if (get_oid_committish(buf.buf + 2, &oid))

This peels afterwards, so it is OK.

The true reason I went through all the callers was to see if _all_
the callers either want to ignore the resulting object name (i.e.
they want to make sure that the arg given can be peeled down to an
appropriate type) or want the object name to be peeled to the type.
If that were the case (and from the above, it clearly isn't), we
could change the semantics of get_oid_*ish() so that the resulting
oid is already peeled down to the wanted type and that could simplify
the current callers that are peeling the result themselves.

But because some callers do not want to see the result peeled, we
shouldn't touch what get_oid_*ish() functions do.
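
To make "peel" above concrete, here is a minimal sketch using a toy
object model.  The toy_* names are made up for illustration and are
not Git's real structures or API; the real callers use helpers such
as lookup_commit_reference(), as noted above.

```c
#include <stddef.h>

/*
 * Toy object model (hypothetical, not Git's): an object is a commit,
 * a blob, or an annotated tag whose `target` is what it points at.
 */
enum toy_type { TOY_COMMIT, TOY_TAG, TOY_BLOB };

struct toy_object {
	enum toy_type type;
	const struct toy_object *target; /* only meaningful for TOY_TAG */
};

/*
 * Follow tag -> tag -> ... until a non-tag object is reached.
 * Returns the commit, or NULL if the chain does not end at a commit.
 */
const struct toy_object *toy_peel_to_commit(const struct toy_object *o)
{
	while (o && o->type == TOY_TAG)
		o = o->target;
	return (o && o->type == TOY_COMMIT) ? o : NULL;
}
```

A caller that wants the unpeeled name (like the "rev-parse v1.0..v2.0"
case) would simply not call the peeling helper, which is why folding
the peel into the lookup itself would not suit everybody.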

^ permalink raw reply	[relevance 4%]

* Re: Git in Outreachy?
  2020-09-21  4:22  0%     ` Christian Couder
  2020-09-21  7:59  0%       ` Kaartic Sivaraam
@ 2020-09-21 20:56  0%       ` Shourya Shukla
  1 sibling, 0 replies; 200+ results
From: Shourya Shukla @ 2020-09-21 20:56 UTC (permalink / raw)
  To: Christian Couder
  Cc: Kaartic Sivaraam, git, Christian Couder, Johannes Schindelin,
	Jeff King

On Mon, Sep 21, 2020 at 9:52 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Sun, Sep 20, 2020 at 6:31 PM Kaartic Sivaraam
> <kaartic.sivaraam@gmail.com> wrote:
> >
> > On 07-09-2020 00:26, Kaartic Sivaraam wrote:>
> > >> I would appreciate help to find project ideas though. Are there still
> > >> scripts that are worth converting to C (excluding git-bisect.sh and
> > >> git-submodule.sh that are still worked on)?
> > >
> > > I think Dscho's e-mail linked below gives a nice overview of the various
> > > scripts and their likely status as of Jan2020:
> > >
> > > https://lore.kernel.org/git/nycvar.QRO.7.76.6.2001301154170.46@tvgsbejvaqbjf.bet/
> > >
> > > I'm guessing only the status of submodule has changed as it's being
> > > worked on now.
> >
> > After giving it a second thought, I believe I should take back my word
> > about the git-submodule status changing. There still seems to be some
> > work left for it.
>
> Yeah, there is some work left, but Shourya said he was interested in
> continuing to work on it.

Yeah, I am a bit busy right now catching up with the classes and assignments
in my college. I will try to deliver a follow-up v2 to 'submodule add' in a
couple of weeks.

> > To be clear,
> >
> > - there's 'add', whose conversion is currently stalled [1]
>
> Yeah, but it hasn't been stalled for a long time, and sometimes it
> takes time after the GSoC or Outreachy period for former GSoC students
> or Outreachy interns to resume their work.
>
> > - there's 'update', which still has a decent amount of code [2]
> >   in the shell script.
> > - we still have to complete the conversion by moving the rest of
> >   the bits from `git-submodule.sh` to C, which is mostly just the
> >   option parsing. This might be trickier than it sounds, as we would
> >   have to ensure that we don't accidentally change the behaviour of
> >   the options when moving the option parsing to C.
> >
> >   There's also an e-mail from Junio which is relevant [3]
> >
> > I'm not sure if this would be enough for a complete project on its own.
> > I'm also not sure whether 'add' would get converted in the meantime. In
> > any case, I believe we could add a few other small refactoring projects
> > to make up for the rest of the period. For instance,
> >
> > - Replace more instances of `the_index` and `the_repository`
> >   (https://github.com/gitgitgadget/git/issues/379)
> >
> > - Turn the `fetch_if_missing` global into a field of `struct repository`
> >   (https://github.com/gitgitgadget/git/issues/251)
> >
> > - Possibly others from #leftoverbits
> >
> > Thoughts?
>
> Yeah, without 'add' we would have enough related issues for another
> project. I would prefer though that we wait for at least 3 months
> without any progress before suggesting them as a project. That's what
> we usually do and I think it's the right thing to do.

If we are talking about submodules, then one project can be to improve
the parsing of 'submodule--helper.c' and try to eliminate the shell scripting
for this purpose. Another thing which can be done is to clean up the helper
sub-commands which were created to aid the conversion (iff they are of
little to no use now). I am not sure, though, whether the "improve parsing"
work and the conversion of a couple of subcommands* would be a project big
enough for Outreachy.

*'submodule update' is a bit messed up right now and will need a solid
conversion to C since some of its fragments are there in the C code while
some aren't. Also, the shell code of this subcommand is still there meaning
that the fragments do not play any direct role in the functioning of the
subcommand. I can pass on the conversion of 'update' to Outreachy if the
addition of this will amount to a complete project for a potential
Outreachy intern.

Regards,
Shourya Shukla

^ permalink raw reply	[relevance 0%]

* Re: Git in Outreachy?
  2020-09-21  4:22  0%     ` Christian Couder
@ 2020-09-21  7:59  0%       ` Kaartic Sivaraam
  2020-09-21 20:56  0%       ` Shourya Shukla
  1 sibling, 0 replies; 200+ results
From: Kaartic Sivaraam @ 2020-09-21  7:59 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Christian Couder, Johannes Schindelin, Jeff King,
	Shourya Shukla

On 21-09-2020 09:52, Christian Couder wrote:
> On Sun, Sep 20, 2020 at 6:31 PM Kaartic Sivaraam
> <kaartic.sivaraam@gmail.com> wrote:
>>
>> On 07-09-2020 00:26, Kaartic Sivaraam wrote:>
>>>
>>> I'm guessing only the status of submodule has changed as it's being
>>> worked on now.
>>
>> After giving it a second thought, I believe I should take back my word
>> about the git-submodule status changing. There still seems to be some
>> work left for it.
> 
> Yeah, there is some work left, but Shourya said he was interested in
> continuing to work on it.
> 

Yeah, he's most welcome to resume his work :)

>> To be clear,
>>
>> - there's 'add', whose conversion is currently stalled [1]
> 
> Yeah, but it hasn't been stalled for a long time, and sometimes it
> takes time after the GSoC or Outreachy period for former GSoC students
> or Outreachy interns to resume their work.
>

Ok. Got it.

>> - there's 'update', which still has a decent amount of code [2]
>>   in the shell script.
>> - we still have to complete the conversion by moving the rest of
>>   the bits from `git-submodule.sh` to C, which is mostly just the
>>   option parsing. This might be trickier than it sounds, as we would
>>   have to ensure that we don't accidentally change the behaviour of
>>   the options when moving the option parsing to C.
>>
>>   There's also an e-mail from Junio which is relevant [3]
>>
>> I'm not sure if this would be enough for a complete project on its own.
>> I'm also not sure whether 'add' would get converted in the meantime. In
>> any case, I believe we could add a few other small refactoring projects
>> to make up for the rest of the period. For instance,
>>
>> - Replace more instances of `the_index` and `the_repository`
>>   (https://github.com/gitgitgadget/git/issues/379)
>>
>> - Turn the `fetch_if_missing` global into a field of `struct repository`
>>   (https://github.com/gitgitgadget/git/issues/251)
>>
>> - Possibly others from #leftoverbits
>>
>> Thoughts?
> 
> Yeah, without 'add' we would have enough related issues for another
> project. I would prefer though that we wait for at least 3 months
> without any progress before suggesting them as a project. That's what
> we usually do and I think it's the right thing to do.
> 

Ok. That makes sense. I also missed the fact that Shourya had
expressed interest in converting the 'update' part of submodule in his
final GSoC report [A]. So, I agree that it's too early to propose the
rest of the submodule work for Outreachy.

Thanks.


References
===
[A]: https://shouryashukla.blogspot.com/2020/08/the-final-report.html

--
Sivaraam

^ permalink raw reply	[relevance 0%]

* Re: Git in Outreachy?
  2020-09-20 16:31  5%   ` Kaartic Sivaraam
@ 2020-09-21  4:22  0%     ` Christian Couder
  2020-09-21  7:59  0%       ` Kaartic Sivaraam
  2020-09-21 20:56  0%       ` Shourya Shukla
  0 siblings, 2 replies; 200+ results
From: Christian Couder @ 2020-09-21  4:22 UTC (permalink / raw)
  To: Kaartic Sivaraam
  Cc: git, Christian Couder, Johannes Schindelin, Jeff King,
	Shourya Shukla

On Sun, Sep 20, 2020 at 6:31 PM Kaartic Sivaraam
<kaartic.sivaraam@gmail.com> wrote:
>
> On 07-09-2020 00:26, Kaartic Sivaraam wrote:>
> >> I would appreciate help to find project ideas though. Are there still
> >> scripts that are worth converting to C (excluding git-bisect.sh and
> >> git-submodule.sh that are still worked on)?
> >
> > I think Dscho's e-mail linked below gives a nice overview of the various
> > scripts and their likely status as of Jan2020:
> >
> > https://lore.kernel.org/git/nycvar.QRO.7.76.6.2001301154170.46@tvgsbejvaqbjf.bet/
> >
> > I'm guessing only the status of submodule has changed as it's being
> > worked on now.
>
> After giving it a second thought, I believe I should take back my word
> about the git-submodule status changing. There still seems to be some
> work left for it.

Yeah, there is some work left, but Shourya said he was interested in
continuing to work on it.

> To be clear,
>
> - there's 'add', whose conversion is currently stalled [1]

Yeah, but it hasn't been stalled for a long time, and sometimes it
takes time after the GSoC or Outreachy period for former GSoC students
or Outreachy interns to resume their work.

> - there's 'update', which still has a decent amount of code [2]
>   in the shell script.
> - we still have to complete the conversion by moving the rest of
>   the bits from `git-submodule.sh` to C, which is mostly just the
>   option parsing. This might be trickier than it sounds, as we would
>   have to ensure that we don't accidentally change the behaviour of
>   the options when moving the option parsing to C.
>
>   There's also an e-mail from Junio which is relevant [3]
>
> I'm not sure if this would be enough for a complete project on its own.
> I'm also not sure whether 'add' would get converted in the meantime. In
> any case, I believe we could add a few other small refactoring projects
> to make up for the rest of the period. For instance,
>
> - Replace more instances of `the_index` and `the_repository`
>   (https://github.com/gitgitgadget/git/issues/379)
>
> - Turn the `fetch_if_missing` global into a field of `struct repository`
>   (https://github.com/gitgitgadget/git/issues/251)
>
> - Possibly others from #leftoverbits
>
> Thoughts?

Yeah, without 'add' we would have enough related issues for another
project. I would prefer though that we wait for at least 3 months
without any progress before suggesting them as a project. That's what
we usually do and I think it's the right thing to do.

^ permalink raw reply	[relevance 0%]

* Re: Git in Outreachy?
  @ 2020-09-20 16:31  5%   ` Kaartic Sivaraam
  2020-09-21  4:22  0%     ` Christian Couder
  0 siblings, 1 reply; 200+ results
From: Kaartic Sivaraam @ 2020-09-20 16:31 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Johannes Schindelin, Jeff King

On 07-09-2020 00:26, Kaartic Sivaraam wrote:>
>> I would appreciate help to find project ideas though. Are there still
>> scripts that are worth converting to C (excluding git-bisect.sh and
>> git-submodule.sh that are still worked on)? 
> 
> I think Dscho's e-mail linked below gives a nice overview of the various
> scripts and their likely status as of Jan2020:
> 
> https://lore.kernel.org/git/nycvar.QRO.7.76.6.2001301154170.46@tvgsbejvaqbjf.bet/
> 
> I'm guessing only the status of submodule has changed as it's being
> worked on now.
> 

After giving it a second thought, I believe I should take back my word
about the git-submodule status changing. There still seems to be some
work left for it. To be clear,

- there's 'add', whose conversion is currently stalled [1]
- there's 'update', which still has a decent amount of code [2]
  in the shell script.
- we still have to complete the conversion by moving the rest of
  the bits from `git-submodule.sh` to C, which is mostly just the
  option parsing. This might be trickier than it sounds, as we would
  have to ensure that we don't accidentally change the behaviour of
  the options when moving the option parsing to C.

  There's also an e-mail from Junio which is relevant [3]

I'm not sure if this would be enough for a complete project on its own.
I'm also not sure whether 'add' would get converted in the meantime. In
any case, I believe we could add a few other small refactoring projects
to make up for the rest of the period. For instance,

- Replace more instances of `the_index` and `the_repository`
  (https://github.com/gitgitgadget/git/issues/379)

- Turn the `fetch_if_missing` global into a field of `struct repository`
  (https://github.com/gitgitgadget/git/issues/251)

- Possibly others from #leftoverbits

Thoughts?


References
===
[1]: http://public-inbox.org/git/20200824090359.403944-1-shouryashukla.oo@gmail.com/
[2]: https://github.com/git/git/blob/v2.28.0/git-submodule.sh#L554-L713
[3]: https://lore.kernel.org/git/xmqqtuzrrk8r.fsf@gitster.c.googlers.com/

-- 
Sivaraam

^ permalink raw reply	[relevance 5%]

* Re: [PATCH v2] gitweb: map names/emails with mailmap
  @ 2020-09-07 22:10  6%             ` Emma Brooks
  0 siblings, 0 replies; 200+ results
From: Emma Brooks @ 2020-09-07 22:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, Jakub Narębski

On 2020-09-04 20:26:11-0700, Junio C Hamano wrote:
> Emma Brooks <me@pluvano.com> writes:
> 
> > However, I couldn't find a way to get "rev-list --format" to separate
> > commits with NULs.
> 
> A workaround would be "git rev-list --format='%s%x00'", iow,
> manually insert NUL
> 
> I would have expected "-z" to replace LF with NUL, but that does not
> appear to work X-<.

Thanks. I'll need to ignore the extra LF when parsing then. Later, "-z"
support could be added/fixed in rev-list (#leftoverbits?) and gitweb
could be updated to use that instead.
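
For illustration, here is a rough sketch of that parsing step. gitweb
itself is Perl; this is written in C only to show the idea, and the
helper name next_record() is hypothetical. It assumes the output looks
like "commit <id>" LF "<subject>" NUL LF for each commit, i.e. every
record after the first starts with the stray LF.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the next NUL-terminated record in buf[0..len), first
 * skipping the single LF that follows the previous record's NUL.
 * *pos is advanced past the record.  Returns NULL when no complete
 * record remains.  (Hypothetical helper, not part of Git or gitweb.)
 */
const char *next_record(const char *buf, size_t len, size_t *pos)
{
	const char *rec, *end;

	if (*pos < len && buf[*pos] == '\n')
		(*pos)++; /* ignore the extra LF between records */
	if (*pos >= len)
		return NULL;

	rec = buf + *pos;
	end = memchr(rec, '\0', len - *pos);
	if (!end)
		return NULL; /* truncated record: no terminating NUL */

	*pos = (size_t)(end - buf) + 1;
	return rec;
}
```

If "-z" is later made to work for rev-list, the LF-skipping line is the
only part that would need to go away.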

^ permalink raw reply	[relevance 6%]

* Re: [PATCH] push: make `--force-with-lease[=<ref>]` safer
  2020-09-04 18:51  2% [PATCH] push: make `--force-with-lease[=<ref>]` safer Srinidhi Kaushik
  2020-09-07 15:23  0% ` Phillip Wood
@ 2020-09-07 19:45  0% ` Johannes Schindelin
  1 sibling, 0 replies; 200+ results
From: Johannes Schindelin @ 2020-09-07 19:45 UTC (permalink / raw)
  To: Srinidhi Kaushik; +Cc: git

Hi Srinidhi,

On Sat, 5 Sep 2020, Srinidhi Kaushik wrote:

> The `--force-with-lease` option in `git-push`, makes sure that
> refs on remote aren't clobbered by unexpected changes when the
> "<expect>" ref value is explicitly specified.
>
> For other cases (i.e., `--force-with-lease[=<ref>]`) where the tip
> of the remote tracking branch is populated as the "<expect>" value,
> there is a possibility of allowing unwanted overwrites on the remote
> side when some tools that implicitly fetch remote-tracking refs in
> the background are used with the repository. If a remote-tracking ref
> was updated when a rewrite is happening locally and if those changes
> are pushed by omitting the "<expect>" value in `--force-with-lease`,
> any new changes from the updated tip will be lost locally and will
> be overwritten on the remote.
>
> This problem can be addressed by checking the `reflog` of the branch
> that is being pushed and verifying if there is an entry with the remote
> tracking ref. By running this check, we can ensure that refs that
> are fetched in the background while a "lease" is being held are not
> overlooked before a push, and any new changes can be acknowledged
> and (if necessary) integrated locally.
>
> The new check will cause `git-push` to fail if it detects the presence
> of any updated refs that we do not have locally and reject the push
> stating `implicit fetch` as the reason.
>
> An experimental configuration setting: `push.rejectImplicitFetch`
> which defaults to `true` (when `feature.experimental` is enabled)
> has been added, to allow `git-push` to reject a push if the check
> fails.
>
> Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
> ---
>
> Hello,
> I picked this up from #leftoverbits over at GitHub [1] from the open
> issues list. This idea [2], for a safer `--force-with-lease` was
> originally proposed by Johannes on the mailing list.
>
> [1]: https://github.com/gitgitgadget/git/issues/640
> [2]: https://lore.kernel.org/git/nycvar.QRO.7.76.6.1808272306271.73@tvgsbejvaqbjf.bet/

First of all: thank you for picking this up! The contribution is
pleasantly well-written, thank you also for that.

Now, to be honest, I thought that this mode would merit a new option
rather than piggy-backing on top of `--force-with-lease`. The reason is
that `--force-with-lease` targets a slightly different use case than mine:
it makes sure that we do not overwrite remote refs unless we already had a
chance to inspect them.

In contrast, my workflow uses `git pull --rebase` in two or more separate
worktrees, e.g. when developing a patch on two different Operating
Systems, I frequently forget to pull (to my public repository) on one
side, and I want to avoid force-pushing in that case, even if VS Code (or
I, via `git remote update`) fetched the ref (but failing to rebase the
local branch on top of it).

However, in other scenarios I very much do _not_ want to incorporate the
remote ref. For example, I often fetch
https://github.com/git-for-windows/git.wiki.git to check for the
occasional bogus change. Whenever I see such a bogus change, and it is at
the tip of the branch, I want to force-push _without_ incorporating the
bogus change into the local branch, yet I _do_ want to use
`--force-with-lease` because an independent change could have come in via
the Wiki in the meantime.

So I think that the original `--force-with-lease` and the mode you
implemented target subtly different use cases that are both valid, and
therefore I would like to request a separate option for the latter.

However, I have to admit that I could not think of a good name for that
option. "Implicit fetch" seems a bit too vague here, because the local
branch was not fetched, and certainly not implicitly, yet the logic
revolves around the local branch having been rebased to the
remote-tracking ref at some stage.

Even if we went with the config option to modify `--force-with-lease`'s
behavior, I would recommend separating out the `feature.experimental`
changes into their own patch, so that they can be reverted easily in case
the experimental feature is made the default.

A couple more comments:

> @@ -1471,16 +1489,21 @@ void set_ref_status_for_push(struct ref *remote_refs, int send_mirror,
>  		 * If the remote ref has moved and is now different
>  		 * from what we expect, reject any push.
>  		 *
> -		 * It also is an error if the user told us to check
> -		 * with the remote-tracking branch to find the value
> -		 * to expect, but we did not have such a tracking
> -		 * branch.
> +		 * It also is an error if the user told us to check with the
> +		 * remote-tracking branch to find the value to expect, but we
> +		 * did not have such a tracking branch, or we have one that
> +		 * has new changes.

If I were you, I would try to keep the original formatting, so that it
becomes more obvious that the part ", or we have [...]" was appended.

>  		if (ref->expect_old_sha1) {
>  			if (!oideq(&ref->old_oid, &ref->old_oid_expect))
>  				reject_reason = REF_STATUS_REJECT_STALE;
> +			else if (reject_implicit_fetch() && ref->implicit_fetch)
> +				reject_reason = REF_STATUS_REJECT_IMPLICIT_FETCH;
>  			else
> -				/* If the ref isn't stale then force the update. */
> +				/*
> +				 * If the ref isn't stale, or there was no

Should this "or" not be an "and" instead?

> +				 * implicit fetch, force the update.
> +				 */
>  				force_ref_update = 1;
>  		}
> [...]
>  static void apply_cas(struct push_cas_option *cas,
>  		      struct remote *remote,
>  		      struct ref *ref)
>  {
> -	int i;
> +	int i, do_reflog_check = 0;
> +	struct object_id oid;
> +	struct ref *local_ref = get_local_ref(ref->name);
>
>  	/* Find an explicit --<option>=<name>[:<value>] entry */
>  	for (i = 0; i < cas->nr; i++) {
>  		struct push_cas *entry = &cas->entry[i];
>  		if (!refname_match(entry->refname, ref->name))
>  			continue;
> +
>  		ref->expect_old_sha1 = 1;
>  		if (!entry->use_tracking)
>  			oidcpy(&ref->old_oid_expect, &entry->expect);
>  		else if (remote_tracking(remote, ref->name, &ref->old_oid_expect))
>  			oidclr(&ref->old_oid_expect);
> -		return;
> +		else
> +			do_reflog_check = 1;
> +
> +		goto reflog_check;

Hmm. I do not condemn `goto` statements in general, but this one makes the
flow harder to follow. I would prefer something like this:

-- snip --
 		else if (remote_tracking(remote, ref->name, &ref->old_oid_expect))
 			oidclr(&ref->old_oid_expect);
+		else if (local_ref && !read_ref(local_ref->name, &oid))
+			ref->implicit_fetch =
+				!remote_ref_in_reflog(&ref->old_oid, &oid,
+						      local_ref->name);
 		return;
-- snap --

Again, thank you so much for working on this!

Ciao,
Dscho

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] push: make `--force-with-lease[=<ref>]` safer
  2020-09-04 18:51  2% [PATCH] push: make `--force-with-lease[=<ref>]` safer Srinidhi Kaushik
@ 2020-09-07 15:23  0% ` Phillip Wood
  2020-09-07 19:45  0% ` Johannes Schindelin
  1 sibling, 0 replies; 200+ results
From: Phillip Wood @ 2020-09-07 15:23 UTC (permalink / raw)
  To: Srinidhi Kaushik, git; +Cc: Johannes Schindelin

Hi Srinidhi

Thanks for working on this, making --force-with-lease safer would be a 
valuable contribution

On 04/09/2020 19:51, Srinidhi Kaushik wrote:
> The `--force-with-lease` option in `git-push`, makes sure that
> refs on remote aren't clobbered by unexpected changes when the
> "<expect>" ref value is explicitly specified.

I think it would help to write out 
`--force-with-lease[=<refname>[:<expect>]]` so readers know what 
"<expect>" is referring to

> For other cases (i.e., `--force-with-lease[=<ref>]`) where the tip
> of the remote tracking branch is populated as the "<expect>" value,
> there is a possibility of allowing unwanted overwrites on the remote
> side when some tools that implicitly fetch remote-tracking refs in
> the background are used with the repository. If a remote-tracking ref
> was updated when a rewrite is happening locally and if those changes
> are pushed by omitting the "<expect>" value in `--force-with-lease`,
> any new changes from the updated tip will be lost locally and will
> be overwritten on the remote.
> 
> This problem can be addressed by checking the `reflog` of the branch
> that is being pushed and verifying if there is an entry with the remote
> tracking ref. By running this check, we can ensure that refs that
> are fetched in the background while a "lease" is being held are not
> overlooked before a push, and any new changes can be acknowledged
> and (if necessary) integrated locally.

An additional safety measure would be to check that the reflog dates of
the local commit and the tip of the remote-tracking branch overlap.
Otherwise, if there is an implicit fetch of a remote head that has been
rewound, we still push the local branch when we shouldn't.

> The new check will cause `git-push` to fail if it detects the presence
> of any updated refs that we do not have locally and reject the push
> stating `implicit fetch` as the reason.

'implicit fetch' is a rather terse message - can we say something along 
the lines of "the remote has been updated since the last merge/push"?

> An experimental configuration setting: `push.rejectImplicitFetch`
> which defaults to `true` (when `feature.experimental` is enabled)
> has been added, to allow `git-push` to reject a push if the check
> fails.

Making this available with feature.experimental initially is probably a
good idea; I hope it will become the default in future versions.

> Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
> ---
> 
> Hello,
> I picked this up from #leftoverbits over at GitHub [1] from the open
> issues list. This idea [2], for a safer `--force-with-lease` was
> originally proposed by Johannes on the mailing list.
> 
> [1]: https://github.com/gitgitgadget/git/issues/640
> [2]: https://lore.kernel.org/git/nycvar.QRO.7.76.6.1808272306271.73@tvgsbejvaqbjf.bet/
> 
> Thanks.
> 
>   Documentation/config/feature.txt |  3 +
>   Documentation/config/push.txt    | 14 +++++
>   Documentation/git-push.txt       |  6 ++
>   builtin/send-pack.c              |  5 ++
>   remote.c                         | 96 +++++++++++++++++++++++++++++---
>   remote.h                         |  4 +-
>   send-pack.c                      |  1 +
>   t/t5533-push-cas.sh              | 86 ++++++++++++++++++++++++++++
>   transport-helper.c               |  5 ++
>   transport.c                      |  5 ++
>   10 files changed, 217 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/config/feature.txt b/Documentation/config/feature.txt
> index c0cbf2bb1c..f93e9fd898 100644
> --- a/Documentation/config/feature.txt
> +++ b/Documentation/config/feature.txt
> @@ -18,6 +18,9 @@ skipping more commits at a time, reducing the number of round trips.
>   * `protocol.version=2` speeds up fetches from repositories with many refs by
>   allowing the client to specify which refs to list before the server lists
>   them.
> ++
> +* `push.rejectImplicitFetch=true` runs additional checks for linkgit:git-push[1]
> +`--force-with-lease` to mitigate implicit updates of remote-tracking refs.
> 
>   feature.manyFiles::
>   	Enable config options that optimize for repos with many files in the
> diff --git a/Documentation/config/push.txt b/Documentation/config/push.txt
> index f5e5b38c68..1a7184034d 100644
> --- a/Documentation/config/push.txt
> +++ b/Documentation/config/push.txt
> @@ -114,3 +114,17 @@ push.recurseSubmodules::
>   	specifying '--recurse-submodules=check|on-demand|no'.
>   	If not set, 'no' is used by default, unless 'submodule.recurse' is
>   	set (in which case a 'true' value means 'on-demand').
> +
> +push.rejectImplicitFetch::
> +	If set to `true`, runs additional checks for the `--force-with-lease`
> +	option when used with linkgit:git-push[1] if the expected value for
> +	the remote ref is unspecified (`--force-with-lease[=<ref>]`), and
> +	instead asked depend on the current value of the remote-tracking ref.
> +	The check ensures that the commit at the tip of the remote-tracking
> +	branch -- which may have been implicitly updated by tools that fetch
> +	remote refs by running linkgit:git-fetch[1] in the background -- has
> +	been integrated locally, when holding the "lease". If the new changes
> +	from such remote-tracking refs have not been updated locally before
> +	pushing, linkgit:git-push[1] will fail indicating the reject reason
> +	as `implicit fetch`. Enabling `feature.experimental` makes this option
> +	default to `true`.
> diff --git a/Documentation/git-push.txt b/Documentation/git-push.txt
> index 3b8053447e..2176a743f3 100644
> --- a/Documentation/git-push.txt
> +++ b/Documentation/git-push.txt
> @@ -320,6 +320,12 @@ seen and are willing to overwrite, then rewrite history, and finally
>   force push changes to `master` if the remote version is still at
>   `base`, regardless of what your local `remotes/origin/master` has been
>   updated to in the background.
> ++
> +Alternatively, setting the (experimental) `push.rejectImplicitFetch` option
> +to `true` will ensure changes from remote-tracking refs that are updated in the
> +background using linkgit:git-fetch[1] are accounted for (either by integrating
> +them locally, or explicitly specifying an overwrite), by rejecting to update
> +such refs.
> 
>   -f::
>   --force::
> diff --git a/builtin/send-pack.c b/builtin/send-pack.c
> index 2b9610f121..6500a8267a 100644
> --- a/builtin/send-pack.c
> +++ b/builtin/send-pack.c
> @@ -69,6 +69,11 @@ static void print_helper_status(struct ref *ref)
>   			msg = "stale info";
>   			break;
> 
> +		case REF_STATUS_REJECT_IMPLICIT_FETCH:
> +			res = "error";
> +			msg = "implicit fetch";
> +			break;
> +
>   		case REF_STATUS_REJECT_ALREADY_EXISTS:
>   			res = "error";
>   			msg = "already exists";
> diff --git a/remote.c b/remote.c
> index c5ed74f91c..ee2dedd15b 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -49,6 +49,8 @@ static const char *pushremote_name;
>   static struct rewrites rewrites;
>   static struct rewrites rewrites_push;
> 
> +static struct object_id cas_reflog_check_oid;
> +

rather than using a global variable I think it would be better just to 
pass this value around using the cb_data argument of the reflog callback 
function

>   static int valid_remote(const struct remote *remote)
>   {
>   	return (!!remote->url) || (!!remote->foreign_vcs);
> @@ -1446,6 +1448,22 @@ int match_push_refs(struct ref *src, struct ref **dst,
>   	return 0;
>   }
> 
> +/*
> + * Consider `push.rejectImplicitFetch` to be set to true if experimental
> + * features are enabled; use user-defined value if set explicitly.
> + */
> +int reject_implicit_fetch()
> +{
> +	int conf = 0;
> +	if (!git_config_get_bool("push.rejectImplicitFetch", &conf))
> +		return conf;
> +
> +	if (!git_config_get_bool("feature.experimental", &conf))
> +		return conf;
> +
> +	return conf;
> +}
> +
>   void set_ref_status_for_push(struct ref *remote_refs, int send_mirror,
>   			     int force_update)
>   {
> @@ -1471,16 +1489,21 @@ void set_ref_status_for_push(struct ref *remote_refs, int send_mirror,
>   		 * If the remote ref has moved and is now different
>   		 * from what we expect, reject any push.
>   		 *
> -		 * It also is an error if the user told us to check
> -		 * with the remote-tracking branch to find the value
> -		 * to expect, but we did not have such a tracking
> -		 * branch.
> +		 * It also is an error if the user told us to check with the
> +		 * remote-tracking branch to find the value to expect, but we
> +		 * did not have such a tracking branch, or we have one that
> +		 * has new changes.
>   		 */
>   		if (ref->expect_old_sha1) {
>   			if (!oideq(&ref->old_oid, &ref->old_oid_expect))
>   				reject_reason = REF_STATUS_REJECT_STALE;
> +			else if (reject_implicit_fetch() && ref->implicit_fetch)
> +				reject_reason = REF_STATUS_REJECT_IMPLICIT_FETCH;
>   			else
> -				/* If the ref isn't stale then force the update. */
> +				/*
> +				 * If the ref isn't stale, or there was no
> +				 * implicit fetch, force the update.
> +				 */
>   				force_ref_update = 1;
>   		}
> 
> @@ -2272,23 +2295,67 @@ static int remote_tracking(struct remote *remote, const char *refname,
>   	return 0;
>   }
> 
> +static int oid_in_reflog_ent(struct object_id *ooid, struct object_id *noid,
> +			     const char *ident, timestamp_t timestamp, int tz,
> +			     const char *message, void *cb_data)
> +{

using the callback data we would have something like

struct object_id *remote_head = cb_data;
return oideq(noid, remote_head);

> +	return oideq(noid, &cas_reflog_check_oid);
> +}
> +
> +/*
> + * Iterate through the reflog of a local branch and check if the tip of the
> + * remote-tracking branch is reachable from one of the entries.
> + */
> +static int remote_ref_in_reflog(const struct object_id *r_oid,
> +				const struct object_id *l_oid,
> +				const char *local_ref_name)
> +{
> +	int ret = 0;
> +	cas_reflog_check_oid = *r_oid;
> +
> +	struct commit *r_commit, *l_commit;

Our coding style is to declare all variables before any statements, so 
this should come above `cas_reflog_check_oid = *r_oid` but that line 
wants to go away anyway.

> +	l_commit = lookup_commit_reference(the_repository, l_oid);
> +	r_commit = lookup_commit_reference(the_repository, r_oid);
> +
> +	/*
> +	 * If the remote-tracking ref is an ancestor of the local ref (a merge,
> +	 * for instance) there is no need to iterate through the reflog entries
> +	 * to ensure reachability; it can be skipped to return early instead.
> +	 */
> +	ret = (r_commit && l_commit) ? in_merge_bases(r_commit, l_commit) : 0;
> +	if (ret)
> +		goto skip;

Rather than using a goto it would perhaps be better to do

if (!ret)
	ret = for_each_reflog_...

> +
> +	ret = for_each_reflog_ent_reverse(local_ref_name,
> +					  oid_in_reflog_ent,
> +					  NULL);

using the callback data we'd pass r_oid rather than NULL as the last 
argument
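
To illustrate the callback-data approach outside of git, here is a minimal, self-contained sketch. `struct oid`, `struct reflog_ent`, `each_ent_reverse()` and `remote_oid_in_reflog()` are hypothetical stand-ins and not git's API, and `is_ancestor` stands in for the result of the `in_merge_bases()` check. Only the shape matters: the value to compare against travels through `cb_data`, so no file-scope variable and no goto are needed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for git's struct object_id. */
struct oid {
	unsigned char hash[20];
};

/* Hypothetical stand-in for one reflog entry (old and new value). */
struct reflog_ent {
	struct oid old_oid, new_oid;
};

/* Callback type: a non-zero return value stops the iteration. */
typedef int (*reflog_ent_fn)(const struct oid *old_oid,
			     const struct oid *new_oid, void *cb_data);

static int oid_equal(const struct oid *a, const struct oid *b)
{
	return !memcmp(a->hash, b->hash, sizeof(a->hash));
}

/* Hypothetical iterator: visit entries newest first, stop on non-zero. */
static int each_ent_reverse(const struct reflog_ent *ents, size_t nr,
			    reflog_ent_fn fn, void *cb_data)
{
	size_t i;
	int ret = 0;

	for (i = nr; i > 0 && !ret; i--)
		ret = fn(&ents[i - 1].old_oid, &ents[i - 1].new_oid, cb_data);
	return ret;
}

/* The oid to look for arrives via cb_data instead of a global. */
static int oid_in_reflog_ent(const struct oid *old_oid,
			     const struct oid *new_oid, void *cb_data)
{
	const struct oid *remote_head = cb_data;

	(void)old_oid; /* unused in this sketch */
	return oid_equal(new_oid, remote_head);
}

/*
 * Return 1 if r_oid is already known to be an ancestor, or if it
 * shows up as the new value of some reflog entry; 0 otherwise.
 */
static int remote_oid_in_reflog(const struct reflog_ent *ents, size_t nr,
				const struct oid *r_oid, int is_ancestor)
{
	int ret = is_ancestor;

	if (!ret)
		ret = each_ent_reverse(ents, nr, oid_in_reflog_ent,
				       (void *)r_oid);
	return ret;
}
```

Because the iterator stops at the first non-zero return, the callback doubles as the "found" signal, which is the same convention the quoted patch relies on.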

> +skip:
> +	return ret;
> +}
> +
>   static void apply_cas(struct push_cas_option *cas,
>   		      struct remote *remote,
>   		      struct ref *ref)
>   {
> -	int i;
> +	int i, do_reflog_check = 0;
> +	struct object_id oid;
> +	struct ref *local_ref = get_local_ref(ref->name);
> 
>   	/* Find an explicit --<option>=<name>[:<value>] entry */
>   	for (i = 0; i < cas->nr; i++) {
>   		struct push_cas *entry = &cas->entry[i];
>   		if (!refname_match(entry->refname, ref->name))
>   			continue;
> +
>   		ref->expect_old_sha1 = 1;
>   		if (!entry->use_tracking)
>   			oidcpy(&ref->old_oid_expect, &entry->expect);
>   		else if (remote_tracking(remote, ref->name, &ref->old_oid_expect))
>   			oidclr(&ref->old_oid_expect);
> -		return;
> +		else
> +			do_reflog_check = 1;
> +
> +		goto reflog_check;

I'm not too keen on jumping here; can't we just check `do_reflog_check` 
below?

Best Wishes

Phillip

>   	}
> 
>   	/* Are we using "--<option>" to cover all? */
> @@ -2298,6 +2365,21 @@ static void apply_cas(struct push_cas_option *cas,
>   	ref->expect_old_sha1 = 1;
>   	if (remote_tracking(remote, ref->name, &ref->old_oid_expect))
>   		oidclr(&ref->old_oid_expect);
> +	else
> +		do_reflog_check = 1;
> +
> +reflog_check:
> +	/*
> +	 * For cases where "--force-with-lease[=<refname>]" i.e., when the
> +	 * "<expect>" value is unspecified, run additional checks to verify
> +	 * if the tip of the remote-tracking branch (if implicitly updated
> +	 * when a "lease" is being held) is reachable from at least one entry
> +	 * in the reflog of the local branch that is being pushed, ensuring
> +	 * new changes (if any) have been integrated locally.
> +	 */
> +	if (do_reflog_check && local_ref && !read_ref(local_ref->name, &oid))
> +		ref->implicit_fetch = !remote_ref_in_reflog(&ref->old_oid, &oid,
> +							    local_ref->name);
>   }
> 
>   void apply_push_cas(struct push_cas_option *cas,
> diff --git a/remote.h b/remote.h
> index 5e3ea5a26d..f859fa5fed 100644
> --- a/remote.h
> +++ b/remote.h
> @@ -104,7 +104,8 @@ struct ref {
>   		forced_update:1,
>   		expect_old_sha1:1,
>   		exact_oid:1,
> -		deletion:1;
> +		deletion:1,
> +		implicit_fetch:1;
> 
>   	enum {
>   		REF_NOT_MATCHED = 0, /* initial value */
> @@ -133,6 +134,7 @@ struct ref {
>   		REF_STATUS_REJECT_FETCH_FIRST,
>   		REF_STATUS_REJECT_NEEDS_FORCE,
>   		REF_STATUS_REJECT_STALE,
> +		REF_STATUS_REJECT_IMPLICIT_FETCH,
>   		REF_STATUS_REJECT_SHALLOW,
>   		REF_STATUS_UPTODATE,
>   		REF_STATUS_REMOTE_REJECT,
> diff --git a/send-pack.c b/send-pack.c
> index 632f1580ca..fe7f14add4 100644
> --- a/send-pack.c
> +++ b/send-pack.c
> @@ -240,6 +240,7 @@ static int check_to_send_update(const struct ref *ref, const struct send_pack_ar
>   	case REF_STATUS_REJECT_FETCH_FIRST:
>   	case REF_STATUS_REJECT_NEEDS_FORCE:
>   	case REF_STATUS_REJECT_STALE:
> +	case REF_STATUS_REJECT_IMPLICIT_FETCH:
>   	case REF_STATUS_REJECT_NODELETE:
>   		return CHECK_REF_STATUS_REJECTED;
>   	case REF_STATUS_UPTODATE:
> diff --git a/t/t5533-push-cas.sh b/t/t5533-push-cas.sh
> index 0b0eb1d025..840b2a95f9 100755
> --- a/t/t5533-push-cas.sh
> +++ b/t/t5533-push-cas.sh
> @@ -13,6 +13,41 @@ setup_srcdst_basic () {
>   	)
>   }
> 
> +setup_implicit_fetch () {
> +	rm -fr src dup dst &&
> +	git init --bare dst &&
> +	git clone --no-local dst src &&
> +	git clone --no-local dst dup &&
> +	(
> +		cd src &&
> +		test_commit A &&
> +		git push
> +	) &&
> +	(
> +		cd dup &&
> +		git fetch &&
> +		git merge origin/master &&
> +		test_commit B &&
> +		git switch -c branch master~1 &&
> +		test_commit C &&
> +		test_commit D &&
> +		git push --all
> +	) &&
> +	(
> +		cd src &&
> +		git switch master &&
> +		git fetch --all &&
> +		git branch branch --track origin/branch &&
> +		git merge origin/master
> +	) &&
> +	(
> +		cd dup &&
> +		git switch master &&
> +		test_commit E &&
> +		git push origin master:master
> +	)
> +}
> +
>   test_expect_success setup '
>   	# create template repository
>   	test_commit A &&
> @@ -256,4 +291,55 @@ test_expect_success 'background updates of REMOTE can be mitigated with a non-up
>   	)
>   '
> 
> +test_expect_success 'implicit updates to remote-tracking refs with `push.rejectImplicitFetch` set (protected, all refs)' '
> +	setup_implicit_fetch &&
> +	test_when_finished "rm -fr dst src dup" &&
> +	git ls-remote dst refs/heads/master >expect.master &&
> +	git ls-remote dst refs/heads/branch >expect.branch &&
> +	(
> +		cd src &&
> +		git switch master &&
> +		test_commit G &&
> +		git switch branch &&
> +		test_commit H &&
> +		git fetch --all &&
> +		git config --local feature.experimental true &&
> +		test_must_fail git push --force-with-lease --all 2>err &&
> +		grep "implicit fetch" err
> +	) &&
> +	git ls-remote dst refs/heads/master >actual.master &&
> +	git ls-remote dst refs/heads/branch >actual.branch &&
> +	test_cmp expect.master actual.master &&
> +	test_cmp expect.branch actual.branch &&
> +	(
> +		cd src &&
> +		git config --local feature.experimental false &&
> +		git push --force-with-lease --all 2>err &&
> +		grep "forced update" err
> +	)
> +'
> +
> +test_expect_success 'implicit updates to remote-tracking refs with `push.rejectImplicitFetch` set (protected, specific ref)' '
> +	setup_implicit_fetch &&
> +	git ls-remote dst refs/heads/master >actual &&
> +	(
> +		cd src &&
> +		git switch branch &&
> +		test_commit F &&
> +		git switch master &&
> +		test_commit G &&
> +		git fetch  &&
> +		git config --local push.rejectImplicitFetch true &&
> +		test_must_fail git push --force-with-lease=master --all 2>err &&
> +		grep "implicit fetch" err
> +	) &&
> +	git ls-remote dst refs/heads/master >expect &&
> +	test_cmp expect actual &&
> +	(
> +		cd src &&
> +		git push --force --force-with-lease --all 2>err &&
> +		grep "forced update" err
> +	)
> +'
> +
>   test_done
> diff --git a/transport-helper.c b/transport-helper.c
> index c52c99d829..75b4c1b758 100644
> --- a/transport-helper.c
> +++ b/transport-helper.c
> @@ -779,6 +779,10 @@ static int push_update_ref_status(struct strbuf *buf,
>   			status = REF_STATUS_REJECT_STALE;
>   			FREE_AND_NULL(msg);
>   		}
> +		else if (!strcmp(msg, "implicit fetch")) {
> +			status = REF_STATUS_REJECT_IMPLICIT_FETCH;
> +			FREE_AND_NULL(msg);
> +		}
>   		else if (!strcmp(msg, "forced update")) {
>   			forced = 1;
>   			FREE_AND_NULL(msg);
> @@ -896,6 +900,7 @@ static int push_refs_with_push(struct transport *transport,
>   		switch (ref->status) {
>   		case REF_STATUS_REJECT_NONFASTFORWARD:
>   		case REF_STATUS_REJECT_STALE:
> +		case REF_STATUS_REJECT_IMPLICIT_FETCH:
>   		case REF_STATUS_REJECT_ALREADY_EXISTS:
>   			if (atomic) {
>   				reject_atomic_push(remote_refs, mirror);
> diff --git a/transport.c b/transport.c
> index 43e24bf1e5..588575498f 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -567,6 +567,10 @@ static int print_one_push_status(struct ref *ref, const char *dest, int count,
>   		print_ref_status('!', "[rejected]", ref, ref->peer_ref,
>   				 "stale info", porcelain, summary_width);
>   		break;
> +	case REF_STATUS_REJECT_IMPLICIT_FETCH:
> +		print_ref_status('!', "[rejected]", ref, ref->peer_ref,
> +				 "implicit fetch", porcelain, summary_width);
> +		break;
>   	case REF_STATUS_REJECT_SHALLOW:
>   		print_ref_status('!', "[rejected]", ref, ref->peer_ref,
>   				 "new shallow roots not allowed",
> @@ -1101,6 +1105,7 @@ static int run_pre_push_hook(struct transport *transport,
>   		if (!r->peer_ref) continue;
>   		if (r->status == REF_STATUS_REJECT_NONFASTFORWARD) continue;
>   		if (r->status == REF_STATUS_REJECT_STALE) continue;
> +		if (r->status == REF_STATUS_REJECT_IMPLICIT_FETCH) continue;
>   		if (r->status == REF_STATUS_UPTODATE) continue;
> 
>   		strbuf_reset(&buf);
> --
> 2.28.0
> 

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] pack-bitmap-write: use hashwrite_be32() in write_hash_cache()
  2020-09-07  2:23  0%   ` Jeff King
@ 2020-09-07  2:30  0%     ` Taylor Blau
  0 siblings, 0 replies; 200+ results
From: Taylor Blau @ 2020-09-07  2:30 UTC (permalink / raw)
  To: Jeff King
  Cc: Taylor Blau, René Scharfe, Git Mailing List, Junio C Hamano

On Sun, Sep 06, 2020 at 10:23:40PM -0400, Jeff King wrote:
> On Sun, Sep 06, 2020 at 03:02:35PM -0400, Taylor Blau wrote:
>
> > On Sun, Sep 06, 2020 at 10:59:06AM +0200, René Scharfe wrote:
> > > -		uint32_t hash_value = htonl(entry->hash);
> > > -		hashwrite(f, &hash_value, sizeof(hash_value));
> > > +		hashwrite_be32(f, entry->hash);
> >
> > This is an obviously correct translation of what's already written, and
> > indeed it is shorter and easier to read.
> >
> > Unfortunately, I think there is some more subtlety here since the hash
> > > cache isn't guaranteed to be aligned, and so blindly calling htonl()
> > (either directly in write_hash_cache(), or indirectly in
> > hashwrite_be32()) might cause tools like ASan to complain when loading
> > data on architectures that don't support fast unaligned reads.
>
> I think the alignment here is fine. We're just writing out an individual
> value. So in the original, entry->hash and our local hash_value are both
> properly aligned, since they're declared as uint32_t. We pass the
> pointer to hashwrite(), but it doesn't expect any particular alignment.
> After the patch, the situation is the same, except that we're working
> with the uint32_t parameter to hashwrite_be32(), which is also properly
> aligned.

Ack; I would blame it on skimming the patch, but this is far too obvious
for that. The bug is on the *reading* end in GitHub's fork, and in a
(custom) extension (which it looks like you describe below).
Embarrassing.

> > So, I think that we could do one of three things, depending on how much
> > you care about improving this case ;-).
> >
> >   - leave your patch alone, accepting that this case which was broken
> >     before will remain broken, and leave it as #leftoverbits
>
> So I think this is what we should do. :)

Yep, this patch is correct as-is.

> >   - change the 'hashwrite_beXX()' implementations to use the correct
> >     'get_beXX' wrappers which behave like htonl() on architectures with
> >     fast unaligned loads, and fall back to byte reads and shifts on
> >     architectures that don't.
>
> Likewise, I don't think there's any reason to do this. hashwrite_be32()
> gets its parameter as a value, not a pointer. So even if it were coming
> from an unaligned mmap, it's actually the _caller_ who would have to
> use get_be32() when passing it.

Right.

> > Credit goes to Peff for finding this issue in GitHub's fork. For what
> > it's worth, we were planning on sending those patches to the list soon,
> > but they are tied up with a longer series in the meantime.
>
> There is a bug in our fork, but I don't think it's upstream. [...]

Agreed with all of that, too.


Taylor

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] pack-bitmap-write: use hashwrite_be32() in write_hash_cache()
  2020-09-06 19:02  5% ` Taylor Blau
@ 2020-09-07  2:23  0%   ` Jeff King
  2020-09-07  2:30  0%     ` Taylor Blau
  0 siblings, 1 reply; 200+ results
From: Jeff King @ 2020-09-07  2:23 UTC (permalink / raw)
  To: Taylor Blau; +Cc: René Scharfe, Git Mailing List, Junio C Hamano

On Sun, Sep 06, 2020 at 03:02:35PM -0400, Taylor Blau wrote:

> On Sun, Sep 06, 2020 at 10:59:06AM +0200, René Scharfe wrote:
> > -		uint32_t hash_value = htonl(entry->hash);
> > -		hashwrite(f, &hash_value, sizeof(hash_value));
> > +		hashwrite_be32(f, entry->hash);
> 
> This is an obviously correct translation of what's already written, and
> indeed it is shorter and easier to read.
> 
> Unfortunately, I think there is some more subtlety here since the hash
> cache isn't guaranteed to be aligned, and so blindly calling htonl()
> (either directly in write_hash_cache(), or indirectly in
> hashwrite_be32()) might cause tools like ASan to complain when loading
> data on architectures that don't support fast unaligned reads.

I think the alignment here is fine. We're just writing out an individual
value. So in the original, entry->hash and our local hash_value are both
properly aligned, since they're declared as uint32_t. We pass the
pointer to hashwrite(), but it doesn't expect any particular alignment.
After the patch, the situation is the same, except that we're working
with the uint32_t parameter to hashwrite_be32(), which is also properly
aligned.

> So, I think that we could do one of three things, depending on how much
> you care about improving this case ;-).
> 
>   - leave your patch alone, accepting that this case which was broken
>     before will remain broken, and leave it as #leftoverbits

So I think this is what we should do. :)

>   - discard your patch as-is, and replace the 'htonl' with 'get_be32()'
>     before handing it off to 'hashwrite()', or

No need here; get_be32() is for when we're reading in from an unaligned
mmap.

>   - change the 'hashwrite_beXX()' implementations to use the correct
>     'get_beXX' wrappers which behave like htonl() on architectures with
>     fast unaligned loads, and fall back to byte reads and shifts on
>     architectures that don't.

Likewise, I don't think there's any reason to do this. hashwrite_be32()
gets its parameter as a value, not a pointer. So even if it were coming
from an unaligned mmap, it's actually the _caller_ who would have to
use get_be32() when passing it.

> Credit goes to Peff for finding this issue in GitHub's fork. For what
> it's worth, we were planning on sending those patches to the list soon,
> but they are tied up with a longer series in the meantime.

There is a bug in our fork, but I don't think it's upstream. The
relevant spot for the name-hash cache is in show_objects_for_type(),
which reads from the bitmap->hashes pointer (that points into our
unaligned mmap). But it does:

        if (bitmap_git->hashes)
                hash = get_be32(bitmap_git->hashes + entry->nr);

which is correct (using htonl() would not be). The bug is that in our
fork, we have a custom bit-cache extension[1] which does use htonl(),
and should be get_be32(). That's something we'll need to clean up when
we send those patches upstream.
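
As a rough illustration of that distinction, here is a small sketch. The helper names `read_be32()` and `write_be32()` are hypothetical and this is not the implementation in git's compat headers. Reading through a byte pointer is safe at any alignment, while the writing side receives a plain value and so has no alignment concern at all.

```c
#include <stdint.h>

/*
 * Read a big-endian 32-bit value one byte at a time. Since only
 * unsigned char accesses are made, ptr may have any alignment
 * (e.g. it may point into the middle of an mmap'd file).
 */
static uint32_t read_be32(const void *ptr)
{
	const unsigned char *p = ptr;

	return (uint32_t)p[0] << 24 |
	       (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] <<  8 |
	       (uint32_t)p[3];
}

/*
 * Store a value in big-endian byte order. The value is passed by
 * value, so the caller's storage alignment is irrelevant here.
 */
static void write_be32(unsigned char *out, uint32_t value)
{
	out[0] = (unsigned char)((value >> 24) & 0xff);
	out[1] = (unsigned char)((value >> 16) & 0xff);
	out[2] = (unsigned char)((value >>  8) & 0xff);
	out[3] = (unsigned char)(value & 0xff);
}
```

A pattern that casts an arbitrary byte offset to `uint32_t *` and dereferences it before byte-swapping is the one that can trip on strict-alignment platforms; the byte-wise read above avoids that dereference entirely.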

-Peff

[1] For the curious, the point is to keep a cache of the bit position of
    each object, which lets us ask "is this object's bit set" without
    having to load the revindex. It's helpful for bitmap-optimizing some
    algorithms like "branch --contains", though I think we should
    re-evaluate how much it helps now that we have commit-graphs with
    generation numbers.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] pack-bitmap-write: use hashwrite_be32() in write_hash_cache()
  @ 2020-09-06 19:02  5% ` Taylor Blau
  2020-09-07  2:23  0%   ` Jeff King
  0 siblings, 1 reply; 200+ results
From: Taylor Blau @ 2020-09-06 19:02 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git Mailing List, Junio C Hamano, Taylor Blau, Jeff King

Hi René,

On Sun, Sep 06, 2020 at 10:59:06AM +0200, René Scharfe wrote:
> -		uint32_t hash_value = htonl(entry->hash);
> -		hashwrite(f, &hash_value, sizeof(hash_value));
> +		hashwrite_be32(f, entry->hash);

This is an obviously correct translation of what's already written, and
indeed it is shorter and easier to read.

Unfortunately, I think there is some more subtlety here since the hash
cache isn't guaranteed to be aligned, and so blindly calling htonl()
(either directly in write_hash_cache(), or indirectly in
hashwrite_be32()) might cause tools like ASan to complain when loading
data on architectures that don't support fast unaligned reads.

So, I think that we could do one of three things, depending on how much
you care about improving this case ;-).

  - leave your patch alone, accepting that this case which was broken
    before will remain broken, and leave it as #leftoverbits

  - discard your patch as-is, and replace the 'htonl' with 'get_be32()'
    before handing it off to 'hashwrite()', or

  - change the 'hashwrite_beXX()' implementations to use the correct
    'get_beXX' wrappers which behave like htonl() on architectures with
    fast unaligned loads, and fall back to byte reads and shifts on
    architectures that don't.

Credit goes to Peff for finding this issue in GitHub's fork. For what
it's worth, we were planning on sending those patches to the list soon,
but they are tied up with a longer series in the meantime.

For what it's worth, I think doing any of the above would be fine.

Thanks,
Taylor

^ permalink raw reply	[relevance 5%]

* [PATCH] push: make `--force-with-lease[=<ref>]` safer
@ 2020-09-04 18:51  2% Srinidhi Kaushik
  2020-09-07 15:23  0% ` Phillip Wood
  2020-09-07 19:45  0% ` Johannes Schindelin
  0 siblings, 2 replies; 200+ results
From: Srinidhi Kaushik @ 2020-09-04 18:51 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Srinidhi Kaushik

The `--force-with-lease` option in `git-push` makes sure that
refs on the remote aren't clobbered by unexpected changes when the
"<expect>" ref value is explicitly specified.

For other cases (i.e., `--force-with-lease[=<ref>]`) where the tip
of the remote tracking branch is populated as the "<expect>" value,
there is a possibility of allowing unwanted overwrites on the remote
side when some tools that implicitly fetch remote-tracking refs in
the background are used with the repository. If a remote-tracking ref
was updated when a rewrite is happening locally and if those changes
are pushed by omitting the "<expect>" value in `--force-with-lease`,
any new changes from the updated tip will be lost locally and will
be overwritten on the remote.

This problem can be addressed by checking the `reflog` of the branch
that is being pushed and verifying that there is an entry with the
remote-tracking ref. By running this check, we can ensure that refs
that are fetched in the background while a "lease" is being held are
not overlooked before a push, and any new changes can be acknowledged
and (if necessary) integrated locally.

The new check will cause `git-push` to fail if it detects the presence
of any updated refs that we do not have locally and reject the push
stating `implicit fetch` as the reason.

An experimental configuration setting, `push.rejectImplicitFetch`,
which defaults to `true` when `feature.experimental` is enabled,
has been added to allow `git-push` to reject a push if the check
fails.
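
For illustration only, the intended precedence and decision order can be sketched in isolation as follows. The enum names and both functions are hypothetical and do not appear in the patch; they model the configuration lookup (an explicit `push.rejectImplicitFetch` wins, otherwise `feature.experimental` decides, otherwise the check is off) and the order in which a leased ref is judged.

```c
/* Hypothetical tri-state for a boolean config key that may be unset. */
enum cfg_bool { CFG_UNSET = -1, CFG_FALSE = 0, CFG_TRUE = 1 };

/* Hypothetical outcome of evaluating one ref pushed with a lease. */
enum verdict { FORCE_UPDATE, REJECT_STALE, REJECT_IMPLICIT_FETCH };

/*
 * An explicit push.rejectImplicitFetch value takes precedence;
 * otherwise fall back to feature.experimental; otherwise disabled.
 */
static int reject_implicit_fetch_enabled(enum cfg_bool push_setting,
					 enum cfg_bool experimental)
{
	if (push_setting != CFG_UNSET)
		return push_setting == CFG_TRUE;
	if (experimental != CFG_UNSET)
		return experimental == CFG_TRUE;
	return 0;
}

/*
 * The stale check comes first, then the implicit-fetch check (only
 * when enabled), and only then is the update forced.
 */
static enum verdict lease_verdict(int remote_matches_expect,
				  int check_enabled, int implicit_fetch)
{
	if (!remote_matches_expect)
		return REJECT_STALE;
	if (check_enabled && implicit_fetch)
		return REJECT_IMPLICIT_FETCH;
	return FORCE_UPDATE;
}
```

With the check disabled, the behaviour collapses to the existing stale/force decision, which is why the setting can default to off without affecting current users.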

Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
---

Hello,
I picked this up from #leftoverbits over at GitHub [1] from the open
issues list. This idea [2], for a safer `--force-with-lease` was
originally proposed by Johannes on the mailing list.

[1]: https://github.com/gitgitgadget/git/issues/640
[2]: https://lore.kernel.org/git/nycvar.QRO.7.76.6.1808272306271.73@tvgsbejvaqbjf.bet/

Thanks.

 Documentation/config/feature.txt |  3 +
 Documentation/config/push.txt    | 14 +++++
 Documentation/git-push.txt       |  6 ++
 builtin/send-pack.c              |  5 ++
 remote.c                         | 96 +++++++++++++++++++++++++++++---
 remote.h                         |  4 +-
 send-pack.c                      |  1 +
 t/t5533-push-cas.sh              | 86 ++++++++++++++++++++++++++++
 transport-helper.c               |  5 ++
 transport.c                      |  5 ++
 10 files changed, 217 insertions(+), 8 deletions(-)

diff --git a/Documentation/config/feature.txt b/Documentation/config/feature.txt
index c0cbf2bb1c..f93e9fd898 100644
--- a/Documentation/config/feature.txt
+++ b/Documentation/config/feature.txt
@@ -18,6 +18,9 @@ skipping more commits at a time, reducing the number of round trips.
 * `protocol.version=2` speeds up fetches from repositories with many refs by
 allowing the client to specify which refs to list before the server lists
 them.
++
+* `push.rejectImplicitFetch=true` runs additional checks for linkgit:git-push[1]
+`--force-with-lease` to mitigate implicit updates of remote-tracking refs.

 feature.manyFiles::
 	Enable config options that optimize for repos with many files in the
diff --git a/Documentation/config/push.txt b/Documentation/config/push.txt
index f5e5b38c68..1a7184034d 100644
--- a/Documentation/config/push.txt
+++ b/Documentation/config/push.txt
@@ -114,3 +114,17 @@ push.recurseSubmodules::
 	specifying '--recurse-submodules=check|on-demand|no'.
 	If not set, 'no' is used by default, unless 'submodule.recurse' is
 	set (in which case a 'true' value means 'on-demand').
+
+push.rejectImplicitFetch::
+	If set to `true`, runs additional checks for the `--force-with-lease`
+	option when used with linkgit:git-push[1] if the expected value for
+	the remote ref is unspecified (`--force-with-lease[=<ref>]`), and
+	instead asked to depend on the current value of the remote-tracking ref.
+	The check ensures that the commit at the tip of the remote-tracking
+	branch -- which may have been implicitly updated by tools that fetch
+	remote refs by running linkgit:git-fetch[1] in the background -- has
+	been integrated locally, when holding the "lease". If the new changes
+	from such remote-tracking refs have not been updated locally before
+	pushing, linkgit:git-push[1] will fail indicating the reject reason
+	as `implicit fetch`. Enabling `feature.experimental` makes this option
+	default to `true`.
diff --git a/Documentation/git-push.txt b/Documentation/git-push.txt
index 3b8053447e..2176a743f3 100644
--- a/Documentation/git-push.txt
+++ b/Documentation/git-push.txt
@@ -320,6 +320,12 @@ seen and are willing to overwrite, then rewrite history, and finally
 force push changes to `master` if the remote version is still at
 `base`, regardless of what your local `remotes/origin/master` has been
 updated to in the background.
++
+Alternatively, setting the (experimental) `push.rejectImplicitFetch` option
+to `true` will ensure changes from remote-tracking refs that are updated in the
+background using linkgit:git-fetch[1] are accounted for (either by integrating
+them locally, or explicitly specifying an overwrite), by refusing to update
+such refs.

 -f::
 --force::
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 2b9610f121..6500a8267a 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -69,6 +69,11 @@ static void print_helper_status(struct ref *ref)
 			msg = "stale info";
 			break;

+		case REF_STATUS_REJECT_IMPLICIT_FETCH:
+			res = "error";
+			msg = "implicit fetch";
+			break;
+
 		case REF_STATUS_REJECT_ALREADY_EXISTS:
 			res = "error";
 			msg = "already exists";
diff --git a/remote.c b/remote.c
index c5ed74f91c..ee2dedd15b 100644
--- a/remote.c
+++ b/remote.c
@@ -49,6 +49,8 @@ static const char *pushremote_name;
 static struct rewrites rewrites;
 static struct rewrites rewrites_push;

+static struct object_id cas_reflog_check_oid;
+
 static int valid_remote(const struct remote *remote)
 {
 	return (!!remote->url) || (!!remote->foreign_vcs);
@@ -1446,6 +1448,22 @@ int match_push_refs(struct ref *src, struct ref **dst,
 	return 0;
 }

+/*
+ * Consider `push.rejectImplicitFetch` to be set to true if experimental
+ * features are enabled; use user-defined value if set explicitly.
+ */
+int reject_implicit_fetch(void)
+{
+	int conf = 0;
+	if (!git_config_get_bool("push.rejectImplicitFetch", &conf))
+		return conf;
+
+	if (!git_config_get_bool("feature.experimental", &conf))
+		return conf;
+
+	return conf;
+}
+
 void set_ref_status_for_push(struct ref *remote_refs, int send_mirror,
 			     int force_update)
 {
@@ -1471,16 +1489,21 @@ void set_ref_status_for_push(struct ref *remote_refs, int send_mirror,
 		 * If the remote ref has moved and is now different
 		 * from what we expect, reject any push.
 		 *
-		 * It also is an error if the user told us to check
-		 * with the remote-tracking branch to find the value
-		 * to expect, but we did not have such a tracking
-		 * branch.
+		 * It also is an error if the user told us to check with the
+		 * remote-tracking branch to find the value to expect, but we
+		 * did not have such a tracking branch, or we have one that
+		 * has new changes.
 		 */
 		if (ref->expect_old_sha1) {
 			if (!oideq(&ref->old_oid, &ref->old_oid_expect))
 				reject_reason = REF_STATUS_REJECT_STALE;
+			else if (reject_implicit_fetch() && ref->implicit_fetch)
+				reject_reason = REF_STATUS_REJECT_IMPLICIT_FETCH;
 			else
-				/* If the ref isn't stale then force the update. */
+				/*
+				 * If the ref isn't stale, or there was no
+				 * implicit fetch, force the update.
+				 */
 				force_ref_update = 1;
 		}

@@ -2272,23 +2295,67 @@ static int remote_tracking(struct remote *remote, const char *refname,
 	return 0;
 }

+static int oid_in_reflog_ent(struct object_id *ooid, struct object_id *noid,
+			     const char *ident, timestamp_t timestamp, int tz,
+			     const char *message, void *cb_data)
+{
+	return oideq(noid, &cas_reflog_check_oid);
+}
+
+/*
+ * Iterate through the reflog of a local branch and check if the tip of the
+ * remote-tracking branch is reachable from one of the entries.
+ */
+static int remote_ref_in_reflog(const struct object_id *r_oid,
+				const struct object_id *l_oid,
+				const char *local_ref_name)
+{
+	int ret = 0;
+	cas_reflog_check_oid = *r_oid;
+
+	struct commit *r_commit, *l_commit;
+	l_commit = lookup_commit_reference(the_repository, l_oid);
+	r_commit = lookup_commit_reference(the_repository, r_oid);
+
+	/*
+	 * If the remote-tracking ref is an ancestor of the local ref (a merge,
+	 * for instance) there is no need to iterate through the reflog entries
+	 * to ensure reachability; it can be skipped to return early instead.
+	 */
+	ret = (r_commit && l_commit) ? in_merge_bases(r_commit, l_commit) : 0;
+	if (ret)
+		goto skip;
+
+	ret = for_each_reflog_ent_reverse(local_ref_name,
+					  oid_in_reflog_ent,
+					  NULL);
+skip:
+	return ret;
+}
+
 static void apply_cas(struct push_cas_option *cas,
 		      struct remote *remote,
 		      struct ref *ref)
 {
-	int i;
+	int i, do_reflog_check = 0;
+	struct object_id oid;
+	struct ref *local_ref = get_local_ref(ref->name);

 	/* Find an explicit --<option>=<name>[:<value>] entry */
 	for (i = 0; i < cas->nr; i++) {
 		struct push_cas *entry = &cas->entry[i];
 		if (!refname_match(entry->refname, ref->name))
 			continue;
+
 		ref->expect_old_sha1 = 1;
 		if (!entry->use_tracking)
 			oidcpy(&ref->old_oid_expect, &entry->expect);
 		else if (remote_tracking(remote, ref->name, &ref->old_oid_expect))
 			oidclr(&ref->old_oid_expect);
-		return;
+		else
+			do_reflog_check = 1;
+
+		goto reflog_check;
 	}

 	/* Are we using "--<option>" to cover all? */
@@ -2298,6 +2365,21 @@ static void apply_cas(struct push_cas_option *cas,
 	ref->expect_old_sha1 = 1;
 	if (remote_tracking(remote, ref->name, &ref->old_oid_expect))
 		oidclr(&ref->old_oid_expect);
+	else
+		do_reflog_check = 1;
+
+reflog_check:
+	/*
+	 * For cases where "--force-with-lease[=<refname>]" i.e., when the
+	 * "<expect>" value is unspecified, run additional checks to verify
+	 * if the tip of the remote-tracking branch (if implicitly updated
+	 * when a "lease" is being held) is reachable from at least one entry
+	 * in the reflog of the local branch that is being pushed, ensuring
+	 * new changes (if any) have been integrated locally.
+	 */
+	if (do_reflog_check && local_ref && !read_ref(local_ref->name, &oid))
+		ref->implicit_fetch = !remote_ref_in_reflog(&ref->old_oid, &oid,
+							    local_ref->name);
 }

 void apply_push_cas(struct push_cas_option *cas,
diff --git a/remote.h b/remote.h
index 5e3ea5a26d..f859fa5fed 100644
--- a/remote.h
+++ b/remote.h
@@ -104,7 +104,8 @@ struct ref {
 		forced_update:1,
 		expect_old_sha1:1,
 		exact_oid:1,
-		deletion:1;
+		deletion:1,
+		implicit_fetch:1;

 	enum {
 		REF_NOT_MATCHED = 0, /* initial value */
@@ -133,6 +134,7 @@ struct ref {
 		REF_STATUS_REJECT_FETCH_FIRST,
 		REF_STATUS_REJECT_NEEDS_FORCE,
 		REF_STATUS_REJECT_STALE,
+		REF_STATUS_REJECT_IMPLICIT_FETCH,
 		REF_STATUS_REJECT_SHALLOW,
 		REF_STATUS_UPTODATE,
 		REF_STATUS_REMOTE_REJECT,
diff --git a/send-pack.c b/send-pack.c
index 632f1580ca..fe7f14add4 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -240,6 +240,7 @@ static int check_to_send_update(const struct ref *ref, const struct send_pack_ar
 	case REF_STATUS_REJECT_FETCH_FIRST:
 	case REF_STATUS_REJECT_NEEDS_FORCE:
 	case REF_STATUS_REJECT_STALE:
+	case REF_STATUS_REJECT_IMPLICIT_FETCH:
 	case REF_STATUS_REJECT_NODELETE:
 		return CHECK_REF_STATUS_REJECTED;
 	case REF_STATUS_UPTODATE:
diff --git a/t/t5533-push-cas.sh b/t/t5533-push-cas.sh
index 0b0eb1d025..840b2a95f9 100755
--- a/t/t5533-push-cas.sh
+++ b/t/t5533-push-cas.sh
@@ -13,6 +13,41 @@ setup_srcdst_basic () {
 	)
 }

+setup_implicit_fetch () {
+	rm -fr src dup dst &&
+	git init --bare dst &&
+	git clone --no-local dst src &&
+	git clone --no-local dst dup &&
+	(
+		cd src &&
+		test_commit A &&
+		git push
+	) &&
+	(
+		cd dup &&
+		git fetch &&
+		git merge origin/master &&
+		test_commit B &&
+		git switch -c branch master~1 &&
+		test_commit C &&
+		test_commit D &&
+		git push --all
+	) &&
+	(
+		cd src &&
+		git switch master &&
+		git fetch --all &&
+		git branch branch --track origin/branch &&
+		git merge origin/master
+	) &&
+	(
+		cd dup &&
+		git switch master &&
+		test_commit E &&
+		git push origin master:master
+	)
+}
+
 test_expect_success setup '
 	# create template repository
 	test_commit A &&
@@ -256,4 +291,55 @@ test_expect_success 'background updates of REMOTE can be mitigated with a non-up
 	)
 '

+test_expect_success 'implicit updates to remote-tracking refs with `push.rejectImplicitFetch` set (protected, all refs)' '
+	setup_implicit_fetch &&
+	test_when_finished "rm -fr dst src dup" &&
+	git ls-remote dst refs/heads/master >expect.master &&
+	git ls-remote dst refs/heads/branch >expect.branch &&
+	(
+		cd src &&
+		git switch master &&
+		test_commit G &&
+		git switch branch &&
+		test_commit H &&
+		git fetch --all &&
+		git config --local feature.experimental true &&
+		test_must_fail git push --force-with-lease --all 2>err &&
+		grep "implicit fetch" err
+	) &&
+	git ls-remote dst refs/heads/master >actual.master &&
+	git ls-remote dst refs/heads/branch >actual.branch &&
+	test_cmp expect.master actual.master &&
+	test_cmp expect.branch actual.branch &&
+	(
+		cd src &&
+		git config --local feature.experimental false &&
+		git push --force-with-lease --all 2>err &&
+		grep "forced update" err
+	)
+'
+
+test_expect_success 'implicit updates to remote-tracking refs with `push.rejectImplicitFetch` set (protected, specific ref)' '
+	setup_implicit_fetch &&
+	git ls-remote dst refs/heads/master >actual &&
+	(
+		cd src &&
+		git switch branch &&
+		test_commit F &&
+		git switch master &&
+		test_commit G &&
+		git fetch  &&
+		git config --local push.rejectImplicitFetch true &&
+		test_must_fail git push --force-with-lease=master --all 2>err &&
+		grep "implicit fetch" err
+	) &&
+	git ls-remote dst refs/heads/master >expect &&
+	test_cmp expect actual &&
+	(
+		cd src &&
+		git push --force --force-with-lease --all 2>err &&
+		grep "forced update" err
+	)
+'
+
 test_done
diff --git a/transport-helper.c b/transport-helper.c
index c52c99d829..75b4c1b758 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -779,6 +779,10 @@ static int push_update_ref_status(struct strbuf *buf,
 			status = REF_STATUS_REJECT_STALE;
 			FREE_AND_NULL(msg);
 		}
+		else if (!strcmp(msg, "ignored fetch")) {
+			status = REF_STATUS_REJECT_IMPLICIT_FETCH;
+			FREE_AND_NULL(msg);
+		}
 		else if (!strcmp(msg, "forced update")) {
 			forced = 1;
 			FREE_AND_NULL(msg);
@@ -896,6 +900,7 @@ static int push_refs_with_push(struct transport *transport,
 		switch (ref->status) {
 		case REF_STATUS_REJECT_NONFASTFORWARD:
 		case REF_STATUS_REJECT_STALE:
+		case REF_STATUS_REJECT_IMPLICIT_FETCH:
 		case REF_STATUS_REJECT_ALREADY_EXISTS:
 			if (atomic) {
 				reject_atomic_push(remote_refs, mirror);
diff --git a/transport.c b/transport.c
index 43e24bf1e5..588575498f 100644
--- a/transport.c
+++ b/transport.c
@@ -567,6 +567,10 @@ static int print_one_push_status(struct ref *ref, const char *dest, int count,
 		print_ref_status('!', "[rejected]", ref, ref->peer_ref,
 				 "stale info", porcelain, summary_width);
 		break;
+	case REF_STATUS_REJECT_IMPLICIT_FETCH:
+		print_ref_status('!', "[rejected]", ref, ref->peer_ref,
+				 "implicit fetch", porcelain, summary_width);
+		break;
 	case REF_STATUS_REJECT_SHALLOW:
 		print_ref_status('!', "[rejected]", ref, ref->peer_ref,
 				 "new shallow roots not allowed",
@@ -1101,6 +1105,7 @@ static int run_pre_push_hook(struct transport *transport,
 		if (!r->peer_ref) continue;
 		if (r->status == REF_STATUS_REJECT_NONFASTFORWARD) continue;
 		if (r->status == REF_STATUS_REJECT_STALE) continue;
+		if (r->status == REF_STATUS_REJECT_IMPLICIT_FETCH) continue;
 		if (r->status == REF_STATUS_UPTODATE) continue;

 		strbuf_reset(&buf);
--
2.28.0

^ permalink raw reply related	[relevance 2%]

* Re: Git in Outreachy?
  @ 2020-08-31 17:41  6% ` Junio C Hamano
    1 sibling, 0 replies; 200+ results
From: Junio C Hamano @ 2020-08-31 17:41 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Christian Couder, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> Are we interested in participating in the December 2020 round of
> Outreachy? The community application period is now open.
>
> I can look into lining up funding, but we'd also need:
>
>   - volunteers to act as mentors
>
>   - updates to our applicant materials (proposed projects, but also
>     microproject / patch suggestions)
>
> -Peff

FWIW, I am interested in seeing this project participating.  As
usual, I won't be able to mentor to avoid biases, though.

As to microprojects, I think we saw #leftoverbits and #microproject
sprinkled in a handful of messages in recent discussions, so with
the help of list archive, we may come up with new ones.

Thanks.

^ permalink raw reply	[relevance 6%]

* Re: [PATCH] run_command: teach API users to use embedded 'args' more
  @ 2020-08-27  4:30  6%           ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2020-08-27  4:30 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> I've actually considered dropping child_process.argv entirely. Having
> two separate ways to do the same thing gives the potential for
> confusion. But I never dug into whether any existing callers would be
> made worse for it (I kind of doubt it, though; worst case they can use
> strvec_pushv). There are still several left after this patch, it seems.
>
> Likewise for child_process.env_array.

Yup, conversion similar to what I did in this patch may be too
trivial for #microproject, but would nevertheless be a good
#leftoverbits task.  The removal of .argv/.env is not entirely
trivial but a good candidate for #microproject.

Thanks.


^ permalink raw reply	[relevance 6%]

* Re: [PATCH] clone: add remote.cloneDefault config option
  @ 2020-08-26 19:59  6%     ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2020-08-26 19:59 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Sean Barag via GitGitGadget, git, Sean Barag

Derrick Stolee <stolee@gmail.com> writes:

> On 8/26/2020 2:46 PM, Junio C Hamano wrote:
>> "Sean Barag via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>> This commit implements
>>> `remote.cloneDefault` as a parallel to `remote.pushDefault`,
>>> with prioritized name resolution:
>> 
>> I highly doubt that .cloneDefault is a good name.  After reading
>> only the title of the patch e-mail, i.e. when the only available
>> information on the change available to me was the name of the
>> configuration variable and the fact that it pertains to the command
>> "git clone", I thought it is to specify a URL, from which "git
>> clone" without the URL would clone from that single repository.
>> 
>> And the name will cause the same misunderstanding to normal users,
>> not just to reviewers of your patch, after this change hits a future
>> Git release.
>> 
>> Taking a parallel from init.defaultBranchName, I would probably call
>> it clone.defaultUpstreamName if I were writing this feature.
>
> I was thinking "clone.defaultRemoteName" makes it clear we are naming
> the remote for the provided <url> in the command.

I 100% agree that defaultremotename is much better.

>> ...  For example
>> 
>> 	git -c remote.cloneDefault="bad.../...name" clone parent
>> 
>> should fail, no?
>
> This is an important suggestion.

To be fair, the current code does not handle the "--origin" command
line option so carefully, either.

Back when the command was scripted, e.g. at 47874d6d (revamp git-clone
(take #2)., 2006-03-21), it had both a ref-format check and a */*
multi-level check, and these checks had been retained throughout its
life until 8434c2f1 (Build in clone, 2008-04-27) rewrote the whole
thing while discarding these checks for --origin=bad.../...name.

It would make an excellent #leftoverbits or #microproject.

Thanks.
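
For illustration only, and not part of the patch under discussion: the
kind of check described above could look roughly like the C sketch
below.  It is a simplified approximation of the ref-format rules plus
the "no slash" rule, not Git's own check_refname_format(); the function
name looks_like_remote_name() is hypothetical.

```c
#include <string.h>

/*
 * Hypothetical helper, NOT Git's check_refname_format(): a simplified
 * stand-in that only approximates the ref-format rules and also
 * refuses multi-level names.
 * Returns 1 if "name" looks usable as a remote name, 0 otherwise.
 */
static int looks_like_remote_name(const char *name)
{
	size_t len = strlen(name);
	size_t i;

	/* empty names, and names that begin with '.' or '-' or end with '.' */
	if (!len || name[0] == '.' || name[0] == '-' || name[len - 1] == '.')
		return 0;
	/* ".." and "@{" have special meaning in revision syntax */
	if (strstr(name, "..") || strstr(name, "@{"))
		return 0;
	/* a trailing ".lock" would clash with lockfiles */
	if (len >= 5 && !strcmp(name + len - 5, ".lock"))
		return 0;
	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)name[i];

		/* control characters, space and DEL */
		if (c <= ' ' || c == 0x7f)
			return 0;
		/* '/' makes the name multi-level; the rest are ref-unsafe */
		if (strchr("/~^:?*[\\", c))
			return 0;
	}
	return 1;
}
```

With this sketch, "origin" is accepted, while "bad.../...name" is
rejected because it contains both ".." and "/".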

^ permalink raw reply	[relevance 6%]

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  @ 2020-08-25 18:05  5%       ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2020-08-25 18:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, dstolee

Jeff King <peff@peff.net> writes:

> OK, that's the part I was missing. The discussion here and the statement
> from git-repack(1):
>
>   -d
>       After packing, if the newly created packs make some existing packs
>       redundant, remove the redundant packs. Also run git prune-packed
>       to remove redundant loose object files.
>
> made me think that it was running pack-redundant. But it doesn't seem
> to. It looks like we stopped doing so in 6ed64058e1 (git-repack: do not
> do complex redundancy check., 2005-11-19).

Thanks for digging.  A good opportunity for a #leftoverbits
documentation update from new people is here.

> As an aside, we tried using pack-redundant at GitHub several years ago
> for dropping packs that were replicated in alternates storage. It
> performs very poorly (quadratically, perhaps?) to the point that we
> found it unusable,...

Yes, I originally wrote "the pack-redundant subcommand" in the
message you are responding to with a bit more colourful adjectives,
but rewrote it ;-)  My recollection from the last time I looked at
it is that it is quadratic or even worse---that was a long time ago,
but on the other hand I think the subcommand had no significant
improvement over the course of its life.

Perhaps it is time to drop it.

-- >8 --
Subject: [RFC] pack-redundant: gauge the usage before proposing its removal

The subcommand is unusably slow and the reason why nobody reports it
as a performance bug is suspected to be the absence of users.  Let's
disable the normal use of the command by making it error out with a big
message that asks the user to tell us that they still care about the
command, with an escape hatch to override it with a command line
option.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/pack-redundant.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c
index 178e3409b7..97cf3df79b 100644
--- a/builtin/pack-redundant.c
+++ b/builtin/pack-redundant.c
@@ -554,6 +554,7 @@ static void load_all(void)
 int cmd_pack_redundant(int argc, const char **argv, const char *prefix)
 {
 	int i;
+	int i_still_use_this = 0;
 	struct pack_list *min = NULL, *red, *pl;
 	struct llist *ignore;
 	struct object_id *oid;
@@ -580,12 +581,25 @@ int cmd_pack_redundant(int argc, const char **argv, const char *prefix)
 			alt_odb = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--i-still-use-this")) {
+			i_still_use_this = 1;
+			continue;
+		}
 		if (*arg == '-')
 			usage(pack_redundant_usage);
 		else
 			break;
 	}
 
+	if (!i_still_use_this) {
+		puts(_("'git pack-redundant' is nominated for removal.\n"
+		       "If you still use this command, please add an extra\n"
+		       "option, '--i-still-use-this', on the command line\n"
+		       "and let us know you still use it by sending an e-mail\n"
+		       "to <git@vger.kernel.org>.  Thanks\n"));
+		exit(1);
+	}
+
 	if (load_all_packs)
 		load_all();
 	else

^ permalink raw reply related	[relevance 5%]

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 13:14  0%     ` Derrick Stolee
@ 2020-08-25 14:41  0%       ` Taylor Blau
  0 siblings, 0 replies; 200+ results
From: Taylor Blau @ 2020-08-25 14:41 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, Jeff King, git, dstolee

On Tue, Aug 25, 2020 at 09:14:19AM -0400, Derrick Stolee wrote:
> On 8/24/2020 10:37 PM, Taylor Blau wrote:
> > On Mon, Aug 24, 2020 at 10:26:14PM -0400, Jeff King wrote:
> >> On Mon, Aug 24, 2020 at 10:01:04PM -0400, Taylor Blau wrote:
> >>
> >>> In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
> >>> learned to remove a multi-pack-index file if it added or removed a pack
> >>> from the object store.
> >>>
> >>> This mechanism is a little over-eager, since it is only necessary to
> >>> drop a MIDX if 'git repack' removes a pack that the MIDX references.
> >>> Adding a pack outside of the MIDX does not require invalidating the
> >>> MIDX, and likewise for removing a pack the MIDX does not know about.
> >>
> >> Does "git repack" ever remove just one pack? Obviously "git repack -ad"
> >> or "git repack -Ad" is going to pack everything and delete the old
> >> packs. So I think we'd want to remove a midx there.
> >>
> >> And "git repack -d" I think of as deleting only loose objects that we
> >> just packed. But I guess it could also remove a pack that has now been
> >> made redundant? That seems like a rare case in practice, but I suppose
> >> is possible.
> >
> > Yeah, the patch message makes this sound more likely than it actually
> > is, which I agree is very rare. I often write 'git repack' instead of
> > 'git pack-objects' to slurp up everything loose into a new pack without
> > having to list loose objects by name.
> >
> > That's the case that I really care about here: purely adding a new pack
> > should not invalidate the existing MIDX.
> >
> >> Not exactly related to your fix, but kind of the flip side of it: would
> >> we ever need to retain a midx that mentions some packs that still exist?
> >>
> >> E.g., imagine we have a midx that points to packs A and B, and
> >> git-repack deletes B. By your logic above, we need to remove the midx
> >> because now it points to objects in B which aren't accessible. But by
> >> deleting it, could we be deleting the only thing that mentions the
> >> objects in A?
> >>
> >> I _think_ the answer is "no", because we never went all-in on midx and
> >> allowed deleting the matching .idx files for contained packs. So we'd
> >> still have that A.idx, and we could just use the pack as normal. But
> >> it's an interesting corner case if we ever do go in that direction.
> >
> > Agreed. Maybe a (admittedly somewhat large) #leftoverbits.
> >
> >> If you'll let me muse a bit more on midx-lifetime issues (which I've
> >> never really thought about before just now):
> >>
> >> I'm also a little curious how bad it is to have a midx whose pack has
> >> gone away. I guess we'd answer queries for "yes, we have this object"
> >> even if we don't, which is bad. Though in practice we'd only delete
> >> those packs if we have their objects elsewhere. And the pack code is
> >> pretty good about retrying other copies of objects that can't be
> >> accessed. Alternatively, I wonder if the midx-loading code ought to
> >> check that all of the constituent packs are available.
> >>
> >> In that line of thinking, do we even need to delete midx files if one of
> >> their packs goes away? The reading side probably ought to be able to
> >> handle that gracefully.
> >
> > I think that this is probably the right direction, although I've only
> > spent time in the MIDX code over the past couple of weeks, so I can't
> > say with authority. It seems like it would be pretty annoying, though.
> > For example, code that cares about listing all objects in a MIDX would
> > have to check first whether the pack they're in still exists before
> > emitting them. On top of that, there are more corner cases when object X
> > exists in more than one pack, but some strict subset of those packs
> > containing X have gone away.
> >
> > I don't think that it couldn't be done, though.
> >
> >> And the more interesting case is when you repack everything with "-ad"
> >> or similar, at which point you shouldn't even need to look up what's in
> >> the midx to see if you deleted its packs. The point of your operation is
> >> to put it all-into-one, so you know the old midx should be discarded.
> >>
> >>> Teach 'git repack' to check for this by loading the MIDX, and checking
> >>> whether the to-be-removed pack is known to the MIDX. This requires a
> >>> slightly odd alteration to a test in t5319, which is explained with a
> >>> comment.
> >>
> >> My above musings aside, this seems like an obvious improvement.
> >>
> >>> diff --git a/builtin/repack.c b/builtin/repack.c
> >>> index 04c5ceaf7e..98fac03946 100644
> >>> --- a/builtin/repack.c
> >>> +++ b/builtin/repack.c
> >>> @@ -133,7 +133,11 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
> >>>  static void remove_redundant_pack(const char *dir_name, const char *base_name)
> >>>  {
> >>>  	struct strbuf buf = STRBUF_INIT;
> >>> -	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
> >>> +	struct multi_pack_index *m = get_multi_pack_index(the_repository);
> >>> +	strbuf_addf(&buf, "%s.pack", base_name);
> >>> +	if (m && midx_contains_pack(m, buf.buf))
> >>> +		clear_midx_file(the_repository);
> >>> +	strbuf_insertf(&buf, 0, "%s/", dir_name);
> >>
> >> Makes sense. midx_contains_pack() is a binary search, so we'll spend
> >> O(n log n) effort deleting the packs (I wondered if this might be
> >> accidentally quadratic over the number of packs).
> >
> > Right. The MIDX stores packs in lexicographic order, so checking them is
> > O(log n), which we do at most 'n' times.
> >
> >> And after we clear, "m" will be NULL, so we'll do it at most once. Which
> >> is why you can get rid of the manual "midx_cleared" flag from the
> >> preimage.
> >
> > Yep. I thought briefly about passing 'm' as a parameter, but then you
> > have to worry about a dangling reference to
> > 'the_repository->objects->multi_pack_index' after calling
> > 'clear_midx_file()', so it's easier to look it up each time.
>
> The discussion in this thread matches my understanding of the
> situation.
>
> >> So the patch looks good to me.
>
> The code in builtin/repack.c looks good for sure. I have a quick question
> about this new test:
>
> +test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> +	git multi-pack-index write &&
> +	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
> +	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
> +
> +	# Write a new pack that is unknown to the multi-pack-index.
> +	git hash-object -w </dev/null >blob &&
> +	git pack-objects $objdir/pack/pack <blob &&
> +
> +	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
> +	test_cmp_bin $objdir/pack/multi-pack-index \
> +		$objdir/pack/multi-pack-index.bak
> +'
> +
>
> You create an arbitrary blob, and then add it to a pack-file. Do we
> know that 'git repack' is definitely creating a new pack-file that makes
> our manually-created pack-file redundant?
>
> My suggestion is to have the test check itself:
>
> +test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> +	git multi-pack-index write &&
> +	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
> +	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
> +
> +	# Write a new pack that is unknown to the multi-pack-index.
> +	git hash-object -w </dev/null >blob &&
> +	HASH=$(git pack-objects $objdir/pack/pack <blob) &&
> +
> +	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
> +	test_cmp_bin $objdir/pack/multi-pack-index \
> +		$objdir/pack/multi-pack-index.bak &&
> +	test_path_is_missing $objdir/pack/pack-$HASH.pack
> +'
> +
>
> This test fails for me, on the 'test_path_is_missing'. Likely, the
> blob is seen as already in a pack-file so is just pruned by 'git repack'
> instead. I thought that perhaps we need to add a new pack ourselves that
> overrides the small pack. Here is my attempt:
>
> test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> 	git multi-pack-index write &&
> 	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
> 	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
>
> 	# Write a new pack that is unknown to the multi-pack-index.
> 	BLOB1=$(echo blob1 | git hash-object -w --stdin) &&
> 	BLOB2=$(echo blob2 | git hash-object -w --stdin) &&
> 	cat >blobs <<-EOF &&
> 	$BLOB1
> 	$BLOB2
> 	EOF
> 	HASH1=$(echo $BLOB1 | git pack-objects $objdir/pack/pack) &&
> 	HASH2=$(git pack-objects $objdir/pack/pack <blobs) &&
> 	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
> 	test_cmp_bin $objdir/pack/multi-pack-index \
> 		$objdir/pack/multi-pack-index.bak &&
> 	test_path_is_file $objdir/pack/pack-$HASH2.pack &&
> 	test_path_is_missing $objdir/pack/pack-$HASH1.pack
> '
>
> However, this _still_ fails on the "test_path_is_missing" line, so I'm not sure
> how to make sure your logic is tested. I saw that 'git repack' was writing
> "nothing new to pack" in the output, so I also tested adding a few commits and
> trying to force it to repack reachable data, but I cannot seem to trigger it
> to create a new pack that overrides only one pack that is not in the MIDX.
>
> Likely, I just don't know how 'git repack' works well enough to trigger this
> behavior. But the test as-is is not testing what you want it to test.

I think this case might actually be impossible to tickle in a test. I
thought that 'git repack -d' looked for existing packs whose objects are
a subset of some new pack generated. But, it's much simpler than that:
'-d' by itself just looks for packs that were already on disk with the
same SHA-1 as a new pack, and it removes the old one.

Note that 'git repack' uses 'git pack-objects' internally to find
objects and generate a packfile. When calling 'git pack-objects', 'git
repack -d' passes '--all' and '--unpacked', which means that there is no
way we'd generate a new pack with the same SHA-1 as an existing pack
ordinarily.

So, I think this case is impossible, or at least astronomically
unlikely. What is more interesting (and untested) is that adding a _new_
pack doesn't cause us to invalidate the MIDX. Here's a patch that does
that:

  diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
  index 16a1ad040e..620f2058d6 100755
  --- a/t/t5319-multi-pack-index.sh
  +++ b/t/t5319-multi-pack-index.sh
  @@ -391,18 +391,27 @@ test_expect_success 'repack removes multi-pack-index when deleting packs' '
          test_path_is_missing $objdir/pack/multi-pack-index
   '

  -test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
  -       git multi-pack-index write &&
  -       cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
  -       test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
  -
  -       # Write a new pack that is unknown to the multi-pack-index.
  -       git hash-object -w </dev/null >blob &&
  -       git pack-objects $objdir/pack/pack <blob &&
  -
  -       GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
  -       test_cmp_bin $objdir/pack/multi-pack-index \
  -               $objdir/pack/multi-pack-index.bak
  +test_expect_success 'repack preserves multi-pack-index when creating packs' '
  +       git init preserve &&
  +       test_when_finished "rm -fr preserve" &&
  +       (
  +               cd preserve &&
  +               midx=.git/objects/pack/multi-pack-index &&
  +
  +               test_commit "initial" &&
  +               git repack -ad &&
  +               git multi-pack-index write &&
  +               ls .git/objects/pack | grep "\.pack$" >before &&
  +
  +               cp $midx $midx.bak &&
  +
  +               test_commit "another" &&
  +               GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
  +               ls .git/objects/pack | grep "\.pack$" >after &&
  +
  +               test_cmp_bin $midx.bak $midx &&
  +               ! test_cmp before after
  +       )
   '

   compare_results_with_midx "after repack"

What do you think about applying this on top and then calling it a day?

> Thanks,
> -Stolee

Thanks,
Taylor

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25  2:37  5%   ` Taylor Blau
@ 2020-08-25 13:14  0%     ` Derrick Stolee
  2020-08-25 14:41  0%       ` Taylor Blau
  0 siblings, 1 reply; 200+ results
From: Derrick Stolee @ 2020-08-25 13:14 UTC (permalink / raw)
  To: Taylor Blau, Jeff King; +Cc: git, dstolee

On 8/24/2020 10:37 PM, Taylor Blau wrote:
> On Mon, Aug 24, 2020 at 10:26:14PM -0400, Jeff King wrote:
>> On Mon, Aug 24, 2020 at 10:01:04PM -0400, Taylor Blau wrote:
>>
>>> In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
>>> learned to remove a multi-pack-index file if it added or removed a pack
>>> from the object store.
>>>
>>> This mechanism is a little over-eager, since it is only necessary to
>>> drop a MIDX if 'git repack' removes a pack that the MIDX references.
>>> Adding a pack outside of the MIDX does not require invalidating the
>>> MIDX, and likewise for removing a pack the MIDX does not know about.
>>
>> Does "git repack" ever remove just one pack? Obviously "git repack -ad"
>> or "git repack -Ad" is going to pack everything and delete the old
>> packs. So I think we'd want to remove a midx there.
>>
>> And "git repack -d" I think of as deleting only loose objects that we
>> just packed. But I guess it could also remove a pack that has now been
>> made redundant? That seems like a rare case in practice, but I suppose
>> is possible.
> 
> Yeah, the patch message makes this sound more likely than it actually
> is, which I agree is very rare. I often write 'git repack' instead of
> 'git pack-objects' to slurp up everything loose into a new pack without
> having to list loose objects by name.
> 
> That's the case that I really care about here: purely adding a new pack
> should not invalidate the existing MIDX.
> 
>> Not exactly related to your fix, but kind of the flip side of it: would
>> we ever need to retain a midx that mentions some packs that still exist?
>>
>> E.g., imagine we have a midx that points to packs A and B, and
>> git-repack deletes B. By your logic above, we need to remove the midx
>> because now it points to objects in B which aren't accessible. But by
>> deleting it, could we be deleting the only thing that mentions the
>> objects in A?
>>
>> I _think_ the answer is "no", because we never went all-in on midx and
>> allowed deleting the matching .idx files for contained packs. So we'd
>> still have that A.idx, and we could just use the pack as normal. But
>> it's an interesting corner case if we ever do go in that direction.
> 
> Agreed. Maybe a (admittedly somewhat large) #leftoverbits.
> 
>> If you'll let me muse a bit more on midx-lifetime issues (which I've
>> never really thought about before just now):
>>
>> I'm also a little curious how bad it is to have a midx whose pack has
>> gone away. I guess we'd answer queries for "yes, we have this object"
>> even if we don't, which is bad. Though in practice we'd only delete
>> those packs if we have their objects elsewhere. And the pack code is
>> pretty good about retrying other copies of objects that can't be
>> accessed. Alternatively, I wonder if the midx-loading code ought to
>> check that all of the constituent packs are available.
>>
>> In that line of thinking, do we even need to delete midx files if one of
>> their packs goes away? The reading side probably ought to be able to
>> handle that gracefully.
> 
> I think that this is probably the right direction, although I've only
> spent time in the MIDX code over the past couple of weeks, so I can't
> say with authority. It seems like it would be pretty annoying, though.
> For example, code that cares about listing all objects in a MIDX would
> have to check first whether the pack they're in still exists before
> emitting them. On top of that, there are more corner cases when object X
> exists in more than one pack, but some strict subset of those packs
> containing X have gone away.
> 
> I don't think that it couldn't be done, though.
> 
>> And the more interesting case is when you repack everything with "-ad"
>> or similar, at which point you shouldn't even need to look up what's in
>> the midx to see if you deleted its packs. The point of your operation is
>> to put it all-into-one, so you know the old midx should be discarded.
>>
>>> Teach 'git repack' to check for this by loading the MIDX, and checking
>>> whether the to-be-removed pack is known to the MIDX. This requires a
>>> slightly odd alteration to a test in t5319, which is explained with a
>>> comment.
>>
>> My above musings aside, this seems like an obvious improvement.
>>
>>> diff --git a/builtin/repack.c b/builtin/repack.c
>>> index 04c5ceaf7e..98fac03946 100644
>>> --- a/builtin/repack.c
>>> +++ b/builtin/repack.c
>>> @@ -133,7 +133,11 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
>>>  static void remove_redundant_pack(const char *dir_name, const char *base_name)
>>>  {
>>>  	struct strbuf buf = STRBUF_INIT;
>>> -	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
>>> +	struct multi_pack_index *m = get_multi_pack_index(the_repository);
>>> +	strbuf_addf(&buf, "%s.pack", base_name);
>>> +	if (m && midx_contains_pack(m, buf.buf))
>>> +		clear_midx_file(the_repository);
>>> +	strbuf_insertf(&buf, 0, "%s/", dir_name);
>>
>> Makes sense. midx_contains_pack() is a binary search, so we'll spend
>> O(n log n) effort deleting the packs (I wondered if this might be
>> accidentally quadratic over the number of packs).
> 
> Right. The MIDX stores packs in lexicographic order, so checking them is
> O(log n), which we do at most 'n' times.
> 
>> And after we clear, "m" will be NULL, so we'll do it at most once. Which
>> is why you can get rid of the manual "midx_cleared" flag from the
>> preimage.
> 
> Yep. I thought briefly about passing 'm' as a parameter, but then you
> have to worry about a dangling reference to
> 'the_repository->objects->multi_pack_index' after calling
> 'clear_midx_file()', so it's easier to look it up each time.

The discussion in this thread matches my understanding of the
situation.
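
As a rough illustration of the O(log n) lookup mentioned in the quoted
text, here is a sketch only: sorted_names_contain() is a hypothetical
stand-in, not Git's midx_contains_pack(), and it assumes the pack names
are kept sorted in strcmp() order.

```c
#include <string.h>

/*
 * Hypothetical stand-in for a "does the MIDX know this pack?" lookup.
 * "names" must be sorted in strcmp() order.
 * Returns 1 if "pack" is present, 0 otherwise.
 */
static int sorted_names_contain(const char **names, size_t nr,
				const char *pack)
{
	size_t lo = 0, hi = nr;

	/* classic binary search over the half-open range [lo, hi) */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strcmp(names[mid], pack);

		if (!cmp)
			return 1;
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return 0;
}
```

Running such a lookup once per removed pack is what gives the
O(n log n) total discussed in the quoted exchange.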

>> So the patch looks good to me.

The code in builtin/repack.c looks good for sure. I have a quick question
about this new test:

+test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
+	git multi-pack-index write &&
+	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
+	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
+
+	# Write a new pack that is unknown to the multi-pack-index.
+	git hash-object -w </dev/null >blob &&
+	git pack-objects $objdir/pack/pack <blob &&
+
+	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
+	test_cmp_bin $objdir/pack/multi-pack-index \
+		$objdir/pack/multi-pack-index.bak
+'
+

You create an arbitrary blob, and then add it to a pack-file. Do we
know that 'git repack' is definitely creating a new pack-file that makes
our manually-created pack-file redundant?

My suggestion is to have the test check itself:

+test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
+	git multi-pack-index write &&
+	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
+	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
+
+	# Write a new pack that is unknown to the multi-pack-index.
+	git hash-object -w </dev/null >blob &&
+	HASH=$(git pack-objects $objdir/pack/pack <blob) &&
+
+	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
+	test_cmp_bin $objdir/pack/multi-pack-index \
+		$objdir/pack/multi-pack-index.bak &&
+	test_path_is_missing $objdir/pack/pack-$HASH.pack
+'
+

This test fails for me, on the 'test_path_is_missing'. Likely, the
blob is seen as already in a pack-file so is just pruned by 'git repack'
instead. I thought that perhaps we need to add a new pack ourselves that
overrides the small pack. Here is my attempt:

test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
	git multi-pack-index write &&
	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&

	# Write a new pack that is unknown to the multi-pack-index.
	BLOB1=$(echo blob1 | git hash-object -w --stdin) &&
	BLOB2=$(echo blob2 | git hash-object -w --stdin) &&
	cat >blobs <<-EOF &&
	$BLOB1
	$BLOB2
	EOF
	HASH1=$(echo $BLOB1 | git pack-objects $objdir/pack/pack) &&
	HASH2=$(git pack-objects $objdir/pack/pack <blobs) &&
	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
	test_cmp_bin $objdir/pack/multi-pack-index \
		$objdir/pack/multi-pack-index.bak &&
	test_path_is_file $objdir/pack/pack-$HASH2.pack &&
	test_path_is_missing $objdir/pack/pack-$HASH1.pack
'

However, this _still_ fails on the "test_path_is_missing" line, so I'm not sure
how to make sure your logic is tested. I saw that 'git repack' was writing
"nothing new to pack" in the output, so I also tested adding a few commits and
trying to force it to repack reachable data, but I cannot seem to trigger it
to create a new pack that overrides only one pack that is not in the MIDX.

Likely, I just don't know how 'git repack' works well enough to trigger this
behavior. But the test as-is is not testing what you want it to test.

Thanks,
-Stolee


^ permalink raw reply	[relevance 0%]

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  @ 2020-08-25  2:37  5%   ` Taylor Blau
  2020-08-25 13:14  0%     ` Derrick Stolee
    1 sibling, 1 reply; 200+ results
From: Taylor Blau @ 2020-08-25  2:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, dstolee

On Mon, Aug 24, 2020 at 10:26:14PM -0400, Jeff King wrote:
> On Mon, Aug 24, 2020 at 10:01:04PM -0400, Taylor Blau wrote:
>
> > In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
> > learned to remove a multi-pack-index file if it added or removed a pack
> > from the object store.
> >
> > This mechanism is a little over-eager, since it is only necessary to
> > drop a MIDX if 'git repack' removes a pack that the MIDX references.
> > Adding a pack outside of the MIDX does not require invalidating the
> > MIDX, and likewise for removing a pack the MIDX does not know about.
>
> Does "git repack" ever remove just one pack? Obviously "git repack -ad"
> or "git repack -Ad" is going to pack everything and delete the old
> packs. So I think we'd want to remove a midx there.
>
> And "git repack -d" I think of as deleting only loose objects that we
> just packed. But I guess it could also remove a pack that has now been
> made redundant? That seems like a rare case in practice, but I suppose
> is possible.

Yeah, the patch message makes this sound more likely than it actually
is, which I agree is very rare. I often write 'git repack' instead of
'git pack-objects' to slurp up everything loose into a new pack without
having to list loose objects by name.

That's the case that I really care about here: purely adding a new pack
should not invalidate the existing MIDX.

> Not exactly related to your fix, but kind of the flip side of it: would
> we ever need to retain a midx that mentions some packs that still exist?
>
> E.g., imagine we have a midx that points to packs A and B, and
> git-repack deletes B. By your logic above, we need to remove the midx
> because now it points to objects in B which aren't accessible. But by
> deleting it, could we be deleting the only thing that mentions the
> objects in A?
>
> I _think_ the answer is "no", because we never went all-in on midx and
> allowed deleting the matching .idx files for contained packs. So we'd
> still have that A.idx, and we could just use the pack as normal. But
> it's an interesting corner case if we ever do go in that direction.

Agreed. Maybe an (admittedly somewhat large) #leftoverbits.

> If you'll let me muse a bit more on midx-lifetime issues (which I've
> never really thought about before just now):
>
> I'm also a little curious how bad it is to have a midx whose pack has
> gone away. I guess we'd answer queries for "yes, we have this object"
> even if we don't, which is bad. Though in practice we'd only delete
> those packs if we have their objects elsewhere. And the pack code is
> pretty good about retrying other copies of objects that can't be
> accessed. Alternatively, I wonder if the midx-loading code ought to
> check that all of the constituent packs are available.
>
> In that line of thinking, do we even need to delete midx files if one of
> their packs goes away? The reading side probably ought to be able to
> handle that gracefully.

I think that this is probably the right direction, although I've only
spent time in the MIDX code over the past couple of weeks, so I can't
say with authority. It seems like it would be pretty annoying, though.
For example, code that cares about listing all objects in a MIDX would
have to check first whether the pack they're in still exists before
emitting them. On top of that, there are more corner cases when object X
exists in more than one pack, but some strict subset of those packs
containing X have gone away.

I don't think that it couldn't be done, though.

> And the more interesting case is when you repack everything with "-ad"
> or similar, at which point you shouldn't even need to look up what's in
> the midx to see if you deleted its packs. The point of your operation is
> to put it all-into-one, so you know the old midx should be discarded.
>
> > Teach 'git repack' to check for this by loading the MIDX, and checking
> > whether the to-be-removed pack is known to the MIDX. This requires a
> > slightly odd alteration to a test in t5319, which is explained with a
> > comment.
>
> My above musings aside, this seems like an obvious improvement.
>
> > diff --git a/builtin/repack.c b/builtin/repack.c
> > index 04c5ceaf7e..98fac03946 100644
> > --- a/builtin/repack.c
> > +++ b/builtin/repack.c
> > @@ -133,7 +133,11 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
> >  static void remove_redundant_pack(const char *dir_name, const char *base_name)
> >  {
> >  	struct strbuf buf = STRBUF_INIT;
> > -	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
> > +	struct multi_pack_index *m = get_multi_pack_index(the_repository);
> > +	strbuf_addf(&buf, "%s.pack", base_name);
> > +	if (m && midx_contains_pack(m, buf.buf))
> > +		clear_midx_file(the_repository);
> > +	strbuf_insertf(&buf, 0, "%s/", dir_name);
>
> Makes sense. midx_contains_pack() is a binary search, so we'll spend
> O(n log n) effort deleting the packs (I wondered if this might be
> accidentally quadratic over the number of packs).

Right. The MIDX stores packs in lexicographic order, so checking them is
O(log n), which we do at most 'n' times.
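
To illustrate the cost argument, here is a minimal sketch of a binary
search over a sorted table of pack names. It is not the actual midx.c
code: 'struct pack_names' and 'contains_pack()' are hypothetical
stand-ins, and the only assumption is that the names are sorted with
strcmp().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the MIDX's table of pack names. */
struct pack_names {
	const char **names;	/* assumed sorted with strcmp() */
	size_t nr;
};

/* Return 1 if 'name' is in the table, 0 otherwise; O(log n) comparisons. */
static int contains_pack(const struct pack_names *m, const char *name)
{
	size_t lo = 0, hi = m->nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, m->names[mid]);

		if (!cmp)
			return 1;
		if (cmp < 0)
			hi = mid;	/* look in the lower half */
		else
			lo = mid + 1;	/* look in the upper half */
	}
	return 0;
}
```

Calling this once per pack that is about to be removed gives the
O(n log n) total mentioned above.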

> And after we clear, "m" will be NULL, so we'll do it at most once. Which
> is why you can get rid of the manual "midx_cleared" flag from the
> preimage.

Yep. I thought briefly about passing 'm' as a parameter, but then you
have to worry about a dangling reference to
'the_repository->objects->multi_pack_index' after calling
'clear_midx_file()', so it's easier to look it up each time.
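
A rough sketch of why looking it up each time is safe and clears at
most once. Everything here is a hypothetical model, not the real
repository or MIDX structures: 'fake_repo', 'fake_midx', 'get_midx()',
'clear_midx()' and 'remove_pack()' only mimic the shape of the patch,
and the lookup is linear for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a MIDX and the repository that owns it. */
struct fake_midx {
	const char **packs;
	size_t nr;
};

struct fake_repo {
	struct fake_midx *midx;	/* NULL once cleared */
	int clears;		/* how many times we cleared */
};

static struct fake_midx *get_midx(struct fake_repo *r)
{
	return r->midx;
}

static void clear_midx(struct fake_repo *r)
{
	r->midx = NULL;
	r->clears++;
}

static int midx_has(const struct fake_midx *m, const char *name)
{
	size_t i;

	for (i = 0; i < m->nr; i++)
		if (!strcmp(m->packs[i], name))
			return 1;
	return 0;
}

/* Fetch the MIDX afresh on every call instead of caching a pointer. */
static void remove_pack(struct fake_repo *r, const char *name)
{
	struct fake_midx *m = get_midx(r);

	if (m && midx_has(m, name))
		clear_midx(r);
	/* unlinking the pack itself is omitted in this sketch */
}
```

Once the first covered pack is removed, every later call sees a NULL
MIDX and skips the check, so no separate "cleared" flag is needed.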

> So the patch looks good to me.

Thanks.

> -Peff

Thanks,
Taylor

^ permalink raw reply	[relevance 5%]

* Re: [PATCH v2] rebase -i: Fix possibly wrong onto hash in todo
  2020-08-13 10:41  6% ` Alban Gruin
@ 2020-08-13 14:38  0%   ` Phillip Wood
  0 siblings, 0 replies; 200+ results
From: Phillip Wood @ 2020-08-13 14:38 UTC (permalink / raw)
  To: Alban Gruin, Antti Keränen, git
  Cc: Taylor Blau, Jussi Keränen, Junio C Hamano, Phillip Wood,
	Johannes Schindelin

Hi Antti & Alban

On 13/08/2020 11:41, Alban Gruin wrote:
> Hi Antti,
> 
> Le 12/08/2020 à 20:33, Antti Keränen a écrit :
>> 'todo_list_write_to_file' may overwrite the static buffer, originating
>> from 'find_unique_abbrev', that was used to store the short commit hash
>> 'c' for "# Rebase a..b onto c" message in the todo editor. This is
>> because the buffer that is returned from 'find_unique_abbrev' is valid
>> until 4 more calls to `find_unique_abbrev` are made.
>>
>> As 'todo_list_write_to_file' calls 'find_unique_abbrev' for each rebased
>> commit, the hash for 'c' is overwritten if there are 4 or more commits
>> in the rebase. This behavior has been broken since its introduction.
>>
>> Fix by storing the short onto commit hash in a different buffer that
>> remains valid, before calling 'todo_list_write_to_file'.
>>
>> Found-by: Jussi Keränen <jussike@gmail.com>
>> Signed-off-by: Antti Keränen <detegr@rbx.email>
>> ---
>>   sequencer.c                   | 5 +++--
>>   t/t3404-rebase-interactive.sh | 6 ++++++
>>   2 files changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/sequencer.c b/sequencer.c
>> index fd7701c88a..e2007dbb8c 100644
>> --- a/sequencer.c
>> +++ b/sequencer.c
>> @@ -5178,13 +5178,14 @@ int complete_action(struct repository *r, struct replay_opts *opts, unsigned fla
>>   		    struct string_list *commands, unsigned autosquash,
>>   		    struct todo_list *todo_list)
>>   {
>> -	const char *shortonto, *todo_file = rebase_path_todo();
>> +	char shortonto[GIT_MAX_HEXSZ + 1];
>> +	const char *todo_file = rebase_path_todo();
>>   	struct todo_list new_todo = TODO_LIST_INIT;
>>   	struct strbuf *buf = &todo_list->buf, buf2 = STRBUF_INIT;
>>   	struct object_id oid = onto->object.oid;
>>   	int res;
>>   
>> -	shortonto = find_unique_abbrev(&oid, DEFAULT_ABBREV);
>> +	find_unique_abbrev_r(shortonto, &oid, DEFAULT_ABBREV);
>>   
>>   	if (buf->len == 0) {
>>   		struct todo_item *item = append_new_todo(todo_list);
>> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
>> index 4a7d21f898..1b4fa0843e 100755
>> --- a/t/t3404-rebase-interactive.sh
>> +++ b/t/t3404-rebase-interactive.sh
>> @@ -1760,6 +1760,12 @@ test_expect_success 'correct error message for commit --amend after empty pick'
>>   	test_i18ngrep "middle of a rebase -- cannot amend." err
>>   '
>>   
>> +test_expect_success 'todo has correct onto hash' '
>> +	GIT_SEQUENCE_EDITOR=cat git rebase -i no-conflict-branch~4 no-conflict-branch >actual &&
>> +	onto=$(git rev-parse --short HEAD~4) &&
>> +	test_i18ngrep "^# Rebase ..* onto $onto" actual
>> +'
>> +
>>   # This must be the last test in this file
>>   test_expect_success '$EDITOR and friends are unchanged' '
>>   	test_editor_unchanged
>>
> 
> Looks good to me.

It looks good to me too, thanks Antti

>    Acked-by: Alban Gruin <alban.gruin@gmail.com>
> 
> This makes me wonder if it's worth to do the same change in
> todo_list_to_strbuf().  #leftoverbits, perhaps?

In todo_list_to_strbuf() we append the short oid to an strbuf before we 
call find_unique_abbrev() again so I don't think it should be a problem 
there
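
For illustration, a small sketch of the hazard being discussed,
assuming a helper that hands out four rotating static buffers as the
commit message describes. 'abbrev_static()' and 'abbrev_r()' are
hypothetical stand-ins for find_unique_abbrev() and
find_unique_abbrev_r(); they just truncate a hex string rather than
look anything up.

```c
#include <stdio.h>
#include <string.h>

#define NR_BUFS 4
#define HEXSZ 64

/*
 * Hypothetical stand-in for find_unique_abbrev(): the result lives in
 * one of NR_BUFS static buffers that are reused round-robin.
 */
static const char *abbrev_static(const char *hex, size_t len)
{
	static char bufs[NR_BUFS][HEXSZ + 1];
	static int next;
	char *buf = bufs[next];

	next = (next + 1) % NR_BUFS;
	snprintf(buf, HEXSZ + 1, "%.*s", (int)len, hex);
	return buf;
}

/*
 * Hypothetical stand-in for find_unique_abbrev_r(): the caller owns
 * the buffer (at least HEXSZ + 1 bytes), so later calls cannot
 * clobber it.
 */
static void abbrev_r(char *out, const char *hex, size_t len)
{
	snprintf(out, HEXSZ + 1, "%.*s", (int)len, hex);
}
```

A pointer returned by abbrev_static() survives three further calls and
is overwritten by the fourth, which matches the "4 or more commits"
condition in the bug report. Copying the result into a strbuf (or any
caller-owned buffer) before the next call sidesteps the problem.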

Best Wishes

Phillip

> Cheers,
> Alban
> 

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2] rebase -i: Fix possibly wrong onto hash in todo
  @ 2020-08-13 10:41  6% ` Alban Gruin
  2020-08-13 14:38  0%   ` Phillip Wood
  0 siblings, 1 reply; 200+ results
From: Alban Gruin @ 2020-08-13 10:41 UTC (permalink / raw)
  To: Antti Keränen, git
  Cc: Taylor Blau, Jussi Keränen, Junio C Hamano, Phillip Wood,
	Johannes Schindelin

Hi Antti,

Le 12/08/2020 à 20:33, Antti Keränen a écrit :
> 'todo_list_write_to_file' may overwrite the static buffer, originating
> from 'find_unique_abbrev', that was used to store the short commit hash
> 'c' for "# Rebase a..b onto c" message in the todo editor. This is
> because the buffer that is returned from 'find_unique_abbrev' is valid
> until 4 more calls to `find_unique_abbrev` are made.
> 
> As 'todo_list_write_to_file' calls 'find_unique_abbrev' for each rebased
> commit, the hash for 'c' is overwritten if there are 4 or more commits
> in the rebase. This behavior has been broken since its introduction.
> 
> Fix by storing the short onto commit hash in a different buffer that
> remains valid, before calling 'todo_list_write_to_file'.
> 
> Found-by: Jussi Keränen <jussike@gmail.com>
> Signed-off-by: Antti Keränen <detegr@rbx.email>
> ---
>  sequencer.c                   | 5 +++--
>  t/t3404-rebase-interactive.sh | 6 ++++++
>  2 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/sequencer.c b/sequencer.c
> index fd7701c88a..e2007dbb8c 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -5178,13 +5178,14 @@ int complete_action(struct repository *r, struct replay_opts *opts, unsigned fla
>  		    struct string_list *commands, unsigned autosquash,
>  		    struct todo_list *todo_list)
>  {
> -	const char *shortonto, *todo_file = rebase_path_todo();
> +	char shortonto[GIT_MAX_HEXSZ + 1];
> +	const char *todo_file = rebase_path_todo();
>  	struct todo_list new_todo = TODO_LIST_INIT;
>  	struct strbuf *buf = &todo_list->buf, buf2 = STRBUF_INIT;
>  	struct object_id oid = onto->object.oid;
>  	int res;
>  
> -	shortonto = find_unique_abbrev(&oid, DEFAULT_ABBREV);
> +	find_unique_abbrev_r(shortonto, &oid, DEFAULT_ABBREV);
>  
>  	if (buf->len == 0) {
>  		struct todo_item *item = append_new_todo(todo_list);
> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> index 4a7d21f898..1b4fa0843e 100755
> --- a/t/t3404-rebase-interactive.sh
> +++ b/t/t3404-rebase-interactive.sh
> @@ -1760,6 +1760,12 @@ test_expect_success 'correct error message for commit --amend after empty pick'
>  	test_i18ngrep "middle of a rebase -- cannot amend." err
>  '
>  
> +test_expect_success 'todo has correct onto hash' '
> +	GIT_SEQUENCE_EDITOR=cat git rebase -i no-conflict-branch~4 no-conflict-branch >actual &&
> +	onto=$(git rev-parse --short HEAD~4) &&
> +	test_i18ngrep "^# Rebase ..* onto $onto" actual
> +'
> +
>  # This must be the last test in this file
>  test_expect_success '$EDITOR and friends are unchanged' '
>  	test_editor_unchanged
> 

Looks good to me.

  Acked-by: Alban Gruin <alban.gruin@gmail.com>

This makes me wonder if it's worth making the same change in
todo_list_to_strbuf().  #leftoverbits, perhaps?

Cheers,
Alban


^ permalink raw reply	[relevance 6%]

* Re: [PATCH v3 0/2] fmt-merge-msg: selectively suppress "into <branch>"
  2020-07-31 20:03  0%         ` Taylor Blau
@ 2020-08-01  7:15  0%           ` Michal Suchánek
  0 siblings, 0 replies; 200+ results
From: Michal Suchánek @ 2020-08-01  7:15 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jeff King, Junio C Hamano, git

On Fri, Jul 31, 2020 at 04:03:06PM -0400, Taylor Blau wrote:
> On Thu, Jul 30, 2020 at 10:22:17PM -0400, Jeff King wrote:
> > On Thu, Jul 30, 2020 at 07:04:15PM -0700, Junio C Hamano wrote:
> >
> > > You'd rather want to "lie" about the destination branch while
> > > redoing these merges, perhaps with
> > >
> > > 	$ git merge --pretend-dest=jch topic-name
> > >
> > > with your HEAD detached, and tell fmt-merge-msg to pretend that the
> > > merge is being made into jch branch.  And that is outside the scope
> > > of this patch, though it might be a good #leftoverbits candidate.
> >
> > Since nobody really asked for it, it may make sense to wait for such a
> > feature. After all, this is the just the starting text we put into the
> > merge message. You are always free to add the pretend branch yourself in
> > the editor.
> >
> > > >   - should "master" be in the list even if you configure a value? That
> > > >     would do the wrong thing if you have a non-integration master, but
> > > >     that seems unlikely. And it would do the right thing if somebody
> > > >     later puts "main" in merge.suppressDest, but still occasionally
> > > >     works with "master" repos (where "right" is defined as "what they
> > > >     probably wanted", but it is perhaps a bit magical).
> > >
> > > If you configure, you can configure it fully without manually
> > > clearing first.  If you do not configure, you get a backward
> > > compatible default.  I think that is the only sensible semantics.
> > >
> > > Besides, I thought we were aiming to make 'master' less special.
> > > When a user already has a concrete list of things to use shorter
> > > merge title for, why should 'master' be magically added to the list
> > > and force the user to explicitly clear it?  I do not think that
> > > makes much sense.
> >
> > It's magic-ness would be purely for backwards compatibility. IMHO
> > maintaining exact behavior with respect to this particular case was not
> > a big deal, but clearly Linus disagrees. But the "do the right thing
> > above" I mentioned above is "do the right thing even if the user _did_
> > switch their config to a new name, but forgot that they sometimes are
> > working with old repos". So it is perhaps an even weaker reason.
> 
> I think that you could do this without treating 'master' as specially by
> making 'merge.suppressDest' contain the value of 'init.defaultBranch'
> (unless set otherwise).
> 
> This gets tricky when the fall-back value for 'init.defaultBranch'
> changes, though. If it were to go from 'master' -> 'main', you'd want to
> have both of those defaults in your 'merge.suppressDest' list, to avoid
> breaking clients who still use 'master' (and expect 'into master' not to
> show up in their merges).
> 
> So, I guess the rule would be: 'merge.suppressDest' contains the value
> of 'init.defaultBranch' (or its default value) along with any previous
> default values for 'init.defaultBranch', unless specified otherwise.
> 

IMHO this is way better than some magic variable to which you have to
assign a magic value for it to have the value you assign. I have seen
this in systemd and it is not very nice to deal with.

Thanks

Michal

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v3 0/2] fmt-merge-msg: selectively suppress "into <branch>"
  2020-07-31  2:22  0%       ` Jeff King
@ 2020-07-31 20:03  0%         ` Taylor Blau
  2020-08-01  7:15  0%           ` Michal Suchánek
  0 siblings, 1 reply; 200+ results
From: Taylor Blau @ 2020-07-31 20:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

On Thu, Jul 30, 2020 at 10:22:17PM -0400, Jeff King wrote:
> On Thu, Jul 30, 2020 at 07:04:15PM -0700, Junio C Hamano wrote:
>
> > You'd rather want to "lie" about the destination branch while
> > redoing these merges, perhaps with
> >
> > 	$ git merge --pretend-dest=jch topic-name
> >
> > with your HEAD detached, and tell fmt-merge-msg to pretend that the
> > merge is being made into jch branch.  And that is outside the scope
> > of this patch, though it might be a good #leftoverbits candidate.
>
> Since nobody really asked for it, it may make sense to wait for such a
> feature. After all, this is the just the starting text we put into the
> merge message. You are always free to add the pretend branch yourself in
> the editor.
>
> > >   - should "master" be in the list even if you configure a value? That
> > >     would do the wrong thing if you have a non-integration master, but
> > >     that seems unlikely. And it would do the right thing if somebody
> > >     later puts "main" in merge.suppressDest, but still occasionally
> > >     works with "master" repos (where "right" is defined as "what they
> > >     probably wanted", but it is perhaps a bit magical).
> >
> > If you configure, you can configure it fully without manually
> > clearing first.  If you do not configure, you get a backward
> > compatible default.  I think that is the only sensible semantics.
> >
> > Besides, I thought we were aiming to make 'master' less special.
> > When a user already has a concrete list of things to use shorter
> > merge title for, why should 'master' be magically added to the list
> > and force the user to explicitly clear it?  I do not think that
> > makes much sense.
>
> It's magic-ness would be purely for backwards compatibility. IMHO
> maintaining exact behavior with respect to this particular case was not
> a big deal, but clearly Linus disagrees. But the "do the right thing
> above" I mentioned above is "do the right thing even if the user _did_
> switch their config to a new name, but forgot that they sometimes are
> working with old repos". So it is perhaps an even weaker reason.

I think that you could do this without treating 'master' specially by
making 'merge.suppressDest' contain the value of 'init.defaultBranch'
(unless set otherwise).

This gets tricky when the fall-back value for 'init.defaultBranch'
changes, though. If it were to go from 'master' -> 'main', you'd want to
have both of those defaults in your 'merge.suppressDest' list, to avoid
breaking clients who still use 'master' (and expect 'into master' not to
show up in their merges).

So, I guess the rule would be: 'merge.suppressDest' contains the value
of 'init.defaultBranch' (or its default value) along with any previous
default values for 'init.defaultBranch', unless specified otherwise.

Apologies if this has already been suggested elsewhere and I glossed
past it.

> To be clear, I'm OK with the behavior in your patch. I just wanted to
> make sure we thought through all of the implications.

I am too.

> > >   - what's the plan if we do switch init.defaultBranch to "main"? Would
> > >     we add default_branch() to the list of defaults alongside "master",
> > >     or just add "main", or just leave it and let people configure
> > >     independently? It doesn't need to be decided now, but maybe worth
> > >     thinking about.
> > [...quite reasonable analysis that I agree with...]
> >
> > In any case, I do not think I want to see more reliance of the
> > notion that there always is one and only one single special branch
> > in the repository, so if we can get away without it, that would be
> > more preferrable.
>
> Yeah, if the plan is to stop here then I'm OK with that. That makes
> "master" special for historical reasons, but "main" or whatever never
> got this special treatment by default. People have the ability to
> configure if they choose, or they may not care either way.
>
> We might get a feature request later that says "gee, I wish we did this
> for 'main' by default without me having to configure it", but we can
> cross that bridge when we come to it.
>
> -Peff

Thanks,
Taylor

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v3 0/2] fmt-merge-msg: selectively suppress "into <branch>"
  2020-07-31  2:04  4%     ` Junio C Hamano
@ 2020-07-31  2:22  0%       ` Jeff King
  2020-07-31 20:03  0%         ` Taylor Blau
  0 siblings, 1 reply; 200+ results
From: Jeff King @ 2020-07-31  2:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Jul 30, 2020 at 07:04:15PM -0700, Junio C Hamano wrote:

> You'd rather want to "lie" about the destination branch while
> redoing these merges, perhaps with
> 
> 	$ git merge --pretend-dest=jch topic-name
> 
> with your HEAD detached, and tell fmt-merge-msg to pretend that the
> merge is being made into jch branch.  And that is outside the scope
> of this patch, though it might be a good #leftoverbits candidate.

Since nobody really asked for it, it may make sense to wait for such a
feature. After all, this is just the starting text we put into the
merge message. You are always free to add the pretend branch yourself in
the editor.

> >   - should "master" be in the list even if you configure a value? That
> >     would do the wrong thing if you have a non-integration master, but
> >     that seems unlikely. And it would do the right thing if somebody
> >     later puts "main" in merge.suppressDest, but still occasionally
> >     works with "master" repos (where "right" is defined as "what they
> >     probably wanted", but it is perhaps a bit magical).
> 
> If you configure, you can configure it fully without manually
> clearing first.  If you do not configure, you get a backward
> compatible default.  I think that is the only sensible semantics.
> 
> Besides, I thought we were aiming to make 'master' less special.
> When a user already has a concrete list of things to use shorter
> merge title for, why should 'master' be magically added to the list
> and force the user to explicitly clear it?  I do not think that
> makes much sense.

Its magic-ness would be purely for backwards compatibility. IMHO
maintaining exact behavior with respect to this particular case was not
a big deal, but clearly Linus disagrees. But the "do the right thing
above" I mentioned above is "do the right thing even if the user _did_
switch their config to a new name, but forgot that they sometimes are
working with old repos". So it is perhaps an even weaker reason.

To be clear, I'm OK with the behavior in your patch. I just wanted to
make sure we thought through all of the implications.

> >   - what's the plan if we do switch init.defaultBranch to "main"? Would
> >     we add default_branch() to the list of defaults alongside "master",
> >     or just add "main", or just leave it and let people configure
> >     independently? It doesn't need to be decided now, but maybe worth
> >     thinking about.
> [...quite reasonable analysis that I agree with...]
> 
> In any case, I do not think I want to see more reliance of the
> notion that there always is one and only one single special branch
> in the repository, so if we can get away without it, that would be
> more preferrable.

Yeah, if the plan is to stop here then I'm OK with that. That makes
"master" special for historical reasons, but "main" or whatever never
got this special treatment by default. People have the ability to
configure if they choose, or they may not care either way.

We might get a feature request later that says "gee, I wish we did this
for 'main' by default without me having to configure it", but we can
cross that bridge when we come to it.

-Peff

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v3 0/2] fmt-merge-msg: selectively suppress "into <branch>"
  @ 2020-07-31  2:04  4%     ` Junio C Hamano
  2020-07-31  2:22  0%       ` Jeff King
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2020-07-31  2:04 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> This version looks OK to me. The remaining issues that came up in
> earlier discussion but I didn't see you weigh in on are:
>
>   - what should happen with a detached HEAD? We'd match HEAD in the
>     suppressDest config, which I think is quite reasonable. Not sure if
>     it's worth documenting or testing that specifically.

I think what the code with posted patch happens to do is just fine.
If you say "git -c merge.suppressDest=HEA? merge $topic" while on a
detached HEAD, you'll get "into HEAD" omitted.

In a workflow where you'd do

	$ git co master^0
	... enumerate the topics merged in master..jch, and redo the
	... merge, but with updated versions of these topics
	$ git shortlog jch..HEAD
	$ git range-diff jch...HEAD
	$ git diff jch..HEAD
	... after inspection, find the result satisfactory, and ...
	$ git co -B jch

I would strongly suspect that hiding "into HEAD" is not enough, so I
do not think HEAD is all that relevant in the first place.

You'd rather want to "lie" about the destination branch while
redoing these merges, perhaps with

	$ git merge --pretend-dest=jch topic-name

with your HEAD detached, and tell fmt-merge-msg to pretend that the
merge is being made into jch branch.  And that is outside the scope
of this patch, though it might be a good #leftoverbits candidate.

>   - should "master" be in the list even if you configure a value? That
>     would do the wrong thing if you have a non-integration master, but
>     that seems unlikely. And it would do the right thing if somebody
>     later puts "main" in merge.suppressDest, but still occasionally
>     works with "master" repos (where "right" is defined as "what they
>     probably wanted", but it is perhaps a bit magical).

If you configure, you can configure it fully without manually
clearing first.  If you do not configure, you get a backward
compatible default.  I think that is the only sensible semantics.

Besides, I thought we were aiming to make 'master' less special.
When a user already has a concrete list of things to use shorter
merge title for, why should 'master' be magically added to the list
and force the user to explicitly clear it?  I do not think that
makes much sense.
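
As a sketch of the semantics described above (a configured list fully
replaces the default, and the default is just 'master'), something
like the following would do. It is only an illustration under stated
assumptions: 'dest_is_suppressed()' is a hypothetical helper, and
POSIX fnmatch() stands in for whatever glob matcher the real code
uses.

```c
#include <fnmatch.h>
#include <stddef.h>

/*
 * Hypothetical helper: should "into <branch>" be omitted?
 * 'configured' holds the merge.suppressDest patterns, 'nr' their
 * count; with nothing configured we fall back to "master" alone.
 */
static int dest_is_suppressed(const char *branch,
			      const char **configured, size_t nr)
{
	static const char *fallback[] = { "master" };
	const char **list = nr ? configured : fallback;
	size_t n = nr ? nr : 1;
	size_t i;

	for (i = 0; i < n; i++)
		if (!fnmatch(list[i], branch, 0))
			return 1;	/* pattern matched the branch */
	return 0;
}
```

With this shape, a user who configures "main" does not get "master"
added behind their back, and a glob such as "HEA?" matches a detached
HEAD.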

>   - what's the plan if we do switch init.defaultBranch to "main"? Would
>     we add default_branch() to the list of defaults alongside "master",
>     or just add "main", or just leave it and let people configure
>     independently? It doesn't need to be decided now, but maybe worth
>     thinking about.

My understanding is that many more repositories come into existence
by cloning than by running "git init".  Hence, the value you set
to the init.defaultBranch has no relevance to the name of the
primary branch in the majority of your repositories, whose primary
branch is what their origin has designated before/when you cloned.

And the latter, "what is the primary branch name for this particular
repository?", is what we want to ask here.  The answer to "what is
the first branch name for new repository I will create?" is not a
good proxy for that.

I do not mind too much, even though I doubt it will be all that
useful, if we taught "init" and "clone" to record which branch is
the primary one in the repository they created.  We'd need to add
the repo_primary_branch_name() helper to allow this caller to
replace the hardcoded 'master' in the patch with it, just like
"init" and "clone" may ask the repo_default_branch_name() helper
what the first branch name ought to be.

In any case, I do not think I want to see more reliance on the
notion that there always is one and only one single special branch
in the repository, so if we can get away without it, that would be
preferable.

Thanks.

^ permalink raw reply	[relevance 4%]

* Re: [PATCH v7 0/5] cleanup ra/rebase-i-more-options
  @ 2020-07-16 17:39  6%   ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2020-07-16 17:39 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Johannes Schindelin, Elijah Newren, Rohit Ashiwal,
	Đoàn Trần Công Danh, Alban Gruin,
	Git Mailing List, Phillip Wood

Phillip Wood <phillip.wood123@gmail.com> writes:

> format-patch and am could do with having their similar messages
> updated in the future

That's a good #leftoverbits topic.

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v16] Support auto-merge for meld to follow the vim-diff behavior
  @ 2020-07-12 18:04  4%   ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2020-07-12 18:04 UTC (permalink / raw)
  To: sunlin via GitGitGadget; +Cc: git, sunlin, Lin Sun

"sunlin via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Lin Sun <lin.sun@zoom.us>
>
> Make the mergetool used with "meld" backend behave similarly to "vimdiff" by
> telling it to auto-merge non-conflicting parts and highlight the conflicting
> parts when `mergetool.meld.useAutoMerge` is configured with `true`, or `auto`
> for detecting the `--auto-merge` option automatically.
>
> Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
> Helped-by: David Aguilar <davvid@gmail.com>
> Signed-off-by: Lin Sun <lin.sun@zoom.us>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>     Enable auto-merge for meld to follow the vimdiff beharior
>     
>     Hi, the mergetool "meld" does NOT merge the no-conflict changes, while
>     the mergetool "vimdiff" will merge the no-conflict changes and highlight
>     the conflict parts. This patch will make the mergetool "meld" similar to
>     "vimdiff", auto-merge the no-conflict changes, highlight conflict parts.
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-781%2Fsunlin7%2Fmaster-v16
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-781/sunlin7/master-v16
> Pull-Request: https://github.com/git/git/pull/781
>
> Range-diff vs v15:
>
>  1:  02d849784f ! 1:  d235a576b4 Support auto-merge for meld to follow the vim-diff behavior
>      @@ mergetools/meld: diff_cmd () {
>       -	else
>       -		meld_has_output_option=false
>       +		meld_use_auto_merge_option=$(
>      -+			git --bool-or-str config mergetool.meld.useAutoMerge)
>      ++			git config --bool-or-str mergetool.meld.useAutoMerge)

It is quite clear in this hunk that the previous one was not
even proofread before sending X-<.

> +		} else if (type == TYPE_BOOL_OR_STR) {
> +			int is_bool, v;
> +			v = git_config_bool_or_str(NULL, key_, value_, &is_bool);
> +			if (is_bool)
> +				strbuf_addstr(buf, v ? "true" : "false");
> +			else
> +				strbuf_addstr(buf, value_);
>  		} else if (type == TYPE_PATH) {
>  			const char *v;
>  			if (git_config_pathname(&v, key_, value_) < 0)
> @@ -411,6 +422,14 @@ static char *normalize_value(const char *key, const char *value)
>  		else
>  			return xstrdup(v ? "true" : "false");
>  	}
> +	if (type == TYPE_BOOL_OR_STR) {
> +		int is_bool, v;
> +		v = git_config_bool_or_str(NULL, key, value, &is_bool);
> +		if (!is_bool)
> +			return xstrdup(value);
> +		else
> +			return xstrdup(v ? "true" : "false");
> +	}

It's unfortunate that we need almost identical code duplicated
here and above.  It probably is a tad larger than what we typically
call #leftoverbits, so please ignore it for now.

> diff --git a/config.c b/config.c
> index 8db9c77098..4c6c06d10b 100644
> --- a/config.c
> +++ b/config.c
> @@ -1100,6 +1100,20 @@ int git_config_bool_or_int(const char *name, const char *value, int *is_bool)
>  	return git_config_int(name, value);
>  }
>  
> +int git_config_bool_or_str(const char **dest, const char *name, const char *value, int *is_bool)
> +{
> +	int v = git_parse_maybe_bool_text(value);
> +	if (0 <= v) {
> +		*is_bool = 1;
> +		return v;
> +	}
> +	*is_bool = 0;
> +	if (dest != NULL)
> +	  return git_config_string(dest, name, value);
> +	else
> +	  return 0;
> +}

Wrong indentation.

I do not think this is a good interface at all, from at least three
points.

 - What happens when the value is set to "2"?  git_config_bool()
   would say, because it calls git_config_bool_or_int() and learns
   that the value is an integer 2 and uses !! operator on it to
   normalize it to 1, we judge it as "true".  Your implementation
   says it is not a bool and instead it is a string "2".  When
   telling a boolean and an integer apart, saying 2 is not a bool
   makes sense, but given that "interpret this value as boolean"
   logic in git_config_bool() says "2" is a true, the logic to tell
   a boolean and a string apart probably should say that the user
   who wrote "2" there meant true, i.e. boolean.

 - What's the returned value from this function and how can the
   caller sensibly use it?  If it happened to be (narrowly defined)
   bool, the returned value is 0 for false and 1 for true.
   Otherwise, the caller gets 0 if it forgets to pass dest, or 0 if
   value successfully gets returned as a string, or -1 upon an
   error.  Hence it is impossible for the caller to use

	if (git_config_bool_or_str(...)) {
		... do one thing ...
	} else {
		... do something else ...
	}

 - There is no point in passing dest to this function.  If it is not
   bool, then the caller can strdup() the value itself.

> diff --git a/config.h b/config.h
> index 060874488f..175b88d9c5 100644
> --- a/config.h
> +++ b/config.h
> @@ -217,6 +217,13 @@ ssize_t git_config_ssize_t(const char *, const char *);
>   */
>  int git_config_bool_or_int(const char *, const char *, int *);
>  
> +/**
> + * Same as `git_config_bool`, except that `is_bool` flag is unset, then if
> + * `dest` parameter is non-NULL, it allocates and copies the value string
> + * into the `dest`, if `dest` is NULL and `is_bool` flag is unset it return 0.
> + */

is_bool is not an "in-parameter" flag but a pointer to point at
where the result is stored, so the above description does not make
much sense.  I suspect, from the actual implementation, that you
wanted to say

    Parse "value" to see if it is a boolean, and if so set *is_bool
    to true and leave *dest untouched.  If it is not a boolean, set
    *is_bool to false and assign a copy of value to *dest.

But again, I do not think this function is designed right, so let's
not spend any more time polishing what you wrote for now.

I would expect something like this in builtin/config.c would be
sufficient:

	if (type == TYPE_BOOL_OR_STR) {
		int v = git_parse_maybe_bool(value);
		if (v < 0)
			return xstrdup(value);
		else
			return xstrdup(v ? "true" : "false");
	}

i.e. we do not need a new helper in the lower level of the API stack.
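
To make the above concrete, here is a minimal self-contained sketch.
maybe_bool() is a hypothetical, simplified stand-in for
git_parse_maybe_bool() (it is not Git's implementation and, for
instance, knows nothing about unit suffixes), and
normalize_bool_or_str() is an illustrative name that is not part of
the patch.  It assumes, as argued above, that an integer such as "2"
should be judged as a boolean rather than as a string.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical stand-in for git_parse_maybe_bool(): returns 1 or 0 for
 * values that read as a boolean (including integers, so "2" is true),
 * and -1 for anything else.
 */
int maybe_bool(const char *value)
{
	char *end;
	long n;

	if (!value)
		return 1;	/* assumption: a valueless key means "true" */
	if (!*value)
		return 0;	/* assumption: an empty value means "false" */
	if (!strcasecmp(value, "true") || !strcasecmp(value, "yes") ||
	    !strcasecmp(value, "on"))
		return 1;
	if (!strcasecmp(value, "false") || !strcasecmp(value, "no") ||
	    !strcasecmp(value, "off"))
		return 0;

	errno = 0;
	n = strtol(value, &end, 10);
	if (errno || end == value || *end)
		return -1;	/* not an integer either: treat as a string */
	return !!n;
}

/* Returns a newly allocated string (or NULL); the caller must free() it. */
char *normalize_bool_or_str(const char *value)
{
	int v = maybe_bool(value);
	const char *s = v < 0 ? value : (v ? "true" : "false");
	char *copy = malloc(strlen(s) + 1);

	if (copy)
		strcpy(copy, s);
	return copy;
}
```

With this shape, "2" and "YES" normalize to "true", "off" and "0" to
"false", and a value such as "auto" comes back unchanged, so the
caller never has to interpret an overloaded integer return value.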

> +		meld_use_auto_merge_option=$(
> +			git config --bool-or-str mergetool.meld.useAutoMerge)

If the body is made on a separate line for readability, doing it more
like so would be even more readable:

		meld_use_auto_merge_option=$(
			git config --bool-or-str mergetool.meld.useAutoMerge
		)

> +		case "$meld_use_auto_merge_option" in
> +		true|false)
> +			: use well formatted boolean value
> +			;;
> +		auto)
> +			# testing the "--auto-merge" option only if config is "auto"
> +			init_meld_help_msg
> +
> +			case "$meld_help_msg" in
> +			*"--auto-merge"*|*'[OPTION...]'*)
> +				meld_use_auto_merge_option=true
> +				;;
> +			*)
> +				meld_use_auto_merge_option=false
> +				;;
> +			esac
> +			;;
> +		*)
> +			meld_use_auto_merge_option=false

Now that the --bool-or-str would be silent, you have to give an
error message yourself here, no?  Have you hand-tested the result of
applying your patch to see if all the cases we care about (i.e.
various scenarios we raised and thought together how the code should
react to the situation during the review discussion so far) behave
as expected?

We are not in a hurry, and we will not be paying too much attention
to topics that are not yet in 'next' until the upcoming release is
done anyway, so take your time to try polishing before sending
anything out.

Thanks.


* Re: [PATCH 1/3] docs: adjust for the recent rename of `pu` to `seen`
  @ 2020-06-23 21:32  6%       ` Johannes Schindelin
  0 siblings, 0 replies; 200+ results
From: Johannes Schindelin @ 2020-06-23 21:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Đoàn Trần Công Danh,
	Johannes Schindelin via GitGitGadget, git

Hi Junio & Danh,

On Tue, 23 Jun 2020, Junio C Hamano wrote:

> Đoàn Trần Công Danh  <congdanhqx@gmail.com> writes:
>
> > On 2020-06-23 15:04:13+0000, Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com> wrote:
> >> diff --git a/Documentation/git-ls-remote.txt b/Documentation/git-ls-remote.txt
> >> index 0a5c8b7d493..492e573856f 100644
> >> --- a/Documentation/git-ls-remote.txt
> >> +++ b/Documentation/git-ls-remote.txt
> >> @@ -101,9 +101,9 @@ f25a265a342aed6041ab0cc484224d9ca54b6f41	refs/tags/v0.99.1
> >>  7ceca275d047c90c0c7d5afb13ab97efdf51bd6e	refs/tags/v0.99.3
> >>  c5db5456ae3b0873fc659c19fafdde22313cc441	refs/tags/v0.99.2
> >>  0918385dbd9656cab0d1d81ba7453d49bbc16250	refs/tags/junio-gpg-pub
> >> -$ git ls-remote http://www.kernel.org/pub/scm/git/git.git master pu rc
> >> +$ git ls-remote http://www.kernel.org/pub/scm/git/git.git master seen rc
> >
> > rc is not with us anymore.
> >
> > Should we replace it with next, too?
>
> I do not think so.  I think we never had 'rc'.

Indeed, and the context given in the patch demonstrates that no `rc` is
shown, so I assumed the same thing that Junio explained here:

> I think what the above example is demonstrating is this.
>
>     SYNOPSIS calls the last command line arguments <refs>; they are
>     actually mere patterns (which is how these command line
>     arguments are described in the documentation).  It is *not* an
>     error if no refs match a particular pattern.
>
> And because we have no refs that match the pattern "rc", we only see
> "master" and "pu" (now "seen") from the command.

Precisely.

> I see a couple of possible improvements here:
>
>  - The "<refs>...::" documentation should explain what kind of
>    pattern match is performed here.  I recall these originally were
>    just tail matches, but the rule might have been made more
>    flexible over time.
>
>  - The example should first explain the setting.  The first sample
>    depends on the current (./.) repository having these tags or it
>    would not work (showing the sample upfront and explaining the
>    outcome shown in the sample would work well in this case,
>    e.g. "we can see that in the current repository, there are tags
>    X, Y and Z").  The second one at least needs to say two things:
>    the sample repository does not have a branch called 'rc' and that
>    is why it is not shown, and it is not an error for patterns to
>    produce no match.

Those sound like wonderful #leftoverbits to me.

Thank you,
Dscho

>
> Thanks.
>
> >
> >>  5fe978a5381f1fbad26a80e682ddd2a401966740	refs/heads/master
> >> -c781a84b5204fb294c9ccc79f8b3baceeb32c061	refs/heads/pu
> >> +c781a84b5204fb294c9ccc79f8b3baceeb32c061	refs/heads/seen
> >>  $ git remote add korg http://www.kernel.org/pub/scm/git/git.git
> >>  $ git ls-remote --tags korg v\*
> >>  d6602ec5194c87b0fc87103ca4d67251c76f233a	refs/tags/v0.99
>


* [PATCH] diff-files: treat "i-t-a" files as "not-in-index"
@ 2020-06-11 16:16  5% Srinidhi Kaushik
  0 siblings, 0 replies; 200+ results
From: Srinidhi Kaushik @ 2020-06-11 16:16 UTC (permalink / raw)
  To: git; +Cc: Srinidhi Kaushik

The `diff-files' command and related commands which call `cmd_diff_files()',
consider the "intent-to-add" files as a part of the index when comparing the
work-tree against it. This was previously addressed in [1] and [2] by turning
the option `--ita-invisible-in-index' (introduced in [3]) on by default.

For `diff-files' (and `add -p' as a consequence) to show the i-t-a files
as new, `ita_invisible_in_index' will be enabled by default here as well.

[1] 0231ae71d3 (diff: turn --ita-invisible-in-index on by default, 2018-05-26)
[2] 425a28e0a4 (diff-lib: allow ita entries treated as "not yet exist in
                index", 2016-10-24)
[3] b42b451919 (diff: add --ita-[in]visible-in-index, 2016-10-24)

Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
---

Hello! This is my first patch in this project.
This issue was mentioned in #leftoverbits on GitHub: [1], and this
patch implements the change proposed in [2].

[1] https://github.com/gitgitgadget/git/issues/647
[2] https://lore.kernel.org/git/20200527230357.GB546534@coredump.intra.peff.net


 builtin/diff-files.c  |  7 +++++++
 t/t2203-add-intent.sh | 25 ++++++++++++++++++++++++-
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/builtin/diff-files.c b/builtin/diff-files.c
index 86ae474fbf..1e352dd8f7 100644
--- a/builtin/diff-files.c
+++ b/builtin/diff-files.c
@@ -28,6 +28,13 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix)
 	git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
 	repo_init_revisions(the_repository, &rev, prefix);
 	rev.abbrev = 0;
+
+	/*
+	 * Consider "intent-to-add" files as new by default, unless
+	 * explicitly specified in the command line or anywhere else.
+	 */
+	rev.diffopt.ita_invisible_in_index = 1;
+
 	precompose_argv(argc, argv);
 
 	argc = setup_revisions(argc, argv, &rev, NULL);
diff --git a/t/t2203-add-intent.sh b/t/t2203-add-intent.sh
index 5bbe8dcce4..742f27a935 100755
--- a/t/t2203-add-intent.sh
+++ b/t/t2203-add-intent.sh
@@ -232,7 +232,7 @@ test_expect_success 'double rename detection in status' '
 	)
 '
 
-test_expect_success 'diff-files/diff-cached shows ita as new/not-new files' '
+test_expect_success 'diff/diff-cached shows ita as new/not-new files' '
 	git reset --hard &&
 	echo new >new-ita &&
 	git add -N new-ita &&
@@ -243,6 +243,29 @@ test_expect_success 'diff-files/diff-cached shows ita as new/not-new files' '
 	test_must_be_empty actual2
 '
 
+test_expect_success 'diff-files shows i-t-a files as new files' '
+	git reset --hard &&
+	touch empty &&
+	content="foo" &&
+	echo $content >not-empty &&
+	git add -N empty not-empty &&
+	git diff-files -p >actual &&
+	hash_e=$(git hash-object empty) &&
+	hash_n=$(git hash-object not-empty) &&
+	cat >expect <<-EOF &&
+	diff --git a/empty b/empty
+	new file mode 100644
+	index 0000000..$(git rev-parse --short $hash_e)
+	diff --git a/not-empty b/not-empty
+	new file mode 100644
+	index 0000000..$(git rev-parse --short $hash_n)
+	--- /dev/null
+	+++ b/not-empty
+	@@ -0,0 +1 @@
+	+$content
+	EOF
+	test_cmp expect actual
+'
 
 test_expect_success '"diff HEAD" includes ita as new files' '
 	git reset --hard &&
-- 
2.27.0



* Re: Git multiple remotes push stop at first failed connection
  2020-06-02 16:26  6%   ` Junio C Hamano
@ 2020-06-02 16:54  0%     ` John Siu
  0 siblings, 0 replies; 200+ results
From: John Siu @ 2020-06-02 16:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git

On Tue, Jun 2, 2020 at 12:26 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jeff King <peff@peff.net> writes:
>
> > There's really no benefit to doing it all in a single Git process, as
> > we'd connect to each independently, run a separate independent
> > pack-objects for each, etc.
> >
> > I'd even suggest that Git implement such a loop itself, as we did for
> > "git fetch --all", but sadly "push --all" is already taken for a
> > different meaning (but it might still be worth doing under a different
> > option name).
>

Yes. We notice the fetch/push --all is for branches.

> I wonder if it is possible to update the implementation to do so
> without changing the UI at all, though.
>
> The presence of the "--all" option in "fetch" command is tied
> closely to the fact that it makes no sense to have multiple URLs
> that are used to download from at the same time under a single
> remote name (e.g. what should "remotes/origin/master" point at if
> two URLs say different things if such an arrangement were allowed?).
>
> On the other hand, the pushURL for a single remote can be multiple
> places for redundancy (a possible #leftoverbits here is that we
> should probably disable the "pretend that we immediately turned
> around and fetched from them after pushing" optimization when
> pushing to a remote that has multiple pushURLs defined), and that
> does not need an extra option.  If the way we currently push is suboptimal
> and it is better to spawn a separate "git push" instance via the
> run_command() API, that can safely be done as a bugfix without
> affecting any UI elements, no?
>

I agree a "bugfix" for push only is good enough and safe, as the
current behavior is already pushing to all pushURLs of a single
remote. We are not trying to change behavior or do anything extra.


* Re: Git multiple remotes push stop at first failed connection
  @ 2020-06-02 16:26  6%   ` Junio C Hamano
  2020-06-02 16:54  0%     ` John Siu
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2020-06-02 16:26 UTC (permalink / raw)
  To: Jeff King; +Cc: John Siu, git

Jeff King <peff@peff.net> writes:

> There's really no benefit to doing it all in a single Git process, as
> we'd connect to each independently, run a separate independent
> pack-objects for each, etc.
>
> I'd even suggest that Git implement such a loop itself, as we did for
> "git fetch --all", but sadly "push --all" is already taken for a
> different meaning (but it might still be worth doing under a different
> option name).

I wonder if it is possible to update the implementation to do so
without changing the UI at all, though.

The presence of the "--all" option in "fetch" command is tied
closely to the fact that it makes no sense to have multiple URLs
that are used to download from at the same time under a single
remote name (e.g. what should "remotes/origin/master" point at if
two URLs say different things if such an arrangement were allowed?).

On the other hand, the pushURL for a single remote can be multiple
places for redundancy (a possible #leftoverbits here is that we
should probably disable the "pretend that we immediately turned
around and fetched from them after pushing" optimization when
pushing to a remote that has multiple pushURLs defined), and that
does not need an extra option.  If the way we currently push is suboptimal
and it is better to spawn a separate "git push" instance via the
run_command() API, that can safely be done as a bugfix without
affecting any UI elements, no?
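
As a rough illustration of the "one independent push per URL" idea,
the control flow could look like the sketch below.  Everything in it
is hypothetical: push_to_all(), fake_push() and the attempts counter
are made-up names, and fake_push() merely stands in for spawning a
separate "git push" (it does not use the run_command() API).  The
only point is that a failure for one URL does not stop the attempts
for the remaining ones.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: push to one URL, return 0 on success. */
typedef int (*push_fn)(const char *url);

/* Counts how many URLs were actually tried (for illustration only). */
int attempts;

/*
 * Stand-in for spawning a separate "git push <url>": pretends that
 * any URL starting with "bad:" cannot be reached.
 */
int fake_push(const char *url)
{
	attempts++;
	return strncmp(url, "bad:", 4) ? 0 : -1;
}

/*
 * Push to every URL independently and keep going after a failure.
 * Returns the number of URLs for which the push failed.
 */
int push_to_all(const char **urls, size_t nr, push_fn push_one)
{
	size_t i;
	int failures = 0;

	for (i = 0; i < nr; i++)
		if (push_one(urls[i]) != 0)
			failures++;
	return failures;
}
```

A caller would report a non-zero result as an overall failure while
still having tried every pushURL of the remote.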


* [PATCH v3] fast-import: add new --date-format=raw-permissive format
  @ 2020-05-30 20:25  3% ` Elijah Newren via GitGitGadget
  0 siblings, 0 replies; 200+ results
From: Elijah Newren via GitGitGadget @ 2020-05-30 20:25 UTC (permalink / raw)
  To: git; +Cc: peff, jrnieder, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

There are multiple repositories in the wild with random, invalid
timezones.  Most notable is a commit from rails.git with a timezone of
"+051800"[1].  A few searches will find other repos with that same
invalid timezone as well.  Further, Peff reports that GitHub relaxed
their fsck checks in August 2011 to accept any timezone value[2], and
there have been multiple reports to filter-repo about fast-import
crashing while trying to import their existing repositories since they
had timezone values such as "-7349423" and "-43455309"[3].

The existing check on timezone values inside fast-import may prove
useful for people who are crafting fast-import input by hand or with a
new script.  For them, the check may help them avoid accidentally
recording invalid dates.  (Note that this check is rather simplistic and
there are still several forms of invalid dates that fast-import does not
check for: dates in the future, timezone values with minutes that are
not divisible by 15, and timezone values with minutes that are 60 or
greater.)  While this simple check may have some value for those users,
other users or tools will want to import existing repositories as-is.
Provide a --date-format=raw-permissive format that will not error out on
these otherwise invalid timezones so that such existing repositories can
be imported.
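
For illustration only, the timezone half of the check can be sketched
as a standalone function.  check_raw_timezone() is a hypothetical name
(the real code lives in validate_raw_date(), which also parses the
epoch); the sketch assumes the offset is given as a sign followed by
decimal digits, and that "strict" corresponds to --date-format=raw
while non-strict corresponds to --date-format=raw-permissive.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Returns 0 if `tz` is a sign followed by decimal digits, -1
 * otherwise.  When `strict` is set, values above 1400 are rejected
 * as well; no other sanity check is attempted.
 */
int check_raw_timezone(const char *tz, int strict)
{
	char *endp;
	unsigned long num;

	if (*tz != '-' && *tz != '+')
		return -1;

	errno = 0;
	num = strtoul(tz + 1, &endp, 10);
	if (errno || endp == tz + 1 || *endp)
		return -1;		/* did not parse */
	if (strict && 1400 < num)
		return -1;		/* parsed a broken timezone */
	return 0;
}
```

Under these assumptions "+051800" is rejected in strict mode and
accepted otherwise, while "+0575" passes even the strict check, which
is one of the gaps noted above (minutes of 60 or greater).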

[1] https://github.com/rails/rails/commit/4cf94979c9f4d6683c9338d694d5eb3106a4e734
[2] https://lore.kernel.org/git/20200521195513.GA1542632@coredump.intra.peff.net/
[3] https://github.com/newren/git-filter-repo/issues/88

Signed-off-by: Elijah Newren <newren@gmail.com>
---
    fast-import: accept invalid timezones so we can import existing repos
    
    Changes since v2:
    
     * Add documentation
     * Note the fact that the "strict" method really isn't all that strict
       with some NEEDSWORK comments
     * Check for parsed as unsigned before checking that value range makes
       sense
     * Simplify the testcase as suggested by Peff, leaving it to stick out a
       bit like a sore thumb from the rest of the tests in the same file
       (#leftoverbits)

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-795%2Fnewren%2Floosen-fast-import-timezone-parsing-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-795/newren/loosen-fast-import-timezone-parsing-v3
Pull-Request: https://github.com/git/git/pull/795

Range-diff vs v2:

 1:  9580aacdb21 ! 1:  48326d16dbd fast-import: add new --date-format=raw-permissive format
     @@ Commit message
      
          Signed-off-by: Elijah Newren <newren@gmail.com>
      
     + ## Documentation/git-fast-import.txt ##
     +@@ Documentation/git-fast-import.txt: by users who are located in the same location and time zone.  In this
     + case a reasonable offset from UTC could be assumed.
     + +
     + Unlike the `rfc2822` format, this format is very strict.  Any
     +-variation in formatting will cause fast-import to reject the value.
     ++variation in formatting will cause fast-import to reject the value,
     ++and some sanity checks on the numeric values may also be performed.
     ++
     ++`raw-permissive`::
     ++	This is the same as `raw` except that no sanity checks on
     ++	the numeric epoch and local offset are performed.  This can
     ++	be useful when trying to filter or import an existing history
     ++	with e.g. bogus timezone values.
     + 
     + `rfc2822`::
     + 	This is the standard email format as described by RFC 2822.
     +
       ## fast-import.c ##
      @@ fast-import.c: struct hash_list {
       
     @@ fast-import.c: static int parse_data(struct strbuf *sb, uintmax_t limit, uintmax
       {
       	const char *orig_src = src;
       	char *endp;
     - 	unsigned long num;
     -+	int out_of_range_timezone;
     - 
     +@@ fast-import.c: static int validate_raw_date(const char *src, struct strbuf *result)
       	errno = 0;
       
     + 	num = strtoul(src, &endp, 10);
     +-	/* NEEDSWORK: perhaps check for reasonable values? */
     ++	/*
     ++	 * NEEDSWORK: perhaps check for reasonable values? For example, we
     ++	 *            could error on values representing times more than a
     ++	 *            day in the future.
     ++	 */
     + 	if (errno || endp == src || *endp != ' ')
     + 		return -1;
     + 
      @@ fast-import.c: static int validate_raw_date(const char *src, struct strbuf *result)
       		return -1;
       
       	num = strtoul(src + 1, &endp, 10);
      -	if (errno || endp == src + 1 || *endp || 1400 < num)
     -+	out_of_range_timezone = strict && (1400 < num);
     -+	if (errno || endp == src + 1 || *endp || out_of_range_timezone)
     ++	/*
     ++	 * NEEDSWORK: check for brokenness other than num > 1400, such as
     ++	 *            (num % 100) >= 60, or ((num % 100) % 15) != 0 ?
     ++	 */
     ++	if (errno || endp == src + 1 || *endp || /* did not parse */
     ++	    (strict && (1400 < num))             /* parsed a broken timezone */
     ++	   )
       		return -1;
       
       	strbuf_addstr(result, orig_src);
     @@ t/t9300-fast-import.sh: test_expect_success 'B: accept empty committer' '
      +	COMMIT
      +	INPUT_END
      +
     -+	test_when_finished "git update-ref -d refs/heads/invalid-timezone
     -+		git gc
     -+		git prune" &&
     -+	git fast-import --date-format=raw-permissive <input &&
     -+	git cat-file -p invalid-timezone >out &&
     ++	git init invalid-timezone &&
     ++	git -C invalid-timezone fast-import --date-format=raw-permissive <input &&
     ++	git -C invalid-timezone cat-file -p invalid-timezone >out &&
      +	grep "1234567890 [+]051800" out
      +'
      +


 Documentation/git-fast-import.txt |  9 ++++++++-
 fast-import.c                     | 25 +++++++++++++++++++++----
 t/t9300-fast-import.sh            | 28 ++++++++++++++++++++++++++++
 3 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 77c6b3d0019..7d9aad2a7e1 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -293,7 +293,14 @@ by users who are located in the same location and time zone.  In this
 case a reasonable offset from UTC could be assumed.
 +
 Unlike the `rfc2822` format, this format is very strict.  Any
-variation in formatting will cause fast-import to reject the value.
+variation in formatting will cause fast-import to reject the value,
+and some sanity checks on the numeric values may also be performed.
+
+`raw-permissive`::
+	This is the same as `raw` except that no sanity checks on
+	the numeric epoch and local offset are performed.  This can
+	be useful when trying to filter or import an existing history
+	with e.g. bogus timezone values.
 
 `rfc2822`::
 	This is the standard email format as described by RFC 2822.
diff --git a/fast-import.c b/fast-import.c
index c98970274c4..0dfa14dc8c3 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -139,6 +139,7 @@ struct hash_list {
 
 typedef enum {
 	WHENSPEC_RAW = 1,
+	WHENSPEC_RAW_PERMISSIVE,
 	WHENSPEC_RFC2822,
 	WHENSPEC_NOW
 } whenspec_type;
@@ -1911,7 +1912,7 @@ static int parse_data(struct strbuf *sb, uintmax_t limit, uintmax_t *len_res)
 	return 1;
 }
 
-static int validate_raw_date(const char *src, struct strbuf *result)
+static int validate_raw_date(const char *src, struct strbuf *result, int strict)
 {
 	const char *orig_src = src;
 	char *endp;
@@ -1920,7 +1921,11 @@ static int validate_raw_date(const char *src, struct strbuf *result)
 	errno = 0;
 
 	num = strtoul(src, &endp, 10);
-	/* NEEDSWORK: perhaps check for reasonable values? */
+	/*
+	 * NEEDSWORK: perhaps check for reasonable values? For example, we
+	 *            could error on values representing times more than a
+	 *            day in the future.
+	 */
 	if (errno || endp == src || *endp != ' ')
 		return -1;
 
@@ -1929,7 +1934,13 @@ static int validate_raw_date(const char *src, struct strbuf *result)
 		return -1;
 
 	num = strtoul(src + 1, &endp, 10);
-	if (errno || endp == src + 1 || *endp || 1400 < num)
+	/*
+	 * NEEDSWORK: check for brokenness other than num > 1400, such as
+	 *            (num % 100) >= 60, or ((num % 100) % 15) != 0 ?
+	 */
+	if (errno || endp == src + 1 || *endp || /* did not parse */
+	    (strict && (1400 < num))             /* parsed a broken timezone */
+	   )
 		return -1;
 
 	strbuf_addstr(result, orig_src);
@@ -1963,7 +1974,11 @@ static char *parse_ident(const char *buf)
 
 	switch (whenspec) {
 	case WHENSPEC_RAW:
-		if (validate_raw_date(ltgt, &ident) < 0)
+		if (validate_raw_date(ltgt, &ident, 1) < 0)
+			die("Invalid raw date \"%s\" in ident: %s", ltgt, buf);
+		break;
+	case WHENSPEC_RAW_PERMISSIVE:
+		if (validate_raw_date(ltgt, &ident, 0) < 0)
 			die("Invalid raw date \"%s\" in ident: %s", ltgt, buf);
 		break;
 	case WHENSPEC_RFC2822:
@@ -3258,6 +3273,8 @@ static void option_date_format(const char *fmt)
 {
 	if (!strcmp(fmt, "raw"))
 		whenspec = WHENSPEC_RAW;
+	else if (!strcmp(fmt, "raw-permissive"))
+		whenspec = WHENSPEC_RAW_PERMISSIVE;
 	else if (!strcmp(fmt, "rfc2822"))
 		whenspec = WHENSPEC_RFC2822;
 	else if (!strcmp(fmt, "now"))
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 768257b29e0..e151df81c06 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -410,6 +410,34 @@ test_expect_success 'B: accept empty committer' '
 	test -z "$out"
 '
 
+test_expect_success 'B: reject invalid timezone' '
+	cat >input <<-INPUT_END &&
+	commit refs/heads/invalid-timezone
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1234567890 +051800
+	data <<COMMIT
+	empty commit
+	COMMIT
+	INPUT_END
+
+	test_when_finished "git update-ref -d refs/heads/invalid-timezone" &&
+	test_must_fail git fast-import <input
+'
+
+test_expect_success 'B: accept invalid timezone with raw-permissive' '
+	cat >input <<-INPUT_END &&
+	commit refs/heads/invalid-timezone
+	committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1234567890 +051800
+	data <<COMMIT
+	empty commit
+	COMMIT
+	INPUT_END
+
+	git init invalid-timezone &&
+	git -C invalid-timezone fast-import --date-format=raw-permissive <input &&
+	git -C invalid-timezone cat-file -p invalid-timezone >out &&
+	grep "1234567890 [+]051800" out
+'
+
 test_expect_success 'B: accept and fixup committer with no name' '
 	cat >input <<-INPUT_END &&
 	commit refs/heads/empty-committer-2

base-commit: 2d5e9f31ac46017895ce6a183467037d29ceb9d3
-- 
gitgitgadget


* Re: [PATCH] checkout -p: handle new files correctly
  2020-05-27 19:51  6%   ` Johannes Schindelin
@ 2020-05-27 19:58  0%     ` Johannes Schindelin
  0 siblings, 0 replies; 200+ results
From: Johannes Schindelin @ 2020-05-27 19:58 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin via GitGitGadget, git, Merlin Büge

Hi,

On Wed, 27 May 2020, Johannes Schindelin wrote:

> On Wed, 27 May 2020, Jeff King wrote:
>
> > I lied a little with "would never see a new file". There _is_ a
> > related case with "add -p" that might be worth thinking about:
> > intent-to-add files.
>
> Indeed. Maybe I can leave that as #leftoverbits?

Added as https://github.com/gitgitgadget/git/issues/647

Ciao,
Dscho


* Re: [PATCH] checkout -p: handle new files correctly
  @ 2020-05-27 19:51  6%   ` Johannes Schindelin
  2020-05-27 19:58  0%     ` Johannes Schindelin
  0 siblings, 1 reply; 200+ results
From: Johannes Schindelin @ 2020-05-27 19:51 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin via GitGitGadget, git, Merlin Büge

Hi Peff,

On Wed, 27 May 2020, Jeff King wrote:

> On Wed, May 27, 2020 at 09:09:06PM +0000, Johannes Schindelin via GitGitGadget wrote:
>
> > However, since the same machinery was used for `git checkout -p` &
> > friends, we can see new files.
> >
> > Handle this case specifically, adding a new prompt for it that is
> > modeled after the `deleted file` case.
>
> Thanks! I was planning to dig further into this topic today, and here it
> is all wrapped up with a bow. :)

:-)

>
> >  add-patch.c                | 30 +++++++++++++++++++++++-------
> >  git-add--interactive.perl  | 21 +++++++++++++++++++--
>
> Ooh, you even fixed the perl version, too. I was just going to leave it
> in the dust and add a test that set GIT_TEST_ADD_I_USE_BUILTIN.

As long as there is an escape hatch, I try to keep it working.

> Both versions look good, and are similar to what I expected from looking
> at it last night.

Thank you!

> > The original patch selection code was written for `git add -p`, and the
> > fundamental unit on which it works is a hunk.
> >
> > We hacked around that to handle deletions back in 24ab81ae4d
> > (add-interactive: handle deletion of empty files, 2009-10-27). But `git
> > add -p` would never see a new file, since we only consider the set of
> > tracked files in the index.
>
> I lied a little with "would never see a new file". There _is_ a related
> case with "add -p" that might be worth thinking about: intent-to-add
> files.

Indeed. Maybe I can leave that as #leftoverbits?

Ciao,
Dscho

>
>   $ git init
>   $ >empty
>   $ echo content >not-empty
>   $ git add -N .
>   $ git add -p
>   diff --git a/not-empty b/not-empty
>   index e69de29..d95f3ad 100644
>   --- a/not-empty
>   +++ b/not-empty
>   @@ -0,0 +1 @@
>   +content
>   (1/1) Stage this hunk [y,n,q,a,d,e,?]? n
>
>   [no mention of empty file!]
>
> I think the culprit here is diff-files, though, which doesn't show a
> patch for intent-to-add:
>
>   $ git diff-files
>   :100644 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0000000000000000000000000000000000000000 M	empty
>   :100644 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0000000000000000000000000000000000000000 M	not-empty
>
>   $ git diff-files -p
>   diff --git a/not-empty b/not-empty
>   index e69de29..d95f3ad 100644
>   --- a/not-empty
>   +++ b/not-empty
>   @@ -0,0 +1 @@
>   +content
>
> I don't think this really intersects with the patch here at all, because
> diff-files is not producing "new file" lines for these entries (even for
> the non-empty one).
>
> The solution _might_ be to convince diff-files to treat i-t-a entries as
> creations. And then with your patch here, we'd naturally do the right
> thing. So I don't think this needs to hold up your patch in any way, nor
> do we necessarily need to deal with i-t-a now. I was mostly curious how
> they worked, since we don't support added files. The answer is just that
> they don't always. ;)
>
> -Peff
>


* Re: [PATCH v13 04/13] reftable: file format documentation
  @ 2020-05-19 22:00  3%     ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2020-05-19 22:00 UTC (permalink / raw)
  To: Jonathan Nieder via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Jonathan Nieder

"Jonathan Nieder via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jonathan Nieder <jrnieder@gmail.com>
>
> Shawn Pearce explains:
>
> Some repositories contain a lot of references (e.g. android at 866k,
> rails at 31k). The reftable format provides:
>
> - Near constant time lookup for any single reference, even when the
>   repository is cold and not in process or kernel cache.
> - Near constant time verification a SHA-1 is referred to by at least
>   one reference (for allow-tip-sha1-in-want).

Not quite a grammatical sentence?  Perhaps "if" after "verification"?

> - Efficient lookup of an entire namespace, such as `refs/tags/`.
> - Support atomic push `O(size_of_update)` operations.
> - Combine reflog storage with ref storage.
>
> This file format spec was originally written in July, 2017 by Shawn
> Pearce.  Some refinements since then were made by Shawn and by Han-Wen
> Nienhuys based on experiences implementing and experimenting with the
> format.  (All of this was in the context of our work at Google and
> Google is happy to contribute the result to the Git project.)
>
> Imported from JGit[1]'s current version (c217d33ff,
> "Documentation/technical/reftable: improve repo layout", 2020-02-04)
> of Documentation/technical/reftable.md and converted to asciidoc by
> running
>
>   pandoc -t asciidoc -f markdown reftable.md >reftable.txt
>
> using pandoc 2.2.1.  The result required the following additional
> minor changes:
>
> - removed the [TOC] directive to add a table of contents, since
>   asciidoc does not support it
> - replaced git-scm.com/docs links with linkgit: directives that link
>   to other pages within Git's documentation

There are many 

	’

funny-quotes where we would prefer to place vanilla single quotes,
which may also need to be corrected in the conversion toolchain.

Typos pointed out below are probably from the original, where
they should be corrected.

> diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
> new file mode 100644
> index 00000000000..8bad9ade256
> --- /dev/null
> +++ b/Documentation/technical/reftable.txt
> @@ -0,0 +1,1067 @@
> +reftable
> +--------
> +
> +Overview
> +~~~~~~~~
> +
> +Problem statement
> +^^^^^^^^^^^^^^^^^
> +
> +Some repositories contain a lot of references (e.g. android at 866k,

Let's not use &nbsp; here after "e.g.".  I see a normal space after "e.g."
a few lines below.

> +rails at 31k). The existing packed-refs format takes up a lot of space
> +(e.g. 62M), and does not scale with additional references. Lookup of a
> +single reference requires linearly scanning the file.
> +Atomic pushes modifying multiple references require copying the entire
> +packed-refs file, which can be a considerable amount of data moved
> +(e.g. 62M in, 62M out) for even small transactions (2 refs modified).
> +
> +Repositories with many loose references occupy a large number of disk
> +blocks from the local file system, as each reference is its own file
> +storing 41 bytes (and another file for the corresponding reflog). This
> +negatively affects the number of inodes available when a large number of
> +repositories are stored on the same filesystem. Readers can be penalized
> +due to the larger number of syscalls required to traverse and read the
> +`$GIT_DIR/refs` directory.

Another downside is that we cannot arrange atomic updates to
multiple refs over loose refs, even though, unlike packed-refs,
"lookup of a single reference" does not require a linear scan (as
long as the filesystem does its job).  Worth mentioning?

> +
> +Objectives
> +^^^^^^^^^^
> +
> +* Near constant time lookup for any single reference, even when the
> +repository is cold and not in process or kernel cache.
> +* Near constant time verification if a SHA-1 is referred to by at least
> +one reference (for allow-tip-sha1-in-want).
> +* Efficient lookup of an entire namespace, such as `refs/tags/`.

Does this "lookup" refer to "do we have anything in refs/tags/
hierarchy?" or "enumerate all refs under refs/tags/ hierarchy?"

If the latter, perhaps s/lookup of/iteration over/

> +* Support atomic push with `O(size_of_update)` operations.
> +* Combine reflog storage with ref storage for small transactions.
> +* Separate reflog storage for base refs and historical logs.

> +Details
> +~~~~~~~
> +
> +Peeling
> +^^^^^^^
> +
> +References stored in a reftable are peeled, a record for an annotated
> +(or signed) tag records both the tag object, and the object it refers
> +to.

OK.  Peeled results are recorded in packed-refs file because quite
often when we use a tag object, what we actually want to access is
the commit object it points at.  We do so here for the same reason?

Not a rhetorical question, but if it invites a question from a
reader, it may deserve to be described before readers ask it.

> +Reference name encoding
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Reference names are an uninterpreted sequence of bytes that must pass
> +linkgit:git-check-ref-format[1] as a valid reference name.

OK.  We want to be able to express any reference that we allow in
the current backends.

> +Key unicity
> +^^^^^^^^^^^
> +
> +Each entry must have a unique key; repeated keys are disallowed.
> +
> +Network byte order
> +^^^^^^^^^^^^^^^^^^
> +
> +All multi-byte, fixed width fields are in network byte order.
> +
> +Ordering
> +^^^^^^^^
> +
> +Blocks are lexicographically ordered by their first reference.

Key and Block are not explained until much later, so these two
among the above three are "Huh?" to readers at this point during
their first read, but it probably cannot be helped.  Let's read on.

> +Directory/file conflicts
> +^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +The reftable format accepts both `refs/heads/foo` and
> +`refs/heads/foo/bar` as distinct references.
> +
> +This property is useful for retaining log records in reftable, but may
> +confuse versions of Git using `$GIT_DIR/refs` directory tree to maintain
> +references. Users of reftable may choose to continue to reject `foo` and
> +`foo/bar` type conflicts to prevent problems for peers.

Here "users" refers to things like "git-core", "jgit", "libgit2", etc.?

Let's say we have these two "conflicting" branches and want to
interoperate with existing versions of Git (e.g. a "git ls-remote"
client requests us to show what we have).  We could either show
"refs/heads/foo" with its object name, or "refs/heads/foo/bar" with
its object name, but not both.

"users ... may choose" implies that it is up to the implementation
of reftable user which one to show, so given a single repository,
"jgit" may show "refs/heads/foo" while "libgit2" may choose to show
the other one.

I am not sure if that is desirable---I suspect that we want to
record which one needs to be chosen so that these "D/F conflicts
disallowing" users can make consistent choices, but I dunno.

> +Block size
> +^^^^^^^^^^
> +
> +The file’s block size is arbitrarily determined by the writer, and does
> +not have to be a power of 2. The block size must be larger than the
> +longest reference name or log entry used in the repository, as
> +references cannot span blocks.
> +
> +Powers of two that are friendly to the virtual memory system or
> +filesystem (such as 4k or 8k) are recommended. Larger sizes (64k) can
> +yield better compression, with a possible increased cost incurred by
> +readers during access.
> +
> +The largest block size is `16777215` bytes (15.99 MiB).

The number being "(2**24) - 1" might be as significant as "15.99
MiB" to readers.  As we recommend, and the users would find it
natural, to use powers of two, the largest block size in practice
would be 8 MiB?
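
The arithmetic can be checked with a few lines of C.  This is only a
sketch; `MAX_BLOCK_SIZE` and `largest_pow2_block_size()` are names
made up for illustration and do not come from any implementation.

```c
#include <assert.h>
#include <stdint.h>

/* block_len is a uint24, so this is the largest value it can hold. */
#define MAX_BLOCK_SIZE ((UINT32_C(1) << 24) - 1)

/* Largest power of two that does not exceed `limit` (limit must be > 0). */
static uint32_t largest_pow2_block_size(uint32_t limit)
{
	uint32_t size = 1;

	assert(limit > 0);
	while (size <= limit / 2)
		size *= 2;
	return size;
}
```

With `limit` set to `MAX_BLOCK_SIZE`, the loop stops at 2**23, i.e.
8 MiB, which is the largest block size a writer that sticks to powers
of two could pick.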

> +Ref block format
> +^^^^^^^^^^^^^^^^
> +
> +A ref block is written as:
> +
> +....
> +'r'
> +uint24( block_len )
> +ref_record+
> +uint24( restart_offset )+
> +uint16( restart_count )
> +
> +padding?
> +....
> +
> +Blocks begin with `block_type = 'r'` and a 3-byte `block_len` which
> +encodes the number of bytes in the block up to, but not including the
> +optional `padding`. This is always less than or equal to the file’s
> +block size. In the first ref block, `block_len` includes 24 bytes for
> +the file header.
> +
> +The 2-byte `restart_count` stores the number of entries in the
> +`restart_offset` list, which must not be empty. Readers can use
> +`restart_count` to binary search between restarts before starting a
> +linear scan.
> +
> +Exactly `restart_count` 3-byte `restart_offset` values precedes the
> +`restart_count`. Offsets are relative to the start of the block and
> +refer to the first byte of any `ref_record` whose name has not been
> +prefix compressed. Entries in the `restart_offset` list must be sorted,
> +ascending. Readers can start linear scans from any of these records.

So the algorithm to find a record in a single block would be

 - see how big the block_len is by reading the first four bytes;

 - look at the last 16-bit word to see how many restart_offset
   entries there are;

 - bisect using restart_offset array to see where to start a linear
   scan in the ref_record array

 - linear scan the range between two adjacent offsets in
   restart_offset array.

(this is a mental exercise to make sure the information given here
is sufficient, which I think it is).
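
The steps above can be sketched in C roughly as follows.  This is
only an illustration under stated assumptions, not code from any
implementation: the helper names (`get_be24()`, `get_varint()`,
`find_scan_start()`) are made up, the block is assumed not to be the
first one in the file (so offsets are relative to the block start),
and no bounds checking is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed-width fields are big-endian ("network byte order"). */
static uint32_t get_be24(const unsigned char *p)
{
	return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
}

static uint16_t get_be16(const unsigned char *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode an ofs-delta style varint, advancing *pos past it. */
static uint64_t get_varint(const unsigned char *buf, size_t *pos)
{
	unsigned char c = buf[(*pos)++];
	uint64_t val = c & 0x7f;

	while (c & 0x80) {
		c = buf[(*pos)++];
		val = ((val + 1) << 7) | (c & 0x7f);
	}
	return val;
}

/*
 * Compare the name stored at a restart point against `want`.
 * prefix_length is 0 at a restart, so the suffix is the full name.
 */
static int cmp_at_restart(const unsigned char *block, uint32_t off,
			  const char *want)
{
	size_t pos = off;
	uint64_t prefix_len = get_varint(block, &pos);
	uint64_t suffix_len = get_varint(block, &pos) >> 3;
	size_t want_len = strlen(want);
	size_t n = suffix_len < want_len ? (size_t)suffix_len : want_len;
	int c;

	assert(prefix_len == 0);
	c = memcmp(block + pos, want, n);
	if (c)
		return c;
	return (suffix_len > want_len) - (suffix_len < want_len);
}

/*
 * Return the offset of the restart point from which a linear scan
 * for `want` should begin: the last restart whose name is <= want,
 * or the first restart if every restart name sorts after `want`.
 */
static uint32_t find_scan_start(const unsigned char *block, const char *want)
{
	uint32_t block_len = get_be24(block + 1);
	uint16_t count = get_be16(block + block_len - 2);
	const unsigned char *offsets = block + block_len - 2 - 3 * (size_t)count;
	uint32_t lo = 0, hi = count;

	/* restarts before lo are <= want; restarts at or after hi are > want */
	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;

		if (cmp_at_restart(block, get_be24(offsets + 3 * mid), want) <= 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return get_be24(offsets + 3 * (lo ? lo - 1 : 0));
}
```

The value returned by `find_scan_start()` is where the linear scan of
the last step would begin; the scan itself would decode records one
by one until the name matches or sorts after the wanted one.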

> +A variable number of `ref_record` fill the middle of the block,
> +describing reference names and values. The format is described below.
> +
> +As the first ref block shares the first file block with the file header,
> +all `restart_offset` in the first block are relative to the start of the
> +file (position 0), and include the file header. This forces the first
> +`restart_offset` to be `28`.

OK.

> +ref record
> +++++++++++
> +
> +A `ref_record` describes a single reference, storing both the name and
> +its value(s). Records are formatted as:
> +
> +....
> +varint( prefix_length )
> +varint( (suffix_length << 3) | value_type )
> +suffix
> +varint( update_index_delta )
> +value?
> +....
> +
> +The `prefix_length` field specifies how many leading bytes of the prior
> +reference record’s name should be copied to obtain this reference’s
> +name. This must be 0 for the first reference in any block, and also must
> +be 0 for any `ref_record` whose offset is listed in the `restart_offset`
> +table at the end of the block.

OK.  That's quite similar to how v4 index format shortens pathnames.
We have encode_varint() helper (in varint.c), that is used from the
index v4 code (in read-cache.c) and also untracked cache code (in
dir.c).

The OFS_DELTA codepaths, both for encoding (in builtin/pack-objects.c)
and for decoding (in packfile.c), use the same algorithm but
open-code it without using helper functions from varint.c (could
this become a leftoverbit?  I dunno).

The document clarifies the chosen variant of varint representation
only much later, but that description may want to be moved up close
to where "we use network byte order" etc. are declared.

> +Recovering a reference name from any `ref_record` is a simple concat:
> +
> +....
> +this_name = prior_name[0..prefix_length] + suffix
> +....
> +
> +The `suffix_length` value provides the number of bytes available in
> +`suffix` to copy from `suffix` to complete the reference name.

It is interesting that suffix is *not* a simple NUL terminated
string.  This DOES allow a NUL byte in a refname, but because we
declared upfront that the only refnames allowed are the ones that
pass check-ref-format, that cannot be the advantage that the use of
varint to encode the suffix length seeks.  Then what is it?

I guess the answer is that a lot of the time, the suffix length is
shorter than 16 bytes (= 128/8), so we can store 3-bit value_type
for free in such a case.  Is that worth a mention?
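
A small sketch in C of that guess, together with the name
reconstruction quoted above.  The function names are made up for
illustration, and `varint_len()` merely mirrors the loop used by the
ofs-delta style encoder; nothing here is taken from an actual
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of bytes an ofs-delta style varint needs to hold `val`. */
static size_t varint_len(uint64_t val)
{
	size_t n = 1;

	while (val >>= 7) {
		val--;
		n++;
	}
	return n;
}

/* Combine suffix_length and the 3-bit value_type into one field. */
static uint64_t pack_suffix_and_type(uint64_t suffix_length, unsigned value_type)
{
	assert(value_type < 8);
	return (suffix_length << 3) | value_type;
}

/*
 * this_name = prior_name[0..prefix_length] + suffix
 * `out` must have room for prefix_length + suffix_length + 1 bytes.
 */
static void rebuild_name(char *out, const char *prior, size_t prefix_length,
			 const char *suffix, size_t suffix_length)
{
	memcpy(out, prior, prefix_length);
	memcpy(out + prefix_length, suffix, suffix_length);
	out[prefix_length + suffix_length] = '\0';
}
```

A suffix of up to 15 bytes gives a combined value of at most
(15 << 3) | 7 = 127, which still fits in a single varint byte; a
16-byte suffix needs a second byte.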

> +The `update_index` that last modified the reference can be obtained by
> +adding `update_index_delta` to the `min_update_index` from the file
> +header: `min_update_index + update_index_delta`.

At this point, we can infer the following from what we have learned
so far by reading the document:

 - there is a quantity called "update index" attached to each ref
   record;

 - each file knows the range of update indices used in it; and

 - when files are chained together, their ranges of update indices
   do not overlap---all indices in one file are strictly larger or
   smaller than those in another file.

But we haven't learned at all what "update index" is and how it is
used, which makes it a frustrating read.  We probably should give a
mention of what it is here (a brief description at the same level of
detail as say "monotonically increasing counter that is used as a
transaction id when any ref update is made"), even though we will go
into much deeper details in a later section.

> +The `value` follows. Its format is determined by `value_type`, one of
> +the following:
> +
> +* `0x0`: deletion; no value data (see transactions, below)
> +* `0x1`: one 20-byte object id; value of the ref
> +* `0x2`: two 20-byte object ids; value of the ref, peeled target
> +* `0x3`: symbolic reference: `varint( target_len ) target`

We probably should write these as "one (binary) object name", and
"two (binary) object names", without hardwiring the number of bytes
needed to represent an object name.
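
One way to picture a description that does not hardwire 20 bytes is
to treat the object name length as a parameter.  The sketch below is
hypothetical: the enum names and `fixed_value_len()` are invented for
illustration, and `hash_len` is assumed to be supplied by the caller
(e.g. 20 for SHA-1).

```c
#include <assert.h>
#include <stddef.h>

enum {
	REF_DELETION = 0x0,
	REF_VALUE = 0x1,
	REF_VALUE_PEELED = 0x2,
	REF_SYMREF = 0x3
};

/*
 * Number of fixed-size value bytes that follow a ref_record of the
 * given value_type when object names are hash_len bytes long.
 * Returns -1 for symbolic refs (variable length) and reserved types.
 */
static long fixed_value_len(unsigned value_type, size_t hash_len)
{
	assert(hash_len > 0);
	switch (value_type) {
	case REF_DELETION:
		return 0;
	case REF_VALUE:
		return (long)hash_len;
	case REF_VALUE_PEELED:
		return (long)(2 * hash_len);
	default:
		return -1;
	}
}
```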

> +Symbolic references use `0x3`, followed by the complete name of the
> +reference target. No compression is applied to the target name.

Is there a place in the file format where an incomplete name can be
stored?  If not, I think it makes it easier to read if we drop
"complete" from the sentence.

> +Types `0x4..0x7` are reserved for future use.
> +
> +Ref index
> +^^^^^^^^^
> +
> +The ref index stores the name of the last reference from every ref block
> +in the file, enabling reduced disk seeks for lookups. Any reference can
> +be found by searching the index, identifying the containing block, and
> +searching within that block.
> +
> +The index may be organized into a multi-level index, where the 1st level
> +index block points to additional ref index blocks (2nd level), which may
> +in turn point to either additional index blocks (e.g. 3rd level) or ref
> +blocks (leaf level). Disk reads required to access a ref go up with
> +higher index levels. Multi-level indexes may be required to ensure no
> +single index block exceeds the file format’s max block size of
> +`16777215` bytes (15.99 MiB). To acheive constant O(1) disk seeks for

achieve

> +lookups the index must be a single level, which is permitted to exceed
> +the file’s configured block size, but not the format’s max block size of
> +15.99 MiB.


> +Obj block format
> +^^^^^^^^^^^^^^^^
> +
> +Object blocks are optional. Writers may choose to omit object blocks,
> +especially if readers will not use the SHA-1 to ref mapping.

"the object name to ref mapping".

> +Object blocks use unique, abbreviated 2-20 byte SHA-1 keys, mapping to

Likewise.  "unique prefix of object names no less than 2 bytes" or
somesuch to futureproof "2-20 byte SHA-1".

> +ref blocks containing references pointing to that object directly, or as
> +the peeled value of an annotated tag. Like ref blocks, object blocks use
> +the file’s standard block size. The abbrevation length is available in

abbreviation

> +the footer as `obj_id_len`.
> +
> +To save space in small files, object blocks may be omitted if the ref
> +index is not present, as brute force search will only need to read a few
> +ref blocks. When missing, readers should brute force a linear search of
> +all references to lookup by SHA-1.
> +
> +An object block is written as:
> +
> +....
> +'o'
> +uint24( block_len )
> +obj_record+
> +uint24( restart_offset )+
> +uint16( restart_count )
> +
> +padding?
> +....
> +
> +Fields are identical to ref block. Binary search using the restart table
> +works the same as in reference blocks.
> +
> +Because object identifiers are abbreviated by writers to the shortest
> +unique abbreviation within the reftable, obj key lengths are variable
> +between 2 and 20 bytes. Readers must compare only for common prefix

Futureproof "2 and 20" similarly.

> +match within an obj block or obj index.

> +Log block format
> +^^^^^^^^^^^^^^^^
> +
> +Unlike ref and obj blocks, log blocks are always unaligned.
> +
> +Log blocks are variable in size, and do not match the `block_size`
> +specified in the file header or footer. Writers should choose an
> +appropriate buffer size to prepare a log block for deflation, such as
> +`2 * block_size`.

I can guess the reason behind this design decision, but the readers
may not be able to.  Should we write it down here, or would that add
too much irrelevant detail?

> +A log block is written as:
> +
> +....
> +'g'
> +uint24( block_len )
> +zlib_deflate {
> +  log_record+
> +  uint24( restart_offset )+
> +  uint16( restart_count )
> +}
> +....
> +
> +Log blocks look similar to ref blocks, except `block_type = 'g'`.
> +
> +The 4-byte block header is followed by the deflated block contents using
> +zlib deflate. The `block_len` in the header is the inflated size
> +(including 4-byte block header), and should be used by readers to
> +preallocate the inflation output buffer. A log block’s `block_len` may
> +exceed the file’s block size.
> +
> +Offsets within the log block (e.g. `restart_offset`) still include the
> +4-byte header. Readers may prefer prefixing the inflation output buffer
> +with the 4-byte header.
> +
> +Within the deflate container, a variable number of `log_record` describe
> +reference changes. The log record format is described below. See ref
> +block format (above) for a description of `restart_offset` and
> +`restart_count`.
> +
> +Because log blocks have no alignment or padding between blocks, readers
> +must keep track of the bytes consumed by the inflater to know where the
> +next log block begins.
> +
> +log record
> +++++++++++
> +
> +Log record keys are structured as:
> +
> +....
> +ref_name '\0' reverse_int64( update_index )
> +....
> +
> +where `update_index` is the unique transaction identifier. The
> +`update_index` field must be unique within the scope of a `ref_name`.
> +See the update transactions section below for further details.
> +
> +The `reverse_int64` function inverses the value so lexographical

lexicographical

> +ordering the network byte order encoding sorts the more recent records
> +with higher `update_index` values first:
> +
> +....
> +reverse_int64(int64 t) {
> +  return 0xffffffffffffffff - t;
> +}
> +....

Rationale?  It may be to ease the iteration over reflog i.e. "log
-g" that wants to go from the youngest to the oldest---isn't it worth
mentioning?
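
To illustrate the ordering effect, here is a minimal sketch in C that
builds the key as described (ref_name, NUL, reversed update_index in
network byte order).  `make_log_key()` is a made-up helper, and the
caller is assumed to provide a large enough buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint64_t reverse_int64(uint64_t t)
{
	return UINT64_C(0xffffffffffffffff) - t;
}

/*
 * Write "ref_name '\0' reverse_int64(update_index)" into `out` and
 * return the key length.  `out` must have room for
 * strlen(ref_name) + 9 bytes.
 */
static size_t make_log_key(unsigned char *out, const char *ref_name,
			   uint64_t update_index)
{
	size_t len;
	uint64_t rev = reverse_int64(update_index);
	int i;

	assert(out && ref_name);
	len = strlen(ref_name) + 1;	/* include the NUL separator */
	memcpy(out, ref_name, len);
	for (i = 7; i >= 0; i--) {	/* big-endian, most significant first */
		out[len + i] = (unsigned char)(rev & 0xff);
		rev >>= 8;
	}
	return len + 8;
}
```

Comparing two such keys for the same ref with memcmp() puts the one
with the larger update_index first, so a reader walking the records
in key order sees the most recent reflog entry before older ones.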

> +Log records have a similar starting structure to ref and index records,
> +utilizing the same prefix compression scheme applied to the log record
> +key described above.
> +
> +....
> +    varint( prefix_length )
> +    varint( (suffix_length << 3) | log_type )
> +    suffix
> +    log_data {
> +      old_id
> +      new_id
> +      varint( name_length    )  name
> +      varint( email_length   )  email
> +      varint( time_seconds )
> +      sint16( tz_offset )
> +      varint( message_length )  message
> +    }?
> +....
> +
> +Log record entries use `log_type` to indicate what follows:
> +
> +* `0x0`: deletion; no log data.
> +* `0x1`: standard git reflog data using `log_data` above.
> +
> +The `log_type = 0x0` is mostly useful for `git stash drop`, removing an
> +entry from the reflog of `refs/stash` in a transaction file (below),
> +without needing to rewrite larger files. Readers reading a stack of
> +reflogs must treat this as a deletion.
> +
> +For `log_type = 0x1`, the `log_data` section follows
> +linkgit:git-update-ref[1] logging and includes:
> +
> +* two 20-byte SHA-1s (old id, new id)

"two (binary) object names (old name, new name)" for futureproof out
of SHA-1 world.

> +* varint string of committer’s name
> +* varint string of committer’s email
> +* varint time in seconds since epoch (Jan 1, 1970)
> +* 2-byte timezone offset in minutes (signed)

We use minus eight hundred for "GMT-0800" internally, but this would
use -480, which makes more sense ;-)
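
The conversion between the two representations is simple; a sketch in
C, where `tz_to_minutes()` is a hypothetical helper and the input is
assumed to be the decimal "hhmm" form (e.g. -800 for GMT-0800):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a decimal "+hhmm"/"-hhmm" style zone such as -800 into the
 * signed number of minutes that tz_offset stores.
 */
static int16_t tz_to_minutes(int tz)
{
	int sign = tz < 0 ? -1 : 1;
	int abs_tz = tz < 0 ? -tz : tz;

	assert(abs_tz % 100 < 60);	/* the "mm" part must be a valid minute count */
	return (int16_t)(sign * ((abs_tz / 100) * 60 + abs_tz % 100));
}
```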

> +* varint string of message
> +
> +`tz_offset` is the absolute number of minutes from GMT the committer was
> +at the time of the update. For example `GMT-0800` is encoded in reftable
> +as `sint16(-480)` and `GMT+0230` is `sint16(150)`.
> +
> +The committer email does not contain `<` or `>`, it’s the value normally
> +found between the `<>` in a git commit object header.

Saving two precious bytes?

This is a tangent but in a repository at hosting provider, whose
primary (and often the only) source of updates are by end-user
pushing into it, if reflogs are enabled, whose name and email are
recorded in the logs?  The committer or tagger of the object that
sits at the tip of the ref after the update?  What happens when a
blob is pushed to update a ref?  Or would it be just a single "user"
that represents the "server operator"?

We know that a non-bare repository an individual contributor works
in typically records only one <name, email> in the reflog: the user
who works in it.

What I am trying to get at is if it makes more sense to have a small
table of unique <name, email> pairs used in the file and have
log_data record a single varint that is the index into that
"committer ident" table.  I would suspect that it would give us
significantly more gain than mere <> two bytes per log_data entry.

> +The `message_length` may be 0, in which case there was no message
> +supplied for the update.
> +
> +Contrary to traditional reflog (which is a file), renames are encoded as
> +a combination of ref deletion and ref creation.

Yay?  What does the deletion record look like?  The new object name
being 0*hashlength?  I didn't see it defined in the description (and
I am guessing that log_type of 0x0 is *NOT* used for that purpose).

So, NEEDSWORK: describe how "creation of a ref" and "deletion of a ref"
appears in a log as a log record entry.

> +Footer
> +^^^^^^
> +
> +After the last block of the file, a file footer is written. It begins
> +like the file header, but is extended with additional data.
> +
> +A 68-byte footer appears at the end:
> +
> +....
> +    'REFT'
> +    uint8( version_number = 1 )
> +    uint24( block_size )
> +    uint64( min_update_index )
> +    uint64( max_update_index )
> +
> +    uint64( ref_index_position )
> +    uint64( (obj_position << 5) | obj_id_len )
> +    uint64( obj_index_position )
> +
> +    uint64( log_position )
> +    uint64( log_index_position )
> +
> +    uint32( CRC-32 of above )
> +....
> +
> +If a section is missing (e.g. ref index) the corresponding position
> +field (e.g. `ref_index_position`) will be 0.
> +
> +* `obj_position`: byte position for the first obj block.
> +* `obj_id_len`: number of bytes used to abbreviate object identifiers in
> +obj blocks.

Should we write "this can be up to 31" somewhere?  It is more than
enough for SHA-1 and not quite sufficient for SHA-256 (unless we say
"we store obj_id_len-1 here")?

> +* `log_position`: byte position for the first log block.
> +* `ref_index_position`: byte position for the start of the ref index.
> +* `obj_index_position`: byte position for the start of the obj index.
> +* `log_index_position`: byte position for the start of the log index.


> +Varint encoding
> +^^^^^^^^^^^^^^^
> +
> +Varint encoding is identical to the ofs-delta encoding method used
> +within pack files.
> +
> +Decoder works such as:
> +
> +....
> +val = buf[ptr] & 0x7f
> +while (buf[ptr] & 0x80) {
> +  ptr++
> +  val = ((val + 1) << 7) | (buf[ptr] & 0x7f)
> +}
> +....

As already said, I think this should be given upfront next to where
we declare that we use network byte order.

> +Restart point selection
> ++++++++++++++++++++++++
> +
> +Writers determine the restart points at file creation. The process is
> +arbitrary, but every 16 or 64 records is recommended. Every 16 may be
> +more suitable for smaller block sizes (4k or 8k), every 64 for larger
> +block sizes (64k).
> +
> +More frequent restart points reduces prefix compression and increases
> +space consumed by the restart table, both of which increase file size.
> +
> +Less frequent restart points makes prefix compression more effective,
> +decreasing overall file size, with increased penalities for readers

penalties

> +walking through more records after the binary search step.
> +
> +A maximum of `65535` restart points per block is supported.


> +LMDB
> +^^^^
> +
> +David Turner proposed
> +https://public-inbox.org/git/1455772670-21142-26-git-send-email-dturner@twopensource.com/[using
> +LMDB], as LMDB is lightweight (64k of runtime code) and GPL-compatible
> +license.
> +
> +A downside of LMDB is its reliance on a single C implementation. This
> +makes embedding inside JGit (a popular reimplemenation of Git)

reimplementation

> +difficult, and hoisting onto virtual storage (for JGit DFS) virtually
> +impossible.
> +
> +A common format that can be supported by all major Git implementations
> +(git-core, JGit, libgit2) is strongly preferred.
> +
> +Future
> +~~~~~~
> +
> +Longer hashes
> +^^^^^^^^^^^^^
> +
> +Version will bump (e.g. 2) to indicate `value` uses a different object
> +id length other than 20. The length could be stored in an expanded file
> +header, or hardcoded as part of the version.



* [PATCH v2] submodule: port subcommand 'set-branch' from shell to C
@ 2020-05-19 18:26  4% Shourya Shukla
  0 siblings, 0 replies; 200+ results
From: Shourya Shukla @ 2020-05-19 18:26 UTC (permalink / raw)
  To: git
  Cc: christian.couder, kaartic.sivaraam, liu.denton, gitster,
	congdanhqx, Shourya Shukla, Christian Couder

Convert submodule subcommand 'set-branch' to a builtin. Port 'set-branch'
to 'submodule--helper.c' and call the latter via 'git-submodule.sh'.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
---
An improvement over the previous version, with a lot less clutter and
redundancy. This version also covers the side-effect pointed out by
Công Danh in (thanks to Kaartic for pointing it out):
https://lore.kernel.org/git/20200517161151.GA30938@danh.dev/

I have refrained from using the `newbranch` variable because using
only `opt_branch` simplified things even further (thanks to Christian).
I think a similar improvement could be made to `set-url`, but let's leave
that for 'leftoverbits' maybe?

Thank you Denton, Christian and Kaartic for the reviews! :)
Next step is conversion of `summary` to C (after the review of `set-branch`
is done).

 builtin/submodule--helper.c | 41 +++++++++++++++++++++++++++++++++++++
 git-submodule.sh            | 32 +++--------------------------
 2 files changed, 44 insertions(+), 29 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index f50745a03f..5cd7dc84c6 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -2284,6 +2284,46 @@ static int module_set_url(int argc, const char **argv, const char *prefix)
 	return 0;
 }
 
+static int module_set_branch(int argc, const char **argv, const char *prefix)
+{
+	int quiet = 0, opt_default = 0;
+	char *opt_branch = NULL;
+	const char *path;
+	char *config_name;
+
+	struct option options[] = {
+		OPT__QUIET(&quiet,
+			N_("suppress output for setting default tracking branch of a submodule")),
+		OPT_BOOL(0, "default", &opt_default,
+			N_("set the default tracking branch to master")),
+		OPT_STRING(0, "branch", &opt_branch, N_("branch"),
+			N_("set the default tracking branch to the one specified")),
+		OPT_END()
+	};
+	const char *const usage[] = {
+		N_("git submodule--helper set-branch [--quiet] (-d|--default) <path>"),
+		N_("git submodule--helper set-branch [--quiet] (-b|--branch) <branch> <path>"),
+		NULL
+	};
+
+	argc = parse_options(argc, argv, prefix, options, usage, 0);
+
+	if (!opt_branch && !opt_default)
+		die(_("at least one of --branch and --default required"));
+
+	if (opt_branch && opt_default)
+		die(_("--branch and --default do not make sense together"));
+
+	if (argc != 1 || !(path = argv[0]))
+		usage_with_options(usage, options);
+
+	config_name = xstrfmt("submodule.%s.branch", path);
+	config_set_in_gitmodules_file_gently(config_name, opt_branch);
+
+	free(config_name);
+	return 0;
+}
+
 #define SUPPORT_SUPER_PREFIX (1<<0)
 
 struct cmd_struct {
@@ -2315,6 +2355,7 @@ static struct cmd_struct commands[] = {
 	{"check-name", check_name, 0},
 	{"config", module_config, 0},
 	{"set-url", module_set_url, 0},
+	{"set-branch", module_set_branch, 0},
 };
 
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
diff --git a/git-submodule.sh b/git-submodule.sh
index 39ebdf25b5..8c56191f77 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -719,7 +719,7 @@ cmd_update()
 # $@ = requested path
 #
 cmd_set_branch() {
-	unset_branch=false
+	default=
 	branch=
 
 	while test $# -ne 0
@@ -729,7 +729,7 @@ cmd_set_branch() {
 			# we don't do anything with this but we need to accept it
 			;;
 		-d|--default)
-			unset_branch=true
+			default=1
 			;;
 		-b|--branch)
 			case "$2" in '') usage ;; esac
@@ -750,33 +750,7 @@ cmd_set_branch() {
 		shift
 	done
 
-	if test $# -ne 1
-	then
-		usage
-	fi
-
-	# we can't use `git submodule--helper name` here because internally, it
-	# hashes the path so a trailing slash could lead to an unintentional no match
-	name="$(git submodule--helper list "$1" | cut -f2)"
-	if test -z "$name"
-	then
-		exit 1
-	fi
-
-	test -n "$branch"; has_branch=$?
-	test "$unset_branch" = true; has_unset_branch=$?
-
-	if test $((!$has_branch != !$has_unset_branch)) -eq 0
-	then
-		usage
-	fi
-
-	if test $has_branch -eq 0
-	then
-		git submodule--helper config submodule."$name".branch "$branch"
-	else
-		git submodule--helper config --unset submodule."$name".branch
-	fi
+	git ${wt_prefix:+-C "$wt_prefix"} ${prefix:+--super-prefix "$prefix"} submodule--helper set-branch ${GIT_QUIET:+--quiet} ${branch:+--branch $branch} ${default:+--default} -- "$@"
 }
 
 #
-- 
2.26.2



* Re: [PATCH v3 3/4] gitfaq: shallow cloning a repository
  @ 2020-04-22  4:00  6%         ` Jonathan Nieder
  0 siblings, 0 replies; 200+ results
From: Jonathan Nieder @ 2020-04-22  4:00 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Randall S. Becker, 'Junio C Hamano',
	'Shourya Shukla', git, sandals, 'Derrick Stolee',
	'Elijah Newren', 'Christian Couder'

Derrick Stolee wrote:

> Of course, with the speedups from reachability bitmaps, it is sometimes
> _faster_ to do a partial clone than a shallow clone. (It definitely takes
> less time in the "counting objects" phase, and the cost of downloading
> all commits and trees might be small enough on top of the necessary blob
> data to keep the total cost under a shallow clone. Your mileage may vary.)
> Because the cost of a partial clone is "comparable" to shallow clone, I
> would almost recommend partial clone over shallow clones 95% of the time,
> even in scenarios like automated builds on cloud-hosted VMs.

By the way, an idea for the interested (#leftoverbits?):

It would be possible to emulate the shallow clone experience making
use of the partial clone protocol.  That is, fetch a full history
without blobs but record the "shallows" somewhere and make user-facing
traversals like "git log" stop there (similar to the effect "git
replace" has on user-facing traversals).  Then later fetches would be
able to take advantage of the full commit history, but scripts and
muscle memory (e.g., the assumption that most commands will never
contact the remote) that assume a shallow clone would continue to
work.

Would that be useful or interesting to people?

Thanks,
Jonathan


* Re: [PATCH] commit-graph: fix buggy --expire-time option
  2020-04-01 20:33  6%     ` Junio C Hamano
@ 2020-04-01 20:51  0%       ` Derrick Stolee
  0 siblings, 0 replies; 200+ results
From: Derrick Stolee @ 2020-04-01 20:51 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King
  Cc: Derrick Stolee via GitGitGadget, git, me, Derrick Stolee

On 4/1/2020 4:33 PM, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
>> On Wed, Apr 01, 2020 at 12:49:25PM -0700, Junio C Hamano wrote:
>>
>>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>
>>>> diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh
>>>> index 53b2e6b4555..4e4efcaff22 100755
>>>> --- a/t/t5324-split-commit-graph.sh
>>>> +++ b/t/t5324-split-commit-graph.sh
>>>> @@ -210,8 +210,10 @@ test_expect_success 'test merge stragety constants' '
>>>>  		git config core.commitGraph true &&
>>>>  		test_line_count = 2 $graphdir/commit-graph-chain &&
>>>>  		test_commit 15 &&
>>>> -		git commit-graph write --reachable --split --size-multiple=10 --expire-time=1980-01-01 &&
>>>> +		touch -m -t 201801010000.00 $graphdir/extra.graph &&
>>>
>>> We have "test-tool chmtime" since 17e48368 (Add test-chmtime: a
>>> utility to change mtime on files, 2007-02-24) and refrained from
>>> using "touch -t" anywhere in our tests.  Can we use it here, too?
>>
>> There are a couple new ones added last year in t5319. Nobody has
>> complained yet, but I wonder if it's a matter of time.
> 
> Indeed.  We should fix them (#leftoverbits).

I'm adding a patch to fix that now.

-Stolee



* Re: [PATCH] commit-graph: fix buggy --expire-time option
  @ 2020-04-01 20:33  6%     ` Junio C Hamano
  2020-04-01 20:51  0%       ` Derrick Stolee
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2020-04-01 20:33 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee via GitGitGadget, git, me, Derrick Stolee

Jeff King <peff@peff.net> writes:

> On Wed, Apr 01, 2020 at 12:49:25PM -0700, Junio C Hamano wrote:
>
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> 
>> > diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh
>> > index 53b2e6b4555..4e4efcaff22 100755
>> > --- a/t/t5324-split-commit-graph.sh
>> > +++ b/t/t5324-split-commit-graph.sh
>> > @@ -210,8 +210,10 @@ test_expect_success 'test merge stragety constants' '
>> >  		git config core.commitGraph true &&
>> >  		test_line_count = 2 $graphdir/commit-graph-chain &&
>> >  		test_commit 15 &&
>> > -		git commit-graph write --reachable --split --size-multiple=10 --expire-time=1980-01-01 &&
>> > +		touch -m -t 201801010000.00 $graphdir/extra.graph &&
>> 
>> We have "test-tool chmtime" since 17e48368 (Add test-chmtime: a
>> utility to change mtime on files, 2007-02-24) and refrained from
>> using "touch -t" anywhere in our tests.  Can we use it here, too?
>
> There are a couple new ones added last year in t5319. Nobody has
> complained yet, but I wonder if it's a matter of time.

Indeed.  We should fix them (#leftoverbits).
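
For context, a chmtime-style helper boils down to setting a file's mtime from an explicit epoch value rather than from a formatted date string. The following is only a rough sketch under the assumption of POSIX stat() and utime(); set_mtime() is a hypothetical name, and this is not the source of Git's test-tool.

```c
#include <sys/stat.h>
#include <time.h>
#include <utime.h>

/*
 * Hypothetical helper: set the modification time of `path` to the
 * absolute epoch value `mtime`, leaving the access time untouched.
 * Returns 0 on success, -1 on failure (errno is set by stat/utime).
 */
int set_mtime(const char *path, time_t mtime)
{
	struct stat st;
	struct utimbuf times;

	if (stat(path, &st) < 0)
		return -1;

	times.actime = st.st_atime;	/* keep the existing atime */
	times.modtime = mtime;		/* e.g. 1514764800 is 2018-01-01 00:00:00 UTC */
	return utime(path, &times);
}
```

Because the timestamp is passed as seconds since the epoch, the result does not depend on the local timezone or on how a given platform's touch parses its date argument.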

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v1 1/2] sequencer: don't abbreviate a command if it doesn't have a short form
  @ 2020-03-30 17:50  5%     ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2020-03-30 17:50 UTC (permalink / raw)
  To: Alban Gruin
  Cc: git, Johannes Schindelin, Elijah Newren, Phillip Wood,
	jan.steffens

Alban Gruin <alban.gruin@gmail.com> writes:

>  static char command_to_char(const enum todo_command command)
>  {
> -	if (command < TODO_COMMENT && todo_command_info[command].c)
> +	if (command < TODO_COMMENT)
>  		return todo_command_info[command].c;
>  	return comment_line_char;
>  }

This is not a new issue, and it may not even be an issue at all, but
it is curious that command_to_string() barfs with "unknown command"
when fed an int outside enum todo_command or TODO_COMMENT itself,
while this returns comment_line_char.  Makes a reader wonder if both
of them should be dying the same way.

> @@ -4963,6 +4963,8 @@ static void todo_list_to_strbuf(struct repository *r, struct todo_list *todo_lis
>  		max = num;
>  
>  	for (item = todo_list->items, i = 0; i < max; i++, item++) {
> +		char cmd;
> +
>  		/* if the item is not a command write it and continue */
>  		if (item->command >= TODO_COMMENT) {
>  			strbuf_addf(buf, "%.*s\n", item->arg_len,
> @@ -4971,8 +4973,9 @@ static void todo_list_to_strbuf(struct repository *r, struct todo_list *todo_lis
>  		}
>  
>  		/* add command to the buffer */
> -		if (flags & TODO_LIST_ABBREVIATE_CMDS)
> -			strbuf_addch(buf, command_to_char(item->command));
> +		cmd = command_to_char(item->command);
> +		if (flags & TODO_LIST_ABBREVIATE_CMDS && cmd)

Even though the precedence rule may not require it, for
readability's sake, it would be easier to see the association if
this is written with an extra set of parentheses, i.e.

		if ((flags & TODO_LIST_ABBREVIATE_CMDS) && cmd)

> +			strbuf_addch(buf, cmd);
>  		else
>  			strbuf_addstr(buf, command_to_string(item->command));

The logic is quite clear.  If there is an abbreviation and the user
prefers to see it, we use it, but otherwise we'll give the full
spelling.
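
The selection logic described above can be sketched in isolation. This is a simplified stand-in, not the code in sequencer.c: the two-entry table, the format_command() helper and the bit position of TODO_LIST_ABBREVIATE_CMDS are assumptions made for illustration. In C, "&" binds more tightly than "&&", so the extra parentheses change nothing for the compiler and only help the reader.

```c
#include <stdio.h>

#define TODO_LIST_ABBREVIATE_CMDS (1U << 2)	/* bit position assumed for this sketch */

/* Illustrative subset of commands; TODO_COMMENT marks the end of real commands. */
enum todo_command { TODO_PICK, TODO_NOOP, TODO_COMMENT };

static const struct {
	char c;			/* short form, or 0 if there is none */
	const char *str;	/* full spelling */
} todo_command_info[] = {
	{ 'p', "pick" },
	{ 0, "noop" },
};

/*
 * Hypothetical helper: write the name of `command` into `buf`, using
 * the one-letter form only if the caller asked for abbreviations and
 * one exists.  The caller must pass a command below TODO_COMMENT.
 * Returns the number of characters written (as snprintf does).
 */
int format_command(char *buf, size_t len, enum todo_command command,
		   unsigned flags)
{
	char cmd = todo_command_info[command].c;

	if ((flags & TODO_LIST_ABBREVIATE_CMDS) && cmd)
		return snprintf(buf, len, "%c", cmd);
	return snprintf(buf, len, "%s", todo_command_info[command].str);
}
```

With the flag set, "pick" is written as "p" while "noop", which has no short form in this table, is still spelled out.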

We are sure we will never get TODO_COMMENT here in item->command at
this point (the loop would have already continued after adding it to
the buffer), so it does not affect us that command_to_string() would
die.  For that matter, if we made command_to_char() die, just like
command_to_string() would, nobody will get hurt and the resulting
code would become saner.  But obviously it is outside the scope of
this fix (#leftoverbits).
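
A possible shape for that follow-up, where both lookups reject out-of-range values the same way, might look like the sketch below. It is an assumption-laden illustration rather than a proposed patch: the table is a two-entry stand-in, and the local die() helper (with exit code 128 chosen to mirror Git's own die()) is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative subset of commands; TODO_COMMENT marks the end of real commands. */
enum todo_command { TODO_PICK, TODO_NOOP, TODO_COMMENT };

static const struct {
	char c;			/* short form, or 0 if there is none */
	const char *str;	/* full spelling */
} todo_command_info[] = {
	{ 'p', "pick" },
	{ 0, "noop" },
};

/* Hypothetical stand-in for a fatal-error helper. */
static void die(const char *msg, int value)
{
	fprintf(stderr, "fatal: %s: %d\n", msg, value);
	exit(128);
}

const char *command_to_string(enum todo_command command)
{
	if (command < TODO_COMMENT)
		return todo_command_info[command].str;
	die("unknown command", command);
	return NULL;	/* not reached */
}

/* Same contract as command_to_string(): anything else is a caller bug. */
char command_to_char(enum todo_command command)
{
	if (command < TODO_COMMENT)
		return todo_command_info[command].c;
	die("unknown command", command);
	return 0;	/* not reached */
}
```

A return value of 0 from command_to_char() then means only "this command has no short form", never "this was not a command".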

Thanks.

^ permalink raw reply	[relevance 5%]

* Re: [PATCH v3] rebase --merge: optionally skip upstreamed commits
  @ 2020-03-30 16:49  5%     ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2020-03-30 16:49 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Jonathan Tan, git, congdanhqx, newren

Derrick Stolee <stolee@gmail.com> writes:

>> +--keep-cherry-pick::
>> +--no-keep-cherry-pick::
>
> I noticed that this _could_ have been simplified to
>
> 	--[no-]keep-cherry-pick::
>
> but I also see several uses of either in our documentation. Do we
> have a preference? By inspecting the lines before a "no-" string,
> I see that some have these two lines, some use the [no-] pattern,
> and others highlight the --no-<option> flag completely separately.

"git log -S'--[no-]' Documentation/" (and its "-S'--no-'" variant)
tell us that many of our recent commits do prefer the single-line
form, but then in d333f672 (git-checkout.txt: spell out --no-option,
2019-03-29), we see we turned a handful of "--[no-]option" into
"--option" followed by "--no-option" deliberately  [*1*].

So, we do not seem to have a strong consensus.

I think all the new ones that spell --no-option:: out are the ones
when --option:: and --no-option:: have their own paragraph, e.g.
"--sign/--no-sign" of "git-tag".

As the differences do not matter all that much, I do not mind
declaring that we'd prefer the expanded two-liner form (which, when
formatted, becomes a single line with two things on it) and marking
the task of converting from '--[no-]option' as #leftoverbit.  One of
the tasks of the maintainer is to make such a declaration when it
matters more that we pick one choice and all stick to it than which
choice we make.

Thanks for your attention to the details.

[Footnote]

*1* The justification given was that it is easier to
search that way and it is less cryptic.  Personally I do not think
it matters that much.  Even when trying to learn what the negated
form does, nobody would look for "--no-keep-ch" to find the above
paragraph.  "keep-cherry-pick" would be what they would look for,
with or without leading double-dashes.

^ permalink raw reply	[relevance 5%]

* Re: [PATCH] rebase --merge: optionally skip upstreamed commits
  2020-03-10  2:10  5% ` Taylor Blau
@ 2020-03-10 15:51  0%   ` Jonathan Tan
  0 siblings, 0 replies; 200+ results
From: Jonathan Tan @ 2020-03-10 15:51 UTC (permalink / raw)
  To: me; +Cc: jonathantanmy, git, stolee, git

> Hi Jonathan,
> 
> This patch makes good sense to me. I left a few notes below, but they
> are relatively minor, and this seems to be all in a good direction.
> 
> As a (somewhat) interesting aside, this feature would be useful to me
> outside of partial clones, since I often have this workflow in my local
> development wherein 'git rebase' spends quite a bit of time comparing
> patches on my branch to everything new upstream.

Thanks for your review, and it's great to know of a use case that this
helps.

> This sentence is a little confusing if you skip over the graph, since it
> reads: "When rebasing against an because ... because ...". It may be
> clearer if you swap the order of the last two clauses to instead be:
> 
>   it must read the contents of every novel upstream commit, in addition to
>   the tip of the upstream and the merge base, because "git rebase"
>   attempts to exclude commits that are duplicates of upstream ones.

Sounds good; will do.

> > +--skip-already-present::
> > +--no-skip-already-present::
> > +	Skip commits that are already present in the new upstream.
> > +	This is the default.
> 
> I believe that you mean '--skip-already-present' is the default, here,
> but the placement makes it ambiguous, since it is in a paragraph with a
> header that contains both the positive and negated version of this flag.
> 
> Maybe this could be changed to: 's/This/--skip-already-present/'.

Will do.

> >  In that case, the fix is easy because 'git rebase' knows to skip
> > -changes that are already present in the new upstream.  So if you say
> > +changes that are already present in the new upstream (unless
> > +`--no-skip-already-present` is given). So if you say
> 
> Extremely minor nit: there is a whitespace change on this line where the
> original has two spaces between the '.' and 'So', and the new version
> has only one.

OK - I'll change it to 2 spaces.

> > diff --git a/sequencer.h b/sequencer.h
> > index 393571e89a..39bb12f624 100644
> > --- a/sequencer.h
> > +++ b/sequencer.h
> > @@ -149,7 +149,7 @@ int sequencer_remove_state(struct replay_opts *opts);
> >   * `--onto`, we do not want to re-generate the root commits.
> >   */
> >  #define TODO_LIST_ROOT_WITH_ONTO (1U << 6)
> > -
> > +#define TODO_LIST_SKIP_ALREADY_PRESENT (1U << 7)
> 
> This was another spot that I thought could maybe be turned into an enum,
> but it's clearly not the fault of your patch, and could easily be turned
> into #leftoverbits.

There was a recent discussion on the list [1] about whether bitsets
should be enums, and we decided against it. But anyway we can revisit
this later if need be.

[1] https://lore.kernel.org/git/20191016193750.258148-1-jonathantanmy@google.com/

The changes Taylor suggested were minor, so I'll hold off sending
another version until there are more substantial changes requested.
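
As a small illustration of the bitset style under discussion, the sketch below re-declares the two flag values from the quoted hunk and shows how they are combined and tested. The helper names make_flags() and wants_skip_already_present() are hypothetical and exist only for this example.

```c
/* Flag values as they appear in the quoted sequencer.h hunk. */
#define TODO_LIST_ROOT_WITH_ONTO       (1U << 6)
#define TODO_LIST_SKIP_ALREADY_PRESENT (1U << 7)

/* Hypothetical helper: build the flag word from two boolean options. */
unsigned make_flags(int root_with_onto, int skip_already_present)
{
	unsigned flags = 0;

	flags |= root_with_onto ? TODO_LIST_ROOT_WITH_ONTO : 0;
	flags |= skip_already_present ? TODO_LIST_SKIP_ALREADY_PRESENT : 0;
	return flags;
}

/* Hypothetical helper: recover a 0/1 value from the flag word. */
int wants_skip_already_present(unsigned flags)
{
	/* Without "!!" the result would be 128 rather than 1. */
	return !!(flags & TODO_LIST_SKIP_ALREADY_PRESENT);
}
```

Each flag occupies its own bit, so values can be OR-ed together freely; the "!!" matters whenever the result is stored in a narrow field or compared against 1.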

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] rebase --merge: optionally skip upstreamed commits
  @ 2020-03-10  2:10  5% ` Taylor Blau
  2020-03-10 15:51  0%   ` Jonathan Tan
    1 sibling, 1 reply; 200+ results
From: Taylor Blau @ 2020-03-10  2:10 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, stolee, git

Hi Jonathan,

This patch makes good sense to me. I left a few notes below, but they
are relatively minor, and this seems to be all in a good direction.

As a (somewhat) interesting aside, this feature would be useful to me
outside of partial clones, since I often have this workflow in my local
development wherein 'git rebase' spends quite a bit of time comparing
patches on my branch to everything new upstream.

On Mon, Mar 09, 2020 at 01:55:23PM -0700, Jonathan Tan wrote:
> When rebasing against an upstream that has had many commits since the
> original branch was created:
>
>  O -- O -- ... -- O -- O (upstream)
>   \
>    -- O (my-dev-branch)
>
> because "git rebase" attempts to exclude commits that are duplicates of
> upstream ones, it must read the contents of every novel upstream commit,
> in addition to the tip of the upstream and the merge base.

This sentence is a little confusing if you skip over the graph, since it
reads: "When rebasing against an because ... because ...". It may be
clearer if you swap the order of the last two clauses to instead be:

  it must read the contents of every novel upstream commit, in addition to
  the tip of the upstream and the merge base, because "git rebase"
  attempts to exclude commits that are duplicates of upstream ones.

> This can be a significant performance hit, especially in a partial
> clone, wherein a read of an object may end up being a fetch.
>
> Add a flag to "git rebase" to allow suppression of this feature. This
> flag only works when using the "merge" backend.
>
> This flag changes the behavior of sequencer_make_script(), called from
> do_interactive_rebase() <- run_rebase_interactive() <-
> run_specific_rebase() <- cmd_rebase(). With this flag, limit_list()
> (indirectly called from sequencer_make_script() through
> prepare_revision_walk()) will no longer call cherry_pick_list(), and
> thus PATCHSAME is no longer set. Refraining from setting PATCHSAME both
> means that the intermediate commits in upstream are no longer read (as
> shown by the test) and means that no PATCHSAME-caused skipping of
> commits is done by sequencer_make_script(), either directly or through
> make_script_with_merges().

This all sounds good to me.
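
To make the effect concrete, here is a toy model of the deduplication step. It is only a sketch under stated assumptions: patch IDs are plain strings, struct toy_commit and mark_patchsame() are hypothetical names, and the real logic in Git's revision walking code is considerably more involved.

```c
#include <string.h>

struct toy_commit {
	const char *patch_id;	/* stand-in for a real patch ID */
	int patchsame;		/* set when an upstream commit has the same patch ID */
};

/*
 * Toy version of the deduplication step: when `cherry_mark` is
 * nonzero, compare every branch commit against every upstream commit
 * and mark duplicates.  Returns how many upstream commits had to be
 * examined, as a rough proxy for the objects that would be read.
 */
size_t mark_patchsame(struct toy_commit *branch, size_t nr_branch,
		      const struct toy_commit *upstream, size_t nr_upstream,
		      int cherry_mark)
{
	size_t i, j;

	for (i = 0; i < nr_branch; i++)
		branch[i].patchsame = 0;
	if (!cherry_mark)
		return 0;	/* nothing upstream needs to be read */

	for (i = 0; i < nr_branch; i++)
		for (j = 0; j < nr_upstream; j++)
			if (!strcmp(branch[i].patch_id, upstream[j].patch_id))
				branch[i].patchsame = 1;
	return nr_upstream;
}
```

With cherry_mark off, no branch commit is ever marked and no upstream commit is examined, which is the saving the commit message describes for partial clones.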

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> More improvements for partial clone, but this is a benefit for
> non-partial-clone as well, hence the way I wrote the commit message (not
> focusing too much on partial clone) and the documentation.
>
> I've chosen --skip-already-present and --no-skip-already-present to
> reuse the language already existing in the documentation and to avoid a
> double negative (e.g. --avoid-checking-if-already-present and
> --no-avoid-checking-if-already-present) but this causes some clumsiness
> in the documentation and in the code. Any suggestions for the name are
> welcome.
>
> I've only implemented this for the "merge" backend since I think that
> there is an effort to migrate "rebase" to use the "merge" backend by
> default, and also because "merge" uses diff internally which already has
> the (per-commit) blob batch prefetching.

This also makes sense to me.

> ---
>  Documentation/git-rebase.txt | 12 +++++-
>  builtin/rebase.c             | 10 ++++-
>  sequencer.c                  |  3 +-
>  sequencer.h                  |  2 +-
>  t/t3402-rebase-merge.sh      | 77 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 100 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> index 0c4f038dd6..f73a82b4a9 100644
> --- a/Documentation/git-rebase.txt
> +++ b/Documentation/git-rebase.txt
> @@ -318,6 +318,15 @@ See also INCOMPATIBLE OPTIONS below.
>  +
>  See also INCOMPATIBLE OPTIONS below.
>
> +--skip-already-present::
> +--no-skip-already-present::
> +	Skip commits that are already present in the new upstream.
> +	This is the default.

I believe that you mean '--skip-already-present' is the default, here,
but the placement makes it ambiguous, since it is in a paragraph with a
header that contains both the positive and negated version of this flag.

Maybe this could be changed to: 's/This/--skip-already-present/'.

> ++
> +If the skip-if-already-present feature is unnecessary or undesired,
> +`--no-skip-already-present` may improve performance since it avoids
> +the need to read the contents of every commit in the new upstream.
> +
>  --rerere-autoupdate::
>  --no-rerere-autoupdate::
>  	Allow the rerere mechanism to update the index with the
> @@ -866,7 +875,8 @@ Only works if the changes (patch IDs based on the diff contents) on
>  'subsystem' did.
>
>  In that case, the fix is easy because 'git rebase' knows to skip
> -changes that are already present in the new upstream.  So if you say
> +changes that are already present in the new upstream (unless
> +`--no-skip-already-present` is given). So if you say

Extremely minor nit: there is a whitespace change on this line where the
original has two spaces between the '.' and 'So', and the new version
has only one.

>  (assuming you're on 'topic')
>  ------------
>      $ git rebase subsystem
> diff --git a/builtin/rebase.c b/builtin/rebase.c
> index 6154ad8fa5..943211e5bb 100644
> --- a/builtin/rebase.c
> +++ b/builtin/rebase.c
> @@ -88,13 +88,15 @@ struct rebase_options {
>  	struct strbuf git_format_patch_opt;
>  	int reschedule_failed_exec;
>  	int use_legacy_rebase;
> +	int skip_already_present;
>  };
>
>  #define REBASE_OPTIONS_INIT {			  	\
>  		.type = REBASE_UNSPECIFIED,	  	\
>  		.flags = REBASE_NO_QUIET, 		\
>  		.git_am_opts = ARGV_ARRAY_INIT,		\
> -		.git_format_patch_opt = STRBUF_INIT	\
> +		.git_format_patch_opt = STRBUF_INIT,	\
> +		.skip_already_present =	1		\
>  	}
>
>  static struct replay_opts get_replay_opts(const struct rebase_options *opts)
> @@ -373,6 +375,7 @@ static int run_rebase_interactive(struct rebase_options *opts,
>  	flags |= opts->rebase_cousins > 0 ? TODO_LIST_REBASE_COUSINS : 0;
>  	flags |= opts->root_with_onto ? TODO_LIST_ROOT_WITH_ONTO : 0;
>  	flags |= command == ACTION_SHORTEN_OIDS ? TODO_LIST_SHORTEN_IDS : 0;
> +	flags |= opts->skip_already_present ? TODO_LIST_SKIP_ALREADY_PRESENT : 0;
>
>  	switch (command) {
>  	case ACTION_NONE: {
> @@ -1507,6 +1510,8 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>  		OPT_BOOL(0, "reschedule-failed-exec",
>  			 &reschedule_failed_exec,
>  			 N_("automatically re-schedule any `exec` that fails")),
> +		OPT_BOOL(0, "skip-already-present", &options.skip_already_present,
> +			 N_("skip changes that are already present in the new upstream")),

I scratched my head a little bit about why we weren't using OPT_BIT and
&flags directly here, but this matches the pattern in the surrounding
code, so I think that 'OPT_BOOL' with the target
'&options.skip_already_present' is fine here.

>  		OPT_END(),
>  	};
>  	int i;
> @@ -1840,6 +1845,9 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>  			      "interactive or merge options"));
>  	}
>
> +	if (!options.skip_already_present && !is_interactive(&options))
> +		die(_("--no-skip-already-present does not work with the 'am' backend"));
> +
>  	if (options.signoff) {
>  		if (options.type == REBASE_PRESERVE_MERGES)
>  			die("cannot combine '--signoff' with "
> diff --git a/sequencer.c b/sequencer.c
> index ba90a513b9..752580c017 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -4797,12 +4797,13 @@ int sequencer_make_script(struct repository *r, struct strbuf *out, int argc,
>  	int keep_empty = flags & TODO_LIST_KEEP_EMPTY;
>  	const char *insn = flags & TODO_LIST_ABBREVIATE_CMDS ? "p" : "pick";
>  	int rebase_merges = flags & TODO_LIST_REBASE_MERGES;
> +	int skip_already_present = !!(flags & TODO_LIST_SKIP_ALREADY_PRESENT);
>
>  	repo_init_revisions(r, &revs, NULL);
>  	revs.verbose_header = 1;
>  	if (!rebase_merges)
>  		revs.max_parents = 1;
> -	revs.cherry_mark = 1;
> +	revs.cherry_mark = skip_already_present;

:-). All of that plumbing just to poke at this variable. Looks good to
me.

>  	revs.limited = 1;
>  	revs.reverse = 1;
>  	revs.right_only = 1;
> diff --git a/sequencer.h b/sequencer.h
> index 393571e89a..39bb12f624 100644
> --- a/sequencer.h
> +++ b/sequencer.h
> @@ -149,7 +149,7 @@ int sequencer_remove_state(struct replay_opts *opts);
>   * `--onto`, we do not want to re-generate the root commits.
>   */
>  #define TODO_LIST_ROOT_WITH_ONTO (1U << 6)
> -
> +#define TODO_LIST_SKIP_ALREADY_PRESENT (1U << 7)

This was another spot that I thought could maybe be turned into an enum,
but it's clearly not the fault of your patch, and could easily be turned
into #leftoverbits.

>  int sequencer_make_script(struct repository *r, struct strbuf *out, int argc,
>  			  const char **argv, unsigned flags);
> diff --git a/t/t3402-rebase-merge.sh b/t/t3402-rebase-merge.sh
> index a1ec501a87..9b52739a10 100755
> --- a/t/t3402-rebase-merge.sh
> +++ b/t/t3402-rebase-merge.sh
> @@ -162,4 +162,81 @@ test_expect_success 'rebase --skip works with two conflicts in a row' '
>  	git rebase --skip
>  '
>
> +test_expect_success '--no-skip-already-present' '
> +	git init repo &&
> +
> +	# O(1-10) -- O(1-11) -- O(0-10) master
> +	#        \
> +	#         -- O(1-11) -- O(1-12) otherbranch
> +
> +	printf "Line %d\n" $(test_seq 1 10) >repo/file.txt &&
> +	git -C repo add file.txt &&
> +	git -C repo commit -m "base commit" &&
> +
> +	printf "Line %d\n" $(test_seq 1 11) >repo/file.txt &&
> +	git -C repo commit -a -m "add 11" &&
> +
> +	printf "Line %d\n" $(test_seq 0 10) >repo/file.txt &&
> +	git -C repo commit -a -m "add 0 delete 11" &&
> +
> +	git -C repo checkout -b otherbranch HEAD^^ &&
> +	printf "Line %d\n" $(test_seq 1 11) >repo/file.txt &&
> +	git -C repo commit -a -m "add 11 in another branch" &&
> +
> +	printf "Line %d\n" $(test_seq 1 12) >repo/file.txt &&
> +	git -C repo commit -a -m "add 12 in another branch" &&
> +
> +	# Regular rebase fails, because the 1-11 commit is deduplicated
> +	test_must_fail git -C repo rebase --merge master 2> err &&
> +	test_i18ngrep "error: could not apply.*add 12 in another branch" err &&
> +	git -C repo rebase --abort &&
> +
> +	# With --no-skip-already-present, it works
> +	git -C repo rebase --merge --no-skip-already-present master
> +'
> +
> +test_expect_success '--no-skip-already-present refrains from reading unneeded blobs' '
> +	git init server &&
> +
> +	# O(1-10) -- O(1-11) -- O(1-12) master
> +	#        \
> +	#         -- O(0-10) otherbranch
> +
> +	printf "Line %d\n" $(test_seq 1 10) >server/file.txt &&
> +	git -C server add file.txt &&
> +	git -C server commit -m "merge base" &&
> +
> +	printf "Line %d\n" $(test_seq 1 11) >server/file.txt &&
> +	git -C server commit -a -m "add 11" &&
> +
> +	printf "Line %d\n" $(test_seq 1 12) >server/file.txt &&
> +	git -C server commit -a -m "add 12" &&
> +
> +	git -C server checkout -b otherbranch HEAD^^ &&
> +	printf "Line %d\n" $(test_seq 0 10) >server/file.txt &&
> +	git -C server commit -a -m "add 0" &&
> +
> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +
> +	git clone --filter=blob:none "file://$(pwd)/server" client &&
> +	git -C client checkout origin/master &&
> +	git -C client checkout origin/otherbranch &&
> +
> +	# Sanity check to ensure that the blobs from the merge base and "add
> +	# 11" are missing
> +	git -C client rev-list --objects --all --missing=print >missing_list &&
> +	MERGE_BASE_BLOB=$(git -C server rev-parse master^^:file.txt) &&
> +	ADD_11_BLOB=$(git -C server rev-parse master^:file.txt) &&
> +	grep "\\?$MERGE_BASE_BLOB" missing_list &&
> +	grep "\\?$ADD_11_BLOB" missing_list &&
> +
> +	git -C client rebase --merge --no-skip-already-present origin/master &&
> +
> +	# The blob from the merge base had to be fetched, but not "add 11"
> +	git -C client rev-list --objects --all --missing=print >missing_list &&
> +	! grep "\\?$MERGE_BASE_BLOB" missing_list &&
> +	grep "\\?$ADD_11_BLOB" missing_list
> +'
> +
>  test_done
> --
> 2.25.1.481.gfbce0eb801-goog

The tests look good to me. Thanks for working on this!

Thanks,
Taylor

^ permalink raw reply	[relevance 5%]

* Re: [PATCH 0/7] New execute-commands hook for centralized workflow
  2020-03-04 20:39  4% ` Junio C Hamano
@ 2020-03-05 16:51  0%   ` Jiang Xin
  0 siblings, 0 replies; 200+ results
From: Jiang Xin @ 2020-03-05 16:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Jiang Xin

Junio C Hamano <gitster@pobox.com> wrote on Thu, Mar 5, 2020 at 4:39 AM:
>
> I do not claim to be great at naming, but you are worse ;-)

I totally agree that I am not good at naming, for example my daughter's name.

>  - Any hook is about executing command(s), so "execute-commands"
>    hook does not give any information to users.
>
>  - IIUC, this is only about what happens when accepting a push and
>    is not called at any other time.  Naming your hook without
>    "receive" anywhere in its name would mean other people won't be
>    able to add hook that "executes" commands upon cues other than
>    receiving a push.
>
> I can guess why you chose that name, because I know there is a
> function called execute_commands() in "git receive-pack", but that
> is not something you can expect your end users, who are not
> intimate to our codebase, to know.

Yes, it's better to name the hook "*-receive", because hook names
reflect the command they belong to, e.g. "commit-msg" is for `git commit`.

> > We can use the external "execute-commands" hook to create pull requests
> > or send emails.
>
> You can create pull requests or send emails out of the post-receive
> hook, so that is not a convincing justification why we want a new
> hook.

Another solution is using "pre-receive" + "post-receive" to handle a
push to "refs/for/master".  The "post-receive" hook is used to create
a pull request and delete the special reference "refs/for/master"
created between these two hooks.  But having a temporary reference
created is not safe for concurrent pushes.

> Now, I understand that Gerrit-style "notice a push to for/<target>,
> take over the whole operation that happens after receiving the pack
> data and do something entirely different, such as attempting to run
> a merge with refs/heads/<target> and update refs/heads/<target>
> instead, or fail the push if automerge fails" is not easy to arrange
> within the current "pre-receive" + "post-receive" framework (by the
> way, we should consider deprecating the "update" and
> "post-update" hooks, as the "*-receive" hooks were added to replace
> them, perhaps we should leave a #leftoverbits mark here).  And I
> think it is reasonable to add a new hook that takes over the whole
> flow in "git receive-pack" to do so.
>
> I just do not think "the execute-commands hook" is a good name for
> it.  Perhaps "divert-receive" (as it diverts large portion of what
> receive does) or something?  I dunno.

I suggest naming the hook "process-receive"; it is executed
between the other two "p*-receive" hooks, and there is no need to create a
special "pre-receive" for "process-receive".

> How do Gerrit folks deal with the "we pushed to the server, so let's
> pretend to have turned around and fetched from the same server
> immediately after doing so" client-side hack, by the way?

In the following example, I push a local commit to a special reference
(refs/for/master) of the remote Gerrit server.  The "report()"
function (if Gerrit has one) says a new reference "refs/for/master"
has been created.  But in fact no such reference is created in
the remote repository; Gerrit creates another reference instead,
such as "refs/changes/71/623871/1", for users to download the code
for review.  Because the local repository only has the normal
"remote.<name>.fetch" config variables for remote tracking, git
will not create a tracking reference for "refs/for/master".  Command
line tools, such as Android "repo" (or git-repo, its reimplementation in
Golang), create a special reference
(refs/published/<local/branch>) for tracking, and these tools are
responsible for branch tracking.

    $ git push --receive-pack="gerrit receive-pack" origin \
          refs/heads/master:refs/for/master
    Enumerating objects: 13, done.
    Counting objects: 100% (13/13), done.
    Delta compression using up to 8 threads
    Compressing objects: 100% (11/11), done.
    Writing objects: 100% (12/12), 1.34 KiB | 171.00 KiB/s, done.
    Total 12 (delta 2), reused 0 (delta 0), pack-reused 0
    remote: Resolving deltas: 100% (2/2)
    remote: Processing changes: refs: 1, new: 1, done
    remote:
    remote: SUCCESS
    remote:
    remote: New Changes:
    remote:   http://gerrit.example.com/c/my/repo/+/623889 Test commit
    To ssh://gerrit.example.com:29418/my/repo
     * [new branch]      master -> refs/for/master


> A vanilla "git push" on the client side does not know a push to
> refs/for/master would result in an update to refs/heads/master on
> the server side, and it would not know the result of the munging
> done on the server side (whether it is to rebase what is received on
> top of 'master' or to merge it to 'master') anyway, the

Neither Gerrit nor our AGit-Flow server will update the master branch.
Our AGit-Flow server will create a special reference (like GitHub's
"refs/pull/<number>/head") for reviewers to download commits.

> remote-tracking branch refs/remotes/origin/master on the client side
> would be left stale.  If we wanted to help them pretend to have
> fetched immediately after, I think we need to extend the protocol.
> Right now, after accepting "git push", the server end will say, for
> each proposed update for a ref, if the push updated successfully or
> not, but to support the "push to for/<target>, get heads/<target>
> updated" interaction, the reporting of the result (done in the
> report() function in builtin/receive-pack.c) needs to be able to say
> what ref (it may be a ref that "git push" did not think it pushed
> to) got updated to what value (it may be an object the client does
> not yet have---and we may have to actually turn around and fetch
> from them internally if we want to keep the illusion).

I have no idea now how to make a simple patch to give an accurate report.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH 0/7] New execute-commands hook for centralized workflow
  @ 2020-03-04 20:39  4% ` Junio C Hamano
  2020-03-05 16:51  0%   ` Jiang Xin
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2020-03-04 20:39 UTC (permalink / raw)
  To: Jiang Xin; +Cc: Git List, Jiang Xin

Jiang Xin <worldhello.net@gmail.com> writes:

> It would be more convenient to work in a centralized workflow like what
> Gerrit provided for some cases.  For example, a read-only user may run
> the following `git push` command to push commits to a special reference
> to create a code review, instead of updating a reference directly.
>
>     git push -o reviewers=user1,user2 \
>         -o oldoid=89c082363ac950d224a7259bfba3ccfbf4c560c4 \
>         origin \
>         HEAD:refs/for/<branch-name>/<session>
>
> The `<branch-name>` in the above example can be as simple as "master",
> or a more complicated branch name like "foo/bar".  The `<session>` in
> the above example command can be the local branch name on the client
> side, such as "my/topic".
>
> To support this kind of workflow in CGit, add a filter and a new
> handler.  The filter will check the prefix of the reference name, and
> if the command has a special reference name, the filter will add a
> specific tag (`exec_by_hook`) to the command.  Commands with this
> specific tag will be executed by a new handler (an external hook named
> "execute-commands") instead of the internal `execute_commands` function.

I do not claim to be great at naming, but you are worse ;-)

 - Any hook is about executing command(s), so "execute-commands"
   hook does not give any information to users.

 - IIUC, this is only about what happens when accepting a push and
   is not called at any other time.  Naming your hook without
   "receive" anywhere in its name would mean other people won't be
   able to add hook that "executes" commands upon cues other than
   receiving a push.

I can guess why you chose that name, because I know there is a
function called execute_commands() in "git receive-pack", but that
is not something you can expect your end users, who are not
intimate to our codebase, to know.

> We can use the external "execute-commands" hook to create pull requests
> or send emails.

You can create pull requests or send emails out of the post-receive
hook, so that is not a convincing justification why we want a new
hook.

Now, I understand that Gerrit-style "notice a push to for/<target>,
take over the whole operation that happens after receiving the pack
data and do something entirely different, such as attempting to run
a merge with refs/heads/<target> and update refs/heads/<target>
instead, or fail the push if automerge fails" is not easy to arrange
within the current "pre-receive" + "post-receive" framework (by the
way, we should consider deprecating the "update" and
"post-update" hooks, as the "*-receive" hooks were added to replace
them, perhaps we should leave a #leftoverbits mark here).  And I
think it is reasonable to add a new hook that takes over the whole
flow in "git receive-pack" to do so.

I just do not think "the execute-commands hook" is a good name for
it.  Perhaps "divert-receive" (as it diverts large portion of what
receive does) or something?  I dunno.

How do Gerrit folks deal with the "we pushed to the server, so let's
pretend to have turned around and fetched from the same server
immediately after doing so" client-side hack, by the way?  

A vanilla "git push" on the client side does not know a push to
refs/for/master would result in an update to refs/heads/master on
the server side, and it would not know the result of the munging
done on the server side (whether it is to rebase what is received on
top of 'master' or to merge it to 'master') anyway, the
remote-tracking branch refs/remotes/origin/master on the client side
would be left stale.  If we wanted to help them pretend to have
fetched immediately after, I think we need to extend the protocol.
Right now, after accepting "git push", the server end will say, for
each proposed update for a ref, if the push updated successfully or
not, but to support the "push to for/<target>, get heads/<target>
updated" interaction, the reporting of the result (done in the
report() function in builtin/receive-pack.c) needs to be able to say
what ref (it may be a ref that "git push" did not think it pushed
to) got updated to what value (it may be an object the client does
not yet have---and we may have to actually turn around and fetch
from them internally if we want to keep the illusion).

^ permalink raw reply	[relevance 4%]

* Re: [RFC][GSOC] Microproject Suggestion
  @ 2020-02-14  8:49  6%     ` Denton Liu
  0 siblings, 0 replies; 200+ results
From: Denton Liu @ 2020-02-14  8:49 UTC (permalink / raw)
  To: Robear Selwans; +Cc: Junio C Hamano, git

Hi Robear,

On Fri, Feb 14, 2020 at 09:29:33AM +0200, Robear Selwans wrote:
> That was actually the only idea that I could come up with till now. I
> am open to suggestions, though. I don't really mind if it is too big,
> as I am also interested in contributing to git.

Even though Git doesn't have an official issue tracker, one good place to
look is GitGitGadget's issue tracker [1]. It's pretty well-curated and has a lot
of tiny cleanup issues that you could get started on.

Another thing you could do to find inspiration is to search for
#leftoverbits on your Git mailing list archive of choice (most people
seem to use lore.kernel.org/git).

Hope that helps,

Denton

[1]: https://github.com/gitgitgadget/git/issues

^ permalink raw reply	[relevance 6%]

* Re: [PATCH] commit: replace rebase/sequence booleans with single pick_state enum
  @ 2020-01-20 17:09  5%     ` Phillip Wood
  0 siblings, 0 replies; 200+ results
From: Phillip Wood @ 2020-01-20 17:09 UTC (permalink / raw)
  To: Ben Curtis, git
  Cc: Derrick Stolee, phillip.wood, Ben Curtis via GitGitGadget,
	Johannes Schindelin

Hi Ben

[Cc'ing dscho as it relates to issue management on gitgitgadget]

On 18/01/2020 16:34, Ben Curtis wrote:
> On Fri, 2020-01-17 at 20:01 +0000, Phillip Wood wrote:
>> Hi Ben
>>
>> On 17/01/2020 13:45, Ben Curtis via GitGitGadget wrote:
>>> From: Ben Curtis <nospam@nowsci.com>
>>>
>>> In 116a408,
>>
>> That commit is no longer in pu, it has been replaced by 430b75f720
>> ("commit: give correct advice for empty commit during a rebase",
>> 2019-12-06). There is now a preparatory commit 8d57f75749 ("commit:
>> use
>> enum value for multiple cherry-picks", 2019-12-06) which replaces
>> the
>> booleans with an enum. I need to reroll the series
>> (pw/advise-rebase-skip) that contains them, if you've got any
>> comments
>> please let me know.
>>
>> Best Wishes
>>
>> Phillip
>>
> 
> Hi Phillip,
> 
> Thank you for the feedback, I assume that means my patch is no longer
> required?

Unfortunately yes

> Also, is there a formal issue assignment method with `git`? I hopped on
> this particular issue on GitGitGadget to get my feet wet here but was
> not sure if there was a separate maintained list to track overlap like
> the above.

Unfortunately there is no formal issue management. There is the mailing 
list which is where patches are picked up but it does not provide any 
issue management. In practice when an issue is reported on the list 
there is either a fix posted relatively quickly or someone notes it down 
somewhere and may work on it later. There is a convention of adding 
#leftoverbits to an email in the hope that someone will search for that 
and find things but I've never seen someone reference that when 
submitting a new patch and if the fix comes in a different email thread 
then there's no way to see that the issue has been fixed.

There is gitgitgadget's list of issues but not everyone uses it (and it's
only triaged by a small subset of people so there's no guarantee that a
feature requested there will be accepted once it gets submitted to the
mailing list). If someone posts directly to the mailing list then they
probably won't see that there is an issue open there. Further confusion
is provided by https://bugs.chromium.org/p/git/issues/list which has a
different list of issues.

The best thing would be to check the history of the 'pu' branch before 
starting work on an issue to see if it has already been fixed.

I'm really sorry that you've had a bad experience; the idea of the
gitgitgadget issue tracker is to make it easier for new contributors,
not to waste their time. I hope it won't put you off making another
contribution.

Best Wishes

Phillip

> Thanks!
> Ben
> 

^ permalink raw reply	[relevance 5%]

* Re: [PATCH v2 1/1] gpg-interface: add minTrustLevel as a configuration option
  @ 2019-12-27 22:21  6%         ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2019-12-27 22:21 UTC (permalink / raw)
  To: Hans Jerry Illikainen; +Cc: git

Hans Jerry Illikainen <hji@dyntopia.com> writes:

>> I wonder if the code becomes less misleading if we either (1)
>> renamed 'next' to a name that hints more strongly that it is not the
>> 'next' line but the end of the current token we are interested in,
>> or (2) get rid of the pointer and instead counted size of the
>> current token we are interested in, or perhaps both?  
>
> Yeah the name 'next' does seem a bit counter-intuitive when used in
> relation to 'line'.  Looking through the function it seems that both (1)
> and (2) would work.

Thanks for thinking about the code a bit more than necessary for the
purpose of this topic.  Let's leave such a clean-up outside the
scope of this topic, but perhaps a #leftoverbits marker may help us
remember it as something we could do when we have nothing else
better to do ;-)


^ permalink raw reply	[relevance 6%]

* [PATCH 0/2] dl/format-patch-notes-config-fixup: clean up some leftoverbits
@ 2019-12-12  0:49  8% Denton Liu
  0 siblings, 0 replies; 200+ results
From: Denton Liu @ 2019-12-12  0:49 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Elijah Newren, Eric Sunshine, Philip Oakley

This series gives 'dl/format-patch-notes-config-fixup' a few polishing
touches. First of all, we document the behaviour of multiple
`format.notes` configuration variables so that end-users are aware of
the change.

Also, Eric Sunshine suggested some cleanup in the previous round, like
breaking the monolithic set_display_notes() into multiple smaller
functions and not using the return value of the function to assign to
`show_notes`.

Denton Liu (2):
  config/format.txt: clarify behavior of multiple format.notes
  notes: break set_display_notes() into smaller functions

 Documentation/config/format.txt | 18 +++++++++++++-
 builtin/log.c                   |  7 +++++-
 notes.c                         | 43 ++++++++++++++++++---------------
 notes.h                         | 19 +++++++++------
 revision.c                      |  6 ++---
 revision.h                      |  2 +-
 6 files changed, 62 insertions(+), 33 deletions(-)

-- 
2.24.0.627.geba02921db

^ permalink raw reply	[relevance 8%]

* Re: [PATCH v3 1/2] sequencer: move check_todo_list_from_file() to rebase-interactive.c
  @ 2019-12-06 14:38  6%     ` Johannes Schindelin
  0 siblings, 0 replies; 200+ results
From: Johannes Schindelin @ 2019-12-06 14:38 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Phillip Wood, Junio C Hamano

Hi Alban,

On Tue, 3 Dec 2019, Alban Gruin wrote:

> The message contained in `edit_todo_list_advice' (sequencer.c) is
> printed after the initial edit of the todo list if it can't be parsed or
> if commits were dropped.  This is done either in complete_action() for
> `rebase -i', or in check_todo_list_from_file() for `rebase -p'.
>
> Since we want to add this check when editing the list, we also want to
> use this message from edit_todo_list() (rebase-interactive.c).  To this
> end, check_todo_list_from_file() is moved to rebase-interactive.c, and
> `edit_todo_list_advice' is copied there.  In the next commit,
> complete_action() will stop using it, and `edit_todo_list_advice' will
> be removed from sequencer.c.

Makes sense to me.

> diff --git a/rebase-interactive.c b/rebase-interactive.c
> index aa18ae82b7..ad5dd49c31 100644
> --- a/rebase-interactive.c
> +++ b/rebase-interactive.c
> @@ -187,3 +193,32 @@ int todo_list_check(struct todo_list *old_todo, struct todo_list *new_todo)
>  	clear_commit_seen(&commit_seen);
>  	return res;
>  }
> +
> +int check_todo_list_from_file(struct repository *r)
> +{
> +	struct todo_list old_todo = TODO_LIST_INIT, new_todo = TODO_LIST_INIT;
> +	int res = 0;
> +
> +	if (strbuf_read_file(&new_todo.buf, rebase_path_todo(), 0) < 0) {
> +		res = error(_("could not read '%s'."), rebase_path_todo());
> +		goto out;
> +	}
> +
> +	if (strbuf_read_file(&old_todo.buf, rebase_path_todo_backup(), 0) < 0) {
> +		res = error(_("could not read '%s'."), rebase_path_todo_backup());
> +		goto out;
> +	}
> +
> +	res = todo_list_parse_insn_buffer(r, old_todo.buf.buf, &old_todo);
> +	if (!res)
> +		res = todo_list_parse_insn_buffer(r, new_todo.buf.buf, &new_todo);
> +	if (!res)
> +		res = todo_list_check(&old_todo, &new_todo);
> +	if (res)
> +		fprintf(stderr, _(edit_todo_list_advice));
> +out:
> +	todo_list_release(&old_todo);
> +	todo_list_release(&new_todo);
> +
> +	return res;
> +}

No need to address the following concern in this patch series, but I do
think that a #leftoverbits project could be to simplify this to

	if (strbuf_read_file(&new_todo.buf, rebase_path_todo(), 0) < 0)
		res = error(_("could not read '%s'."), rebase_path_todo());
	else if (strbuf_read_file(&old_todo.buf, rebase_path_todo_backup(), 0) < 0)
		res = error(_("could not read '%s'."), rebase_path_todo_backup());
	else if ((res = todo_list_parse_insn_buffer(r, old_todo.buf.buf, &old_todo)) ||
		 (res = todo_list_parse_insn_buffer(r, new_todo.buf.buf, &new_todo)) ||
		 (res = todo_list_check(&old_todo, &new_todo)))
		fprintf(stderr, _(edit_todo_list_advice));

Ciao,
Dscho

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v3 22/22] t7700: stop losing return codes of git commands
  2019-11-25 23:57  0%       ` Denton Liu
@ 2019-11-26  0:58  0%         ` Eric Sunshine
  0 siblings, 0 replies; 200+ results
From: Eric Sunshine @ 2019-11-26  0:58 UTC (permalink / raw)
  To: Denton Liu; +Cc: Junio C Hamano, Git Mailing List, Jeff King

On Mon, Nov 25, 2019 at 6:57 PM Denton Liu <liu.denton@gmail.com> wrote:
> On Sat, Nov 23, 2019 at 10:49:44AM +0900, Junio C Hamano wrote:
> > Denton Liu <liu.denton@gmail.com> writes:
> > > -   objsha1=$(git verify-pack -v pack-$packsha1.idx | head -n 1 |
> > > -           sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") &&
> > > +   git verify-pack -v pack-$packsha1.idx >packlist &&
> > > +   objsha1=$(head -n 1 packlist | sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") &&
> >
> > We probably should lose reference to SHA-1 and use $OID_REGEX; this
> > is obviously a #leftoverbits material that is outside the scope of
> > this series.
>
> Since the theme of this series is test cleanup, I believe that it's
> probably appropriate to roll these changes (and the ones below that I
> omitted) into the current series. Since it isn't too much work, I'll
> send them out in my next reroll.

It may not be too much work for you to keep adding more (unrelated)
changes to a series, but doing so increases the burden on reviewers
unnecessarily, especially for a long patch series such as this one.
Generally speaking, each iteration should help the series converge to
the point at which it can finally land (be merged to "next"). Thus,
ideally, each iteration should have fewer changes than the previous
one.

When you add entirely new changes which are not directly related to
the changes which begat the series, that iteration diverges (not
converges). It creates extra work for reviewers (who are trying to
help you land the series) and makes it less likely that people will
want to review each new iteration since a series which diverges with
each iteration makes the goal of landing the series a moving target
(thus, represents never-ending review work).

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v3 22/22] t7700: stop losing return codes of git commands
  2019-11-23  1:49  6%     ` Junio C Hamano
@ 2019-11-25 23:57  0%       ` Denton Liu
  2019-11-26  0:58  0%         ` Eric Sunshine
  0 siblings, 1 reply; 200+ results
From: Denton Liu @ 2019-11-25 23:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Eric Sunshine, Jeff King

Hi Junio,

On Sat, Nov 23, 2019 at 10:49:44AM +0900, Junio C Hamano wrote:
> Denton Liu <liu.denton@gmail.com> writes:
> 
> > -	objsha1=$(git verify-pack -v pack-$packsha1.idx | head -n 1 |
> > -		sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") &&
> > +	git verify-pack -v pack-$packsha1.idx >packlist &&
> > +	objsha1=$(head -n 1 packlist | sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") &&
> 
> We probably should lose reference to SHA-1 and use $OID_REGEX; this
> is obviously a #leftoverbits material that is outside the scope of
> this series.

Since the theme of this series is test cleanup, I believe that it's
probably appropriate to roll these changes (and the ones below that I
omitted) into the current series. Since it isn't too much work, I'll
send them out in my next reroll.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v3 22/22] t7700: stop losing return codes of git commands
  @ 2019-11-23  1:49  6%     ` Junio C Hamano
  2019-11-25 23:57  0%       ` Denton Liu
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2019-11-23  1:49 UTC (permalink / raw)
  To: Denton Liu; +Cc: Git Mailing List, Eric Sunshine, Jeff King

Denton Liu <liu.denton@gmail.com> writes:

> -	objsha1=$(git verify-pack -v pack-$packsha1.idx | head -n 1 |
> -		sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") &&
> +	git verify-pack -v pack-$packsha1.idx >packlist &&
> +	objsha1=$(head -n 1 packlist | sed -e "s/^\([0-9a-f]\{40\}\).*/\1/") &&

We probably should lose reference to SHA-1 and use $OID_REGEX; this
is obviously a #leftoverbits material that is outside the scope of
this series.
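
A rough sketch of what that substitution could look like, runnable
outside the test suite.  The OID_REGEX built here is only a stand-in,
assumed to match 40 hex digits; inside git's tests the variable is
provided by t/test-lib.sh.  The sample line is made up to resemble
"git verify-pack -v" output.

```shell
# Stand-in for the $OID_REGEX that t/test-lib.sh provides (assumption:
# a basic regex matching one full hex object name, 40 digits here).
hex='[0-9a-f]'
OID_REGEX=
i=0
while test $i -lt 40
do
	OID_REGEX="$OID_REGEX$hex"
	i=$((i + 1))
done

# Made-up line in the shape of "git verify-pack -v" output
line='0123456789abcdef0123456789abcdef01234567 blob   12 23 34'

# Same sed expression as in the test, with the hard-coded
# [0-9a-f]\{40\} replaced by the variable
oid=$(printf '%s\n' "$line" | sed -e "s/^\($OID_REGEX\).*/\1/")
echo "$oid"
```

The point of going through the variable is that the test no longer
bakes in the length of a SHA-1 name.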

> @@ -91,7 +93,8 @@ test_expect_success 'loose objects in alternate ODB are not repacked' '
>  	git prune-packed &&
>  	for p in .git/objects/pack/*.idx
>  	do
> -		if git verify-pack -v $p | egrep "^$objsha1"
> +		git verify-pack -v $p >packlist || return $?
> +		if egrep "^$objsha1" packlist
>  		then
>  			found_duplicate_object=1
>  			echo "DUPLICATE OBJECT FOUND"

These egrep that try to match lines that begin with an object name
can be a simple grep instead (again, outside the scope of this
series).

> @@ -109,15 +112,18 @@ test_expect_success 'packed obs in alt ODB are repacked even when local repo is
>  	test_path_is_file "$myidx" &&
>  	for p in alt_objects/pack/*.idx
>  	do
> -		git verify-pack -v $p | sed -n -e "/^[0-9a-f]\{40\}/p"
> -	done | while read sha1 rest
> +		git verify-pack -v $p >packlist || return $?
> +		sed -n -e "/^[0-9a-f]\{40\}/p" packlist
> +	done >packs &&

A misleading filename?  The lines in this file are not pack files;
rather the file has a list of objects in various packs.

> +	git verify-pack -v $myidx >mypacklist &&
> +	while read sha1 rest
>  	do
> -		if ! ( git verify-pack -v $myidx | grep "^$sha1" )
> +		if ! grep "^$sha1" mypacklist
>  		then
>  			echo "Missing object in local pack: $sha1"
>  			return 1
>  		fi
> -	done
> +	done <packs
>  '

Again outside the scope of this series, but this looks O(n^2)
to me.

If I were writing this today, I would prepare a sorted list of all
object names (and nothing else on each line) in alt_objects/pack/ in
one file (call it 'orig'), and prepare another file with a sorted
list of all object names described in $myidx (call it 'dest'), and
then run "comm -23 orig dest" and see if there is anything that is
unique in the 'orig' file (i.e. something in 'orig' is missing from
'dest').
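
A minimal sketch of that comparison, using made-up four-character
names in place of real object names so that it runs on its own.  The
file names 'orig' and 'dest' follow the description above; the
temporary directory and the sample data are assumptions for
illustration only.

```shell
# Stand-in data: one object name per line.  In the real test these
# would be extracted from "git verify-pack -v" output.
tmp=$(mktemp -d)
printf '%s\n' cccc aaaa bbbb >"$tmp/orig.raw"
printf '%s\n' bbbb aaaa dddd >"$tmp/dest.raw"

# comm needs both inputs sorted in the same collation order
LC_ALL=C sort "$tmp/orig.raw" >"$tmp/orig"
LC_ALL=C sort "$tmp/dest.raw" >"$tmp/dest"

# -2 drops lines unique to 'dest', -3 drops lines common to both,
# leaving only the names that appear in 'orig' but not in 'dest'
missing=$(LC_ALL=C comm -23 "$tmp/orig" "$tmp/dest")
if test -n "$missing"
then
	echo "Missing object in local pack: $missing"
fi
rm -rf "$tmp"
```

Each input is read once after sorting, instead of running one grep
over the whole list per object.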

> @@ -132,15 +138,18 @@ test_expect_success 'packed obs in alt ODB are repacked when local repo has pack
>  	test_path_is_file "$myidx" &&
>  	for p in alt_objects/pack/*.idx
>  	do
> -		git verify-pack -v $p | sed -n -e "/^[0-9a-f]\{40\}/p"
> -	done | while read sha1 rest
> +		git verify-pack -v $p >packlist || return $?
> +		sed -n -e "/^[0-9a-f]\{40\}/p" packlist
> +	done >packs &&
> +	git verify-pack -v $myidx >mypacklist &&
> +	while read sha1 rest
>  	do
> -		if ! ( git verify-pack -v $myidx | grep "^$sha1" )
> +		if ! grep "^$sha1" mypacklist
>  		then
>  			echo "Missing object in local pack: $sha1"
>  			return 1
>  		fi
> -	done
> +	done <packs
>  '

Likewise.

> @@ -160,15 +169,18 @@ test_expect_success 'packed obs in alternate ODB kept pack are repacked' '
>  	test_path_is_file "$myidx" &&
>  	for p in alt_objects/pack/*.idx
>  	do
> -		git verify-pack -v $p | sed -n -e "/^[0-9a-f]\{40\}/p"
> -	done | while read sha1 rest
> +		git verify-pack -v $p >packlist || return $?
> +		sed -n -e "/^[0-9a-f]\{40\}/p" packlist
> +	done >packs &&
> +	git verify-pack -v $myidx >mypacklist &&
> +	while read sha1 rest
>  	do
> -		if ! ( git verify-pack -v $myidx | grep "^$sha1" )
> +		if ! grep "^$sha1" mypacklist
>  		then
>  			echo "Missing object in local pack: $sha1"
>  			return 1
>  		fi
> -	done
> +	done <packs
>  '

Likewise.


^ permalink raw reply	[relevance 6%]

* Re: Git in Outreachy December 2019?
  2019-09-23 18:07  0%   ` SZEDER Gábor
@ 2019-09-26 11:42  6%     ` Johannes Schindelin
  0 siblings, 0 replies; 200+ results
From: Johannes Schindelin @ 2019-09-26 11:42 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Jeff King, git, Olga Telezhnaya, Christian Couder, Elijah Newren,
	Thomas Gummerer, Matheus Tavares Bernardino

Hi,

On Mon, 23 Sep 2019, SZEDER Gábor wrote:

> On Wed, Sep 04, 2019 at 03:41:15PM -0400, Jeff King wrote:
> > The project page has a section to point people in the right direction
> > for first-time contributions. I've left it blank for now, but I think it
> > makes sense to point one (or both) of:
> >
> >   - https://git-scm.com/docs/MyFirstContribution
> >
> >   - https://matheustavares.gitlab.io/posts/first-steps-contributing-to-git
> >
> > as well as a list of micro-projects (or at least instructions on how to
> > find #leftoverbits, though we'd definitely have to step up our labeling,
> > as I do not recall having seen one for a while).
>
> And we should make sure that all microprojects are indeed micro in
> size.  Matheus sent v8 of a 10 patch series in July that started out
> as a microproject back in February...

Indeed.

> Here is one more idea for microprojects:
>
>   Find a group of related preprocessor constants and turn them into an
>   enum.  Also find where those constants are stored in variables and
>   in structs and passed around as function parameters, and change the
>   type of those variables, fields and parameters to the new enum.

I agree that this is a good suggestion, and turned this #leftoverbits
into https://github.com/gitgitgadget/git/issues/357.

Ciao,
Dscho

^ permalink raw reply	[relevance 6%]

* Re: Git in Outreachy December 2019?
  2019-09-04 19:41  6% ` Jeff King
  2019-09-08 14:56  0%   ` Pratyush Yadav
@ 2019-09-23 18:07  0%   ` SZEDER Gábor
  2019-09-26 11:42  6%     ` Johannes Schindelin
  1 sibling, 1 reply; 200+ results
From: SZEDER Gábor @ 2019-09-23 18:07 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Olga Telezhnaya, Christian Couder, Elijah Newren,
	Thomas Gummerer, Matheus Tavares Bernardino

On Wed, Sep 04, 2019 at 03:41:15PM -0400, Jeff King wrote:
> The project page has a section to point people in the right direction
> for first-time contributions. I've left it blank for now, but I think it
> makes sense to point one (or both) of:
> 
>   - https://git-scm.com/docs/MyFirstContribution
> 
>   - https://matheustavares.gitlab.io/posts/first-steps-contributing-to-git
> 
> as well as a list of micro-projects (or at least instructions on how to
> find #leftoverbits, though we'd definitely have to step up our labeling,
> as I do not recall having seen one for a while).

And we should make sure that all microprojects are indeed micro in
size.  Matheus sent v8 of a 10 patch series in July that started out
as a microproject back in February...

Here is one more idea for microprojects:

  Find a group of related preprocessor constants and turn them into an
  enum.  Also find where those constants are stored in variables and
  in structs and passed around as function parameters, and change the
  type of those variables, fields and parameters to the new enum.

^ permalink raw reply	[relevance 0%]

* Re: Git in Outreachy December 2019?
  2019-09-04 19:41  6% ` Jeff King
@ 2019-09-08 14:56  0%   ` Pratyush Yadav
  2019-09-23 18:07  0%   ` SZEDER Gábor
  1 sibling, 0 replies; 200+ results
From: Pratyush Yadav @ 2019-09-08 14:56 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Hi Jeff,

On 04/09/19 03:41PM, Jeff King wrote:
[snip]
> The project page has a section to point people in the right direction
> for first-time contributions. I've left it blank for now, but I think it
> makes sense to point one (or both) of:
> 
>   - https://git-scm.com/docs/MyFirstContribution
> 
>   - https://matheustavares.gitlab.io/posts/first-steps-contributing-to-git
> 
> as well as a list of micro-projects (or at least instructions on how to
> find #leftoverbits, though we'd definitely have to step up our labeling,
> as I do not recall having seen one for a while).

I'd like to put out a proposal regarding first contributions and micro 
projects.

I have a small list of small isolated features and bug fixes that
_I think_ git-gui would benefit from. And other people using it can
probably add their pet peeves and issues as well. My question is, are 
these something new contributors should try to work on as an 
introduction to the community? Since most of these features and fixes 
are small and isolated, they should be pretty easy to work on. And I 
think people generally find UI apps a little easier to work on.

But I'll play the devil's advocate on my proposal and point out some 
problems/flaws:
- Git-gui is written in Tcl, and git in C (and other languages too, but 
  not Tcl). That means while people do get a feel of the community and 
  general workflow, they don't necessarily get a feel of the actual git 
  internal codebase.
- Since I don't see a git-gui related project worth putting into the
  Outreachy program, it essentially means they will likely not work on 
  anything related to their project.
- Git-gui is essentially a wrapper on top of git, so people won't get 
  exposure to the git internals.

I'd like to hear your and the rest of the community's thoughts about 
this proposal, and whether it will be a good idea or not.

If people do like this idea, I can do a write up on "things to fix in 
git-gui" that people can add to (and they get a chance to call me stupid 
for even thinking feature X is a good idea ;)).

-- 
Regards,
Pratyush Yadav

^ permalink raw reply	[relevance 0%]

* Re: [RFC PATCH v2 12/12] clean: fix theoretical path corruption
  @ 2019-09-07  0:34  6%       ` Elijah Newren
  0 siblings, 0 replies; 200+ results
From: Elijah Newren @ 2019-09-07  0:34 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Git Mailing List, Jeff King, Rafael Ascensão, Samuel Lijin

On Thu, Sep 5, 2019 at 12:27 PM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Thu, Sep 05, 2019 at 08:47:35AM -0700, Elijah Newren wrote:
> > cmd_clean() had the following code structure:
> >
> >     struct strbuf abs_path = STRBUF_INIT;
> >     for_each_string_list_item(item, &del_list) {
> >         strbuf_addstr(&abs_path, prefix);
> >         strbuf_addstr(&abs_path, item->string);
> >         PROCESS(&abs_path);
> >         strbuf_reset(&abs_path);
> >     }
> >
> > where I've elided a bunch of unnecessary details and PROCESS(&abs_path)
> > represents a big chunk of code rather than an actual function call.  One
> > piece of PROCESS was:
> >
> >     if (lstat(abs_path.buf, &st))
> >         continue;
> >
> > which would cause the strbuf_reset() to be missed -- meaning that the
> > next path to be handled would have two paths concatenated.  This path
> > used to use die_errno() instead of continue prior to commit 396049e5fb62
> > ("git-clean: refactor git-clean into two phases", 2013-06-25), but my
> > understanding of how correct_untracked_entries() works is that it will
> > prevent both dir/ and dir/file from being in the list to clean so this
> > should be dead code and the die_errno() should be safe.  But I hesitate
> > to remove it since I am not certain.  Instead, just fix it to avoid path
> > corruption in case it is possible to reach this continue statement.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  builtin/clean.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/builtin/clean.c b/builtin/clean.c
> > index 6030842f3a..ccb6e23f0b 100644
> > --- a/builtin/clean.c
> > +++ b/builtin/clean.c
> > @@ -1028,8 +1028,10 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
> >                * recursive directory removal, so lstat() here could
> >                * fail with ENOENT.
> >                */
> > -             if (lstat(abs_path.buf, &st))
> > +             if (lstat(abs_path.buf, &st)) {
> > +                     strbuf_reset(&abs_path);
> >                       continue;
> > +             }
>
> I wonder whether it would be safer to call strbuf_reset() at the start
> of each loop iteration instead of before 'continue'.  That way we
> wouldn't have to worry about another 'continue' statements forgetting
> about it.
>
> It probably doesn't really matter in this particular case (considering
> that it's potentially dead code to begin with), but have a look at
> e.g. diff.c:show_stats() and its several strbuf_reset(&out) calls
> preceding continue statements.

Ooh, I like that idea.  I think I'll apply that here.  I'll probably
leave diff.c:show_stats() as #leftoverbits for someone else, though I
really like the idea of fixing up other issues like this as you
suggest.

^ permalink raw reply	[relevance 6%]

* Re: [RFC PATCH 0/1] commit-graph.c: handle corrupt commit trees
  2019-09-04 21:21  6%   ` Taylor Blau
  2019-09-05  6:08  0%     ` Jeff King
@ 2019-09-06 16:48  0%     ` Derrick Stolee
  1 sibling, 0 replies; 200+ results
From: Derrick Stolee @ 2019-09-06 16:48 UTC (permalink / raw)
  To: Taylor Blau, Garima Singh; +Cc: git, peff

On 9/4/2019 5:21 PM, Taylor Blau wrote:
> Hi Garima,
> 
> On Wed, Sep 04, 2019 at 02:25:55PM -0400, Garima Singh wrote:
>>
>> On 9/3/2019 10:22 PM, Taylor Blau wrote:
>>> Hi,
>>>
>>> I was running some of the new 'git commit-graph' commands, and noticed
>>> that I could consistently get 'git commit-graph write --reachable' to
>>> segfault when a commit's root tree is corrupt.
>>>
>>> I have an extremely-unfinished fix attached as an RFC PATCH below, but I
>>> wanted to get a few thoughts on this before sending it out as a non-RFC.
>>>
>>> In my patch, I simply 'die()' when a commit isn't able to be parsed
>>> (i.e., when 'parse_commit_no_graph' returns a non-zero code), but I
>>> wanted to see if others thought that this was an OK approach. Some
>>> thoughts:
>>
>> I like the idea of completely bailing if the commit can't be parsed too.
>> Only question: Is there a reason you chose to die() instead of BUG() like
>> the other two places in that function? What is the criteria of choosing one
>> over the other?
> 
> I did not call 'BUG' here because 'BUG' is traditionally used to
> indicate an internal bug, e.g., an unexpected state or some such. On the
> other side of that coin, 'BUG' is _not_ used to indicate repository
> corruption, since that is not an issue in the Git codebase, rather in
> the user's repository.
> 
> Though, to be honest, I've never seen that rule written out explicitly
> (maybe if it were to be written somewhere, it could be stored in
> Documentation/CodingGuidelines?). I think that this is some good
> #leftoverbits material.
> 
>>>
>>>    * It seems like we could write a commit-graph by placing a "filler"
>>>      entry where the broken commit would have gone. I don't see any place
>>>      where this is implemented currently, but this seems like a viable
>>>      alternative to not writing _any_ commits into the commit-graph.
>>
>> I would rather we didn't do this cause it will probably kick open the can of
>> always watching for that filler when we are working with the commit-graph.
>> Or do we already do that today? Maybe @stolee can chime in on what we do in
>> cases of shallow clones and other potential gaps in the walk
> 
> Yeah, I think that the consensus is that it makes sense to just die
> here, which is fine by me.

I agree the die() is the best thing to do for now.

If we wanted to salvage as much as possible, then we could use these
corrupt marks and then use the "reverse walk" in compute_generation_numbers()
to mark all commits that can reach the corrupt commit as corrupt.
We would then need to remove all corrupt commits from the list we are
planning to write.

However, that is just hiding a corrupt object in the object database,
which is not a situation we want to leave unnoticed.

Thanks,
-Stolee

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] t: use common $SQ variable
  2019-09-05 22:10  1%         ` [PATCH] t: use common $SQ variable Denton Liu
  2019-09-05 22:25  6%           ` Taylor Blau
@ 2019-09-06  2:04  0%           ` Denton Liu
  1 sibling, 0 replies; 200+ results
From: Denton Liu @ 2019-09-06  2:04 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Jeff King, Taylor Blau

I should've noted earlier that this patch applies cleanly on top of the
"jc/tests-use-lf-from-test-lib" branch.

On Thu, Sep 05, 2019 at 03:10:05PM -0700, Denton Liu wrote:
> In many test scripts, there are bespoke definitions of the single quote
> that are some variation of this:
> 
>     SQ="'"
> 
> Define a common $SQ variable in test-lib.sh and replace all usages of
> these bespoke variables with the common one.
> 
> This change was done by running `git grep =\"\'\" t/` and
> `git grep =\\\\\'` and manually changing the resulting definitions and

Oops, this invocation wasn't exactly correct; it's missing the `t/` at
the end. The full invocation should read `git grep =\\\\\' t/`.
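
For anyone puzzling over the backslashes, here is a small sketch of
what the two patterns turn into once the shell has removed one level
of quoting.  Plain grep stands in for "git grep" (both default to
basic regular expressions), and the sample file is made up rather
than taken from t/.

```shell
# What the command receives after the shell strips the quoting
pat1=$(printf '%s' =\"\'\")
pat2=$(printf '%s' =\\\\\')
printf '%s\n' "$pat1" "$pat2"

# Made-up stand-ins for the two styles of definition
tmp=$(mktemp)
cat >"$tmp" <<\EOF
SQ="'"
sq=\'
other=1
EOF

# The first pattern finds the double-quoted form; the second, where
# the regex \\ means one literal backslash, finds the escaped form
count1=$(grep -c -e "$pat1" "$tmp")
count2=$(grep -c -e "$pat2" "$tmp")
echo "$count1 $count2"
rm -f "$tmp"
```

Each pattern matches exactly one of the three sample lines, which is
why both invocations were needed to find every definition.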

> corresponding usages.
> 
> Signed-off-by: Denton Liu <liu.denton@gmail.com>
> ---
> 
> [whoops, forgot to include the mailing list in the last email]
> 
> Sorry, I wrote this patch up before I saw the email about leaving this
> as #leftoverbits. No point in letting it go to waste, though.
> 
>  t/t1300-config.sh              |  9 +++--
>  t/t1404-update-ref-errors.sh   | 64 ++++++++++++++++------------------
>  t/t1414-reflog-walk.sh         |  3 +-
>  t/t1506-rev-parse-diagnosis.sh |  5 ++-
>  t/t1507-rev-parse-upstream.sh  | 12 +++----
>  t/t3005-ls-files-relative.sh   |  9 +++--
>  t/t3404-rebase-interactive.sh  |  1 -
>  t/t3430-rebase-merges.sh       |  1 -
>  t/t5601-clone.sh               |  1 -
>  t/t7406-submodule-update.sh    |  3 +-
>  t/test-lib.sh                  |  3 ++
>  11 files changed, 51 insertions(+), 60 deletions(-)
> 
> diff --git a/t/t1300-config.sh b/t/t1300-config.sh
> index 428177c390..983a0a1583 100755
> --- a/t/t1300-config.sh
> +++ b/t/t1300-config.sh
> @@ -1294,26 +1294,25 @@ test_expect_success 'git -c is not confused by empty environment' '
>  	GIT_CONFIG_PARAMETERS="" git -c x.one=1 config --list
>  '
>  
> -sq="'"
>  test_expect_success 'detect bogus GIT_CONFIG_PARAMETERS' '
>  	cat >expect <<-\EOF &&
>  	env.one one
>  	env.two two
>  	EOF
> -	GIT_CONFIG_PARAMETERS="${sq}env.one=one${sq} ${sq}env.two=two${sq}" \
> +	GIT_CONFIG_PARAMETERS="${SQ}env.one=one${SQ} ${SQ}env.two=two${SQ}" \
>  		git config --get-regexp "env.*" >actual &&
>  	test_cmp expect actual &&
>  
>  	cat >expect <<-EOF &&
> -	env.one one${sq}
> +	env.one one${SQ}
>  	env.two two
>  	EOF
> -	GIT_CONFIG_PARAMETERS="${sq}env.one=one${sq}\\$sq$sq$sq ${sq}env.two=two${sq}" \
> +	GIT_CONFIG_PARAMETERS="${SQ}env.one=one${SQ}\\$SQ$SQ$SQ ${SQ}env.two=two${SQ}" \
>  		git config --get-regexp "env.*" >actual &&
>  	test_cmp expect actual &&
>  
>  	test_must_fail env \
> -		GIT_CONFIG_PARAMETERS="${sq}env.one=one${sq}\\$sq ${sq}env.two=two${sq}" \
> +		GIT_CONFIG_PARAMETERS="${SQ}env.one=one${SQ}\\$SQ ${SQ}env.two=two${SQ}" \
>  		git config --get-regexp "env.*"
>  '
>  
> diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh
> index 970c5c36b9..2d142e5535 100755
> --- a/t/t1404-update-ref-errors.sh
> +++ b/t/t1404-update-ref-errors.sh
> @@ -32,8 +32,6 @@ test_update_rejected () {
>  	test_cmp unchanged actual
>  }
>  
> -Q="'"
> -
>  # Test adding and deleting D/F-conflicting references in a single
>  # transaction.
>  df_test() {
> @@ -93,7 +91,7 @@ df_test() {
>  		delname="$delref"
>  	fi &&
>  	cat >expected-err <<-EOF &&
> -	fatal: cannot lock ref $Q$addname$Q: $Q$delref$Q exists; cannot create $Q$addref$Q
> +	fatal: cannot lock ref $SQ$addname$SQ: $SQ$delref$SQ exists; cannot create $SQ$addref$SQ
>  	EOF
>  	$pack &&
>  	if $add_del
> @@ -123,7 +121,7 @@ test_expect_success 'existing loose ref is a simple prefix of new' '
>  
>  	prefix=refs/1l &&
>  	test_update_rejected "a c e" false "b c/x d" \
> -		"$Q$prefix/c$Q exists; cannot create $Q$prefix/c/x$Q"
> +		"$SQ$prefix/c$SQ exists; cannot create $SQ$prefix/c/x$SQ"
>  
>  '
>  
> @@ -131,7 +129,7 @@ test_expect_success 'existing packed ref is a simple prefix of new' '
>  
>  	prefix=refs/1p &&
>  	test_update_rejected "a c e" true "b c/x d" \
> -		"$Q$prefix/c$Q exists; cannot create $Q$prefix/c/x$Q"
> +		"$SQ$prefix/c$SQ exists; cannot create $SQ$prefix/c/x$SQ"
>  
>  '
>  
> @@ -139,7 +137,7 @@ test_expect_success 'existing loose ref is a deeper prefix of new' '
>  
>  	prefix=refs/2l &&
>  	test_update_rejected "a c e" false "b c/x/y d" \
> -		"$Q$prefix/c$Q exists; cannot create $Q$prefix/c/x/y$Q"
> +		"$SQ$prefix/c$SQ exists; cannot create $SQ$prefix/c/x/y$SQ"
>  
>  '
>  
> @@ -147,7 +145,7 @@ test_expect_success 'existing packed ref is a deeper prefix of new' '
>  
>  	prefix=refs/2p &&
>  	test_update_rejected "a c e" true "b c/x/y d" \
> -		"$Q$prefix/c$Q exists; cannot create $Q$prefix/c/x/y$Q"
> +		"$SQ$prefix/c$SQ exists; cannot create $SQ$prefix/c/x/y$SQ"
>  
>  '
>  
> @@ -155,7 +153,7 @@ test_expect_success 'new ref is a simple prefix of existing loose' '
>  
>  	prefix=refs/3l &&
>  	test_update_rejected "a c/x e" false "b c d" \
> -		"$Q$prefix/c/x$Q exists; cannot create $Q$prefix/c$Q"
> +		"$SQ$prefix/c/x$SQ exists; cannot create $SQ$prefix/c$SQ"
>  
>  '
>  
> @@ -163,7 +161,7 @@ test_expect_success 'new ref is a simple prefix of existing packed' '
>  
>  	prefix=refs/3p &&
>  	test_update_rejected "a c/x e" true "b c d" \
> -		"$Q$prefix/c/x$Q exists; cannot create $Q$prefix/c$Q"
> +		"$SQ$prefix/c/x$SQ exists; cannot create $SQ$prefix/c$SQ"
>  
>  '
>  
> @@ -171,7 +169,7 @@ test_expect_success 'new ref is a deeper prefix of existing loose' '
>  
>  	prefix=refs/4l &&
>  	test_update_rejected "a c/x/y e" false "b c d" \
> -		"$Q$prefix/c/x/y$Q exists; cannot create $Q$prefix/c$Q"
> +		"$SQ$prefix/c/x/y$SQ exists; cannot create $SQ$prefix/c$SQ"
>  
>  '
>  
> @@ -179,7 +177,7 @@ test_expect_success 'new ref is a deeper prefix of existing packed' '
>  
>  	prefix=refs/4p &&
>  	test_update_rejected "a c/x/y e" true "b c d" \
> -		"$Q$prefix/c/x/y$Q exists; cannot create $Q$prefix/c$Q"
> +		"$SQ$prefix/c/x/y$SQ exists; cannot create $SQ$prefix/c$SQ"
>  
>  '
>  
> @@ -187,7 +185,7 @@ test_expect_success 'one new ref is a simple prefix of another' '
>  
>  	prefix=refs/5 &&
>  	test_update_rejected "a e" false "b c c/x d" \
> -		"cannot process $Q$prefix/c$Q and $Q$prefix/c/x$Q at the same time"
> +		"cannot process $SQ$prefix/c$SQ and $SQ$prefix/c/x$SQ at the same time"
>  
>  '
>  
> @@ -334,7 +332,7 @@ test_expect_success 'D/F conflict prevents indirect delete long packed + indirec
>  test_expect_success 'missing old value blocks update' '
>  	prefix=refs/missing-update &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/foo$Q: unable to resolve reference $Q$prefix/foo$Q
> +	fatal: cannot lock ref $SQ$prefix/foo$SQ: unable to resolve reference $SQ$prefix/foo$SQ
>  	EOF
>  	printf "%s\n" "update $prefix/foo $E $D" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
> @@ -345,7 +343,7 @@ test_expect_success 'incorrect old value blocks update' '
>  	prefix=refs/incorrect-update &&
>  	git update-ref $prefix/foo $C &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/foo$Q: is at $C but expected $D
> +	fatal: cannot lock ref $SQ$prefix/foo$SQ: is at $C but expected $D
>  	EOF
>  	printf "%s\n" "update $prefix/foo $E $D" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
> @@ -356,7 +354,7 @@ test_expect_success 'existing old value blocks create' '
>  	prefix=refs/existing-create &&
>  	git update-ref $prefix/foo $C &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/foo$Q: reference already exists
> +	fatal: cannot lock ref $SQ$prefix/foo$SQ: reference already exists
>  	EOF
>  	printf "%s\n" "create $prefix/foo $E" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
> @@ -367,7 +365,7 @@ test_expect_success 'incorrect old value blocks delete' '
>  	prefix=refs/incorrect-delete &&
>  	git update-ref $prefix/foo $C &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/foo$Q: is at $C but expected $D
> +	fatal: cannot lock ref $SQ$prefix/foo$SQ: is at $C but expected $D
>  	EOF
>  	printf "%s\n" "delete $prefix/foo $D" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
> @@ -378,7 +376,7 @@ test_expect_success 'missing old value blocks indirect update' '
>  	prefix=refs/missing-indirect-update &&
>  	git symbolic-ref $prefix/symref $prefix/foo &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/symref$Q: unable to resolve reference $Q$prefix/foo$Q
> +	fatal: cannot lock ref $SQ$prefix/symref$SQ: unable to resolve reference $SQ$prefix/foo$SQ
>  	EOF
>  	printf "%s\n" "update $prefix/symref $E $D" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
> @@ -390,7 +388,7 @@ test_expect_success 'incorrect old value blocks indirect update' '
>  	git symbolic-ref $prefix/symref $prefix/foo &&
>  	git update-ref $prefix/foo $C &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/symref$Q: is at $C but expected $D
> +	fatal: cannot lock ref $SQ$prefix/symref$SQ: is at $C but expected $D
>  	EOF
>  	printf "%s\n" "update $prefix/symref $E $D" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
> @@ -402,7 +400,7 @@ test_expect_success 'existing old value blocks indirect create' '
>  	git symbolic-ref $prefix/symref $prefix/foo &&
>  	git update-ref $prefix/foo $C &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/symref$Q: reference already exists
> +	fatal: cannot lock ref $SQ$prefix/symref$SQ: reference already exists
>  	EOF
>  	printf "%s\n" "create $prefix/symref $E" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
> @@ -414,7 +412,7 @@ test_expect_success 'incorrect old value blocks indirect delete' '
>  	git symbolic-ref $prefix/symref $prefix/foo &&
>  	git update-ref $prefix/foo $C &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/symref$Q: is at $C but expected $D
> +	fatal: cannot lock ref $SQ$prefix/symref$SQ: is at $C but expected $D
>  	EOF
>  	printf "%s\n" "delete $prefix/symref $D" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
> @@ -425,7 +423,7 @@ test_expect_success 'missing old value blocks indirect no-deref update' '
>  	prefix=refs/missing-noderef-update &&
>  	git symbolic-ref $prefix/symref $prefix/foo &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/symref$Q: reference is missing but expected $D
> +	fatal: cannot lock ref $SQ$prefix/symref$SQ: reference is missing but expected $D
>  	EOF
>  	printf "%s\n" "option no-deref" "update $prefix/symref $E $D" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
> @@ -437,7 +435,7 @@ test_expect_success 'incorrect old value blocks indirect no-deref update' '
>  	git symbolic-ref $prefix/symref $prefix/foo &&
>  	git update-ref $prefix/foo $C &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/symref$Q: is at $C but expected $D
> +	fatal: cannot lock ref $SQ$prefix/symref$SQ: is at $C but expected $D
>  	EOF
>  	printf "%s\n" "option no-deref" "update $prefix/symref $E $D" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
> @@ -449,7 +447,7 @@ test_expect_success 'existing old value blocks indirect no-deref create' '
>  	git symbolic-ref $prefix/symref $prefix/foo &&
>  	git update-ref $prefix/foo $C &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/symref$Q: reference already exists
> +	fatal: cannot lock ref $SQ$prefix/symref$SQ: reference already exists
>  	EOF
>  	printf "%s\n" "option no-deref" "create $prefix/symref $E" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
> @@ -461,7 +459,7 @@ test_expect_success 'incorrect old value blocks indirect no-deref delete' '
>  	git symbolic-ref $prefix/symref $prefix/foo &&
>  	git update-ref $prefix/foo $C &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/symref$Q: is at $C but expected $D
> +	fatal: cannot lock ref $SQ$prefix/symref$SQ: is at $C but expected $D
>  	EOF
>  	printf "%s\n" "option no-deref" "delete $prefix/symref $D" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
> @@ -474,13 +472,13 @@ test_expect_success 'non-empty directory blocks create' '
>  	: >.git/$prefix/foo/bar/baz.lock &&
>  	test_when_finished "rm -f .git/$prefix/foo/bar/baz.lock" &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/foo$Q: there is a non-empty directory $Q.git/$prefix/foo$Q blocking reference $Q$prefix/foo$Q
> +	fatal: cannot lock ref $SQ$prefix/foo$SQ: there is a non-empty directory $SQ.git/$prefix/foo$SQ blocking reference $SQ$prefix/foo$SQ
>  	EOF
>  	printf "%s\n" "update $prefix/foo $C" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
>  	test_cmp expected output.err &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/foo$Q: unable to resolve reference $Q$prefix/foo$Q
> +	fatal: cannot lock ref $SQ$prefix/foo$SQ: unable to resolve reference $SQ$prefix/foo$SQ
>  	EOF
>  	printf "%s\n" "update $prefix/foo $D $C" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
> @@ -493,13 +491,13 @@ test_expect_success 'broken reference blocks create' '
>  	echo "gobbledigook" >.git/$prefix/foo &&
>  	test_when_finished "rm -f .git/$prefix/foo" &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/foo$Q: unable to resolve reference $Q$prefix/foo$Q: reference broken
> +	fatal: cannot lock ref $SQ$prefix/foo$SQ: unable to resolve reference $SQ$prefix/foo$SQ: reference broken
>  	EOF
>  	printf "%s\n" "update $prefix/foo $C" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
>  	test_cmp expected output.err &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/foo$Q: unable to resolve reference $Q$prefix/foo$Q: reference broken
> +	fatal: cannot lock ref $SQ$prefix/foo$SQ: unable to resolve reference $SQ$prefix/foo$SQ: reference broken
>  	EOF
>  	printf "%s\n" "update $prefix/foo $D $C" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
> @@ -513,13 +511,13 @@ test_expect_success 'non-empty directory blocks indirect create' '
>  	: >.git/$prefix/foo/bar/baz.lock &&
>  	test_when_finished "rm -f .git/$prefix/foo/bar/baz.lock" &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/symref$Q: there is a non-empty directory $Q.git/$prefix/foo$Q blocking reference $Q$prefix/foo$Q
> +	fatal: cannot lock ref $SQ$prefix/symref$SQ: there is a non-empty directory $SQ.git/$prefix/foo$SQ blocking reference $SQ$prefix/foo$SQ
>  	EOF
>  	printf "%s\n" "update $prefix/symref $C" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
>  	test_cmp expected output.err &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/symref$Q: unable to resolve reference $Q$prefix/foo$Q
> +	fatal: cannot lock ref $SQ$prefix/symref$SQ: unable to resolve reference $SQ$prefix/foo$SQ
>  	EOF
>  	printf "%s\n" "update $prefix/symref $D $C" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
> @@ -532,13 +530,13 @@ test_expect_success 'broken reference blocks indirect create' '
>  	echo "gobbledigook" >.git/$prefix/foo &&
>  	test_when_finished "rm -f .git/$prefix/foo" &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/symref$Q: unable to resolve reference $Q$prefix/foo$Q: reference broken
> +	fatal: cannot lock ref $SQ$prefix/symref$SQ: unable to resolve reference $SQ$prefix/foo$SQ: reference broken
>  	EOF
>  	printf "%s\n" "update $prefix/symref $C" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
>  	test_cmp expected output.err &&
>  	cat >expected <<-EOF &&
> -	fatal: cannot lock ref $Q$prefix/symref$Q: unable to resolve reference $Q$prefix/foo$Q: reference broken
> +	fatal: cannot lock ref $SQ$prefix/symref$SQ: unable to resolve reference $SQ$prefix/foo$SQ: reference broken
>  	EOF
>  	printf "%s\n" "update $prefix/symref $D $C" |
>  	test_must_fail git update-ref --stdin 2>output.err &&
> @@ -614,7 +612,7 @@ test_expect_success 'delete fails cleanly if packed-refs file is locked' '
>  	test_when_finished "rm -f .git/packed-refs.lock" &&
>  	test_must_fail git update-ref -d $prefix/foo >out 2>err &&
>  	git for-each-ref $prefix >actual &&
> -	test_i18ngrep "Unable to create $Q.*packed-refs.lock$Q: " err &&
> +	test_i18ngrep "Unable to create $SQ.*packed-refs.lock$SQ: " err &&
>  	test_cmp unchanged actual
>  '
>  
> diff --git a/t/t1414-reflog-walk.sh b/t/t1414-reflog-walk.sh
> index feb1efd8ff..1181a9fb28 100755
> --- a/t/t1414-reflog-walk.sh
> +++ b/t/t1414-reflog-walk.sh
> @@ -18,10 +18,9 @@ do_walk () {
>  	git log -g --format="%gd %gs" "$@"
>  }
>  
> -sq="'"
>  test_expect_success 'set up expected reflog' '
>  	cat >expect.all <<-EOF
> -	HEAD@{0} commit (merge): Merge branch ${sq}master${sq} into side
> +	HEAD@{0} commit (merge): Merge branch ${SQ}master${SQ} into side
>  	HEAD@{1} commit: three
>  	HEAD@{2} checkout: moving from master to side
>  	HEAD@{3} commit: two
> diff --git a/t/t1506-rev-parse-diagnosis.sh b/t/t1506-rev-parse-diagnosis.sh
> index 4ee009da66..21a9c8ffb2 100755
> --- a/t/t1506-rev-parse-diagnosis.sh
> +++ b/t/t1506-rev-parse-diagnosis.sh
> @@ -8,10 +8,9 @@ exec </dev/null
>  
>  test_did_you_mean ()
>  {
> -	sq="'" &&
>  	cat >expected <<-EOF &&
> -	fatal: Path '$2$3' $4, but not ${5:-$sq$3$sq}.
> -	Did you mean '$1:$2$3'${2:+ aka $sq$1:./$3$sq}?
> +	fatal: Path '$2$3' $4, but not ${5:-$SQ$3$SQ}.
> +	Did you mean '$1:$2$3'${2:+ aka $SQ$1:./$3$SQ}?
>  	EOF
>  	test_cmp expected error
>  }
> diff --git a/t/t1507-rev-parse-upstream.sh b/t/t1507-rev-parse-upstream.sh
> index fa3e499641..8b4cf8a6e3 100755
> --- a/t/t1507-rev-parse-upstream.sh
> +++ b/t/t1507-rev-parse-upstream.sh
> @@ -28,8 +28,6 @@ test_expect_success 'setup' '
>  	)
>  '
>  
> -sq="'"
> -
>  full_name () {
>  	(cd clone &&
>  	 git rev-parse --symbolic-full-name "$@")
> @@ -129,7 +127,7 @@ test_expect_success 'merge my-side@{u} records the correct name' '
>  	git branch -t new my-side@{u} &&
>  	git merge -s ours new@{u} &&
>  	git show -s --pretty=tformat:%s >actual &&
> -	echo "Merge remote-tracking branch ${sq}origin/side${sq}" >expect &&
> +	echo "Merge remote-tracking branch ${SQ}origin/side${SQ}" >expect &&
>  	test_cmp expect actual
>  )
>  '
> @@ -156,7 +154,7 @@ test_expect_success 'branch@{u} works when tracking a local branch' '
>  
>  test_expect_success 'branch@{u} error message when no upstream' '
>  	cat >expect <<-EOF &&
> -	fatal: no upstream configured for branch ${sq}non-tracking${sq}
> +	fatal: no upstream configured for branch ${SQ}non-tracking${SQ}
>  	EOF
>  	error_message non-tracking@{u} &&
>  	test_i18ncmp expect error
> @@ -164,7 +162,7 @@ test_expect_success 'branch@{u} error message when no upstream' '
>  
>  test_expect_success '@{u} error message when no upstream' '
>  	cat >expect <<-EOF &&
> -	fatal: no upstream configured for branch ${sq}master${sq}
> +	fatal: no upstream configured for branch ${SQ}master${SQ}
>  	EOF
>  	test_must_fail git rev-parse --verify @{u} 2>actual &&
>  	test_i18ncmp expect actual
> @@ -172,7 +170,7 @@ test_expect_success '@{u} error message when no upstream' '
>  
>  test_expect_success 'branch@{u} error message with misspelt branch' '
>  	cat >expect <<-EOF &&
> -	fatal: no such branch: ${sq}no-such-branch${sq}
> +	fatal: no such branch: ${SQ}no-such-branch${SQ}
>  	EOF
>  	error_message no-such-branch@{u} &&
>  	test_i18ncmp expect error
> @@ -189,7 +187,7 @@ test_expect_success '@{u} error message when not on a branch' '
>  
>  test_expect_success 'branch@{u} error message if upstream branch not fetched' '
>  	cat >expect <<-EOF &&
> -	fatal: upstream branch ${sq}refs/heads/side${sq} not stored as a remote-tracking branch
> +	fatal: upstream branch ${SQ}refs/heads/side${SQ} not stored as a remote-tracking branch
>  	EOF
>  	error_message bad-upstream@{u} &&
>  	test_i18ncmp expect error
> diff --git a/t/t3005-ls-files-relative.sh b/t/t3005-ls-files-relative.sh
> index 209b4c7cd8..c841f9b454 100755
> --- a/t/t3005-ls-files-relative.sh
> +++ b/t/t3005-ls-files-relative.sh
> @@ -9,7 +9,6 @@ This test runs git ls-files with various relative path arguments.
>  
>  new_line='
>  '
> -sq=\'
>  
>  test_expect_success 'prepare' '
>  	: >never-mind-me &&
> @@ -44,9 +43,9 @@ test_expect_success 'ls-files -c' '
>  		cd top/sub &&
>  		for f in ../y*
>  		do
> -			echo "error: pathspec $sq$f$sq did not match any file(s) known to git"
> +			echo "error: pathspec $SQ$f$SQ did not match any file(s) known to git"
>  		done >expect.err &&
> -		echo "Did you forget to ${sq}git add${sq}?" >>expect.err &&
> +		echo "Did you forget to ${SQ}git add${SQ}?" >>expect.err &&
>  		ls ../x* >expect.out &&
>  		test_must_fail git ls-files -c --error-unmatch ../[xy]* >actual.out 2>actual.err &&
>  		test_cmp expect.out actual.out &&
> @@ -59,9 +58,9 @@ test_expect_success 'ls-files -o' '
>  		cd top/sub &&
>  		for f in ../x*
>  		do
> -			echo "error: pathspec $sq$f$sq did not match any file(s) known to git"
> +			echo "error: pathspec $SQ$f$SQ did not match any file(s) known to git"
>  		done >expect.err &&
> -		echo "Did you forget to ${sq}git add${sq}?" >>expect.err &&
> +		echo "Did you forget to ${SQ}git add${SQ}?" >>expect.err &&
>  		ls ../y* >expect.out &&
>  		test_must_fail git ls-files -o --error-unmatch ../[xy]* >actual.out 2>actual.err &&
>  		test_cmp expect.out actual.out &&
> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> index 461dd539ff..9c152b6245 100755
> --- a/t/t3404-rebase-interactive.sh
> +++ b/t/t3404-rebase-interactive.sh
> @@ -1419,7 +1419,6 @@ test_expect_success 'editor saves as CR/LF' '
>  	)
>  '
>  
> -SQ="'"
>  test_expect_success 'rebase -i --gpg-sign=<key-id>' '
>  	test_when_finished "test_might_fail git rebase --abort" &&
>  	set_fake_editor &&
> diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
> index 7b6c4847ad..11141ac864 100755
> --- a/t/t3430-rebase-merges.sh
> +++ b/t/t3430-rebase-merges.sh
> @@ -151,7 +151,6 @@ test_expect_success 'failed `merge -C` writes patch (may be rescheduled, too)' '
>  	test_path_is_file .git/rebase-merge/patch
>  '
>  
> -SQ="'"
>  test_expect_success 'failed `merge <branch>` does not crash' '
>  	test_when_finished "test_might_fail git rebase --abort" &&
>  	git checkout conflicting-G &&
> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index 4a3b901f06..3be025a658 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -434,7 +434,6 @@ test_expect_success 'double quoted plink.exe in GIT_SSH_COMMAND' '
>  	expect_ssh "-v -P 123" myhost src
>  '
>  
> -SQ="'"
>  test_expect_success 'single quoted plink.exe in GIT_SSH_COMMAND' '
>  	copy_ssh_wrapper_as "$TRASH_DIRECTORY/plink.exe" &&
>  	GIT_SSH_COMMAND="$SQ$TRASH_DIRECTORY/plink.exe$SQ -v" \
> diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
> index c973278300..df34c994d2 100755
> --- a/t/t7406-submodule-update.sh
> +++ b/t/t7406-submodule-update.sh
> @@ -158,7 +158,6 @@ test_expect_success 'submodule update --init from and of subdirectory' '
>  	test_i18ncmp expect2 actual2
>  '
>  
> -apos="'";
>  test_expect_success 'submodule update does not fetch already present commits' '
>  	(cd submodule &&
>  	  echo line3 >> file &&
> @@ -168,7 +167,7 @@ test_expect_success 'submodule update does not fetch already present commits' '
>  	) &&
>  	(cd super/submodule &&
>  	  head=$(git rev-parse --verify HEAD) &&
> -	  echo "Submodule path ${apos}submodule$apos: checked out $apos$head$apos" > ../../expected &&
> +	  echo "Submodule path ${SQ}submodule$SQ: checked out $SQ$head$SQ" > ../../expected &&
>  	  git reset --hard HEAD~1
>  	) &&
>  	(cd super &&
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index a9d45642a5..ee602c4d9c 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -509,6 +509,9 @@ EMPTY_BLOB=e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
>  LF='
>  '
>  
> +# Single quote
> +SQ=\'
> +
>  # UTF-8 ZERO WIDTH NON-JOINER, which HFS+ ignores
>  # when case-folding filenames
>  u200c=$(printf '\342\200\214')
> -- 
> 2.23.0.37.g745f681289
> 

^ permalink raw reply	[relevance 0%]
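
For readers skimming the thread, here is a minimal sketch of why the test
scripts keep a single quote in a variable at all. Test bodies are passed as
single-quoted strings, so a literal ' cannot appear inside them without
closing the string first. Only the SQ definition is taken from the patch;
`run_body` and `branch` are hypothetical stand-ins for illustration (the real
harness uses test_expect_success).

```shell
#!/bin/sh
# Same definition the patch adds to t/test-lib.sh.
SQ=\'

# Hypothetical stand-in for test_expect_success: it only evals the body.
run_body () {
	eval "$1"
}

branch=topic

# The body is single-quoted, so $SQ and $branch stay unexpanded until
# eval runs it; at that point $SQ yields the literal quote character.
run_body '
	msg="fatal: no such branch: ${SQ}$branch${SQ}"
'

# The same message without a helper variable needs the close-escape-reopen
# idiom, which is harder to read.
run_body '
	alt="fatal: no such branch: '\''$branch'\''"
'

printf '%s\n' "$msg"   # prints: fatal: no such branch: 'topic'
```

Both bodies build the same string; the variable form just keeps the quoting
noise out of the expected-output text.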

* Re: [PATCH] t: use common $SQ variable
  2019-09-05 22:25  6%           ` Taylor Blau
@ 2019-09-05 22:27  6%             ` Taylor Blau
  0 siblings, 0 replies; 200+ results
From: Taylor Blau @ 2019-09-05 22:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Denton Liu, Git Mailing List, Junio C Hamano, Jeff King

On Thu, Sep 05, 2019 at 06:25:26PM -0400, Taylor Blau wrote:
> On Thu, Sep 05, 2019 at 03:10:05PM -0700, Denton Liu wrote:
> > In many test scripts, there are bespoke definitions of the single quote
> > that are some variation of this:
> >
> >     SQ="'"
> >
> > Define a common $SQ variable in test-lib.sh and replace all usages of
> > these bespoke variables with the common one.
> >
> > This change was done by running `git grep =\"\'\" t/` and
> > `git grep =\\\\\'` and manually changing the resulting definitions and
> > corresponding usages.
> >
> > Signed-off-by: Denton Liu <liu.denton@gmail.com>
> > ---
> >
> > [whoops, forgot to include the mailing list in the last email]
> >
> > Sorry, I wrote this patch up before I saw the email about leaving this
> > as #leftoverbits. No point in letting it go to waste, though.
>
> Thanks for doing this. I marked it as '#leftoverbits' in case anybody
> hosting an Outreachy intern might be interested in having something
> small for a newcomer to dip their toes into sending to the mailing list.

Oh, how silly of me. I was thinking about [1], which I said would be
good #leftoverbits material. This thread was tagged by Junio, not me.
The rest of my point stands, though ;).

> But, there's no shortage of other such tasks, I'd assume, so it's good
> that you cleaned these up.
>
> Both of your 'git grep' invocations look correct to me, so the patch
> below looks like an obviously-correct result. Thanks.
>
> -Taylor

Thanks,
Taylor

[1]: https://public-inbox.org/git/20190904212121.GB20904@syl.local/

^ permalink raw reply	[relevance 6%]

* Re: [PATCH] t: use common $SQ variable
  2019-09-05 22:10  1%         ` [PATCH] t: use common $SQ variable Denton Liu
@ 2019-09-05 22:25  6%           ` Taylor Blau
  2019-09-05 22:27  6%             ` Taylor Blau
  2019-09-06  2:04  0%           ` Denton Liu
  1 sibling, 1 reply; 200+ results
From: Taylor Blau @ 2019-09-05 22:25 UTC (permalink / raw)
  To: Denton Liu; +Cc: Git Mailing List, Junio C Hamano, Jeff King, Taylor Blau

On Thu, Sep 05, 2019 at 03:10:05PM -0700, Denton Liu wrote:
> In many test scripts, there are bespoke definitions of the single quote
> that are some variation of this:
>
>     SQ="'"
>
> Define a common $SQ variable in test-lib.sh and replace all usages of
> these bespoke variables with the common one.
>
> This change was done by running `git grep =\"\'\" t/` and
> `git grep =\\\\\'` and manually changing the resulting definitions and
> corresponding usages.
>
> Signed-off-by: Denton Liu <liu.denton@gmail.com>
> ---
>
> [whoops, forgot to include the mailing list in the last email]
>
> Sorry, I wrote this patch up before I saw the email about leaving this
> as #leftoverbits. No point in letting it go to waste, though.

Thanks for doing this. I marked it as '#leftoverbits' in case anybody
hosting an Outreachy intern might be interested in having something
small for a newcomer to dip their toes into sending to the mailing list.

But, there's no shortage of other such tasks, I'd assume, so it's good
that you cleaned these up.

Both of your 'git grep' invocations look correct to me, so the patch
below looks like an obviously-correct result. Thanks.

-Taylor

^ permalink raw reply	[relevance 6%]
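
As a rough illustration of what the two `git grep` patterns from the commit
message look like once the shell has removed one level of quoting, the sketch
below passes the same pattern arguments to plain `grep` over three throwaway
files. The file names, and the use of `grep` in place of `git grep`, are
assumptions made so the example runs outside a repository.

```shell
#!/bin/sh
# Scratch files standing in for test scripts under t/ (hypothetical names).
dir=$(mktemp -d)

cat >"$dir/a.sh" <<\EOF
sq="'"
EOF

cat >"$dir/b.sh" <<\EOF
sq=\'
EOF

cat >"$dir/c.sh" <<\EOF
name=value
EOF

# After shell unquoting, grep receives the pattern ="'" here.
double_quoted=$(cd "$dir" && grep -l =\"\'\" *.sh)

# Here grep receives =\\' which, as a basic regular expression, matches
# an equals sign, a literal backslash, and a single quote.
backslashed=$(cd "$dir" && grep -l =\\\\\' *.sh)

echo "$double_quoted"   # prints: a.sh
echo "$backslashed"     # prints: b.sh

rm -rf "$dir"
```

Between them the two patterns cover the double-quoted and the
backslash-escaped spellings of the definition that the patch removes.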

* [PATCH] t: use common $SQ variable
  2019-09-05 19:34  6%       ` Junio C Hamano
@ 2019-09-05 22:10  1%         ` Denton Liu
  2019-09-05 22:25  6%           ` Taylor Blau
  2019-09-06  2:04  0%           ` Denton Liu
  0 siblings, 2 replies; 200+ results
From: Denton Liu @ 2019-09-05 22:10 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Jeff King, Taylor Blau

In many test scripts, there are bespoke definitions of the single quote
that are some variation of this:

    SQ="'"

Define a common $SQ variable in test-lib.sh and replace all usages of
these bespoke variables with the common one.

This change was done by running `git grep =\"\'\" t/` and
`git grep =\\\\\'` and manually changing the resulting definitions and
corresponding usages.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
---

[whoops, forgot to include the mailing list in the last email]

Sorry, I wrote this patch up before I saw the email about leaving this
as #leftoverbits. No point in letting it go to waste, though.

 t/t1300-config.sh              |  9 +++--
 t/t1404-update-ref-errors.sh   | 64 ++++++++++++++++------------------
 t/t1414-reflog-walk.sh         |  3 +-
 t/t1506-rev-parse-diagnosis.sh |  5 ++-
 t/t1507-rev-parse-upstream.sh  | 12 +++----
 t/t3005-ls-files-relative.sh   |  9 +++--
 t/t3404-rebase-interactive.sh  |  1 -
 t/t3430-rebase-merges.sh       |  1 -
 t/t5601-clone.sh               |  1 -
 t/t7406-submodule-update.sh    |  3 +-
 t/test-lib.sh                  |  3 ++
 11 files changed, 51 insertions(+), 60 deletions(-)

diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 428177c390..983a0a1583 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -1294,26 +1294,25 @@ test_expect_success 'git -c is not confused by empty environment' '
 	GIT_CONFIG_PARAMETERS="" git -c x.one=1 config --list
 '
 
-sq="'"
 test_expect_success 'detect bogus GIT_CONFIG_PARAMETERS' '
 	cat >expect <<-\EOF &&
 	env.one one
 	env.two two
 	EOF
-	GIT_CONFIG_PARAMETERS="${sq}env.one=one${sq} ${sq}env.two=two${sq}" \
+	GIT_CONFIG_PARAMETERS="${SQ}env.one=one${SQ} ${SQ}env.two=two${SQ}" \
 		git config --get-regexp "env.*" >actual &&
 	test_cmp expect actual &&
 
 	cat >expect <<-EOF &&
-	env.one one${sq}
+	env.one one${SQ}
 	env.two two
 	EOF
-	GIT_CONFIG_PARAMETERS="${sq}env.one=one${sq}\\$sq$sq$sq ${sq}env.two=two${sq}" \
+	GIT_CONFIG_PARAMETERS="${SQ}env.one=one${SQ}\\$SQ$SQ$SQ ${SQ}env.two=two${SQ}" \
 		git config --get-regexp "env.*" >actual &&
 	test_cmp expect actual &&
 
 	test_must_fail env \
-		GIT_CONFIG_PARAMETERS="${sq}env.one=one${sq}\\$sq ${sq}env.two=two${sq}" \
+		GIT_CONFIG_PARAMETERS="${SQ}env.one=one${SQ}\\$SQ ${SQ}env.two=two${SQ}" \
 		git config --get-regexp "env.*"
 '
 
diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh
index 970c5c36b9..2d142e5535 100755
--- a/t/t1404-update-ref-errors.sh
+++ b/t/t1404-update-ref-errors.sh
@@ -32,8 +32,6 @@ test_update_rejected () {
 	test_cmp unchanged actual
 }
 
-Q="'"
-
 # Test adding and deleting D/F-conflicting references in a single
 # transaction.
 df_test() {
@@ -93,7 +91,7 @@ df_test() {
 		delname="$delref"
 	fi &&
 	cat >expected-err <<-EOF &&
-	fatal: cannot lock ref $Q$addname$Q: $Q$delref$Q exists; cannot create $Q$addref$Q
+	fatal: cannot lock ref $SQ$addname$SQ: $SQ$delref$SQ exists; cannot create $SQ$addref$SQ
 	EOF
 	$pack &&
 	if $add_del
@@ -123,7 +121,7 @@ test_expect_success 'existing loose ref is a simple prefix of new' '
 
 	prefix=refs/1l &&
 	test_update_rejected "a c e" false "b c/x d" \
-		"$Q$prefix/c$Q exists; cannot create $Q$prefix/c/x$Q"
+		"$SQ$prefix/c$SQ exists; cannot create $SQ$prefix/c/x$SQ"
 
 '
 
@@ -131,7 +129,7 @@ test_expect_success 'existing packed ref is a simple prefix of new' '
 
 	prefix=refs/1p &&
 	test_update_rejected "a c e" true "b c/x d" \
-		"$Q$prefix/c$Q exists; cannot create $Q$prefix/c/x$Q"
+		"$SQ$prefix/c$SQ exists; cannot create $SQ$prefix/c/x$SQ"
 
 '
 
@@ -139,7 +137,7 @@ test_expect_success 'existing loose ref is a deeper prefix of new' '
 
 	prefix=refs/2l &&
 	test_update_rejected "a c e" false "b c/x/y d" \
-		"$Q$prefix/c$Q exists; cannot create $Q$prefix/c/x/y$Q"
+		"$SQ$prefix/c$SQ exists; cannot create $SQ$prefix/c/x/y$SQ"
 
 '
 
@@ -147,7 +145,7 @@ test_expect_success 'existing packed ref is a deeper prefix of new' '
 
 	prefix=refs/2p &&
 	test_update_rejected "a c e" true "b c/x/y d" \
-		"$Q$prefix/c$Q exists; cannot create $Q$prefix/c/x/y$Q"
+		"$SQ$prefix/c$SQ exists; cannot create $SQ$prefix/c/x/y$SQ"
 
 '
 
@@ -155,7 +153,7 @@ test_expect_success 'new ref is a simple prefix of existing loose' '
 
 	prefix=refs/3l &&
 	test_update_rejected "a c/x e" false "b c d" \
-		"$Q$prefix/c/x$Q exists; cannot create $Q$prefix/c$Q"
+		"$SQ$prefix/c/x$SQ exists; cannot create $SQ$prefix/c$SQ"
 
 '
 
@@ -163,7 +161,7 @@ test_expect_success 'new ref is a simple prefix of existing packed' '
 
 	prefix=refs/3p &&
 	test_update_rejected "a c/x e" true "b c d" \
-		"$Q$prefix/c/x$Q exists; cannot create $Q$prefix/c$Q"
+		"$SQ$prefix/c/x$SQ exists; cannot create $SQ$prefix/c$SQ"
 
 '
 
@@ -171,7 +169,7 @@ test_expect_success 'new ref is a deeper prefix of existing loose' '
 
 	prefix=refs/4l &&
 	test_update_rejected "a c/x/y e" false "b c d" \
-		"$Q$prefix/c/x/y$Q exists; cannot create $Q$prefix/c$Q"
+		"$SQ$prefix/c/x/y$SQ exists; cannot create $SQ$prefix/c$SQ"
 
 '
 
@@ -179,7 +177,7 @@ test_expect_success 'new ref is a deeper prefix of existing packed' '
 
 	prefix=refs/4p &&
 	test_update_rejected "a c/x/y e" true "b c d" \
-		"$Q$prefix/c/x/y$Q exists; cannot create $Q$prefix/c$Q"
+		"$SQ$prefix/c/x/y$SQ exists; cannot create $SQ$prefix/c$SQ"
 
 '
 
@@ -187,7 +185,7 @@ test_expect_success 'one new ref is a simple prefix of another' '
 
 	prefix=refs/5 &&
 	test_update_rejected "a e" false "b c c/x d" \
-		"cannot process $Q$prefix/c$Q and $Q$prefix/c/x$Q at the same time"
+		"cannot process $SQ$prefix/c$SQ and $SQ$prefix/c/x$SQ at the same time"
 
 '
 
@@ -334,7 +332,7 @@ test_expect_success 'D/F conflict prevents indirect delete long packed + indirec
 test_expect_success 'missing old value blocks update' '
 	prefix=refs/missing-update &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/foo$Q: unable to resolve reference $Q$prefix/foo$Q
+	fatal: cannot lock ref $SQ$prefix/foo$SQ: unable to resolve reference $SQ$prefix/foo$SQ
 	EOF
 	printf "%s\n" "update $prefix/foo $E $D" |
 	test_must_fail git update-ref --stdin 2>output.err &&
@@ -345,7 +343,7 @@ test_expect_success 'incorrect old value blocks update' '
 	prefix=refs/incorrect-update &&
 	git update-ref $prefix/foo $C &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/foo$Q: is at $C but expected $D
+	fatal: cannot lock ref $SQ$prefix/foo$SQ: is at $C but expected $D
 	EOF
 	printf "%s\n" "update $prefix/foo $E $D" |
 	test_must_fail git update-ref --stdin 2>output.err &&
@@ -356,7 +354,7 @@ test_expect_success 'existing old value blocks create' '
 	prefix=refs/existing-create &&
 	git update-ref $prefix/foo $C &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/foo$Q: reference already exists
+	fatal: cannot lock ref $SQ$prefix/foo$SQ: reference already exists
 	EOF
 	printf "%s\n" "create $prefix/foo $E" |
 	test_must_fail git update-ref --stdin 2>output.err &&
@@ -367,7 +365,7 @@ test_expect_success 'incorrect old value blocks delete' '
 	prefix=refs/incorrect-delete &&
 	git update-ref $prefix/foo $C &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/foo$Q: is at $C but expected $D
+	fatal: cannot lock ref $SQ$prefix/foo$SQ: is at $C but expected $D
 	EOF
 	printf "%s\n" "delete $prefix/foo $D" |
 	test_must_fail git update-ref --stdin 2>output.err &&
@@ -378,7 +376,7 @@ test_expect_success 'missing old value blocks indirect update' '
 	prefix=refs/missing-indirect-update &&
 	git symbolic-ref $prefix/symref $prefix/foo &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/symref$Q: unable to resolve reference $Q$prefix/foo$Q
+	fatal: cannot lock ref $SQ$prefix/symref$SQ: unable to resolve reference $SQ$prefix/foo$SQ
 	EOF
 	printf "%s\n" "update $prefix/symref $E $D" |
 	test_must_fail git update-ref --stdin 2>output.err &&
@@ -390,7 +388,7 @@ test_expect_success 'incorrect old value blocks indirect update' '
 	git symbolic-ref $prefix/symref $prefix/foo &&
 	git update-ref $prefix/foo $C &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/symref$Q: is at $C but expected $D
+	fatal: cannot lock ref $SQ$prefix/symref$SQ: is at $C but expected $D
 	EOF
 	printf "%s\n" "update $prefix/symref $E $D" |
 	test_must_fail git update-ref --stdin 2>output.err &&
@@ -402,7 +400,7 @@ test_expect_success 'existing old value blocks indirect create' '
 	git symbolic-ref $prefix/symref $prefix/foo &&
 	git update-ref $prefix/foo $C &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/symref$Q: reference already exists
+	fatal: cannot lock ref $SQ$prefix/symref$SQ: reference already exists
 	EOF
 	printf "%s\n" "create $prefix/symref $E" |
 	test_must_fail git update-ref --stdin 2>output.err &&
@@ -414,7 +412,7 @@ test_expect_success 'incorrect old value blocks indirect delete' '
 	git symbolic-ref $prefix/symref $prefix/foo &&
 	git update-ref $prefix/foo $C &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/symref$Q: is at $C but expected $D
+	fatal: cannot lock ref $SQ$prefix/symref$SQ: is at $C but expected $D
 	EOF
 	printf "%s\n" "delete $prefix/symref $D" |
 	test_must_fail git update-ref --stdin 2>output.err &&
@@ -425,7 +423,7 @@ test_expect_success 'missing old value blocks indirect no-deref update' '
 	prefix=refs/missing-noderef-update &&
 	git symbolic-ref $prefix/symref $prefix/foo &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/symref$Q: reference is missing but expected $D
+	fatal: cannot lock ref $SQ$prefix/symref$SQ: reference is missing but expected $D
 	EOF
 	printf "%s\n" "option no-deref" "update $prefix/symref $E $D" |
 	test_must_fail git update-ref --stdin 2>output.err &&
@@ -437,7 +435,7 @@ test_expect_success 'incorrect old value blocks indirect no-deref update' '
 	git symbolic-ref $prefix/symref $prefix/foo &&
 	git update-ref $prefix/foo $C &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/symref$Q: is at $C but expected $D
+	fatal: cannot lock ref $SQ$prefix/symref$SQ: is at $C but expected $D
 	EOF
 	printf "%s\n" "option no-deref" "update $prefix/symref $E $D" |
 	test_must_fail git update-ref --stdin 2>output.err &&
@@ -449,7 +447,7 @@ test_expect_success 'existing old value blocks indirect no-deref create' '
 	git symbolic-ref $prefix/symref $prefix/foo &&
 	git update-ref $prefix/foo $C &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/symref$Q: reference already exists
+	fatal: cannot lock ref $SQ$prefix/symref$SQ: reference already exists
 	EOF
 	printf "%s\n" "option no-deref" "create $prefix/symref $E" |
 	test_must_fail git update-ref --stdin 2>output.err &&
@@ -461,7 +459,7 @@ test_expect_success 'incorrect old value blocks indirect no-deref delete' '
 	git symbolic-ref $prefix/symref $prefix/foo &&
 	git update-ref $prefix/foo $C &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/symref$Q: is at $C but expected $D
+	fatal: cannot lock ref $SQ$prefix/symref$SQ: is at $C but expected $D
 	EOF
 	printf "%s\n" "option no-deref" "delete $prefix/symref $D" |
 	test_must_fail git update-ref --stdin 2>output.err &&
@@ -474,13 +472,13 @@ test_expect_success 'non-empty directory blocks create' '
 	: >.git/$prefix/foo/bar/baz.lock &&
 	test_when_finished "rm -f .git/$prefix/foo/bar/baz.lock" &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/foo$Q: there is a non-empty directory $Q.git/$prefix/foo$Q blocking reference $Q$prefix/foo$Q
+	fatal: cannot lock ref $SQ$prefix/foo$SQ: there is a non-empty directory $SQ.git/$prefix/foo$SQ blocking reference $SQ$prefix/foo$SQ
 	EOF
 	printf "%s\n" "update $prefix/foo $C" |
 	test_must_fail git update-ref --stdin 2>output.err &&
 	test_cmp expected output.err &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/foo$Q: unable to resolve reference $Q$prefix/foo$Q
+	fatal: cannot lock ref $SQ$prefix/foo$SQ: unable to resolve reference $SQ$prefix/foo$SQ
 	EOF
 	printf "%s\n" "update $prefix/foo $D $C" |
 	test_must_fail git update-ref --stdin 2>output.err &&
@@ -493,13 +491,13 @@ test_expect_success 'broken reference blocks create' '
 	echo "gobbledigook" >.git/$prefix/foo &&
 	test_when_finished "rm -f .git/$prefix/foo" &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/foo$Q: unable to resolve reference $Q$prefix/foo$Q: reference broken
+	fatal: cannot lock ref $SQ$prefix/foo$SQ: unable to resolve reference $SQ$prefix/foo$SQ: reference broken
 	EOF
 	printf "%s\n" "update $prefix/foo $C" |
 	test_must_fail git update-ref --stdin 2>output.err &&
 	test_cmp expected output.err &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/foo$Q: unable to resolve reference $Q$prefix/foo$Q: reference broken
+	fatal: cannot lock ref $SQ$prefix/foo$SQ: unable to resolve reference $SQ$prefix/foo$SQ: reference broken
 	EOF
 	printf "%s\n" "update $prefix/foo $D $C" |
 	test_must_fail git update-ref --stdin 2>output.err &&
@@ -513,13 +511,13 @@ test_expect_success 'non-empty directory blocks indirect create' '
 	: >.git/$prefix/foo/bar/baz.lock &&
 	test_when_finished "rm -f .git/$prefix/foo/bar/baz.lock" &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/symref$Q: there is a non-empty directory $Q.git/$prefix/foo$Q blocking reference $Q$prefix/foo$Q
+	fatal: cannot lock ref $SQ$prefix/symref$SQ: there is a non-empty directory $SQ.git/$prefix/foo$SQ blocking reference $SQ$prefix/foo$SQ
 	EOF
 	printf "%s\n" "update $prefix/symref $C" |
 	test_must_fail git update-ref --stdin 2>output.err &&
 	test_cmp expected output.err &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/symref$Q: unable to resolve reference $Q$prefix/foo$Q
+	fatal: cannot lock ref $SQ$prefix/symref$SQ: unable to resolve reference $SQ$prefix/foo$SQ
 	EOF
 	printf "%s\n" "update $prefix/symref $D $C" |
 	test_must_fail git update-ref --stdin 2>output.err &&
@@ -532,13 +530,13 @@ test_expect_success 'broken reference blocks indirect create' '
 	echo "gobbledigook" >.git/$prefix/foo &&
 	test_when_finished "rm -f .git/$prefix/foo" &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/symref$Q: unable to resolve reference $Q$prefix/foo$Q: reference broken
+	fatal: cannot lock ref $SQ$prefix/symref$SQ: unable to resolve reference $SQ$prefix/foo$SQ: reference broken
 	EOF
 	printf "%s\n" "update $prefix/symref $C" |
 	test_must_fail git update-ref --stdin 2>output.err &&
 	test_cmp expected output.err &&
 	cat >expected <<-EOF &&
-	fatal: cannot lock ref $Q$prefix/symref$Q: unable to resolve reference $Q$prefix/foo$Q: reference broken
+	fatal: cannot lock ref $SQ$prefix/symref$SQ: unable to resolve reference $SQ$prefix/foo$SQ: reference broken
 	EOF
 	printf "%s\n" "update $prefix/symref $D $C" |
 	test_must_fail git update-ref --stdin 2>output.err &&
@@ -614,7 +612,7 @@ test_expect_success 'delete fails cleanly if packed-refs file is locked' '
 	test_when_finished "rm -f .git/packed-refs.lock" &&
 	test_must_fail git update-ref -d $prefix/foo >out 2>err &&
 	git for-each-ref $prefix >actual &&
-	test_i18ngrep "Unable to create $Q.*packed-refs.lock$Q: " err &&
+	test_i18ngrep "Unable to create $SQ.*packed-refs.lock$SQ: " err &&
 	test_cmp unchanged actual
 '
 
diff --git a/t/t1414-reflog-walk.sh b/t/t1414-reflog-walk.sh
index feb1efd8ff..1181a9fb28 100755
--- a/t/t1414-reflog-walk.sh
+++ b/t/t1414-reflog-walk.sh
@@ -18,10 +18,9 @@ do_walk () {
 	git log -g --format="%gd %gs" "$@"
 }
 
-sq="'"
 test_expect_success 'set up expected reflog' '
 	cat >expect.all <<-EOF
-	HEAD@{0} commit (merge): Merge branch ${sq}master${sq} into side
+	HEAD@{0} commit (merge): Merge branch ${SQ}master${SQ} into side
 	HEAD@{1} commit: three
 	HEAD@{2} checkout: moving from master to side
 	HEAD@{3} commit: two
diff --git a/t/t1506-rev-parse-diagnosis.sh b/t/t1506-rev-parse-diagnosis.sh
index 4ee009da66..21a9c8ffb2 100755
--- a/t/t1506-rev-parse-diagnosis.sh
+++ b/t/t1506-rev-parse-diagnosis.sh
@@ -8,10 +8,9 @@ exec </dev/null
 
 test_did_you_mean ()
 {
-	sq="'" &&
 	cat >expected <<-EOF &&
-	fatal: Path '$2$3' $4, but not ${5:-$sq$3$sq}.
-	Did you mean '$1:$2$3'${2:+ aka $sq$1:./$3$sq}?
+	fatal: Path '$2$3' $4, but not ${5:-$SQ$3$SQ}.
+	Did you mean '$1:$2$3'${2:+ aka $SQ$1:./$3$SQ}?
 	EOF
 	test_cmp expected error
 }
diff --git a/t/t1507-rev-parse-upstream.sh b/t/t1507-rev-parse-upstream.sh
index fa3e499641..8b4cf8a6e3 100755
--- a/t/t1507-rev-parse-upstream.sh
+++ b/t/t1507-rev-parse-upstream.sh
@@ -28,8 +28,6 @@ test_expect_success 'setup' '
 	)
 '
 
-sq="'"
-
 full_name () {
 	(cd clone &&
 	 git rev-parse --symbolic-full-name "$@")
@@ -129,7 +127,7 @@ test_expect_success 'merge my-side@{u} records the correct name' '
 	git branch -t new my-side@{u} &&
 	git merge -s ours new@{u} &&
 	git show -s --pretty=tformat:%s >actual &&
-	echo "Merge remote-tracking branch ${sq}origin/side${sq}" >expect &&
+	echo "Merge remote-tracking branch ${SQ}origin/side${SQ}" >expect &&
 	test_cmp expect actual
 )
 '
@@ -156,7 +154,7 @@ test_expect_success 'branch@{u} works when tracking a local branch' '
 
 test_expect_success 'branch@{u} error message when no upstream' '
 	cat >expect <<-EOF &&
-	fatal: no upstream configured for branch ${sq}non-tracking${sq}
+	fatal: no upstream configured for branch ${SQ}non-tracking${SQ}
 	EOF
 	error_message non-tracking@{u} &&
 	test_i18ncmp expect error
@@ -164,7 +162,7 @@ test_expect_success 'branch@{u} error message when no upstream' '
 
 test_expect_success '@{u} error message when no upstream' '
 	cat >expect <<-EOF &&
-	fatal: no upstream configured for branch ${sq}master${sq}
+	fatal: no upstream configured for branch ${SQ}master${SQ}
 	EOF
 	test_must_fail git rev-parse --verify @{u} 2>actual &&
 	test_i18ncmp expect actual
@@ -172,7 +170,7 @@ test_expect_success '@{u} error message when no upstream' '
 
 test_expect_success 'branch@{u} error message with misspelt branch' '
 	cat >expect <<-EOF &&
-	fatal: no such branch: ${sq}no-such-branch${sq}
+	fatal: no such branch: ${SQ}no-such-branch${SQ}
 	EOF
 	error_message no-such-branch@{u} &&
 	test_i18ncmp expect error
@@ -189,7 +187,7 @@ test_expect_success '@{u} error message when not on a branch' '
 
 test_expect_success 'branch@{u} error message if upstream branch not fetched' '
 	cat >expect <<-EOF &&
-	fatal: upstream branch ${sq}refs/heads/side${sq} not stored as a remote-tracking branch
+	fatal: upstream branch ${SQ}refs/heads/side${SQ} not stored as a remote-tracking branch
 	EOF
 	error_message bad-upstream@{u} &&
 	test_i18ncmp expect error
diff --git a/t/t3005-ls-files-relative.sh b/t/t3005-ls-files-relative.sh
index 209b4c7cd8..c841f9b454 100755
--- a/t/t3005-ls-files-relative.sh
+++ b/t/t3005-ls-files-relative.sh
@@ -9,7 +9,6 @@ This test runs git ls-files with various relative path arguments.
 
 new_line='
 '
-sq=\'
 
 test_expect_success 'prepare' '
 	: >never-mind-me &&
@@ -44,9 +43,9 @@ test_expect_success 'ls-files -c' '
 		cd top/sub &&
 		for f in ../y*
 		do
-			echo "error: pathspec $sq$f$sq did not match any file(s) known to git"
+			echo "error: pathspec $SQ$f$SQ did not match any file(s) known to git"
 		done >expect.err &&
-		echo "Did you forget to ${sq}git add${sq}?" >>expect.err &&
+		echo "Did you forget to ${SQ}git add${SQ}?" >>expect.err &&
 		ls ../x* >expect.out &&
 		test_must_fail git ls-files -c --error-unmatch ../[xy]* >actual.out 2>actual.err &&
 		test_cmp expect.out actual.out &&
@@ -59,9 +58,9 @@ test_expect_success 'ls-files -o' '
 		cd top/sub &&
 		for f in ../x*
 		do
-			echo "error: pathspec $sq$f$sq did not match any file(s) known to git"
+			echo "error: pathspec $SQ$f$SQ did not match any file(s) known to git"
 		done >expect.err &&
-		echo "Did you forget to ${sq}git add${sq}?" >>expect.err &&
+		echo "Did you forget to ${SQ}git add${SQ}?" >>expect.err &&
 		ls ../y* >expect.out &&
 		test_must_fail git ls-files -o --error-unmatch ../[xy]* >actual.out 2>actual.err &&
 		test_cmp expect.out actual.out &&
diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index 461dd539ff..9c152b6245 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -1419,7 +1419,6 @@ test_expect_success 'editor saves as CR/LF' '
 	)
 '
 
-SQ="'"
 test_expect_success 'rebase -i --gpg-sign=<key-id>' '
 	test_when_finished "test_might_fail git rebase --abort" &&
 	set_fake_editor &&
diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
index 7b6c4847ad..11141ac864 100755
--- a/t/t3430-rebase-merges.sh
+++ b/t/t3430-rebase-merges.sh
@@ -151,7 +151,6 @@ test_expect_success 'failed `merge -C` writes patch (may be rescheduled, too)' '
 	test_path_is_file .git/rebase-merge/patch
 '
 
-SQ="'"
 test_expect_success 'failed `merge <branch>` does not crash' '
 	test_when_finished "test_might_fail git rebase --abort" &&
 	git checkout conflicting-G &&
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 4a3b901f06..3be025a658 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -434,7 +434,6 @@ test_expect_success 'double quoted plink.exe in GIT_SSH_COMMAND' '
 	expect_ssh "-v -P 123" myhost src
 '
 
-SQ="'"
 test_expect_success 'single quoted plink.exe in GIT_SSH_COMMAND' '
 	copy_ssh_wrapper_as "$TRASH_DIRECTORY/plink.exe" &&
 	GIT_SSH_COMMAND="$SQ$TRASH_DIRECTORY/plink.exe$SQ -v" \
diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index c973278300..df34c994d2 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -158,7 +158,6 @@ test_expect_success 'submodule update --init from and of subdirectory' '
 	test_i18ncmp expect2 actual2
 '
 
-apos="'";
 test_expect_success 'submodule update does not fetch already present commits' '
 	(cd submodule &&
 	  echo line3 >> file &&
@@ -168,7 +167,7 @@ test_expect_success 'submodule update does not fetch already present commits' '
 	) &&
 	(cd super/submodule &&
 	  head=$(git rev-parse --verify HEAD) &&
-	  echo "Submodule path ${apos}submodule$apos: checked out $apos$head$apos" > ../../expected &&
+	  echo "Submodule path ${SQ}submodule$SQ: checked out $SQ$head$SQ" > ../../expected &&
 	  git reset --hard HEAD~1
 	) &&
 	(cd super &&
diff --git a/t/test-lib.sh b/t/test-lib.sh
index a9d45642a5..ee602c4d9c 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -509,6 +509,9 @@ EMPTY_BLOB=e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
 LF='
 '
 
+# Single quote
+SQ=\'
+
 # UTF-8 ZERO WIDTH NON-JOINER, which HFS+ ignores
 # when case-folding filenames
 u200c=$(printf '\342\200\214')
-- 
2.23.0.37.g745f681289


^ permalink raw reply related	[relevance 1%]

* Re: [PATCH] t: use LF variable defined in the test harness
  2019-09-05 18:47  6%     ` Jeff King
@ 2019-09-05 19:34  6%       ` Junio C Hamano
  2019-09-05 22:10  1%         ` [PATCH] t: use common $SQ variable Denton Liu
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2019-09-05 19:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git

Jeff King <peff@peff.net> writes:

> On Thu, Sep 05, 2019 at 11:17:57AM -0700, Junio C Hamano wrote:
>
>> Somebody may want to go clean-up the use of various $sq and $SQ
>> locally defined by giving a unified $SQ in test-lib.sh, by the way.
>
> Maybe good #leftoverbits material, since we may have Outreachy
> applications coming up soon.

OK, then I'd refrain from doing it as a lunchtime hack myself ;-)

 * Find sq=, $sq and ${sq} case insensitively in t/.  If there is
   any use of $SQ that does not want a single quote in it, abort
   the whole thing.  Otherwise proceed.

 * Introduce an assignment SQ=\' in t/test-lib.sh, next to where LF
   is assigned to.  Replace all uses you found in #1 with reference
   to $SQ.

#leftoverbits.
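
The second step can be sketched in isolation. This is only an illustration: it assumes that a test body is a single-quoted string which the harness evaluates later, so the body cannot contain a literal single quote; the variable names `body` and `msg` are made up for the example and are not part of the test suite.

```shell
# Single quote, defined once (the proposal places this in t/test-lib.sh).
SQ=\'

# The body is single-quoted, so ${SQ} stays unexpanded until it is evaluated.
body='msg="fatal: no such branch: ${SQ}no-such-branch${SQ}"'

# Evaluating the body expands ${SQ} into a literal single quote.
eval "$body"
printf '%s\n' "$msg"
```

With a shared definition, individual test scripts no longer need their own `sq="'"`, `SQ="'"` or `apos="'"` assignments.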

^ permalink raw reply	[relevance 6%]

* Re: [PATCH] t: use LF variable defined in the test harness
  @ 2019-09-05 18:47  6%     ` Jeff King
  2019-09-05 19:34  6%       ` Junio C Hamano
  0 siblings, 1 reply; 200+ results
From: Jeff King @ 2019-09-05 18:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git

On Thu, Sep 05, 2019 at 11:17:57AM -0700, Junio C Hamano wrote:

> Somebody may want to go clean-up the use of various $sq and $SQ
> locally defined by giving a unified $SQ in test-lib.sh, by the way.

Maybe good #leftoverbits material, since we may have Outreachy
applications coming up soon.

-Peff

^ permalink raw reply	[relevance 6%]

* Re: [RFC PATCH 0/1] commit-graph.c: handle corrupt commit trees
  2019-09-04 21:21  6%   ` Taylor Blau
@ 2019-09-05  6:08  0%     ` Jeff King
  2019-09-06 16:48  0%     ` Derrick Stolee
  1 sibling, 0 replies; 200+ results
From: Jeff King @ 2019-09-05  6:08 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Garima Singh, stolee, git

On Wed, Sep 04, 2019 at 05:21:21PM -0400, Taylor Blau wrote:

> > I like the idea of completely bailing if the commit can't be parsed too.
> > Only question: Is there a reason you chose to die() instead of BUG() like
> > the other two places in that function? What are the criteria for choosing one
> > over the other?
> 
> I did not call 'BUG' here because 'BUG' is traditionally used to
> indicate an internal bug, e.g., an unexpected state or some such. On the
> other side of that coin, 'BUG' is _not_ used to indicate repository
> corruption, since that is not an issue in the Git codebase, rather in
> the user's repository.
> 
> Though, to be honest, I've never seen that rule written out explicitly
> (maybe if it were to be written somewhere, it could be stored in
> Documentation/CodingGuidelines?). I think that this is some good
> #leftoverbits material.

That rule matches my understanding. A BUG() should be about asserting
invariants or catching should-not-happen cases, etc. Any time a BUG()
triggers, that is truly a bug in Git, no matter what input got thrown at
it, what syscalls failed, etc, and is worth fixing (even if the only
sensible thing is to die()).

As a side note, we've generally treated segfaults the same way. It
doesn't matter if the files on disk or the program input is garbage, we
should say so and abort the operation cleanly.

-Peff

^ permalink raw reply	[relevance 0%]

* Re: [RFC PATCH 0/1] commit-graph.c: handle corrupt commit trees
  @ 2019-09-04 21:21  6%   ` Taylor Blau
  2019-09-05  6:08  0%     ` Jeff King
  2019-09-06 16:48  0%     ` Derrick Stolee
  0 siblings, 2 replies; 200+ results
From: Taylor Blau @ 2019-09-04 21:21 UTC (permalink / raw)
  To: Garima Singh; +Cc: Taylor Blau, stolee, git, peff

Hi Garima,

On Wed, Sep 04, 2019 at 02:25:55PM -0400, Garima Singh wrote:
>
> On 9/3/2019 10:22 PM, Taylor Blau wrote:
> > Hi,
> >
> > I was running some of the new 'git commit-graph' commands, and noticed
> > that I could consistently get 'git commit-graph write --reachable' to
> > segfault when a commit's root tree is corrupt.
> >
> > I have an extremely-unfinished fix attached as an RFC PATCH below, but I
> > wanted to get a few thoughts on this before sending it out as a non-RFC.
> >
> > In my patch, I simply 'die()' when a commit isn't able to be parsed
> > (i.e., when 'parse_commit_no_graph' returns a non-zero code), but I
> > wanted to see if others thought that this was an OK approach. Some
> > thoughts:
>
> I like the idea of completely bailing if the commit can't be parsed too.
> Only question: Is there a reason you chose to die() instead of BUG() like
> the other two places in that function? What are the criteria for choosing one
> over the other?

I did not call 'BUG' here because 'BUG' is traditionally used to
indicate an internal bug, e.g., an unexpected state or some such. On the
other side of that coin, 'BUG' is _not_ used to indicate repository
corruption, since that is not an issue in the Git codebase, rather in
the user's repository.

Though, to be honest, I've never seen that rule written out explicitly
(maybe if it were to be written somewhere, it could be stored in
Documentation/CodingGuidelines?). I think that this is some good
#leftoverbits material.

> >
> >    * It seems like we could write a commit-graph by placing a "filler"
> >      entry where the broken commit would have gone. I don't see any place
> >      where this is implemented currently, but this seems like a viable
> >      alternative to not writing _any_ commits into the commit-graph.
>
> I would rather we didn't do this cause it will probably kick open the can of
> always watching for that filler when we are working with the commit-graph.
> Or do we already do that today? Maybe @stolee can chime in on what we do in
> cases of shallow clones and other potential gaps in the walk

Yeah, I think that the consensus is that it makes sense to just die
here, which is fine by me.

> -Garima

Thanks,
Taylor

^ permalink raw reply	[relevance 6%]

* Re: Git in Outreachy December 2019?
  @ 2019-09-04 19:41  6% ` Jeff King
  2019-09-08 14:56  0%   ` Pratyush Yadav
  2019-09-23 18:07  0%   ` SZEDER Gábor
  0 siblings, 2 replies; 200+ results
From: Jeff King @ 2019-09-04 19:41 UTC (permalink / raw)
  To: git
  Cc: Olga Telezhnaya, Christian Couder, Elijah Newren, Thomas Gummerer,
	Matheus Tavares Bernardino

On Tue, Aug 27, 2019 at 01:17:57AM -0400, Jeff King wrote:

> Do we have interested mentors for the next round of Outreachy?
> 
> The deadline for Git to apply to the program is September 5th. The
> deadline for mentors to have submitted project descriptions is September
> 24th. Intern applications would start on October 1st.
> 
> If there are mentors who want to participate, I can handle the project
> application and can start asking around for funding.

Funding is still up in the air, but in the meantime I've tentatively
signed us up (we have until the 24th to have the funding committed).
Next we need mentors to submit projects, as well as first-time
contribution micro-projects.

Project proposals can be made here:

  https://www.outreachy.org/communities/cfp/git/

If you want to know more about the program, there's a mentor FAQ here:

  https://www.outreachy.org/mentor/mentor-faq/

or just ask in this thread.

The project page has a section to point people in the right direction
for first-time contributions. I've left it blank for now, but I think it
makes sense to point one (or both) of:

  - https://git-scm.com/docs/MyFirstContribution

  - https://matheustavares.gitlab.io/posts/first-steps-contributing-to-git

as well as a list of micro-projects (or at least instructions on how to
find #leftoverbits, though we'd definitely have to step up our labeling,
as I do not recall having seen one for a while).

-Peff

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v2 20/23] .gitignore: touch up the entries regarding Visual Studio
  @ 2019-08-28 11:34  6%               ` Johannes Schindelin
  0 siblings, 0 replies; 200+ results
From: Johannes Schindelin @ 2019-08-28 11:34 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Philip Oakley, Philip Oakley via GitGitGadget, git,
	Junio C Hamano, Cesar Eduardo Barros

[-- Attachment #1: Type: text/plain, Size: 2543 bytes --]

Hi Gábor,

On Mon, 26 Aug 2019, SZEDER Gábor wrote:

> On Sun, Aug 25, 2019 at 11:21:23PM +0100, Philip Oakley wrote:
> > >>>>diff --git a/.gitignore b/.gitignore
> > >>>>index e096e0a51c..e7bb15d301 100644
> > >>>>--- a/.gitignore
> > >>>>+++ b/.gitignore
> > >>>>@@ -230,6 +230,7 @@
> > >>>>  *.ipdb
> > >>>>  *.dll
> > >>>>  .vs/
> > >>>>-/Debug/
> > >>>>-/Release/
> > >>>>+*.manifest
> > >>>This new line ignores the tracked file 'compat/win32/git.manifest'
> > >>>that was added fairly recently in fe90397604 (mingw: embed a manifest
> > >>>to trick UAC into Doing The Right Thing, 2019-06-27).
> > >>>
> > >>>I wonder whether that's intentional or accidental.
> > >>>
> > >>>I'm inclined to think that it's merely accidental, because, as far as
> > >>>I understand, this is an old-ish patch from times when there wasn't
> > >>>any 'git.manifest' file in tree, and simply no one noticed that in the
> > >>>meantime we got one.  But I have no idea about how a Git build with
> > >>>Visual Studio is supposed to work, so it doesn't really matter what
> > >>>I'm inclined to think :)
> > >>>
> > >>At the time, it was just one of the many non-source files that were
> > >>generated by Visual Studio that cluttered the status list and also could
> > >>accidentally added to the tracked files.
> > >>
> > >>The newly added .manifest file does appear to be there to 'trick' the
> > >>Windows User Access Control (UAC) which otherwise can be an annoyance to
> > >>'regular' users.
> > >Sorry, I'm not sure how to interpret your reply, and can't decide
> > >whether it tries to justify why that tracked file should be ignored,
> > >or explains that ignoring it was accidental.
> > >
> > >Anyway, ignoring that tracked file apparently triggered a nested
> > >worktree-related bug in 'git clean', which can lead to data loss:
> > >
> > >https://public-inbox.org/git/20190825185918.3909-1-szeder.dev@gmail.com/
> > >
> > Basically manifests are a build artefact from Visual Studio [1], so it was
> > just another file to be ignored, from a _source_ control viewpoint.
>
> I understand that manifest files, in general, are build artifacts.
> But does Visual Studio overwrite the existing
> 'compat/win32/git.manifest' file in particular?  Yes or no? :)

No.

The reason this entry was there: at least _some_ Visual Studio versions
(IIRC) auto-generate `.manifest` files when the project does not have
any. But now we do. So this line's gotta go.
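
The effect of the unanchored pattern can be checked in a scratch repository. A minimal sketch, assuming `git` is on PATH; the file here is created untracked purely for illustration, whereas in git.git it is tracked.

```shell
set -e

# Scratch repository containing only the path under discussion.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p compat/win32
: >compat/win32/git.manifest

# The pattern from the patch: no leading slash, so it matches at any depth.
printf '%s\n' '*.manifest' >.gitignore

# -v reports which ignore file, line and pattern matched the path.
match=$(git check-ignore -v compat/win32/git.manifest)
printf '%s\n' "$match"
```

The reported source is line 1 of `.gitignore`, i.e. the `*.manifest` entry.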

#leftoverbits ?

Ciao,
Dscho

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v3 13/13] format-patch: learn --infer-cover-subject option
  @ 2019-08-23 18:46  6%         ` Philip Oakley
  0 siblings, 0 replies; 200+ results
From: Philip Oakley @ 2019-08-23 18:46 UTC (permalink / raw)
  To: Denton Liu, Junio C Hamano
  Cc: Git Mailing List, Ævar Arnfjörð Bjarmason,
	Eric Sunshine



On 23/08/2019 19:15, Denton Liu wrote:
>> Having said that, I suspect that in the longer term, people would
>> want to see this new behaviour with a bit of tweak become the new
>> default.
>>
>> The "tweak" I suspect is needed is to behave sensibly when "the
>> first line" ends up to be too long a subject.  Whether we make this
>> the new default or keep this optional, the issue exists either way.
> The reason why I chose to make this an "opt-in" option was because there
> currently doesn't exist a standard on how to write branch descriptions
> like there does for commit messages (i.e. subject then body, subject
> less than x characters). However, against best practices, some
> developers like to have really long subjects. As a result, there's no
> "real" way of telling whether the first paragraph is a long subject or a
> short paragraph.
>
> As a result, we should allow the cover subject to be read from the
> branch description only if the developer explicitly chooses this (either
> with `--infer-cover-subject` or the config option). This way, we won't have
> to deal with the ambiguity of deciding whether or not the first
> paragraph is truly a subject and stepping on users' toes if we end up
> deciding wrong.
>
> Thoughts?
Perhaps `--infer-cover-subject` or the config option needs to be
multi-valued to include:
      "subject" (always expect short first lines) or
      "message" (always the long paragraph description, still use
      ***Subject Here***),
with "true" being used when expecting both, as previously described.

-- 
Philip

As an aside, for format-patch to learn a --branch-version option that 
creates a branch with the '-vN' suffix to the current branch when the 
-vN option is used would be a useful addition (as long as the formatted 
refs are first parent to the current branch). #todo list #leftoverbits


^ permalink raw reply	[relevance 6%]

* RE: -EXT-Re: Problem with git diff
  2019-07-09 23:13  5% ` Elijah Newren
@ 2019-07-09 23:26  5%   ` McRoberts, John
  2019-07-09 23:29  0%   ` Bryan Turner
  1 sibling, 0 replies; 200+ results
From: McRoberts, John @ 2019-07-09 23:26 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 4823 bytes --]

Actually, I was using git log (not git diff; sorry for the mistake)
because I also mine other information (dates, author, summary, etc.):

'git log -m --first-parent --pretty=fuller --decorate=short --name-only REL1..REL2'

That being said, I can work with the git diff output to filter out the
irrelevant info.  I would still be curious about whether this is a valid use
of git log.
Thanks,

Jack McRoberts
Configuration Management Software Specialist
General Atomics EMS Group
Work Phone: 858-522-8342
16969 Mesamint St, Room 86-1023G,
San Diego, CA  92127

-----Original Message-----
From: Elijah Newren <newren@gmail.com>
Sent: Tuesday, July 9, 2019 4:13 PM
To: McRoberts, John <John.McRoberts@ga.com>
Cc: git@vger.kernel.org
Subject: -EXT-Re: Problem with git diff


Hi John,

On Tue, Jul 9, 2019 at 3:57 PM McRoberts, John <John.McRoberts@ga.com> wrote:
>
> I am responsible for generating a list of all files changed between
> two successive releases of software. I was using 'git diff' but have
> run into a problem.
>
> Consider the following situation: A development branch comes off of
> commit A and files are changed three times.  A tag (REL1) is placed on
> the third commit.  Then the branch is merged back to master.  At this
> point, master's HEAD is at C (and it remains there).  Two development
> branches are created off of master, the first of which is not
> important here.  In the second one, there are files changed and a tag (REL2) 
> applied.
>
>
>                                                       /---------[I]
>                                                      /   {dev branch}
>                       {master branch}               /
> [A]-------------[B]----------------------------->[C]  master <HEAD>
>   \         filelist 6                          /   \
>    \                                           /     \
>     \        {development branch}             /       \
>      \------>[D]--------->[E]--------->[F]---/         \------[G]--------[H]
>                                        REL1                              REL2
>           fileset 1    fileset 2    fileset 3              fileset 4    fileset 5
>
> At this point, I run
>     'git diff  -m --first-parent --pretty=fuller --decorate=short
> --name-only REL2..REL2'

Wow, we really, really need to throw errors and warnings when people use crazy 
range operators with diff.[1][2]  What version of git are you using that 
accepts --decorate=short as an argument to `git diff`?
And why in the world does git diff accept --first-parent or --pretty=fuller?!? 
That's insane for git-diff to swallow that.
(#leftoverbits?)  Also, I think you meant `REL1` one of the two times you 
wrote `REL2`, which makes me suspect you may have done some copy-edit-paste 
and didn't try this actual command.

> I expect to see only filesets 4 and 5 listed.  I also see filesets 1,
> 2 and 3 showing up.  This means that the git diff command is showing
> files that, in fact, did not change between the two tags.  By the way,
> I verified with a file by file comparison that under REL2 and REL1,
> the files represented by filesets 1, 2 and 3 had identical contents.

From your description, I assume you actually ran something like
  git diff --name-only REL1..REL2

which compares REL2 to the merge base of REL1 and REL2 (yes, this is totally 
counter-intuitive to a large percentage of the git userbase, but it is well 
documented and hard to change).  Also from your description, what you seem to 
want is
  git diff --name-only REL1 REL2

since you want to compare the two endpoints.  Does that help get what you 
want?

Hope that helps,
Elijah

[1] 
https://public-inbox.org/git/CABPp-BECj___HneAYviE3SB=wU6OTcBi3S=+Un1sP6L4WJ7agA@mail.gmail.com/
[2] 
https://public-inbox.org/git/CABPp-BGg_iSx3QMc-J4Fov97v9NnAtfxZGMrm3WfrGugOThjmA@mail.gmail.com/

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 7316 bytes --]

^ permalink raw reply	[relevance 5%]

* Re: Problem with git diff
  2019-07-09 23:29  0%   ` Bryan Turner
@ 2019-07-09 23:35  0%     ` Elijah Newren
  0 siblings, 0 replies; 200+ results
From: Elijah Newren @ 2019-07-09 23:35 UTC (permalink / raw)
  To: Bryan Turner; +Cc: McRoberts, John, git@vger.kernel.org

On Tue, Jul 9, 2019 at 4:30 PM Bryan Turner <bturner@atlassian.com> wrote:
>
> On Tue, Jul 9, 2019 at 4:13 PM Elijah Newren <newren@gmail.com> wrote:
> >
> > Hi John,
> >
> > On Tue, Jul 9, 2019 at 3:57 PM McRoberts, John <John.McRoberts@ga.com> wrote:
> > >
> > > I am responsible for generating a list of all files changed between two
> > > successive releases of software. I was using 'git diff' but have run into a
> > > problem.
> > >
> > > Consider the following situation: A development branch comes off of commit A
> > > and files are changed three times.  A tag (REL1) is placed on the third
> > > commit.  Then the branch is merged back to master.  At this point, master's
> > > HEAD is at C (and it remains there).  Two development branches are created
> > > off of master, the first of which is not important here.  In the second one,
> > > there are files changed and a tag (REL2) applied.
> > >
> > >
> > > >                                           -----[I]  {dev branch}
> > > >                                          /
> > > >              {master branch}            /
> > > >  [A]-----------[B]------------------>[C]  master <HEAD>
> > > >    \       filelist 6                 /  \
> > > >     \                                /    \
> > > >      \    {development branch}      /      \
> > > >       \--->[D]------->[E]------->[F]        \---[G]------[H]
> > > >                                 REL1                    REL2
> > > >         fileset 1  fileset 2  fileset 3      fileset 4  fileset 5
> > >
> > > At this point, I run
> > >     'git diff  -m --first-parent --pretty=fuller --decorate=short
> > > --name-only REL2..REL2'
> >
> > Wow, we really, really need to throw errors and warnings when people
> > use crazy range operators with diff.[1][2]  What version of git are
> > you using that accepts --decorate=short as an argument to `git diff`?
> > And why in the world does git diff accept --first-parent or
> > --pretty=fuller?!?  That's insane for git-diff to swallow that.
> > (#leftoverbits?)  Also, I think you meant `REL1` one of the two times
> > you wrote `REL2`, which makes me suspect you may have done some
> > copy-edit-paste and didn't try this actual command.
> >
> > > I expect to see only filesets 4 and 5 listed.  I also see filesets 1, 2 and
> > > > 3 showing up.  This means that the git diff command is showing files that, in
> > > fact, did not change between the two tags.  By the way, I verified with a
> > > file by file comparison that under REL2 and REL1, the files represented by
> > > filesets 1, 2 and 3 had identical contents.
> >
> > From your description, I assume you actually ran something like
> >   git diff --name-only REL1..REL2
>
> Did you mean REL1...REL2 (3 dots)? 2 dots (REL1..REL2) is identical to
> no dots (REL1 REL2), per the documentation for "git diff":
>
>        git diff [<options>] <commit> <commit> [--] [<path>...]
>
>            This is to view the changes between two arbitrary <commit>.
>
>        git diff [<options>] <commit>..<commit> [--] [<path>...]
>
>            This is synonymous to the previous form. If <commit> on one
> side is omitted, it will have the same effect
>            as using HEAD instead.
>
> (Forgive me if I'm mistaken here!)

Yes, thanks.  In trying to explain how two and three dots behave
contrary to expectation for git diff, I messed up two versus three dots.
That's kind of embarrassing...

> > which compares REL2 to the merge base of REL1 and REL2 (yes, this is
> > totally counter-intuitive to a large percentage of the git userbase,
> > but it is well documented and hard to change).  Also from your
> > description, what you seem to want is
> >   git diff --name-only REL1 REL2

...though at least I was smart enough to suggest something without
dots, which is the only sane way to use git-diff.  ;-)
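
To make the two-dot versus three-dot distinction for git diff concrete,
here is a minimal sketch on a throwaway repository.  Only the REL1/REL2
tag names come from this thread; the branch name "other" and the files
base.txt, old.txt and new.txt are made up for illustration.  The two
tags are placed on diverging lines of history (which is not John's
exact topology) so that the merge base differs from REL1.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Demo User"
git config user.email demo@example.com

# Common base commit.
echo base > base.txt
git add base.txt
git commit -q -m base
base=$(git rev-parse HEAD)

# First line of history, tagged REL1.
echo old > old.txt
git add old.txt
git commit -q -m "add old.txt"
git tag REL1

# Second line of history, forked from the base commit, tagged REL2.
git checkout -q -b other "$base"
echo new > new.txt
git add new.txt
git commit -q -m "add new.txt"
git tag REL2

no_dots=$(git diff --name-only REL1 REL2)       # endpoint vs. endpoint
two_dots=$(git diff --name-only REL1..REL2)     # synonym for the no-dot form
three_dots=$(git diff --name-only REL1...REL2)  # merge base vs. REL2

printf 'no dots:\n%s\n\ntwo dots:\n%s\n\nthree dots:\n%s\n' \
    "$no_dots" "$two_dots" "$three_dots"
```

In this setup the no-dot and two-dot forms both list new.txt and
old.txt, while the three-dot form lists only new.txt.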

^ permalink raw reply	[relevance 0%]

* Re: Problem with git diff
  2019-07-09 23:13  5% ` Elijah Newren
  2019-07-09 23:26  5%   ` -EXT-Re: " McRoberts, John
@ 2019-07-09 23:29  0%   ` Bryan Turner
  2019-07-09 23:35  0%     ` Elijah Newren
  1 sibling, 1 reply; 200+ results
From: Bryan Turner @ 2019-07-09 23:29 UTC (permalink / raw)
  To: Elijah Newren; +Cc: McRoberts, John, git@vger.kernel.org

On Tue, Jul 9, 2019 at 4:13 PM Elijah Newren <newren@gmail.com> wrote:
>
> Hi John,
>
> On Tue, Jul 9, 2019 at 3:57 PM McRoberts, John <John.McRoberts@ga.com> wrote:
> >
> > I am responsible for generating a list of all files changed between two
> > successive releases of software. I was using 'git diff' but have run into a
> > problem.
> >
> > Consider the following situation: A development branch comes off of commit A
> > and files are changed three times.  A tag (REL1) is placed on the third
> > commit.  Then the branch is merged back to master.  At this point, master's
> > HEAD is at C (and it remains there).  Two development branches are created
> > off of master, the first of which is not important here.  In the second one,
> > there are files changed and a tag (REL2) applied.
> >
> >
> > >                                           -----[I]  {dev branch}
> > >                                          /
> > >              {master branch}            /
> > >  [A]-----------[B]------------------>[C]  master <HEAD>
> > >    \       filelist 6                 /  \
> > >     \                                /    \
> > >      \    {development branch}      /      \
> > >       \--->[D]------->[E]------->[F]        \---[G]------[H]
> > >                                 REL1                    REL2
> > >         fileset 1  fileset 2  fileset 3      fileset 4  fileset 5
> >
> > At this point, I run
> >     'git diff  -m --first-parent --pretty=fuller --decorate=short
> > --name-only REL2..REL2'
>
> Wow, we really, really need to throw errors and warnings when people
> use crazy range operators with diff.[1][2]  What version of git are
> you using that accepts --decorate=short as an argument to `git diff`?
> And why in the world does git diff accept --first-parent or
> --pretty=fuller?!?  That's insane for git-diff to swallow that.
> (#leftoverbits?)  Also, I think you meant `REL1` one of the two times
> you wrote `REL2`, which makes me suspect you may have done some
> copy-edit-paste and didn't try this actual command.
>
> > I expect to see only filesets 4 and 5 listed.  I also see filesets 1, 2 and
> > > 3 showing up.  This means that the git diff command is showing files that, in
> > fact, did not change between the two tags.  By the way, I verified with a
> > file by file comparison that under REL2 and REL1, the files represented by
> > filesets 1, 2 and 3 had identical contents.
>
> From your description, I assume you actually ran something like
>   git diff --name-only REL1..REL2

Did you mean REL1...REL2 (3 dots)? 2 dots (REL1..REL2) is identical to
no dots (REL1 REL2), per the documentation for "git diff":

       git diff [<options>] <commit> <commit> [--] [<path>...]

           This is to view the changes between two arbitrary <commit>.

       git diff [<options>] <commit>..<commit> [--] [<path>...]

           This is synonymous to the previous form. If <commit> on one
side is omitted, it will have the same effect
           as using HEAD instead.

(Forgive me if I'm mistaken here!)

>
> which compares REL2 to the merge base of REL1 and REL2 (yes, this is
> totally counter-intuitive to a large percentage of the git userbase,
> but it is well documented and hard to change).  Also from your
> description, what you seem to want is
>   git diff --name-only REL1 REL2
>
> since you want to compare the two endpoints.  Does that help get what you want?
>
> Hope that helps,
> Elijah
>
> [1] https://public-inbox.org/git/CABPp-BECj___HneAYviE3SB=wU6OTcBi3S=+Un1sP6L4WJ7agA@mail.gmail.com/
> [2] https://public-inbox.org/git/CABPp-BGg_iSx3QMc-J4Fov97v9NnAtfxZGMrm3WfrGugOThjmA@mail.gmail.com/

^ permalink raw reply	[relevance 0%]

* Re: Problem with git diff
  @ 2019-07-09 23:13  5% ` Elijah Newren
  2019-07-09 23:26  5%   ` -EXT-Re: " McRoberts, John
  2019-07-09 23:29  0%   ` Bryan Turner
  0 siblings, 2 replies; 200+ results
From: Elijah Newren @ 2019-07-09 23:13 UTC (permalink / raw)
  To: McRoberts, John; +Cc: git@vger.kernel.org

Hi John,

On Tue, Jul 9, 2019 at 3:57 PM McRoberts, John <John.McRoberts@ga.com> wrote:
>
> I am responsible for generating a list of all files changed between two
> successive releases of software. I was using 'git diff' but have run into a
> problem.
>
> Consider the following situation: A development branch comes off of commit A
> and files are changed three times.  A tag (REL1) is placed on the third
> commit.  Then the branch is merged back to master.  At this point, master's
> HEAD is at C (and it remains there).  Two development branches are created
> off of master, the first of which is not important here.  In the second one,
> there are files changed and a tag (REL2) applied.
>
>
>                                           -----[I]  {dev branch}
>                                          /
>              {master branch}            /
>  [A]-----------[B]------------------>[C]  master <HEAD>
>    \       filelist 6                 /  \
>     \                                /    \
>      \    {development branch}      /      \
>       \--->[D]------->[E]------->[F]        \---[G]------[H]
>                                 REL1                    REL2
>         fileset 1  fileset 2  fileset 3      fileset 4  fileset 5
>
> At this point, I run
>     'git diff  -m --first-parent --pretty=fuller --decorate=short
> --name-only REL2..REL2'

Wow, we really, really need to throw errors and warnings when people
use crazy range operators with diff.[1][2]  What version of git are
you using that accepts --decorate=short as an argument to `git diff`?
And why in the world does git diff accept --first-parent or
--pretty=fuller?!?  That's insane for git-diff to swallow that.
(#leftoverbits?)  Also, I think you meant `REL1` one of the two times
you wrote `REL2`, which makes me suspect you may have done some
copy-edit-paste and didn't try this actual command.

> I expect to see only filesets 4 and 5 listed.  I also see filesets 1, 2 and
> 3 showing up.  This means that the git diff command is showing files that, in
> fact, did not change between the two tags.  By the way, I verified with a
> file by file comparison that under REL2 and REL1, the files represented by
> filesets 1, 2 and 3 had identical contents.

From your description, I assume you actually ran something like
  git diff --name-only REL1..REL2

which compares REL2 to the merge base of REL1 and REL2 (yes, this is
totally counter-intuitive to a large percentage of the git userbase,
but it is well documented and hard to change).  Also from your
description, what you seem to want is
  git diff --name-only REL1 REL2

since you want to compare the two endpoints.  Does that help get what you want?

Hope that helps,
Elijah

[1] https://public-inbox.org/git/CABPp-BECj___HneAYviE3SB=wU6OTcBi3S=+Un1sP6L4WJ7agA@mail.gmail.com/
[2] https://public-inbox.org/git/CABPp-BGg_iSx3QMc-J4Fov97v9NnAtfxZGMrm3WfrGugOThjmA@mail.gmail.com/

^ permalink raw reply	[relevance 5%]

* Re: [PATCH v3 00/10] grep: move from kwset to optional PCRE v2
  @ 2019-07-02 11:10  5%     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2019-07-02 11:10 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, git-packagers, gitgitgadget, johannes.schindelin, peff,
	sandals, szeder.dev

On Mon, Jul 01 2019, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> This v3 has a new patch (3/10) that I believe fixes the regression on
>> MinGW Johannes noted in
>> https://public-inbox.org/git/nycvar.QRO.7.76.6.1907011515150.44@tvgsbejvaqbjf.bet/
>>
>> As noted in the updated commit message in 10/10 I believe just
>> skipping this test & documenting this in a commit message is the least
>> amount of suck for now. It's really an existing issue with us doing
>> nothing sensible when the log/grep haystack encoding doesn't match the
>> needle encoding supplied via the command line.
>
> Is that quite the case?  If they do not match, not finding the match
> is the right answer, because we are byte-for-byte matching/searching
> IIUC.
>
>> We swept that under the carpet with the kwset backend, but PCRE v2
>> exposes it.
>
> Is it exposing, or just showing the limitation of the rewritten
> implementation where it cannot do byte-for-byte matching/searching
> as we used to be able to?
>
> Without having a way to know what encoding is used on the command
> line, there is no sensible way to reencode them to match the
> haystack encoding (even when it is known), so "you got to feed the
> strings in the same encoding, as we are going to match/search
> byte-for-byte" is the only sensible way to work, given the design
> space, I would think.
>
> Not that it is all that useful to be able to match/search
> byte-for-byte, of course, so I am OK if we punt with these tests,
> but I'd prefer to see us admit we are punting when we do ;-).

I'm guilty as charged in punting this larger encoding issue. As it
pertains to this patch series it unearths an obscure case I think nobody
cares about in practice, and I'd like to move on with the "remove kwset"
optimization.

But I strongly believe that the new behavior with the PCRE v2
optimization is the only sane thing to do, and to the extent we have
anything left to do (#leftoverbits) it's that we should modify git more
generally (aside from string searching) to do the same thing where
appropriate.

Remember, this only happens if the user has set a UTF-8 locale and thus
promised that they're going to give us UTF-8. We then take that promise
and make e.g. "æ" match "Æ" under --ignore-case.

Just falling back on raw byte matching isn't going to cut it, because
then "æ<invalid utf8>" won't match "Æ<same invalid utf8>" under
--ignore-case, and there's other cases like that with matching word
boundaries & other Unicode gotchas.

The best that can be hoped for at that point is some "loose UTF-8"
mode. I see both perl & GNU grep seem to support that (although I'm sure
it falls apart at some point). GNU grep will also die in the same way
that we now die with --perl-regexp (since it also uses PCRE).

I think that's saner, if the user thinks they're feeding us UTF-8 but
they're not I think they'd like to know rather than having the string
matching library fall back.

^ permalink raw reply	[relevance 5%]

* Re: [2.22.0] difftool no longer passes through to git diff if diff.tool is unset
  @ 2019-06-26 18:08  6%   ` Jeff King
  0 siblings, 0 replies; 200+ results
From: Jeff King @ 2019-06-26 18:08 UTC (permalink / raw)
  To: Pugh, Logan; +Cc: git@vger.kernel.org, liu.denton@gmail.com

On Tue, Jun 25, 2019 at 11:09:08PM +0000, Pugh, Logan wrote:

> > Or in your case I suppose even better would just be an
> > option like "--if-not-configured-just-use-regular-diff". Then it would
> > do what you want, without impacting users who do want the interactive
> > setup.
> 
> If such an option was considered I would be in favor of it. Maybe call 
> it "--no-tutorial" or perhaps "--diff-fallback".
> 
> But having fixed my app, I'm content with the status quo too, now.

Yeah, those are definitely better names. :)

I think we're on the same page about a good path forward, then. I don't
plan to work on this myself, but maybe it would be a good #leftoverbits
candidate for somebody wanting to get started on modifying Git.

-Peff

^ permalink raw reply	[relevance 6%]

* Re: [GSoC] Some #leftoverbits for anyone looking for little projects
  2019-05-29  9:38  8% ` Johannes Schindelin
@ 2019-05-29  9:40  8%   ` Johannes Schindelin
  0 siblings, 0 replies; 200+ results
From: Johannes Schindelin @ 2019-05-29  9:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 914 bytes --]

Hi,

On Wed, 29 May 2019, Johannes Schindelin wrote:

> On Sat, 17 Mar 2018, Ævar Arnfjörð Bjarmason wrote:
>
> > In lieu of sending a PR to https://git.github.io/SoC-2018-Microprojects/
> > I thought I'd list a few more suggestions, and hopefully others will
> > chime in.
>
> I am in the same camp, and figured that GitGitGadget (which *already*
> augments the Git mailing list-centric workflow via GitHub's convenient UI)
> would make for a fine location for these small left-over bits. So I added
> them to https://github.com/gitgitgadget/git/issues/234 (except the

Of course, I added them to https://github.com/gitgitgadget/git/issues/,
with #234 being the last from this email.

Ciao,
Dscho

> "git-unpack-*" idea, as I think that should be done as a test helper
> instead, and it should be done in the context of a new test case that
> actually needs this).
>
> Ciao,
> Dscho

^ permalink raw reply	[relevance 8%]

* Re: [GSoC] Some #leftoverbits for anyone looking for little projects
    2019-05-20 18:23  6% ` Matheus Tavares
@ 2019-05-29  9:38  8% ` Johannes Schindelin
  2019-05-29  9:40  8%   ` Johannes Schindelin
  1 sibling, 1 reply; 200+ results
From: Johannes Schindelin @ 2019-05-29  9:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 704 bytes --]

Hi Ævar,

On Sat, 17 Mar 2018, Ævar Arnfjörð Bjarmason wrote:

> In lieu of sending a PR to https://git.github.io/SoC-2018-Microprojects/
> I thought I'd list a few more suggestions, and hopefully others will
> chime in.

I am in the same camp, and figured that GitGitGadget (which *already*
augments the Git mailing list-centric workflow via GitHub's convenient UI)
would make for a fine location for these small left-over bits. So I added
them to https://github.com/gitgitgadget/git/issues/234 (except the
"git-unpack-*" idea, as I think that should be done as a test helper
instead, and it should be done in the context of a new test case that
actually needs this).

Ciao,
Dscho

^ permalink raw reply	[relevance 8%]

* Re: [GSoC] Some #leftoverbits for anyone looking for little projects
  2019-05-28 17:37  8%     ` Matheus Tavares Bernardino
@ 2019-05-28 18:16  8%       ` Johannes Schindelin
  0 siblings, 0 replies; 200+ results
From: Johannes Schindelin @ 2019-05-28 18:16 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: Ævar Arnfjörð Bjarmason, git, Christian Couder,
	Оля Тележная

Hi Matheus,

On Tue, 28 May 2019, Matheus Tavares Bernardino wrote:

> On Tue, May 28, 2019 at 7:37 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Mon, 20 May 2019, Matheus Tavares wrote:
> >
> > > > Give "rebase -i" some option so when you "reword" the patch is
> > > > included in the message.
> > > >
> > > > I keep going to the shell because I have no idea what change I'm
> > > > describing.
> > >
> > > I have the same problem, so I wanted to try solving this. The patch
> > > > below creates a "rebase.verboseCommit" configuration that includes a
> > > diff when rewording or squashing. I'd appreciate knowing your thoughts
> > > on it.
> > >
> > > As Christian wisely pointed out to me, though, we can also achieve this
> > > behavior by setting "commit.verbose" to true. The only "downside" of it
> > > is that users cannot choose to see the diff only when rebasing.
> >
> > You could of course add an alias like
> >
> >         [alias]
> >                 myrebase = -c commit.verbose=true rebase
>
> Hmm, I didn't know about `alias`. Thanks for the information.
>
> > which *should* work.
> >
> > However, I am actually slightly in favor of your patch because it *does*
> > make it more convenient to have this on during rebases only.
>
> Another option we were discussing is to document that rebase obeys all
> commit.* options, instead of adding the rebase.verboseCommit config.
> Yes, this way we won't be able to toggle diff for rebase only, but I'm
> not sure if that's something users would want to do...

It is rather unintuitive that the `commit.*` options apply to a rebase.
Sure, you could document it. But realistically, how many users will read
it? Yes, I agree, that is a very low percentage.

Also: you yourself mentioned the rather convincing use case of `reword`.

Personally, I never really thought that I'd need `commit.verbose`. But
your report made me think that I could use it *just* for `reword`, too.

So now you already have two active Git contributors wishing for that
feature.

Ciao,
Dscho

^ permalink raw reply	[relevance 8%]

* Re: [GSoC] Some #leftoverbits for anyone looking for little projects
  2019-05-28 10:37  8%   ` Johannes Schindelin
@ 2019-05-28 17:37  8%     ` Matheus Tavares Bernardino
  2019-05-28 18:16  8%       ` Johannes Schindelin
  0 siblings, 1 reply; 200+ results
From: Matheus Tavares Bernardino @ 2019-05-28 17:37 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Ævar Arnfjörð Bjarmason, git, Christian Couder,
	Оля Тележная

On Tue, May 28, 2019 at 7:37 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Matheus,
>
> On Mon, 20 May 2019, Matheus Tavares wrote:
>
> > > Give "rebase -i" some option so when you "reword" the patch is
> > > included in the message.
> > >
> > > I keep going to the shell because I have no idea what change I'm
> > > describing.
> >
> > I have the same problem, so I wanted to try solving this. The patch
> > below creates a "rebase.verboseCommit" configuration that includes a
> > diff when rewording or squashing. I'd appreciate knowing your thoughts
> > on it.
> >
> > As Christian wisely pointed out to me, though, we can also achieve this
> > behavior by setting "commit.verbose" to true. The only "downside" of it
> > is that users cannot choose to see the diff only when rebasing.
>
> You could of course add an alias like
>
>         [alias]
>                 myrebase = -c commit.verbose=true rebase

Hmm, I didn't know about `alias`. Thanks for the information.

> which *should* work.
>
> However, I am actually slightly in favor of your patch because it *does*
> make it more convenient to have this on during rebases only.

Another option we were discussing is to document that rebase obeys all
commit.* options, instead of adding the rebase.verboseCommit config.
Yes, this way we won't be able to toggle diff for rebase only, but I'm
not sure if that's something users would want to do...
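
For reference, the effect of commit.verbose on the message template can
be sketched outside of a rebase.  This assumes that "reword" ends up in
the same commit-message editor path as "git commit --amend", which is
what the discussion above implies but which this sketch does not itself
verify.  The stand-in editor script save-template.sh and the
SAVED_TEMPLATE variable are hypothetical helpers, not part of Git.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Demo User"
git config user.email demo@example.com

echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two >> file.txt
git commit -q -a -m "second"

# Stand-in "editor": copies the message template it is handed to
# $SAVED_TEMPLATE and leaves the message itself untouched.
cat > "$work/save-template.sh" <<'EOF'
#!/bin/sh
cp "$1" "$SAVED_TEMPLATE"
EOF

SAVED_TEMPLATE="$work/template.txt"
export SAVED_TEMPLATE
GIT_EDITOR="sh $work/save-template.sh" \
    git -c commit.verbose=true commit -q --amend

# With commit.verbose enabled, the template should carry the patch.
if grep -q '^diff --git' "$SAVED_TEMPLATE"; then
    has_diff=yes
else
    has_diff=no
fi
echo "template contains a diff: $has_diff"
```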

> Ciao,
> Dscho

^ permalink raw reply	[relevance 8%]

* Re: [GSoC] Some #leftoverbits for anyone looking for little projects
  2019-05-20 18:23  6% ` Matheus Tavares
  2019-05-20 23:49  8%   ` Ævar Arnfjörð Bjarmason
@ 2019-05-28 10:37  8%   ` Johannes Schindelin
  2019-05-28 17:37  8%     ` Matheus Tavares Bernardino
  1 sibling, 1 reply; 200+ results
From: Johannes Schindelin @ 2019-05-28 10:37 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: avarab, git, Christian Couder,
	Оля Тележная

Hi Matheus,

On Mon, 20 May 2019, Matheus Tavares wrote:

> > Give "rebase -i" some option so when you "reword" the patch is
> > included in the message.
> >
> > I keep going to the shell because I have no idea what change I'm
> > describing.
>
> I have the same problem, so I wanted to try solving this. The patch
> below creates a "rebase.verboseCommit" configuration that includes a
> diff when rewording or squashing. I'd appreciate knowing your thoughts
> on it.
>
> As Christian wisely pointed out to me, though, we can also achieve this
> behavior by setting "commit.verbose" to true. The only "downside" of it
> is that users cannot choose to see the diff only when rebasing.

You could of course add an alias like

	[alias]
		myrebase = -c commit.verbose=true rebase

which *should* work.
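
A quick way to check that a leading "-c" in an alias takes effect,
without running a rebase, is sketched below.  The alias name
"showverbose" is hypothetical; it follows the same pattern as
"myrebase" but runs "config --get" so that the injected value is
directly visible.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

# Hypothetical alias: same shape as "myrebase", but it prints the value
# of commit.verbose as seen by the aliased command.
git config alias.showverbose \
    "-c commit.verbose=true config --get commit.verbose"
seen=$(git showverbose)
echo "commit.verbose as seen through the alias: $seen"

# The rebase alias itself, stored in the repository's own config.
git config alias.myrebase "-c commit.verbose=true rebase"
stored=$(git config --get alias.myrebase)
echo "alias.myrebase = $stored"
```

If an alias ever needs more than "-c" (for example environment
changes), the "!git ..." shell form of aliases is the usual fallback.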

However, I am actually slightly in favor of your patch because it *does*
make it more convenient to have this on during rebases only.

Ciao,
Dscho

^ permalink raw reply	[relevance 8%]

* Re: [RFC/PATCH] refs: tone down the dwimmery in refname_match() for {heads,tags,remotes}/*
  2019-05-27 14:29  5%     ` Ævar Arnfjörð Bjarmason
@ 2019-05-27 15:39  0%       ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2019-05-27 15:39 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Paolo Bonzini, git, Linus Torvalds, Linux List Kernel Mailing,
	Radim Krčmář, KVM list, Michael Haggerty

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> It mostly (and I believe always should) works by looking at whether
> "someref" is a named ref, and e.g. looking at whether it's "master". We
> then see that it lives in "refs/heads/master" locally, and thus
> correspondingly add a "refs/heads/" to your <dst> "tags/foo", making it
> "refs/heads/tags/foo".

Yes.

(I am still not up to speed, so pardon me if I am talking nonsense)

> *Or* we take e.g. <some random SHA-1>:master, the <some random...> is
> ambiguous, but we see that "master" unambiguously refers to
> "refs/heads/master" on the remote (so e.g. a refs/tags/master doesn't
> exist). If you had both refs/{heads,tags}/master refs on the remote we'd
> emit:
>
>     error: dst refspec master matches more than one

OK, so you are saying "if the source is unique, try to qualify the
destination to the same hierarchy (i.e. the previous paragraph). If
the source is not a ref (this paragraph), try to find a unique match
with the destination to determine where it should go".  I think that
makes sense.

> (We should improve that error to note what conflicted, #leftoverbits)

OK.

> So your HEAD:tags/for-linus resulted in pushing a HEAD that
> referred to some refs/heads/* to refs/tags/for-linus.  I believe
> that's an unintended emergent effect in how we try to apply these
> two rules. We should apply one, not both in combination.

Are you saying that HEAD is locally dereferenced to a branch name
(if you are not detached when pushing), and "if the source is unique
ref" rule is applied first?  That is not how I recall we designed
this dwimmery.  As we know there is no refs/heads/HEAD, it should be
like pushing HEAD^0:tags/for-linus (i.e. it should behave the same
way as pushing "<some random SHA-1>:tags/for-linus"), without "where
is the source?  let's qualify the destination the same way" rule
kicking in.  And because the repeated "Linus, please pull from that
usual tag for this cycle" request is a norm, "does the destination
uniquely exist at the receiving end" should kick in.  IOW, I think
that is quite a deliberate behaviour that is desirable, or at least
was considered to be desirable when the feature was designed.

>> In my opinion, the bug is that "git request-pull" should warn if the tag
>> is lightweight remotely but not locally, and possibly even vice versa.

Hmm (yes, I realize I am not commenting on what Ævar wrote)...

>>   # create remote lightweight tag and prepare a pull request
>>   git push ../b HEAD:refs/tags/tag1
>>   git request-pull HEAD^ ../b tags/tag1

I do not think lightweight vs annotated should be the issue.  The
tag that the requestor asks to be pulled (from repository ../b)
should be what the requestor has locally when writing the request
(in repository .).  Even if both tags at remote and local are
annotated, we should still warn if they are different objects, no?

Do we run ls-remote or something (or consult a remote-tracking branch)
to see if that is the case in request-pull?
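
A minimal sketch of such a check, reusing the quoted testcase: compare
the object the local tag names with what ls-remote reports for the same
ref on the remote.  This only illustrates the comparison; it is not
what git-request-pull currently does, and the warning text is made up.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/a"
cd "$work/a"
git config user.name "Demo User"
git config user.email demo@example.com

echo a > test
git add test
git commit -q -m initial
git clone -q --bare . ../b

echo b >> test
git commit -q -m second test
git tag -m tag tag1                     # annotated tag, local only

git push -q ../b HEAD:refs/tags/tag1    # lightweight tag on the remote

# Object named by the tag ref on each side (not peeled).
local_oid=$(git rev-parse refs/tags/tag1)
remote_oid=$(git ls-remote ../b refs/tags/tag1 | cut -f1)

if test "$local_oid" != "$remote_oid"; then
    echo "warning: tag1 is a different object locally and remotely" >&2
fi
```

Comparing the unpeeled object names catches both the lightweight versus
annotated case and two different annotated tags.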

^ permalink raw reply	[relevance 0%]

* Re: [RFC/PATCH] refs: tone down the dwimmery in refname_match() for {heads,tags,remotes}/*
  @ 2019-05-27 14:29  5%     ` Ævar Arnfjörð Bjarmason
  2019-05-27 15:39  0%       ` Junio C Hamano
  0 siblings, 1 reply; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2019-05-27 14:29 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: git, Linus Torvalds, Junio C Hamano, Linux List Kernel Mailing,
	Radim Krčmář, KVM list, Michael Haggerty

On Mon, May 27 2019, Paolo Bonzini wrote:

> On 27/05/19 00:54, Ævar Arnfjörð Bjarmason wrote:
>> This resulted in a case[1] where someone on LKML did:
>>
>>     git push kvm +HEAD:tags/for-linus
>>
>> Which would have created a new "tags/for-linus" branch in their "kvm"
>> repository, except because they happened to have an existing
>> "refs/tags/for-linus" reference we pushed there instead, and replaced
>> an annotated tag with a lightweight tag.
>
> Actually, I would not be surprised even if "git push foo
> someref:tags/foo" _always_ created a lightweight tag (i.e. push to
> refs/tags/foo).

That's not the intention (I think), and not what we document.

It mostly (and I believe always should) works by looking at whether
"someref" is a named ref, and e.g. looking at whether it's "master". We
then see that it lives in "refs/heads/master" locally, and thus
correspondingly add a "refs/heads/" to your <dst> "tags/foo", making it
"refs/heads/tags/foo".

*Or* we take e.g. <some random SHA-1>:master, the <some random...> is
ambiguous, but we see that "master" unambiguously refers to
"refs/heads/master" on the remote (so e.g. a refs/tags/master doesn't
exist). If you had both refs/{heads,tags}/master refs on the remote we'd
emit:

    error: dst refspec master matches more than one

(We should improve that error to note what conflicted, #leftoverbits)
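
The ambiguous case can be reproduced with a sketch like the following.
The ref name "topic" and the repository paths are made up for
illustration, and the exact error wording may vary between Git
versions, so the sketch only records whether the push was rejected.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/local"
git init -q --bare "$work/remote.git"
cd "$work/local"
git config user.name "Demo User"
git config user.email demo@example.com

echo a > test
git add test
git commit -q -m initial
oid=$(git rev-parse HEAD)

# Create both refs/heads/topic and refs/tags/topic on the remote.
git push -q ../remote.git "$oid:refs/heads/topic" "$oid:refs/tags/topic"

# A bare object name as <src> gives no hint about the namespace, and
# <dst> "topic" matches two refs on the remote.
if git push ../remote.git "$oid:topic" 2>"$work/push.err"; then
    ambiguous_push=accepted
else
    ambiguous_push=rejected
fi
cat "$work/push.err"
echo "push with ambiguous <dst>: $ambiguous_push"
```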

So your HEAD:tags/for-linus resulted in pushing a HEAD that referred to
some refs/heads/* to refs/tags/for-linus. I believe that's an unintended
emergent effect in how we try to apply these two rules. We should apply
one, not both in combination.

And as an aside none of these rules have to do with whether the <src> is
a lightweight or annotated tag, and both types live in the refs/tags/*
namespace.
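
The reported scenario can be sketched on two throwaway repositories as
below.  The paths are made up; "kvm.git" merely stands in for the
remote from the report.  The sketch assumes the matching rules behave
as described above, which may differ in other Git versions.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/local"
git init -q --bare "$work/kvm.git"
cd "$work/local"
git config user.name "Demo User"
git config user.email demo@example.com

echo a > test
git add test
git commit -q -m initial

# Publish an annotated tag first.
git tag -a -m "for Linus" for-linus
git push -q ../kvm.git refs/tags/for-linus
before=$(git -C ../kvm.git cat-file -t refs/tags/for-linus)

# Later, push the branch tip to the abbreviated <dst> "tags/for-linus".
echo b >> test
git commit -q -a -m second
git push -q ../kvm.git +HEAD:tags/for-linus
after=$(git -C ../kvm.git cat-file -t refs/tags/for-linus)

echo "remote refs/tags/for-linus: $before -> $after"
```

Because "tags/for-linus" matches the existing refs/tags/for-linus on
the remote, the forced push replaces the tag object with a plain
commit, i.e. the annotated tag becomes a lightweight one.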

> In my opinion, the bug is that "git request-pull" should warn if the tag
> is lightweight remotely but not locally, and possibly even vice versa.
> Here is a simple testcase:
>
>   # setup "local" repo
>   mkdir -p testdir/a
>   cd testdir/a
>   git init
>   echo a > test
>   git add test
>   git commit -minitial
>
>   # setup "remote" repo
>   git clone --bare . ../b
>
>   # setup "local" tag
>   echo b >> test
>   git commit -msecond test
>   git tag -mtag tag1
>
>   # create remote lightweight tag and prepare a pull request
>   git push ../b HEAD:refs/tags/tag1
>   git request-pull HEAD^ ../b tags/tag1

Yeah, maybe. I don't use git-request-pull. So maybe this is a simple
mitigation for that tool since you supply a <remote> to it already.

I was more interested and surprised by HEAD being implicitly resolved to
refs/tags/* in a way that would be *different* than if you didn't have
an existing tag there, but of course if we errored on that you might
have just done "+HEAD:refs/tags/for-linus" and ended up with the same
thing.

As an aside, in *general* tags, unlike branches, don't have "remote
tracking". That's something we'd eventually want, but we're nowhere near
the refstore and porcelain supporting that.

Thus such a check is hard to support in general, we'd always need a
remote name and a network roundtrip. Otherwise we couldn't do anything
sensible if you have 10 remotes of fellow LKML developers, all of whom
have a "for-linus" tag, which I'm assuming is a common use-case.

But since git-request-pull gets the remote it can (and does) check on
that remote, but it seems to be satisfied to see that the ref exists
somewhere on that remote.

^ permalink raw reply	[relevance 5%]

* Re: [GSoC] Some #leftoverbits for anyone looking for little projects
  2019-05-20 23:49  8%   ` Ævar Arnfjörð Bjarmason
@ 2019-05-21  4:38  8%     ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 200+ results
From: Matheus Tavares Bernardino @ 2019-05-21  4:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Christian Couder,
	Оля Тележная,
	Johannes Schindelin

On Mon, May 20, 2019 at 8:49 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Mon, May 20 2019, Matheus Tavares wrote:
>
> > Hi, Ævar
> >
> >> Give "rebase -i" some option so when you "reword" the patch is
> >> included in the message.
> >>
> >> I keep going to the shell because I have no idea what change I'm
> >> describing.
> >
> > I have the same problem, so I wanted to try solving this. The patch
> > below creates a "rebase.verboseCommit" configuration that includes
> > a diff when rewording or squashing. I'd appreciate knowing your thoughts
> > on it.
> >
> > As Christian wisely pointed out to me, though, we can also achieve this
> > behavior by setting "commit.verbose" to true. The only "downside" of it
> > is that users cannot choose to see the diff only when rebasing. Despite
> > that, if we decide not to go with this patch, what do you think of
> > adding a "commit.verbose" entry at git-rebase's man page?
>
> Thanks for working on this. I'd somehow missed the addition of the
> commit.verbose option, so the problem I had is 100% solved by it (and
> I've turned it on).
>
> I think it's better to just document it with rebase; perhaps, rather than
> mentioning that option specifically (though that would also be fine), we
> could promise that we support "commit" options in general.

Indeed, it seems to be the right way to go.

> Do we promise anywhere that interactive rebase is going to run the
> "normal" git-commit command? From a quick skimming of the docs it
> doesn't seem so, perhaps we should explicitly promise that, and then
> test for it if we don't (e.g. by stealing the tests you added).

Ok, sounds good to me. In order to avoid duplicate tests, is it OK to
assume that if one commit configuration is being respected by rebase,
then all will be?  Or should a patch adding such a promise include
rebase tests for all commit.* configurations?

^ permalink raw reply	[relevance 8%]

* Re: [GSoC] Some #leftoverbits for anyone looking for little projects
  2019-05-20 18:23  6% ` Matheus Tavares
@ 2019-05-20 23:49  8%   ` Ævar Arnfjörð Bjarmason
  2019-05-21  4:38  8%     ` Matheus Tavares Bernardino
  2019-05-28 10:37  8%   ` Johannes Schindelin
  1 sibling, 1 reply; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2019-05-20 23:49 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, Christian Couder,
	Оля Тележная,
	Johannes Schindelin


On Mon, May 20 2019, Matheus Tavares wrote:

> Hi, Ævar
>
>> Give "rebase -i" some option so when you "reword" the patch is
>> included in the message.
>>
>> I keep going to the shell because I have no idea what change I'm
>> describing.
>
> I have the same problem, so I wanted to try solving this. The patch
> below creates a "rebase.verboseCommit" configuration that includes
> a diff when rewording or squashing. I'd appreciate knowing your thoughts
> on it.
>
> As Christian wisely pointed out to me, though, we can also achieve this
> behavior by setting "commit.verbose" to true. The only "downside" of it
> is that users cannot choose to see the diff only when rebasing. Despite
> that, if we decide not to go with this patch, what do you think of
> adding a "commit.verbose" entry at git-rebase's man page?

Thanks for working on this. I'd somehow missed the addition of the
commit.verbose option, so the problem I had is 100% solved by it (and
I've turned it on).

I think it's better to just document it with rebase; perhaps, rather than
mentioning that option specifically (though that would also be fine), we
could promise that we support "commit" options in general.

Do we promise anywhere that interactive rebase is going to run the
"normal" git-commit command? From a quick skimming of the docs it
doesn't seem so, perhaps we should explicitly promise that, and then
test for it if we don't (e.g. by stealing the tests you added).

Aside from that, if this patch is kept I see commit.verbose is a
bool-or-int option, but yours is maybe-bool, so there's no way with
rebase.verboseCommit to turn on the higher level of verbosity. Perhaps
if this option is kept some implementation that just grabs whatever "X"
rebase.verboseCommit=X is set to and passes it as commit.verbose=X down
to git-commit is better, letting it deal with the validation?
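
As a rough sketch of that pass-through idea, assuming the sequencer would
hand the raw config string to git-commit via "git -c": the helper name
commit_verbose_arg() and its return convention are hypothetical and not part
of the patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper: turn the raw value of rebase.verboseCommit into a
 * "commit.verbose=<value>" string suitable for "git -c <string> commit".
 * Returns 1 if an argument was written, 0 if the option is unset, and
 * -1 if "out" is too small. The value is deliberately not validated
 * here; that is left to git-commit.
 */
static int commit_verbose_arg(const char *raw_value, char *out, size_t outlen)
{
	int n;

	if (!raw_value)
		return 0; /* option not configured, pass nothing */
	n = snprintf(out, outlen, "commit.verbose=%s", raw_value);
	return (n >= 0 && (size_t)n < outlen) ? 1 : -1;
}
```

With this shape a value such as "2" would reach git-commit unchanged, so the
higher verbosity level would not need separate handling in the sequencer.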

> diff --git a/Documentation/config/rebase.txt b/Documentation/config/rebase.txt
> index d98e32d812..ae50b3e05d 100644
> --- a/Documentation/config/rebase.txt
> +++ b/Documentation/config/rebase.txt
> @@ -62,3 +62,8 @@ rebase.rescheduleFailedExec::
>  	Automatically reschedule `exec` commands that failed. This only makes
>  	sense in interactive mode (or when an `--exec` option was provided).
>  	This is the same as specifying the `--reschedule-failed-exec` option.
> +
> +rebase.verboseCommit::
> +	When rewording or squashing commits, during an interactive rebase, show
> +	the commits' diff to help describe the modifications they bring. False
> +	by default.
> diff --git a/sequencer.c b/sequencer.c
> index f88a97fb10..1596fc4cd0 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -914,6 +914,7 @@ N_("you have staged changes in your working tree\n"
>  #define CLEANUP_MSG (1<<3)
>  #define VERIFY_MSG  (1<<4)
>  #define CREATE_ROOT_COMMIT (1<<5)
> +#define VERBOSE_COMMIT (1<<6)
>
>  static int run_command_silent_on_success(struct child_process *cmd)
>  {
> @@ -1007,6 +1008,8 @@ static int run_git_commit(struct repository *r,
>  		argv_array_push(&cmd.args, "-n");
>  	if ((flags & AMEND_MSG))
>  		argv_array_push(&cmd.args, "--amend");
> +	if ((flags & VERBOSE_COMMIT))
> +		argv_array_push(&cmd.args, "-v");
>  	if (opts->gpg_sign)
>  		argv_array_pushf(&cmd.args, "-S%s", opts->gpg_sign);
>  	if (defmsg)
> @@ -1782,7 +1785,7 @@ static int do_pick_commit(struct repository *r,
>  	char *author = NULL;
>  	struct commit_message msg = { NULL, NULL, NULL, NULL };
>  	struct strbuf msgbuf = STRBUF_INIT;
> -	int res, unborn = 0, allow;
> +	int res, unborn = 0, allow, verbose_commit = 0;
>
>  	if (opts->no_commit) {
>  		/*
> @@ -1843,6 +1846,9 @@ static int do_pick_commit(struct repository *r,
>  		return error(_("cannot get commit message for %s"),
>  			oid_to_hex(&commit->object.oid));
>
> +	if (git_config_get_maybe_bool("rebase.verbosecommit", &verbose_commit) < 0)
> +		warning("Invalid value for rebase.verboseCommit. Using 'false' instead.");
> +
>  	if (opts->allow_ff && !is_fixup(command) &&
>  	    ((parent && oideq(&parent->object.oid, &head)) ||
>  	     (!parent && unborn))) {
> @@ -1853,6 +1859,8 @@ static int do_pick_commit(struct repository *r,
>  		if (res || command != TODO_REWORD)
>  			goto leave;
>  		flags |= EDIT_MSG | AMEND_MSG | VERIFY_MSG;
> +		if (verbose_commit)
> +			flags |= VERBOSE_COMMIT;
>  		msg_file = NULL;
>  		goto fast_forward_edit;
>  	}
> @@ -1909,12 +1917,17 @@ static int do_pick_commit(struct repository *r,
>  			author = get_author(msg.message);
>  	}
>
> -	if (command == TODO_REWORD)
> +	if (command == TODO_REWORD) {
>  		flags |= EDIT_MSG | VERIFY_MSG;
> +		if (verbose_commit)
> +			flags |= VERBOSE_COMMIT;
> +	}
>  	else if (is_fixup(command)) {
>  		if (update_squash_messages(r, command, commit, opts))
>  			return -1;
>  		flags |= AMEND_MSG;
> +		if (verbose_commit)
> +			flags |= VERBOSE_COMMIT;
>  		if (!final_fixup)
>  			msg_file = rebase_path_squash_msg();
>  		else if (file_exists(rebase_path_fixup_msg())) {
> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> index 1723e1a858..9b410d31e2 100755
> --- a/t/t3404-rebase-interactive.sh
> +++ b/t/t3404-rebase-interactive.sh
> @@ -1477,4 +1477,60 @@ test_expect_success 'valid author header when author contains single quote' '
>  	test_cmp expected actual
>  '
>
> +write_script "reword-and-check-for-diff" <<\EOF &&
> +case "$1" in
> +*/git-rebase-todo)
> +	sed s/pick/reword/ "$1" > "$1.tmp"
> +	mv -f "$1.tmp" "$1"
> +	;;
> +*)
> +	grep '^diff --git' "$1" >has-diff
> +	;;
> +esac
> +exit 0
> +EOF
> +
> +test_expect_success 'rebase -i does not show diff by default when rewording' '
> +	rebase_setup_and_clean no-verbose-commit-reword &&
> +	test_set_editor "$PWD/reword-and-check-for-diff" &&
> +	git rebase -i HEAD~1 &&
> +	test_line_count = 0 has-diff
> +'
> +
> +test_expect_success 'rebase -i respects rebase.verboseCommit when rewording' '
> +	rebase_setup_and_clean verbose-commit-reword &&
> +	test_config rebase.verboseCommit true &&
> +	test_set_editor "$PWD/reword-and-check-for-diff" &&
> +	git rebase -i HEAD~1 &&
> +	test_line_count -gt 0 has-diff
> +'
> +
> +write_script "squash-and-check-for-diff" <<\EOF &&
> +case "$1" in
> +*/git-rebase-todo)
> +	sed "s/pick \([0-9a-f]*\) E/squash \1 E/" "$1" > "$1.tmp"
> +	mv -f "$1.tmp" "$1"
> +	;;
> +*)
> +	grep '^diff --git' "$1" >has-diff
> +	;;
> +esac
> +exit 0
> +EOF
> +
> +test_expect_success 'rebase -i does not show diff by default when squashing' '
> +	rebase_setup_and_clean no-verbose-commit-squash &&
> +	test_set_editor "$PWD/squash-and-check-for-diff" &&
> +	git rebase -i HEAD~2 &&
> +	test_line_count = 0 has-diff
> +'
> +
> +test_expect_success 'rebase -i respects rebase.verboseCommit when squashing' '
> +	rebase_setup_and_clean verbose-commit-squash &&
> +	test_config rebase.verboseCommit true &&
> +	test_set_editor "$PWD/squash-and-check-for-diff" &&
> +	git rebase -i HEAD~2 &&
> +	test_line_count -gt 0 has-diff
> +'
> +
>  test_done

^ permalink raw reply	[relevance 8%]

* [PATCH 0/3] hash-object doc: small fixes
@ 2019-05-20 21:53  6% Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2019-05-20 21:53 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Adam Roben, Bryan Larsen,
	Matthias Urlichs, Eric Sunshine,
	Ævar Arnfjörð Bjarmason

Small doc fixes. Maybe trivial enough to land in 2.22, but there's no
rush.

A pair of #leftoverbits I noticed: we've implemented the
"--stdin-paths" option via unquote_c_style() from day one, so our
current docs lie (and still do with this series) about wanting
\n-delimited files. You can't hash a file called '"foo"' as you'd
expect; you need to pass '"\"foo\""'.

I wonder if we should document this at this point, or just change it
and add a "-z" option. None of our tests fail if I remove this
unquote_c_style() codepath, and it's never been documented, but
someone in the wild may have organically depended on it.
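
To illustrate the quoting behavior, here is a simplified model in C of how
one line of input could be interpreted. It is not git's unquote_c_style():
the name unquote_line() is made up, and only a few escapes are handled.

```c
#include <stddef.h>

/*
 * Hypothetical model of reading one path line: a line starting with '"'
 * is treated as a C-style quoted string, anything else is taken
 * literally. Returns the decoded length, or -1 on malformed input or if
 * "out" is too small. Only \n, \t, \" and \\ are understood here.
 */
static int unquote_line(const char *in, char *out, size_t outlen)
{
	size_t n = 0;

	if (*in != '"') {
		/* Unquoted line: copy it as-is. */
		while (*in) {
			if (n + 1 >= outlen)
				return -1;
			out[n++] = *in++;
		}
		out[n] = '\0';
		return (int)n;
	}

	in++; /* skip the opening quote */
	while (*in && *in != '"') {
		char c = *in++;
		if (c == '\\') {
			c = *in++;
			switch (c) {
			case 'n': c = '\n'; break;
			case 't': c = '\t'; break;
			case '"':
			case '\\':
				break; /* escaped character stands for itself */
			default:
				return -1; /* unknown or truncated escape */
			}
		}
		if (n + 1 >= outlen)
			return -1;
		out[n++] = c;
	}
	/* Require a closing quote with nothing after it. */
	if (*in != '"' || in[1] != '\0')
		return -1;
	out[n] = '\0';
	return (int)n;
}
```

Under this model the line '"foo"' names the file foo, and only
'"\"foo\""' names a file whose name includes the double quotes.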

Ævar Arnfjörð Bjarmason (3):
  hash-object doc: stop mentioning git-cvsimport
  hash-object doc: elaborate on -w and --literally promises
  hash-object doc: point to ls-files and rev-parse

 Documentation/git-hash-object.txt | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

-- 
2.21.0.1020.gf2820cf01a

^ permalink raw reply	[relevance 6%]

* Re: [GSoC] Some #leftoverbits for anyone looking for little projects
  @ 2019-05-20 18:23  6% ` Matheus Tavares
  2019-05-20 23:49  8%   ` Ævar Arnfjörð Bjarmason
  2019-05-28 10:37  8%   ` Johannes Schindelin
  2019-05-29  9:38  8% ` Johannes Schindelin
  1 sibling, 2 replies; 200+ results
From: Matheus Tavares @ 2019-05-20 18:23 UTC (permalink / raw)
  To: avarab
  Cc: git, Christian Couder,
	Оля Тележная

Hi, Ævar

> Give "rebase -i" some option so when you "reword" the patch is
> included in the message.
>
> I keep going to the shell because I have no idea what change I'm
> describing.

I have the same problem, so I wanted to try solving this. The patch
below creates a "rebase.verboseCommit" configuration that includes
a diff when rewording or squashing. I'd appreciate knowing your thoughts
on it.

As Christian wisely pointed out to me, though, we can also achieve this
behavior by setting "commit.verbose" to true. The only "downside" of it
is that users cannot choose to see the diff only when rebasing. Despite
that, if we decide not to go with this patch, what do you think of
adding a "commit.verbose" entry at git-rebase's man page? 

diff --git a/Documentation/config/rebase.txt b/Documentation/config/rebase.txt
index d98e32d812..ae50b3e05d 100644
--- a/Documentation/config/rebase.txt
+++ b/Documentation/config/rebase.txt
@@ -62,3 +62,8 @@ rebase.rescheduleFailedExec::
 	Automatically reschedule `exec` commands that failed. This only makes
 	sense in interactive mode (or when an `--exec` option was provided).
 	This is the same as specifying the `--reschedule-failed-exec` option.
+
+rebase.verboseCommit::
+	When rewording or squashing commits, during an interactive rebase, show
+	the commits' diff to help describe the modifications they bring. False
+	by default.
diff --git a/sequencer.c b/sequencer.c
index f88a97fb10..1596fc4cd0 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -914,6 +914,7 @@ N_("you have staged changes in your working tree\n"
 #define CLEANUP_MSG (1<<3)
 #define VERIFY_MSG  (1<<4)
 #define CREATE_ROOT_COMMIT (1<<5)
+#define VERBOSE_COMMIT (1<<6)
 
 static int run_command_silent_on_success(struct child_process *cmd)
 {
@@ -1007,6 +1008,8 @@ static int run_git_commit(struct repository *r,
 		argv_array_push(&cmd.args, "-n");
 	if ((flags & AMEND_MSG))
 		argv_array_push(&cmd.args, "--amend");
+	if ((flags & VERBOSE_COMMIT))
+		argv_array_push(&cmd.args, "-v");
 	if (opts->gpg_sign)
 		argv_array_pushf(&cmd.args, "-S%s", opts->gpg_sign);
 	if (defmsg)
@@ -1782,7 +1785,7 @@ static int do_pick_commit(struct repository *r,
 	char *author = NULL;
 	struct commit_message msg = { NULL, NULL, NULL, NULL };
 	struct strbuf msgbuf = STRBUF_INIT;
-	int res, unborn = 0, allow;
+	int res, unborn = 0, allow, verbose_commit = 0;
 
 	if (opts->no_commit) {
 		/*
@@ -1843,6 +1846,9 @@ static int do_pick_commit(struct repository *r,
 		return error(_("cannot get commit message for %s"),
 			oid_to_hex(&commit->object.oid));
 
+	if (git_config_get_maybe_bool("rebase.verbosecommit", &verbose_commit) < 0)
+		warning("Invalid value for rebase.verboseCommit. Using 'false' instead.");
+
 	if (opts->allow_ff && !is_fixup(command) &&
 	    ((parent && oideq(&parent->object.oid, &head)) ||
 	     (!parent && unborn))) {
@@ -1853,6 +1859,8 @@ static int do_pick_commit(struct repository *r,
 		if (res || command != TODO_REWORD)
 			goto leave;
 		flags |= EDIT_MSG | AMEND_MSG | VERIFY_MSG;
+		if (verbose_commit)
+			flags |= VERBOSE_COMMIT;
 		msg_file = NULL;
 		goto fast_forward_edit;
 	}
@@ -1909,12 +1917,17 @@ static int do_pick_commit(struct repository *r,
 			author = get_author(msg.message);
 	}
 
-	if (command == TODO_REWORD)
+	if (command == TODO_REWORD) {
 		flags |= EDIT_MSG | VERIFY_MSG;
+		if (verbose_commit)
+			flags |= VERBOSE_COMMIT;
+	}
 	else if (is_fixup(command)) {
 		if (update_squash_messages(r, command, commit, opts))
 			return -1;
 		flags |= AMEND_MSG;
+		if (verbose_commit)
+			flags |= VERBOSE_COMMIT;
 		if (!final_fixup)
 			msg_file = rebase_path_squash_msg();
 		else if (file_exists(rebase_path_fixup_msg())) {
diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index 1723e1a858..9b410d31e2 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -1477,4 +1477,60 @@ test_expect_success 'valid author header when author contains single quote' '
 	test_cmp expected actual
 '
 
+write_script "reword-and-check-for-diff" <<\EOF &&
+case "$1" in
+*/git-rebase-todo)
+	sed s/pick/reword/ "$1" > "$1.tmp"
+	mv -f "$1.tmp" "$1"
+	;;
+*)
+	grep '^diff --git' "$1" >has-diff
+	;;
+esac
+exit 0
+EOF
+
+test_expect_success 'rebase -i does not show diff by default when rewording' '
+	rebase_setup_and_clean no-verbose-commit-reword &&
+	test_set_editor "$PWD/reword-and-check-for-diff" &&
+	git rebase -i HEAD~1 &&
+	test_line_count = 0 has-diff
+'
+
+test_expect_success 'rebase -i respects rebase.verboseCommit when rewording' '
+	rebase_setup_and_clean verbose-commit-reword &&
+	test_config rebase.verboseCommit true &&
+	test_set_editor "$PWD/reword-and-check-for-diff" &&
+	git rebase -i HEAD~1 &&
+	test_line_count -gt 0 has-diff
+'
+
+write_script "squash-and-check-for-diff" <<\EOF &&
+case "$1" in
+*/git-rebase-todo)
+	sed "s/pick \([0-9a-f]*\) E/squash \1 E/" "$1" > "$1.tmp"
+	mv -f "$1.tmp" "$1"
+	;;
+*)
+	grep '^diff --git' "$1" >has-diff
+	;;
+esac
+exit 0
+EOF
+
+test_expect_success 'rebase -i does not show diff by default when squashing' '
+	rebase_setup_and_clean no-verbose-commit-squash &&
+	test_set_editor "$PWD/squash-and-check-for-diff" &&
+	git rebase -i HEAD~2 &&
+	test_line_count = 0 has-diff
+'
+
+test_expect_success 'rebase -i respects rebase.verboseCommit when squashing' '
+	rebase_setup_and_clean verbose-commit-squash &&
+	test_config rebase.verboseCommit true &&
+	test_set_editor "$PWD/squash-and-check-for-diff" &&
+	git rebase -i HEAD~2 &&
+	test_line_count -gt 0 has-diff
+'
+
 test_done
-- 
2.20.1


^ permalink raw reply related	[relevance 6%]

* Re: [PATCH 1/3] transport_anonymize_url(): support retaining username
  2019-05-19  5:10  4% ` [PATCH 1/3] transport_anonymize_url(): support retaining username Jeff King
@ 2019-05-20 16:36  0%   ` Johannes Schindelin
  0 siblings, 0 replies; 200+ results
From: Johannes Schindelin @ 2019-05-20 16:36 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Martin Langhoff,
	Git Mailing List

Hi Peff,

On Sun, 19 May 2019, Jeff King wrote:

> When we anonymize URLs to show in messages, we strip out both the
> username and password (if any). But there are also contexts where we
> should strip out the password (to avoid leaking it) but retain the
> username.
>
> Let's generalize transport_anonymize_url() to support both cases. We'll
> give it a new name since the password-only mode isn't really
> "anonymizing", but keep the old name as a synonym to avoid disrupting
> existing callers.
>
> Note that there are actually three places we parse URLs, and this
> functionality _could_ go into any of them:
>
>   - transport_anonymize_url(), which we modify here
>
>   - the urlmatch.c code parses a URL into its constituent parts, from
>     which we could easily remove the elements we want to drop and
>     re-format it as a single URL. But its parsing also normalizes
>     elements (e.g., downcasing hostnames).  This isn't wrong, but it's
>     more friendly if we can leave the rest of the URL untouched.

I have not looked into it at all, but I seem to vaguely remember that the
result of this code might be used to look up `url.<url>.insteadOf`
settings, where the middle part *is* case-sensitive.

>   - credential_from_url() parses a URL and decodes the specific
>     elements, but it's hard to convert it back into a regular URL. It
>     treats "host:port" as a single unit, meaning it needs to be
>     re-encoded specially (since a colon would otherwise end up
>     percent-encoded).
>
> Since transport_anonymize_url() seemed closest to what we want here, I
> used that as the base.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I think it would be beneficial to unify these three cases under a single
> parser, but it seemed like too big a rabbit hole for this topic. Of the
> three, the urlmatch one seems the most mature. I think if we could
> simply separate the normalization from the parsing/decoding, the others
> could build on top of it. It might also require some careful thinking
> about how pseudo-urls like ssh "host:path" interact.

In light of what I mentioned above, I am not sure that we should go there
in the first place...

Thanks,
Dscho

> I won't call that a #leftoverbits, because it's more of a feast. :)
>
>  transport.c | 21 ++++++++++++++-------
>  transport.h | 11 ++++++++++-
>  2 files changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/transport.c b/transport.c
> index f1fcd2c4b0..ba61e57295 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -1335,11 +1335,7 @@ int transport_disconnect(struct transport *transport)
>  	return ret;
>  }
>
> -/*
> - * Strip username (and password) from a URL and return
> - * it in a newly allocated string.
> - */
> -char *transport_anonymize_url(const char *url)
> +char *transport_strip_url(const char *url, int strip_user)
>  {
>  	char *scheme_prefix, *anon_part;
>  	size_t anon_len, prefix_len = 0;
> @@ -1348,7 +1344,10 @@ char *transport_anonymize_url(const char *url)
>  	if (url_is_local_not_ssh(url) || !anon_part)
>  		goto literal_copy;
>
> -	anon_len = strlen(++anon_part);
> +	anon_len = strlen(anon_part);
> +	if (strip_user)
> +		anon_part++;
> +
>  	scheme_prefix = strstr(url, "://");
>  	if (!scheme_prefix) {
>  		if (!strchr(anon_part, ':'))
> @@ -1373,7 +1372,15 @@ char *transport_anonymize_url(const char *url)
>  		cp = strchr(scheme_prefix + 3, '/');
>  		if (cp && cp < anon_part)
>  			goto literal_copy;
> -		prefix_len = scheme_prefix - url + 3;
> +
> +		if (strip_user)
> +			prefix_len = scheme_prefix - url + 3;
> +		else {
> +			cp = strchr(scheme_prefix + 3, ':');
> +			if (cp && cp > anon_part)
> +				goto literal_copy; /* username only */
> +			prefix_len = cp - url;
> +		}
>  	}
>  	return xstrfmt("%.*s%.*s", (int)prefix_len, url,
>  		       (int)anon_len, anon_part);
> diff --git a/transport.h b/transport.h
> index 06e06d3d89..6d8c99ac91 100644
> --- a/transport.h
> +++ b/transport.h
> @@ -243,10 +243,19 @@ const struct ref *transport_get_remote_refs(struct transport *transport,
>  int transport_fetch_refs(struct transport *transport, struct ref *refs);
>  void transport_unlock_pack(struct transport *transport);
>  int transport_disconnect(struct transport *transport);
> -char *transport_anonymize_url(const char *url);
>  void transport_take_over(struct transport *transport,
>  			 struct child_process *child);
>
> +/*
> + * Strip password and optionally username from a URL and return
> + * it in a newly allocated string (even if nothing was stripped).
> + */
> +char *transport_strip_url(const char *url, int strip_username);
> +static inline char *transport_anonymize_url(const char *url)
> +{
> +	return transport_strip_url(url, 1);
> +}
> +
>  int transport_connect(struct transport *transport, const char *name,
>  		      const char *exec, int fd[2]);
>
> --
> 2.22.0.rc0.583.g23d90da2b3
>
>

^ permalink raw reply	[relevance 0%]

* [PATCH 1/3] transport_anonymize_url(): support retaining username
  @ 2019-05-19  5:10  4% ` Jeff King
  2019-05-20 16:36  0%   ` Johannes Schindelin
  0 siblings, 1 reply; 200+ results
From: Jeff King @ 2019-05-19  5:10 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Ævar Arnfjörð Bjarmason, Martin Langhoff,
	Git Mailing List

When we anonymize URLs to show in messages, we strip out both the
username and password (if any). But there are also contexts where we
should strip out the password (to avoid leaking it) but retain the
username.

Let's generalize transport_anonymize_url() to support both cases. We'll
give it a new name since the password-only mode isn't really
"anonymizing", but keep the old name as a synonym to avoid disrupting
existing callers.

Note that there are actually three places we parse URLs, and this
functionality _could_ go into any of them:

  - transport_anonymize_url(), which we modify here

  - the urlmatch.c code parses a URL into its constituent parts, from
    which we could easily remove the elements we want to drop and
    re-format it as a single URL. But its parsing also normalizes
    elements (e.g., downcasing hostnames).  This isn't wrong, but it's
    more friendly if we can leave the rest of the URL untouched.

  - credential_from_url() parses a URL and decodes the specific
    elements, but it's hard to convert it back into a regular URL. It
    treats "host:port" as a single unit, meaning it needs to be
    re-encoded specially (since a colon would otherwise end up
    percent-encoded).

Since transport_anonymize_url() seemed closest to what we want here, I
used that as the base.

Signed-off-by: Jeff King <peff@peff.net>
---
I think it would be beneficial to unify these three cases under a single
parser, but it seemed like too big a rabbit hole for this topic. Of the
three, the urlmatch one seems the most mature. I think if we could
simply separate the normalization from the parsing/decoding, the others
could build on top of it. It might also require some careful thinking
about how pseudo-urls like ssh "host:path" interact.

I won't call that a #leftoverbits, because it's more of a feast. :)
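
For readers who want the password-only mode in isolation, here is a minimal
standalone sketch in C. It is not the transport_strip_url() from the patch
below: the name strip_password() is hypothetical, only "scheme://" URLs are
handled, and scp-like "host:path" syntax is ignored.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy "url" into "out", dropping the ":password"
 * part of the userinfo but keeping the username. URLs without a scheme,
 * without userinfo or without a password are copied verbatim.
 * Returns 0 on success, -1 if "out" is too small.
 */
static int strip_password(const char *url, char *out, size_t outlen)
{
	const char *scheme_end = strstr(url, "://");
	const char *authority, *end, *p, *at = NULL, *colon;
	int n;

	if (!scheme_end)
		goto literal_copy;

	/* The authority runs up to the first '/' (or the end of the URL). */
	authority = scheme_end + 3;
	end = strchr(authority, '/');
	if (!end)
		end = authority + strlen(authority);

	/* Use the last '@' so a password containing '@' is not leaked. */
	for (p = authority; p < end; p++)
		if (*p == '@')
			at = p;
	if (!at)
		goto literal_copy; /* no userinfo at all */

	colon = memchr(authority, ':', (size_t)(at - authority));
	if (!colon)
		goto literal_copy; /* username only, nothing to strip */

	/* Keep everything before the ':' and everything from the '@' on. */
	n = snprintf(out, outlen, "%.*s%s", (int)(colon - url), url, at);
	return (n >= 0 && (size_t)n < outlen) ? 0 : -1;

literal_copy:
	n = snprintf(out, outlen, "%s", url);
	return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

Note that nothing is normalized along the way; the rest of the URL is left
untouched, which is the property argued for in the commit message.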

 transport.c | 21 ++++++++++++++-------
 transport.h | 11 ++++++++++-
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/transport.c b/transport.c
index f1fcd2c4b0..ba61e57295 100644
--- a/transport.c
+++ b/transport.c
@@ -1335,11 +1335,7 @@ int transport_disconnect(struct transport *transport)
 	return ret;
 }
 
-/*
- * Strip username (and password) from a URL and return
- * it in a newly allocated string.
- */
-char *transport_anonymize_url(const char *url)
+char *transport_strip_url(const char *url, int strip_user)
 {
 	char *scheme_prefix, *anon_part;
 	size_t anon_len, prefix_len = 0;
@@ -1348,7 +1344,10 @@ char *transport_anonymize_url(const char *url)
 	if (url_is_local_not_ssh(url) || !anon_part)
 		goto literal_copy;
 
-	anon_len = strlen(++anon_part);
+	anon_len = strlen(anon_part);
+	if (strip_user)
+		anon_part++;
+
 	scheme_prefix = strstr(url, "://");
 	if (!scheme_prefix) {
 		if (!strchr(anon_part, ':'))
@@ -1373,7 +1372,15 @@ char *transport_anonymize_url(const char *url)
 		cp = strchr(scheme_prefix + 3, '/');
 		if (cp && cp < anon_part)
 			goto literal_copy;
-		prefix_len = scheme_prefix - url + 3;
+
+		if (strip_user)
+			prefix_len = scheme_prefix - url + 3;
+		else {
+			cp = strchr(scheme_prefix + 3, ':');
+			if (cp && cp > anon_part)
+				goto literal_copy; /* username only */
+			prefix_len = cp - url;
+		}
 	}
 	return xstrfmt("%.*s%.*s", (int)prefix_len, url,
 		       (int)anon_len, anon_part);
diff --git a/transport.h b/transport.h
index 06e06d3d89..6d8c99ac91 100644
--- a/transport.h
+++ b/transport.h
@@ -243,10 +243,19 @@ const struct ref *transport_get_remote_refs(struct transport *transport,
 int transport_fetch_refs(struct transport *transport, struct ref *refs);
 void transport_unlock_pack(struct transport *transport);
 int transport_disconnect(struct transport *transport);
-char *transport_anonymize_url(const char *url);
 void transport_take_over(struct transport *transport,
 			 struct child_process *child);
 
+/*
+ * Strip password and optionally username from a URL and return
+ * it in a newly allocated string (even if nothing was stripped).
+ */
+char *transport_strip_url(const char *url, int strip_username);
+static inline char *transport_anonymize_url(const char *url)
+{
+	return transport_strip_url(url, 1);
+}
+
 int transport_connect(struct transport *transport, const char *name,
 		      const char *exec, int fd[2]);
 
-- 
2.22.0.rc0.583.g23d90da2b3


^ permalink raw reply related	[relevance 4%]

* Re: [PATCH 1/2] t5616: refactor packfile replacement
  @ 2019-05-15 18:22  6% ` Jonathan Tan
  0 siblings, 0 replies; 200+ results
From: Jonathan Tan @ 2019-05-15 18:22 UTC (permalink / raw)
  To: Johannes.Schindelin; +Cc: jonathantanmy, git

> > +# Converts bytes into their hexadecimal representation. For example,
> > +# "printf 'ab\r\n' | hex_unpack" results in '61620d0a'.
> > +hex_unpack () {
> > +	perl -e '$/ = undef; $input = <>; print unpack("H2" x length($input), $input)'
> > +}
> > +
> > +# Inserts $1 at the start of the string and every 2 characters thereafter.
> > +intersperse () {
> > +	sed 's/\(..\)/'$1'\1/g'
> > +}
> > +
> > +# Create a one-time-sed command to replace the existing packfile with $1.
> > +replace_packfile () {
> > +	# The protocol requires that the packfile be sent in sideband 1, hence
> > +	# the extra \x01 byte at the beginning.
> > +	printf "1,/packfile/!c %04x\\\\x01%s0000" \
> > +		"$(($(wc -c <$1) + 5))" \
> > +		"$(hex_unpack <$1 | intersperse '\\x')" \
> > +		>"$HTTPD_ROOT_PATH/one-time-sed"
> >  }
> 
> Urgh. This is not a problem *this* patch introduces, but why on Earth do
> we have to do complicated computations in shell code using an unholy mix
> of complex sed and Perl invocations, making things fragile and slow? We do
> have such a nice facility in the t/test-tool helper...

This might be a good #leftoverbits. I'm not sure which part you think
needs to be replaced - maybe the thing that goes into one-time-sed?
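
If it is the one-time-sed payload, a helper could build it directly in C.
The following is only a sketch under that assumption: the function name
format_sideband_pkt() is hypothetical, it is not an existing test-tool
subcommand, and it produces just the part after the "1,/packfile/!c " sed
prefix.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format "data" the way replace_packfile() does,
 * i.e. a 4-digit hex pkt-line length (payload plus 4 length bytes plus
 * 1 sideband byte), a literal "\x01" for sideband 1, each payload byte
 * as "\xNN", and a trailing "0000" flush packet.
 * Returns 0 on success, -1 if the length does not fit in four hex
 * digits or "out" is too small.
 */
static int format_sideband_pkt(const unsigned char *data, size_t len,
			       char *out, size_t outlen)
{
	size_t i, pos;
	int n;

	if (len + 5 > 0xffff)
		return -1;

	n = snprintf(out, outlen, "%04x\\x01", (unsigned)(len + 5));
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	pos = (size_t)n;

	for (i = 0; i < len; i++) {
		/* Emit each byte as a backslash-x escape for sed. */
		n = snprintf(out + pos, outlen - pos, "\\x%02x",
			     (unsigned)data[i]);
		if (n < 0 || (size_t)n >= outlen - pos)
			return -1;
		pos += (size_t)n;
	}

	n = snprintf(out + pos, outlen - pos, "0000");
	if (n < 0 || (size_t)n >= outlen - pos)
		return -1;
	return 0;
}
```

This folds hex_unpack, intersperse and the length computation into one
place, without the sed and Perl invocations.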

> The refactoring itself looks correct to me, of course.

Thanks, and thanks for taking a look at this.

^ permalink raw reply	[relevance 6%]

* Re: Resolving deltas dominates clone time
  2019-04-30 18:48  6%               ` Ævar Arnfjörð Bjarmason
@ 2019-04-30 20:33  0%                 ` Jeff King
  0 siblings, 0 replies; 200+ results
From: Jeff King @ 2019-04-30 20:33 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Duy Nguyen, Martin Fick, Git Mailing List

On Tue, Apr 30, 2019 at 08:48:08PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > So I'd say the right answer is probably either online_cpus() or half
> > that. The latter would be more appropriate for the machines I have, but
> > I'd worry that it would leave performance on the table for non-intel
> > machines.
> 
> It would be a nice #leftoverbits project to do this dynamically at
> runtime, i.e. hook up the throughput code in progress.c to some new
> utility functions where the current code using pthreads would
> occasionally stop and try to find some (local) maximum throughput given
> N threads.
> 
> You could then dynamically save that optimum for next time, or adjust
> threading at runtime every X seconds, e.g. on a server with N=24 cores
> you might want 24 threads if you have one index-pack, but if you have 24
> index-packs you probably don't want each with 24 threads, for a total of
> 576.

Yeah, I touched on that in my response to Martin. I think that would be
nice, but it's complicated enough that I don't think it's a left-over
bit. I'm also not sure how hard it is to change the number of threads
after the initialization.

IIRC, it's a worker pool that just asks for more work. So that's
probably the right moment to say not just "is there more work to do" but
also "does it seem like there's an idle slot on the system for our
thread to take".
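
As a back-of-the-envelope sketch of such a policy, assuming we somehow knew
how many index-pack processes were running at once: the function
threads_per_process() is hypothetical, and finding out the process count is
the hard part that this does not address.

```c
/*
 * Hypothetical policy: split the available cores evenly between the
 * index-pack processes currently running, never going below one thread
 * per process.
 */
static unsigned threads_per_process(unsigned cores, unsigned processes)
{
	unsigned share;

	if (!processes)
		processes = 1; /* treat "unknown" as a single process */
	share = cores / processes;
	return share ? share : 1;
}
```

On the 24-core example, one index-pack would get 24 threads, while 24
concurrent ones would get one thread each, for 24 in total instead of
24 * 24 = 576.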

-Peff

^ permalink raw reply	[relevance 0%]

* Re: Resolving deltas dominates clone time
  @ 2019-04-30 18:48  6%               ` Ævar Arnfjörð Bjarmason
  2019-04-30 20:33  0%                 ` Jeff King
  0 siblings, 1 reply; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2019-04-30 18:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Duy Nguyen, Martin Fick, Git Mailing List


On Tue, Apr 30 2019, Jeff King wrote:

> On Tue, Apr 23, 2019 at 05:08:40PM +0700, Duy Nguyen wrote:
>
>> On Tue, Apr 23, 2019 at 11:45 AM Jeff King <peff@peff.net> wrote:
>> >
>> > On Mon, Apr 22, 2019 at 09:55:38PM -0400, Jeff King wrote:
>> >
>> > > Here are my p5302 numbers on linux.git, by the way.
>> > >
>> > >   Test                                           jk/p5302-repeat-fix
>> > >   ------------------------------------------------------------------
>> > >   5302.2: index-pack 0 threads                   307.04(303.74+3.30)
>> > >   5302.3: index-pack 1 thread                    309.74(306.13+3.56)
>> > >   5302.4: index-pack 2 threads                   177.89(313.73+3.60)
>> > >   5302.5: index-pack 4 threads                   117.14(344.07+4.29)
>> > >   5302.6: index-pack 8 threads                   112.40(607.12+5.80)
>> > >   5302.7: index-pack default number of threads   135.00(322.03+3.74)
>> > >
>> > > which still imply that "4" is a win over "3" ("8" is slightly better
>> > > still in wall-clock time, but the total CPU rises dramatically; that's
>> > > probably because this is a quad-core with hyperthreading, so by that
>> > > point we're just throttling down the CPUs).
>> >
>> > And here's a similar test run on a 20-core Xeon w/ hyperthreading (I
>> > tweaked the test to keep going after eight threads):
>> >
>> > Test                            HEAD
>> > ----------------------------------------------------
>> > 5302.2: index-pack 1 threads    376.88(364.50+11.52)
>> > 5302.3: index-pack 2 threads    228.13(371.21+17.86)
>> > 5302.4: index-pack 4 threads    151.41(387.06+21.12)
>> > 5302.5: index-pack 8 threads    113.68(413.40+25.80)
>> > 5302.6: index-pack 16 threads   100.60(511.85+37.53)
>> > 5302.7: index-pack 32 threads   94.43(623.82+45.70)
>> > 5302.8: index-pack 40 threads   93.64(702.88+47.61)
>> >
>> > I don't think any of this is _particularly_ relevant to your case, but
>> > it really seems to me that the default of capping at 3 threads is too
>> > low.
>>
>> Looking back at the multithread commit, I think the trend was the same
>> and I capped it because the gain was not proportional to the number of
>> cores we threw at index-pack anymore. I would not be opposed to
>> raising the cap though (or maybe just remove it)
>
> I'm not sure what the right cap would be. I don't think it's static;
> we'd want ~4 threads on the top case, and 10-20 on the bottom one.
>
> It does seem like there's an inflection point in the graph at N/2
> threads. But then maybe that's just because these are hyper-threaded
> machines, so "N/2" is the actual number of physical cores, and the
> inflated CPU times above that are just because we can't turbo-boost
> then, so we're actually clocking slower. Multi-threaded profiling and
> measurement is such a mess. :)
>
> So I'd say the right answer is probably either online_cpus() or half
> that. The latter would be more appropriate for the machines I have, but
> I'd worry that it would leave performance on the table for non-intel
> machines.

It would be a nice #leftoverbits project to do this dynamically at
runtime, i.e. hook up the throughput code in progress.c to some new
utility functions where the current code using pthreads would
occasionally stop and try to find some (local) maximum throughput given
N threads.

You could then dynamically save that optimum for next time, or adjust
threading at runtime every X seconds, e.g. on a server with N=24 cores
you might want 24 threads if you have one index-pack, but if you have 24
index-packs you probably don't want each with 24 threads, for a total of
576.
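
The arithmetic above can be sketched as a tiny helper. This assumes the
process somehow knows how many index-packs are running concurrently,
which is the hard part and is not addressed here;
threads_per_process() is a hypothetical name, not an existing function.

```c
#include <assert.h>

/*
 * Hypothetical helper: split the machine's CPUs evenly between
 * concurrently running processes, never going below one thread.
 */
int threads_per_process(int ncpus, int nr_processes)
{
	int n;

	assert(ncpus >= 1 && nr_processes >= 1);

	n = ncpus / nr_processes;
	return n < 1 ? 1 : n;
}
```

With 24 cores, a single process gets 24 threads, while 24 processes get
one thread each, for 24 threads in total instead of 576.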

^ permalink raw reply	[relevance 6%]

* Re: [PATCH/RFC] Makefile: dedup list of files obtained from ls-files
  @ 2019-04-23  1:18  6%     ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2019-04-23  1:18 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Jeff King, git

Ramsay Jones <ramsay@ramsayjones.plus.com> writes:

>> FWIW, after reading your commit message my thoughts immediately turned
>> to "why can't ls-files have a mode that outputs each just once", but
>> then ended up at the same place as your patch: it's not that hard to
>> just de-dup the output.
>
> My immediate thought was "that is simply a bug, no?" :-D
>
> I haven't used 'git ls-files' that much, so it's no great surprise
> that I had not noticed its odd behaviour!

Yup, the real issue is that ls-files uses exactly the same code for
tagged output, output with stage numbers and just plain list of
paths, so as we saw in the motivating use case for this patch,
unmerged paths give us one source of duplication when we are asking
for list of paths without stages.

It also considers, IIRC, deletion to be merely one form of
modification, so asking it to list modified paths and deleted paths
at the same time would give you another source of duplication.

Perhaps a not-so-low-hanging-fruit miniproject would be to teach
"ls-files" a new "--dedup" option that does two things:

 * When -m and -d are asked at the same time, ignore '-d', because
   '-d' will give duplicates for subsets of what '-m' would show
   anyway; and

 * When neither -s nor -u is given, do not show the same path more
   than once, even the ones with multiple stages.
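
The second bullet could be sketched roughly as below, assuming the
paths arrive sorted so that duplicates are adjacent (as they should be
for the stages of an unmerged path, given that the index is sorted by
path). dedup_sorted_paths() is a hypothetical helper for illustration,
not part of ls-files.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Drop adjacent duplicates from a sorted list of paths, in place.
 * Returns the new number of entries.
 */
size_t dedup_sorted_paths(const char **paths, size_t nr)
{
	size_t src, dst = 0;

	assert(paths || !nr);

	for (src = 0; src < nr; src++) {
		if (dst && !strcmp(paths[dst - 1], paths[src]))
			continue;	/* same path as the previous entry */
		paths[dst++] = paths[src];
	}
	return dst;
}
```

A real implementation would more likely skip the duplicate at the point
where each entry is shown rather than collect the paths first.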

Perhaps it is safe to leave a #leftoverbits mark for the above, now
that two people in addition to me have noticed that the behaviour is
less than ideal.

^ permalink raw reply	[relevance 6%]

* Re: [PATCH 2/2] describe doc: remove '7-char' abbreviation reference
  2019-04-07 20:05  5%   ` Ævar Arnfjörð Bjarmason
  2019-04-07 21:04  0%     ` Junio C Hamano
@ 2019-04-07 22:23  0%     ` Philip Oakley
  1 sibling, 0 replies; 200+ results
From: Philip Oakley @ 2019-04-07 22:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Philip Oakley
  Cc: GitList, Linus Torvalds, Jeff King, Junio C Hamano

Hi Ævar

On 07/04/2019 21:05, Ævar Arnfjörð Bjarmason wrote:
> On Sat, Apr 06 2019, Philip Oakley wrote:
>
>> While the minimum is 7-char, the unambiguous length can be longer.
>>
>> Signed-off-by: Philip Oakley <philipoakley@iee.org>
>> ---
>> noticed while looking into the Git-for-Windows patch thicket -
>> was looking for the ~n^m style!
>> ---
>>   Documentation/git-describe.txt | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
>> index ccdc5f83d6..a88f6ae2c6 100644
>> --- a/Documentation/git-describe.txt
>> +++ b/Documentation/git-describe.txt
>> @@ -139,7 +139,7 @@ at the end.
>>
>>   The number of additional commits is the number
>>   of commits which would be displayed by "git log v1.0.4..parent".
>> -The hash suffix is "-g" + 7-char abbreviation for the tip commit
>> +The hash suffix is "-g" + unambiguous abbreviation for the tip commit
>>   of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`).
>>   The "g" prefix stands for "git" and is used to allow describing the version of
>>   a software depending on the SCM the software is managed with. This is useful
> Both the old/new version are subtly wrong. Whether the new one is better
> is another matter.
>
> First, there's more places we mention the now-incorrect 7 characters, at
> least these (one of which you're fixing). Found by grepping for ' 7 '
> and '7.*abbr':
>
>      Documentation/git-branch.txt-181---abbrev=<length>::
>      Documentation/git-branch.txt-182-       Alter the sha1's minimum display length in the output listing.
>      Documentation/git-branch.txt:183:       The default value is 7 and can be overridden by the `core.abbrev`
>      Documentation/git-branch.txt-184-       config option.
>      Documentation/git-describe.txt-65---abbrev=<n>::
>      Documentation/git-describe.txt:66:      Instead of using the default 7 hexadecimal digits as the
>      Documentation/git-describe.txt-67-      abbreviated object name, use <n> digits, or as many digits
>      Documentation/git-ls-tree.txt-93-Object size identified by <object> is given in bytes, and right-justified
>      Documentation/git-ls-tree.txt:94:with minimum width of 7 characters.  Object size is given only for blobs
>      Documentation/git-ls-tree.txt-95-(file) entries; for other entries `-` character is used in place of size.
>      Documentation/gittutorial-2.txt-44-
>      Documentation/gittutorial-2.txt:45:What are the 7 digits of hex that Git responded to the commit with?
>      Documentation/gittutorial-2.txt-46-
>      [...]
>      Documentation/gittutorial-2.txt-52-name), and that the contents of a Git object will never change (since
>      Documentation/gittutorial-2.txt:53:that would change the object's name as well). The 7 char hex strings
>      Documentation/gittutorial-2.txt-54-here are simply the abbreviation of such 40 character long strings.
>
> It was never correct that we'd pick 7 characters; we'd *try* that before
> e6c587c733 ("abbrev: auto size the default abbreviation", 2016-09-30)
> but would pick a longer one if 7 was ambiguous.
>
> Whereas "unambiguous abbreviation" isn't correct either, and arguably
> less correct. At least 7 is what we *still* pick as a fallback in lieu
> of the auto-sizing, but just "unambiguous abbreviation" implies that in
> a repo with some 10 objects we might show just one character, or that
> we'd post-e6c587c733 pick say 7 characters in a repository where it *is*
> unambiguous but where we've auto-sized to 12.
>
> I've been meaning to follow-up on
> https://public-inbox.org/git/20190204161217.20047-1-avarab@gmail.com/
> where I among other things wanted to just have these instances all say
> "commits will be abbreviated as described in XYZ in linkgit:<something>"
> and summarize what happens there.
>
> I don't mind if this goes in, I mainly wrote this E-Mail as a brain dump
> since it jolted my memory on the topic, and so that I could dig it up
> later & see how I intended to follow-up on those #leftoverbits
I had had a look at most of the other '7 ' references and decided that 
most of those I saw were about the minimum abbreviation settings, but it 
looks like I maybe missed a few - like the gittutorials.

I was aware that I was being slightly economical with the truth, but
was just looking for a way of implying 'variable length' and I punted on
the long explanation as the particular reference was way down the
document. If anyone has a suggestion for a better phrase I'd be happy
to use it (and I could add it to the tutorials as well).

Philip
(added Junio, given his follow up email, though we still need a term for 
this.)

^ permalink raw reply	[relevance 0%]

* Re: [PATCH 2/2] describe doc: remove '7-char' abbreviation reference
  2019-04-07 20:05  5%   ` Ævar Arnfjörð Bjarmason
@ 2019-04-07 21:04  0%     ` Junio C Hamano
  2019-04-07 22:23  0%     ` Philip Oakley
  1 sibling, 0 replies; 200+ results
From: Junio C Hamano @ 2019-04-07 21:04 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Philip Oakley, GitList, Linus Torvalds, Jeff King

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I've been meaning to follow-up on
> https://public-inbox.org/git/20190204161217.20047-1-avarab@gmail.com/
> where I among other things wanted to just have these instances all say
> "commits will be abbreviated as described in XYZ in linkgit:<something>"
> and summarize what happens there.

Thanks for a pointer.  I do recall that I found that one very
promising.  I can see from
https://public-inbox.org/git/xmqq7eefv02i.fsf@gitster-ct.c.googlers.com/
that I was mostly OK with the end-user facing documentation part,
and also from
https://public-inbox.org/git/xmqq36p3uxq2.fsf@gitster-ct.c.googlers.com/
I was OK with the idea of exposing the computation but found that
the exact command option it was done with was suboptimal; neither of
these was pointing out any uncorrectable flaw.

> I don't mind if this goes in, I mainly wrote this E-Mail as a brain dump
> since it jolted my memory on the topic, and so that I could dig it up
> later & see how I intended to follow-up on those #leftoverbits

^ permalink raw reply	[relevance 0%]

* Re: [PATCH 2/2] describe doc: remove '7-char' abbreviation reference
  @ 2019-04-07 20:05  5%   ` Ævar Arnfjörð Bjarmason
  2019-04-07 21:04  0%     ` Junio C Hamano
  2019-04-07 22:23  0%     ` Philip Oakley
  0 siblings, 2 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2019-04-07 20:05 UTC (permalink / raw)
  To: Philip Oakley; +Cc: GitList, Linus Torvalds, Jeff King


On Sat, Apr 06 2019, Philip Oakley wrote:

> While the minimum is 7-char, the unambiguous length can be longer.
>
> Signed-off-by: Philip Oakley <philipoakley@iee.org>
> ---
> noticed while looking into the Git-for-Windows patch thicket -
> was looking for the ~n^m style!
> ---
>  Documentation/git-describe.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
> index ccdc5f83d6..a88f6ae2c6 100644
> --- a/Documentation/git-describe.txt
> +++ b/Documentation/git-describe.txt
> @@ -139,7 +139,7 @@ at the end.
>
>  The number of additional commits is the number
>  of commits which would be displayed by "git log v1.0.4..parent".
> -The hash suffix is "-g" + 7-char abbreviation for the tip commit
> +The hash suffix is "-g" + unambiguous abbreviation for the tip commit
>  of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`).
>  The "g" prefix stands for "git" and is used to allow describing the version of
>  a software depending on the SCM the software is managed with. This is useful

Both the old/new version are subtly wrong. Whether the new one is better
is another matter.

First, there's more places we mention the now-incorrect 7 characters, at
least these (one of which you're fixing). Found by grepping for ' 7 '
and '7.*abbr':

    Documentation/git-branch.txt-181---abbrev=<length>::
    Documentation/git-branch.txt-182-       Alter the sha1's minimum display length in the output listing.
    Documentation/git-branch.txt:183:       The default value is 7 and can be overridden by the `core.abbrev`
    Documentation/git-branch.txt-184-       config option.
    Documentation/git-describe.txt-65---abbrev=<n>::
    Documentation/git-describe.txt:66:      Instead of using the default 7 hexadecimal digits as the
    Documentation/git-describe.txt-67-      abbreviated object name, use <n> digits, or as many digits
    Documentation/git-ls-tree.txt-93-Object size identified by <object> is given in bytes, and right-justified
    Documentation/git-ls-tree.txt:94:with minimum width of 7 characters.  Object size is given only for blobs
    Documentation/git-ls-tree.txt-95-(file) entries; for other entries `-` character is used in place of size.
    Documentation/gittutorial-2.txt-44-
    Documentation/gittutorial-2.txt:45:What are the 7 digits of hex that Git responded to the commit with?
    Documentation/gittutorial-2.txt-46-
    [...]
    Documentation/gittutorial-2.txt-52-name), and that the contents of a Git object will never change (since
    Documentation/gittutorial-2.txt:53:that would change the object's name as well). The 7 char hex strings
    Documentation/gittutorial-2.txt-54-here are simply the abbreviation of such 40 character long strings.

It was never correct that we'd pick 7 characters; we'd *try* that before
e6c587c733 ("abbrev: auto size the default abbreviation", 2016-09-30)
but would pick a longer one if 7 was ambiguous.

Whereas "unambiguous abbreviation" isn't correct either, and arguably
less correct. At least 7 is what we *still* pick as a fallback in lieu
of the auto-sizing, but just "unambiguous abbreviation" implies that in
a repo with some 10 objects we might show just one character, or that
we'd post-e6c587c733 pick say 7 characters in a repository where it *is*
unambiguous but where we've auto-sized to 12.
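
As a rough illustration of the auto-sizing referred to above (a sketch
of the idea, not a copy of the code from e6c587c733): estimate how many
bits the object count needs, allow for a birthday-style collision,
convert to hex digits, and never go below 7. auto_abbrev_len() and
FALLBACK_ABBREV are names made up for this example.

```c
#define FALLBACK_ABBREV 7	/* hypothetical name for the 7-char floor */

/* Pick a default abbreviation length for a repo with nr_objects objects. */
unsigned int auto_abbrev_len(unsigned long nr_objects)
{
	unsigned int bits = 0, len;

	/* Count the bits needed to represent nr_objects. */
	while (nr_objects) {
		bits++;
		nr_objects >>= 1;
	}

	/*
	 * With about 2^bits objects, a prefix needs roughly 2 * bits
	 * bits before a collision becomes unlikely (birthday bound).
	 * At 4 bits per hex digit that is bits / 2 digits, rounded up.
	 */
	len = (bits + 1) / 2;

	/* Very small repositories stick with the fallback. */
	return len < FALLBACK_ABBREV ? FALLBACK_ABBREV : len;
}
```

Under these assumptions a repository with some 10 objects still gets 7
characters, while one with about six million objects gets 12.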

I've been meaning to follow-up on
https://public-inbox.org/git/20190204161217.20047-1-avarab@gmail.com/
where I among other things wanted to just have these instances all say
"commits will be abbreviated as described in XYZ in linkgit:<something>"
and summarize what happens there.

I don't mind if this goes in, I mainly wrote this E-Mail as a brain dump
since it jolted my memory on the topic, and so that I could dig it up
later & see how I intended to follow-up on those #leftoverbits

^ permalink raw reply	[relevance 5%]

* Re: Feature request: Add --no-edit to git tag command
  2019-04-04 13:56  0%       ` Robert Dailey
@ 2019-04-04 13:57  0%         ` Robert Dailey
  0 siblings, 0 replies; 200+ results
From: Robert Dailey @ 2019-04-04 13:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, Git

On Thu, Apr 4, 2019 at 8:56 AM Robert Dailey <rcdailey.lists@gmail.com> wrote:
>
> On Thu, Apr 4, 2019 at 7:06 AM Jeff King <peff@peff.net> wrote:
> >
> > On Wed, Apr 03, 2019 at 08:26:06PM -0700, Taylor Blau wrote:
> >
> > > Agreed.
> > >
> > > I think that the implementation is a little different than "add a --no-edit"
> > > flag, though. 'git tag' already has a OPT_BOOL for '--edit', which means
> > > that '--no-edit' exists, too.
> > >
> > > But, when we look and see how the edit option is passed around, we find
> > > that the check whether or not to launch the editor (again, in
> > > builtin/tag.c within 'create_tag()') is:
> > >
> > >   if (!opt->message_given || opt->use_editor)
> > >
> > > So, it's not that we didn't take '--no-edit', it's that we didn't get a
> > > _message_, so we'll open the editor to get one (even if '--no-edit' was
> > > given).
> >
> > Yeah, I think the fundamental issue with --no-edit is that it is not a
> > tristate, so we cannot tell the difference between --edit, --no-edit,
> > and nothing.
> >
> > I think regardless of the "re-use message bits", we'd want something
> > like:
> >
> > diff --git a/builtin/tag.c b/builtin/tag.c
> > index 02f6bd1279..260adcaa60 100644
> > --- a/builtin/tag.c
> > +++ b/builtin/tag.c
> > @@ -196,7 +196,7 @@ static int build_tag_object(struct strbuf *buf, int sign, struct object_id *resu
> >
> >  struct create_tag_options {
> >         unsigned int message_given:1;
> > -       unsigned int use_editor:1;
> > +       int use_editor;
> >         unsigned int sign;
> >         enum {
> >                 CLEANUP_NONE,
> > @@ -227,7 +227,7 @@ static void create_tag(const struct object_id *object, const char *tag,
> >                     tag,
> >                     git_committer_info(IDENT_STRICT));
> >
> > -       if (!opt->message_given || opt->use_editor) {
> > +       if ((!opt->message_given && opt->use_editor != 0) || opt->use_editor > 0) {
> >                 int fd;
> >
> >                 /* write the template message before editing: */
> > @@ -380,7 +380,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
> >         static struct ref_sorting *sorting = NULL, **sorting_tail = &sorting;
> >         struct ref_format format = REF_FORMAT_INIT;
> >         int icase = 0;
> > -       int edit_flag = 0;
> > +       int edit_flag = -1;
> >         struct option options[] = {
> >                 OPT_CMDMODE('l', "list", &cmdmode, N_("list tag names"), 'l'),
> >                 { OPTION_INTEGER, 'n', NULL, &filter.lines, N_("n"),
> >
> > which even does the right thing with "git tag --no-edit -a foo" (it dies
> > with "fatal: no tag message?").
> >
> > > This makes me think that we should do two things:
> > >
> > >   1. Make !opt->message_given && !opt->use_editor an invalid invocation.
> > >      If I (1) didn't give a message but I did (2) give '--no-edit', I'd
> > >      expect a complaint, not an editor window.
> > >
> > >   2. Then, do what Robert suggests, which is to "make opt->message_given
> > >      true", by re-using the previous tag's message.
> >
> > I think I misunderstood Robert's proposal. I thought it was just about
> > fixing --no-edit, but it's actually about _adding_ (2). Which I think
> > we'd want to do differently. See Junio's reply elsewhere in the thread
> > (and my reply there).
> >
> > > > I think it wouldn't be very hard to implement, either. Maybe a good
> > > > starter project or #leftoverbits for somebody.
> > >
> > > Maybe. I think that it's made a little more complicated by the above,
> > > but it's certainly doable. Maybe good for GSoC?
> >
> > I was thinking it was just the --no-edit fix. :) Even with the "--amend"
> > thing, though, it's probably a little light for a 3-month-long GSoC
> > project. :)
>
> I apologize for the confusion. I'm not fully aware of any per-option
> philosophies in Git, so I may be unaware of the misunderstanding my
> request is causing. Let me attempt to clarify.
>
> My goal as a user is to correct a tag. If I point a tag at the wrong
> commit, I simply want to move that tag to point to another commit. At
> the moment, the only way I know to do this is the -f option, which I
> just treat as a "move" for the tag. I realize that may not be its
> intent in the implementation, but from a user perspective that's the
> end result I get.
>
> So if I treat -f as a "move this tag", I also want to say "reuse the
> existing commit message". So again, in my mind, that means -f
> --no-edit. Which means "I'm moving this tag and I want to keep the
> previous commit message".
>
> I hope this makes more sense. If getting this means not using -f or
> --no-edit at all, and is instead a whole different set of options, I'm
> OK with that as long as the end result is achievable. It's impossible
> to write a script to "move" (-f) a bunch of annotated tags without an
> editor prompting me on each one. So this "--no-edit" addition would
> assist in automation, and also making sure that we simply want to
> correct a tag, but not alter the message.

Sorry I said "commit message" but I meant "annotated tag message".

^ permalink raw reply	[relevance 0%]

* Re: Feature request: Add --no-edit to git tag command
  2019-04-04 12:06  0%     ` Jeff King
@ 2019-04-04 13:56  0%       ` Robert Dailey
  2019-04-04 13:57  0%         ` Robert Dailey
  0 siblings, 1 reply; 200+ results
From: Robert Dailey @ 2019-04-04 13:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, Git

On Thu, Apr 4, 2019 at 7:06 AM Jeff King <peff@peff.net> wrote:
>
> On Wed, Apr 03, 2019 at 08:26:06PM -0700, Taylor Blau wrote:
>
> > Agreed.
> >
> > I think that the implementation is a little different than "add a --no-edit"
> > flag, though. 'git tag' already has a OPT_BOOL for '--edit', which means
> > that '--no-edit' exists, too.
> >
> > But, when we look and see how the edit option is passed around, we find
> > that the check whether or not to launch the editor (again, in
> > builtin/tag.c within 'create_tag()') is:
> >
> >   if (!opt->message_given || opt->use_editor)
> >
> > So, it's not that we didn't take '--no-edit', it's that we didn't get a
> > _message_, so we'll open the editor to get one (even if '--no-edit' was
> > given).
>
> Yeah, I think the fundamental issue with --no-edit is that it is not a
> tristate, so we cannot tell the difference between --edit, --no-edit,
> and nothing.
>
> I think regardless of the "re-use message bits", we'd want something
> like:
>
> diff --git a/builtin/tag.c b/builtin/tag.c
> index 02f6bd1279..260adcaa60 100644
> --- a/builtin/tag.c
> +++ b/builtin/tag.c
> @@ -196,7 +196,7 @@ static int build_tag_object(struct strbuf *buf, int sign, struct object_id *resu
>
>  struct create_tag_options {
>         unsigned int message_given:1;
> -       unsigned int use_editor:1;
> +       int use_editor;
>         unsigned int sign;
>         enum {
>                 CLEANUP_NONE,
> @@ -227,7 +227,7 @@ static void create_tag(const struct object_id *object, const char *tag,
>                     tag,
>                     git_committer_info(IDENT_STRICT));
>
> -       if (!opt->message_given || opt->use_editor) {
> +       if ((!opt->message_given && opt->use_editor != 0) || opt->use_editor > 0) {
>                 int fd;
>
>                 /* write the template message before editing: */
> @@ -380,7 +380,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
>         static struct ref_sorting *sorting = NULL, **sorting_tail = &sorting;
>         struct ref_format format = REF_FORMAT_INIT;
>         int icase = 0;
> -       int edit_flag = 0;
> +       int edit_flag = -1;
>         struct option options[] = {
>                 OPT_CMDMODE('l', "list", &cmdmode, N_("list tag names"), 'l'),
>                 { OPTION_INTEGER, 'n', NULL, &filter.lines, N_("n"),
>
> which even does the right thing with "git tag --no-edit -a foo" (it dies
> with "fatal: no tag message?").
>
> > This makes me think that we should do two things:
> >
> >   1. Make !opt->message_given && !opt->use_editor an invalid invocation.
> >      If I (1) didn't give a message but I did (2) give '--no-edit', I'd
> >      expect a complaint, not an editor window.
> >
> >   2. Then, do what Robert suggests, which is to "make opt->message_given
> >      true", by re-using the previous tag's message.
>
> I think I misunderstood Robert's proposal. I thought it was just about
> fixing --no-edit, but it's actually about _adding_ (2). Which I think
> we'd want to do differently. See Junio's reply elsewhere in the thread
> (and my reply there).
>
> > > I think it wouldn't be very hard to implement, either. Maybe a good
> > > starter project or #leftoverbits for somebody.
> >
> > Maybe. I think that it's made a little more complicated by the above,
> > but it's certainly doable. Maybe good for GSoC?
>
> I was thinking it was just the --no-edit fix. :) Even with the "--amend"
> thing, though, it's probably a little light for a 3-month-long GSoC
> project. :)

I apologize for the confusion. I'm not fully aware of any per-option
philosophies in Git, so I may be unaware of the misunderstanding my
request is causing. Let me attempt to clarify.

My goal as a user is to correct a tag. If I point a tag at the wrong
commit, I simply want to move that tag to point to another commit. At
the moment, the only way I know to do this is the -f option, which I
just treat as a "move" for the tag. I realize that may not be its
intent in the implementation, but from a user perspective that's the
end result I get.

So if I treat -f as a "move this tag", I also want to say "reuse the
existing commit message". So again, in my mind, that means -f
--no-edit. Which means "I'm moving this tag and I want to keep the
previous commit message".

I hope this makes more sense. If getting this means not using -f or
--no-edit at all, and is instead a whole different set of options, I'm
OK with that as long as the end result is achievable. It's impossible
to write a script to "move" (-f) a bunch of annotated tags without an
editor prompting me on each one. So this "--no-edit" addition would
assist in automation, and also making sure that we simply want to
correct a tag, but not alter the message.

^ permalink raw reply	[relevance 0%]

* Re: Feature request: Add --no-edit to git tag command
  2019-04-04  3:26  0%   ` Taylor Blau
@ 2019-04-04 12:06  0%     ` Jeff King
  2019-04-04 13:56  0%       ` Robert Dailey
  0 siblings, 1 reply; 200+ results
From: Jeff King @ 2019-04-04 12:06 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Robert Dailey, Git

On Wed, Apr 03, 2019 at 08:26:06PM -0700, Taylor Blau wrote:

> Agreed.
> 
> I think that the implementation is a little different than "add a --no-edit"
> flag, though. 'git tag' already has a OPT_BOOL for '--edit', which means
> that '--no-edit' exists, too.
> 
> But, when we look and see how the edit option is passed around, we find
> that the check whether or not to launch the editor (again, in
> builtin/tag.c within 'create_tag()') is:
> 
>   if (!opt->message_given || opt->use_editor)
> 
> So, it's not that we didn't take '--no-edit', it's that we didn't get a
> _message_, so we'll open the editor to get one (even if '--no-edit' was
> given).

Yeah, I think the fundamental issue with --no-edit is that it is not a
tristate, so we cannot tell the difference between --edit, --no-edit,
and nothing.

I think regardless of the "re-use message bits", we'd want something
like:

diff --git a/builtin/tag.c b/builtin/tag.c
index 02f6bd1279..260adcaa60 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -196,7 +196,7 @@ static int build_tag_object(struct strbuf *buf, int sign, struct object_id *resu
 
 struct create_tag_options {
 	unsigned int message_given:1;
-	unsigned int use_editor:1;
+	int use_editor;
 	unsigned int sign;
 	enum {
 		CLEANUP_NONE,
@@ -227,7 +227,7 @@ static void create_tag(const struct object_id *object, const char *tag,
 		    tag,
 		    git_committer_info(IDENT_STRICT));
 
-	if (!opt->message_given || opt->use_editor) {
+	if ((!opt->message_given && opt->use_editor != 0) || opt->use_editor > 0) {
 		int fd;
 
 		/* write the template message before editing: */
@@ -380,7 +380,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	static struct ref_sorting *sorting = NULL, **sorting_tail = &sorting;
 	struct ref_format format = REF_FORMAT_INIT;
 	int icase = 0;
-	int edit_flag = 0;
+	int edit_flag = -1;
 	struct option options[] = {
 		OPT_CMDMODE('l', "list", &cmdmode, N_("list tag names"), 'l'),
 		{ OPTION_INTEGER, 'n', NULL, &filter.lines, N_("n"),

which even does the right thing with "git tag --no-edit -a foo" (it dies
with "fatal: no tag message?").
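
Read in isolation, the condition in the second hunk amounts to the
following sketch. should_launch_editor() is a hypothetical wrapper
written for this example; the tristate convention (-1 when neither
option was given, 0 for --no-edit, 1 for --edit) is the one the patch
sets up by initializing edit_flag to -1.

```c
#include <assert.h>

/*
 * Decide whether the editor should be launched.
 *
 * message_given: non-zero if a message was supplied (e.g. -m or -F)
 * use_editor:    -1 = unspecified, 0 = --no-edit, 1 = --edit
 */
int should_launch_editor(int message_given, int use_editor)
{
	assert(use_editor >= -1 && use_editor <= 1);

	return (!message_given && use_editor != 0) || use_editor > 0;
}
```

So an explicit --edit always opens the editor, an explicit --no-edit
never does, and with neither option the editor opens only when no
message was given.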

> This makes me think that we should do two things:
> 
>   1. Make !opt->message_given && !opt->use_editor an invalid invocation.
>      If I (1) didn't give a message but I did (2) give '--no-edit', I'd
>      expect a complaint, not an editor window.
> 
>   2. Then, do what Robert suggests, which is to "make opt->message_given
>      true", by re-using the previous tag's message.

I think I misunderstood Robert's proposal. I thought it was just about
fixing --no-edit, but it's actually about _adding_ (2). Which I think
we'd want to do differently. See Junio's reply elsewhere in the thread
(and my reply there).

> > I think it wouldn't be very hard to implement, either. Maybe a good
> > starter project or #leftoverbits for somebody.
> 
> Maybe. I think that it's made a little more complicated by the above,
> but it's certainly doable. Maybe good for GSoC?

I was thinking it was just the --no-edit fix. :) Even with the "--amend"
thing, though, it's probably a little light for a 3-month-long GSoC
project. :)

-Peff

^ permalink raw reply related	[relevance 0%]

* Re: Feature request: Add --no-edit to git tag command
  2019-04-04  1:57  6% ` Jeff King
@ 2019-04-04  3:26  0%   ` Taylor Blau
  2019-04-04 12:06  0%     ` Jeff King
  0 siblings, 1 reply; 200+ results
From: Taylor Blau @ 2019-04-04  3:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Robert Dailey, Git

Hi Peff,

On Wed, Apr 03, 2019 at 09:57:44PM -0400, Jeff King wrote:
> On Wed, Apr 03, 2019 at 09:38:02AM -0500, Robert Dailey wrote:
>
> > Similar to git commit, it would be nice to have a --no-edit option for
> > git tag. Use case is when I force-recreate a tag:
> >
> > $ git tag -af 1.0 123abc
> >
> > An editor will be prompted with the previous annotated tag message. I
> > would like to add --no-edit to instruct it to use any previously
> > provided message and without prompting the editor:
> >
> > $ git tag --no-edit -af 1.0 123abc
>
> Yeah, that sounds like a good idea.

Agreed.

I think that the implementation is a little different than "add a --no-edit"
flag, though. 'git tag' already has a OPT_BOOL for '--edit', which means
that '--no-edit' exists, too.

But, when we look and see how the edit option is passed around, we find
that the check whether or not to launch the editor (again, in
builtin/tag.c within 'create_tag()') is:

  if (!opt->message_given || opt->use_editor)

So, it's not that we didn't take '--no-edit', it's that we didn't get a
_message_, so we'll open the editor to get one (even if '--no-edit' was
given).

This makes me think that we should do two things:

  1. Make !opt->message_given && !opt->use_editor an invalid invocation.
     If I (1) didn't give a message but I did (2) give '--no-edit', I'd
     expect a complaint, not an editor window.

  2. Then, do what Robert suggests, which is to "make opt->message_given
     true", by re-using the previous tag's message.
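
A minimal sketch of the two steps above, not the actual builtin/tag.c
code. It assumes a hypothetical tri-state `edit` value (-1 when neither
option was given, 0 for '--no-edit', 1 for '--edit'), since a plain
boolean cannot tell "'--no-edit' was given" apart from "nothing was
given". The enum and the function name are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical outcomes; none of these names exist in git. */
enum tag_msg_action {
	TAG_MSG_USE_GIVEN,     /* use the -m/-F message as-is */
	TAG_MSG_REUSE_OLD,     /* step 2: reuse the replaced tag's message */
	TAG_MSG_LAUNCH_EDITOR, /* open the editor */
	TAG_MSG_ERROR          /* step 1: --no-edit without any message */
};

/*
 * edit: -1 = neither --edit nor --no-edit, 0 = --no-edit, 1 = --edit.
 * old_message: message of the tag being replaced, or NULL if there is none.
 */
static enum tag_msg_action decide_tag_message(int message_given, int edit,
					      const char *old_message)
{
	if (edit > 0)
		return TAG_MSG_LAUNCH_EDITOR;
	if (message_given)
		return TAG_MSG_USE_GIVEN;
	if (edit < 0)
		return TAG_MSG_LAUNCH_EDITOR; /* same as the current check */
	/* --no-edit was given, but no message */
	return old_message ? TAG_MSG_REUSE_OLD : TAG_MSG_ERROR;
}
```

With `edit` left at -1 this reduces to the existing
`!opt->message_given || opt->use_editor` check; only the explicit
'--no-edit' case changes.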

> I think it wouldn't be very hard to implement, either. Maybe a good
> starter project or #leftoverbits for somebody.

Maybe. I think that it's made a little more complicated by the above,
but it's certainly doable. Maybe good for GSoC?

> -Peff

Thanks,
Taylor

^ permalink raw reply	[relevance 0%]

* Re: Feature request: Add --no-edit to git tag command
  @ 2019-04-04  1:57  6% ` Jeff King
  2019-04-04  3:26  0%   ` Taylor Blau
  0 siblings, 1 reply; 200+ results
From: Jeff King @ 2019-04-04  1:57 UTC (permalink / raw)
  To: Robert Dailey; +Cc: Git

On Wed, Apr 03, 2019 at 09:38:02AM -0500, Robert Dailey wrote:

> Similar to git commit, it would be nice to have a --no-edit option for
> git tag. Use case is when I force-recreate a tag:
> 
> $ git tag -af 1.0 123abc
> 
> An editor will be prompted with the previous annotated tag message. I
> would like to add --no-edit to instruct it to use any previously
> provided message and without prompting the editor:
> 
> $ git tag --no-edit -af 1.0 123abc

Yeah, that sounds like a good idea. I think it wouldn't be very hard to
implement, either. Maybe a good starter project or #leftoverbits for
somebody.

-Peff

^ permalink raw reply	[relevance 6%]

* Re: [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug
  @ 2019-03-13 22:27  6%       ` Johannes Schindelin
  0 siblings, 0 replies; 200+ results
From: Johannes Schindelin @ 2019-03-13 22:27 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason,
	Johannes Schindelin via GitGitGadget, git, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1871 bytes --]

Hi Ævar & Peff,

On Wed, 13 Mar 2019, Jeff King wrote:

> On Wed, Mar 13, 2019 at 05:11:44PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> > > And this is where the loose object cache interferes with this
> > > feature: if *some* loose object was read whose hash shares the same
> > > first two digits with a commit that was not yet created when that
> > > loose object was created, then we fail to find that new commit by
> > > its short name in `get_oid()`, and the interactive rebase fails with
> > > an obscure error message like:
> > >
> > > 	error: invalid line 1: pick 6568fef
> > > 	error: please fix this using 'git rebase --edit-todo'.
> 
> Are we 100% sure this part is necessary? From my understanding of the
> problem, even without any ambiguity get_oid() could fail due to just
> plain not finding the object in question.

Indeed. It could be a typo, for example. Which is why that error message
is so helpful.

> > As a further improvement, is there a good reason for why we wouldn't
> > pass something down to the oid machinery to say "we're only interested
> > in commits". I have a WIP series somewhere to generalize that more, but
> > e.g.  here locally:
> 
> We have get_oid_commit() and get_oid_committish() already. Should rebase
> just be using those? (I think we probably want "commit()", because we do
> not expect a "pick" line to have a tag, for example.)

I did think about this while developing this patch series, and decided
against conflating concerns.

And I was totally right to do so! Because I do have an internal ticket
that talks about allowing `reset v2.20.1`, which is a tag, not a commit.

Granted, it is easy to work around: just use `reset v2.20.1^0`, but it is
quite annoying that we do not allow this at the moment: even if we do
allow `get_oid()` to resolve the tag, we don't peel it to the commit.

#leftoverbits

Ciao,
Dscho

^ permalink raw reply	[relevance 6%]

* RE: [BUG] All files in folder are moved when cherry-picking commit that moves fewer files
  2019-02-27 17:31  6%       ` Elijah Newren
@ 2019-02-28  8:16  6%         ` Linus Nilsson
  0 siblings, 0 replies; 200+ results
From: Linus Nilsson @ 2019-02-28  8:16 UTC (permalink / raw)
  To: Elijah Newren, Jeff King; +Cc: Phillip Wood, git@vger.kernel.org

Thanks for the answers. So it seems it's not a bug, but may lead to new merge options. I worked around it anyway, so it was not a real problem.

Kind regards
Linus Nilsson

-----Original Message-----
From: Elijah Newren <newren@gmail.com> 
Sent: Wednesday, 27 February 2019 18:32
To: Jeff King <peff@peff.net>
Cc: Phillip Wood <phillip.wood@dunelm.org.uk>; Linus Nilsson <Linus.Nilsson@trimma.se>; git@vger.kernel.org
Subject: Re: [BUG] All files in folder are moved when cherry-picking commit that moves fewer files

On Wed, Feb 27, 2019 at 8:40 AM Jeff King <peff@peff.net> wrote:
>
> On Wed, Feb 27, 2019 at 08:02:35AM -0800, Elijah Newren wrote:
>
> > > > I have found what I suspect to be a bug, or at least not 
> > > > desirable behavior in my case. In one branch, I have moved all 
> > > > files in a directory to another directory. The first directory 
> > > > is now empty in this branch (I haven't tested whether this is significant).
> > >
> > > I suspect that because you've moved all the files git thinks the 
> > > directory has been renamed and that's why it moves a/file2 when 
> > > fix is cherry-picked in the example below. I've cc'd Elijah as he 
> > > knows more about how the directory rename detection works.
> >
> > Yes, Phillip is correct.  If the branch you were 
> > merging/cherry-picking still had any files at all in the original 
> > directory, then no directory rename would be detected.  You can read 
> > > up more details about how it works at
> > > https://git.kernel.org/pub/scm/git/git.git/tree/Documentation/technical/directory-rename-detection.txt
>
> Is there a way to disable it (either by config, or for a single run)? 
> I know there's merge.renames, but it's plausible somebody might want 
> file-level renames but not directory-level ones.
>
> -Peff

Not yet.  Adding such an option, similar in nature to the flags for turning off renaming detection entirely (merge.renames, diff.renames,
-Xno-renames) would probably make sense (I don't see an analogy to -Xrename-threshold=, though).  It might make sense as just an alternate setting of merge.renames or diff.renames, though it's possible that could get confusing with "copy" being an option.
#leftoverbits for someone that wants to figure out what the option names and values should be?

^ permalink raw reply	[relevance 6%]

* Re: [BUG] All files in folder are moved when cherry-picking commit that moves fewer files
  @ 2019-02-27 17:31  6%       ` Elijah Newren
  2019-02-28  8:16  6%         ` Linus Nilsson
  0 siblings, 1 reply; 200+ results
From: Elijah Newren @ 2019-02-27 17:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Phillip Wood, Linus Nilsson, git@vger.kernel.org

On Wed, Feb 27, 2019 at 8:40 AM Jeff King <peff@peff.net> wrote:
>
> On Wed, Feb 27, 2019 at 08:02:35AM -0800, Elijah Newren wrote:
>
> > > > I have found what I suspect to be a bug, or at least not desirable
> > > > behavior in my case. In one branch, I have moved all files in a
> > > > directory to another directory. The first directory is now empty
> > > > in this branch (I haven't tested whether this is significant).
> > >
> > > I suspect that because you've moved all the files git thinks the
> > > directory has been renamed and that's why it moves a/file2 when fix is
> > > cherry-picked in the example below. I've cc'd Elijah as he knows more
> > > about how the directory rename detection works.
> >
> > Yes, Phillip is correct.  If the branch you were
> > merging/cherry-picking still had any files at all in the original
> > directory, then no directory rename would be detected.  You can read
> > up more details about how it works at
> > https://git.kernel.org/pub/scm/git/git.git/tree/Documentation/technical/directory-rename-detection.txt
>
> Is there a way to disable it (either by config, or for a single run)? I
> know there's merge.renames, but it's plausible somebody might want
> file-level renames but not directory-level ones.
>
> -Peff

Not yet.  Adding such an option, similar in nature to the flags for
turning off renaming detection entirely (merge.renames, diff.renames,
-Xno-renames) would probably make sense (I don't see an analogy to
-Xrename-threshold=, though).  It might make sense as just an
alternate setting of merge.renames or diff.renames, though it's
possible that could get confusing with "copy" being an option.
#leftoverbits for someone that wants to figure out what the option
names and values should be?

^ permalink raw reply	[relevance 6%]

* Re: Students projects: looking for small and medium project ideas
  @ 2019-02-26 20:14  8%     ` Johannes Schindelin
  0 siblings, 0 replies; 200+ results
From: Johannes Schindelin @ 2019-02-26 20:14 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: Fabio Aiuto, Matthieu Moy, git

Hi,

On Tue, 26 Feb 2019, Matthieu Moy wrote:

> Fabio Aiuto <polinice83@libero.it> writes:
> 
> > Hi Matthieu and to all developers,
> > I'm Fabio, no more a student and I'm brand new in community
> > development. I joined the git mailing-list about two weeks ago and I'm
> > looking for some first fixes or tasks. I apologize in advance for
> > my limited knowledge of the subject.  Hope to have some useful information to
> > start workin'.
> 
> My advice would be to "scratch your own itch", i.e. find something you
> dislike about Git, and try to improve that. It's hard to find the
> motivation (and time) to contribute in a purely un-interested way, but
> once you start getting the benefits of your own patches in the way _you_
> use Git, it's really rewarding!

There are also occasional bug reports on the Git mailing list, like this
one about `git grep`:
https://public-inbox.org/git/CAGHpTB+fQccqR8SF2_dS3itboKd79238KCFRe4-3PZz6bpr3iQ@mail.gmail.com/T/#u

You can also search the Git mailing list archive for `#leftoverbits`:

https://public-inbox.org/git/?q=%23leftoverbits

(Of course, caveat emptor, some of those #leftoverbits might have been
addressed in the meantime, others might not reproduce for you, yet others
might not be considered bugs or worth fixing...)

Ciao,
Johannes

^ permalink raw reply	[relevance 8%]

* Re: does "git clean" deliberately ignore "core.excludesFile"?
  @ 2019-02-24 14:15  6%           ` Johannes Schindelin
  0 siblings, 0 replies; 200+ results
From: Johannes Schindelin @ 2019-02-24 14:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Robert P. J. Day, Git Mailing list

Hi,

On Sat, 23 Feb 2019, Junio C Hamano wrote:

> "Robert P. J. Day" <rpjday@crashcourse.ca> writes:
> 
> > On Sat, 23 Feb 2019, Johannes Schindelin wrote:
> >
> >> Robert, care to come up with an example demonstrating where it does not?
> >
> >   sorry i wasn't clear, all i was pointing out was that "man
> > git-clean" *explicitly* mentioned two locations related to cleaning:
> > ...
> > without additionally *explicitly* mentioning core.excludesFile.
> 
> OK, so together with the homework Dscho did for you and what I wrote
> earlier, I think you have enough information to answer the question
> yourself.
> 
> That is, the code does *not* ignore, and the doc was trying to be
> (overly) exhaustive but because it predates core.excludesFile, after
> the introduction of that configuration, it no longer is exhaustive
> and has become stale.
> 
> Which would leave a small, easy and low-hanging fruit, I guess ;-).

#leftoverbits

;-)

Thanks,
Dscho

> Thanks.
> 

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v2] merge-options.txt: correct wording of --no-commit option
  2019-02-19 22:31  6%     ` Elijah Newren
@ 2019-02-19 22:38  0%       ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2019-02-19 22:38 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List, Ulrich Windl

Elijah Newren <newren@gmail.com> writes:

>>   $ git checkout master^0
>>   $ git merge --no-commit next
>>   warning: defaulting to --no-ff, given a --no-commit request
>>   Automatic merge went well; stopped before committing as requested
>>   hint: if you'd rather have a fast-forward without creating a commit,
>>   hint: do "git reset --keep next" now.
>
> Good points.  I thought of this last one before sending, though
> without pre- and post- warnings/hints; without such text it definitely
> seemed too magical and possibly leading to unexpected surprises in a
> different direction, so I dismissed it without further thought.  But
> the warnings/hints help.
>
>> I do not have a strong preference among three (the third option
>> being not doing anything), but if pressed, I'd say that the last one
>> might be the most user-friendly, even though it feels a bit too
>> magical and trying to be smarter than its own good.
>
> I also lack a strong preference.  Maybe mark it #leftoverbits for
> someone that does?

This definitely is outside the scope of the documentation update.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2] merge-options.txt: correct wording of --no-commit option
  @ 2019-02-19 22:31  6%     ` Elijah Newren
  2019-02-19 22:38  0%       ` Junio C Hamano
  0 siblings, 1 reply; 200+ results
From: Elijah Newren @ 2019-02-19 22:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Ulrich Windl

On Tue, Feb 19, 2019 at 11:32 AM Junio C Hamano <gitster@pobox.com> wrote:
> Elijah Newren <newren@gmail.com> writes:
>
> > +With --no-commit perform the merge and stop just before creating
> > +a merge commit, to give the user a chance to inspect and further
> > +tweak the merge result before committing.
> > ++
> > +Note that fast-forward updates do not need to create a merge
> > +commit and therefore there is no way to stop those merges with
> > +--no-commit.  Thus, if you want to ensure your branch is not
> > +changed or updated by the merge command, use --no-ff with
> > +--no-commit.
>
> While the above is an improvement (so I'll queue it on 'pu' not to
> lose sight of it), I find the use of "do not need to" above somewhat
> misleading.  It solicits a reaction "ok, we know it does not need
> to, but it could prepare to create one to allow us to further muck
> with it, no?".
>
> IOW, a fast-forward by definition does not create a merge by itself,
> so there is nowhere to stop during a creation of a merge.  So at
> least:
>
>         s/do not need to/do not/

Yes, I agree that's a good change.  I'll wait a few days for other
feedback and resend with that and any other changes.

> It also may be a good idea to consider detecting this case and be a
> bit more helpful, perhaps with end-user experience looking like...
>
>   $ git checkout master^0
>   $ git merge --no-commit next
>   Updating 0d0ac3826a..ee538a81fe
>   Fast-forward
>     ...diffstat follows here...
>   hint: merge completed without creating a commit.
>   hint: if you wanted to prepare for a manually tweaked merge,
>   hint: do "git reset --keep ORIG_HEAD" followed by
>   hint: "git merge --no-ff --no-commit next".
>
> or even
>
>   $ git checkout master^0
>   $ git merge --no-commit next
>   warning: defaulting to --no-ff, given a --no-commit request
>   Automatic merge went well; stopped before committing as requested
>   hint: if you'd rather have a fast-forward without creating a commit,
>   hint: do "git reset --keep next" now.

Good points.  I thought of this last one before sending, though
without pre- and post- warnings/hints; without such text it definitely
seemed too magical and possibly leading to unexpected surprises in a
different direction, so I dismissed it without further thought.  But
the warnings/hints help.

> I do not have a strong preference among three (the third option
> being not doing anything), but if pressed, I'd say that the last one
> might be the most user-friendly, even though it feels a bit too
> magical and trying to be smarter than its own good.

I also lack a strong preference.  Maybe mark it #leftoverbits for
someone that does?

> In any case, the hint for the "recovery" procedure needs to be
> carefully written.

Yes.

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v2 1/3] Move init_skiplist() outside of fsck
  2019-01-22  9:46  6%                 ` Ævar Arnfjörð Bjarmason
@ 2019-01-22 18:28  0%                   ` Jeff King
  0 siblings, 0 replies; 200+ results
From: Jeff King @ 2019-01-22 18:28 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Schindelin, Junio C Hamano, Barret Rhoden, git,
	David Kastrup, Jeff Smith, René Scharfe, Stefan Beller

On Tue, Jan 22, 2019 at 10:46:56AM +0100, Ævar Arnfjörð Bjarmason wrote:

> > At which point, I think it might be simpler to just make git more
> > permissive with respect to those minor data errors (and in fact, we are
> > already pretty permissive for the most part in non-fsck operations).
> 
> Yeah it's probably better to make some of these "errors" softer
> warnings.
> 
> The X-Y issue I have is that I turned on transfer.fsckObjects, so then I
> can't clone repos with various minor historical issues in commit headers
> etc., so I maintain a big skip list. But what I was actually after was
> fsck checks like the .gitmodules security check.
>
> Of course I could chase them all down and turn them into
> warn/error/ignore individually, but it would be better if we e.g. had
> some way to say "serious things error, minor things warn", maybe with
> the option of only having the looser version on fetch but not receive
> with the principle that we should be loose in what we accept from
> existing data but strict with new data #leftoverbits

Yeah, I think the current state here is rather unfortunate. The worst
part is that many of the things _are_ marked as warnings, but we reject
transfers even for warnings. So now we have "info" as well, which is
really just silly.

I think the big blocker to simply loosening "warning" is that the
current severities are pretty arbitrary. MISSING_NAME_BEFORE_EMAIL
probably ought to be a warning, but it's an error. Whereas HAS_DOTGIT is
a warning, but has pretty serious security implications.

So that does not save you from chasing them all down, but if you do, at
least the work could benefit everybody. ;)
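
A rough sketch of the "serious things error, minor things warn" idea,
with the looser treatment applied on fetch only. This is not how fsck.c
is structured: the table, the `serious` flag and the function name are
assumptions made up for illustration, and the classification of the two
checks merely follows the opinions stated above.

```c
#include <string.h>

enum direction { DIR_FETCH, DIR_RECEIVE };

/* Hypothetical table; "serious" is a judgment call per check. */
struct check_class {
	const char *id;
	int serious;
};

static const struct check_class check_classes[] = {
	{ "MISSING_NAME_BEFORE_EMAIL", 0 }, /* minor historical breakage */
	{ "HAS_DOTGIT", 1 },                /* security implications */
};

/* Returns 1 if an object failing check `id` should abort the transfer. */
static int rejects_transfer(const char *id, enum direction dir)
{
	size_t i;

	for (i = 0; i < sizeof(check_classes) / sizeof(check_classes[0]); i++) {
		if (strcmp(check_classes[i].id, id))
			continue;
		if (check_classes[i].serious)
			return 1;
		/* minor: tolerate existing data, be strict with new data */
		return dir == DIR_RECEIVE;
	}
	return 1; /* unknown checks are treated as serious */
}
```

Under these assumptions a minor problem would only warn when fetching
existing history, while the same problem in newly pushed data would
still be refused.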

-Peff

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 1/3] Move init_skiplist() outside of fsck
  @ 2019-01-22  9:46  6%                 ` Ævar Arnfjörð Bjarmason
  2019-01-22 18:28  0%                   ` Jeff King
  0 siblings, 1 reply; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2019-01-22  9:46 UTC (permalink / raw)
  To: Jeff King
  Cc: Johannes Schindelin, Junio C Hamano, Barret Rhoden, git,
	David Kastrup, Jeff Smith, René Scharfe, Stefan Beller


On Tue, Jan 22 2019, Jeff King wrote:

> On Fri, Jan 18, 2019 at 11:26:29PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> I stand corrected, I thought these still needed to be updated to parse
>> anything that wasn't 40 chars, since I hadn't seen anything about these
>> formats in the hash transition document.
>>
>> So fair enough, let's change that while we're at it, but this seems like
>> something that needs to be planned for in more detail / documented in
>> the hash transition doc.
>>
>> I.e. many (e.g. me) maintain some system-wide skiplist for strict fsck
>> cloning of legacy repos. So I can see there being some need for a
>> SHA1<->SHA256 map in this case, but since these files might stretch
>> across repo boundaries and not be checked into the repo itself this is a
>> new use-case that needs thinking about.
>
> My assumption had been that changing your local repository would be a
> (local) flag day, and you'd update any ancillary files like skiplists,
> mailmap.blob, etc at the same time. I'm not opposed to making those
> features more clever, though.
>
>> But now that I think about it this sort of thing would be a good
>> use-case for just fixing these various historical fsck issues while
>> we're at it when possible, e.g. "missing space before email" (probably
>> not all could be unambiguously fixed). So instead of sha256<->sha1
>> fn(sha256)<->fn(sha1)[1]?
>
> That is a very tempting thing to do, but I think it comes with its own
> complications. We do not want to do fn(sha1), I don't think; the reason
> we care about sha1 at all is that those hashes are already set in stone.
>
> There could be a "clean up the data as we convert to sha256" operation,
> but:
>
>   - it needs to be set in stone from day 1, I'd think. The last thing we
>     want is to modify it after conversions are in the wild
>
>   - I think we need to be bi-directional. So it must be a mapping that
>     can be undone to retrieve the original bytes, so we can compute
>     their "real" sha1.

It needing to be bidirectional is a very good point, and I think that
makes my suggestion a non-starter. Thanks.

> At which point, I think it might be simpler to just make git more
> permissive with respect to those minor data errors (and in fact, we are
> already pretty permissive for the most part in non-fsck operations).

Yeah it's probably better to make some of these "errors" softer
warnings.

The X-Y issue I have is that I turned on transfer.fsckObjects, so then I
can't clone repos with various minor historical issues in commit headers
etc., so I maintain a big skip list. But what I was actually after was
fsck checks like the .gitmodules security check.

Of course I could chase them all down and turn them into
warn/error/ignore individually, but it would be better if we e.g. had
some way to say "serious things error, minor things warn", maybe with
the option of only having the looser version on fetch but not receive
with the principle that we should be loose in what we accept from
existing data but strict with new data #leftoverbits

^ permalink raw reply	[relevance 6%]

* Re: [PATCH 4/4] built-in rebase: call `git am` directly
  @ 2019-01-18 14:15  5%     ` Johannes Schindelin
  0 siblings, 0 replies; 200+ results
From: Johannes Schindelin @ 2019-01-18 14:15 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin via GitGitGadget, git, Elijah Newren,
	Orgad Shaneh

Hi Junio,

On Fri, 4 Jan 2019, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
> > +static int write_basic_state(struct rebase_options *opts)
> > +{
> > +	write_file(state_dir_path("head-name", opts), "%s",
> > +		   opts->head_name ? opts->head_name : "detached HEAD");
> > +	write_file(state_dir_path("onto", opts), "%s",
> > +		   opts->onto ? oid_to_hex(&opts->onto->object.oid) : "");
> > +	write_file(state_dir_path("orig-head", opts), "%s",
> > +		   oid_to_hex(&opts->orig_head));
> > +	write_file(state_dir_path("quiet", opts), "%s",
> > +		   opts->flags & REBASE_NO_QUIET ? "" : "t");
> > +	if (opts->flags & REBASE_VERBOSE)
> > +		write_file(state_dir_path("verbose", opts), "%s", "");
> > +	if (opts->strategy)
> > +		write_file(state_dir_path("strategy", opts), "%s",
> > +			   opts->strategy);
> > +	if (opts->strategy_opts)
> > +		write_file(state_dir_path("strategy_opts", opts), "%s",
> > +			   opts->strategy_opts);
> > +	if (opts->allow_rerere_autoupdate >= 0)
> > +		write_file(state_dir_path("allow_rerere_autoupdate", opts),
> > +			   "-%s-rerere-autoupdate",
> > +			   opts->allow_rerere_autoupdate ? "" : "-no");
> 
> Inside rebase, allow-rerere-autoupdate can be -1 (unspecified), 0
> (declined) or 1 (requested), and this code is being consistent with
> that convention.
> 
> The "--[no-]rerere-autoupdate" option that is parsed via
> OPT_RERERE_AUTOUPDATE (used in builtin/rebase--interactive.c among
> other built-in commands) on the other hand is a tri-state that uses 0
> (unspecified), 1 (requested) and 2 (declined).  This might be a
> ticking timebomb to confuse us in the future that may be worth
> fixing but probably outside this series.

Good point. We use -1 for unspecified in so many places, I think
OPT_RERERE_AUTOUPDATE needs to be fixed. But yes, I'll leave this as
#leftoverbits here.
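
To make the mismatch concrete, here is a small sketch that maps between
the two conventions described above. The constant and function names are
local stand-ins invented for this example, not identifiers from git; only
the numeric values and the option strings come from the discussion and
the quoted patch.

```c
#include <stddef.h>

/* Values as described for the parsed --[no-]rerere-autoupdate option. */
enum {
	PARSED_UNSPECIFIED = 0,
	PARSED_REQUESTED = 1,
	PARSED_DECLINED = 2
};

/* Map the 0/1/2 convention to rebase's -1/0/1 convention. */
static int parsed_to_rebase(int parsed)
{
	switch (parsed) {
	case PARSED_REQUESTED:
		return 1;
	case PARSED_DECLINED:
		return 0;
	default:
		return -1; /* unspecified */
	}
}

/* Option to pass along for a -1/0/1 value, or NULL if unspecified. */
static const char *rebase_to_option(int allow)
{
	if (allow > 0)
		return "--rerere-autoupdate";
	if (allow == 0)
		return "--no-rerere-autoupdate";
	return NULL;
}
```

Note how a "declined" value of 2 would be read as "requested" by code
that only tests for a positive value, which is the kind of confusion
being warned about.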

> > @@ -459,6 +490,30 @@ static int reset_head(struct object_id *oid, const char *action,
> >  	return ret;
> >  }
> >  
> > +static int move_to_original_branch(struct rebase_options *opts)
> > +{
> > +	struct strbuf orig_head_reflog = STRBUF_INIT, head_reflog = STRBUF_INIT;
> > +	int ret;
> > +
> > +	if (!opts->head_name)
> > +		return 0; /* nothing to move back to */
> > +
> > +	if (!opts->onto)
> > +		BUG("move_to_original_branch without onto");
> 
> This check is absent in the scripted version, but from the message
> we generate here, it is clear that the caller must not call this
> when there is no "onto" commit.  Good.
> 
> > +	strbuf_addf(&orig_head_reflog, "rebase finished: %s onto %s",
> > +		    opts->head_name, oid_to_hex(&opts->onto->object.oid));
> > +	strbuf_addf(&head_reflog, "rebase finished: returning to %s",
> > +		    opts->head_name);
> > +	ret = reset_head(NULL, "checkout", opts->head_name,
> > +			 RESET_HEAD_REFS_ONLY,
> > +			 orig_head_reflog.buf, head_reflog.buf);
> 
> The *action given to reset_head() here is "checkout".  Makes me
> wonder about two things:
> 
>  - The only real use of the parameter in the callee is to prepare
>    the error and advice messages from the unpack_trees machinery,
>    but because we are using it in REFS_ONLY mode, it does not
>    matter.  In fact it might even be misleading; perhaps pass NULL
>    or something, so that a mistaken update to reset_head() later
>    that lets REFS_ONLY request to go to unpack_trees machinery will
>    catch it as a bug?
> 
>  - Another topic in flight wants to make sure that the post-checkout
>    hook gets called when the RESET_HEAD_RUN_POST_CHECKOUT_HOOK flag
>    is given by the caller, and IIRC, the use of the flag is strongly
>    correlated to *action being "checkout".  Do we want to pass
>    REFS_ONLY and RUN_POST_CHECKOUT_HOOK flag for this call, or do we
>    rather keep it silent?  As the original scripted version did not
>    use "checkout" here and never triggered post-checkout hook, I am
>    inclined to say that we should not pass that other bit.  That
>    then leads me to suspect that we do not want *action to be
>    "checkout" here.

The only thing for which the `action` is used, though, is the call to
`setup_unpack_trees_porcelain()`, which does not accept a `NULL`. I guess
I could replace it by the empty string. Will do that.

> 
> > +	strbuf_release(&orig_head_reflog);
> > +	strbuf_release(&head_reflog);
> > +	return ret;
> > +}
> 
> Unlike the scripted version, this does not die() upon failure, so
> the caller needs to be careful about the returned status.

Indeed. That function is only called from `run_am()`, and returns the
status in every instance. The caller of `run_am()`,
`run_specific_rebase()` also handles it correctly.

> 
> > @@ -466,6 +521,129 @@ N_("Resolve all conflicts manually, mark them as resolved with\n"
> >  "To abort and get back to the state before \"git rebase\", run "
> >  "\"git rebase --abort\".");
> >  
> > +static int run_am(struct rebase_options *opts)
> > +{
> > +	struct child_process am = CHILD_PROCESS_INIT;
> > +	struct child_process format_patch = CHILD_PROCESS_INIT;
> > +	struct strbuf revisions = STRBUF_INIT;
> > +	int status;
> > +	char *rebased_patches;
> > +
> > +	am.git_cmd = 1;
> > +	argv_array_push(&am.args, "am");
> > +
> > +	if (opts->action && !strcmp("continue", opts->action)) {
> > +		argv_array_push(&am.args, "--resolved");
> > +		argv_array_pushf(&am.args, "--resolvemsg=%s", resolvemsg);
> > +		if (opts->gpg_sign_opt)
> > +			argv_array_push(&am.args, opts->gpg_sign_opt);
> > +		status = run_command(&am);
> > +		if (status)
> > +			return status;
> > +
> > +		discard_cache();
> > +		return move_to_original_branch(opts);
> 
> It is curious why discard_cache() is placed exactly here, as if we
> want to preserve the contents of the in-core index when
> run_command() failed.  But I do not think we care about the in-core
> index as the only thing that happen after "return status" is to
> return the control to run_specific_rebase(), let it jump to
> finished_rebase label to clean things up and return control to
> cmd_rebase() and exit based on the status value.
> 
> It's not like move_to_original_branch() wants to call read_cache()
> and get the result from the "am" that run_command() executed,
> either.
> 
> Puzzled.  Care to explain a bit more in the in-code comment?

I think that this call is just a left-over from a previous version that
did not have the REFS_ONLY flag to pass to `move_to_original_branch()`
(and it caused havoc before that flag was passed). Let me double-check
whether the `discard_cache()` even makes sense any longer.

*clicketyclick* indeed that is the case. Will remove all three
`discard_cache()` calls.

> 
> > +	}
> > +	if (opts->action && !strcmp("skip", opts->action)) {
> > +		argv_array_push(&am.args, "--skip");
> > +		argv_array_pushf(&am.args, "--resolvemsg=%s", resolvemsg);
> > +		status = run_command(&am);
> > +		if (status)
> > +			return status;
> > +
> > +		discard_cache();
> > +		return move_to_original_branch(opts);
> 
> Ditto.
> 
> > +	}
> > +	if (opts->action && !strcmp("show-current-patch", opts->action)) {
> > +		argv_array_push(&am.args, "--show-current-patch");
> > +		return run_command(&am);
> > +	}
> 
> Up to this point, it is a faithful conversion of the first case/esac
> statement.  Good.
> 
> > +	strbuf_addf(&revisions, "%s...%s",
> > +		    oid_to_hex(opts->root ?
> > +			       /* this is now equivalent to ! -z "$upstream" */
> 
> Does "this" refer to the "opts->root being true" check?
> 
> Because you are flipping the polarity of the test from scripted
> version, shouldn't the comment be updated to "-z $upstream"?

It did flip the polarity, you are right, this comment is incorrect. It is
even more incorrect, though, as it talks about a shell construct that is
no longer applicable. Will fix.

> 
> > +			       &opts->onto->object.oid :
> > +			       &opts->upstream->object.oid),
> > +		    oid_to_hex(&opts->orig_head));
> 
> > +	rebased_patches = xstrdup(git_path("rebased-patches"));
> > +	format_patch.out = open(rebased_patches,
> > +				O_WRONLY | O_CREAT | O_TRUNC, 0666);
> 
> Unlike scripted version, we do not remove a (possibly) existing file.
> We give CREAT in case there is no existing one, and TRUNC in case
> there is an existing one.  Makes sense.  A more faithful translation
> would have unlink(2)ed a (possibly) existing one, and then because
> we can afford to, passed O_EXCL to avoid stomping on somebody else
> racing with us, but I do not think it is worth it.

Okay.
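
For reference, the two variants being compared could look roughly like
this, using plain POSIX calls. The helper names are made up for this
sketch and neither function exists in git.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* As in the patch: create the file if missing, truncate it if present. */
static int open_truncating(const char *path)
{
	return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
}

/*
 * Closer to the scripted version: remove a possibly existing file first,
 * then use O_EXCL so that the open fails if somebody else re-created the
 * file in the meantime.
 */
static int open_exclusive(const char *path)
{
	if (unlink(path) < 0 && errno != ENOENT)
		return -1;
	return open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
}
```

Both return a file descriptor, or -1 with errno set. The second variant
differs only in the racy case where another process creates the file
between the unlink() and the open().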

> > +	if (format_patch.out < 0) {
> > +		status = error_errno(_("could not write '%s'"),
> > +				     rebased_patches);
> 
> s/write '%s'/open '%s' for writing/?  I dunno.

Yep, of course! Will fix.

> > +		free(rebased_patches);
> > +		argv_array_clear(&am.args);
> > +		return status;
> > +	}
> > +
> > +	format_patch.git_cmd = 1;
> > +	argv_array_pushl(&format_patch.args, "format-patch", "-k", "--stdout",
> > +			 "--full-index", "--cherry-pick", "--right-only",
> > +			 "--src-prefix=a/", "--dst-prefix=b/", "--no-renames",
> > +			 "--no-cover-letter", "--pretty=mboxrd", NULL);
> > +	if (opts->git_format_patch_opt.len)
> > +		argv_array_split(&format_patch.args,
> > +				 opts->git_format_patch_opt.buf);
> > +	argv_array_push(&format_patch.args, revisions.buf);
> > +	if (opts->restrict_revision)
> > +		argv_array_pushf(&format_patch.args, "^%s",
> > +				 oid_to_hex(&opts->restrict_revision->object.oid));
> 
> It is kinda surprising to see that we have learned quite a lot of
> fringe "configurations" we need to explicitly override like this.
> 
> Looks like a quite faithful conversion, anyway.
> 
> > +	status = run_command(&format_patch);
> > +	if (status) {
> > +		unlink(rebased_patches);
> > +		free(rebased_patches);
> > +		argv_array_clear(&am.args);
> > +
> > +		reset_head(&opts->orig_head, "checkout", opts->head_name, 0,
> > +			   "HEAD", NULL);
> 
> This one may need to trigger post-checkout hook.  The scripted
> version does two different things depending on the value of
> $head_name, but we can just use the same code without conditional?

Yes, because `opts->head_name` is `NULL` in one case, and not `NULL` in
the other, and the `reset_head()` function performs the desired operation
in each case.

> > +		error(_("\ngit encountered an error while preparing the "
> > +			"patches to replay\n"
> > +			"these revisions:\n"
> > +			"\n    %s\n\n"
> > +			"As a result, git cannot rebase them."),
> > +		      opts->revisions);
> > +
> > +		strbuf_release(&revisions);
> > +		return status;
> > +	}
> > +	strbuf_release(&revisions);
> > +
> > +	am.in = open(rebased_patches, O_RDONLY);
> > +	if (am.in < 0) {
> > +		status = error_errno(_("could not read '%s'"),
> > +				     rebased_patches);
> 
> s/read '%s'/open '%s' for reading/?  I dunno.

Yep, will fix.

> 
> > +		free(rebased_patches);
> > +		argv_array_clear(&am.args);
> > +		return status;
> > +	}
> > +
> > +	argv_array_pushv(&am.args, opts->git_am_opts.argv);
> > +	argv_array_push(&am.args, "--rebasing");
> > +	argv_array_pushf(&am.args, "--resolvemsg=%s", resolvemsg);
> > +	argv_array_push(&am.args, "--patch-format=mboxrd");
> > +	if (opts->allow_rerere_autoupdate > 0)
> > +		argv_array_push(&am.args, "--rerere-autoupdate");
> > +	else if (opts->allow_rerere_autoupdate == 0)
> > +		argv_array_push(&am.args, "--no-rerere-autoupdate");
> > +	if (opts->gpg_sign_opt)
> > +		argv_array_push(&am.args, opts->gpg_sign_opt);
> > +	status = run_command(&am);
> > +	unlink(rebased_patches);
> > +	free(rebased_patches);
> > +
> > +	if (!status) {
> > +		discard_cache();
> > +		return move_to_original_branch(opts);
> > +	}
> > +
> > +	if (is_directory(opts->state_dir))
> > +		write_basic_state(opts);
> > +
> > +	return status;
> > +}
> > +
> >  static int run_specific_rebase(struct rebase_options *opts)
> >  {
> >  	const char *argv[] = { NULL, NULL };
> > @@ -546,6 +724,11 @@ static int run_specific_rebase(struct rebase_options *opts)
> >  		goto finished_rebase;
> >  	}
> >  
> > +	if (opts->type == REBASE_AM) {
> > +		status = run_am(opts);
> > +		goto finished_rebase;
> > +	}
> > +
> >  	add_var(&script_snippet, "GIT_DIR", absolute_path(get_git_dir()));
> >  	add_var(&script_snippet, "state_dir", opts->state_dir);
> 
> 
> Overall, this was quite a pleasant read and a well constructed
> series.  Other than two minor points (i.e. interaction with the
> 'post-checkout hook' topic, and discard_cache() before calling
> move_to_original_branch) I did not quite understand, looks good to
> me.
> 
> When merged to 'pu', I seem to be getting failure from t3425.5, .8
> and .11, by the way.  I haven't dug into the actual breakages any
> further than that.

Sorry for the trouble, and for my silence (I was heads-down into the Azure
Pipelines support).

I did not see any breakage in `pu` lately, hopefully things resolved
themselves?

Ciao,
Dscho

^ permalink raw reply	[relevance 5%]

* Re: Students projects: looking for small and medium project ideas
  @ 2019-01-14 23:04  6% ` Ævar Arnfjörð Bjarmason
    1 sibling, 0 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2019-01-14 23:04 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git


On Mon, Jan 14 2019, Matthieu Moy wrote:

> I haven't been active for a while on this list, but for those who don't
> know me, I'm a CS teacher and I'm regularly offering my students to
> contribute to open-source projects as part of their school projects. A
> few nice features like "git rebase -i --exec" or many of the hints in
> "git status" were implemented as part of these projects.
>
> I'm starting another instance of such project next week.

Good to hear!

> Part of the work of students is to choose which feature they want to
> work on, but I try to prepare this for them. I'm keeping a list of ideas
> here:
>
>   https://git.wiki.kernel.org/index.php/SmallProjectsIdeas
>
> (At some point, I should probably migrate this to git.github.io, since
> the wiki only seems half-alive these days).
>
> I'm looking for small to medium size projects (typically, a GSoC project
> is far too big in comparison, but we may expect more than just
> microprojects).
>
> You may suggest ideas by editing the wiki page, or just by replying to
> this email (I'll point my students to the thread). Don't hesitate to
> remove entries (or ask me to do so) on the wiki page if you think they
> are not relevant anymore.

Some #leftoverbits I've noted on-list before would qualify, some of
these (e.g. grep --only-matching) have been implemented, but others not:

https://public-inbox.org/git/87in9ucsbb.fsf@evledraar.gmail.com/
https://public-inbox.org/git/87bmcyfh67.fsf@evledraar.gmail.com/

^ permalink raw reply	[relevance 6%]

* Re: How de-duplicate similar repositories with alternates
  2018-11-29 14:59  4% How de-duplicate similar repositories with alternates Ævar Arnfjörð Bjarmason
  2018-11-29 16:09  0% ` Ævar Arnfjörð Bjarmason
  2018-12-04  6:59  0% ` Jeff King
@ 2018-12-04 13:35  0% ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2018-12-04 13:35 UTC (permalink / raw)
  To: git, Git for human beings; +Cc: Christian Couder, Derrick Stolee


On Thu, Nov 29 2018, Ævar Arnfjörð Bjarmason wrote:

> A co-worker asked me today how space could be saved when you have
> multiple checkouts of the same repository (at different revs) on the
> same machine. I said since these won't block-level de-duplicate well[1]
> one way to do this is with alternates.
>
> However, once you have an existing clone I didn't know how to get the
> gains without a full re-clone, but I hadn't looked deeply into it. As it
> turns out I'm wrong about that, which I found when writing the following
> test-case which shows that it works:
>
>     (
>         cd /tmp &&
>         rm -rf /tmp/git-{master,pu,pu-alt}.git &&
>
>         # Normal clones
>         git clone --bare --no-tags --single-branch --branch master https://github.com/git/git.git /tmp/git-master.git &&
>         git clone --bare --no-tags --single-branch --branch pu https://github.com/git/git.git /tmp/git-pu.git &&
>
>         # An 'alternate' clone using 'master' objects from another repo
>         git --bare init /tmp/git-pu-alt.git &&
>         for git in git-pu.git git-pu-alt.git
>         do
>             echo /tmp/git-master.git/objects >/tmp/$git/objects/info/alternates
>         done &&
>         git -C git-pu-alt.git fetch --no-tags https://github.com/git/git.git pu:pu &&
>
>         # Respective sizes, 'alternate' clone much smaller
>         du -shc /tmp/git-*.git &&
>
>         # GC them all. Compacts the git-pu.git to git-pu-alt.git's size
>         for repo in git-*.git
>         do
>             git -C $repo gc
>         done &&
>         du -shc /tmp/git-*.git &&
>
>         # Add another big history (GFW) to git-{pu,master}.git (in that order!)
>         for repo in $(ls -d /tmp/git-*.git | sort -r)
>         do
>             git -C $repo fetch --no-tags https://github.com/git-for-windows/git master:master-gfw
>         done &&
>         du -shc /tmp/git-*.git &&
>
>         # Another GC. The objects now in git-master.git will be de-duped by all
>         for repo in git-*.git
>         do
>             git -C $repo gc
>         done &&
>         du -shc /tmp/git-*.git
>     )
>
> This shows a scenario where we clone git.git at "master" and "pu" in
> different places. After clone the relevant sizes are:
>
>     108M    /tmp/git-master.git
>     3.2M    /tmp/git-pu-alt.git
>     109M    /tmp/git-pu.git
>     219M    total
>
> I.e. git-pu-alt.git is much smaller since it points via alternates to
> git-master.git, and the history of "pu" shares most of the objects with
> "master". But then how do you get those gains for git-pu.git? Turns out
> you just "git gc"
>
>     111M    /tmp/git-master.git
>     2.1M    /tmp/git-pu-alt.git
>     2.1M    /tmp/git-pu.git
>     115M    total
>
> This is the thing I was wrong about, in retrospect probably because I'd
> been putting PATH_TO_REPO in objects/info/alternates, but we actually
> need PATH_TO_REPO/objects, and "git gc" won't warn about this (or "git
> fsck"). Probably a good idea to patch that at some point, i.e. whine
> about paths in alternates that don't have objects, or at the very least
> those that don't exist. #leftoverbits
>
> Then when we fetch git-for-windows:master to all the repos they all grow
> by the amount git-for-windows has diverged:
>
>     144M    /tmp/git-master.git
>     36M     /tmp/git-pu-alt.git
>     36M     /tmp/git-pu.git
>     214M    total
>
> Note that the "sort -r" is critical here. If we fetched git-master.git
> first (at this point the alternate for git-pu*.git) we wouldn't get the
> duplication in the first place, but instead:
>
>     144M    /tmp/git-master.git
>     2.1M    /tmp/git-pu-alt.git
>     2.1M    /tmp/git-pu.git
>     148M    total
>
> This shows the importance of keeping such an 'alternate' repo
> up-to-date, i.e. we don't get the duplication in the first place, but
> regardless (this from a run with sort -r) a "git gc" will coalesce them:
>
>     131M    /tmp/git-master.git
>     2.1M    /tmp/git-pu-alt.git
>     2.2M    /tmp/git-pu.git
>     135M    total
>
> If you find this interesting make sure to read my
> https://public-inbox.org/git/87k1s3bomt.fsf@evledraar.gmail.com/ and
> https://public-inbox.org/git/87in7nbi5b.fsf@evledraar.gmail.com/ for the
> caveats, i.e. if this is something intended for users then no ref in the
> alternate can ever be rewound, that'll potentially result in repository
> corruption.
>
> 1. https://public-inbox.org/git/87bmhiykvw.fsf@evledraar.gmail.com/

Maybe this is useful to someone. Here's a script I've written since
starting this thread; it runs from daily cron on some of our systems.

It expects repositories in /var/lib/git_tree-for-alternates like
/var/lib/git_tree-for-alternates/git/git.git to exist, then scours /home
and /etc/puppet/environments (which we had a lot of) for "config" files
containing the string "git/git" (this saves us some work) and then tries to
find a git repository relative to that "config" file with "rev-parse
--absolute-git-dir".

If there is one, we check if the repository has a SHA-1 that the history
of our /var/lib/git_tree-for-alternates/git/git.git started with (if >1
we pick the oldest), if so this is a repository that can benefit from
using /var/lib/git_tree-for-alternates/git/git.git/objects as an
alternate, and we add the appropriate alternate info, set
gc.bigPackThreshold to 0 so GC will actually do its work, and run "git gc"
sudo'd as the user who owns the thing.

On one server the .git directories in /home went from ~2TB to ~100GB
using this script. On another from ~250G to ~5G. The leftover space
spent is the commit-graph (not de-duped like objects are), and whatever
accumulated divergence (topic branches mainly) exists in those repos
beyond what the alternate store has in the HEAD branch.

#!/bin/bash

set -euo pipefail

ALTERNATES_STORE=/var/lib/git_tree-for-alternates

if ! test -d $ALTERNATES_STORE
then
    echo 'We have no alternates repositories here to point to!' >&2
    exit 0
fi


find_owning_user() {
    path=$1
    case $path in
        /home/*|/etc/puppet/environments/*)
            who=$(echo $path | perl -pe 's[^
                (?:
                    /home
                    |
                    /etc/puppet/environments
                )
                /
                ([^/]+)
                /
                .*
            ][$1]gx')
            if getent passwd $who >/dev/null
            then
                echo $who
            else
                echo "Know how to get user from path '$path', but '$who' is not a valid user!" >&2
            fi
            ;;
        *)
            echo "Don't know how to get user from path '$path' yet!" >&2
            ;;
    esac
}

find $ALTERNATES_STORE -type d -name '*.git' -printf "%P\n" |
while read alternate
do
    alternate_no_git=$(echo $alternate | sed 's/\.git$//')
    ALTERNATES_STORE_OBJECTS=$ALTERNATES_STORE/$alternate/objects

    # If these repositories we're finding don't share a root commit
    # with the repo we have this is not going to work and we have the
    # wrong match. Note that we can have more than one root commit
    # and try to find the oldest one. Pretty sure bet that that's
    # the "real" root.
    root_commit=$(git -C $ALTERNATES_STORE/$alternate log --max-parents=0 --date-order --reverse --pretty=format:%H | head -n 1)
    echo "> Finding repositories on the system that share the $root_commit commit with $alternate" >&2

    find \
        /home \
        $(if test -d /etc/puppet/environments; then echo /etc/puppet/environments; fi) \
        -type f -name 'config' -exec grep -Hl $alternate_no_git {} \; 2>/dev/null |
    while read config
    do
        dirname=$(dirname $config)
        echo ">> Checking if $dirname is in a $alternate git repository..." >&2
        if git_dir=$(git -C $dirname rev-parse --absolute-git-dir) &&
                git -C $git_dir cat-file -e $root_commit
        then
            echo ">>> ...Yes it was, at $git_dir" >&2
            echo ">>>> Is it already migrated?..." >&2
            if test -e $git_dir/objects/info/alternates &&
                    grep -x -F -q $ALTERNATES_STORE_OBJECTS $git_dir/objects/info/alternates
            then
                echo ">>>> ...yes, nothing to do here" >&2
                continue
            else
                echo ">>>> ...no, doing migration" >&2

                who=$(find_owning_user $git_dir)
                if test -z "$who"
                then
                    echo ">>>>> unable to find who owns $git_dir" >&2
                    continue
                else
                    echo ">>>>> found that $who owns $git_dir" >&2
                fi

                # DRY_RUN may be unset; default it so "set -u" does not abort here.
                if test "${DRY_RUN:-}" = "1"
                then
                    echo ">>>>>> Would have run commands migrating $git_dir"
                else
                    if ! sudo -u $who stat $git_dir >/dev/null 2>&1
                    then
                        echo ">>>>>> The '$who' user can't access his own '$git_dir'. Could be e.g. ex-employee. Using 'root'"
                        who=root
                    fi

                    echo ">>>>>> Migrating $git_dir is now $(sudo -u $who du -sh $git_dir | cut -f1)"
                    sudo -u $who git -C $git_dir config gc.bigPackThreshold 0
                    echo $ALTERNATES_STORE_OBJECTS | sudo tee -a $git_dir/objects/info/alternates >/dev/null
                    sudo -u $who git -C $git_dir gc
                    echo ">>>>>> Migrated $git_dir is now $(sudo -u $who du -sh $git_dir | cut -f1)"
                fi
            fi
        else
            echo ">>> No it isn't. Skipping it" >&2
            continue
        fi
    done
done

^ permalink raw reply	[relevance 0%]

* Re: How de-duplicate similar repositories with alternates
  2018-12-04  6:59  0% ` Jeff King
@ 2018-12-04 10:43  0%   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2018-12-04 10:43 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Git for human beings, Christian Couder


On Tue, Dec 04 2018, Jeff King wrote:

> On Thu, Nov 29, 2018 at 03:59:26PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> This is the thing I was wrong about, in retrospect probably because I'd
>> been putting PATH_TO_REPO in objects/info/alternates, but we actually
>> need PATH_TO_REPO/objects, and "git gc" won't warn about this (or "git
>> fsck"). Probably a good idea to patch that at some point, i.e. whine
>> about paths in alternates that don't have objects, or at the very least
>> those that don't exist. #leftoverbits
>
> We do complain about missing directories; see alt_odb_usable().
> Pointing to a real directory that doesn't happen to contain any objects
> is harder. If there are no loose objects, there might not be any hashed
> object directories. For a "real" object database, there should always be
> a "pack/" directory. But technically the object storage directory does
> not even need to have that; it can just be a directory full of loose
> objects that happens not to have any at this moment.
>
> That said, I suspect if we issued a warning for "woah, it looks like
> this doesn't have any objects in it, nor does it even have a pack
> directory" that nobody would complain.

Yeah, although see my <87sgzjyif2.fsf@evledraar.gmail.com>, I also ran
into a different issue.

I think a warning (or even error) like this would be more useful:

    test ! -d $objdir && error... # current behavior
    test -d $objdir/objects && error "Did you mean $objdir/objects, silly?" # new error

I.e. I suspect I'm not the only one who's not read the documentation
carefully enough and thought it was a path to the root of the repo and
wondered why it silently didn't work.
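
As a rough sketch of that second check, done outside git as a shell
function: check_alternates is a made-up name, not an existing git
command. It only handles absolute paths, one per line, and it assumes
lines starting with '#' are comments.

```shell
# Hypothetical helper, not part of git: lint one objects/info/alternates file.
# Returns non-zero if any entry is missing or looks like a repository root.
check_alternates () {
	alternates_file=$1
	status=0
	while IFS= read -r objdir
	do
		# Skip blank lines and (assumed) comment lines.
		case $objdir in
		''|'#'*) continue ;;
		esac
		if ! test -d "$objdir"
		then
			echo "warning: alternate '$objdir' does not exist" >&2
			status=1
		elif test -d "$objdir/objects"
		then
			echo "warning: '$objdir' looks like a repository; did you mean '$objdir/objects'?" >&2
			status=1
		fi
	done <"$alternates_file"
	return $status
}
```

Something like "check_alternates /tmp/git-pu.git/objects/info/alternates"
would then complain if PATH_TO_REPO was written instead of
PATH_TO_REPO/objects.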

^ permalink raw reply	[relevance 0%]

* Re: How de-duplicate similar repositories with alternates
  2018-11-29 14:59  4% How de-duplicate similar repositories with alternates Ævar Arnfjörð Bjarmason
  2018-11-29 16:09  0% ` Ævar Arnfjörð Bjarmason
@ 2018-12-04  6:59  0% ` Jeff King
  2018-12-04 10:43  0%   ` Ævar Arnfjörð Bjarmason
  2018-12-04 13:35  0% ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 200+ results
From: Jeff King @ 2018-12-04  6:59 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Git for human beings, Christian Couder

On Thu, Nov 29, 2018 at 03:59:26PM +0100, Ævar Arnfjörð Bjarmason wrote:

> This is the thing I was wrong about, in retrospect probably because I'd
> been putting PATH_TO_REPO in objects/info/alternates, but we actually
> need PATH_TO_REPO/objects, and "git gc" won't warn about this (or "git
> fsck"). Probably a good idea to patch that at some point, i.e. whine
> about paths in alternates that don't have objects, or at the very least
> those that don't exist. #leftoverbits

We do complain about missing directories; see alt_odb_usable().
Pointing to a real directory that doesn't happen to contain any objects
is harder. If there are no loose objects, there might not be any hashed
object directories. For a "real" object database, there should always be
a "pack/" directory. But technically the object storage directory does
not even need to have that; it can just be a directory full of loose
objects that happens not to have any at this moment.

That said, I suspect if we issued a warning for "woah, it looks like
this doesn't have any objects in it, nor does it even have a pack
directory" that nobody would complain.
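
A minimal sketch of that heuristic as a shell function, to make the
idea concrete. The name looks_like_object_dir is hypothetical and this
is not what alt_odb_usable() does; it just tests for a "pack/"
directory or at least one two-hex-digit fan-out directory.

```shell
# Hypothetical heuristic, not git's own check: succeed if "$1" has a pack/
# directory or at least one two-hex-digit loose-object directory.
looks_like_object_dir () {
	objdir=$1
	if test -d "$objdir/pack"
	then
		return 0
	fi
	for d in "$objdir"/[0-9a-f][0-9a-f]
	do
		# If the glob matched nothing, $d is the literal pattern
		# and this test simply fails.
		if test -d "$d"
		then
			return 0
		fi
	done
	return 1
}
```

As noted above, an empty-but-valid object directory would fail this
test, so it is only good for a warning, not an error.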

-Peff

^ permalink raw reply	[relevance 0%]

* Re: How de-duplicate similar repositories with alternates
  2018-11-29 14:59  4% How de-duplicate similar repositories with alternates Ævar Arnfjörð Bjarmason
@ 2018-11-29 16:09  0% ` Ævar Arnfjörð Bjarmason
  2018-12-04  6:59  0% ` Jeff King
  2018-12-04 13:35  0% ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2018-11-29 16:09 UTC (permalink / raw)
  To: git, Git for human beings; +Cc: Christian Couder, Duy Nguyen


On Thu, Nov 29 2018, Ævar Arnfjörð Bjarmason wrote:

> A co-worker asked me today how space could be saved when you have
> multiple checkouts of the same repository (at different revs) on the
> same machine. I said since these won't block-level de-duplicate well[1]
> one way to do this is with alternates.
>
> However, once you have an existing clone I didn't know how to get the
> gains without a full re-clone, but I hadn't looked deeply into it. As it
> turns out I'm wrong about that, which I found when writing the following
> test-case which shows that it works:
>
>     (
>         cd /tmp &&
>         rm -rf /tmp/git-{master,pu,pu-alt}.git &&
>
>         # Normal clones
>         git clone --bare --no-tags --single-branch --branch master https://github.com/git/git.git /tmp/git-master.git &&
>         git clone --bare --no-tags --single-branch --branch pu https://github.com/git/git.git /tmp/git-pu.git &&
>
>         # An 'alternate' clone using 'master' objects from another repo
>         git --bare init /tmp/git-pu-alt.git &&
>         for git in git-pu.git git-pu-alt.git
>         do
>             echo /tmp/git-master.git/objects >/tmp/$git/objects/info/alternates
>         done &&
>         git -C git-pu-alt.git fetch --no-tags https://github.com/git/git.git pu:pu &&
>
>         # Respective sizes, 'alternate' clone much smaller
>         du -shc /tmp/git-*.git &&
>
>         # GC them all. Compacts the git-pu.git to git-pu-alt.git's size
>         for repo in git-*.git
>         do
>             git -C $repo gc
>         done &&
>         du -shc /tmp/git-*.git &&
>
>         # Add another big history (GFW) to git-{pu,master}.git (in that order!)
>         for repo in $(ls -d /tmp/git-*.git | sort -r)
>         do
>             git -C $repo fetch --no-tags https://github.com/git-for-windows/git master:master-gfw
>         done &&
>         du -shc /tmp/git-*.git &&
>
>         # Another GC. The objects now in git-master.git will be de-duped by all
>         for repo in git-*.git
>         do
>             git -C $repo gc
>         done &&
>         du -shc /tmp/git-*.git
>     )
>
> This shows a scenario where we clone git.git at "master" and "pu" in
> different places. After clone the relevant sizes are:
>
>     108M    /tmp/git-master.git
>     3.2M    /tmp/git-pu-alt.git
>     109M    /tmp/git-pu.git
>     219M    total
>
> I.e. git-pu-alt.git is much smaller since it points via alternates to
> git-master.git, and the history of "pu" shares most of the objects with
> "master". But then how do you get those gains for git-pu.git? Turns out
> you just "git gc"
>
>     111M    /tmp/git-master.git
>     2.1M    /tmp/git-pu-alt.git
>     2.1M    /tmp/git-pu.git
>     115M    total
>
> This is the thing I was wrong about, in retrospect probably because I'd
> been putting PATH_TO_REPO in objects/info/alternates, but we actually
> need PATH_TO_REPO/objects, and "git gc" won't warn about this (or "git
> fsck"). Probably a good idea to patch that at some point, i.e. whine
> about paths in alternates that don't have objects, or at the very least
> those that don't exist. #leftoverbits

Actually looking at this again the thing that may have stumped me last
time is that this has a bad interaction with gc.bigPackThreshold. If you
have an alternate that would otherwise house most of your objects *and*
you have a pack that's larger than the gc.bigPackThreshold your mostly
redundant pack won't be removed.

That's understandable in terms of implementation, but unfortunate. It
would be nice if we learned some way to detect this, i.e. "I have this
10GB pack, but with this alternate I can extract this 100MB out of it
and throw it away". Now we just keep the 10GB pack even if it's mostly
redundant to what's in the alternate.
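
For spotting the packs in question, a minimal sketch: packs_over_limit
is a made-up helper, not a git command, and the byte limit is whatever
you pass in rather than something read from the repository's config.
It lists the pack files in an object directory that are larger than
that limit, i.e. the ones a gc.bigPackThreshold of that size would
presumably leave alone.

```shell
# Hypothetical helper: print every *.pack under "$1"/pack whose size in
# bytes is greater than "$2".
packs_over_limit () {
	objdir=$1
	limit_bytes=$2
	for pack in "$objdir"/pack/*.pack
	do
		# Skip the literal pattern when there are no packs at all.
		test -f "$pack" || continue
		# Arithmetic expansion strips any padding wc may emit.
		size=$(( $(wc -c <"$pack") ))
		if test "$size" -gt "$limit_bytes"
		then
			echo "$pack"
		fi
	done
}
```

If that prints anything for a repository that has a good alternate,
running "git config gc.bigPackThreshold 0" there before "git gc" should
let the redundant pack be repacked away.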

> Then when we fetch git-for-windows:master to all the repos they all grow
> by the amount git-for-windows has diverged:
>
>     144M    /tmp/git-master.git
>     36M     /tmp/git-pu-alt.git
>     36M     /tmp/git-pu.git
>     214M    total
>
> Note that the "sort -r" is critical here. If we fetched git-master.git
> first (at this point the alternate for git-pu*.git) we wouldn't get the
> duplication in the first place, but instead:
>
>     144M    /tmp/git-master.git
>     2.1M    /tmp/git-pu-alt.git
>     2.1M    /tmp/git-pu.git
>     148M    total
>
> This shows the importance of keeping such an 'alternate' repo
> up-to-date, i.e. we don't get the duplication in the first place, but
> regardless (this from a run with sort -r) a "git gc" will coalesce them:
>
>     131M    /tmp/git-master.git
>     2.1M    /tmp/git-pu-alt.git
>     2.2M    /tmp/git-pu.git
>     135M    total
>
> If you find this interesting make sure to read my
> https://public-inbox.org/git/87k1s3bomt.fsf@evledraar.gmail.com/ and
> https://public-inbox.org/git/87in7nbi5b.fsf@evledraar.gmail.com/ for the
> caveats, i.e. if this is something intended for users then no ref in the
> alternate can ever be rewound, that'll potentially result in repository
> corruption.
>
> 1. https://public-inbox.org/git/87bmhiykvw.fsf@evledraar.gmail.com/

^ permalink raw reply	[relevance 0%]

* How de-duplicate similar repositories with alternates
@ 2018-11-29 14:59  4% Ævar Arnfjörð Bjarmason
  2018-11-29 16:09  0% ` Ævar Arnfjörð Bjarmason
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2018-11-29 14:59 UTC (permalink / raw)
  To: git, Git for human beings; +Cc: Christian Couder

A co-worker asked me today how space could be saved when you have
multiple checkouts of the same repository (at different revs) on the
same machine. I said since these won't block-level de-duplicate well[1]
one way to do this is with alternates.

However, once you have an existing clone I didn't know how to get the
gains without a full re-clone, but I hadn't looked deeply into it. As it
turns out I'm wrong about that, which I found when writing the following
test-case which shows that it works:

    (
        cd /tmp &&
        rm -rf /tmp/git-{master,pu,pu-alt}.git &&

        # Normal clones
        git clone --bare --no-tags --single-branch --branch master https://github.com/git/git.git /tmp/git-master.git &&
        git clone --bare --no-tags --single-branch --branch pu https://github.com/git/git.git /tmp/git-pu.git &&

        # An 'alternate' clone using 'master' objects from another repo
        git --bare init /tmp/git-pu-alt.git &&
        for git in git-pu.git git-pu-alt.git
        do
            echo /tmp/git-master.git/objects >/tmp/$git/objects/info/alternates
        done &&
        git -C git-pu-alt.git fetch --no-tags https://github.com/git/git.git pu:pu &&

        # Respective sizes, 'alternate' clone much smaller
        du -shc /tmp/git-*.git &&

        # GC them all. Compacts the git-pu.git to git-pu-alt.git's size
        for repo in git-*.git
        do
            git -C $repo gc
        done &&
        du -shc /tmp/git-*.git &&

        # Add another big history (GFW) to git-{pu,master}.git (in that order!)
        for repo in $(ls -d /tmp/git-*.git | sort -r)
        do
            git -C $repo fetch --no-tags https://github.com/git-for-windows/git master:master-gfw
        done &&
        du -shc /tmp/git-*.git &&

        # Another GC. The objects now in git-master.git will be de-duped by all
        for repo in git-*.git
        do
            git -C $repo gc
        done &&
        du -shc /tmp/git-*.git
    )

This shows a scenario where we clone git.git at "master" and "pu" in
different places. After clone the relevant sizes are:

    108M    /tmp/git-master.git
    3.2M    /tmp/git-pu-alt.git
    109M    /tmp/git-pu.git
    219M    total

I.e. git-pu-alt.git is much smaller since it points via alternates to
git-master.git, and the history of "pu" shares most of the objects with
"master". But then how do you get those gains for git-pu.git? Turns out
you just "git gc"

    111M    /tmp/git-master.git
    2.1M    /tmp/git-pu-alt.git
    2.1M    /tmp/git-pu.git
    115M    total

This is the thing I was wrong about, in retrospect probably because I'd
been putting PATH_TO_REPO in objects/info/alternates, but we actually
need PATH_TO_REPO/objects, and "git gc" won't warn about this (or "git
fsck"). Probably a good idea to patch that at some point, i.e. whine
about paths in alternates that don't have objects, or at the very least
those that don't exist. #leftoverbits

Then when we fetch git-for-windows:master to all the repos they all grow
by the amount git-for-windows has diverged:

    144M    /tmp/git-master.git
    36M     /tmp/git-pu-alt.git
    36M     /tmp/git-pu.git
    214M    total

Note that the "sort -r" is critical here. If we fetched git-master.git
first (at this point the alternate for git-pu*.git) we wouldn't get the
duplication in the first place, but instead:

    144M    /tmp/git-master.git
    2.1M    /tmp/git-pu-alt.git
    2.1M    /tmp/git-pu.git
    148M    total

This shows the importance of keeping such an 'alternate' repo
up-to-date, i.e. we don't get the duplication in the first place, but
regardless (this from a run with sort -r) a "git gc" will coalesce them:

    131M    /tmp/git-master.git
    2.1M    /tmp/git-pu-alt.git
    2.2M    /tmp/git-pu.git
    135M    total

If you find this interesting make sure to read my
https://public-inbox.org/git/87k1s3bomt.fsf@evledraar.gmail.com/ and
https://public-inbox.org/git/87in7nbi5b.fsf@evledraar.gmail.com/ for the
caveats, i.e. if this is something intended for users then no ref in the
alternate can ever be rewound, that'll potentially result in repository
corruption.

1. https://public-inbox.org/git/87bmhiykvw.fsf@evledraar.gmail.com/

^ permalink raw reply	[relevance 4%]

* Re: [PATCH v2] read-cache: write all indexes with the same permissions
  2018-11-17 13:05  6%     ` Junio C Hamano
@ 2018-11-17 21:14  0%       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2018-11-17 21:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christian Couder, git, Jeff King, Nguyen Thai Ngoc Duy,
	Michael Haggerty, Christian Couder


On Sat, Nov 17 2018, Junio C Hamano wrote:

> Christian Couder <christian.couder@gmail.com> writes:
>
>> "However, as noted in those commits we'd still create the file as
>> 0600, and would just re-chmod it only if core.sharedRepository is set
>> to "true" or "all". If core.sharedRepository is unset or set to
>> "false", then the file mode will not be changed, so without
>> core.splitIndex a system with e.g. the umask set to group writeability
>> would work for a group member, but not with core.splitIndex set, as
>> group members would not be able to access the shared index file.
>
> That is irrelevant.  The repository needs to be configured properly
> if it wanted to be used by the members of the group, period.
>
>> It is unfortunately not short lived when core.sharedrepository is
>> unset for example as adjust_shared_perm() starts with:
>>
>> int adjust_shared_perm(const char *path)
>> {
>>         int old_mode, new_mode;
>>
>>         if (!get_shared_repository())
>>                 return 0;
>>
>> but get_shared_repository() will return PERM_UMASK which is 0 when
>> git_config_get_value("core.sharedrepository", ...) returns a non zero
>> value which happens when "core.sharedrepository" is unset.
>
> Which is to say, you get an unwanted result when your repository is
> not configured properly.  It is not news, and I have no sympathy.
>
> Just configure your repository properly and you'll be fine.
>
>>> > Ideally we'd split up the adjust_shared_perm() function to one that
>>> > can give us the mode we want so we could just call open() instead of
>>> > open() followed by chmod(), but that's an unrelated cleanup.
>>>
>>> I would drop this paragraph, as I think this is totally incorrect.
>>> Imagine your umask is tighter than the target permission.  You ask
>>> such a helper function and get "you want 0660".  Doing open(0660)
>>> would not help you an iota---you'd need chmod() or fchmod() to
>>> adjust the result anyway, which already is done by
>>> adjust-shared-perm.
>>
>> It seems to me that it is not done when "core.sharedrepository" is unset.
>
> So?  You are assuming that the repository is misconfigured and it is
> not set to widen the perm bit in the first place, no?
>
>>> > We already have that minor issue with the "index" file
>>> > #leftoverbits.
>>>
>>> The above "Ideally", which I suspect is totally bogus, would show up
>>> when people look for that keyword in the list archive.  This is one
>>> of the reasons why I try to write it after at least one person
>>> sanity checks that an idea floated is worth remembering.
>>
>> It was in Ævar's commit message and I thought it might be better to
>> keep it so that people looking for that keyword could find the above
>> as well as the previous RFC patch.
>
> So do you agree that open(0660) does not guarantee the result will
> be group writable, the above "Ideally" is misguided nonsense, and
> giving the #leftoverbits label to it will clutter the search result
> and harm readers?  That's good.

Aside from the issues with the clarity of the commit message, which I'll
fix (thanks for pointing them out), I think we may have stumbled on
something more important here.

Do you mean that you don't agree that the following should always create
both "foo" and e.g. ".git/refs/heads/master" with the same 644
(-rw-r--r--) mode:

    (
        rm -rf /tmp/repo &&
        umask 022 &&
        git init /tmp/repo &&
        cd /tmp/repo &&
        echo hi >foo &&
        git add foo &&
        git commit -m"first"
    )

To me, what we should do with the standard umask and what
core.sharedRepository is for are completely different things.

We should in git be creating files such that if I set my umask to
e.g. 022 all users on the system can read what I'm creating.

E.g. I tend to use this on something like a production server where
others (if I'm asleep) might want to look at my .bash_history as a last
resort, and also some one-off repo I've created without setting
core.sharedRepository.

I've yet to run into a case where this doesn't just work, aside from
core.splitIndex where before the patch here we're using a tempfile API
for something that isn't a tempfile.

This is distinct from the core.sharedRepository use-case, where you'd
like to override, on a per-repo basis, what you'd otherwise get with the
umask. E.g. if you have a shared server hosting a shared git repo, users
with umask 077 will still be forced to create e.g. group rw files.
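
A minimal sketch of that distinction, using plain files rather than git
itself. It assumes a POSIX shell, an available mktemp(1), and a filesystem
that honors POSIX permission bits; the mode_of helper is hypothetical and
not part of git. It shows that the umask alone decides the mode at creation
time, and that only an explicit chmod can widen it afterwards, which is the
step adjust_shared_perm() takes when core.sharedRepository is set:

```shell
# Hypothetical helper (not part of git): print the permission string of
# a file, e.g. "-rw-r--r--", using only ls and cut.
mode_of () {
	ls -ld "$1" | cut -c1-10
}

# Scratch directory for the demonstration (assumes mktemp -d exists).
demo=$(mktemp -d) &&
(
	cd "$demo" &&

	# With umask 022 a plain creation yields 0666 & ~022 = 0644.
	umask 022 &&
	: >readable &&
	mode_of readable &&

	# With umask 077 the same creation yields 0600 ...
	umask 077 &&
	: >private &&
	mode_of private &&

	# ... and only an explicit chmod can widen it again.
	chmod g+rw private &&
	mode_of private
)
```

The subshell keeps the umask changes from leaking into the calling shell.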

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2] read-cache: write all indexes with the same permissions
  2018-11-17 11:19  0%   ` Christian Couder
@ 2018-11-17 13:05  6%     ` Junio C Hamano
  2018-11-17 21:14  0%       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2018-11-17 13:05 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Nguyen Thai Ngoc Duy, Michael Haggerty,
	Ævar Arnfjörð Bjarmason, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> "However, as noted in those commits we'd still create the file as
> 0600, and would just re-chmod it only if core.sharedRepository is set
> to "true" or "all". If core.sharedRepository is unset or set to
> "false", then the file mode will not be changed, so without
> core.splitIndex a system with e.g. the umask set to group writeability
> would work for a group member, but not with core.splitIndex set, as
> group members would not be able to access the shared index file."

That is irrelevant.  The repository needs to be configured properly
if it wanted to be used by the members of the group, period.

> It is unfortunately not short lived when core.sharedrepository is
> unset for example as adjust_shared_perm() starts with:
>
> int adjust_shared_perm(const char *path)
> {
>         int old_mode, new_mode;
>
>         if (!get_shared_repository())
>                 return 0;
>
> but get_shared_repository() will return PERM_UMASK which is 0 when
> git_config_get_value("core.sharedrepository", ...) returns a non zero
> value which happens when "core.sharedrepository" is unset.

Which is to say, you get an unwanted result when your repository is
not configured properly.  It is not news, and I have no sympathy.

Just configure your repository properly and you'll be fine.

>> > Ideally we'd split up the adjust_shared_perm() function to one that
>> > can give us the mode we want so we could just call open() instead of
>> > open() followed by chmod(), but that's an unrelated cleanup.
>>
>> I would drop this paragraph, as I think this is totally incorrect.
>> Imagine your umask is tighter than the target permission.  You ask
>> such a helper function and get "you want 0660".  Doing open(0660)
>> would not help you an iota---you'd need chmod() or fchmod() to
>> adjust the result anyway, which already is done by
>> adjust-shared-perm.
>
> It seems to me that it is not done when "core.sharedrepository" is unset.

So?  You are assuming that the repository is misconfigured and it is
not set to widen the perm bit in the first place, no?

>> > We already have that minor issue with the "index" file
>> > #leftoverbits.
>>
>> The above "Ideally", which I suspect is totally bogus, would show up
>> when people look for that keyword in the list archive.  This is one
>> of the reasons why I try to write it after at least one person
>> sanity checks that an idea floated is worth remembering.
>
> It was in Ævar's commit message and I thought it might be better to
> keep it so that people looking for that keyword could find the above
> as well as the previous RFC patch.

So do you agree that open(0660) does not guarantee the result will
be group writable, the above "Ideally" is misguided nonsense, and
giving the #leftoverbits label to it will clutter the search result
and harm readers?  That's good.

Thanks.

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v2] read-cache: write all indexes with the same permissions
  2018-11-17  9:29  0% ` Junio C Hamano
@ 2018-11-17 11:19  0%   ` Christian Couder
  2018-11-17 13:05  6%     ` Junio C Hamano
  0 siblings, 1 reply; 200+ results
From: Christian Couder @ 2018-11-17 11:19 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Nguyen Thai Ngoc Duy, Michael Haggerty,
	Ævar Arnfjörð Bjarmason, Christian Couder

On Sat, Nov 17, 2018 at 10:29 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > However, as noted in those commits we'd still create the file as 0600,
> > and would just re-chmod it depending on the setting of
> > core.sharedRepository. So without core.splitIndex a system with
> > e.g. the umask set to group writeability would work for the members of
> > the group, but not with core.splitIndex set, as members of the group
> > would not be able to access the shared index file.
>
> I am not sure what the above wants to say.

I tried to improve from Ævar's previous commit message but I agree
that the above is not very clear.

> If we are not making
> necessary call to adjust-shared-perm,

The issue is that adjust_shared_perm() returns immediately when
core.sharedRepository is unset (or false). So when it is unset (or
false), and when the umask is 0022 or 0002 for example, then the index
and the shared index will not have the same permissions because one is
created using open() with mode 0666 and the other with mode 0600.
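
To make the arithmetic concrete, here is a small sketch. It assumes a POSIX
shell whose arithmetic expansion accepts octal constants, and the
effective_mode helper is hypothetical, not something git provides. It
computes the mode a new file ends up with when open() is asked for a given
mode under a given umask:

```shell
# Hypothetical helper: the resulting mode of a file created with
# open(path, O_CREAT, requested) under the given umask, i.e.
# requested & ~umask, printed in octal.
effective_mode () {
	printf '%04o\n' $(( $1 & ~$2 ))
}

effective_mode 0666 0022   # main index, requested with 0666
effective_mode 0600 0022   # shared index, requested with 0600
effective_mode 0666 0002   # main index under a group-friendly umask
effective_mode 0600 0002   # shared index under the same umask
```

Whatever the umask is, a file requested with 0600 can never gain group or
other bits at creation time, so the two files only match when the umask
already strips those bits.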

> then it is irrelevant that the
> lack of the call does not immediately cause an apparent problem for
> users who happen to have non-restrictive group perm bit in their
> umask.  Another group member whose umask is tighter will eventually
> use the repository and end up creating a file unreadable to group
> members.

The issue is that a group member with non-restrictive group perm bit
in their umask, like 0022 or 0002, will currently create an unreadable
shared index when using the repo.

I agree that it is much safer to just set core.sharedRepository to
"true" or "all", but maybe in some setups/systems it might be ok to
rely on everyone having non-restrictive group perm bit in their umask.

> Are you saying that we _lack_ necessary call when core.sharedRepository
> is set?

No, I am saying that, when it is unset, adjust_shared_perm() does nothing.

> If so, a commit that fixes such a bug would be the best
> place to have a paragraph like the above.  If not, the above description
> simply misleads the readers.

I agree that it is a bit misleading. Maybe something like:

"However, as noted in those commits we'd still create the file as
0600, and would just re-chmod it only if core.sharedRepository is set
to "true" or "all". If core.sharedRepository is unset or set to
"false", then the file mode will not be changed, so without
core.splitIndex a system with e.g. the umask set to group writeability
would work for a group member, but not with core.splitIndex set, as
group members would not be able to access the shared index file."

> > Let's instead make the two consistent by using mks_tempfile_sm() and
> > passing 0666 in its `mode` argument.
>
> On the other hand, this is a relevant description; this patch kills
> an inconsistency that is very short lived (I am assuming that there
> is no bug in the current code before this patch and we make
> necessary calls to adjust-shared-perm when core.sharedrepository is
> set).

It is unfortunately not short lived when core.sharedrepository is
unset for example as adjust_shared_perm() starts with:

int adjust_shared_perm(const char *path)
{
        int old_mode, new_mode;

        if (!get_shared_repository())
                return 0;

but get_shared_repository() will return PERM_UMASK which is 0 when
git_config_get_value("core.sharedrepository", ...) returns a non zero
value which happens when "core.sharedrepository" is unset.

Maybe there is a bug somewhere in adjust_shared_perm() or the
functions it calls, but I don't know this part of the code base much.

> > Note that we cannot use the create_tempfile() function itself that is
> > used to write the main ".git/index" file because we want the XXXXXX
> > part of the "sharedindex_XXXXXX" argument to be replaced by a pseudo
> > random value and create_tempfile() doesn't do that.
>
> Sure.  Pseudo-random-ness is less important than the resulting
> filename being unique.  "Because we are asking for a unique file to
> be created, we cannot use create_tempfile() interface that is
> designed to be used to create a file with known name."
>
> But is that really worth saying, I wonder.

I am ok with either your version or removing the above from the commit message.

> > Ideally we'd split up the adjust_shared_perm() function to one that
> > can give us the mode we want so we could just call open() instead of
> > open() followed by chmod(), but that's an unrelated cleanup.
>
> I would drop this paragraph, as I think this is totally incorrect.
> Imagine your umask is tighter than the target permission.  You ask
> such a helper function and get "you want 0660".  Doing open(0660)
> would not help you an iota---you'd need chmod() or fchmod() to
> adjust the result anyway, which already is done by
> adjust-shared-perm.

It seems to me that it is not done when "core.sharedrepository" is unset.

> > We already have that minor issue with the "index" file
> > #leftoverbits.
>
> The above "Ideally", which I suspect is totally bogus, would show up
> when people look for that keyword in the list archive.  This is one
> of the reasons why I try to write it after at least one person
> sanity checks that an idea floated is worth remembering.

It was in Ævar's commit message and I thought it might be better to
keep it so that people looking for that keyword could find the above
as well as the previous RFC patch.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2] read-cache: write all indexes with the same permissions
  2018-11-16 17:31  4% [PATCH v2] " Christian Couder
@ 2018-11-17  9:29  0% ` Junio C Hamano
  2018-11-17 11:19  0%   ` Christian Couder
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2018-11-17  9:29 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Nguyễn Thái Ngọc Duy,
	Michael Haggerty, Ævar Arnfjörð Bjarmason,
	Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> However, as noted in those commits we'd still create the file as 0600,
> and would just re-chmod it depending on the setting of
> core.sharedRepository. So without core.splitIndex a system with
> e.g. the umask set to group writeability would work for the members of
> the group, but not with core.splitIndex set, as members of the group
> would not be able to access the shared index file.

I am not sure what the above wants to say.  If we are not making
necessary call to adjust-shared-perm, then it is irrelevant that the
lack of the call does not immediately cause an apparent problem for
users who happen to have non-restrictive group perm bit in their
umask.  Another group member whose umask is tighter will eventually
use the repository and end up creating a file unreadable to group
members.

Are you saying that we _lack_ necessary call when core.sharedRepository
is set?  If so, a commit that fixes such a bug would be the best
place to have a paragraph like the above.  If not, the above description
simply misleads the readers.

> Let's instead make the two consistent by using mks_tempfile_sm() and
> passing 0666 in its `mode` argument.

On the other hand, this is a relevant description; this patch kills
an inconsistency that is very short lived (I am assuming that there
is no bug in the current code before this patch and we make
necessary calls to adjust-shared-perm when core.sharedrepository is
set).

> Note that we cannot use the create_tempfile() function itself that is
> used to write the main ".git/index" file because we want the XXXXXX
> part of the "sharedindex_XXXXXX" argument to be replaced by a pseudo
> random value and create_tempfile() doesn't do that.

Sure.  Pseudo-random-ness is less important than the resulting
filename being unique.  "Because we are asking for a unique file to
be created, we cannot use create_tempfile() interface that is
designed to be used to create a file with known name."

But is that really worth saying, I wonder.

> Ideally we'd split up the adjust_shared_perm() function to one that
> can give us the mode we want so we could just call open() instead of
> open() followed by chmod(), but that's an unrelated cleanup.

I would drop this paragraph, as I think this is totally incorrect.
Imagine your umask is tighter than the target permission.  You ask
such a helper function and get "you want 0660".  Doing open(0660)
would not help you an iota---you'd need chmod() or fchmod() to
adjust the result anyway, which already is done by
adjust-shared-perm.

> We already have that minor issue with the "index" file
> #leftoverbits.

The above "Ideally", which I suspect is totally bogus, would show up
when people look for that keyword in the list archive.  This is one
of the reasons why I try to write it after at least one person
sanity checks that an idea floated is worth remembering.

> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
>
> This is a simpler fix iterating from Ævar's RFC patch and the
> following discussions:
>
> https://public-inbox.org/git/20181113153235.25402-1-avarab@gmail.com/
>
>  read-cache.c           |  3 ++-
>  t/t1700-split-index.sh | 20 ++++++++++++++++++++
>  2 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index 8c924506dd..ea80600bff 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -3165,7 +3165,8 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
>  		struct tempfile *temp;
>  		int saved_errno;
>  
> -		temp = mks_tempfile(git_path("sharedindex_XXXXXX"));
> +		/* Same permissions as the main .git/index file */
> +		temp = mks_tempfile_sm(git_path("sharedindex_XXXXXX"), 0, 0666);
>  		if (!temp) {
>  			oidclr(&si->base_oid);
>  			ret = do_write_locked_index(istate, lock, flags);
> diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
> index 2ac47aa0e4..fa1d3d468b 100755
> --- a/t/t1700-split-index.sh
> +++ b/t/t1700-split-index.sh
> @@ -381,6 +381,26 @@ test_expect_success 'check splitIndex.sharedIndexExpire set to "never" and "now"
>  	test $(ls .git/sharedindex.* | wc -l) -le 2
>  '
>  
> +test_expect_success POSIXPERM 'same mode for index & split index' '
> +	git init same-mode &&
> +	(
> +		cd same-mode &&
> +		test_commit A &&
> +		test_modebits .git/index >index_mode &&
> +		test_must_fail git config core.sharedRepository &&
> +		git -c core.splitIndex=true status &&
> +		shared=$(ls .git/sharedindex.*) &&
> +		case "$shared" in
> +		*" "*)
> +			# we have more than one???
> +			false ;;
> +		*)
> +			test_modebits "$shared" >split_index_mode &&
> +			test_cmp index_mode split_index_mode ;;
> +		esac
> +	)
> +'
> +
>  while read -r mode modebits
>  do
>  	test_expect_success POSIXPERM "split index respects core.sharedrepository $mode" '

^ permalink raw reply	[relevance 0%]

* [PATCH v2] read-cache: write all indexes with the same permissions
@ 2018-11-16 17:31  4% Christian Couder
  2018-11-17  9:29  0% ` Junio C Hamano
  0 siblings, 1 reply; 200+ results
From: Christian Couder @ 2018-11-16 17:31 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyễn Thái Ngọc Duy,
	Michael Haggerty, Ævar Arnfjörð Bjarmason,
	Christian Couder

From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

Change the code that writes out the shared index to use
mks_tempfile_sm() instead of mks_tempfile().

The create_tempfile() function is used to write out the main
".git/index" (via ".git/index.lock") using lock_file(). The
create_tempfile() function respects the umask, as it uses open() with
0666, whereas the mks_tempfile() function uses open() with 0600.

So mks_tempfile() which is used to create the shared index file is
likely to create such a file with restricted permissions compared to
the main ".git/index" file.

A bug related to this was spotted, fixed and tested for in df801f3f9f
("read-cache: use shared perms when writing shared index", 2017-06-25)
and 3ee83f48e5 ("t1700: make sure split-index respects
core.sharedrepository", 2017-06-25).

However, as noted in those commits we'd still create the file as 0600,
and would just re-chmod it depending on the setting of
core.sharedRepository. So without core.splitIndex a system with
e.g. the umask set to group writeability would work for the members of
the group, but not with core.splitIndex set, as members of the group
would not be able to access the shared index file.

Let's instead make the two consistent by using mks_tempfile_sm() and
passing 0666 in its `mode` argument.

Note that we cannot use the create_tempfile() function itself that is
used to write the main ".git/index" file because we want the XXXXXX
part of the "sharedindex_XXXXXX" argument to be replaced by a pseudo
random value and create_tempfile() doesn't do that.

Ideally we'd split up the adjust_shared_perm() function to one that
can give us the mode we want so we could just call open() instead of
open() followed by chmod(), but that's an unrelated cleanup. We
already have that minor issue with the "index" file #leftoverbits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---

This is a simpler fix iterating from Ævar's RFC patch and the
following discussions:

https://public-inbox.org/git/20181113153235.25402-1-avarab@gmail.com/

 read-cache.c           |  3 ++-
 t/t1700-split-index.sh | 20 ++++++++++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 8c924506dd..ea80600bff 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -3165,7 +3165,8 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		struct tempfile *temp;
 		int saved_errno;

-		temp = mks_tempfile(git_path("sharedindex_XXXXXX"));
+		/* Same permissions as the main .git/index file */
+		temp = mks_tempfile_sm(git_path("sharedindex_XXXXXX"), 0, 0666);
 		if (!temp) {
 			oidclr(&si->base_oid);
 			ret = do_write_locked_index(istate, lock, flags);
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 2ac47aa0e4..fa1d3d468b 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -381,6 +381,26 @@ test_expect_success 'check splitIndex.sharedIndexExpire set to "never" and "now"
 	test $(ls .git/sharedindex.* | wc -l) -le 2
 '

+test_expect_success POSIXPERM 'same mode for index & split index' '
+	git init same-mode &&
+	(
+		cd same-mode &&
+		test_commit A &&
+		test_modebits .git/index >index_mode &&
+		test_must_fail git config core.sharedRepository &&
+		git -c core.splitIndex=true status &&
+		shared=$(ls .git/sharedindex.*) &&
+		case "$shared" in
+		*" "*)
+			# we have more than one???
+			false ;;
+		*)
+			test_modebits "$shared" >split_index_mode &&
+			test_cmp index_mode split_index_mode ;;
+		esac
+	)
+'
+
 while read -r mode modebits
 do
 	test_expect_success POSIXPERM "split index respects core.sharedrepository $mode" '
-- 
2.19.1.1053.g063ed687ac

^ permalink raw reply related	[relevance 4%]

* [RFC/PATCH] read-cache: write all indexes with the same permissions
  @ 2018-11-13 15:32  4% ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2018-11-13 15:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Christian Couder,
	Nguyễn Thái Ngọc Duy, Michael Haggerty,
	Ævar Arnfjörð Bjarmason

Change the code that writes out the shared index to use
create_tempfile() instead of mks_tempfile().

The create_tempfile() function is used to write out the main
.git/index (via .git/index.lock) using lock_file(). The
create_tempfile() function respects the umask, whereas the
mks_tempfile() function will create files with 0600 permissions.

A bug related to this was spotted, fixed and tested for in
df801f3f9f ("read-cache: use shared perms when writing shared index",
2017-06-25) and 3ee83f48e5 ("t1700: make sure split-index respects
core.sharedrepository", 2017-06-25).

However, as noted in those commits we'd still create the file as 0600,
and would just re-chmod it depending on the setting of
core.sharedRepository. So without core.splitIndex a system with
e.g. the umask set to group writeability would work, but not with
core.splitIndex set.

Let's instead make the two consistent by using create_tempfile(). This
allows us to remove the code added in df801f3f9f (subsequently
modified in 59f9d2dd60 ("read-cache.c: move tempfile creation/cleanup
out of write_shared_index", 2018-01-14)) as redundant. The
create_tempfile() function itself calls adjust_shared_perm().

Now we're not leaking the implementation detail that we're using a
mkstemp()-like API for something that's not really a mkstemp()
use-case. See c18b80a0e8 ("update-index: new options to enable/disable
split index mode", 2014-06-13) for the initial implementation which
used mkstemp() without a wrapper.

One thing I was paranoid about when making this change was not
introducing a race condition where with
e.g. core.sharedRepository=0600 we'd do something different for
"index" vs. "sharedindex.*", as the former has a *.lock file, not the
latter.

But I'm confident that we're exposing no such edge-case. With a user
umask of e.g. 0022 and core.sharedRepository=0600 we initially create
both "index" and "sharedindex.*" files that are globally readable, but
re-chmod them while they're still empty.

Ideally we'd split up the adjust_shared_perm() function to one that
can give us the mode we want so we could just call open() instead of
open() followed by chmod(), but that's an unrelated cleanup. We
already have that minor issue with the "index" file #leftoverbits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

I won't have time to finish this today, as noted in
https://public-inbox.org/git/874lcl2e9t.fsf@evledraar.gmail.com/
there's a pretty major bug here in that we're now writing out literal
sharedindex_XXXXXX files.

Obviously that needs to be fixed, and the fix is trivial, I can use
another one of the mks_*() functions with the same mode we use to
create the index.

But we really ought to have tests for the bug this patch introduces,
and as noted in the E-Mail linked above we don't.

So hopefully Duy or someone with more knowledge of the split index
will chime in to say what's missing there...

 read-cache.c           |  7 +------
 t/t1700-split-index.sh | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index f3a848d61c..7135537554 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -3074,11 +3074,6 @@ static int write_shared_index(struct index_state *istate,
 	ret = do_write_index(si->base, *temp, 1);
 	if (ret)
 		return ret;
-	ret = adjust_shared_perm(get_tempfile_path(*temp));
-	if (ret) {
-		error("cannot fix permission bits on %s", get_tempfile_path(*temp));
-		return ret;
-	}
 	ret = rename_tempfile(temp,
 			      git_path("sharedindex.%s", oid_to_hex(&si->base->oid)));
 	if (!ret) {
@@ -3159,7 +3154,7 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		struct tempfile *temp;
 		int saved_errno;

-		temp = mks_tempfile(git_path("sharedindex_XXXXXX"));
+		temp = create_tempfile(git_path("sharedindex_XXXXXX"));
 		if (!temp) {
 			oidclr(&si->base_oid);
 			ret = do_write_locked_index(istate, lock, flags);
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 2ac47aa0e4..fa1d3d468b 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -381,6 +381,26 @@ test_expect_success 'check splitIndex.sharedIndexExpire set to "never" and "now"
 	test $(ls .git/sharedindex.* | wc -l) -le 2
 '

+test_expect_success POSIXPERM 'same mode for index & split index' '
+	git init same-mode &&
+	(
+		cd same-mode &&
+		test_commit A &&
+		test_modebits .git/index >index_mode &&
+		test_must_fail git config core.sharedRepository &&
+		git -c core.splitIndex=true status &&
+		shared=$(ls .git/sharedindex.*) &&
+		case "$shared" in
+		*" "*)
+			# we have more than one???
+			false ;;
+		*)
+			test_modebits "$shared" >split_index_mode &&
+			test_cmp index_mode split_index_mode ;;
+		esac
+	)
+'
+
 while read -r mode modebits
 do
 	test_expect_success POSIXPERM "split index respects core.sharedrepository $mode" '
-- 
2.19.1.1182.g4ecb1133ce

^ permalink raw reply related	[relevance 4%]

* Re: [PATCH 9/9] fetch-pack: drop custom loose object cache
  2018-11-12 19:32  0%     ` Ævar Arnfjörð Bjarmason
  2018-11-12 20:07  0%       ` Jeff King
@ 2018-11-12 20:13  0%       ` René Scharfe
  1 sibling, 0 replies; 200+ results
From: René Scharfe @ 2018-11-12 20:13 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Geert Jansen, Junio C Hamano, git@vger.kernel.org,
	Takuto Ikuta

Am 12.11.2018 um 20:32 schrieb Ævar Arnfjörð Bjarmason:
> 
> On Mon, Nov 12 2018, René Scharfe wrote:
>> This removes the only user of OBJECT_INFO_IGNORE_LOOSE.  #leftoverbits
> 
> With this series applied there's still a use of it left in
> oid_object_info_extended()

OK, rephrasing: With that patch, OBJECT_INFO_IGNORE_LOOSE is never set
anymore, and its check in oid_object_info_extended() as well as its
definition can be removed.

René

^ permalink raw reply	[relevance 0%]

* Re: [PATCH 9/9] fetch-pack: drop custom loose object cache
  2018-11-12 19:32  0%     ` Ævar Arnfjörð Bjarmason
@ 2018-11-12 20:07  0%       ` Jeff King
  2018-11-12 20:13  0%       ` René Scharfe
  1 sibling, 0 replies; 200+ results
From: Jeff King @ 2018-11-12 20:07 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: René Scharfe, Geert Jansen, Junio C Hamano,
	git@vger.kernel.org, Takuto Ikuta

On Mon, Nov 12, 2018 at 08:32:43PM +0100, Ævar Arnfjörð Bjarmason wrote:

> >>  	for (ref = *refs; ref; ref = ref->next) {
> >>  		struct object *o;
> >> -		unsigned int flags = OBJECT_INFO_QUICK;
> >>
> >> -		if (use_oidset &&
> >> -		    !oidset_contains(&loose_oid_set, &ref->old_oid)) {
> >> -			/*
> >> -			 * I know this does not exist in the loose form,
> >> -			 * so check if it exists in a non-loose form.
> >> -			 */
> >> -			flags |= OBJECT_INFO_IGNORE_LOOSE;
> >
> > This removes the only user of OBJECT_INFO_IGNORE_LOOSE.  #leftoverbits
> 
> With this series applied there's still a use of it left in
> oid_object_info_extended()

That's just the code that does something with the flag. No callers pass
it in anymore, so we could drop the flag _and_ that code.

-Peff

^ permalink raw reply	[relevance 0%]

* Re: [PATCH 9/9] fetch-pack: drop custom loose object cache
  2018-11-12 19:25  6%   ` René Scharfe
@ 2018-11-12 19:32  0%     ` Ævar Arnfjörð Bjarmason
  2018-11-12 20:07  0%       ` Jeff King
  2018-11-12 20:13  0%       ` René Scharfe
  0 siblings, 2 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2018-11-12 19:32 UTC (permalink / raw)
  To: René Scharfe
  Cc: Jeff King, Geert Jansen, Junio C Hamano, git@vger.kernel.org,
	Takuto Ikuta


On Mon, Nov 12 2018, René Scharfe wrote:

> Am 12.11.2018 um 15:55 schrieb Jeff King:
>> Commit 024aa4696c (fetch-pack.c: use oidset to check existence of loose
>> object, 2018-03-14) added a cache to avoid calling stat() for a bunch of
>> loose objects we don't have.
>>
>> Now that OBJECT_INFO_QUICK handles this caching itself, we can drop the
>> custom solution.
>>
>> Note that this might perform slightly differently, as the original code
>> stopped calling readdir() when we saw more loose objects than there were
>> refs. So:
>>
>>   1. The old code might have spent work on readdir() to fill the cache,
>>      but then decided there were too many loose objects, wasting that
>>      effort.
>>
>>   2. The new code might spend a lot of time on readdir() if you have a
>>      lot of loose objects, even though there are very few objects to
>>      ask about.
>
> Plus the old code used an oidset while the new one uses an oid_array.
>
>> In practice it probably won't matter either way; see the previous commit
>> for some discussion of the tradeoff.
>>
>> Signed-off-by: Jeff King <peff@peff.net>
>> ---
>>  fetch-pack.c | 39 ++-------------------------------------
>>  1 file changed, 2 insertions(+), 37 deletions(-)
>>
>> diff --git a/fetch-pack.c b/fetch-pack.c
>> index b3ed7121bc..25a88f4eb2 100644
>> --- a/fetch-pack.c
>> +++ b/fetch-pack.c
>> @@ -636,23 +636,6 @@ struct loose_object_iter {
>>  	struct ref *refs;
>>  };
>>
>> -/*
>> - *  If the number of refs is not larger than the number of loose objects,
>> - *  this function stops inserting.
>> - */
>> -static int add_loose_objects_to_set(const struct object_id *oid,
>> -				    const char *path,
>> -				    void *data)
>> -{
>> -	struct loose_object_iter *iter = data;
>> -	oidset_insert(iter->loose_object_set, oid);
>> -	if (iter->refs == NULL)
>> -		return 1;
>> -
>> -	iter->refs = iter->refs->next;
>> -	return 0;
>> -}
>> -
>>  /*
>>   * Mark recent commits available locally and reachable from a local ref as
>>   * COMPLETE. If args->no_dependents is false, also mark COMPLETE remote refs as
>> @@ -670,30 +653,14 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
>>  	struct ref *ref;
>>  	int old_save_commit_buffer = save_commit_buffer;
>>  	timestamp_t cutoff = 0;
>> -	struct oidset loose_oid_set = OIDSET_INIT;
>> -	int use_oidset = 0;
>> -	struct loose_object_iter iter = {&loose_oid_set, *refs};
>> -
>> -	/* Enumerate all loose objects or know refs are not so many. */
>> -	use_oidset = !for_each_loose_object(add_loose_objects_to_set,
>> -					    &iter, 0);
>>
>>  	save_commit_buffer = 0;
>>
>>  	for (ref = *refs; ref; ref = ref->next) {
>>  		struct object *o;
>> -		unsigned int flags = OBJECT_INFO_QUICK;
>>
>> -		if (use_oidset &&
>> -		    !oidset_contains(&loose_oid_set, &ref->old_oid)) {
>> -			/*
>> -			 * I know this does not exist in the loose form,
>> -			 * so check if it exists in a non-loose form.
>> -			 */
>> -			flags |= OBJECT_INFO_IGNORE_LOOSE;
>
> This removes the only user of OBJECT_INFO_IGNORE_LOOSE.  #leftoverbits

With this series applied there's still a use of it left in
oid_object_info_extended()

^ permalink raw reply	[relevance 0%]

* Re: [PATCH 9/9] fetch-pack: drop custom loose object cache
  @ 2018-11-12 19:25  6%   ` René Scharfe
  2018-11-12 19:32  0%     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 200+ results
From: René Scharfe @ 2018-11-12 19:25 UTC (permalink / raw)
  To: Jeff King, Geert Jansen
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	git@vger.kernel.org, Takuto Ikuta

Am 12.11.2018 um 15:55 schrieb Jeff King:
> Commit 024aa4696c (fetch-pack.c: use oidset to check existence of loose
> object, 2018-03-14) added a cache to avoid calling stat() for a bunch of
> loose objects we don't have.
> 
> Now that OBJECT_INFO_QUICK handles this caching itself, we can drop the
> custom solution.
> 
> Note that this might perform slightly differently, as the original code
> stopped calling readdir() when we saw more loose objects than there were
> refs. So:
> 
>   1. The old code might have spent work on readdir() to fill the cache,
>      but then decided there were too many loose objects, wasting that
>      effort.
> 
>   2. The new code might spend a lot of time on readdir() if you have a
>      lot of loose objects, even though there are very few objects to
>      ask about.

Plus the old code used an oidset while the new one uses an oid_array.

> In practice it probably won't matter either way; see the previous commit
> for some discussion of the tradeoff.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  fetch-pack.c | 39 ++-------------------------------------
>  1 file changed, 2 insertions(+), 37 deletions(-)
> 
> diff --git a/fetch-pack.c b/fetch-pack.c
> index b3ed7121bc..25a88f4eb2 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -636,23 +636,6 @@ struct loose_object_iter {
>  	struct ref *refs;
>  };
>  
> -/*
> - *  If the number of refs is not larger than the number of loose objects,
> - *  this function stops inserting.
> - */
> -static int add_loose_objects_to_set(const struct object_id *oid,
> -				    const char *path,
> -				    void *data)
> -{
> -	struct loose_object_iter *iter = data;
> -	oidset_insert(iter->loose_object_set, oid);
> -	if (iter->refs == NULL)
> -		return 1;
> -
> -	iter->refs = iter->refs->next;
> -	return 0;
> -}
> -
>  /*
>   * Mark recent commits available locally and reachable from a local ref as
>   * COMPLETE. If args->no_dependents is false, also mark COMPLETE remote refs as
> @@ -670,30 +653,14 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
>  	struct ref *ref;
>  	int old_save_commit_buffer = save_commit_buffer;
>  	timestamp_t cutoff = 0;
> -	struct oidset loose_oid_set = OIDSET_INIT;
> -	int use_oidset = 0;
> -	struct loose_object_iter iter = {&loose_oid_set, *refs};
> -
> -	/* Enumerate all loose objects or know refs are not so many. */
> -	use_oidset = !for_each_loose_object(add_loose_objects_to_set,
> -					    &iter, 0);
>  
>  	save_commit_buffer = 0;
>  
>  	for (ref = *refs; ref; ref = ref->next) {
>  		struct object *o;
> -		unsigned int flags = OBJECT_INFO_QUICK;
>  
> -		if (use_oidset &&
> -		    !oidset_contains(&loose_oid_set, &ref->old_oid)) {
> -			/*
> -			 * I know this does not exist in the loose form,
> -			 * so check if it exists in a non-loose form.
> -			 */
> -			flags |= OBJECT_INFO_IGNORE_LOOSE;

This removes the only user of OBJECT_INFO_IGNORE_LOOSE.  #leftoverbits

> -		}
> -
> -		if (!has_object_file_with_flags(&ref->old_oid, flags))
> +		if (!has_object_file_with_flags(&ref->old_oid,
> +						OBJECT_INFO_QUICK))
>  			continue;
>  		o = parse_object(the_repository, &ref->old_oid);
>  		if (!o)
> @@ -710,8 +677,6 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
>  		}
>  	}
>  
> -	oidset_clear(&loose_oid_set);
> -
>  	if (!args->deepen) {
>  		for_each_ref(mark_complete_oid, NULL);
>  		for_each_cached_alternate(NULL, mark_alternate_complete);
> 

^ permalink raw reply	[relevance 6%]

* Re: Re*: [PATCH v3] fetch: replace string-list used as a look-up table with a hashmap
  @ 2018-10-31 14:50  6%           ` Johannes Schindelin
  0 siblings, 0 replies; 200+ results
From: Johannes Schindelin @ 2018-10-31 14:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Stefan Beller, Ramsay Jones

Hi Junio,

On Sat, 27 Oct 2018, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > Just one thing^W^Wa couple of things:
> >
> > It would probably make more sense to `hashmap_get_from_hash()` and
> > `strhash()` here (and `strhash()` should probably be used everywhere
> > instead of `memhash(str, strlen(str))`).
> 
> hashmap_get_from_hash() certainly is much better suited for simpler
> usage pattern like these callsites, and the ones in sequencer.c.  It
> is a shame that a more complex variant takes the shorter-and-sweeter
> name hashmap_get().

I agree, at least in part.

From what I understand, hashmap_get_from_hash() needs a little assistance
from the comparison function with which the hashmap is configured, see
e.g. this function in the sequencer:

	static int labels_cmp(const void *fndata, const struct labels_entry *a,
			      const struct labels_entry *b, const void *key)
	{
		return key ? strcmp(a->label, key) : strcmp(a->label, b->label);
	}

See how that first tests whether `key` is non-`NULL`, and then takes a
shortcut, not even looking at `b`? This is important, because `b` does not
refer to a complete `labels_entry` when we call `hashmap_get_from_hash()`.
It only refers to a `hashmap_entry`. Looking at `b->label` would access
some random memory, and most certainly do the wrong thing.
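
To make that concrete, here is a minimal, self-contained sketch of the
pattern. It deliberately does not use Git's real hashmap API: the names
`entry_header`, `toy_strhash()` and `lookup_from_hash()` are hypothetical
stand-ins invented for illustration, and the comparison function drops the
`fndata` parameter for brevity. The point is only that the probe handed in
as `b` is a bare header, so the comparison must rely on `key` alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a hashmap entry header; not Git's definition. */
struct entry_header {
	unsigned int hash;
};

struct labels_entry {
	struct entry_header header; /* kept as the first member */
	char label[32];
};

/* Same shape as the sequencer's comparison: use key when it is given. */
static int labels_cmp(const struct labels_entry *a,
		      const struct labels_entry *b,
		      const void *key)
{
	assert(a);
	return key ? strcmp(a->label, key) : strcmp(a->label, b->label);
}

/* Toy string hash, for illustration only. */
static unsigned int toy_strhash(const char *s)
{
	unsigned int h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/*
 * Linear-scan lookup that mimics a "from hash" query: the probe is only
 * a header, so it is never safe to read probe->label.
 */
static const struct labels_entry *lookup_from_hash(
	const struct labels_entry *table, size_t nr,
	unsigned int hash, const char *key)
{
	struct entry_header probe = { hash };
	size_t i;

	for (i = 0; i < nr; i++) {
		if (table[i].header.hash != probe.hash)
			continue;
		/* The cast is tolerable only because b is ignored when key is set. */
		if (!labels_cmp(&table[i],
				(const struct labels_entry *)&probe, key))
			return &table[i];
	}
	return NULL;
}
```

Calling `lookup_from_hash(table, nr, toy_strhash("onto"), "onto")` never
touches the probe's nonexistent label. Passing a NULL key with such a probe
would read past the header, which is exactly the failure described above.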

> I wish we named the latter hashmap_get_fullblown_feature_rich() and
> called the _from_hash() thing a simple hashmap_get() from day one,
> but it is way too late.
> 
> I looked briefly at the users of the _get() variant, and some of their
> uses are legitimately not-simple and cannot be reduced to use the
> simpler _get_from_hash variant, it seems.  But others like those in
> builtin/difftool.c should be straight-forward to convert to use the
> simpler get_from_hash variant.  It could be a low-hanging fruit left
> for later clean-up, perhaps.

Right. #leftoverbits

> >> @@ -271,10 +319,10 @@ static void find_non_local_tags(const struct ref *refs,
> >>  			    !has_object_file_with_flags(&ref->old_oid,
> >>  							OBJECT_INFO_QUICK) &&
> >>  			    !will_fetch(head, ref->old_oid.hash) &&
> >> -			    !has_sha1_file_with_flags(item->util,
> >> +			    !has_sha1_file_with_flags(item->oid.hash,
> >
> > I am not sure that we need to test for null OIDs here, given that...
> > ...
> > Of course, `has_sha1_file_with_flags()` is supposed to return `false` for
> > null OIDs, I guess.
> 
> Yup.  An alternative is to make item->oid a pointer to oid, not an
> oid object itself, so that we can express "no OID for this ref" in a
> more explicit way, but is_null_oid() is already used as "no OID" in
> many other codepaths, so...

Right, and it would complicate the code. So I am fine with your version of
it.

> >> +	for_each_string_list_item(remote_ref_item, &remote_refs_list) {
> >> +		const char *refname = remote_ref_item->string;
> >> +		struct hashmap_entry key;
> >> +
> >> +		hashmap_entry_init(&key, memhash(refname, strlen(refname)));
> >> +		item = hashmap_get(&remote_refs, &key, refname);
> >> +		if (!item)
> >> +			continue; /* can this happen??? */
> >
> > This would indicate a BUG, no?
> 
> Possibly.  Alternatively, we can just use item without checking and
> let the runtime segfault.

Hahaha! Yep. We could also cause a crash. I do prefer the BUG() call.

> Here is an incremental on top that can be squashed in to turn v3
> into v4.

Nice.

Thanks!
Dscho

> 
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 0f8e333022..aee1d9bf21 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -259,7 +259,7 @@ static struct refname_hash_entry *refname_hash_add(struct hashmap *map,
>  	size_t len = strlen(refname);
>  
>  	FLEX_ALLOC_MEM(ent, refname, refname, len);
> -	hashmap_entry_init(ent, memhash(refname, len));
> +	hashmap_entry_init(ent, strhash(refname));
>  	oidcpy(&ent->oid, oid);
>  	hashmap_add(map, ent);
>  	return ent;
> @@ -282,11 +282,7 @@ static void refname_hash_init(struct hashmap *map)
>  
>  static int refname_hash_exists(struct hashmap *map, const char *refname)
>  {
> -	struct hashmap_entry key;
> -	size_t len = strlen(refname);
> -	hashmap_entry_init(&key, memhash(refname, len));
> -
> -	return !!hashmap_get(map, &key, refname);
> +	return !!hashmap_get_from_hash(map, strhash(refname), refname);
>  }
>  
>  static void find_non_local_tags(const struct ref *refs,
> @@ -365,12 +361,10 @@ static void find_non_local_tags(const struct ref *refs,
>  	 */
>  	for_each_string_list_item(remote_ref_item, &remote_refs_list) {
>  		const char *refname = remote_ref_item->string;
> -		struct hashmap_entry key;
>  
> -		hashmap_entry_init(&key, memhash(refname, strlen(refname)));
> -		item = hashmap_get(&remote_refs, &key, refname);
> +		item = hashmap_get_from_hash(&remote_refs, strhash(refname), refname);
>  		if (!item)
> -			continue; /* can this happen??? */
> +			BUG("unseen remote ref?");
>  
>  		/* Unless we have already decided to ignore this item... */
>  		if (!is_null_oid(&item->oid)) {
> @@ -497,12 +491,12 @@ static struct ref *get_ref_map(struct remote *remote,
>  
>  	for (rm = ref_map; rm; rm = rm->next) {
>  		if (rm->peer_ref) {
> -			struct hashmap_entry key;
>  			const char *refname = rm->peer_ref->name;
>  			struct refname_hash_entry *peer_item;
>  
> -			hashmap_entry_init(&key, memhash(refname, strlen(refname)));
> -			peer_item = hashmap_get(&existing_refs, &key, refname);
> +			peer_item = hashmap_get_from_hash(&existing_refs,
> +							  strhash(refname),
> +							  refname);
>  			if (peer_item) {
>  				struct object_id *old_oid = &peer_item->oid;
>  				oidcpy(&rm->peer_ref->old_oid, old_oid);
> 

^ permalink raw reply	[relevance 6%]

* [PATCH v2 2/3] pack-objects tests: don't leave test .git corrupt at end
  @ 2018-10-30 18:43  5% ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2018-10-30 18:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Geert Jansen, Christian Couder,
	Nicolas Pitre, Linus Torvalds, Petr Baudis,
	Ævar Arnfjörð Bjarmason

Change the pack-objects tests to not leave their .git directory
corrupt at the end.

In 2fca19fbb5 ("fix multiple issues with t5300", 2010-02-03) a comment
was added warning against adding any subsequent tests, but since
4614043c8f ("index-pack: use streaming interface for collision test on
large blobs", 2012-05-24) the comment has drifted away from the code,
mentioning two tests, when we actually have three.

Instead of having this warning let's just create a new .git directory
specifically for these tests.

As an aside, it would be interesting to instrument the test suite to
run a "git fsck" at the very end (in "test_done"). That would have
errored before this change, and may find other issues #leftoverbits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t5300-pack-object.sh | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index a0309e4bab..410a09b0dd 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -468,29 +468,32 @@ test_expect_success 'pack-objects in too-many-packs mode' '
 	git fsck
 '
 
-#
-# WARNING!
-#
-# The following test is destructive.  Please keep the next
-# two tests at the end of this file.
-#
-
-test_expect_success 'fake a SHA1 hash collision' '
-	long_a=$(git hash-object a | sed -e "s!^..!&/!") &&
-	long_b=$(git hash-object b | sed -e "s!^..!&/!") &&
-	test -f	.git/objects/$long_b &&
-	cp -f	.git/objects/$long_a \
-		.git/objects/$long_b
+test_expect_success 'setup: fake a SHA1 hash collision' '
+	git init corrupt &&
+	(
+		cd corrupt &&
+		long_a=$(git hash-object -w ../a | sed -e "s!^..!&/!") &&
+		long_b=$(git hash-object -w ../b | sed -e "s!^..!&/!") &&
+		test -f	.git/objects/$long_b &&
+		cp -f	.git/objects/$long_a \
+			.git/objects/$long_b
+	)
 '
 
 test_expect_success 'make sure index-pack detects the SHA1 collision' '
-	test_must_fail git index-pack -o bad.idx test-3.pack 2>msg &&
-	test_i18ngrep "SHA1 COLLISION FOUND" msg
+	(
+		cd corrupt &&
+		test_must_fail git index-pack -o ../bad.idx ../test-3.pack 2>msg &&
+		test_i18ngrep "SHA1 COLLISION FOUND" msg
+	)
 '
 
 test_expect_success 'make sure index-pack detects the SHA1 collision (large blobs)' '
-	test_must_fail git -c core.bigfilethreshold=1 index-pack -o bad.idx test-3.pack 2>msg &&
-	test_i18ngrep "SHA1 COLLISION FOUND" msg
+	(
+		cd corrupt &&
+		test_must_fail git -c core.bigfilethreshold=1 index-pack -o ../bad.idx ../test-3.pack 2>msg &&
+		test_i18ngrep "SHA1 COLLISION FOUND" msg
+	)
 '
 
 test_done
-- 
2.19.1.899.g0250525e69


^ permalink raw reply related	[relevance 5%]

* Re: [PATCH v3 7/8] push: add DWYM support for "git push refs/remotes/...:<dst>"
  @ 2018-10-29  8:05  5%     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2018-10-29  8:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Stefan Beller


On Mon, Oct 29 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> This is the first use of the %N$<fmt> style of printf format in
>> the *.[ch] files in our codebase. It's supported by POSIX[2] and
>> there's existing uses for it in po/*.po files,...
>
> For now, I'll eject this from 'pu', as I had spent way too much time
> trying to make it and other topics work there.

I was compiling with DEVELOPER=1 but as it turns out:

    CFLAGS="-O0" DEVELOPER=1

Wasn't doing what I thought, i.e. we just take 'CFLAGS' from the
command-line and don't add any of the DEVELOPER #leftoverbits to
it. Will fix this and other issues raised.

>     CC remote.o
> remote.c: In function 'show_push_unqualified_ref_name_error':
> remote.c:1035:2: error: $ operand number used after format without operand number [-Werror=format=]
>   error(_("The destination you provided is not a full refname (i.e.,\n"
>   ^~~~~
> cc1: all warnings being treated as errors
> Makefile:2323: recipe for target 'remote.o' failed
> make: *** [remote.o] Error 1

Will fix this and other issues raised. FWIW clang gives a much better
error about the actual issue:

    remote.c:1042:46: error: cannot mix positional and non-positional arguments in format string [-Werror,-Wformat]
                    "- Checking if the <src> being pushed ('%2$s')\n"

I.e. this on top fixes it:

    -               "- Looking for a ref that matches '%s' on the remote side.\n"
    -               "- Checking if the <src> being pushed ('%s')\n"
    +               "- Looking for a ref that matches '%1$s' on the remote side.\n"
    +               "- Checking if the <src> being pushed ('%2$s')\n"

Maybe this whole thing isn't worth it and I should just do:

    @@ -1042 +1042 @@ static void show_push_unqualified_ref_name_error(const char *dst_value,
    -               "- Checking if the <src> being pushed ('%2$s')\n"
    +               "- Checking if the <src> being pushed ('%s')\n"
    @@ -1047 +1047 @@ static void show_push_unqualified_ref_name_error(const char *dst_value,
    -             dst_value, matched_src_name);
    +             dst_value, matched_src_name, matched_src_name);

But I'm leaning on the side of keeping it for the self-documentation
aspect of "this is a repeated parameter". Your objections to this whole
thing being a stupid idea notwithstanding.
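
For anyone following along, here is a small standalone sketch of the
positional style. The message text is made up for illustration and is not
the string in remote.c; `format_hint()` is a hypothetical helper. It assumes
a libc that implements the POSIX `%N$` extension, and it numbers every
conversion, since mixing numbered and unnumbered ones is what the compilers
rejected above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Formats a hint that mentions <src> twice while passing it only once.
 * Every conversion carries an explicit position.
 */
static int format_hint(char *buf, size_t len,
		       const char *dst, const char *src)
{
	assert(buf && dst && src);
	return snprintf(buf, len,
			"dst='%1$s' src='%2$s' (again: '%2$s')",
			dst, src);
}
```

With `dst` set to "a" and `src` set to "b" this fills the buffer with
"dst='a' src='b' (again: 'b')". The second argument is consumed twice
without being repeated in the argument list, which is the
self-documentation aspect I mean.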

^ permalink raw reply	[relevance 5%]

* [PATCH 2/4] pack-objects tests: don't leave test .git corrupt at end
  @ 2018-10-28 22:50  5% ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2018-10-28 22:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Geert Jansen, Christian Couder,
	Nicolas Pitre, Linus Torvalds, Petr Baudis,
	Ævar Arnfjörð Bjarmason

Change the pack-objects tests to not leave their .git directory
corrupt at the end.

In 2fca19fbb5 ("fix multiple issues with t5300", 2010-02-03) a comment
was added warning against adding any subsequent tests, but since
4614043c8f ("index-pack: use streaming interface for collision test on
large blobs", 2012-05-24) the comment has drifted away from the code,
mentioning two tests, when we actually have three.

Instead of having this warning let's just create a new .git directory
specifically for these tests.

As an aside, it would be interesting to instrument the test suite to
run a "git fsck" at the very end (in "test_done"). That would have
errored before this change, and may find other issues #leftoverbits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t5300-pack-object.sh | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index a0309e4bab..410a09b0dd 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -468,29 +468,32 @@ test_expect_success 'pack-objects in too-many-packs mode' '
 	git fsck
 '
 
-#
-# WARNING!
-#
-# The following test is destructive.  Please keep the next
-# two tests at the end of this file.
-#
-
-test_expect_success 'fake a SHA1 hash collision' '
-	long_a=$(git hash-object a | sed -e "s!^..!&/!") &&
-	long_b=$(git hash-object b | sed -e "s!^..!&/!") &&
-	test -f	.git/objects/$long_b &&
-	cp -f	.git/objects/$long_a \
-		.git/objects/$long_b
+test_expect_success 'setup: fake a SHA1 hash collision' '
+	git init corrupt &&
+	(
+		cd corrupt &&
+		long_a=$(git hash-object -w ../a | sed -e "s!^..!&/!") &&
+		long_b=$(git hash-object -w ../b | sed -e "s!^..!&/!") &&
+		test -f	.git/objects/$long_b &&
+		cp -f	.git/objects/$long_a \
+			.git/objects/$long_b
+	)
 '
 
 test_expect_success 'make sure index-pack detects the SHA1 collision' '
-	test_must_fail git index-pack -o bad.idx test-3.pack 2>msg &&
-	test_i18ngrep "SHA1 COLLISION FOUND" msg
+	(
+		cd corrupt &&
+		test_must_fail git index-pack -o ../bad.idx ../test-3.pack 2>msg &&
+		test_i18ngrep "SHA1 COLLISION FOUND" msg
+	)
 '
 
 test_expect_success 'make sure index-pack detects the SHA1 collision (large blobs)' '
-	test_must_fail git -c core.bigfilethreshold=1 index-pack -o bad.idx test-3.pack 2>msg &&
-	test_i18ngrep "SHA1 COLLISION FOUND" msg
+	(
+		cd corrupt &&
+		test_must_fail git -c core.bigfilethreshold=1 index-pack -o ../bad.idx ../test-3.pack 2>msg &&
+		test_i18ngrep "SHA1 COLLISION FOUND" msg
+	)
 '
 
 test_done
-- 
2.19.1.759.g500967bb5e


^ permalink raw reply related	[relevance 5%]

* Re: [PATCH v2 00/18] builtin rebase options
  @ 2018-10-12 12:01  4%     ` Johannes Schindelin
  0 siblings, 0 replies; 200+ results
From: Johannes Schindelin @ 2018-10-12 12:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin via GitGitGadget, git

Hi Junio,

On Thu, 6 Sep 2018, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
> > This patch series completes the support for all rebase options in the
> > builtin rebase, e.g. --signoff, rerere-autoupdate, etc.
> >
> > It is based on pk/rebase-in-c-3-acts.
> 
> ... which in turn was based on pk/rebase-in-c-2-basic that just got
> rerolled, so I would assume that you want pk/rebase-in-c-3-acts I
> have rebased on top of the result of applying the updated 2-basic
> series.
> 
> I've rebuilt the collection of topics up to pk/rebase-in-c-6-final
> with these two updated series twice, once doing it manually, like I
> did the last time, and another using "rebase -i -r" on top of the
> updated pk/rebase-in-c-4-opts.  The resulting trees match, of
> course.
> 
> I did it twice to try out how it feels to use "rebase -i -r" because
> I wanted to make sure what we are shipping in 'master' behaves
> sensibly ;-)
> 
> Two things I noticed about the recreation of the merge ...
> 
> 	Reminder to bystanders.  We need to merge ag/rebase-i-in-c
> 	topic on top of pk/rebase-in-c-5-test topic before applying
> 	a patch to adjust rebase to call rebase-i using the latter's
> 	new calling convention.  The topics look like
> 
> 	- pk/rebase-in-c has three patches on master
> 	- pk/rebase-in-c-2-basic builds on it, and being replaced
> 	- pk/rebase-in-c-3-acts builds on 2-basic (no update this time)
> 	- pk/rebase-in-c-4-opts builds on 3-acts, and being replaced
> 	- pk/rebase-in-c-5-test builds on 4-opts (no update this time)
> 	- js/rebase-in-c-5.5 builds on 5-test and merges ag/rebase-in-c
> 	  topic before applying one patch on it (no update this time)
> 	- pk/rebase-in-c-6-final builds on 5.5 (no update this time)
> 
> 	and we are replacing 2-basic with 11 patches and 4-opts with
> 	18 patches.
> 
> ... using "rebase -i -r" are that 
> 
>  (1) it rebuilt, or at least offered to rebuild, the entire side
>      branch, even though there is absolutely no need to.  Leaving
>      "pick"s untouched, based on the correct fork point, resulted in
>      all picks fast forwarded, but it was somewhat alarming.

Right. But this is a legacy of our paradigm to script things in Unix shell
script. It not only is slow, error-prone and hard to keep portable, it
also encourages poor design, as you do not have the same expressive power
as C has.

In this case, it harmed us by making it impossible to essentially play out
the rebase in memory and only fall back to writing things into the
worktree upon failure.

However, this is where we want to go. It is still a long way to go,
though, as many code parts are safely in the "we use the worktree to play
out the rebase in its entirety" place.

The "skip_unnecessary_picks" trick is the best we could do so far.

>  (2) "merge -C <original merge commit> ag/rebase-i-in-c" appeared as
>      the insn to merge in the (possibly rebuilt) side branch.  And
>      just like "commit -C", it took the merge message from the
>      original merge commit, which means that the summary of the
>      merged side branch is kept stale.  In this particular case, I
>      did not even want to see ag/rebase-i-in-c topic touched, so I
>      knew I want to keep the original merge summary, but if the user
>      took the offer to rewrite the side branch (e.g. with a "reword"
>      to retitle), using the original merge message would probably
>      disappoint the user.

Right. But the user would then also freely admit that they asked for the
merge commit to be rebased, which is what `--rebase-merges` says.

> I think (1) actually is a feature.  Not everybody is an integrator
> who does not want to touch any commit on the topic branch(es) while
> rebuilding a single-strand-of-pearls that has many commits and an
> occasional merge of the tip of another topic branch.  It's just that
> the feature does not suit the workflow I use when I am playing the
> top-level integrator role.

As I said. The ideal thing would be to invest quite a bit in refactoring
especially the do_pick_commit() function, and then play out the rebase in
memory, where one state variable knows what the "HEAD" is (but the
worktree is left untouched, up until the point when an error occurs, in
which case we want to write out the files). This would also need a major
refactoring of the recursive merge, of course, which conflates the merge
part with the writing of the merge conflicts to disk part.

While I would love to see this happening, I don't think that I can spare
enough time to drive this, at least for a couple of years.

> I am not sure what should be the ideal behaviour for (2).  I would
> imagine that
> 
>  - I do want to keep the original title of the merge (e.g. "into
>    <target branch>", if left to "git merge" to come up with the
>    title during "rebase -i" session, would be lost and become "into
>    HEAD", which is not what we want);
> 
>  - I do want to keep the original commentary in the merge (e.g. what
>    you would see in "git log --first-parent master..next" that gives
>    summary of each topic getting merged) so that I can update it as
>    needed; but 
> 
>  - I do want the topic summary fmt-merge-msg produces to be based on
>    the updated side branch.
> 
> I am not sure if the last item can reliably be filtered out of the
> original and replaced with newly generated summary.  If we can do
> so, that would be ideal, I guess.

I think what you want is not the `merge` command, but a custom script that
you can then `exec`.

This could even be automated to some extent, by introducing an option to
`git rebase -i` that lets a script post-process the generated todo list,
something I wanted for a long time.

> Another observation was that after rebuilding pk/rebase-in-c-6^0 on
> top of the updated pk'/rebase-in-c-4 using "rebase -i -r", I of
> course still needed to "branch -f" to update pk/rebase-in-c-5,
> js/rebase-in-c-5.5, and pk/rebase-in-c-6 branches to point at
> appropriate commits.  I do not think it is a good idea to let
> "rebase -i" munge these dependent branches by default, but it might
> be worth considering it as an option.

Yes! Already years ago, I wanted to teach the shears to figure out that a
branch (i.e. a second parent of a merge commit that mentions the branch
name in its oneline) was updated by the rebase, and if the pre-rebase
commit agrees with a local ref of that name, update said ref after the
rebase finished successfully.

There is one big caveat, though: what if one of those branches is checked
out in a worktree?

I think it can be done, and it should be hidden behind an opt-in config
setting.

#leftoverbits?

> Since I want to be more in control of what happens to the tips of topic
> branches, I did not mind at all having to run "branch -f" and having the
> chance to run "diff" before doing so, but at the same time, that means
> doing these manually in steps building 5 on 4, 5.5 on 5 and then 6 on
> 5.5, instead of building 6 on top of 4 using "rebase -i" and then
> tagging the intermediate states, gives me more control without forcing
> me more work.

Sure. This definitely gives you more control.

I am not sure whether you want that control, or whether you *actually*
want more safety guards. If it was me, I would prefer something that can
stop/pause the process when something is obviously going wrong (it could
be a script verifying that, e.g. looking at the length/contents of the
range-diff and ringing an alarm when a commit other than a fixup! was
dropped).

Ciao,
Dscho

^ permalink raw reply	[relevance 4%]

* Re: [PATCH][Outreachy] remove all the inclusions of git-compat-util.h in header files
  @ 2018-10-10  8:06  4%     ` Johannes Schindelin
  0 siblings, 0 replies; 200+ results
From: Johannes Schindelin @ 2018-10-10  8:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, Ananya Krishna Maram, christian.couder, git

Hi Junio & Ananya,

Ananya, I think you did a really good job at contributing your first
patch, demonstrated by the useful comments you already received.

On Tue, 9 Oct 2018, Junio C Hamano wrote:

> Derrick Stolee <stolee@gmail.com> writes:
> 
> > On 10/8/2018 1:05 PM, Ananya Krishna Maram wrote:
> >> Hi All,
> > Hello, Ananya! Welcome.
> >
> >> I was searching through #leftovers and found this.
> >> https://public-inbox.org/git/CABPp-BGVVXcbZX44er6TO-PUsfEN_6GNYJ1U5cuoN9deaA48OQ@mail.gmail.com/
> >>
> >> This patch address the task discussed in the above link.
> > The discussion above seems to not be intended for your commit message,
> > but it does show up when I run `git am` and provide your email as
> > input. The typical way to avoid this is to place all commentary below
> > the "---" 
> > that signifies the commit message is over.
> 
> >> From: Ananya Krishan Maram <ananyakittu1997@gmail.com>
> >>
> >> skip the #include of git-compat-util.h since all .c files include it.
> >>
> >> Signed-off-by: Ananya Krishna Maram <ananyakittu1997@gmail.com>
> >> ---
> >>   advice.h             | 1 -
> >>   commit-graph.h       | 1 -
> >>   hash.h               | 1 -
> >>   pkt-line.h           | 1 -
> >>   t/helper/test-tool.h | 1 -
> >>   5 files changed, 5 deletions(-)
> >>
> >> diff --git a/advice.h b/advice.h
> >> index ab24df0fd..09148baa6 100644
> >> --- a/advice.h
> >> +++ b/advice.h
> >> @@ -1,7 +1,6 @@
> >>   #ifndef ADVICE_H
> >>   #define ADVICE_H
> >>   -#include "git-compat-util.h"
> >>     extern int advice_push_update_rejected;
> >>   extern int advice_push_non_ff_current;
> 
> The way I read the original discussion is "C source that includes
> compat-util.h shouldn't if it already includes cache.h"; advice.h is
> not C and does not (should not) include cache.h.
> 
> The "left over bits" should not be blindly trusted, and besides,
> Elijah punted to examine and think about each case and left it to
> others, so whoever is picking it up should do the thinking, not a
> blind conversion.  I am not getting a feeling that this patch was
> done with careful thinking after checking only this one.

The mistake -- if any! -- is mine: I suggested to Outreachy students to
look for the #leftoverbits needle in our mail archive haystack, and to
pick something to get a feel for contributing to Git.

Personally, I find the "whoever is picking it up should do the thinking"
much too harsh for a first-time contributor who specifically came through
the Outreachy program, i.e. expected to have a gentle introduction into
the project, and into the ways we work.

Granted, that introduction should have been performed by the potential
mentors (i.e. Chris & I, but I was out sick), but let's face it: we are an
open source project, so every single one of us should feel the call to be
a mentor, and we should certainly try to make every new contributor as
welcome as we would like to be invited into a new project.

In this context, I would think that the "do the thinking" part is
particularly hard because our rules are implicit, and inconsistent: when
do we include header files, when do we skip the include?

If in doubt, follow the age-old wisdom "when in Rome, do as the Romans
do", i.e. ignore the explicitly written-down rules, and instead imitate
what active contributors are doing.

Unfortunately, I cannot suggest an easy way to mine the mailing list
for sentiments about including header files. And in any case, it would
probably boil down to personal taste, which -- let's face it -- is rather
diverse in our community... :-)

So in this case, what I would suggest is to look instead at the commit
history, where header files were added or modified. The Git command for
that is:

	git log --no-merges -p \*.h

Apart from the rather wonderful examples you see there for commit messages
(I am a big fan of commit messages that are clear and descriptive, i.e.
start by detailing the why rather than the how, with notes thrown in about
design decisions that are not obvious from the patch), this command will
lead us pretty soon to this commit, especially when looking for the search
term #include:

	https://github.com/git/git/commit/69d846f05381

In other words, we explicitly introduced an `#include "git-compat-util.h"`
in a header there.

The commit message also offers a pretty compelling rationale: it was the
most efficient way to have that header included *first thing* in all test
helpers.

Following that rationale, let's have a look at the patch we are improving
here (because that's what code review really should all be about:
improving the code, putting together all of our expertise to get the best
patch we can in a reasonable amount of time):

The first thing we can already say is that the change to
t/helper/test-tool.h would revert the commit referenced above, so I think
we should drop that change.

Next, I want to have a look at advice.h:

	git grep -O 'advice\.h'

(the backslash is necessary because this is a regular expression, and the
period character has the special meaning "any character" there, unless
escaped by a backslash)
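
If you want to convince yourself of the difference, here is a tiny sketch
using the POSIX regcomp()/regexec() functions directly. The helper name
`matches()` is mine, and this is not how `git grep` is implemented; it
merely demonstrates the same basic-regular-expression rule. Note that in C
source the backslash itself has to be doubled inside the string literal.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Returns 1 if the basic regular expression `pattern` matches anywhere
 * in `text`, and 0 otherwise (including when the pattern fails to compile).
 */
static int matches(const char *pattern, const char *text)
{
	regex_t re;
	int hit;

	assert(pattern && text);
	if (regcomp(&re, pattern, REG_NOSUB))
		return 0;
	hit = !regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return hit;
}
```

Here `matches("advice.h", "adviceXh")` is true because the unescaped period
accepts the `X`, whereas `matches("advice\\.h", "adviceXh")` is false and
`matches("advice\\.h", "#include \"advice.h\"")` is true.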

What we can see is that indeed, every file that includes this header
already includes cache.h first. We can even see that cache.h *itself*
includes advice.h, meaning that we could add another patch that drops the
advice.h include from, say, commit.c.

At this point, this seems to become a rabbit hole: which header files are
already included in cache.h that are *also* included (unnecessarily) in .c
files that *already* included cache.h?

Ananya, it is now up to you how far you want to go down that rabbit hole
;-)

If I had the time, *I* would now be tempted to try my hand at writing a
script that analyzes our source code to ensure that:

- "cache.h" or "git-compat-util.h" is the first header included (as per
  the commit message of the commit mentioned above)
- every header that is already included in cache.h is not included by a .c
  file that already included cache.h

It is the kind of side-track I could lose days over, but I have to admit
that the benefit would probably not merit the effort ;-)
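
A rough sketch of what such a script could look like, written in Python and
operating on source text passed in as strings. The function names
(`includes`, `check_source`) and the toy inputs are hypothetical, and it only
looks at direct `#include` lines, so it is a starting point rather than a
finished checker:

```python
import re

# Matches both '#include "foo.h"' and '#include <foo.h>'.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*["<]([^">]+)[">]')

# The two headers allowed to come first, as per the rule described above.
FIRST_HEADERS = {"cache.h", "git-compat-util.h"}


def includes(source):
    """Return the included header names in order of appearance."""
    found = []
    for line in source.splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            found.append(match.group(1))
    return found


def check_source(source, cache_h_includes):
    """Return a list of complaints about the includes of one .c file."""
    problems = []
    headers = includes(source)
    if headers and headers[0] not in FIRST_HEADERS:
        problems.append("first include is %s" % headers[0])
    if "cache.h" in headers:
        # Only headers included *after* cache.h are redundant.
        position = headers.index("cache.h")
        for header in headers[position + 1:]:
            if header in cache_h_includes:
                problems.append("%s is already included by cache.h" % header)
    return problems


# Toy inputs standing in for the real files.
cache_h = '#include "git-compat-util.h"\n#include "advice.h"\n'
commit_c = '#include "cache.h"\n#include "advice.h"\n#include "commit.h"\n'

print(check_source(commit_c, set(includes(cache_h))))
```

With the toy inputs, the only complaint is about the advice.h include in the
stand-in for commit.c. A real version would have to read the files from the
worktree and probably follow includes transitively.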

In any case, I am delighted by your first patch: you cleared the first
hurdle :-)

Ciao,
Johannes

P.S.: Please record your contribution on the Outreachy site, unless you
already did...

^ permalink raw reply	[relevance 4%]

* Re: [PATCH] [Outreachy] git/userdiff.c fix regex pattern error
  @ 2018-10-04 19:42  6%         ` Johannes Schindelin
  0 siblings, 0 replies; 200+ results
From: Johannes Schindelin @ 2018-10-04 19:42 UTC (permalink / raw)
  To: Ananya Krishna Maram; +Cc: Christian Couder, git

Hi Ananya,

On Thu, 4 Oct 2018, Ananya Krishna Maram wrote:

> On Thu, 4 Oct 2018 at 20:56, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > [... talking about the reason why a slash does not need to be escaped
> > in a C string specifying a regular expression...]
> >
> > But it does not need to be escaped, when you specify the regular
> > expression the way we do. And the way we specified it is really the
> > standard when specifying regular expressions in C code, i.e. *without* the
> > suggested backslash.
> 
> Aha! This makes total sense. I was thinking from a general regular expression
> point of view. But I should be thinking from C point of view and how C
> might interpret this newly submitted string.
> This explanation is very clear. Thanks for taking time to reply to my
> patch. From next time on, I will try to think from
> the Git project's point of view.

Of course! Thank you for taking the time to contribute this patch.

Maybe you have another idea for a micro-project? Maybe there is something
in Git that you wish was more convenient? Or maybe
https://public-inbox.org/git/?q=leftoverbits has something that you would
like to implement?

Ciao,
Johannes

> 
> Thanks,
> Ananya.
> 
> > Ciao,
> > Johannes
> >
> > >
> > > Thanks,
> > > Ananya.
> > >
> > > > Thanks,
> > > > Johannes
> > > >
> > > > > ---
> > > > >  userdiff.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/userdiff.c b/userdiff.c
> > > > > index f565f6731..f4ff9b9e5 100644
> > > > > --- a/userdiff.c
> > > > > +++ b/userdiff.c
> > > > > @@ -123,7 +123,7 @@ PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$",
> > > > >        /* -- */
> > > > >        "[a-zA-Z_][a-zA-Z0-9_]*"
> > > > >        "|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?"
> > > > > -      "|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"),
> > > > > +      "|[-+*\/<>%&^|=!]=|\/\/=?|<<=?|>>=?|\\*\\*=?"),
> > > > >        /* -- */
> > > > >  PATTERNS("ruby", "^[ \t]*((class|module|def)[ \t].*)$",
> > > > >        /* -- */
> > > > > --
> > > > > 2.17.1
> > > > >
> > > > >
> > >
> 

^ permalink raw reply	[relevance 6%]

* Re: We should add a "git gc --auto" after "git clone" due to commit graph
  2018-10-03 14:01  0%   ` Ævar Arnfjörð Bjarmason
@ 2018-10-03 14:17  0%     ` SZEDER Gábor
  0 siblings, 0 replies; 200+ results
From: SZEDER Gábor @ 2018-10-03 14:17 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee, Git List, Nguyễn Thái Ngọc Duy

On Wed, Oct 03, 2018 at 04:01:40PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Oct 03 2018, SZEDER Gábor wrote:
> 
> > On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >> Don't have time to patch this now, but thought I'd send a note / RFC
> >> about this.
> >>
> >> Now that we have the commit graph it's nice to be able to set
> >> e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
> >> /etc/gitconfig to apply them to all repos.
> >>
> >> But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
> >> until whenever my first "gc" kicks in, which may be quite some time if
> >> I'm just using it passively.
> >>
> >> So we should make "git gc --auto" be run on clone,
> >
> > There is no garbage after 'git clone'...
> 
> "git gc" is really "git gc-or-create-indexes" these days.

Because it happens to be convenient to create those indexes at
gc-time.  But that should not be an excuse to run gc when by
definition no gc is needed.

> >> and change the
> >> need_to_gc() / cmd_gc() behavior so that we detect that the
> >> gc.writeCommitGraph=true setting is on, but we have no commit graph, and
> >> then just generate that without doing a full repack.
> >
> > Or just teach 'git clone' to run 'git commit-graph write ...'
> 
> Then when adding something like the commit graph we'd need to patch both
> git-clone and git-gc, it's much more straightforward to make
> need_to_gc() more granular.
> 
> >> As an aside such more granular "gc" would be nice for e.g. pack-refs
> >> too. It's possible for us to just have one pack, but to have 100k loose
> >> refs.
> >>
> >> It might also be good to have some gc.autoDetachOnClone option and have
> >> it false by default, so we don't have a race condition where "clone
> >> linux && git -C linux tag --contains" is slow because the graph hasn't
> >> been generated yet, and generating the graph initially doesn't take that
> >> long compared to the time to clone a large repo (and on a small one it
> >> won't matter either way).
> >>
> >> I was going to say "also for midx", but of course after clone we have
> >> just one pack, so I can't imagine us needing this. But I can see us
> >> having other such optional side-indexes in the future generated by gc,
> >> and they'd also benefit from this.
> >>
> >> #leftoverbits

^ permalink raw reply	[relevance 0%]

* Re: We should add a "git gc --auto" after "git clone" due to commit graph
  2018-10-03 13:36  0% ` SZEDER Gábor
@ 2018-10-03 14:01  0%   ` Ævar Arnfjörð Bjarmason
  2018-10-03 14:17  0%     ` SZEDER Gábor
  0 siblings, 1 reply; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2018-10-03 14:01 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Derrick Stolee, Git List, Nguyễn Thái Ngọc Duy


On Wed, Oct 03 2018, SZEDER Gábor wrote:

> On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Don't have time to patch this now, but thought I'd send a note / RFC
>> about this.
>>
>> Now that we have the commit graph it's nice to be able to set
>> e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
>> /etc/gitconfig to apply them to all repos.
>>
>> But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
>> until whenever my first "gc" kicks in, which may be quite some time if
>> I'm just using it passively.
>>
>> So we should make "git gc --auto" be run on clone,
>
> There is no garbage after 'git clone'...

"git gc" is really "git gc-or-create-indexes" these days.

>> and change the
>> need_to_gc() / cmd_gc() behavior so that we detect that the
>> gc.writeCommitGraph=true setting is on, but we have no commit graph, and
>> then just generate that without doing a full repack.
>
> Or just teach 'git clone' to run 'git commit-graph write ...'

Then when adding something like the commit graph we'd need to patch both
git-clone and git-gc, it's much more straightforward to make
need_to_gc() more granular.

>> As an aside such more granular "gc" would be nice for e.g. pack-refs
>> too. It's possible for us to just have one pack, but to have 100k loose
>> refs.
>>
>> It might also be good to have some gc.autoDetachOnClone option and have
>> it false by default, so we don't have a race condition where "clone
>> linux && git -C linux tag --contains" is slow because the graph hasn't
>> been generated yet, and generating the graph initially doesn't take that
>> long compared to the time to clone a large repo (and on a small one it
>> won't matter either way).
>>
>> I was going to say "also for midx", but of course after clone we have
>> just one pack, so I can't imagine us needing this. But I can see us
>> having other such optional side-indexes in the future generated by gc,
>> and they'd also benefit from this.
>>
>> #leftoverbits

^ permalink raw reply	[relevance 0%]

* Re: We should add a "git gc --auto" after "git clone" due to commit graph
  2018-10-03 13:23  5% We should add a "git gc --auto" after "git clone" due to commit graph Ævar Arnfjörð Bjarmason
@ 2018-10-03 13:36  0% ` SZEDER Gábor
  2018-10-03 14:01  0%   ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 200+ results
From: SZEDER Gábor @ 2018-10-03 13:36 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee, Git List, Nguyễn Thái Ngọc Duy

On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Don't have time to patch this now, but thought I'd send a note / RFC
> about this.
> 
> Now that we have the commit graph it's nice to be able to set
> e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
> /etc/gitconfig to apply them to all repos.
> 
> But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
> until whenever my first "gc" kicks in, which may be quite some time if
> I'm just using it passively.
> 
> So we should make "git gc --auto" be run on clone,

There is no garbage after 'git clone'...

> and change the
> need_to_gc() / cmd_gc() behavior so that we detect that the
> gc.writeCommitGraph=true setting is on, but we have no commit graph, and
> then just generate that without doing a full repack.

Or just teach 'git clone' to run 'git commit-graph write ...'

> As an aside such more granular "gc" would be nice for e.g. pack-refs
> too. It's possible for us to just have one pack, but to have 100k loose
> refs.
> 
> It might also be good to have some gc.autoDetachOnClone option and have
> it false by default, so we don't have a race condition where "clone
> linux && git -C linux tag --contains" is slow because the graph hasn't
> been generated yet, and generating the graph initially doesn't take that
> long compared to the time to clone a large repo (and on a small one it
> won't matter either way).
> 
> I was going to say "also for midx", but of course after clone we have
> just one pack, so I can't imagine us needing this. But I can see us
> having other such optional side-indexes in the future generated by gc,
> and they'd also benefit from this.
> 
> #leftoverbits

^ permalink raw reply	[relevance 0%]

* We should add a "git gc --auto" after "git clone" due to commit graph
@ 2018-10-03 13:23  5% Ævar Arnfjörð Bjarmason
  2018-10-03 13:36  0% ` SZEDER Gábor
  0 siblings, 1 reply; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2018-10-03 13:23 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git List, Nguyễn Thái Ngọc Duy

Don't have time to patch this now, but thought I'd send a note / RFC
about this.

Now that we have the commit graph it's nice to be able to set
e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
/etc/gitconfig to apply them to all repos.

But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
until whenever my first "gc" kicks in, which may be quite some time if
I'm just using it passively.

So we should make "git gc --auto" be run on clone, and change the
need_to_gc() / cmd_gc() behavior so that we detect that the
gc.writeCommitGraph=true setting is on, but we have no commit graph, and
then just generate that without doing a full repack.
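
The check being proposed could look roughly like the following sketch. It is
written in Python purely for illustration; `wants_commit_graph` is a
hypothetical helper, not Git's need_to_gc(), and it assumes the single-file
location objects/info/commit-graph inside the Git directory:

```python
import os
import tempfile


def wants_commit_graph(git_dir, write_commit_graph):
    """True when the setting is on but no commit-graph file exists yet."""
    graph = os.path.join(git_dir, "objects", "info", "commit-graph")
    return bool(write_commit_graph) and not os.path.exists(graph)


# Simulate a fresh clone with an empty stand-in for the Git directory.
with tempfile.TemporaryDirectory() as git_dir:
    info = os.path.join(git_dir, "objects", "info")
    os.makedirs(info)

    fresh_clone = wants_commit_graph(git_dir, True)   # setting on, no graph
    disabled = wants_commit_graph(git_dir, False)     # setting off

    # Pretend the graph has been written (empty placeholder file).
    open(os.path.join(info, "commit-graph"), "wb").close()
    after_write = wants_commit_graph(git_dir, True)

print(fresh_clone, disabled, after_write)
```

Only the first case asks for work, and that work would be limited to writing
the graph rather than doing a full repack.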

As an aside such more granular "gc" would be nice for e.g. pack-refs
too. It's possible for us to just have one pack, but to have 100k loose
refs.

It might also be good to have some gc.autoDetachOnClone option and have
it false by default, so we don't have a race condition where "clone
linux && git -C linux tag --contains" is slow because the graph hasn't
been generated yet, and generating the graph initially doesn't take that
long compared to the time to clone a large repo (and on a small one it
won't matter either way).

I was going to say "also for midx", but of course after clone we have
just one pack, so I can't imagine us needing this. But I can see us
having other such optional side-indexes in the future generated by gc,
and they'd also benefit from this.

#leftoverbits

^ permalink raw reply	[relevance 5%]

* Re: [Outreachy] Introducing
       [not found]     <CAEdN-uV6giCp6FbC=J3B--6_kEzFkG8Yq4VXgGnhoNjERsXSQw@mail.gmail.com>
@ 2018-10-01 15:56  5% ` Christian Couder
  0 siblings, 0 replies; 200+ results
From: Christian Couder @ 2018-10-01 15:56 UTC (permalink / raw)
  To: giovana.vmorais; +Cc: git, Johannes Schindelin

Hi Giovana,

On Mon, Oct 1, 2018 at 4:04 PM Giovana Morais <giovana.vmorais@gmail.com> wrote:
>
> Hey there, Christian and Git!
>
> My initial Outreachy application got accepted and, looking through available projects, I was really interested in the `git bisect` one, since I want to take my C skills to the next level and, of course, have a deeper understanding of git itself. I think it will be a hard, but awesome challenge. (:

Great!

About possible projects there is also
https://git.github.io/Outreachy-17/ but only the `git bisect` has been
officially proposed as an Outreachy project. I hope Dscho (Johannes
Schindelin) will be ok to submit one of the 2 others soon and to
register himself as a mentor or co-mentor on some of the projects.
Please add him in CC, like I did in this email.

> I took a look at the patches already sent but I'm still lost as to where I can start contributing. Can you guys shed some light?

We usually ask Outreachy and Google Summer of Code applicants to first
work on a micro-project before starting to work on a bigger project.
There is https://git.github.io/SoC-2018-Microprojects/ for that though
it is not up-to-date. Some micro-projects we propose might have
already been taken by GSoC students last winter/spring. Sorry we
didn't update the page or create another one. Anyway there are some
micro-projects there like "Add more builtin patterns for userdiff"
that are still valid and still good small tasks to get started working
on Git. And there are explanations about how you can search for
micro-projects (especially how to search for #leftoverbits on the
mailing list archive).

Thank you for your interest in contributing to Git,
Christian.

> Thanks a lot!
> --
> Giovana Vieira de Morais

^ permalink raw reply	[relevance 5%]

* Re: [Outreachy] Introduce myself
  @ 2018-09-30 16:57  6% ` Christian Couder
  0 siblings, 0 replies; 200+ results
From: Christian Couder @ 2018-09-30 16:57 UTC (permalink / raw)
  To: ananyakittu1997; +Cc: git, Johannes Schindelin

Hi Ananya,

On Sun, Sep 30, 2018 at 5:53 PM Ananya Krishna Maram
<ananyakittu1997@gmail.com> wrote:
>
> Hi Git Community, Christian and Johannes,
>
> My initial Outreachy application got accepted.

Great! Welcome to the Git community!

[...]

> Having done a lot of assignments in C and bash scripting, and with a
> keen interest in learning about the workings of git internals, I chose
> to contribute to this project. So I started observing the patches sent
> to git mailing list.

About possible projects I updated https://git.github.io/Outreachy-17/
but only the `git bisect` has been officially proposed as an Outreachy
project. I hope Dscho (Johannes) will be ok to submit one of the 2
others soon and to register himself as a mentor or co-mentor on some
of the projects.

> I am currently looking for first patch opportunities to git. I came
> across[1] and I will try to put maximum effort towards my goal and if
> I need some clarification of the problem statement I guess you guys or
> Outreachy mentors will be here to help me.

The micro-project page you found is not up-to-date, so some
micro-projects we propose might have already been taken by GSoC
students last winter/spring. Sorry we didn't update the page or create
another one. Anyway there are some micro-projects there like "Add more
builtin patterns for userdiff" that are still valid and still good
small tasks to get started working on Git. And there are explanations
about how you can search for micro-projects (especially how to search
for #leftoverbits on the mailing list archive).

Thank you for your interest in contributing to Git,
Christian.

^ permalink raw reply	[relevance 6%]

* Re: Git in Outreachy Dec-Mar?
  2018-09-08  8:57  0%             ` Christian Couder
@ 2018-09-08 15:40  0%               ` Jeff King
  0 siblings, 0 replies; 200+ results
From: Jeff King @ 2018-09-08 15:40 UTC (permalink / raw)
  To: Christian Couder; +Cc: Johannes Schindelin, git

On Sat, Sep 08, 2018 at 10:57:46AM +0200, Christian Couder wrote:

> On Thu, Sep 6, 2018 at 9:31 PM, Jeff King <peff@peff.net> wrote:
> > On Thu, Sep 06, 2018 at 11:51:49AM +0200, Christian Couder wrote:
> >
> >> Yeah, I think the https://git.github.io/Outreachy-17/ is not actually necessary.
> >
> > I think it still may be helpful for explaining in further detail things
> > like #leftoverbits (though I see you put some of that in your project
> > description).
> 
> You mean in https://git.github.io/Outreachy-17/ or somewhere else?
> 
> It is already described in https://git.github.io/SoC-2018-Microprojects/.

Yeah, I meant it may still be useful to have an Outreachy page for our
community explaining community-specific procedures. I agree it's mostly
redundant with what's on the GSoC page, but it might be easier on
applicants to have a page tailored directly towards Outreachy. But I
haven't gone over the material as recently as you, so I'd leave that
decision to you.

-Peff

^ permalink raw reply	[relevance 0%]

* Re: Git in Outreachy Dec-Mar?
  2018-09-06 19:31  6%           ` Jeff King
@ 2018-09-08  8:57  0%             ` Christian Couder
  2018-09-08 15:40  0%               ` Jeff King
  0 siblings, 1 reply; 200+ results
From: Christian Couder @ 2018-09-08  8:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git

On Thu, Sep 6, 2018 at 9:31 PM, Jeff King <peff@peff.net> wrote:
> On Thu, Sep 06, 2018 at 11:51:49AM +0200, Christian Couder wrote:
>
>> Yeah, I think the https://git.github.io/Outreachy-17/ is not actually necessary.
>
> I think it still may be helpful for explaining in further detail things
> like #leftoverbits (though I see you put some of that in your project
> description).

You mean in https://git.github.io/Outreachy-17/ or somewhere else?

It is already described in https://git.github.io/SoC-2018-Microprojects/.

>> I did that for the "Improve `git bisect`" project. As the
>> "coordinator", you will need to approve that project.
>
> Thanks. I approved it, though a few of the descriptions are a little
> funny. For instance, the text says "we use an issue tracker", which then
> links to public-inbox. I assume this is because you filled in a field
> for "issue tracker" and then the system generated the text.

Yeah, it was generated from fields that I filled in.

> I don't know if there's a way to go into more detail there.

I don't think so, though we could perhaps improve our web pages.

^ permalink raw reply	[relevance 0%]

* Re: Git in Outreachy Dec-Mar?
  @ 2018-09-06 19:31  6%           ` Jeff King
  2018-09-08  8:57  0%             ` Christian Couder
  0 siblings, 1 reply; 200+ results
From: Jeff King @ 2018-09-06 19:31 UTC (permalink / raw)
  To: Christian Couder; +Cc: Johannes Schindelin, git

On Thu, Sep 06, 2018 at 11:51:49AM +0200, Christian Couder wrote:

> > Thanks. I signed us up as a community (making me the "coordinator" in
> > their terminology). I think the procedure is a little different this
> > year, and we actually propose projects to mentor through their system.
> 
> Yeah, I think the https://git.github.io/Outreachy-17/ is not actually necessary.

I think it still may be helpful for explaining in further detail things
like #leftoverbits (though I see you put some of that in your project
description).

> > So anybody interested in mentoring should go here:
> >
> >   https://www.outreachy.org/communities/cfp/git/
> >
> > (and you'll need to create a login if you don't have one from last
> > year). You should be able to click through "Submit a Project Proposal",
> > after which the fields are pretty self-explanatory.
> 
> I did that for the "Improve `git bisect`" project. As the
> "coordinator", you will need to approve that project.

Thanks. I approved it, though a few of the descriptions are a little
funny. For instance, the text says "we use an issue tracker", which then
links to public-inbox. I assume this is because you filled in a field
for "issue tracker" and then the system generated the text. I don't know
if there's a way to go into more detail there.

> I think the person who submits a project becomes some kind of primary
> mentor for the project. So Dscho, if you want to be such a mentor for
> one or both of the other projects on the Outreachy-17 page, please
> submit the project(s) otherwise please tell me and I will submit them.
> You are free of course to change things in these projects when you
> submit them or to submit other completely different projects.

Yes, I think the point is to make sure the mentors are invested in the
individual projects. I imagine a kind of "oh, one of us will probably
mentor it" attitude has led to problems in other projects in the past.

-Peff

^ permalink raw reply	[relevance 6%]

* Re: Git in Outreachy Dec-Mar?
  2018-09-06  1:14  6%         ` Jeff King
@ 2018-09-06  9:58  0%           ` Christian Couder
  0 siblings, 0 replies; 200+ results
From: Christian Couder @ 2018-09-06  9:58 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Johannes Schindelin

On Thu, Sep 6, 2018 at 3:14 AM, Jeff King <peff@peff.net> wrote:
> On Wed, Sep 05, 2018 at 09:20:23AM +0200, Christian Couder wrote:
>
>> >> Thanks. I think sooner is better for this (for you or anybody else who's
>> >> interested in mentoring). The application period opens on September
>> >> 10th, but I think the (still growing) list of projects is already being
>> >> looked at by potential candidates.
>>
>> Do you know where this list is? On
>> https://www.outreachy.org/apply/project-selection/ they say
>> "Information about projects are unavailable until applications open".
>
> This was the list I was looking at (scroll down below the timeline):
>
>   https://www.outreachy.org/communities/cfp/

Ok, so it's the list of "communities" not "projects" in Outreachy terms.

> But yeah, most of the "current projects" lists just say "not available
> yet", so I think we're actually OK until the 10th.

Yeah, I think so too.

>> > So here is a landing page for the next Outreachy round:
>> >
>> > https://git.github.io/Outreachy-17/
>> >
>> > about the microprojects I am not sure which page I should create or improve.
>>
>> Any idea about this? Also any idea about new microprojects would be nice.
>
> I think #leftoverbits is your best bet for micro-projects. Last year I
> think we had interns actually hunt for them via the list archive. That's
> a little unfriendly for total newcomers, I think, but it also does give
> a chance to demonstrate some skills. Perhaps it would help to create
> a curated list of such bits.

Ok, I will see if I have time to create such a list.

^ permalink raw reply	[relevance 0%]

* Re: Git in Outreachy Dec-Mar?
  @ 2018-09-06  1:14  6%         ` Jeff King
  2018-09-06  9:58  0%           ` Christian Couder
  0 siblings, 1 reply; 200+ results
From: Jeff King @ 2018-09-06  1:14 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Johannes Schindelin

On Wed, Sep 05, 2018 at 09:20:23AM +0200, Christian Couder wrote:

> >> Thanks. I think sooner is better for this (for you or anybody else who's
> >> interested in mentoring). The application period opens on September
> >> 10th, but I think the (still growing) list of projects is already being
> >> looked at by potential candidates.
> 
> Do you know where this list is? On
> https://www.outreachy.org/apply/project-selection/ they say
> "Information about projects are unavailable until applications open".

This was the list I was looking at (scroll down below the timeline):

  https://www.outreachy.org/communities/cfp/

But yeah, most of the "current projects" lists just say "not available
yet", so I think we're actually OK until the 10th.

> > So here is a landing page for the next Outreachy round:
> >
> > https://git.github.io/Outreachy-17/
> >
> > about the microprojects I am not sure which page I should create or improve.
> 
> Any idea about this? Also any idea about new microprojects would be nice.

I think #leftoverbits is your best bet for micro-projects. Last year I
think we had interns actually hunt for them via the list archive. That's
a little unfriendly for total newcomers, I think, but it also does give
a chance to demonstrate some skills. Perhaps it would help to create
a curated list of such bits.

-Peff

^ permalink raw reply	[relevance 6%]

* Re: Trivial enhancement: All commands which require an author should accept --author
  2018-08-30 14:26  5%         ` Junio C Hamano
@ 2018-09-04 17:18  6%           ` Jonathan Nieder
  0 siblings, 0 replies; 200+ results
From: Jonathan Nieder @ 2018-09-04 17:18 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Johannes Schindelin,
	Ulrich Gemkow, git

Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> I believe the "official" way, such as it is, is you just put
>> #leftoverbits in your E-Mail, then search the list archives,
>> e.g. https://public-inbox.org/git/?q=%23leftoverbits
>
> I think that technique has been around long enough to be called a
> recognised way, but I do not think it is "the" official way.  It is
> one of the efforts to allow us to remember what we might want to work
> on, and focuses on not wasting too much effort in curating.
> Another effort to allow us to remember is http://crbug.com/git that is
> run by Jonathan Nieder.
>
> Anybody can participate in curating the latter.

Yes, exactly.

Ævar, if you would like to keep better track of #leftoverbits, please
feel free to make use of https://crbug.com/git/new. It even has a
"leftover bit" template you can use.

Thanks and hope that helps,
Jonathan

^ permalink raw reply	[relevance 6%]

* Re: Trivial enhancement: All commands which require an author should accept --author
  2018-08-30 14:08  0%         ` Johannes Schindelin
@ 2018-09-03 13:18  7%           ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2018-09-03 13:18 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Ulrich Gemkow, git


On Thu, Aug 30 2018, Johannes Schindelin wrote:

> Hi Ævar,
>
> On Thu, 30 Aug 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> On Thu, Aug 30 2018, Johannes Schindelin wrote:
>>
>> > On Wed, 29 Aug 2018, Junio C Hamano wrote:
>> >
>> >> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>> >>
>> >> > The `stash` command only incidentally requires that the author is set, as
>> >> > it calls `git commit` internally (which records the author). As stashes
>> >> > are intended to be local only, that author information was never meant to
>> >> > be a vital part of the `stash`.
>> >> >
>> >> > I could imagine that an even better enhancement request would ask for `git
>> >> > stash` to work even if `user.name` is not configured.
>> >>
>> >> This would make a good bite-sized microproject, worth marking it as
>> >> #leftoverbits unless somebody is already working on it ;-)
>> >
>> > Right.
>> >
>> > What is our currently-favored approach to this, again? Do we have a
>> > favorite wiki page to list those, or do we have a bug tracker for such
>> > mini-projects?
>> >
>> > Once I know, I will add this, with enough information to get anybody
>> > interested started.
>>
>> I believe the "official" way, such as it is, is you just put
>> #leftoverbits in your E-Mail, then search the list archives,
>> e.g. https://public-inbox.org/git/?q=%23leftoverbits
>>
>> So e.g. I've taken to putting this in my own E-Mails where I spot
>> something I'd like to note as a TODO that I (or someone else) could work
>> on later:
>> https://public-inbox.org/git/?q=%23leftoverbits+f%3Aavarab%40gmail.com
>
> That is a poor way to list the current micro-projects, as it is totally
> non-obvious to the casual interested person which projects are still
> relevant, and which ones have been addressed already.

I don't think this is ideal. To be clear, and in reply to both your and
Junio's E-Mails: I meant "official" in scare quotes, in the least official
way possible.

I.e. that you need to search the mailing list archive if you want to see
what these #leftoverbits are, because the full set is stored nowhere
else.
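
For reference, the two search links quoted above can be built with a short
Python sketch. The helper name `leftoverbits_url` is hypothetical; the archive
address and the `f:` sender prefix are taken from those links:

```python
from urllib.parse import urlencode

ARCHIVE = "https://public-inbox.org/git/"


def leftoverbits_url(sender=None):
    """Build a search URL for #leftoverbits, optionally limited to a sender."""
    query = "#leftoverbits"
    if sender:
        query += " f:" + sender
    # urlencode() percent-encodes "#", ":" and "@" and turns the space into "+".
    return ARCHIVE + "?" + urlencode({"q": query})


print(leftoverbits_url())
print(leftoverbits_url("avarab@gmail.com"))
```

The "#" has to be encoded as %23, otherwise a browser treats everything after
it as a fragment and the search term never reaches the server.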

> In a bug tracker, you can at least add a comment stating that something
> has been addressed, or made a lot easier by another topic.

Yeah, a bunch of things suck about it, although I will say at least for
notes I'm leaving for myself I'm using it in a way that I wouldn't
bother to use a bugtracker, so in many cases it's the difference between
offhandedly saying "oh b.t.w. we should fix xyz in way abc
#leftoverbits" and not having a bug at all, because filing a bug /
curating a tracker etc. is a lot more work.

> In a mailing list archive, those mails are immutable, and you cannot
> update squat.

In a lot of bugtrackers you can't update existing comments either, you
make a new one noting some new status. Similarly you can send a new mail
with the correct In-Reply-To.

That doesn't solve all the issues, but helps in many cases.

^ permalink raw reply	[relevance 7%]

* Re: Trivial enhancement: All commands which require an author should accept --author
  2018-08-30 12:29  6%       ` Ævar Arnfjörð Bjarmason
  2018-08-30 14:08  0%         ` Johannes Schindelin
@ 2018-08-30 14:26  5%         ` Junio C Hamano
  2018-09-04 17:18  6%           ` Jonathan Nieder
  1 sibling, 1 reply; 200+ results
From: Junio C Hamano @ 2018-08-30 14:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Johannes Schindelin, Ulrich Gemkow, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I believe the "official" way, such as it is, is you just put
> #leftoverbits in your E-Mail, then search the list archives,
> e.g. https://public-inbox.org/git/?q=%23leftoverbits

I think that technique has been around long enough to be called a
recognised way, but I do not think it is "the" official way.  It is
one of the efforts to allow us to remember what we might want to work
on, and focuses on not wasting too much effort in curating.
Another effort to allow us to remember is http://crbug.com/git that is
run by Jonathan Nieder.

Anybody can participate in curating the latter.  The former is
uncurated and deliberately kept informal, but will stay a usable way
until clueless people catch up with the practice and mark any random
garbage they come up with with the marking word.  I myself try to
refrain from using it when I raise the idea/issue for the first time
to avoid "ah, it turns out that it is not such a great idea after
thinking about it for a while"--rather I try to limit my use to my
responses as a reaction to somebody else's idea/issue.  That way, I
can make sure that messages with the marking word from me have ideas
supported by at least two people, one of which is known to me to
have a good taste, so mailing list search "from:me #leftoverbits"
would stay meaningful.

^ permalink raw reply	[relevance 5%]

* Re: Trivial enhancement: All commands which require an author should accept --author
  2018-08-30 12:29  6%       ` Ævar Arnfjörð Bjarmason
@ 2018-08-30 14:08  0%         ` Johannes Schindelin
  2018-09-03 13:18  7%           ` Ævar Arnfjörð Bjarmason
  2018-08-30 14:26  5%         ` Junio C Hamano
  1 sibling, 1 reply; 200+ results
From: Johannes Schindelin @ 2018-08-30 14:08 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, Ulrich Gemkow, git

Hi Ævar,

On Thu, 30 Aug 2018, Ævar Arnfjörð Bjarmason wrote:

> On Thu, Aug 30 2018, Johannes Schindelin wrote:
> 
> > On Wed, 29 Aug 2018, Junio C Hamano wrote:
> >
> >> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> >>
> >> > The `stash` command only incidentally requires that the author is set, as
> >> > it calls `git commit` internally (which records the author). As stashes
> >> > are intended to be local only, that author information was never meant to
> >> > be a vital part of the `stash`.
> >> >
> >> > I could imagine that an even better enhancement request would ask for `git
> >> > stash` to work even if `user.name` is not configured.
> >>
> >> This would make a good bite-sized microproject, worth marking it as
> >> #leftoverbits unless somebody is already working on it ;-)
> >
> > Right.
> >
> > What is our currently-favored approach to this, again? Do we have a
> > favorite wiki page to list those, or do we have a bug tracker for such
> > mini-projects?
> >
> > Once I know, I will add this, with enough information to get anybody
> > interested started.
> 
> I believe the "official" way, such as it is, is you just put
> #leftoverbits in your E-Mail, then search the list archives,
> e.g. https://public-inbox.org/git/?q=%23leftoverbits
> 
> So e.g. I've taken to putting this in my own E-Mails where I spot
> something I'd like to note as a TODO that I (or someone else) could work
> on later:
> https://public-inbox.org/git/?q=%23leftoverbits+f%3Aavarab%40gmail.com

That is a poor way to list the current micro-projects, as it is totally
non-obvious to the casual interested person which projects are still
relevant, and which ones have been addressed already.

In a bug tracker, you can at least add a comment stating that something
has been addressed, or made a lot easier by another topic.

In a mailing list archive, those mails are immutable, and you cannot
update squat.

Ciao,
Johannes

^ permalink raw reply	[relevance 0%]

* Re: Trivial enhancement: All commands which require an author should accept --author
  2018-08-30 11:51  0%     ` Johannes Schindelin
@ 2018-08-30 12:29  6%       ` Ævar Arnfjörð Bjarmason
  2018-08-30 14:08  0%         ` Johannes Schindelin
  2018-08-30 14:26  5%         ` Junio C Hamano
  0 siblings, 2 replies; 200+ results
From: Ævar Arnfjörð Bjarmason @ 2018-08-30 12:29 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Ulrich Gemkow, git


On Thu, Aug 30 2018, Johannes Schindelin wrote:

> Hi Junio,
>
> On Wed, 29 Aug 2018, Junio C Hamano wrote:
>
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>> > The `stash` command only incidentally requires that the author is set, as
>> > it calls `git commit` internally (which records the author). As stashes
>> > are intended to be local only, that author information was never meant to
>> > be a vital part of the `stash`.
>> >
>> > I could imagine that an even better enhancement request would ask for `git
>> > stash` to work even if `user.name` is not configured.
>>
>> This would make a good bite-sized microproject, worth marking it as
>> #leftoverbits unless somebody is already working on it ;-)
>
> Right.
>
> What is our currently-favored approach to this, again? Do we have a
> favorite wiki page to list those, or do we have a bug tracker for such
> mini-projects?
>
> Once I know, I will add this, with enough information to get anybody
> interested started.

I believe the "official" way, such as it is, is you just put
#leftoverbits in your E-Mail, then search the list archives,
e.g. https://public-inbox.org/git/?q=%23leftoverbits

So e.g. I've taken to putting this in my own E-Mails where I spot
something I'd like to note as a TODO that I (or someone else) could work
on later:
https://public-inbox.org/git/?q=%23leftoverbits+f%3Aavarab%40gmail.com
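
As a rough illustration of the searches above, the query URL could be
assembled with a small shell helper. The function name 'leftoverbits_url'
is hypothetical, and the helper escapes only the few characters that
appear in these queries rather than doing general percent-encoding.

```shell
#!/bin/sh
# Hypothetical helper: build a public-inbox search URL for #leftoverbits,
# optionally restricted to one sender via the f: prefix seen in the URL above.
leftoverbits_url () {
	query='#leftoverbits'
	if test -n "${1-}"
	then
		query="$query f:$1"
	fi
	# Escape only the characters these queries contain; '%' must go first.
	encoded=$(printf '%s' "$query" |
		sed -e 's/%/%25/g' -e 's/#/%23/g' -e 's/:/%3A/g' \
		    -e 's/@/%40/g' -e 's/ /+/g')
	printf 'https://public-inbox.org/git/?q=%s\n' "$encoded"
}

leftoverbits_url
leftoverbits_url avarab@gmail.com
```

The two calls print the same two URLs quoted in this message.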

^ permalink raw reply	[relevance 6%]

* Re: Trivial enhancement: All commands which require an author should accept --author
  2018-08-29 19:09  6%   ` Junio C Hamano
@ 2018-08-30 11:51  0%     ` Johannes Schindelin
  2018-08-30 12:29  6%       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 200+ results
From: Johannes Schindelin @ 2018-08-30 11:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ulrich Gemkow, git

Hi Junio,

On Wed, 29 Aug 2018, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > The `stash` command only incidentally requires that the author is set, as
> > it calls `git commit` internally (which records the author). As stashes
> > are intended to be local only, that author information was never meant to
> > be a vital part of the `stash`.
> >
> > I could imagine that an even better enhancement request would ask for `git
> > stash` to work even if `user.name` is not configured.
> 
> This would make a good bite-sized microproject, worth marking it as
> #leftoverbits unless somebody is already working on it ;-)

Right.

What is our currently-favored approach to this, again? Do we have a
favorite wiki page to list those, or do we have a bug tracker for such
mini-projects?

Once I know, I will add this, with enough information to get anybody
interested started.

Ciao,
Dscho

^ permalink raw reply	[relevance 0%]

* Re: Trivial enhancement: All commands which require an author should accept --author
  @ 2018-08-29 19:09  6%   ` Junio C Hamano
  2018-08-30 11:51  0%     ` Johannes Schindelin
  0 siblings, 1 reply; 200+ results
From: Junio C Hamano @ 2018-08-29 19:09 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Ulrich Gemkow, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> The `stash` command only incidentally requires that the author is set, as
> it calls `git commit` internally (which records the author). As stashes
> are intended to be local only, that author information was never meant to
> be a vital part of the `stash`.
>
> I could imagine that an even better enhancement request would ask for `git
> stash` to work even if `user.name` is not configured.

This would make a good bite-sized microproject, worth marking it as
#leftoverbits unless somebody is already working on it ;-)

^ permalink raw reply	[relevance 6%]

* Re: [PATCHv3 0/5] Simple fixes to t7406
  @ 2018-08-18 20:52  6%     ` Elijah Newren
  0 siblings, 0 replies; 200+ results
From: Elijah Newren @ 2018-08-18 20:52 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Git Mailing List, Junio C Hamano, Eric Sunshine,
	Martin Ågren

Hi,

On Mon, Aug 13, 2018 at 1:28 PM SZEDER Gábor <szeder.dev@gmail.com> wrote:
> On Tue, Aug 7, 2018 at 6:49 PM Elijah Newren <newren@gmail.com> wrote:
>
> > Since folks like to notice other problems with t7406 while reading my
> > patches, here's a challenge:
> >
> >   Find something *else* wrong with t7406 that neither I nor any of the
> >   reviewers so far have caught that could be fixed.
>
> Well, I'd hate to be that guy...  but since those who already
> commented on previous rounds are not explicitly excluded from the
> challenge, let's see.
>
> - There are still a few command substitutions running git commands,
>   where the exit status of that command is ignored; just look for the
>   '[^=]$(' pattern in the test script.
>
>   (Is not noticing those cases considered as "flubbing"?)

Hmm, borderline.
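
For anyone picking this up, here is a minimal sketch of how the quoted
pattern can be used; the helper name and the sample file are made up for
the example. In a basic regular expression the '$' is literal when it is
not at the end of the pattern, and '(' is literal as well, so the pattern
matches a '$(' preceded by any character other than '='.

```shell
#!/bin/sh
# Hypothetical helper: list lines where a command substitution is not the
# direct right-hand side of an assignment, so its exit status is likely lost.
find_unchecked_substitutions () {
	grep -n '[^=]$(' "$1"
}

# Two sample lines: an assignment (not reported) and a substitution used
# as an argument to 'test' (reported).
sample=$(mktemp) &&
cat >"$sample" <<\EOF
commit=$(git rev-parse HEAD) &&
test "$(git rev-parse HEAD)" = "$commit" &&
EOF
find_unchecked_substitutions "$sample"
rm -f "$sample"
```

This is only a heuristic: a quoted assignment such as var="$(cmd)" is also
reported, because the character before '$(' is a double quote.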

> - The 'compare_head' helper function defined in this test script looks
>   very similar to the generally available 'test_cmp_rev' function,
>   which has the benefit to provide some visible output on failure
>   (though, IMO, not a particularly useful output, because the diff of
>   two OIDs is not very informative, but at least it's something as
>   opposed to the silence of 'test $this = $that').
>
>   Now, since 'compare_head' always compares the same two revisions,
>   namely 'master' and HEAD, replacing 'compare_head' with an
>   appropriate 'test_cmp_rev' call would result in repeating 'master'
>   and 'HEAD' arguments all over the test script.  I'm not sure whether
>   that's good or bad.  Anyway, I think that 'compare_head' could be
>   turned into a wrapper around 'test_cmp_rev'.

Ooh, that does sound better.
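
A minimal sketch of such a wrapper, assuming it lives in a test script
where 'test_cmp_rev' already comes from the test library. The simplified
stand-in for 'test_cmp_rev' below exists only to keep the snippet
self-contained outside the test suite; it is not the real implementation.

```shell
#!/bin/sh
# Simplified stand-in for the test library's test_cmp_rev, for illustration
# only: succeed when both revisions resolve to the same object name, and
# report the mismatch on stderr otherwise.
test_cmp_rev () {
	expect=$(git rev-parse --verify "$1") &&
	actual=$(git rev-parse --verify "$2") &&
	if test "$expect" != "$actual"
	then
		echo "error: $1 is $expect but $2 is $actual" >&2
		return 1
	fi
}

# compare_head as a thin wrapper, so callers need not repeat the arguments.
compare_head () {
	test_cmp_rev master HEAD
}
```

Callers in the test script would stay unchanged, but a failure would now
come with some visible output.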

> >     - You get bonus points if that thing is in the context region for
> >       one of my five patches.
> >     - Extra bonus points if the thing needing fixing was on a line I
> >       changed.
> >     - You win outright if it's something big enough that I give up and
> >       request to just have my series merged as-is and punt your
> >       suggested fixes down the road to someone else.
>
> Well, there's always the indentation of the commands run in subshells,
> which doesn't conform to our coding style...
>
> Gah, now you made me that guy ;)

I read this on Monday and got a really good laugh.  I meant to fix it
up, but fell asleep too soon the first couple nights...and now this
series is in next anyway and there are a couple other git things that
have my attention.  You have pointed out a couple additional nice
fixups, like you always do, but I think at this point I'm just going
to declare you the winner and label these as #leftoverbits.

Thanks for always thoroughly looking over the testcase patches and
your constant work to improve the testsuite.

^ permalink raw reply	[relevance 6%]

* Re: [PATCHv4 6/6] Remove forward declaration of an enum
  @ 2018-08-15 20:40  6%     ` Jonathan Nieder
  0 siblings, 0 replies; 200+ results
From: Jonathan Nieder @ 2018-08-15 20:40 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, avarab, peff, ramsay

Elijah Newren wrote:

> According to http://c-faq.com/null/machexamp.html, sizeof(char*) !=
> sizeof(int*) on some platforms.  Since an enum could be a char or int
> (or long or...), knowing the size of the enum thus is important to
> knowing the size of a pointer to an enum, so we cannot just forward
> declare an enum the way we can a struct.  (Also, modern C++ compilers
> apparently define forward declarations of an enum to either be useless
> because the enum was defined, or require an explicit size specifier, or
> be a compilation error.)

Beyond the effect on some obscure platforms, this also makes it
possible to build with gcc -pedantic (which can be useful for finding
some other problems).  Thanks for fixing it.

[...]
> --- a/packfile.h
> +++ b/packfile.h
> @@ -1,12 +1,12 @@
>  #ifndef PACKFILE_H
>  #define PACKFILE_H
>  
> +#include "cache.h"
>  #include "oidset.h"
>  
>  /* in object-store.h */
>  struct packed_git;
>  struct object_info;

Not about this patch: comments like the above are likely to go stale,
since nothing verifies they continue to be true.  So we should remove
them. #leftoverbits

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

^ permalink raw reply	[relevance 6%]

* Re: [PATCHv3 1/6] Add missing includes and forward declares
  @ 2018-08-15  6:51  6%           ` Elijah Newren
  0 siblings, 0 replies; 200+ results
From: Elijah Newren @ 2018-08-15  6:51 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Git Mailing List, Ævar Arnfjörð, Jeff King,
	Ramsay Jones

On Tue, Aug 14, 2018 at 11:13 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> Elijah Newren wrote:
>
> > I didn't want to repeat that description in all 6 patches, since all
> > six came from that, so I put it in the cover letter.  Since patch #1
> > has most that changes though, I guess it makes sense to include it at
> > least in that one?
>
> Yes, that sounds sensible to me.

Will do.

> [...]
> >> enums are of unknown size, so forward declarations don't work for
> >> them.  See bb/pedantic for some examples.
> >
> > structs are also of unknown size; the size is irrelevant when the
> > function signature merely uses a pointer to the struct or enum.  The
> > enum forward declaration fixes a compilation bug.
>
> My rationale may miss the point but the standard and some real compilers
> don't like this, unfortunately.
>
> For structs, having an incomplete type is fine, but for enums we need
> the full definition.  E.g. C99 sayeth (in section 6.7.2.3 "tags")
>
>         A type specifier of the form
>
>                 enum identifier
>
>         without an enumerator list shall only appear after the type it
>         specifies is complete.

What about a type specifier of the form
  enum identifier *
?  Can that kind of type specifier appear before the full definition
of the enum?  (Or, alternatively, if the standard doesn't say, are
there any compilers that have a problem with that?)

If so, we can include cache.h instead.  We'll probably also have to
fix up packfile.h for the exact same issue (even the same enum name)
if that's the case.

> [...]
> >>> --- a/commit-graph.h
> >>> +++ b/commit-graph.h
> >>> @@ -4,6 +4,7 @@
> >>>  #include "git-compat-util.h"
> >>>  #include "repository.h"
> >>>  #include "string-list.h"
> >>> +#include "cache.h"
> >>
> >> We can skip the #include of git-compat-util.h since all .c files
> >> include it.
> >
> > Good point.  Should I go through and remove all the inclusions of
> > git-compat-util.h in header files?
>
> It's orthogonal to this series but might be a good change.

I think I'll leave it as #leftoverbits for someone else interested.  :-)

> [...]
> >>> --- a/pathspec.h
> >>> +++ b/pathspec.h
> >>> @@ -1,6 +1,11 @@
> >>>  #ifndef PATHSPEC_H
> >>>  #define PATHSPEC_H
> >>>
> >>> +#include "string.h"
> >>> +#include "strings.h"
> >>
> >> What are these headers?
> >
> > The original patch[1] had explanations of why I added them:
> >
> > +#include "string.h"   /* For str[n]cmp */
> > +#include "strings.h"  /* For str[n]casecmp */
>
> Ah.  Please remove these #includes: they're part of the standard
> library that we get implicitly via git-compat-util.h.
>
> I was tripped up because they were in quotes instead of angle
> brackets.

Indeed; will do.

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v3 07/10] fetch: implement fetch.fsck.*
  @ 2018-07-27 21:08  5%   ` Junio C Hamano
  0 siblings, 0 replies; 200+ results
From: Junio C Hamano @ 2018-07-27 21:08 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Johannes Schindelin, Jeff King, Eric Sunshine,
	Christian Couder

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> -			argv_array_push(&cmd.args, "--strict");
> +			argv_array_pushf(&cmd.args, "--strict%s",
> +					 fsck_msg_types.buf);
> ...
> +		if (git_config_pathname(&path, var, value))
> +			return 1;
> +		strbuf_addf(&fsck_msg_types, "%cskiplist=%s",
> +			fsck_msg_types.len ? ',' : '=', path);
> ...
> +		if (is_valid_msg_type(var, value))
> +			strbuf_addf(&fsck_msg_types, "%c%s=%s",
> +				fsck_msg_types.len ? ',' : '=', var, value);
> +		else
> +			warning("Skipping unknown msg id '%s'", var);

This follows the quite familiar pattern found in receive_pack_config();
looking good.
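
The separator logic in the quoted hunks can be paraphrased in shell as
below. This is only a sketch of the string building, not the C code from
the patch; the function name and the sample message ids are made up for
illustration. The first entry is introduced with '=' and later ones with
',', so the accumulated value can be appended directly to "--strict".

```shell
#!/bin/sh
# Accumulates "=key=value,key=value" the way the quoted strbuf_addf()
# calls do: '=' before the first entry, ',' before every later one.
fsck_msg_types=

append_fsck_msg_type () {
	if test -n "$fsck_msg_types"
	then
		sep=,
	else
		sep='='
	fi
	fsck_msg_types="$fsck_msg_types$sep$1=$2"
}

append_fsck_msg_type missingemail ignore
append_fsck_msg_type skiplist /path/to/SKIP
echo "--strict$fsck_msg_types"
```

When nothing has been appended, the result is a plain "--strict", which is
why the leading '=' belongs to the accumulated string and not to the
option name.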

> diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
> index 57ff78c201..004bfebe98 100755
> --- a/t/t5504-fetch-receive-strict.sh
> +++ b/t/t5504-fetch-receive-strict.sh
> @@ -145,6 +145,20 @@ test_expect_success 'push with receive.fsck.skipList' '
>  	git push --porcelain dst bogus
>  '
>  
> +test_expect_success 'fetch with fetch.fsck.skipList' '
> +	commit="$(git hash-object -t commit -w --stdin <bogus-commit)" &&
> +	refspec=refs/heads/bogus:refs/heads/bogus &&
> +	git push . $commit:refs/heads/bogus &&

I see this used in the previous test for receive.fsck.skipList, but
it is an interesting implementation of "git update-ref" that could
be affected by potential fsck error in push-to-receive-pack transport.
As we are interested in transport into "dst" and we want this creation
of our 'bogus' branch to succeed no matter what, it probably is not
a good idea to use "git push ." like this in the context of this test.

Perhaps leave a 'leftoverbits' comment to remind us to update
all these uses of local push in the script in the future?

> +	rm -rf dst &&
> +	git init dst &&
> +	git --git-dir=dst/.git config fetch.fsckObjects true &&
> +	test_must_fail git --git-dir=dst/.git fetch "file://$(pwd)" $refspec &&

We see that by default fetch.fsckObjects errors out when it notices
the bogus commit object.

> +	git --git-dir=dst/.git config fetch.fsck.skipList dst/.git/SKIP &&
> +	echo $commit >dst/.git/SKIP &&

And then we set up a skip to see ...

> +	git --git-dir=dst/.git fetch "file://$(pwd)" $refspec

... if that is ignored.  Looks great.

Would this second attempt succeed _without_ the SKIP list, I wonder,
though?  

After the initial attempt that transferred the object, inspected it
and then aborted before pointing a ref to make the object reachable,
wouldn't it be possible for the quickfetch codepath to say "ah, we
locally have that object, so let's see if it is a descendant of the tip
of one of our refs *and* all the objects it points at (recursively)
are all available in this repository", as we do not quarantine on
the fetch side?

^ permalink raw reply	[relevance 5%]

Results 201-400 of ~530

-- pct% links below jump to the message on this page, permalinks otherwise --
2018-03-17 21:20     [GSoC] Some #leftoverbits for anyone looking for little projects Ævar Arnfjörð Bjarmason
2019-05-20 18:23  6% ` Matheus Tavares
2019-05-20 23:49  8%   ` Ævar Arnfjörð Bjarmason
2019-05-21  4:38  8%     ` Matheus Tavares Bernardino
2019-05-28 10:37  8%   ` Johannes Schindelin
2019-05-28 17:37  8%     ` Matheus Tavares Bernardino
2019-05-28 18:16  8%       ` Johannes Schindelin
2019-05-29  9:38  8% ` Johannes Schindelin
2019-05-29  9:40  8%   ` Johannes Schindelin
2018-05-25 19:28     [PATCH v2 0/5] fsck: doc fixes & fetch.fsck.* implementation Ævar Arnfjörð Bjarmason
2018-07-27 14:37     ` [PATCH v3 07/10] fetch: implement fetch.fsck.* Ævar Arnfjörð Bjarmason
2018-07-27 21:08  5%   ` Junio C Hamano
2018-08-06 15:25     [PATCH 0/2] Simple fixes to t7406 Elijah Newren
2018-08-07 16:49     ` [PATCHv3 0/5] " Elijah Newren
2018-08-13 20:28       ` SZEDER Gábor
2018-08-18 20:52  6%     ` Elijah Newren
2018-08-08 15:21     [GSoC] [PATCH 00/18] builtin rebase options Pratik Karki
2018-09-04 21:59     ` [PATCH v2 " Johannes Schindelin via GitGitGadget
2018-09-06 19:50       ` Junio C Hamano
2018-10-12 12:01  4%     ` Johannes Schindelin
2018-08-11 20:50     [PATCHv2 0/6] Add missing includes and forward declares Elijah Newren
2018-08-13 17:17     ` [PATCHv3 " Elijah Newren
2018-08-13 17:17       ` [PATCHv3 1/6] " Elijah Newren
2018-08-15  5:10         ` Jonathan Nieder
2018-08-15  5:50           ` Elijah Newren
2018-08-15  6:13             ` Jonathan Nieder
2018-08-15  6:51  6%           ` Elijah Newren
     [not found]     <https://public-inbox.org/git/20180813171749.10481-1-newren@gmail.com/>
2018-08-15 17:54     ` [PATCHv4 0/6] " Elijah Newren
2018-08-15 17:54       ` [PATCHv4 6/6] Remove forward declaration of an enum Elijah Newren
2018-08-15 20:40  6%     ` Jonathan Nieder
2018-08-28 15:14     Git in Outreachy Dec-Mar? Jeff King
2018-08-31  8:16     ` Christian Couder
2018-09-01  8:43       ` Jeff King
2018-09-03  4:36         ` Christian Couder
2018-09-05  7:20           ` Christian Couder
2018-09-06  1:14  6%         ` Jeff King
2018-09-06  9:58  0%           ` Christian Couder
2018-09-06  1:21           ` Jeff King
2018-09-06  9:51             ` Christian Couder
2018-09-06 19:31  6%           ` Jeff King
2018-09-08  8:57  0%             ` Christian Couder
2018-09-08 15:40  0%               ` Jeff King
2018-08-28 21:05     Trivial enhancement: All commands which require an author should accept --author Ulrich Gemkow
2018-08-29 16:14     ` Johannes Schindelin
2018-08-29 19:09  6%   ` Junio C Hamano
2018-08-30 11:51  0%     ` Johannes Schindelin
2018-08-30 12:29  6%       ` Ævar Arnfjörð Bjarmason
2018-08-30 14:08  0%         ` Johannes Schindelin
2018-09-03 13:18  7%           ` Ævar Arnfjörð Bjarmason
2018-08-30 14:26  5%         ` Junio C Hamano
2018-09-04 17:18  6%           ` Jonathan Nieder
2018-09-26 21:28     [PATCH] fetch: replace string-list used as a look-up table with a hashmap Junio C Hamano
2018-09-27  5:34     ` Jeff King
2018-09-30  2:11       ` Junio C Hamano
2018-10-19  3:48         ` [PATCH v3] " Junio C Hamano
2018-10-22  9:57           ` Johannes Schindelin
2018-10-27  6:47             ` Re*: " Junio C Hamano
2018-10-31 14:50  6%           ` Johannes Schindelin
2018-09-30 10:23     [Outreachy] Introduce myself Ananya Krishna Maram
2018-09-30 16:57  6% ` Christian Couder
     [not found]     <CAEdN-uV6giCp6FbC=J3B--6_kEzFkG8Yq4VXgGnhoNjERsXSQw@mail.gmail.com>
2018-10-01 15:56  5% ` [Outreachy] Introducing Christian Couder
2018-10-03 13:23  5% We should add a "git gc --auto" after "git clone" due to commit graph Ævar Arnfjörð Bjarmason
2018-10-03 13:36  0% ` SZEDER Gábor
2018-10-03 14:01  0%   ` Ævar Arnfjörð Bjarmason
2018-10-03 14:17  0%     ` SZEDER Gábor
2018-10-04 11:30     [PATCH] [Outreachy] git/userdiff.c fix regex pattern error Ananya Krishna Maram
2018-10-04 14:26     ` Johannes Schindelin
2018-10-04  9:35       ` Ananya Krishna Maram
2018-10-04 15:26         ` Johannes Schindelin
2018-10-04 10:39           ` Ananya Krishna Maram
2018-10-04 19:42  6%         ` Johannes Schindelin
2018-10-08 17:05     [PATCH][Outreachy] remove all the inclusions of git-compat-util.h in header files Ananya Krishna Maram
2018-10-08 17:13     ` Derrick Stolee
2018-10-09 10:13       ` Junio C Hamano
2018-10-10  8:06  4%     ` Johannes Schindelin
2018-10-26 19:27     [PATCH v2 0/7] fixes for unqualified <dst> push Ævar Arnfjörð Bjarmason
2018-10-26 23:07     ` [PATCH v3 7/8] push: add DWYM support for "git push refs/remotes/...:<dst>" Ævar Arnfjörð Bjarmason
2018-10-29  7:06       ` Junio C Hamano
2018-10-29  8:05  5%     ` Ævar Arnfjörð Bjarmason
2018-10-27 11:22     [RFC PATCH] index-pack: improve performance on NFS Ævar Arnfjörð Bjarmason
2018-10-28 22:50  5% ` [PATCH 2/4] pack-objects tests: don't leave test .git corrupt at end Ævar Arnfjörð Bjarmason
2018-10-28 22:50     [PATCH 0/4] index-pack: optionally turn off SHA-1 collision checking Ævar Arnfjörð Bjarmason
2018-10-30 18:43  5% ` [PATCH v2 2/3] pack-objects tests: don't leave test .git corrupt at end Ævar Arnfjörð Bjarmason
2018-11-12 14:46     [PATCH 0/9] caching loose objects Jeff King
2018-11-12 14:55     ` [PATCH 9/9] fetch-pack: drop custom loose object cache Jeff King
2018-11-12 19:25  6%   ` René Scharfe
2018-11-12 19:32  0%     ` Ævar Arnfjörð Bjarmason
2018-11-12 20:07  0%       ` Jeff King
2018-11-12 20:13  0%       ` René Scharfe
2018-11-13 15:22     [PATCH 1/3] read-cache: use shared perms when writing shared index Ævar Arnfjörð Bjarmason
2018-11-13 15:32  4% ` [RFC/PATCH] read-cache: write all indexes with the same permissions Ævar Arnfjörð Bjarmason
2018-11-16 17:31  4% [PATCH v2] " Christian Couder
2018-11-17  9:29  0% ` Junio C Hamano
2018-11-17 11:19  0%   ` Christian Couder
2018-11-17 13:05  6%     ` Junio C Hamano
2018-11-17 21:14  0%       ` Ævar Arnfjörð Bjarmason
2018-11-29 14:59  4% How de-duplicate similar repositories with alternates Ævar Arnfjörð Bjarmason
2018-11-29 16:09  0% ` Ævar Arnfjörð Bjarmason
2018-12-04  6:59  0% ` Jeff King
2018-12-04 10:43  0%   ` Ævar Arnfjörð Bjarmason
2018-12-04 13:35  0% ` Ævar Arnfjörð Bjarmason
2018-12-21 13:17     [PATCH 0/4] Let the builtin rebase call the git am command directly Johannes Schindelin via GitGitGadget
2018-12-21 13:17     ` [PATCH 4/4] built-in rebase: call `git am` directly Johannes Schindelin via GitGitGadget
2019-01-04 18:38       ` Junio C Hamano
2019-01-18 14:15  5%     ` Johannes Schindelin
2019-01-07 21:30     [PATCH] blame: add the ability to ignore commits Barret Rhoden
2019-01-17 20:29     ` [PATCH v2 0/3] " Barret Rhoden
2019-01-17 20:29       ` [PATCH v2 1/3] Move init_skiplist() outside of fsck Barret Rhoden
2019-01-18  9:45         ` Ævar Arnfjörð Bjarmason
2019-01-18 17:36           ` Junio C Hamano
2019-01-18 20:59             ` Johannes Schindelin
2019-01-18 21:30               ` Jeff King
2019-01-18 22:26                 ` Ævar Arnfjörð Bjarmason
2019-01-22  7:12                   ` Jeff King
2019-01-22  9:46  6%                 ` Ævar Arnfjörð Bjarmason
2019-01-22 18:28  0%                   ` Jeff King
2019-01-14 17:53     Students projects: looking for small and medium project ideas Matthieu Moy
2019-01-14 23:04  6% ` Ævar Arnfjörð Bjarmason
2019-02-23 13:28     ` Fabio Aiuto
2019-02-26 17:51       ` Matthieu Moy
2019-02-26 20:14  8%     ` Johannes Schindelin
2019-02-19  7:03     Antw: Antw: Ulrich Windl
2019-02-19 17:07     ` [PATCH v2] merge-options.txt: correct wording of --no-commit option Elijah Newren
2019-02-19 19:32       ` Junio C Hamano
2019-02-19 22:31  6%     ` Elijah Newren
2019-02-19 22:38  0%       ` Junio C Hamano
2019-02-23 15:11     does "git clean" deliberately ignore "core.excludesFile"? Robert P. J. Day
2019-02-23 15:28     ` Junio C Hamano
2019-02-23 18:06       ` Johannes Schindelin
2019-02-23 18:19         ` Johannes Schindelin
2019-02-23 18:32           ` Robert P. J. Day
2019-02-24  5:30             ` Junio C Hamano
2019-02-24 14:15  6%           ` Johannes Schindelin
2019-02-27 12:47     [BUG] All files in folder are moved when cherry-picking commit that moves fewer files Linus Nilsson
2019-02-27 14:30     ` Phillip Wood
2019-02-27 16:02       ` Elijah Newren
2019-02-27 16:40         ` Jeff King
2019-02-27 17:31  6%       ` Elijah Newren
2019-02-28  8:16  6%         ` Linus Nilsson
2019-03-13 10:16     [PATCH 0/4] get_oid: cope with a possibly stale loose object cache Johannes Schindelin via GitGitGadget
2019-03-13 10:16     ` [PATCH 1/4] rebase -i: demonstrate obscure loose object cache bug Johannes Schindelin via GitGitGadget
2019-03-13 16:11       ` Ævar Arnfjörð Bjarmason
2019-03-13 16:35         ` Jeff King
2019-03-13 22:27  6%       ` Johannes Schindelin
2019-04-03 14:38     Feature request: Add --no-edit to git tag command Robert Dailey
2019-04-04  1:57  6% ` Jeff King
2019-04-04  3:26  0%   ` Taylor Blau
2019-04-04 12:06  0%     ` Jeff King
2019-04-04 13:56  0%       ` Robert Dailey
2019-04-04 13:57  0%         ` Robert Dailey
2019-04-06 13:27     [PATCH 0/2] Minor document fixes Philip Oakley
2019-04-06 13:27     ` [PATCH 2/2] describe doc: remove '7-char' abbreviation reference Philip Oakley
2019-04-07 20:05  5%   ` Ævar Arnfjörð Bjarmason
2019-04-07 21:04  0%     ` Junio C Hamano
2019-04-07 22:23  0%     ` Philip Oakley
2019-04-19 21:47     Resolving deltas dominates clone time Martin Fick
2019-04-22 20:21     ` Martin Fick
2019-04-22 20:56       ` Jeff King
2019-04-22 22:32         ` Martin Fick
2019-04-23  1:55           ` Jeff King
2019-04-23  4:21             ` Jeff King
2019-04-23 10:08               ` Duy Nguyen
2019-04-30 17:50                 ` Jeff King
2019-04-30 18:48  6%               ` Ævar Arnfjörð Bjarmason
2019-04-30 20:33  0%                 ` Jeff King
2019-04-21 13:19     [PATCH/RFC] Makefile: dedup list of files obtained from ls-files Junio C Hamano
2019-04-22 14:49     ` Jeff King
2019-04-22 17:15       ` Ramsay Jones
2019-04-23  1:18  6%     ` Junio C Hamano
2019-05-15  8:36     [PATCH 1/2] t5616: refactor packfile replacement Johannes Schindelin
2019-05-15 18:22  6% ` Jonathan Tan
2019-05-19  5:07     Git ransom campaign incident report - May 2019 Jeff King
2019-05-19  5:10  4% ` [PATCH 1/3] transport_anonymize_url(): support retaining username Jeff King
2019-05-20 16:36  0%   ` Johannes Schindelin
2019-05-20 21:53  6% [PATCH 0/3] hash-object doc: small fixes Ævar Arnfjörð Bjarmason
2019-05-26 20:49     [GIT PULL] KVM changes for Linux 5.2-rc2 Linus Torvalds
2019-05-26 22:54     ` [RFC/PATCH] refs: tone down the dwimmery in refname_match() for {heads,tags,remotes}/* Ævar Arnfjörð Bjarmason
2019-05-27 12:33       ` Paolo Bonzini
2019-05-27 14:29  5%     ` Ævar Arnfjörð Bjarmason
2019-05-27 15:39  0%       ` Junio C Hamano
2019-06-25 21:35     [2.22.0] difftool no longer passes through to git diff if diff.tool is unset Jeff King
2019-06-25 23:09     ` Pugh, Logan
2019-06-26 18:08  6%   ` Jeff King
2019-06-27 23:39     [PATCH v2 0/9] grep: move from kwset to optional PCRE v2 Ævar Arnfjörð Bjarmason
2019-07-01 21:20     ` [PATCH v3 00/10] " Ævar Arnfjörð Bjarmason
2019-07-01 21:31       ` Junio C Hamano
2019-07-02 11:10  5%     ` Ævar Arnfjörð Bjarmason
2019-07-09 22:09     Problem with git diff McRoberts, John
2019-07-09 23:13  5% ` Elijah Newren
2019-07-09 23:26  5%   ` -EXT-Re: " McRoberts, John
2019-07-09 23:29  0%   ` Bryan Turner
2019-07-09 23:35  0%     ` Elijah Newren
2019-07-18 13:19     [PATCH 00/24] Reinstate support for Visual Studio Johannes Schindelin via GitGitGadget
2019-07-29 20:08     ` [PATCH v2 00/23] " Johannes Schindelin via GitGitGadget
2019-07-29 20:08       ` [PATCH v2 20/23] .gitignore: touch up the entries regarding " Philip Oakley via GitGitGadget
2019-08-25 12:07         ` SZEDER Gábor
2019-08-25 13:20           ` Philip Oakley
2019-08-25 19:09             ` SZEDER Gábor
2019-08-25 22:21               ` Philip Oakley
2019-08-26  9:10                 ` SZEDER Gábor
2019-08-28 11:34  6%               ` Johannes Schindelin
2019-08-14  2:49     [PATCH v2] use delete_refs when deleting tags or branches Phil Hord
     [not found]     ` <CABURp0rtkmo7MSSCVrdNXT0UzV9XqV_kXOGkC23C+_vMENNJUg@mail.gmail.com>
2021-01-15 18:43  4%   ` Elijah Newren
2021-01-16  2:27  0%     ` Phil Hord
2019-08-19 23:52     [PATCH v2 0/4] format-patch: learn --infer-cover-subject option Denton Liu
2019-08-20  7:18     ` [PATCH v3 00/13] format-patch: learn --infer-cover-subject option (also t4014 cleanup) Denton Liu
2019-08-20  7:19       ` [PATCH v3 13/13] format-patch: learn --infer-cover-subject option Denton Liu
2019-08-21 19:32         ` Junio C Hamano
2019-08-23 18:15           ` Denton Liu
2019-08-23 18:46  6%         ` Philip Oakley
2019-08-25 18:59     [PATCH] t7300-clean: demonstrate deleting nested repo with an ignored file breakage SZEDER Gábor
2019-09-05 15:47     ` [RFC PATCH v2 00/12] Fix some git clean issues Elijah Newren
2019-09-05 15:47       ` [RFC PATCH v2 12/12] clean: fix theoretical path corruption Elijah Newren
2019-09-05 19:27         ` SZEDER Gábor
2019-09-07  0:34  6%       ` Elijah Newren
2019-08-27  5:17     Git in Outreachy December 2019? Jeff King
2019-09-04 19:41  6% ` Jeff King
2019-09-08 14:56  0%   ` Pratyush Yadav
2019-09-23 18:07  0%   ` SZEDER Gábor
2019-09-26 11:42  6%     ` Johannes Schindelin
2019-09-03 21:11     [PATCH] t: use LF variable defined in the test harness Junio C Hamano
2019-09-04  0:29     ` Taylor Blau
2019-09-05 18:17       ` Junio C Hamano
2019-09-05 18:47  6%     ` Jeff King
2019-09-05 19:34  6%       ` Junio C Hamano
2019-09-05 22:10  1%         ` [PATCH] t: use common $SQ variable Denton Liu
2019-09-05 22:25  6%           ` Taylor Blau
2019-09-05 22:27  6%             ` Taylor Blau
2019-09-06  2:04  0%           ` Denton Liu
2019-09-04  2:22     [RFC PATCH 0/1] commit-graph.c: handle corrupt commit trees Taylor Blau
2019-09-04 18:25     ` Garima Singh
2019-09-04 21:21  6%   ` Taylor Blau
2019-09-05  6:08  0%     ` Jeff King
2019-09-06 16:48  0%     ` Derrick Stolee
2019-11-04  9:54     [RFC PATCH v2 0/2] rebase -i: extend rebase.missingCommitsCheck to `--edit-todo' Alban Gruin
2019-12-02 23:47     ` [PATCH v3 0/2] rebase -i: extend rebase.missingCommitsCheck Alban Gruin
2019-12-02 23:47       ` [PATCH v3 1/2] sequencer: move check_todo_list_from_file() to rebase-interactive.c Alban Gruin
2019-12-06 14:38  6%     ` Johannes Schindelin
2019-11-21  0:45     [PATCH v2 00/21] t: test cleanup stemming from experimentally enabling pipefail Denton Liu
2019-11-22 18:59     ` [PATCH v3 00/22] " Denton Liu
2019-11-22 19:00       ` [PATCH v3 22/22] t7700: stop losing return codes of git commands Denton Liu
2019-11-23  1:49  6%     ` Junio C Hamano
2019-11-25 23:57  0%       ` Denton Liu
2019-11-26  0:58  0%         ` Eric Sunshine
2019-12-12  0:49  8% [PATCH 0/2] dl/format-patch-notes-config-fixup: clean up some leftoverbits Denton Liu
2019-12-19  0:01     [PATCH v1 0/1] gpg-interface: add minTrustLevel as a configuration option Hans Jerry Illikainen
2019-12-22  0:31     ` [PATCH v2 " Hans Jerry Illikainen
2019-12-22  0:31       ` [PATCH v2 1/1] " Hans Jerry Illikainen
2019-12-24 19:02         ` Junio C Hamano
2019-12-27 13:46           ` Hans Jerry Illikainen
2019-12-27 22:21  6%         ` Junio C Hamano
2020-01-17 13:45     [PATCH] commit: replace rebase/sequence booleans with single pick_state enum Ben Curtis via GitGitGadget
2020-01-17 20:01     ` Phillip Wood
2020-01-18 16:34       ` Ben Curtis
2020-01-20 17:09  5%     ` Phillip Wood
2020-02-14  0:52     [RFC][GSOC] Microproject Suggestion Robear Selwans
2020-02-14  1:41     ` Junio C Hamano
2020-02-14  7:29       ` Robear Selwans
2020-02-14  8:49  6%     ` Denton Liu
2020-03-04 11:33     [PATCH 0/7] New execute-commands hook for centralized workflow Jiang Xin
2020-03-04 20:39  4% ` Junio C Hamano
2020-03-05 16:51  0%   ` Jiang Xin
2020-03-09 20:55     [PATCH] rebase --merge: optionally skip upstreamed commits Jonathan Tan
2020-03-10  2:10  5% ` Taylor Blau
2020-03-10 15:51  0%   ` Jonathan Tan
2020-03-30  4:06     ` [PATCH v3] " Jonathan Tan
2020-03-30 12:13       ` Derrick Stolee
2020-03-30 16:49  5%     ` Junio C Hamano
2020-03-27 11:44     git rebase fast-forward fails with abbreviateCommands Jan Alexander Steffens (heftig)
2020-03-30 12:42     ` [PATCH v1 0/2] rebase --merge: fix fast forwarding when `rebase.abbreviateCommands' is set Alban Gruin
2020-03-30 12:42       ` [PATCH v1 1/2] sequencer: don't abbreviate a command if it doesn't have a short form Alban Gruin
2020-03-30 17:50  5%     ` Junio C Hamano
2020-04-01 18:11     [PATCH] commit-graph: fix buggy --expire-time option Derrick Stolee via GitGitGadget
2020-04-01 19:49     ` Junio C Hamano
2020-04-01 19:57       ` Jeff King
2020-04-01 20:33  6%     ` Junio C Hamano
2020-04-01 20:51  0%       ` Derrick Stolee
2020-04-07 14:11     [PATCH 0/6] fixup ra/rebase-i-more-options Phillip Wood
2020-07-16 17:32     ` [PATCH v7 0/5] cleanup ra/rebase-i-more-options Phillip Wood
2020-07-16 17:39  6%   ` Junio C Hamano
2020-04-21 13:12     [PATCH v3 0/4] gitfaq: add issues in the 'Common Issues' section Shourya Shukla
2020-04-21 13:12     ` [PATCH v3 3/4] gitfaq: shallow cloning a repository Shourya Shukla
2020-04-21 20:00       ` Junio C Hamano
2020-04-21 20:43         ` Randall S. Becker
2020-04-22  1:30           ` Derrick Stolee
2020-04-22  4:00  6%         ` Jonathan Nieder
2020-05-07  9:59     [PATCH v12 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
2020-05-11 19:46     ` [PATCH v13 00/13] " Han-Wen Nienhuys via GitGitGadget
2020-05-11 19:46       ` [PATCH v13 04/13] reftable: file format documentation Jonathan Nieder via GitGitGadget
2020-05-19 22:00  3%     ` Junio C Hamano
2020-05-19 18:26  4% [PATCH v2] submodule: port subcommand 'set-branch' from shell to C Shourya Shukla
2020-05-27 21:09     [PATCH] checkout -p: handle new files correctly Johannes Schindelin via GitGitGadget
2020-05-27 23:03     ` Jeff King
2020-05-27 19:51  6%   ` Johannes Schindelin
2020-05-27 19:58  0%     ` Johannes Schindelin
2020-05-28 20:40     [PATCH v2] fast-import: add new --date-format=raw-permissive format Elijah Newren via GitGitGadget
2020-05-30 20:25  3% ` [PATCH v3] " Elijah Newren via GitGitGadget
2020-06-01  0:28     Git multiple remotes push stop at first failed connection John Siu
2020-06-01 21:40     ` Jeff King
2020-06-02 16:26  6%   ` Junio C Hamano
2020-06-02 16:54  0%     ` John Siu
2020-06-11 16:16  5% [PATCH] diff-files: treat "i-t-a" files as "not-in-index" Srinidhi Kaushik
2020-06-23 15:04     [PATCH 0/3] Accommodate for pu having been renamed to seen Johannes Schindelin via GitGitGadget
2020-06-23 15:04     ` [PATCH 1/3] docs: adjust for the recent rename of `pu` to `seen` Johannes Schindelin via GitGitGadget
2020-06-23 15:31       ` Đoàn Trần Công Danh
2020-06-23 19:31         ` Junio C Hamano
2020-06-23 21:32  6%       ` Johannes Schindelin
2020-07-12  8:39     [PATCH v15] Support auto-merge for meld to follow the vim-diff behavior sunlin via GitGitGadget
2020-07-12  9:08     ` [PATCH v16] " sunlin via GitGitGadget
2020-07-12 18:04  4%   ` Junio C Hamano
2020-07-30  0:36     Avoiding 'master' nomenclature Junio C Hamano
2020-07-30 18:02     ` [PATCH v3 0/2] fmt-merge-msg: selectively suppress "into <branch>" Junio C Hamano
2020-07-31  0:42       ` Jeff King
2020-07-31  2:04  4%     ` Junio C Hamano
2020-07-31  2:22  0%       ` Jeff King
2020-07-31 20:03  0%         ` Taylor Blau
2020-08-01  7:15  0%           ` Michal Suchánek
2020-08-08 21:34     [PATCH] gitweb: Map names/emails with mailmap Emma Brooks
2020-08-09 23:04     ` [PATCH v2] gitweb: map " Emma Brooks
2020-08-10 10:02       ` Jeff King
2020-08-11  4:17         ` Emma Brooks
2020-08-11  4:55           ` Jeff King
2020-09-05  2:55             ` Emma Brooks
2020-09-05  3:26               ` Junio C Hamano
2020-09-07 22:10  6%             ` Emma Brooks
2020-08-12 18:33     [PATCH v2] rebase -i: Fix possibly wrong onto hash in todo Antti Keränen
2020-08-13 10:41  6% ` Alban Gruin
2020-08-13 14:38  0%   ` Phillip Wood
2020-08-25  2:01     [PATCH] builtin/repack.c: invalidate MIDX only when necessary Taylor Blau
2020-08-25  2:26     ` Jeff King
2020-08-25  2:37  5%   ` Taylor Blau
2020-08-25 13:14  0%     ` Derrick Stolee
2020-08-25 14:41  0%       ` Taylor Blau
2020-08-25 15:58       ` Junio C Hamano
2020-08-25 17:22         ` Jeff King
2020-08-25 18:05  5%       ` Junio C Hamano
2020-08-26 15:45     [PATCH] clone: add remote.cloneDefault config option Sean Barag via GitGitGadget
2020-08-26 18:46     ` Junio C Hamano
2020-08-26 19:04       ` Derrick Stolee
2020-08-26 19:59  6%     ` Junio C Hamano
2020-08-26 16:45     [PATCH v1 0/3] War on dashed-git Junio C Hamano
2020-08-26 19:46     ` [PATCH v2 0/2] avoid running "git-subcmd" in the dashed form Junio C Hamano
2020-08-26 19:46       ` [PATCH v2 2/2] cvsexportcommit: do not run git programs in " Junio C Hamano
2020-08-26 21:37         ` [PATCH v2 3/2] credential-cache: use child_process.args Junio C Hamano
2020-08-26 22:25           ` [PATCH] run_command: teach API users to use embedded 'args' more Junio C Hamano
2020-08-27  4:21             ` Jeff King
2020-08-27  4:30  6%           ` Junio C Hamano
2020-08-28  6:56     Git in Outreachy? Jeff King
2020-08-31 17:41  6% ` Junio C Hamano
2020-09-06 18:56     ` Kaartic Sivaraam
2020-09-20 16:31  5%   ` Kaartic Sivaraam
2020-09-21  4:22  0%     ` Christian Couder
2020-09-21  7:59  0%       ` Kaartic Sivaraam
2020-09-21 20:56  0%       ` Shourya Shukla
2020-09-04 18:51  2% [PATCH] push: make `--force-with-lease[=<ref>]` safer Srinidhi Kaushik
2020-09-07 15:23  0% ` Phillip Wood
2020-09-07 19:45  0% ` Johannes Schindelin
2020-09-06  8:59     [PATCH] pack-bitmap-write: use hashwrite_be32() in write_hash_cache() René Scharfe
2020-09-06 19:02  5% ` Taylor Blau
2020-09-07  2:23  0%   ` Jeff King
2020-09-07  2:30  0%     ` Taylor Blau
2020-09-23 17:09     [PATCH] bisect: don't use invalid oid as rev when starting Christian Couder
2020-09-24  6:03     ` [PATCH v2] " Christian Couder
2020-09-24 18:55       ` Junio C Hamano
2020-09-24 19:56         ` Junio C Hamano
2020-09-24 20:53  4%       ` Junio C Hamano
2020-09-26 10:13     [PATCH v6 0/3] push: add "--[no-]force-if-includes" Srinidhi Kaushik
2020-09-26 11:46     ` [PATCH v7 " Srinidhi Kaushik
2020-09-26 11:46       ` [PATCH v7 1/3] push: add reflog check for "--force-if-includes" Srinidhi Kaushik
2020-09-26 23:42         ` Junio C Hamano
2020-09-27 12:27  6%       ` Srinidhi Kaushik
2020-09-27 14:17     [PATCH v8 0/3] push: add "--[no-]force-if-includes" Srinidhi Kaushik
2020-10-01  8:21     ` [PATCH v9 " Srinidhi Kaushik
2020-10-01  8:21       ` [PATCH v9 1/3] push: add reflog check for "--force-if-includes" Srinidhi Kaushik
2020-10-02 13:52         ` Johannes Schindelin
2020-10-02 14:50           ` Johannes Schindelin
2020-10-02 16:22  7%         ` Junio C Hamano
2020-10-07 14:09     [PATCH v4 00/10] [GSoC] Implement Corrected Commit Date Abhishek Kumar via GitGitGadget
2020-12-28 11:15     ` [PATCH v5 00/11] " Abhishek Kumar via GitGitGadget
2020-12-30  4:35  4%   ` Derrick Stolee
2021-01-10 14:06  0%     ` Abhishek Kumar
2020-11-04 14:57     [PATCH 0/2] update-ref: Allow creation of multiple transactions Patrick Steinhardt
2020-11-04 14:57     ` [PATCH 1/2] " Patrick Steinhardt
2020-11-05 19:29       ` Jeff King
2020-11-05 21:34         ` Junio C Hamano
2020-11-06 17:52           ` Jeff King
2020-11-06 19:30  6%         ` Junio C Hamano
     [not found]     <aeb24944-17af-cf53-93f4-e727f9fe9988@theori.io>
     [not found]     ` <xmqq4km4lppy.fsf@gitster.c.googlers.com>
2020-11-06 17:02       ` [PATCH v2] diff: handle NULL filespecs in run_external_diff Jinoh Kang
2020-11-06 17:14         ` [PATCH v3] diff: make diff_free_filespec_data accept NULL Jinoh Kang
2020-11-10 14:06           ` [PATCH v4] " Jinoh Kang
2020-11-10 15:38             ` Johannes Schindelin
2020-11-11 12:30               ` Jinoh Kang
2020-11-11 16:28  6%             ` Johannes Schindelin
2020-11-16 12:22     git-log: documenting pathspec usage Adam Spiers
2020-11-16 12:37  6% ` Ævar Arnfjörð Bjarmason
2020-11-16 16:45     Specify resume point with git difftool? Ryan Zoeller
2020-11-16 19:26  5% ` Junio C Hamano
2020-11-18 14:42     [PATCH v3] help.c: configurable suggestions Drew DeVault
2020-11-18 17:16  5% ` Junio C Hamano
2020-11-19 15:52     [PATCH 0/7] config: add --literal-value option Derrick Stolee via GitGitGadget
2020-11-19 15:52     ` [PATCH 1/7] t1300: test "set all" mode with value_regex Derrick Stolee via GitGitGadget
2020-11-19 22:24       ` Junio C Hamano
2020-11-20 18:39         ` Jeff King
2020-11-21 22:27           ` brian m. carlson
2020-11-22  3:31  4%         ` Junio C Hamano
2020-11-23 19:04     [RFC PATCH] usage: add trace2 entry upon warning() Jonathan Tan
2020-11-24 20:05     ` [PATCH v3] " Jonathan Tan
2020-11-24 22:15  6%   ` Junio C Hamano
2020-11-26 22:22     [PATCH v2 00/10] make "mktag" use fsck_tag() Ævar Arnfjörð Bjarmason
2020-12-09 20:01  2% ` [PATCH v3 " Ævar Arnfjörð Bjarmason
2020-12-09 22:30  0%   ` Junio C Hamano
2020-12-23  1:35  2%   ` [PATCH v4 00/20] make "mktag" use fsck_tag() & more Ævar Arnfjörð Bjarmason
2020-12-04  1:26     Unexpected behavior with branch.*.{remote,pushremote,merge} Ben Denhartog
2020-12-04 10:13     ` Jeff King
2020-12-04 16:45       ` Ben Denhartog
2020-12-04 19:57         ` Junio C Hamano
2020-12-04 21:00  6%       ` Jeff King
2020-12-04 22:20  0%         ` Ben Denhartog
2020-12-19 17:07     Documentation errors for HTTP protocol v2 and packfile Ross Light
2020-12-21  7:54     ` [PATCH 0/2] pack-format.txt: document lengths at start of delta data Martin Ågren
2020-12-21  7:54       ` [PATCH 1/2] pack-format.txt: define "varint" format Martin Ågren
2020-12-21 21:40         ` Junio C Hamano
2020-12-29 22:41  6%       ` Martin Ågren
2020-12-23  4:53     [PATCH v5 0/1] mergetool: remove unconflicted lines Felipe Contreras
2020-12-23  4:53     ` [PATCH v5 1/1] mergetool: add automerge configuration Felipe Contreras
2020-12-23 13:34       ` Junio C Hamano
2020-12-23 14:23         ` Felipe Contreras
2020-12-23 20:21           ` Junio C Hamano
2020-12-24  0:14             ` Felipe Contreras
2020-12-24  0:32  6%           ` Junio C Hamano
2020-12-24  1:36  0%             ` Felipe Contreras
2021-01-08 18:19     [PATCH 0/8] pack-revindex: introduce on-disk '.rev' format Taylor Blau
2021-01-13 22:28     ` [PATCH v2 " Taylor Blau
2021-01-13 22:28       ` [PATCH v2 1/8] packfile: prepare for the existence of '*.rev' files Taylor Blau
2021-01-22 22:54         ` Jeff King
2021-01-25 17:44  5%       ` Taylor Blau
2021-01-17  4:02     [PATCH v4 0/3] builtin/ls-files.c:add git ls-file --dedup option 阿德烈 via GitGitGadget
2021-01-19  6:30     ` [PATCH v5 " 阿德烈 via GitGitGadget
2021-01-19  6:30       ` [PATCH v5 3/3] ls-files.c: add --deduplicate option ZheNing Hu via GitGitGadget
2021-01-20 21:26         ` Junio C Hamano
2021-01-21 11:00  6%       ` 胡哲宁
2021-02-02  9:31     [PATCH 0/9] stash show: learn --include-untracked and --only-untracked Denton Liu
2021-02-02  9:33     ` [PATCH 3/9] t3905: move all commands into test cases Denton Liu
2021-02-02 21:41  6%   ` Junio C Hamano
2021-02-25  1:21  5% [PATCH 1/2] remote: add camel-cased *.tagOpt key, like clone Ævar Arnfjörð Bjarmason
2021-02-25  3:02  0% ` Junio C Hamano
2021-03-05  0:55     [PATCH 00/11] Complete merge-ort implementation...almost Elijah Newren via GitGitGadget
2021-03-05  0:55     ` [PATCH 05/11] merge-ort: let renormalization change modify/delete into clean delete Elijah Newren via GitGitGadget
2021-03-08 12:55  6%   ` Ævar Arnfjörð Bjarmason
2021-03-05  0:55     ` [PATCH 08/11] merge-ort: implement CE_SKIP_WORKTREE handling with conflicted entries Elijah Newren via GitGitGadget
2021-03-08 13:06       ` Ævar Arnfjörð Bjarmason
2021-03-08 20:54  6%     ` Elijah Newren
2021-03-05  0:55     ` [PATCH 10/11] merge-ort: write $GIT_DIR/AUTO_MERGE whenever we hit a conflict Elijah Newren via GitGitGadget
2021-03-08 13:11       ` Ævar Arnfjörð Bjarmason
2021-03-08 21:51  6%     ` Elijah Newren
2021-03-13 18:35     Regarding GSoC Project 2021 Shubham Verma
2021-03-14  1:26  6% ` ZheNing Hu
2021-03-15  9:08     [PATCH v7] [GSOC] commit: add --trailer option ZheNing Hu via GitGitGadget
2021-03-15 13:07     ` [PATCH v8 0/2] " ZheNing Hu via GitGitGadget
2021-03-15 13:07       ` [PATCH v8 1/2] " ZheNing Hu via GitGitGadget
2021-03-16 12:52         ` Ævar Arnfjörð Bjarmason
2021-03-17  2:01  5%       ` ZheNing Hu
2021-03-17  8:08  0%         ` Ævar Arnfjörð Bjarmason
2021-03-17 13:54  0%           ` ZheNing Hu
2021-04-05 14:01     [PATCH] [GSOC] ref-filter: use single strbuf for all output ZheNing Hu via GitGitGadget
2021-04-07 15:26     ` [PATCH v2] " ZheNing Hu via GitGitGadget
2021-04-07 20:31  5%   ` Junio C Hamano
2021-04-08 12:05  0%     ` ZheNing Hu
2021-04-09 18:10     [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
2021-06-21 22:24     ` [PATCH v2 00/24] " Taylor Blau
2021-06-21 22:25       ` [PATCH v2 04/24] Documentation: build 'technical/bitmap-format' by default Taylor Blau
2021-07-21  9:58         ` Jeff King
2021-07-21 10:08           ` Jeff King
2021-07-21 17:23  6%         ` Taylor Blau
2021-07-23  7:39  0%           ` Jeff King
2021-04-14  6:13     Pain points in Git's patch flow Jonathan Nieder
2021-04-14  7:22     ` Bagas Sanjaya
2021-04-14  8:02  8%   ` Junio C Hamano
2021-04-14 19:14     [PATCH 0/2] prevent `repack` to unpack and delete promisor objects Rafael Silva
2021-04-18 13:57     ` [PATCH v2 0/1] " Rafael Silva
2021-04-18 13:57       ` [PATCH v2 1/1] repack: avoid loosening promisor objects in partial clones Rafael Silva
2021-04-19 23:09  6%     ` Junio C Hamano
2021-04-21 19:25  0%       ` Rafael Silva
2021-04-15 12:33     [PATCH] transport: respect verbosity when setting upstream Øystein Walle
2021-04-15 15:29     ` Eric Sunshine
2021-04-16 13:38       ` Øystein Walle
2021-04-16 18:48  6%     ` Junio C Hamano
2021-05-13  3:38     git-sh-prompt: bash: GIT_PS1_COMPRESSSPARSESTATE: unbound variable Christoph Anton Mitterer
2021-05-13  4:03     ` Junio C Hamano
2021-05-13  4:13       ` Junio C Hamano
2021-05-13  4:53         ` Elijah Newren
2021-05-13  5:01           ` Junio C Hamano
2021-05-19 17:56             ` Christoph Anton Mitterer
2021-05-19 23:29               ` Junio C Hamano
2021-05-20  0:09  6%             ` Elijah Newren
2021-06-20 15:11     [PATCH 00/12] Fix all leaks in tests t0002-t0099: Part 2 andrzej
2021-06-21 21:54  6% ` Elijah Newren
2021-07-14 11:17     [PATCH] refs file backend: remove dead "errno == EISDIR" code Ævar Arnfjörð Bjarmason
2021-07-14 16:21     ` Jeff King
2021-07-14 19:07  4%   ` Ævar Arnfjörð Bjarmason
2021-07-14 23:15  0%     ` Jeff King
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git
