#leftoverbits f:avarab@gmail.com

git@vger.kernel.org mailing list mirror (one of many)

* Re: [PATCH v9 6/6] diff-lib: parallelize run_diff_files for submodules
  @ 2023-03-07  8:41  7%     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2023-03-07  8:41 UTC
  To: Calvin Wan; +Cc: git, chooglen, newren, jonathantanmy, phillip.wood123

On Thu, Mar 02 2023, Calvin Wan wrote:

Some of this is stuff I probably should have noted in earlier rounds,
sorry, but then again the diff-churn in those made it harder to
review. Now that that's mostly out of the way (yay!)...

> +submodule.diffJobs::
> +	Specifies how many submodules are diffed at the same time. A
> +	positive integer allows up to that number of submodules diffed
> +	in parallel. A value of 0 will give some reasonable default.
> +	If unset, it defaults to 1. The diff operation is used by many

Nit: Maybe start a new paragraph as of "The diff..."?

> +	other git commands such as add, merge, diff, status, stash and
> +	more. Note that the expensive part of the diff operation is

Nit: Maybe change 'add', 'merge' etc. to linkgit:git-add[1], or quote
them?

> +	reading the index from cache or memory. Therefore multiple jobs

With how much we conflate "the cache" and "index" saying "the index from
cache" might be especially confusing. I think we can just skip " from
cache or memory" here.

>  static int match_stat_with_submodule(struct diff_options *diffopt,
>  				     const struct cache_entry *ce,
>  				     struct stat *st, unsigned ce_option,
> -				     unsigned *dirty_submodule)
> +				     unsigned *dirty_submodule, int *defer_submodule_status,

Nit: The other one is an "unsigned", so shouldn't "defer_submodule_status"
also be one? (More on this below.)

> +				     unsigned *ignore_untracked)
>  {
>  	int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option);
> +	int defer = 0;
> +
>  	if (S_ISGITLINK(ce->ce_mode)) {
>  		struct diff_flags orig_flags = diffopt->flags;
>  		if (!diffopt->flags.override_submodule_config)
>  			set_diffopt_flags_from_submodule_config(diffopt, ce->name);

The meaty functional change here looks *much* better, thanks! I.e. this
is pretty much what I suggested in
https://lore.kernel.org/git/230208.861qn01s4g.gmgdl@evledraar.gmail.com/

> -		if (diffopt->flags.ignore_submodules)
> +		if (diffopt->flags.ignore_submodules) {

Not worth a re-roll in itself, but FWIW I think this change would be
marginally easier to follow with *a* preceding refactoring change, but
per the above &
https://lore.kernel.org/git/230209.867cwrzk1l.gmgdl@evledraar.gmail.com/
I just didn't think v7's 6/7
(https://lore.kernel.org/git/20230207181706.363453-7-calvinwan@google.com/)
was what we needed there.

I.e. in this case a leading change that would add these braces would
make this a bit easier to read...

>  			changed = 0;
> -		else if (!diffopt->flags.ignore_dirty_submodules &&

...ditto this line, which would stay the same.

> -			 (!changed || diffopt->flags.dirty_submodules))
> -			*dirty_submodule = is_submodule_modified(ce->name,
> -								 diffopt->flags.ignore_untracked_in_submodules);

Here you are incorrectly changing the indentation of this away from our
usual coding style, which...

> +		} else if (!diffopt->flags.ignore_dirty_submodules &&
> +			   (!changed || diffopt->flags.dirty_submodules)) {
> +			if (defer_submodule_status && *defer_submodule_status) {

Hrm, if I remove that "&& *defer_submodule_status", all of our tests
pass. The only two callers of this function are one where this is NULL,
and one where it's non-NULL but pre-initialized to 1, and the caller
will then check whether it's been flipped to 0.

> +				defer = 1;
> +				*ignore_untracked = diffopt->flags.ignore_untracked_in_submodules;
> +			} else {
> +				*dirty_submodule = is_submodule_modified(ce->name,
> +					 diffopt->flags.ignore_untracked_in_submodules);

...needlessly inflates the diff here, at least under -w and move
detection, as we correctly detect the "*dirty_submodule" line as the
same, but the "diffopt->flags" line also has a re-indentation change
unrelated to adding the "else" scope.

> +			}
> +		}
>  		diffopt->flags = orig_flags;
>  	}
> +
> +	if (defer_submodule_status)
> +		*defer_submodule_status = defer;

Having read this whole thing to the end again I think this on top would
be much simpler (if I'm right about it being functionally equivalent),
and would address some of the above:

	diff --git a/diff-lib.c b/diff-lib.c
	index 7fe6ced9501..d5c823f512a 100644
	--- a/diff-lib.c
	+++ b/diff-lib.c
	@@ -78,7 +78,6 @@ static int match_stat_with_submodule(struct diff_options *diffopt,
	 				     unsigned *ignore_untracked)
	 {
	 	int changed = ie_match_stat(diffopt->repo->index, ce, st, ce_option);
	-	int defer = 0;

	 	if (S_ISGITLINK(ce->ce_mode)) {
	 		struct diff_flags orig_flags = diffopt->flags;
	@@ -88,8 +87,8 @@ static int match_stat_with_submodule(struct diff_options *diffopt,
	 			changed = 0;
	 		} else if (!diffopt->flags.ignore_dirty_submodules &&
	 			   (!changed || diffopt->flags.dirty_submodules)) {
	-			if (defer_submodule_status && *defer_submodule_status) {
	-				defer = 1;
	+			if (defer_submodule_status) {
	+				*defer_submodule_status = 1;
	 				*ignore_untracked = diffopt->flags.ignore_untracked_in_submodules;
	 			} else {
	 				*dirty_submodule = is_submodule_modified(ce->name,
	@@ -99,8 +98,6 @@ static int match_stat_with_submodule(struct diff_options *diffopt,
	 		diffopt->flags = orig_flags;
	 	}

	-	if (defer_submodule_status)
	-		*defer_submodule_status = defer;
	 	return changed;
	 }

	@@ -153,7 +150,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
	 		unsigned int newmode;
	 		struct cache_entry *ce = istate->cache[i];
	 		int changed;
	-		int defer_submodule_status = 1;
	+		int defer_submodule_status = 0;

	 		if (diff_can_quit_early(&revs->diffopt))
	 			break;

We could also just leave it, but I for one found it a bit hard to follow
that this interface seems to be a tri-state (NULL, set to 0, set to 1),
but really it's dual-state, i.e. NULL or a "tell me to defer this" bit.

>  	return changed;
>  }
>  
> @@ -124,6 +140,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>  			      ? CE_MATCH_RACY_IS_DIRTY : 0);
>  	uint64_t start = getnanotime();
>  	struct index_state *istate = revs->diffopt.repo->index;
> +	struct string_list submodules = STRING_LIST_INIT_NODUP;
>  
>  	diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/");
>  
> @@ -136,7 +153,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
>  		unsigned int newmode;
>  		struct cache_entry *ce = istate->cache[i];
>  		int changed;
> -		unsigned dirty_submodule = 0;
> +		int defer_submodule_status = 1;

Hrm, having suggested the diff above I just noticed this now: I ended up
inverting this, but found the "defer_submodule_status" name a bit odd.
Can't we just keep "unsigned dirty_submodule"? (That would also address
the change from "unsigned" to "int" noted above, which is seemingly
unnecessary.)

But maybe I'm missing a subtlety here, and we should have "deferred
status" as opposed to "dirty submodule". In any case, the new one
looks like it doesn't need negative values.

> +	}
> +	if (submodules.nr) {
> +		unsigned long parallel_jobs;
> +		struct string_list_item *item;
> +
> +		if (git_config_get_ulong("submodule.diffjobs", &parallel_jobs))
> +			parallel_jobs = 1;
> +		else if (!parallel_jobs)
> +			parallel_jobs = online_cpus();

Given that online_cpus() returns an int, the "unsigned long" is slightly
odd here, but it's because git_config_get_ulong() exists and we have no
git_config_get_uint(), so this is OK (but could be cleaned up as some
#leftoverbits).

> +		if (get_submodules_status(&submodules, parallel_jobs))
> +			die(_("submodule status failed"));

Here we're adding get_submodules_status(), and returning the actual
error code from "status", but then ignoring it here, and returning 128
for any non-zero.

I think this would be better as either:

	code = get_submodules_status(...);
	die_message(...)
	exit(code);

Or to just have the function itself return !!status, i.e. an "ok" or "not
ok".

Admittedly a nit, but I have spent quite a bit of time chasing down
various exit-code losses in the submodule code, and it would be nice if
we just carry the code up, or more explicitly ignore it, but don't add
code that seems to care about it, but really doesn't.

I also changed this "die" to a "BUG" and our tests passed, so we have no
tests for when "status" failed, will such a thing even happen in
practice?

> +		for_each_string_list_item(item, &submodules) {
> +			struct submodule_status_util *util = item->util;
> +
> +			record_file_diff(&revs->diffopt, util->newmode,
> +					 util->dirty_submodule, util->changed,
> +					 istate, util->ce);
> +		}
>  	}
> +	string_list_clear(&submodules, 1);
>  	diffcore_std(&revs->diffopt);
>  	diff_flush(&revs->diffopt);
>  	trace_performance_since(start, "diff-files");
> @@ -322,7 +379,7 @@ static int get_stat_data(const struct index_state *istate,
>  			return -1;
>  		}
>  		changed = match_stat_with_submodule(diffopt, ce, &st,
> -						    0, dirty_submodule);
> +						    0, dirty_submodule, NULL, NULL);
>  		if (changed) {
>  			mode = ce_mode_from_stat(ce, st.st_mode);
>  			oid = null_oid();
> diff --git a/submodule.c b/submodule.c
> index 426074cebb..6f6e150a3f 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1373,6 +1373,13 @@ int submodule_touches_in_range(struct repository *r,
>  	return ret;
>  }
>  
> +struct submodule_parallel_status {
> +	size_t index_count;
> +	int result;
> +
> +	struct string_list *submodule_names;
> +};

Hrm, actually reading a bit more I think part of my comments above are
incorrect, i.e. this "result" seems like an exit code, but really in the
guts of the API we're ignoring the actual code we get, and just setting
this to 1.

Per the above I think it might be OK to ignore the exit code (or not),
but I really wish we did this more explicitly, e.g. if you want to
ignore it call this something like "failed", not "result", and make it
an "unsigned int failed:1" to firmly indicate that it's a boolean at the
API level.

> +struct status_task {
> +	const char *path;

I think we should call this "ce_path", but more on that below.

> +	struct strbuf out;
> +	int ignore_untracked;

Continued type mismatch commentary: Elsewhere in this diff this is
"unsigned", and this compiles for me if I make it "unsigned int
ignore_untracked:1", so let's set it to such a flag instead?

> +static int status_finish(int retvalue, struct strbuf *err,
> +			 void *cb, void *task_cb)
> +{
> +	struct submodule_parallel_status *sps = cb;
> +	struct status_task *task = task_cb;
> +	struct string_list_item *it =
> +		string_list_lookup(sps->submodule_names, task->path);
> +	struct submodule_status_util *util = it->util;
> +	struct string_list list = STRING_LIST_INIT_DUP;
> +	struct string_list_item *item;
> +
> +	if (retvalue) {
> +		sps->result = 1;
> +		strbuf_addf(err, _(STATUS_PORCELAIN_FAIL_ERROR), task->path);
> +	}
> +
> +	string_list_split(&list, task->out.buf, '\n', -1);

I think I noted in some earlier round that taking a string and splitting
it by \n was a bit wasteful in the test code, but this uses the same
pattern.

Maybe it's not a performance concern here either, but won't we
potentially have to parse some very large statuses here?

Aside from that, I haven't tried or reviewed this bit in detail, but
this seems to be making things harder than they need to be. Why are we
buffering up all of the output into "out" here, only to split it by "\n"
later on, and then consider each line as a status line?

Shouldn't we be allocating this string_list to begin with, and append to
it in the "status_on_stderr_output" callback instead?

> +	for_each_string_list_item(item, &list) {
> +		if (parse_status_porcelain(item->string,
> +					   strlen(item->string),
> +					   &util->dirty_submodule,
> +					   util->ignore_untracked))

OK, this seemingly buggy bit of error handling seems to actually be OK
on further review, because we'll BUG() out in the function if it fails,
so the non-zero return here just means "we're done here".

> +			break;
> +	}

Style: drop the braces here, as this is just a for/if/body with a single
body line.

> +int get_submodules_status(struct string_list *submodules,
> +			  int max_parallel_jobs)
> +{
> +	struct submodule_parallel_status sps = {
> +		.submodule_names = submodules,
> +	};
> +	const struct run_process_parallel_opts opts = {
> +		.tr2_category = "submodule",
> +		.tr2_label = "parallel/status",
> +
> +		.processes = max_parallel_jobs,
> +
> +		.get_next_task = get_next_submodule_status,
> +		.start_failure = status_start_failure,
> +		.on_stderr_output = status_on_stderr_output,
> +		.task_finished = status_finish,
> +		.data = &sps,
> +	};
> +
> +	string_list_sort(sps.submodule_names);
> +	run_processes_parallel(&opts);
> +
> +	return sps.result;

All OK, except as noted above the "result" here is just "did we fail?".

> +}
> +
>  int submodule_uses_gitfile(const char *path)
>  {
>  	struct child_process cp = CHILD_PROCESS_INIT;
> diff --git a/submodule.h b/submodule.h
> index b52a4ff1e7..08d278a414 100644
> --- a/submodule.h
> +++ b/submodule.h
> @@ -41,6 +41,13 @@ struct submodule_update_strategy {
>  	.type = SM_UPDATE_UNSPECIFIED, \
>  }
>  
> +struct submodule_status_util {
> +	int changed, ignore_untracked;
> +	unsigned dirty_submodule, newmode;
> +	struct cache_entry *ce;
> +	const char *path;

Re "ce_path" above: What's the point of adding a "path" here if we
already have "ce"? You just seem to assign "path" to "ce->name"
always. I tried this fix-up on top & it worked:

	diff --git a/diff-lib.c b/diff-lib.c
	index d5c823f512a..39d8179f0ed 100644
	--- a/diff-lib.c
	+++ b/diff-lib.c
	@@ -294,7 +294,6 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
	 					.ignore_untracked = ignore_untracked,
	 					.newmode = newmode,
	 					.ce = ce,
	-					.path = ce->name,
	 				};
	 				struct string_list_item *item;

	diff --git a/submodule.c b/submodule.c
	index 3eba00f1533..c220d85815a 100644
	--- a/submodule.c
	+++ b/submodule.c
	@@ -2002,11 +2002,11 @@ get_status_task_from_index(struct submodule_parallel_status *sps,
	 		struct submodule_status_util *util = sps->submodule_names->items[sps->index_count].util;
	 		struct status_task *task;

	-		if (!verify_submodule_git_directory(util->path))
	+		if (!verify_submodule_git_directory(util->ce->name))
	 			continue;

	 		task = xmalloc(sizeof(*task));
	-		task->path = util->path;
	+		task->path = util->ce->name;
	 		task->ignore_untracked = util->ignore_untracked;
	 		strbuf_init(&task->out, 0);
	 		sps->index_count++;
	diff --git a/submodule.h b/submodule.h
	index 3b6abca05cd..3427c495573 100644
	--- a/submodule.h
	+++ b/submodule.h
	@@ -45,7 +45,6 @@ struct submodule_status_util {
	 	int changed, ignore_untracked;
	 	unsigned dirty_submodule, newmode;
	 	struct cache_entry *ce;
	-	const char *path;
	 };

	 int is_gitmodules_unmerged(struct index_state *istate);

I'd be all for actually narrowing the scope of data we get in general,
i.e. do we need all of the "ce" members? I didn't check, but doing this
just seems like needless duplication.

> @@ -94,6 +101,8 @@ int fetch_submodules(struct repository *r,
>  		     int command_line_option,
>  		     int default_option,
>  		     int quiet, int max_parallel_jobs);
> +int get_submodules_status(struct string_list *submodules,
> +			  int max_parallel_jobs);

It would be nice to get some API docs for the new function, re its
"result" behavior etc. noted above.

>  unsigned is_submodule_modified(const char *path, int ignore_untracked);
>  int submodule_uses_gitfile(const char *path);
>  
> diff --git a/t/t4027-diff-submodule.sh b/t/t4027-diff-submodule.sh
> index 40164ae07d..1c747cc325 100755
> --- a/t/t4027-diff-submodule.sh
> +++ b/t/t4027-diff-submodule.sh
> @@ -34,6 +34,25 @@ test_expect_success setup '
>  	subtip=$3 subprev=$2
>  '
>  
> +test_expect_success 'diff in superproject with submodules respects parallel settings' '
> +	test_when_finished "rm -f trace.out" &&
> +	(
> +		GIT_TRACE=$(pwd)/trace.out git diff &&
> +		grep "1 tasks" trace.out &&
> +		>trace.out &&
> +
> +		git config submodule.diffJobs 8 &&
> +		GIT_TRACE=$(pwd)/trace.out git diff &&
> +		grep "8 tasks" trace.out &&
> +		>trace.out &&
> +
> +		GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 diff &&
> +		grep "preparing to run up to [0-9]* tasks" trace.out &&
> +		! grep "up to 0 tasks" trace.out &&
> +		>trace.out
> +	)
> +'
> +
>  test_expect_success 'git diff --raw HEAD' '
>  	hexsz=$(test_oid hexsz) &&
>  	git diff --raw --abbrev=$hexsz HEAD >actual &&
> @@ -70,6 +89,18 @@ test_expect_success 'git diff HEAD with dirty submodule (work tree)' '
>  	test_cmp expect.body actual.body
>  '
>  
> +test_expect_success 'git diff HEAD with dirty submodule (work tree, parallel)' '
> +	(
> +		cd sub &&
> +		git reset --hard &&
> +		echo >>world
> +	) &&
> +	git -c submodule.diffJobs=8 diff HEAD >actual &&
> +	sed -e "1,/^@@/d" actual >actual.body &&
> +	expect_from_to >expect.body $subtip $subprev-dirty &&
> +	test_cmp expect.body actual.body
> +'
> +
>  test_expect_success 'git diff HEAD with dirty submodule (index)' '
>  	(
>  		cd sub &&
> diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh
> index d050091345..7da64e4c4c 100755
> --- a/t/t7506-status-submodule.sh
> +++ b/t/t7506-status-submodule.sh
> @@ -412,4 +412,29 @@ test_expect_success 'status with added file in nested submodule (short)' '
>  	EOF
>  '
>  
> +test_expect_success 'status in superproject with submodules respects parallel settings' '
> +	test_when_finished "rm -f trace.out" &&
> +	(
> +		GIT_TRACE=$(pwd)/trace.out git status &&
> +		grep "1 tasks" trace.out &&
> +		>trace.out &&
> +
> +		git config submodule.diffJobs 8 &&
> +		GIT_TRACE=$(pwd)/trace.out git status &&
> +		grep "8 tasks" trace.out &&
> +		>trace.out &&
> +
> +		GIT_TRACE=$(pwd)/trace.out git -c submodule.diffJobs=0 status &&
> +		grep "preparing to run up to [0-9]* tasks" trace.out &&
> +		! grep "up to 0 tasks" trace.out &&
> +		>trace.out
> +	)
> +'
> +
> +test_expect_success 'status in superproject with submodules (parallel)' '
> +	git -C super status --porcelain >output &&
> +	git -C super -c submodule.diffJobs=8 status --porcelain >output_parallel &&
> +	diff output output_parallel

Shouldn't this be a "test_cmp" instead of "diff", and use "actual" and
"expect" instead of "output" and "output_parallel"?

I'd also rename the test to something like "output with
submodule.diffJobs=N equals submodule.diffJobs=1".

Except is that even correct? Don't we need to set submodule.diffJobs=1
explicitly so it doesn't default to online_cpus() here? Maybe I missed
an earlier config setup...

* Re: [PATCH] cache-tree: fix strbuf growth in prime_cache_tree_rec()
  @ 2023-02-06 16:18 13%     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2023-02-06 16:18 UTC
  To: Derrick Stolee; +Cc: René Scharfe, Git List, Junio C Hamano, Victoria Dye


On Mon, Feb 06 2023, Derrick Stolee wrote:

> On 2/5/2023 4:12 PM, Ævar Arnfjörð Bjarmason wrote:
> [...]
>> One wonders if (even for this index-related code) we really need such
>> careful management of growth, and could instead do with:
>> 
>> 	strbuf_setlen(tree_path, base_path_len);
>> 	strbuf_add(tree_path, entry.path, entry.pathlen);
>> 	strbuf_addch(tree_path, '/');
>
> This would be my preferred way to go here.

*nod*

>> Or even just:
>> 
>> 	strbuf_addf(tree_path, "%.*s/", (int)entry.pathlen, entry.path);
>
> Please do not add "addf" functions that can be run in tight loops.
> It's faster to do strbuf_add() followed by strbuf_addch().

Good point.

I wondered just how much slower, and it's up to 3x! At least according
to this[1] artificial test case (where I usurped a random test helper).

I wondered if we could just handle some common strbuf_addf() cases
ourselves, and the benchmark shows (manually annotated, too lazy to set
up the -n option):

	git hyperfine -L rev HEAD~5,HEAD~4,HEAD~3,HEAD~2,HEAD~1,HEAD~0 -s 'make CFLAGS=-O3' './t/helper/test-tool online-cpus' -r 3
	[...]
	Summary
	  './t/helper/test-tool online-cpus' in 'HEAD~0' ran <== strbuf_add() + strbuf_addch()
	    1.06 ± 0.11 times faster than './t/helper/test-tool online-cpus' in 'HEAD~1' <== strbuf_addstr() + strbuf_addch()
	    1.18 ± 0.12 times faster than './t/helper/test-tool online-cpus' in 'HEAD~4' <== hand optimized strbuf_addf() for "%sC"
	    1.33 ± 0.18 times faster than './t/helper/test-tool online-cpus' in 'HEAD~2' <== hand optimized strbuf_addf() for "%*sC"
	    2.63 ± 0.05 times faster than './t/helper/test-tool online-cpus' in 'HEAD~5' <== strbuf_addf("%s/")
	    2.92 ± 0.25 times faster than './t/helper/test-tool online-cpus' in 'HEAD~3' <== strbuf_addf("%*s/")

The "hand optimization" just being a very stupid handling of "%sC" for
arbitrary values of a single char "C", and ditto for "%*sC" (which
curiously is slower here).

So, for truly hot loops we'd still want to use the add + addch, but if
anyone's interested (hashtag #leftoverbits) it looks like we could get
some easy wins (and reduction in code size, as we could stop worrying
about addf being slow in most cases) with some very dumb minimal
vaddf(), which could handle these cases (but not anything involving
padding etc.).

I didn't dig, but I wouldn't be surprised if the reason is that C
libraries need to carry a relatively fat & general sprintf() for all
those edge cases, locale handling etc., whereas most of our uses could
trivially be represented as some sequence of addstr()/addch() etc.

Another interesting approach (and this is very #leftoverbits) would be
to perform the same optimization with coccinelle.

I.e. our current use of it is purely "this code X should be written like
Y, and we should commit Y".

But there's no reason for why we couldn't effectively implement our own
compiler optimizations for our own APIs with it, so just grab "%s/" etc,
unpack that in OCaml, then emit strbuf_add() + strbuf_addch(), and that
would be what the C compiler would see.

1.
	
	9d23ffb1117 addf + nolen
	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
	index 8cb0d53840f..c802ec579d0 100644
	--- a/t/helper/test-online-cpus.c
	+++ b/t/helper/test-online-cpus.c
	@@ -1,9 +1,17 @@
	 #include "test-tool.h"
	 #include "git-compat-util.h"
	 #include "thread-utils.h"
	+#include "strbuf.h"
	 
	 int cmd__online_cpus(int argc, const char **argv)
	 {
	-	printf("%d\n", online_cpus());
	+	struct strbuf sb = STRBUF_INIT;
	+	const char *const str = "Hello, World";
	+
	+	for (size_t i = 0; i < 10000000; i++) {
	+		strbuf_reset(&sb);
	+		strbuf_addf(&sb, "%s/", str);
	+		puts(sb.buf);
	+	}
	 	return 0;
	 }
	9f74eff5623 addf + nolen optimize
	diff --git a/strbuf.c b/strbuf.c
	index c383f41a3c5..750e5e6a5b4 100644
	--- a/strbuf.c
	+++ b/strbuf.c
	@@ -332,8 +332,16 @@ void strbuf_addchars(struct strbuf *sb, int c, size_t n)
	 void strbuf_addf(struct strbuf *sb, const char *fmt, ...)
	 {
	 	va_list ap;
	+
	 	va_start(ap, fmt);
	-	strbuf_vaddf(sb, fmt, ap);
	+	if (*fmt == '%' && *(fmt + 1) == 's' && *(fmt + 2) && !*(fmt + 3)) {
	+		const char *arg = va_arg(ap, const char *);
	+
	+		strbuf_addstr(sb, arg);
	+		strbuf_addch(sb, *(fmt + 2));
	+	} else {
	+		strbuf_vaddf(sb, fmt, ap);
	+	}
	 	va_end(ap);
	 }
	 
	ca60bb9b479 addf + len
	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
	index c802ec579d0..7257e622015 100644
	--- a/t/helper/test-online-cpus.c
	+++ b/t/helper/test-online-cpus.c
	@@ -7,10 +7,11 @@ int cmd__online_cpus(int argc, const char **argv)
	 {
	 	struct strbuf sb = STRBUF_INIT;
	 	const char *const str = "Hello, World";
	+	const size_t len = strlen(str);
	 
	 	for (size_t i = 0; i < 10000000; i++) {
	 		strbuf_reset(&sb);
	-		strbuf_addf(&sb, "%s/", str);
	+		strbuf_addf(&sb, "%*s/", (int)len, str);
	 		puts(sb.buf);
	 	}
	 	return 0;
	1f47987d095 addf + len optimize
	diff --git a/strbuf.c b/strbuf.c
	index 750e5e6a5b4..88801268f7a 100644
	--- a/strbuf.c
	+++ b/strbuf.c
	@@ -334,11 +334,16 @@ void strbuf_addf(struct strbuf *sb, const char *fmt, ...)
	 	va_list ap;
	 
	 	va_start(ap, fmt);
	-	if (*fmt == '%' && *(fmt + 1) == 's' && *(fmt + 2) && !*(fmt + 3)) {
	+	if (*fmt == '%' &&
	+	    *(fmt + 1) == '*' &&
	+	    *(fmt + 2) == 's' &&
	+	    *(fmt + 3) &&
	+	    !*(fmt + 4)) {
	+		int len = va_arg(ap, int);
	 		const char *arg = va_arg(ap, const char *);
	 
	-		strbuf_addstr(sb, arg);
	-		strbuf_addch(sb, *(fmt + 2));
	+		strbuf_add(sb, arg, len);
	+		strbuf_addch(sb, *(fmt + 3));
	 	} else {
	 		strbuf_vaddf(sb, fmt, ap);
	 	}
	55c698c0b95 addstr
	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
	index 7257e622015..2716b44ca15 100644
	--- a/t/helper/test-online-cpus.c
	+++ b/t/helper/test-online-cpus.c
	@@ -7,11 +7,11 @@ int cmd__online_cpus(int argc, const char **argv)
	 {
	 	struct strbuf sb = STRBUF_INIT;
	 	const char *const str = "Hello, World";
	-	const size_t len = strlen(str);
	 
	 	for (size_t i = 0; i < 10000000; i++) {
	 		strbuf_reset(&sb);
	-		strbuf_addf(&sb, "%*s/", (int)len, str);
	+		strbuf_addstr(&sb, str);
	+		strbuf_addch(&sb, '/');
	 		puts(sb.buf);
	 	}
	 	return 0;
	b17fb99bf7e (HEAD -> master) add
	diff --git a/t/helper/test-online-cpus.c b/t/helper/test-online-cpus.c
	index 2716b44ca15..5e52b622c4d 100644
	--- a/t/helper/test-online-cpus.c
	+++ b/t/helper/test-online-cpus.c
	@@ -7,10 +7,11 @@ int cmd__online_cpus(int argc, const char **argv)
	 {
	 	struct strbuf sb = STRBUF_INIT;
	 	const char *const str = "Hello, World";
	+	const size_t len = strlen(str);
	 
	 	for (size_t i = 0; i < 10000000; i++) {
	 		strbuf_reset(&sb);
	-		strbuf_addstr(&sb, str);
	+		strbuf_add(&sb, str, len);
	 		strbuf_addch(&sb, '/');
	 		puts(sb.buf);
	 	}

* "test_atexit" v.s. "test_when_finished" (was: [PATCH 3/3] t1509: facilitate repeated script invocations)
  @ 2022-12-08 13:14 14%         ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2022-12-08 13:14 UTC
  To: Johannes Schindelin; +Cc: Eric Sunshine, Eric Sunshine via GitGitGadget, git

On Thu, Dec 08 2022, Johannes Schindelin wrote:

> On Mon, 5 Dec 2022, Eric Sunshine wrote:
>
>> On Mon, Dec 5, 2022 at 9:48 PM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> > On Mon, Nov 21 2022, Eric Sunshine via GitGitGadget wrote:
>>> [...]
>> > This is an existing wart, but I also wondered why the "expected",
>> > "result" etc. was needed. Either we could make the tests creating those
>> > do a "test_when_finished" removal of it, or better yet just create those
>> > in the trash directory.
>
> An even better suggestion would be to use `test_atexit`, of course.

Why?

For assets that are only needed within a given test we prefer cleaning
them up with "test_when_finished". There are legitimate uses for
"test_atexit", but those are for global state.

In this case (and again, we're discussing the #leftoverbits if someone
wants to poke at this again) the tests in question could relatively
easily be changed to do the creation and cleanup of the files that are
"test_cmp"'d (or similar) within the lifetime of individual tests
("test_when_finished"), rather than the lifetime of the script
("test_atexit").

A good reason for why we do it that way is that it has a nice interaction
with "--immediate --debug".

On failure we'll skip the cleanup for the current test that just failed,
but we're not distracted by scratch files from earlier tests, those
would have already been cleaned up if they used the same
"test_when_finished" pattern.

If you use "test_atexit" to do that all subsequent tests need to deal
with the sum of your scratch files, until they're cleaned up in one big
operation at the end.

It not only makes that debugging case harder, but also makes tests harder
to write, as you'll need to contend with more unwanted global state in
your test playground the further down the test file you are.

So I think what you're recommending here is an anti-pattern for the
common case.

There *are* cases where we really do need the "global cleanup",
e.g. tests that spawn the apache httpd use "test_atexit" rather than
"test_when_finished", we don't want to have to start/stop the httpd for each test.

We should leave "test_atexit" for those sorts of cases, not routine
per-test scratch file creation.

I semi-regularly run into cases where a stale "httpd" is left running in
the background from such tests (and not only after I've kill -9'd a test),
so I suspect we also have tricky races in that area, which probably aren't
improved by "test_atexit".

* Re: [PATCH] maintenance: compare output of pthread functions for inequality with 0
  @ 2022-12-02 18:10 16% ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2022-12-02 18:10 UTC
  To: Rose via GitGitGadget; +Cc: git, Seija


On Fri, Dec 02 2022, Rose via GitGitGadget wrote:

> From: Seija <doremylover123@gmail.com>
>
> The documentation for pthread_create and pthread_sigmask state that:
>
> "On success, pthread_create() returns 0;
> on error, it returns an error number"
>
> As such, we ought to check for an error
> by seeing if the output is not 0.
>
> Checking for "less than" is a mistake
> as the error code numbers can be greater than 0.
>
> Signed-off-by: Seija <doremylover123@gmail.com>
> ---
>     maintenance: compare output of pthread functions for inequality with 0
>     
>     The documentation for pthread_create and pthread_sigmask state that "On
>     success, pthread_create() returns 0; on error, it returns an error
>     number, and the contents of *thread are undefined."
>     
>     As such, we ought to check for an error by seeing if the output is not
>     0, rather than being less than 0, since nothing stops these functions
>     from returning a positive number.
>     
>     Signed-off by: Seija doremylover123@gmail.com
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1389%2FAtariDreams%2Faddress-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1389/AtariDreams/address-v1
> Pull-Request: https://github.com/git/git/pull/1389
>
>  builtin/fsmonitor--daemon.c | 4 ++--
>  run-command.c               | 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/fsmonitor--daemon.c b/builtin/fsmonitor--daemon.c
> index 6f30a4f93a7..52a08bb3b57 100644
> --- a/builtin/fsmonitor--daemon.c
> +++ b/builtin/fsmonitor--daemon.c
> @@ -1209,7 +1209,7 @@ static int fsmonitor_run_daemon_1(struct fsmonitor_daemon_state *state)
>  	 * events.
>  	 */
>  	if (pthread_create(&state->listener_thread, NULL,
> -			   fsm_listen__thread_proc, state) < 0) {
> +			   fsm_listen__thread_proc, state)) {
>  		ipc_server_stop_async(state->ipc_server_data);
>  		err = error(_("could not start fsmonitor listener thread"));
>  		goto cleanup;
> @@ -1220,7 +1220,7 @@ static int fsmonitor_run_daemon_1(struct fsmonitor_daemon_state *state)
>  	 * Start the health thread to watch over our process.
>  	 */
>  	if (pthread_create(&state->health_thread, NULL,
> -			   fsm_health__thread_proc, state) < 0) {
> +			   fsm_health__thread_proc, state)) {
>  		ipc_server_stop_async(state->ipc_server_data);
>  		err = error(_("could not start fsmonitor health thread"));
>  		goto cleanup;
> diff --git a/run-command.c b/run-command.c
> index 48b9ba6d6f0..756f1839aab 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -1019,7 +1019,7 @@ static void *run_thread(void *data)
>  		sigset_t mask;
>  		sigemptyset(&mask);
>  		sigaddset(&mask, SIGPIPE);
> -		if (pthread_sigmask(SIG_BLOCK, &mask, NULL) < 0) {
> +		if (pthread_sigmask(SIG_BLOCK, &mask, NULL)) {
>  			ret = error("unable to block SIGPIPE in async thread");
>  			return (void *)ret;
>  		}
>
> base-commit: 805265fcf7a737664a8321aaf4a0587b78435184

This looks good to me, and from skimming the rest of the pthread_create()
callers, it seems the rest of the in-tree code is correct.

But (and especially if you're interested) we really should follow up
here and fix the "error()" etc. part of this. After this change we still
have cases in-tree where on failure we:

 * Call die_errno() (good)
 * Call die(), error() etc., but with a manual strerror() argument;
   these should just use the *_errno() helpers.
 * Don't report on the errno at all, e.g. in this case shown here.

It seems to me that all of these should be using die_errno(),
error_errno() etc.

Or maybe it's the other way around, and we should not rely on the global
"errno", but always capture the return value, and give that to
strerror() (or set "errno = ret", and call {die,error,warning}_errno()).

In any case, some low-hanging #leftoverbits there...
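A minimal sketch of the "capture the return value" approach, assuming a POSIX system with pthreads (link with `-pthread`). pthread_create() and pthread_join() return an error number rather than setting `errno`, so the caller keeps `ret` and passes it to strerror(), or copies it into `errno` before calling an errno-based reporter such as git's error_errno(). The names `format_pthread_error()`, `errno_from_pthread()`, `worker()` and `start_and_join()` are hypothetical, not git functions.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format "<what>: <reason>" from the error number a
 * pthread_*() function returned. errno is not consulted, because these
 * functions report failure through their return value instead.
 */
static void format_pthread_error(char *buf, size_t len, const char *what,
				 int ret)
{
	snprintf(buf, len, "%s: %s", what, strerror(ret));
}

/*
 * Hypothetical alternative: copy the returned code into errno, so that an
 * errno-based reporter (like git's error_errno()) prints the right reason.
 */
static int errno_from_pthread(int ret)
{
	errno = ret;
	return ret ? -1 : 0;
}

/* Hypothetical thread body: store a value so the caller can see it ran. */
static void *worker(void *arg)
{
	int *out = arg;

	*out = 42;
	return NULL;
}

/* Returns 0 on success; on failure fills "msg" and returns -1. */
static int start_and_join(int *result, char *msg, size_t msglen)
{
	pthread_t thread;
	int ret;

	ret = pthread_create(&thread, NULL, worker, result);
	if (ret) { /* any non-zero value is an error number; never test "< 0" */
		format_pthread_error(msg, msglen,
				     "could not start thread", ret);
		return -1;
	}
	ret = pthread_join(thread, NULL);
	if (ret) {
		format_pthread_error(msg, msglen, "could not join thread", ret);
		return -1;
	}
	return 0;
}
```

Both variants avoid relying on whatever happens to be left in the global `errno` after a pthread_*() failure.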



* Re: [PATCH v2] send-email: relay '-v N' to format-patch
  @ 2022-11-28 12:34 17%   ` Ævar Arnfjörð Bjarmason
From: Ævar Arnfjörð Bjarmason @ 2022-11-28 12:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kyle Meyer, git

On Sun, Nov 27 2022, Junio C Hamano wrote:

> Kyle Meyer <kyle@kyleam.com> writes:
>
>> Here's a patch handling the -v case.  I don't plan on working on a more
>> complete fix for the other cases (as I mentioned before, I don't use
>> send-email to drive format-patch), but in my opinion the -v fix by
>> itself is still valuable.
>
> Yup, I think it is a good place to stop for the first patch.  Other
> people can add more when they discover the need, and anything more
> complex [*] is probably not worth the effort, I would think.
>
>     Side note: [*] we could imagine running "git format-patch -h"
>     (or a new variant of it), parse its output and populate the
>     %options dynamically, for example.
>
> Will queue.  Thanks.

This is just a comment on the #leftoverbits: I've looked at this option
parsing in "git-send-email" before, and IMO the right long-term fix is
to split out the *.perl code into a "git send-email--helper", and do the
option parsing in C using our parse_options().

Some of it will be a bit of a hassle, but it should be much easier after
8de2e2e41b2 (Merge branch 'ab/send-email-optim', 2021-07-22) (and the
subsequent regression fix).


* Re: [PATCH v3 2/2] worktree add: add --orphan flag
  @ 2022-11-19 11:50  6%               ` Ævar Arnfjörð Bjarmason
From: Ævar Arnfjörð Bjarmason @ 2022-11-19 11:50 UTC (permalink / raw)
  To: Jacob Abel; +Cc: Eric Sunshine, git, Taylor Blau


On Sat, Nov 19 2022, Jacob Abel wrote:

> On 22/11/15 11:35PM, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Tue, Nov 15 2022, Eric Sunshine wrote:
>>
>> > On Tue, Nov 15, 2022 at 4:13 PM Ævar Arnfjörð Bjarmason
>> > <avarab@gmail.com> wrote:
>> >> On Thu, Nov 10 2022, Jacob Abel wrote:
>> >> > Adds support for creating an orphan branch when adding a new worktree.
>> >> > This functionality is equivalent to git switch's --orphan flag.
>> >> >
>> >> > The original reason this feature was implemented was to allow a user
>> >> > to initialise a new repository using solely the worktree oriented
>> >> > workflow. Example usage included below.
>> >> >
>> >> > $ GIT_DIR=".git" git init --bare
>> >> > $ git worktree add --orphan master master/
>> >> >
>> >> > Signed-off-by: Jacob Abel <jacobabel@nullpo.dev>
>> >> > ---
>> >> > +Create a worktree containing an orphan branch named `<branch>` with a
>> >> > +clean working directory.  See `--orphan` in linkgit:git-switch[1] for
>> >> > +more details.
>> >>
>> >> Seeing as "git switch" is still marked "EXPERIMENTAL", it may be prudent
>> >> in general to avoid linking to it in lieu of "git checkout".
>> >>
>> >> In this case in particular though the "more details" are almost
>> >> completely absent from the "git-switch" docs, and they don't (which is
>> >> their won flaw) link to the more detailed "git-checkout" docs.
>> >>
>> >> But for this patch, it seems much better to link to the "checkout" docs,
>> >> no?
>> >
>> > Sorry, no. The important point here is that the --orphan option being
>> > added to `git worktree add` closely follows the behavior of `git
>> > switch --orphan`, which is quite different from the behavior of `git
>> > checkout --orphan`.
>> >
>> > The `git switch --orphan` documentation doesn't seem particularly
>> > lacking; it correctly describes the (very) simplified behavior of that
>> > command over `git checkout --orphan`. I might agree that there isn't
>> > much reason to link to git-switch for "more details", though, since
>> > there isn't really anything else that needs to be said.
>>
>> Aside from what it says now: 1/2 of what I'm saying is that linking to
>> it while it says it's "EXPERIMENTAL" might be either jumping the gun.
>>
>> Or maybe we should just declare it non-"EXPERIMENTAL", but in any case
>> this unrelated topic might want to avoid that altogether and just link
>> to the "checkout" version.
>>
>> A quick grep of our docs (for linkgit:git-switch) that this would be the
>> first mention outside of user-manual.txt where we link to it when it's
>> not in the context of "checkout or switch", or where we're explaining
>> something switch-specific (i.e. the "suggestDetachingHead" advice).
>>
>> Having said that I don't really care, just a suggestion...
>>
>> > If we did want to say something else here, we might copy one sentence
>> > from the `git checkout --orphan` documentation:
>> >
>> >     The first commit made on this new branch will have no parents and
>> >     it will be the root of a new history totally disconnected from all
>> >     the other branches and commits.
>> >
>> > The same sentence could be added to `git switch --orphan`
>> > documentation, but that's outside the scope of this patch series (thus
>> > can be done later by someone).
>>
>> I think I was partially confused by skimming the SYNOPSIS and thinking
>> this supported <start-point> like checkout, which as I found in
>> https://lore.kernel.org/git/221115.86edu3kfqz.gmgdl@evledraar.gmail.com/
>> just seems to be a missing assertion where we want to die() if that's
>> provided in this mode.
>>
>> What I also found a bit confusing (but maybe it's just me) is that the
>> "with a clean working directory" seemed at first to be drawing a
>> distinction between this behavior and that of "git switch", but from
>> poking at it some more it seems to be expressing "this is like git
>> switch's --orphan" with that.
>>
>> I think instead of "clean working tree" it would be better to talk about
>> "tracked files", as "git switch --orphan" does, which AFAICT is what it
>> means. But then again the reason "switch" does that is because you have
>> *existing* tracked files, which inherently doesn't apply for "worktree".
>>
>> Hrm.
>>
>> So, I guess it depends on your mental model of this operation, but at
>> least I think it's more intuitive to explain it in terms of "git
>> checkout --orphan", not "git switch --orphan". I.e.:
>>
>> 	Create a worktree containing an orphan branch named
>> 	`<branch>`. This works like linkgit:git-checkout[1]'s `--orphan'
>> 	option, except '<start-point>` isn't supported, and the "clear
>> 	the index" doesn't apply (as "worktree add" will always have a
>> 	new index)".
>>
>> Whereas defining this in terms of git-switch's "All tracked files are
>> removed" might just be more confusing. What files? Since it's "worktree
>> add" there weren't any in the first place.
>>
>> Anyway, I don't mind it as it is, but maybe the above write-up helps for
>> #leftoverbits if we ever want to unify these docs. I.e. AFAICT we could:
>>
>>  * Link from git-worktree to git-checkout, saying the above
>>  * Link from git-switch to git-checkout, ditto, but that we also "remove
>>    tracked files [of the current HEAD]".
>
> Apologies for the mistake in the SYNOPSIS. As mentioned in the other replies
> I've updated it as you indicated to correct that.
>
> As for a path forwards on the referencing of either git-checkout or git-switch
> from git-worktree, I think I'm leaning towards Eric's approach (in his reply
> to this message) where we don't reference either and fully outline the
> behavior itself.

Yeah, that makes sense.

>>
>> >> > +test_expect_success '"add" --orphan/-b mutually exclusive' '
>> >> > +     test_must_fail git worktree add --orphan poodle -b poodle bamboo
>> >> > +'
>> >> > +
>> >> > +test_expect_success '"add" --orphan/-B mutually exclusive' '
>> >> > +     test_must_fail git worktree add --orphan poodle -B poodle bamboo
>> >> > +'
>> >> > +
>> >> > +test_expect_success '"add" --orphan/--detach mutually exclusive' '
>> >> > +     test_must_fail git worktree add --orphan poodle --detach bamboo
>> >> > +'
>> >> > +
>> >> > +test_expect_success '"add" --orphan/--no-checkout mutually exclusive' '
>> >> > +     test_must_fail git worktree add --orphan poodle --no-checkout bamboo
>> >> > +'
>> >> > +
>> >> > +test_expect_success '"add" -B/--detach mutually exclusive' '
>> >> > +     test_must_fail git worktree add -B poodle --detach bamboo main
>> >> > +'
>> >> > +
>> >>
>> >> This would be much better as a for-loop:
>> >>
>> >> for opt in -b -B ...
>> >> do
>> >>         test_expect_success "...$opt" '<test here, uses $opt>'
>> >> done
>> >>
>> >> Note the ""-quotes for the description, and '' for the test, that's not
>> >> a mistake, we eval() the latter.
>> >
>> > Such a loop would need to be more complex than this, wouldn't it, to
>> > account for all the combinations? I'd normally agree about the loop,
>> > but given that it requires extra complexity, I don't really mind
>> > seeing the individual tests spelled out manually in this case; they're
>> > dead simple to understand as written. I don't feel strongly either
>> > way, but I also don't want to ask for extra work from the patch author
>> > for a subjective change.
>>
>> Yeah, it's probably not worth it. This is partially cleaning up existing
>> tests, but maybe:
>>
>> 	diff --git a/t/t2400-worktree-add.sh b/t/t2400-worktree-add.sh
>> 	index 93c340f4aff..5acfd48f418 100755
>> 	--- a/t/t2400-worktree-add.sh
>> 	+++ b/t/t2400-worktree-add.sh
>> 	@@ -298,37 +298,21 @@ test_expect_success '"add" no auto-vivify with --detach and <branch> omitted' '
>> 	 	test_must_fail git -C mish/mash symbolic-ref HEAD
>> 	 '
>>
>> 	-test_expect_success '"add" -b/-B mutually exclusive' '
>> 	-	test_must_fail git worktree add -b poodle -B poodle bamboo main
>> 	-'
>> 	-
>> 	-test_expect_success '"add" -b/--detach mutually exclusive' '
>> 	-	test_must_fail git worktree add -b poodle --detach bamboo main
>> 	-'
>> 	-
>> 	-test_expect_success '"add" -B/--detach mutually exclusive' '
>> 	-	test_must_fail git worktree add -B poodle --detach bamboo main
>> 	-'
>> 	-
>> 	-test_expect_success '"add" --orphan/-b mutually exclusive' '
>> 	-	test_must_fail git worktree add --orphan poodle -b poodle bamboo
>> 	-'
>> 	-
>> 	-test_expect_success '"add" --orphan/-B mutually exclusive' '
>> 	-	test_must_fail git worktree add --orphan poodle -B poodle bamboo
>> 	-'
>> 	-
>> 	-test_expect_success '"add" --orphan/--detach mutually exclusive' '
>> 	-	test_must_fail git worktree add --orphan poodle --detach bamboo
>> 	-'
>> 	-
>> 	-test_expect_success '"add" --orphan/--no-checkout mutually exclusive' '
>> 	-	test_must_fail git worktree add --orphan poodle --no-checkout bamboo
>> 	-'
>> 	-
>> 	-test_expect_success '"add" -B/--detach mutually exclusive' '
>> 	-	test_must_fail git worktree add -B poodle --detach bamboo main
>> 	-'
>> 	+test_wt_add_excl() {
>> 	+	local opts="$@" &&
>> 	+	test_expect_success "'worktree add' with '$opts' has mutually exclusive options" '
>> 	+		test_must_fail git worktree add $opts
>> 	+	'
>> 	+}
>> 	+test_wt_add_excl -b poodle -B poodle bamboo main
>> 	+test_wt_add_excl -b poodle --orphan poodle bamboo
>> 	+test_wt_add_excl -b poodle --detach bamboo main
>> 	+test_wt_add_excl -B poodle --detach bamboo main
>> 	+test_wt_add_excl -B poodle --detach bamboo main
>> 	+test_wt_add_excl -B poodle --orphan poodle bamboo
>> 	+test_wt_add_excl --orphan poodle --detach bamboo
>> 	+test_wt_add_excl --orphan poodle --no-checkout bamboo
>> 	+test_wt_add_excl --orphan poodle bamboo main
>>
>> 	 test_expect_success '"add -B" fails if the branch is checked out' '
>> 	 	git rev-parse newmain >before &&
>>
>> I re-arranged that a bit, but probably not worth a loop. I *did* spot in
>> doing that that if I sort the options I end up with a duplicate test,
>> i.e. we test "-B poodle --detach bamboo main" twice.
>>
>> That seems to be added by mistake in 2/2, i.e. it's the existing test
>> you can see in the diff context, just added at the end.
>
> This is much clearer and more succinct. I've applied this to 2/2 for v4.

Great, nice that it helped!


* Re: [PATCH] builtin/gc.c: fix use-after-free in maintenance_unregister()
  @ 2022-11-16 15:14 17%       ` Ævar Arnfjörð Bjarmason
From: Ævar Arnfjörð Bjarmason @ 2022-11-16 15:14 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, git, Johannes Schindelin, Ronan Pigott


On Wed, Nov 16 2022, Derrick Stolee wrote:

> On 11/15/22 2:54 PM, Taylor Blau wrote:
>> On Tue, Nov 15, 2022 at 08:41:44PM +0100, Ævar Arnfjörð Bjarmason wrote:
>>>> @@ -1543,6 +1543,7 @@ static int maintenance_unregister(int argc, const char **argv, const char *prefi
>>>>  	int found = 0;
>>>>  	struct string_list_item *item;
>>>>  	const struct string_list *list;
>>>> +	struct config_set cs = { { 0 } };
>>>
>>> Just "{ 0 }" here instead? I see it may have been copied from some older
>>> pre-image though, and they'll do the same in either case, so it's not
>>> important...
>> 
>> Copying from other zero-initializations of `struct config_set`:
>> 
>>     $ git grep -oh 'struct config_set.*= {.*' | sort | uniq -c
>>           3 struct config_set cs = { { 0 } };
>
> Yes, without the double braces the compiler will complain on
> macOS, I believe.

Ah, that was sorted in 54795d37d9e (config.mak.dev: disable suggest
braces error on old clang versions, 2022-10-10).

It's fine here; we can follow up on the #leftoverbits of changing those
some other time.
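As a side note on why both spellings work, a minimal sketch: in C, `{ 0 }` initializes the first scalar member through brace elision and zero-initializes every other member, so it is equivalent to the fully braced `{ { 0 } }` when the first member is itself a struct. Only the diagnostics differ (older clang's `-Wmissing-braces`). The structs below are hypothetical stand-ins with that shape, not git's `struct config_set`.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not git's real definitions. */
struct fake_hashmap {
	void **table;
	unsigned int size;
};

struct fake_config_set {
	struct fake_hashmap hash; /* first member is itself a struct */
	int hash_initialized;
	void *list;
};

/* Fully braced: no missing-braces warning, even on older clang. */
static struct fake_config_set make_braced(void)
{
	struct fake_config_set cs = { { 0 } };
	return cs;
}

/*
 * Brace elision: "0" initializes cs.hash.table, and every member not
 * explicitly initialized is zeroed. Same result, but some older clang
 * versions warn about the missing inner braces.
 */
static struct fake_config_set make_elided(void)
{
	struct fake_config_set cs = { 0 };
	return cs;
}
```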


* Re: [PATCH v3 2/2] worktree add: add --orphan flag
  @ 2022-11-15 22:35 10%           ` Ævar Arnfjörð Bjarmason
From: Ævar Arnfjörð Bjarmason @ 2022-11-15 22:35 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Jacob Abel, git, Taylor Blau


On Tue, Nov 15 2022, Eric Sunshine wrote:

> On Tue, Nov 15, 2022 at 4:13 PM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> On Thu, Nov 10 2022, Jacob Abel wrote:
>> > Adds support for creating an orphan branch when adding a new worktree.
>> > This functionality is equivalent to git switch's --orphan flag.
>> >
>> > The original reason this feature was implemented was to allow a user
>> > to initialise a new repository using solely the worktree oriented
>> > workflow. Example usage included below.
>> >
>> > $ GIT_DIR=".git" git init --bare
>> > $ git worktree add --orphan master master/
>> >
>> > Signed-off-by: Jacob Abel <jacobabel@nullpo.dev>
>> > ---
>> > +Create a worktree containing an orphan branch named `<branch>` with a
>> > +clean working directory.  See `--orphan` in linkgit:git-switch[1] for
>> > +more details.
>>
>> Seeing as "git switch" is still marked "EXPERIMENTAL", it may be prudent
>> in general to avoid linking to it in lieu of "git checkout".
>>
In this case in particular, though, the "more details" are almost
completely absent from the "git-switch" docs, and they don't link to the
more detailed "git-checkout" docs (which is their own flaw).
>>
>> But for this patch, it seems much better to link to the "checkout" docs,
>> no?
>
> Sorry, no. The important point here is that the --orphan option being
> added to `git worktree add` closely follows the behavior of `git
> switch --orphan`, which is quite different from the behavior of `git
> checkout --orphan`.
>
> The `git switch --orphan` documentation doesn't seem particularly
> lacking; it correctly describes the (very) simplified behavior of that
> command over `git checkout --orphan`. I might agree that there isn't
> much reason to link to git-switch for "more details", though, since
> there isn't really anything else that needs to be said.

Aside from what it says now, half of what I'm saying is that linking to
it while it's marked "EXPERIMENTAL" might be jumping the gun.

Or maybe we should just declare it non-"EXPERIMENTAL", but in any case
this unrelated topic might want to avoid that altogether and just link
to the "checkout" version.

A quick grep of our docs (for linkgit:git-switch) suggests that this
would be the first place outside of user-manual.txt where we link to it
neither in the context of "checkout or switch" nor to explain something
switch-specific (i.e. the "suggestDetachingHead" advice).

Having said that I don't really care, just a suggestion...

> If we did want to say something else here, we might copy one sentence
> from the `git checkout --orphan` documentation:
>
>     The first commit made on this new branch will have no parents and
>     it will be the root of a new history totally disconnected from all
>     the other branches and commits.
>
> The same sentence could be added to `git switch --orphan`
> documentation, but that's outside the scope of this patch series (thus
> can be done later by someone).

I think I was partially confused by skimming the SYNOPSIS and thinking
this supported <start-point> like checkout, which as I found in
https://lore.kernel.org/git/221115.86edu3kfqz.gmgdl@evledraar.gmail.com/
just seems to be a missing assertion where we want to die() if that's
provided in this mode.

What I also found a bit confusing (but maybe it's just me) is that the
"with a clean working directory" seemed at first to be drawing a
distinction between this behavior and that of "git switch", but from
poking at it some more it seems to be expressing "this is like git
switch's --orphan" with that.

I think instead of "clean working tree" it would be better to talk about
"tracked files", as "git switch --orphan" does, which AFAICT is what it
means. But then again, the reason "switch" does that is because you have
*existing* tracked files, which inherently doesn't apply to "worktree".

Hrm.

So, I guess it depends on your mental model of this operation, but at
least I think it's more intuitive to explain it in terms of "git
checkout --orphan", not "git switch --orphan". I.e.:

	Create a worktree containing an orphan branch named
	`<branch>`. This works like linkgit:git-checkout[1]'s `--orphan`
	option, except `<start-point>` isn't supported, and the "clear
	the index" part doesn't apply (as "worktree add" will always have a
	new index).

Whereas defining this in terms of git-switch's "All tracked files are
removed" might just be more confusing. What files? Since it's "worktree
add" there weren't any in the first place.

Anyway, I don't mind it as it is, but maybe the above write-up helps for
#leftoverbits if we ever want to unify these docs. I.e. AFAICT we could:

 * Link from git-worktree to git-checkout, saying the above
 * Link from git-switch to git-checkout, ditto, but that we also "remove
   tracked files [of the current HEAD]".

>> > +test_expect_success '"add" --orphan/-b mutually exclusive' '
>> > +     test_must_fail git worktree add --orphan poodle -b poodle bamboo
>> > +'
>> > +
>> > +test_expect_success '"add" --orphan/-B mutually exclusive' '
>> > +     test_must_fail git worktree add --orphan poodle -B poodle bamboo
>> > +'
>> > +
>> > +test_expect_success '"add" --orphan/--detach mutually exclusive' '
>> > +     test_must_fail git worktree add --orphan poodle --detach bamboo
>> > +'
>> > +
>> > +test_expect_success '"add" --orphan/--no-checkout mutually exclusive' '
>> > +     test_must_fail git worktree add --orphan poodle --no-checkout bamboo
>> > +'
>> > +
>> > +test_expect_success '"add" -B/--detach mutually exclusive' '
>> > +     test_must_fail git worktree add -B poodle --detach bamboo main
>> > +'
>> > +
>>
>> This would be much better as a for-loop:
>>
>> for opt in -b -B ...
>> do
>>         test_expect_success "...$opt" '<test here, uses $opt>'
>> done
>>
>> Note the ""-quotes for the description, and '' for the test, that's not
>> a mistake, we eval() the latter.
>
> Such a loop would need to be more complex than this, wouldn't it, to
> account for all the combinations? I'd normally agree about the loop,
> but given that it requires extra complexity, I don't really mind
> seeing the individual tests spelled out manually in this case; they're
> dead simple to understand as written. I don't feel strongly either
> way, but I also don't want to ask for extra work from the patch author
> for a subjective change.

Yeah, it's probably not worth it. This is partially cleaning up existing
tests, but maybe:
	
	diff --git a/t/t2400-worktree-add.sh b/t/t2400-worktree-add.sh
	index 93c340f4aff..5acfd48f418 100755
	--- a/t/t2400-worktree-add.sh
	+++ b/t/t2400-worktree-add.sh
	@@ -298,37 +298,21 @@ test_expect_success '"add" no auto-vivify with --detach and <branch> omitted' '
	 	test_must_fail git -C mish/mash symbolic-ref HEAD
	 '
	 
	-test_expect_success '"add" -b/-B mutually exclusive' '
	-	test_must_fail git worktree add -b poodle -B poodle bamboo main
	-'
	-
	-test_expect_success '"add" -b/--detach mutually exclusive' '
	-	test_must_fail git worktree add -b poodle --detach bamboo main
	-'
	-
	-test_expect_success '"add" -B/--detach mutually exclusive' '
	-	test_must_fail git worktree add -B poodle --detach bamboo main
	-'
	-
	-test_expect_success '"add" --orphan/-b mutually exclusive' '
	-	test_must_fail git worktree add --orphan poodle -b poodle bamboo
	-'
	-
	-test_expect_success '"add" --orphan/-B mutually exclusive' '
	-	test_must_fail git worktree add --orphan poodle -B poodle bamboo
	-'
	-
	-test_expect_success '"add" --orphan/--detach mutually exclusive' '
	-	test_must_fail git worktree add --orphan poodle --detach bamboo
	-'
	-
	-test_expect_success '"add" --orphan/--no-checkout mutually exclusive' '
	-	test_must_fail git worktree add --orphan poodle --no-checkout bamboo
	-'
	-
	-test_expect_success '"add" -B/--detach mutually exclusive' '
	-	test_must_fail git worktree add -B poodle --detach bamboo main
	-'
	+test_wt_add_excl() {
	+	local opts="$@" &&
	+	test_expect_success "'worktree add' with '$opts' has mutually exclusive options" '
	+		test_must_fail git worktree add $opts
	+	'
	+}
	+test_wt_add_excl -b poodle -B poodle bamboo main
	+test_wt_add_excl -b poodle --orphan poodle bamboo
	+test_wt_add_excl -b poodle --detach bamboo main
	+test_wt_add_excl -B poodle --detach bamboo main
	+test_wt_add_excl -B poodle --detach bamboo main
	+test_wt_add_excl -B poodle --orphan poodle bamboo
	+test_wt_add_excl --orphan poodle --detach bamboo
	+test_wt_add_excl --orphan poodle --no-checkout bamboo
	+test_wt_add_excl --orphan poodle bamboo main
	 
	 test_expect_success '"add -B" fails if the branch is checked out' '
	 	git rev-parse newmain >before &&
	
I re-arranged that a bit, but it's probably not worth a loop. In doing
that I *did* spot that if I sort the options I end up with a duplicate
test, i.e. we test "-B poodle --detach bamboo main" twice.

That seems to have been added by mistake in 2/2, i.e. it's the existing
test you can see in the diff context, just added again at the end.
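To make the matrix those tests encode explicit, here is a sketch of an up-front check that rejects each combination listed above, including `--orphan` together with a `<commit-ish>` (the missing assertion mentioned earlier). `struct add_opts_sketch`, `check_add_opts()` and the error strings are hypothetical; this is not how builtin/worktree.c is actually structured.

```c
#include <stddef.h>

/* Hypothetical summary of the parsed "git worktree add" options. */
struct add_opts_sketch {
	const char *new_branch;       /* -b <branch> */
	const char *new_branch_force; /* -B <branch> */
	const char *orphan_branch;    /* --orphan <branch> */
	int detach;                   /* --detach */
	int no_checkout;              /* --no-checkout */
	const char *commitish;        /* optional <commit-ish> after <path> */
};

/* Returns NULL if the combination is allowed, else an error message. */
static const char *check_add_opts(const struct add_opts_sketch *o)
{
	/* At most one of the ways to pick what HEAD points at. */
	int branch_modes = !!o->new_branch + !!o->new_branch_force +
			   !!o->orphan_branch + !!o->detach;

	if (branch_modes > 1)
		return "'-b', '-B', '--orphan' and '--detach' are mutually exclusive";
	if (o->orphan_branch && o->no_checkout)
		return "'--orphan' and '--no-checkout' cannot be used together";
	if (o->orphan_branch && o->commitish)
		return "'<commit-ish>' cannot be used with '--orphan'";
	return NULL;
}
```

Counting the branch-selecting options once avoids spelling out every pairwise combination, which is also why a table-driven test maps onto it naturally.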


* Re: [PATCH] builtin/gc.c: fix use-after-free in maintenance_unregister()
  @ 2022-11-15 19:51 15%   ` Ævar Arnfjörð Bjarmason
From: Ævar Arnfjörð Bjarmason @ 2022-11-15 19:51 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Taylor Blau, git, git-security, Johannes Schindelin, Ronan Pigott

On Tue, Nov 15 2022, Derrick Stolee wrote:

> On 11/15/2022 1:53 PM, Taylor Blau wrote:
>> While trying to fix a move based on an uninitialized value (along with a
>> declaration after the first statement), be0fd57228
>> (maintenance --unregister: fix uninit'd data use &
>> -Wdeclaration-after-statement, 2022-11-15) unintentionally introduced a
>> use-after-free.
>> 
>> The problem arises when `maintenance_unregister()` sees a non-NULL
>> `config_file` string and thus tries to call
>> git_configset_get_value_multi() to lookup the corresponding values.
>> 
>> We store the result off, and then call git_configset_clear(), which
>> frees the pointer that we just stored. We then try to read that
>> now-freed pointer a few lines below, and there we have our
>> use-after-free:
>
> Makes sense why this needs to be pulled out to a larger scope, but
> also why it's so easy to make this mistake.

Yeah, the config API's full of foot-guns, although here we return a
"const struct string_list *", not a "struct string_list *", so in
retrospect this should be rather obvious...

But still, as #leftoverbits we should probably make it behave
consistently wrt naming. I.e. in this case
git_configset_get_value_multi() really behaves like a
git_configset_get_string_tmp(), and there's no equivalent of a
git_configset_get_string() (i.e. xstrdup()'d) for *_multi().
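A minimal sketch of the two lifetimes that naming would distinguish, using a hypothetical `struct value_set` rather than git's real config API: a `_tmp`-style getter returns a pointer the set still owns (so it dies with the set's clear function, which is the trap in this patch), while a duplicating getter returns a copy the caller must free.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container standing in for a configset; not git's API. */
struct value_set {
	char **values;
	size_t nr;
};

/* Local strdup() replacement, since strdup() is POSIX rather than C99. */
static char *dup_str(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (!copy)
		abort();
	return memcpy(copy, s, len);
}

static void value_set_add(struct value_set *set, const char *v)
{
	char **grown = realloc(set->values, (set->nr + 1) * sizeof(*grown));

	if (!grown)
		abort();
	set->values = grown;
	set->values[set->nr++] = dup_str(v);
}

/* Borrowed ("_tmp"-style): valid only until value_set_clear(); don't free. */
static const char *value_set_get_tmp(const struct value_set *set, size_t i)
{
	return i < set->nr ? set->values[i] : NULL;
}

/* Owned (xstrdup()'d-style): an independent copy the caller must free(). */
static char *value_set_get_dup(const struct value_set *set, size_t i)
{
	const char *v = value_set_get_tmp(set, i);

	return v ? dup_str(v) : NULL;
}

/* Frees every string, invalidating all pointers from value_set_get_tmp(). */
static void value_set_clear(struct value_set *set)
{
	for (size_t i = 0; i < set->nr; i++)
		free(set->values[i]);
	free(set->values);
	set->values = NULL;
	set->nr = 0;
}
```

Reading a borrowed pointer after value_set_clear() is the same class of use-after-free as the one fixed here; the patch takes the other route of keeping the configset alive until after the last read.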

>> +	struct config_set cs = { { 0 } };
>> 
>>  	argc = parse_options(argc, argv, prefix, options,
>>  			     builtin_maintenance_unregister_usage, 0);
>> @@ -1551,12 +1552,9 @@ static int maintenance_unregister(int argc, const char **argv, const char *prefi
>>  				   options);
>> 
>>  	if (config_file) {
>> -		struct config_set cs;
>> -
>>  		git_configset_init(&cs);
>>  		git_configset_add_file(&cs, config_file);
>>  		list = git_configset_get_value_multi(&cs, key);
>> -		git_configset_clear(&cs);
>
> That the list depends on the configset and not exist as an
> independent entity is non-obvious, but I'm sure is rooted
> in some kind of memory-saving optimization.

Yes, and it's probably worth keeping that, but I haven't benchmarked
etc. This is only a problem in practice if you're constructing your own
configset, e.g. here because we have a custom config file. For most
users this API is safe in general: we do free() it, but normally it's
the config in "the_repository", which outlives any "normal" code.

>>  	} else {
>>  		list = git_config_get_value_multi(key);
>>  	}
>> @@ -1592,6 +1590,7 @@ static int maintenance_unregister(int argc, const char **argv, const char *prefi
>>  		die(_("repository '%s' is not registered"), maintpath);
>>  	}
>> 
>> +	git_configset_clear(&cs);
>>  	free(maintpath);
>>  	return 0;
>>  }
>
> Thanks for drilling down on this. LGTM.

On the related subject of config API foot-guns, it would be great if you
could look over the in-flight series I have to make related parts of the
config API safe by default [1].

8/9 there fixes 6 segfaults, 3 of which are git blame'd to you :), and
9/9 a foot-gun-y interaction with the strvec API, which you'll also
probably find interesting...

1. https://lore.kernel.org/git/cover-v2-0.9-00000000000-20221101T225822Z-avarab@gmail.com/


* Re: [PATCH 1/2] t/t0021: convert the rot13-filter.pl script to C
  @ 2022-07-23  4:59 14%   ` Ævar Arnfjörð Bjarmason
From: Ævar Arnfjörð Bjarmason @ 2022-07-23  4:59 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git, gitster, larsxschneider, christian.couder

On Fri, Jul 22 2022, Matheus Tavares wrote:

Looking a bit closer...

> however, that we still use the script as a wrapper at
> this commit, in order to minimize the amount of changes it introduces
> and help reviewers. At the next commit we will properly remove the
> script and adjust the affected tests to use test-tool.

I'd prefer if we just squashed this. If you want to avoid some of the
diff verbosity, you could leave the PERL prereq on all the
test_expect_success and remove it in a 2/2 (we just wouldn't run the
test until then).

But I think it's all boilerplate, so just doing it in one step would be
better; reasoning about the in-between steps is harder IMO (e.g. "exec"
escaping or whatever).

> +static char *rot13(char *str)
> +{
> +	char *c;
> +	for (c = str; *c; c++) {
> +		if (*c >= 'a' && *c <= 'z')
> +			*c = 'a' + (*c - 'a' + 13) % 26;
> +		else if (*c >= 'A' && *c <= 'Z')
> +			*c = 'A' + (*c - 'A' + 13) % 26;
> +	}
> +	return str;
> +}

Looks fine, but at some point we should probably note in our
CodingGuidelines that we don't care about EBCDIC, as this isn't portable
C (but it's probably portable enough, as we can assume ASCII) :)
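If EBCDIC ever did matter, a sketch of a variant that doesn't assume the letters are contiguous would look up each character's position in an explicit alphabet instead of doing arithmetic on character codes. `rot13_portable()` is a hypothetical name; as said above, the ASCII version in the patch is probably fine for git.

```c
#include <string.h>

static const char rot13_lower[] = "abcdefghijklmnopqrstuvwxyz";
static const char rot13_upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/*
 * Rotate letters by 13 in place using table lookups, so the result does
 * not depend on 'a'..'z' being contiguous in the execution character set.
 * Non-letters are left untouched.
 */
static char *rot13_portable(char *str)
{
	char *c;

	for (c = str; *c; c++) {
		const char *p;

		/* *c is never '\0' here, so strchr() can't match the terminator */
		if ((p = strchr(rot13_lower, *c)))
			*c = rot13_lower[(p - rot13_lower + 13) % 26];
		else if ((p = strchr(rot13_upper, *c)))
			*c = rot13_upper[(p - rot13_upper + 13) % 26];
	}
	return str;
}
```

For example, "Hello, World!" becomes "Uryyb, Jbeyq!", and applying it twice gives back the original string.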

> +static struct string_list *packet_read_capabilities(void)
> +{
> +	struct string_list *caps = xmalloc(sizeof(*caps));

malloc here...

> +	string_list_init_dup(caps);
> +	while (1) {
> +		int size;
> +		char *buf = packet_read_line(0, &size);
> +		if (!buf)
> +			break;
> +		string_list_append_nodup(caps,
> +					 skip_key_dup(buf, size, "capability"));
> +	}
> +	return caps;
> +}
> +
> +/* Read remote capabilities and check them against capabilities we require */
> +static struct string_list *packet_read_and_check_capabilities(
> +		struct string_list *required_caps)
> +{
> +	struct string_list *remote_caps = packet_read_capabilities();

...and here...
> +	struct string_list_item *item;
> +	for_each_string_list_item(item, required_caps) {
> +		if (!unsorted_string_list_has_string(remote_caps, item->string)) {
> +			die("required '%s' capability not available from remote",
> +			    item->string);
> +		}
> +	}
> +	return remote_caps;

...we'll return it...

> +	remote_caps = packet_read_and_check_capabilities(&supported_caps);
> +	packet_check_and_write_capabilities(remote_caps, &requested_caps);
> +	fprintf(logfile, "init handshake complete\n");
> +
> +	string_list_clear(&supported_caps, 0);
> +	string_list_clear(remote_caps, 0);

...and here you're missing a free(). But I wonder, why not just declare
this string_list in this function and pass it down instead?

It's unfortunate that none of these tests seem to pass with
SANITIZE=leak already, but from a quick glance the new command doesn't
seem to leak, except in that one case.
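A sketch of that "declare it in the caller and pass it down" suggestion, with a hypothetical minimal `struct strlist` standing in for git's `string_list` and a NULL-terminated array of lines standing in for packet_read_line(). The caller owns the list on its stack, so the only cleanup is clearing the items and there is no container left to forget to free(). As a simplification, unlike the patch's skip_key_dup(), non-matching lines are skipped rather than rejected.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's string_list. */
struct strlist {
	char **items;
	size_t nr;
};

static void strlist_append(struct strlist *list, const char *s)
{
	char **grown = realloc(list->items, (list->nr + 1) * sizeof(*grown));
	size_t len = strlen(s) + 1;

	if (!grown)
		abort();
	list->items = grown;
	list->items[list->nr] = malloc(len);
	if (!list->items[list->nr])
		abort();
	memcpy(list->items[list->nr++], s, len);
}

static void strlist_clear(struct strlist *list)
{
	for (size_t i = 0; i < list->nr; i++)
		free(list->items[i]);
	free(list->items);
	list->items = NULL;
	list->nr = 0;
}

/*
 * Fill a caller-owned list instead of returning a malloc()'d one.
 * "lines" is NULL-terminated; each "capability=<name>" line is stored
 * as "<name>", and any other line is ignored.
 */
static void read_capabilities(struct strlist *caps, const char *const *lines)
{
	static const char prefix[] = "capability=";

	for (; *lines; lines++)
		if (!strncmp(*lines, prefix, sizeof(prefix) - 1))
			strlist_append(caps, *lines + sizeof(prefix) - 1);
}
```

The caller then does `struct strlist caps = { NULL, 0 };`, passes `&caps` down, and calls strlist_clear(&caps) when done, which is all SANITIZE=leak needs to see.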

Not knowing much about the filtering mechanism, I wonder if this code
here wouldn't be better as a built-in some day. I.e. isn't this all
the shimming we need to talk to some arbitrary conversion filter, except
for the rot13 part?

So if we just invoked a "tr" with run_command() to do the actual rot13
filtering, we could do any sort of arbitrary replacement, and present a
variant of this command as an "if you can't be bothered with
packet-line" option in gitattributes(5)...

...but maybe that's hopeless for some reason I'm missing, in any case,
more #leftoverbits.


* Re: [PATCH 0/3] doc: unify config info on some cmds
  @ 2022-07-14 21:17  2% ` Ævar Arnfjörð Bjarmason
From: Ævar Arnfjörð Bjarmason @ 2022-07-14 21:17 UTC (permalink / raw)
  To: Matheus Tavares; +Cc: git


On Thu, Jul 14 2022, Matheus Tavares wrote:

> These three patches attempt to remove duplication between some
> config/*.txt and git-*.txt files, to facilitate maintenance and fix any
> divergences.
>
> This series targets the most straightforward conversions, but there are
> also other commands whose config documentation could possibly be unified
> (maybe #leftoverbits):

Great minds think alike & all that: these patches are pretty much what
I've had locally & been meaning to submit for (check notes) around a
year and a half. So having this move forward is great.

Here's a cleaned up version of what I have, which I figure is probably
better linked-to than contributing to my E-Mail quota :):

	https://github.com/git/git/compare/master...avar:git:avar/doc-config-includes

I cleaned that up just now for this discussion. I've also had some
messier changes on top of that, which I think could/should
follow:

	https://github.com/avar/git/compare/avar/doc-config-includes...avar:git:avar/doc-config-includes-split

In that second part I end up e.g. splitting config/gc.txt into that
and config/gc/rerere.txt, so that we can include the latter in both
config/gc.txt (which is included in git-gc.txt) and in git-rerere.txt
(along with config/rerere.txt itself).

I.e. to have all CONFIGURATION sections discuss all the config relevant
to that command, if possible. Not just in the straightforward cases, but
also e.g. the "rerere" case where it needs to "borrow" a part of the
"gc" section.

Another notable one is config/color.txt, i.e. we want "git branch"
and the like to discuss their part of the "color" configuration.

Anyway, I'm happy to have your versions of this, although maybe the
range-diff below is useful to you to see if there's anything you'd like
to change or steal (it's against part one above).

The one thing I'd like you to reconsider is dropping the idea of adding
these "ifndef::git-grep[]" defines and the like. In your version it
yields an arguably better result.

But I think what we should be going for is the more general direction
outlined above, at which point that becomes quite a mess of
ifdefs. I.e. config/gc/rerere.txt would need to know what it's going to
get included in, which would be N manpages in the general case,
not just "main or config" as this series leaves it.

I think the solution I have to that in 1/9 in that first series is a
better trade-off, i.e. we just (eventually, your series doesn't need to
do that) include some standard wording saying that what you're looking
at in git-CMD(1) is transcluded as-is from the relevant part of
git-config(1). I.e.:

	Everything below this line in this section is selectively included
	from the linkgit:git-config[1] documentation. The content is the same
	as what's found there:

What do you think about doing that instead?

 -:  ----------- >  1:  5d0a4562ea8 docs: add and use include template for config/* includes
 -:  ----------- >  2:  450a9d82bf2 docs: include a CONFIGURATION section
 -:  ----------- >  3:  03cdf2d4e4e docs: add includes to the CONFIGURATION section
 -:  ----------- >  4:  959b6ccd6e2 docs: move config discussion to CONFIGURATION section
 1:  439cfdf858f !  5:  f20f207ece7 doc: grep: unify configuration variables definitions
    @@
      ## Metadata ##
    -Author: Matheus Tavares <matheus.bernardino@usp.br>
    +Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    doc: grep: unify configuration variables definitions
    +    grep docs: de-duplicate configuration sections
     
    -    The configuration variables for git-grep are duplicated in
    -    "Documentation/git-grep.txt" and "Documentation/config/grep.txt", which
    -    can make maintenance difficult. The first also contains a definition
    -    that is not present in the latter (grep.fullName), and the latter
    -    received a wording improvement that was not replicated in the former:
    -    see 91028f765 ("grep: clarify what `grep.patternType=default` means",
    -    2021-12-05).
    +    Include the "config/grep.txt" file in "git-grep.txt", instead of
    +    repeating an almost identical description of the "grep" configuration
    +    variables in two places.
     
    -    To avoid such problems, unify the information in one file and include it
    -    in the other.
    +    There is no loss of information here that isn't shown in the addition
    +    to "grep.txt". This change was made by copying the contents of
    +    "git-grep.txt"'s version over the "grep.txt" version. Aside from the
    +    change "grep.txt" being made here the two were identical.
     
    -    Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
    +    This documentation started being copy/pasted around in
    +    b22520a37c8 (grep: allow -E and -n to be turned on by default via
    +    configuration, 2011-03-30). After that in e.g. 6453f7b3486 (grep: add
    +    grep.fullName config variable, 2014-03-17) they started drifting
    +    apart, with only grep.fullName being described in the command
    +    documentation.
    +
    +    In 434e6e753fe (config.txt: move grep.* to a separate file,
    +    2018-10-27) we gained the include, but didn't do this next step, let's
    +    do it now.
    +
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Documentation/config/grep.txt ##
     @@ Documentation/config/grep.txt: grep.extendedRegexp::
    @@ Documentation/config/grep.txt: grep.extendedRegexp::
      grep.threads::
     -	Number of grep worker threads to use.
     -	See `grep.threads` in linkgit:git-grep[1] for more information.
    -+	Number of grep worker threads to use. See `--threads`
    -+ifndef::git-grep[]
    -+	in linkgit:git-grep[1]
    -+endif::git-grep[]
    -+	for more information.
    ++	Number of grep worker threads to use. If unset (or set to 0), Git will
    ++	use as many threads as the number of logical cores available.
     +
     +grep.fullName::
     +	If set to true, enable `--full-name` option by default.
    @@ Documentation/config/grep.txt: grep.extendedRegexp::
      	If set to true, fall back to git grep --no-index if git grep
     
      ## Documentation/git-grep.txt ##
    -@@ Documentation/git-grep.txt: registered in the index file, or blobs in given tree objects.  Patterns
    - are lists of one or more search expressions separated by newline
    - characters.  An empty string as search expression matches all lines.
    - 
    --
    - OPTIONS
    - -------
    - --cached::
    -@@ Documentation/git-grep.txt: providing this option will cause it to die.
    - 	custom hunk-header' in linkgit:gitattributes[5]).
    - 
    - --threads <num>::
    --	Number of grep worker threads to use.
    --	See `grep.threads` in 'CONFIGURATION' for more information.
    -+	Number of grep worker threads to use. If not provided (or set to
    -+	0), Git will use as many worker threads as the number of logical
    -+	cores available. The default value can also be set with the
    -+	`grep.threads` configuration.
    - 
    - -f <file>::
    - 	Read patterns from <file>, one per line.
     @@ Documentation/git-grep.txt: performance in this case, it might be desirable to use `--threads=1`.
      CONFIGURATION
      -------------
    @@ Documentation/git-grep.txt: performance in this case, it might be desirable to u
     -grep.fallbackToNoIndex::
     -	If set to true, fall back to git grep --no-index if git grep
     -	is executed outside of a git repository.  Defaults to false.
    --
    -+:git-grep: 1
    ++include::includes/cmd-config-section-all.txt[]
    + 
     +include::config/grep.txt[]
      
      GIT
 2:  a25a6d89647 <  -:  ----------- doc: apply: unify configuration variables definitions
 -:  ----------- >  6:  58f8fccef11 send-email docs: de-duplicate configuration sections
 -:  ----------- >  7:  acb6fc2aef5 apply docs: de-duplicate configuration sections
 3:  699dda58fc6 !  8:  c8725b99483 doc: notes: unify config variable definitions
    @@
      ## Metadata ##
    -Author: Matheus Tavares <matheus.bernardino@usp.br>
    +Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    doc: notes: unify config variable definitions
    +    notes docs: de-duplicate configuration sections
     
    -    Unify duplicated configuration descriptions from git-notes.txt and
    -    config.txt in order to facilitate maintenance and update. There are some
    -    discrepancies between these two files: git-notes.txt received two
    -    updates that were not made in config.txt: see 66c4c32
    -    ("Documentation/notes: simplify treatment of default display refs",
    -    2010-05-08) and c5ce183 ("Documentation/notes: clean up description of
    -    rewriting configuration", 2010-05-08 ). And there was also an update to
    -    config.txt not propagated to git-notes.txt: see 2b4aa89 ("Documentation:
    -    basic configuration of notes.rewriteRef", 2011-09-13). Let's make sure
    -    to include all these three updates in the unified version.
    +    Let's also fix the "git-notes(1)" docs so that we link to
    +    "git-config(1)", not "git-log(1)" as a reference for the "notes" docs.
     
    -    Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Documentation/config/notes.txt ##
     @@ Documentation/config/notes.txt: notes.mergeStrategy::
    - 	Which merge strategy to choose by default when resolving notes
      	conflicts.  Must be one of `manual`, `ours`, `theirs`, `union`, or
      	`cat_sort_uniq`.  Defaults to `manual`.  See "NOTES MERGE STRATEGIES"
    --	section of linkgit:git-notes[1] for more information on each strategy.
    -+	section
    -+ifdef::git-notes[above]
    -+ifndef::git-notes[of linkgit:git-notes[1]]
    -+	for more information on each strategy.
    + 	section of linkgit:git-notes[1] for more information on each strategy.
    +++
    ++This setting can be overridden by passing the `--strategy` option to
    ++linkgit:git-notes[1].
      
      notes.<name>.mergeStrategy::
      	Which merge strategy to choose when doing a notes merge into
    - 	refs/notes/<name>.  This overrides the more general
    --	"notes.mergeStrategy".  See the "NOTES MERGE STRATEGIES" section in
    --	linkgit:git-notes[1] for more information on the available strategies.
    -+	"notes.mergeStrategy".  See the "NOTES MERGE STRATEGIES" section
    -+ifdef::git-notes[above]
    -+ifndef::git-notes[in linkgit:git-notes[1]]
    -+	for more information on the available strategies.
    +@@ Documentation/config/notes.txt: notes.<name>.mergeStrategy::
    + 	linkgit:git-notes[1] for more information on the available strategies.
      
      notes.displayRef::
     -	The (fully qualified) refname from which to show notes when
    @@ Documentation/config/notes.txt: notes.mergeStrategy::
     -	several times.  A warning will be issued for refs that do not
     -	exist, but a glob that does not match any refs is silently
     -	ignored.
    --+
    --This setting can be overridden with the `GIT_NOTES_DISPLAY_REF`
    --environment variable, which must be a colon separated list of refs or
    --globs.
    --+
    --The effective value of "core.notesRef" (possibly overridden by
    --GIT_NOTES_REF) is also implicitly added to the list of refs to be
    --displayed.
     +	Which ref (or refs, if a glob or specified more than once), in
     +	addition to the default set by `core.notesRef` or
     +	`GIT_NOTES_REF`, to read notes from when showing commit
     +	messages with the 'git log' family of commands.
    -+	This setting can be overridden on the command line or by the
    -+	`GIT_NOTES_DISPLAY_REF` environment variable.
    -+	See linkgit:git-log[1].
    + +
    + This setting can be overridden with the `GIT_NOTES_DISPLAY_REF`
    + environment variable, which must be a colon separated list of refs or
    + globs.
    + +
    ++A warning will be issued for refs that do not exist,
    ++but a glob that does not match any refs is silently ignored.
    +++
    ++This setting can be disabled by the `--no-notes` option to the 'git
    ++log' family of commands, or by the `--notes=<ref>` option accepted by
    ++those commands.
    +++
    + The effective value of "core.notesRef" (possibly overridden by
    + GIT_NOTES_REF) is also implicitly added to the list of refs to be
    + displayed.
      
      notes.rewrite.<command>::
      	When rewriting commits with <command> (currently `amend` or
    @@ Documentation/config/notes.txt: notes.mergeStrategy::
     +	notes from the original to the rewritten commit.  Defaults to
     +	`true`.  See also "`notes.rewriteRef`" below.
     ++
    -+This setting can be overridden by the `GIT_NOTES_REWRITE_REF`
    -+environment variable.
    ++This setting can be overridden with the `GIT_NOTES_REWRITE_REF`
    ++environment variable, which must be a colon separated list of refs or
    ++globs.
      
      notes.rewriteMode::
    --	When copying notes during a rewrite (see the
    --	"notes.rewrite.<command>" option), determines what to do if
    --	the target commit already has a note.  Must be one of
    --	`overwrite`, `concatenate`, `cat_sort_uniq`, or `ignore`.
    --	Defaults to `concatenate`.
    -+	When copying notes during a rewrite, what to do if the target
    -+	commit already has a note.  Must be one of `overwrite`,
    -+	`concatenate`, `cat_sort_uniq`, or `ignore`.  Defaults to
    -+	`concatenate`.
    - +
    - This setting can be overridden with the `GIT_NOTES_REWRITE_MODE`
    - environment variable.
    + 	When copying notes during a rewrite (see the
    +@@ Documentation/config/notes.txt: environment variable.
      
      notes.rewriteRef::
      	When copying notes during a rewrite, specifies the (fully
    @@ Documentation/config/notes.txt: notes.mergeStrategy::
     -environment variable, which must be a colon separated list of refs or
     -globs.
     +Can be overridden with the `GIT_NOTES_REWRITE_REF` environment variable.
    ++See `notes.rewrite.<command>` above for a further description of its format.
    +
    + ## Documentation/git-log.txt ##
    +@@ Documentation/git-log.txt: log.showSignature::
    + mailmap.*::
    + 	See linkgit:git-shortlog[1].
    + 
    +-notes.displayRef::
    +-	Which refs, in addition to the default set by `core.notesRef`
    +-	or `GIT_NOTES_REF`, to read notes from when showing commit
    +-	messages with the `log` family of commands.  See
    +-	linkgit:git-notes[1].
    +-+
    +-May be an unabbreviated ref name or a glob and may be specified
    +-multiple times.  A warning will be issued for refs that do not exist,
    +-but a glob that does not match any refs is silently ignored.
    +-+
    +-This setting can be disabled by the `--no-notes` option,
    +-overridden by the `GIT_NOTES_DISPLAY_REF` environment variable,
    +-and overridden by the `--notes=<ref>` option.
    ++include::includes/cmd-config-section-rest.txt[]
    ++
    ++include::config/notes.txt[]
    + 
    + GIT
    + ---
     
      ## Documentation/git-notes.txt ##
    +@@ Documentation/git-notes.txt: using the `--notes` option. Such notes are added as a patch commentary
    + after a three dash separator line.
    + 
    + To change which notes are shown by 'git log', see the
    +-"notes.displayRef" configuration in linkgit:git-log[1].
    ++"notes.displayRef" configuration in linkgit:git-config[1].
    + 
    + See the "notes.rewrite.<command>" configuration for a way to carry
    + notes across commands that rewrite commits.
     @@ Documentation/git-notes.txt: core.notesRef::
      	This setting can be overridden through the environment and
      	command line.
    @@ Documentation/git-notes.txt: core.notesRef::
     -+
     -This setting can be overridden by the `GIT_NOTES_REWRITE_REF`
     -environment variable.
    --
    ++include::includes/cmd-config-section-rest.txt[]
    + 
     -notes.rewriteMode::
     -	When copying notes during a rewrite, what to do if the target
     -	commit already has a note.  Must be one of `overwrite`,
    @@ Documentation/git-notes.txt: core.notesRef::
     -enable note rewriting.
     -+
     -Can be overridden with the `GIT_NOTES_REWRITE_REF` environment variable.
    --
    -+:git-notes: 1
     +include::config/notes.txt[]
      
    + 
      ENVIRONMENT
    - -----------
 -:  ----------- >  9:  cffa925ccf9 log docs: de-duplicate configuration sections


* [PATCH v3 4/5] date API: add basic API docs
  @ 2022-02-16  8:14 14%       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2022-02-16  8:14 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Add basic API doc comments to date.h, and while doing so move the
parse_date_format() function adjacent to show_date(). This way all the
"struct date_mode" functions are grouped together. Documenting the
rest is one of our #leftoverbits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 date.h | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/date.h b/date.h
index c3a00d08ed6..bbd6a6477b5 100644
--- a/date.h
+++ b/date.h
@@ -1,6 +1,12 @@
 #ifndef DATE_H
 #define DATE_H
 
+/**
+ * The date mode type. This has DATE_NORMAL at an explicit "= 0" to
+ * accommodate a memset([...], 0, [...]) initialization when "struct
+ * date_mode" is used as an embedded struct member, as in the case of
+ * e.g. "struct pretty_print_context" and "struct rev_info".
+ */
 enum date_mode_type {
 	DATE_NORMAL = 0,
 	DATE_HUMAN,
@@ -24,7 +30,7 @@ struct date_mode {
 	.type = DATE_NORMAL, \
 }
 
-/*
+/**
  * Convenience helper for passing a constant type, like:
  *
  *   show_date(t, tz, DATE_MODE(NORMAL));
@@ -32,7 +38,22 @@ struct date_mode {
 #define DATE_MODE(t) date_mode_from_type(DATE_##t)
 struct date_mode *date_mode_from_type(enum date_mode_type type);
 
+/**
+ * Format <'time', 'timezone'> into static memory according to 'mode'
+ * and return it. The mode is an initialized "struct date_mode"
+ * (usually from the DATE_MODE() macro).
+ */
 const char *show_date(timestamp_t time, int timezone, const struct date_mode *mode);
+
+/**
+ * Parse a date format for later use with show_date().
+ *
+ * When the "date_mode_type" is DATE_STRFTIME the "strftime_fmt"
+ * member of "struct date_mode" will be a malloc()'d format string to
+ * be used with strbuf_addftime().
+ */
+void parse_date_format(const char *format, struct date_mode *mode);
+
 void show_date_relative(timestamp_t time, struct strbuf *timebuf);
 int parse_date(const char *date, struct strbuf *out);
 int parse_date_basic(const char *date, timestamp_t *timestamp, int *offset);
@@ -41,7 +62,6 @@ void datestamp(struct strbuf *out);
 #define approxidate(s) approxidate_careful((s), NULL)
 timestamp_t approxidate_careful(const char *, int *);
 timestamp_t approxidate_relative(const char *date);
-void parse_date_format(const char *format, struct date_mode *mode);
 int date_overflows(timestamp_t date);
 time_t tm_to_time_t(const struct tm *tm);
 #endif
-- 
2.35.1.1028.g2d2d4be19de
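
The "= 0" rationale in the enum comment above can be shown with a minimal stand-alone sketch: zero-filling a struct that embeds a date mode with memset() selects the DATE_NORMAL equivalent because that enumerator's value is 0, and the explicit "= 0" makes that a documented guarantee rather than an accident of ordering. The `demo_*` types are hypothetical mirrors of date.h, not git's definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mirror of the date.h layout; not git's real definitions. */
enum demo_date_mode_type {
	DEMO_DATE_NORMAL = 0, /* explicit 0: a zeroed struct means "normal" */
	DEMO_DATE_HUMAN,
	DEMO_DATE_RELATIVE,
};

struct demo_date_mode {
	enum demo_date_mode_type type;
	int local;
};

/* Stand-in for a larger struct embedding a date mode. */
struct demo_context {
	int abbrev;
	struct demo_date_mode date_mode;
};

static void demo_context_init(struct demo_context *ctx)
{
	/* Zero-filling the whole struct selects DEMO_DATE_NORMAL. */
	memset(ctx, 0, sizeof(*ctx));
	assert(ctx->date_mode.type == DEMO_DATE_NORMAL);
}
```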



* [PATCH v2 4/5] date API: add basic API docs
  @ 2022-02-04 23:53 14%     ` Ævar Arnfjörð Bjarmason
    1 sibling, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2022-02-04 23:53 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Add basic API doc comments to date.h, and while doing so move the
parse_date_format() function adjacent to show_date(). This way all the
"struct date_mode" functions are grouped together. Documenting the
rest is one of our #leftoverbits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 date.h | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/date.h b/date.h
index c3a00d08ed6..4ed83506de9 100644
--- a/date.h
+++ b/date.h
@@ -1,6 +1,12 @@
 #ifndef DATE_H
 #define DATE_H
 
+/**
+ * The date mode type. This has DATE_NORMAL at an explicit "= 0" to
+ * accommodate a memset([...], 0, [...]) initialization when "struct
+ * date_mode" is used as an embedded struct member, as in the case of
+ * e.g. "struct pretty_print_context" and "struct rev_info".
+ */
 enum date_mode_type {
 	DATE_NORMAL = 0,
 	DATE_HUMAN,
@@ -24,7 +30,7 @@ struct date_mode {
 	.type = DATE_NORMAL, \
 }
 
-/*
+/**
  * Convenience helper for passing a constant type, like:
  *
  *   show_date(t, tz, DATE_MODE(NORMAL));
@@ -32,7 +38,21 @@ struct date_mode {
 #define DATE_MODE(t) date_mode_from_type(DATE_##t)
 struct date_mode *date_mode_from_type(enum date_mode_type type);
 
+/**
+ * Show the date given an initialized "struct date_mode" (usually from
+ * the DATE_MODE() macro).
+ */
 const char *show_date(timestamp_t time, int timezone, const struct date_mode *mode);
+
+/**
+ * Parse a date format for later use with show_date().
+ *
+ * When the "date_mode_type" is DATE_STRFTIME the "strftime_fmt"
+ * member of "struct date_mode" will be a malloc()'d format string to
+ * be used with strbuf_addftime().
+ */
+void parse_date_format(const char *format, struct date_mode *mode);
+
 void show_date_relative(timestamp_t time, struct strbuf *timebuf);
 int parse_date(const char *date, struct strbuf *out);
 int parse_date_basic(const char *date, timestamp_t *timestamp, int *offset);
@@ -41,7 +61,6 @@ void datestamp(struct strbuf *out);
 #define approxidate(s) approxidate_careful((s), NULL)
 timestamp_t approxidate_careful(const char *, int *);
 timestamp_t approxidate_relative(const char *date);
-void parse_date_format(const char *format, struct date_mode *mode);
 int date_overflows(timestamp_t date);
 time_t tm_to_time_t(const struct tm *tm);
 #endif
-- 
2.35.1.940.ge7a5b4b05f2
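
The ownership rule in the parse_date_format() comment (a malloc()'d "strftime_fmt" that the caller later frees) is sketched below with hypothetical stand-ins. `demo_parse_format()`, `demo_show()` and `demo_mode_release()` are not git functions. The "format:" prefix mimics git's `--date=format:...` spelling, and formatting in UTC via gmtime() is a simplification.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical, simplified stand-in for "struct date_mode". */
struct demo_mode {
	int is_strftime;
	char *strftime_fmt; /* malloc()'d when is_strftime is set */
};

/* Parse "format:<fmt>" into a mode the caller must later release. */
static void demo_parse_format(const char *format, struct demo_mode *mode)
{
	static const char prefix[] = "format:";
	size_t plen = sizeof(prefix) - 1;

	assert(format && mode);
	mode->is_strftime = 0;
	mode->strftime_fmt = NULL;
	if (!strncmp(format, prefix, plen)) {
		size_t len = strlen(format + plen) + 1;

		mode->strftime_fmt = malloc(len);
		if (!mode->strftime_fmt)
			abort();
		memcpy(mode->strftime_fmt, format + plen, len);
		mode->is_strftime = 1;
	}
}

/* Free the owned format string; safe to call on any parsed mode. */
static void demo_mode_release(struct demo_mode *mode)
{
	free(mode->strftime_fmt);
	mode->strftime_fmt = NULL;
}

/* Format a UTC timestamp with the parsed strftime() format. */
static size_t demo_show(time_t t, const struct demo_mode *mode,
			char *buf, size_t n)
{
	const struct tm *tm = gmtime(&t);

	if (!mode->is_strftime || !tm)
		return 0;
	return strftime(buf, n, mode->strftime_fmt, tm);
}
```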



* [PATCH 4/5] date API: add basic API docs
  @ 2022-02-02 21:03 14%   ` Ævar Arnfjörð Bjarmason
    1 sibling, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2022-02-02 21:03 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Add basic API doc comments to date.h, and while doing so move the
parse_date_format() function adjacent to show_date(). This way all the
"struct date_mode" functions are grouped together. Documenting the
rest is one of our #leftoverbits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 date.h | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/date.h b/date.h
index c3a00d08ed6..4ed83506de9 100644
--- a/date.h
+++ b/date.h
@@ -1,6 +1,12 @@
 #ifndef DATE_H
 #define DATE_H
 
+/**
+ * The date mode type. This has DATE_NORMAL at an explicit "= 0" to
+ * accommodate a memset([...], 0, [...]) initialization when "struct
+ * date_mode" is used as an embedded struct member, as in the case of
+ * e.g. "struct pretty_print_context" and "struct rev_info".
+ */
 enum date_mode_type {
 	DATE_NORMAL = 0,
 	DATE_HUMAN,
@@ -24,7 +30,7 @@ struct date_mode {
 	.type = DATE_NORMAL, \
 }
 
-/*
+/**
  * Convenience helper for passing a constant type, like:
  *
  *   show_date(t, tz, DATE_MODE(NORMAL));
@@ -32,7 +38,21 @@ struct date_mode {
 #define DATE_MODE(t) date_mode_from_type(DATE_##t)
 struct date_mode *date_mode_from_type(enum date_mode_type type);
 
+/**
+ * Show the date given an initialized "struct date_mode" (usually from
+ * the DATE_MODE() macro).
+ */
 const char *show_date(timestamp_t time, int timezone, const struct date_mode *mode);
+
+/**
+ * Parse a date format for later use with show_date().
+ *
+ * When the "date_mode_type" is DATE_STRFTIME the "strftime_fmt"
+ * member of "struct date_mode" will be a malloc()'d format string to
+ * be used with strbuf_addftime().
+ */
+void parse_date_format(const char *format, struct date_mode *mode);
+
 void show_date_relative(timestamp_t time, struct strbuf *timebuf);
 int parse_date(const char *date, struct strbuf *out);
 int parse_date_basic(const char *date, timestamp_t *timestamp, int *offset);
@@ -41,7 +61,6 @@ void datestamp(struct strbuf *out);
 #define approxidate(s) approxidate_careful((s), NULL)
 timestamp_t approxidate_careful(const char *, int *);
 timestamp_t approxidate_relative(const char *date);
-void parse_date_format(const char *format, struct date_mode *mode);
 int date_overflows(timestamp_t date);
 time_t tm_to_time_t(const struct tm *tm);
 #endif
-- 
2.35.0.913.g12b4baa2536
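
The v3 version of this patch documents that show_date() formats "into static memory". The usual consequence of that contract, that a second call overwrites the buffer the first call returned, is shown here with a hypothetical `demo_show_number()`. Copy the result if it needs to outlive the next call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical formatter returning a pointer into a static buffer,
 * the same kind of contract the v3 comment documents for show_date().
 */
static const char *demo_show_number(long n)
{
	static char buf[32];

	snprintf(buf, sizeof(buf), "%ld", n);
	return buf;
}

static void demo_static_pitfall(char *saved, size_t n)
{
	const char *a = demo_show_number(1);

	/* Copy the result before the next call overwrites it. */
	snprintf(saved, n, "%s", a);
	demo_show_number(2);
	assert(strcmp(a, "2") == 0); /* "a" now sees the second result */
}
```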



* [PATCH v2 07/21] refs/files: remove "name exist?" check in lock_ref_oid_basic()
  @ 2021-10-16  9:39  8%   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2021-10-16  9:39 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Han-Wen Nienhuys,
	Ævar Arnfjörð Bjarmason

In lock_ref_oid_basic() we'll happily lock a reference that doesn't
exist yet. That's normal, and is how references are initially born,
but we don't need to retain checks here in lock_ref_oid_basic() about
the state of the ref, when what we're checking is either checked
already, or something we're about to discover by trying to lock the
ref with raceproof_create_file().

The one exception is the caller in files_reflog_expire(), who passes
us a "type" to find out if the reference is a symref or not. We can
move that logic over to that caller, which can now defer its
discovery of whether or not the ref is a symref until it's needed. In
the preceding commit an exhaustive regression test was added for that
case in a new test in "t1417-reflog-updateref.sh".

The improved diagnostics here were added in
5b2d8d6f218 (lock_ref_sha1_basic(): improve diagnostics for ref D/F
conflicts, 2015-05-11), and then much of the surrounding code went
away recently in my 245fbba46d6 (refs/files: remove unused "errno ==
EISDIR" code, 2021-08-23).

The refs_resolve_ref_unsafe() code being removed here looks like it
should be tasked with doing that, but it's actually redundant to other
code.

The reason for that is, as noted in 245fbba46d6, that this once widely
used function now only has a handful of callers left, all of which
handle this case themselves.

To the extent that we're racy between their check and ours, removing
this check actually improves the situation, as we'll be doing fewer
things between the not-under-lock initial check and acquiring the
lock.
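
The "discover it by trying to lock" approach can be sketched with POSIX open(2) and O_CREAT | O_EXCL: the existence check and the lock acquisition are one atomic step, so there is no window between them. This is a hypothetical stand-alone illustration (`demo_lock()` and `demo_unlock()` are made-up names), not git's lockfile.c or raceproof_create_file(), which do more work, for example around leading directories.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: atomically create "<path>.lock". Returns an fd,
 * or -1 with errno set (EEXIST if another process holds the lock).
 * The open() call is itself the existence check, so there is no
 * separate "does it exist?" step for another process to race against.
 */
static int demo_lock(const char *path, char *lockpath, size_t n)
{
	int len = snprintf(lockpath, n, "%s.lock", path);

	if (len < 0 || (size_t)len >= n) {
		errno = ENAMETOOLONG;
		return -1;
	}
	return open(lockpath, O_RDWR | O_CREAT | O_EXCL, 0666);
}

/* Release the lock by closing and removing the lockfile. */
static void demo_unlock(int fd, const char *lockpath)
{
	assert(fd >= 0);
	close(fd);
	unlink(lockpath);
}
```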

Why this is OK for all the remaining callers of lock_ref_oid_basic()
is noted below. There are only two of those callers:

* "git branch -[cm] <oldbranch> <newbranch>":

  In files_copy_or_rename_ref() we'll call this when we copy or rename
  refs via rename_ref() and copy_ref(), but only after we've checked
  if the refname exists already via its own call to
  refs_resolve_ref_unsafe() and refs_rename_ref_available().

  As the updated comment to the latter here notes, neither of those is
  actually needed. If we deleted not only this code but also
  refs_rename_ref_available(), we'd do just fine; we'd just emit a
  less friendly error message if e.g. "git branch -m A B/C" would have
  a D/F conflict with a "B" file.

  Actually, we'd probably die before that if reflogs for the
  branch existed, i.e. when we try to rename() or copy_file() the
  relevant reflog, since if we've got a D/F conflict with a branch
  name we'll probably also have the same with its reflogs (though not
  necessarily: the branch might have reflogs that don't conflict).

  As some #leftoverbits that code seems buggy to me, i.e. the reflog
  "protocol" should be to get a lock on the main ref, and then perform
  ref and/or reflog operations. That code dates back to
  c976d415e53 (git-branch: add options and tests for branch renaming,
  2006-11-28) and probably pre-dated the solidifying of that
  convention. But in any case, that edge case is not our bug or
  problem right now.

* "git reflog expire <ref>":

  In files_reflog_expire() we'll call this without previous ref
  existence checking in files-backend.c, but that code is in turn
  called by code that's just finished checking if the refname whose
  reflog we're expiring exists.

  See ae35e16cd43 (reflog expire: don't lock reflogs using previously
  seen OID, 2021-08-23) for the current state of that code, and
  5e6f003ca8a (reflog_expire(): ignore --updateref for symbolic
  references, 2015-03-03) for the code we'd break if we only did a
  "update = !!ref" here, which is covered by the aforementioned
  regression test in "t1417-reflog-updateref.sh".
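
The replacement "update" logic in the files_reflog_expire() hunk can be modeled in isolation. In this sketch the ref is only resolved when --updateref is in effect and there is a kept OID, and a symref is never updated, which a plain "update = !!ref" would get wrong. All names here (`demo_should_update()`, the flag values, the fake resolver) are hypothetical stand-ins for git's real constants and refs_werrres_ref_unsafe().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag values, standing in for git's real constants. */
#define DEMO_EXPIRE_UPDATE_REF 0x1
#define DEMO_REF_ISSYMREF 0x1

/* Result of a (hypothetical) non-recursive ref lookup. */
struct demo_resolved {
	bool found; /* did the lookup return a ref at all? */
	int type;   /* DEMO_REF_ISSYMREF or 0 */
};

static int demo_resolve_calls;

/* Fake resolver: "HEAD" is a symref, "refs/heads/main" a plain ref. */
static struct demo_resolved demo_resolve_fake(const char *refname)
{
	struct demo_resolved r = { false, 0 };

	demo_resolve_calls++;
	if (!strcmp(refname, "HEAD")) {
		r.found = true;
		r.type = DEMO_REF_ISSYMREF;
	} else if (!strcmp(refname, "refs/heads/main")) {
		r.found = true;
	}
	return r;
}

/*
 * Defer the lookup until it matters, then update only a ref that
 * resolved and is not a symref.
 */
static int demo_should_update(unsigned flags, bool last_kept_is_null,
			      struct demo_resolved (*resolve)(const char *),
			      const char *refname)
{
	struct demo_resolved r;

	assert(resolve && refname);
	if (!(flags & DEMO_EXPIRE_UPDATE_REF) || last_kept_is_null)
		return 0;
	r = resolve(refname);
	return r.found && !(r.type & DEMO_REF_ISSYMREF);
}
```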

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 refs/files-backend.c | 48 ++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0af6ee44552..16e78326381 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1001,7 +1001,7 @@ static int create_reflock(const char *path, void *cb)
  * Locks a ref returning the lock on success and NULL on failure.
  */
 static struct ref_lock *lock_ref_oid_basic(struct files_ref_store *refs,
-					   const char *refname, int *type,
+					   const char *refname,
 					   struct strbuf *err)
 {
 	struct strbuf ref_file = STRBUF_INIT;
@@ -1013,16 +1013,6 @@ static struct ref_lock *lock_ref_oid_basic(struct files_ref_store *refs,
 	CALLOC_ARRAY(lock, 1);

 	files_ref_path(refs, &ref_file, refname);
-	if (!refs_resolve_ref_unsafe(&refs->base, refname,
-				     RESOLVE_REF_NO_RECURSE,
-				     &lock->old_oid, type)) {
-		if (!refs_verify_refname_available(&refs->base, refname,
-						   NULL, NULL, err))
-			strbuf_addf(err, "unable to resolve reference '%s': %s",
-				    refname, strerror(errno));
-
-		goto error_return;
-	}

 	/*
 	 * If the ref did not exist and we are creating it, make sure
@@ -1364,14 +1354,14 @@ static int commit_ref_update(struct files_ref_store *refs,
 			     struct strbuf *err);

 /*
- * Check whether an attempt to rename old_refname to new_refname would
- * cause a D/F conflict with any existing reference (other than
- * possibly old_refname). If there would be a conflict, emit an error
+ * Emit a better error message than lockfile.c's
+ * unable_to_lock_message() would in case there is a D/F conflict with
+ * another existing reference. If there would be a conflict, emit an error
  * message and return false; otherwise, return true.
  *
  * Note that this function is not safe against all races with other
- * processes (though rename_ref() catches some races that might get by
- * this check).
+ * processes, and that's not its job. We'll emit a more verbose error on D/F
+ * conflicts if we get past it into lock_ref_oid_basic().
  */
 static int refs_rename_ref_available(struct ref_store *refs,
 			      const char *old_refname,
@@ -1492,7 +1482,7 @@ static int files_copy_or_rename_ref(struct ref_store *ref_store,

 	logmoved = log;

-	lock = lock_ref_oid_basic(refs, newrefname, NULL, &err);
+	lock = lock_ref_oid_basic(refs, newrefname, &err);
 	if (!lock) {
 		if (copy)
 			error("unable to copy '%s' to '%s': %s", oldrefname, newrefname, err.buf);
@@ -1514,7 +1504,7 @@ static int files_copy_or_rename_ref(struct ref_store *ref_store,
 	goto out;

  rollback:
-	lock = lock_ref_oid_basic(refs, oldrefname, NULL, &err);
+	lock = lock_ref_oid_basic(refs, oldrefname, &err);
 	if (!lock) {
 		error("unable to lock %s for rollback: %s", oldrefname, err.buf);
 		strbuf_release(&err);
@@ -1921,7 +1911,7 @@ static int files_create_symref(struct ref_store *ref_store,
 	struct ref_lock *lock;
 	int ret;

-	lock = lock_ref_oid_basic(refs, refname, NULL, &err);
+	lock = lock_ref_oid_basic(refs, refname, &err);
 	if (!lock) {
 		error("%s", err.buf);
 		strbuf_release(&err);
@@ -3125,7 +3115,6 @@ static int files_reflog_expire(struct ref_store *ref_store,
 	struct strbuf log_file_sb = STRBUF_INIT;
 	char *log_file;
 	int status = 0;
-	int type;
 	struct strbuf err = STRBUF_INIT;
 	const struct object_id *oid;

@@ -3139,7 +3128,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
 	 * reference itself, plus we might need to update the
 	 * reference if --updateref was specified:
 	 */
-	lock = lock_ref_oid_basic(refs, refname, &type, &err);
+	lock = lock_ref_oid_basic(refs, refname, &err);
 	if (!lock) {
 		error("cannot lock ref '%s': %s", refname, err.buf);
 		strbuf_release(&err);
@@ -3201,9 +3190,20 @@ static int files_reflog_expire(struct ref_store *ref_store,
 		 * a reference if there are no remaining reflog
 		 * entries.
 		 */
-		int update = (flags & EXPIRE_REFLOGS_UPDATE_REF) &&
-			!(type & REF_ISSYMREF) &&
-			!is_null_oid(&cb.last_kept_oid);
+		int update = 0;
+
+		if ((flags & EXPIRE_REFLOGS_UPDATE_REF) &&
+		    !is_null_oid(&cb.last_kept_oid)) {
+			int ignore_errno;
+			int type;
+			const char *ref;
+
+			ref = refs_werrres_ref_unsafe(&refs->base, refname,
+						      RESOLVE_REF_NO_RECURSE,
+						      NULL, &type,
+						      &ignore_errno);
+			update = !!(ref && !(type & REF_ISSYMREF));
+		}

 		if (close_lock_file_gently(&reflog_lock)) {
 			status |= error("couldn't write %s: %s", log_file,
-- 
2.33.1.1338.g20da966911a

^ permalink raw reply related	[relevance 8%]

* [PATCH 0/2] test-lib.sh: add BAIL_OUT function, use it for SANITIZE=leak
@ 2021-10-14  0:47 16% Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2021-10-14  0:47 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

This series adds a BAIL_OUT function, and uses it when the new
GIT_TEST_PASSING_SANITIZE_LEAK=true mode is misused.

Once we have this function we'll be able to use it for any other error
that's a cause for aborting the entire test run.

I experimented with making BUG() and error() always be a "BAIL_OUT". I
think that's worth pursuing, but e.g. for the error about missing
"&&-chains" we'd need to support emitting multi-line messages.

TAP consumers only understand what follows the "Bail out!" message up
to the first "\n", so we can't quote the entire "test_expect_success",
as the "&&-chain" error does. I think emitting them with "say_error()"
beforehand (piped with ">&7" in the case of "BUG()") should work, but
let's leave those #leftoverbits for later.

Ævar Arnfjörð Bjarmason (2):
  test-lib.sh: de-duplicate error() teardown code
  test-lib.sh: use "Bail out!" syntax on bad SANITIZE=leak use

 t/test-lib.sh | 28 +++++++++++++++++++++-------
 1 file changed, 21 insertions(+), 7 deletions(-)

-- 
2.33.1.1346.g48288c3c089

^ permalink raw reply	[relevance 16%]

* [PATCH 06/20] refs/files: remove "name exist?" check in lock_ref_oid_basic()
  @ 2021-10-14  0:06 10% ` Ævar Arnfjörð Bjarmason
    1 sibling, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2021-10-14  0:06 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Han-Wen Nienhuys,
	Ævar Arnfjörð Bjarmason

In lock_ref_oid_basic() we'll happily lock a reference that doesn't
exist yet. That's normal, and is how references are initially born,
but we don't need to keep checks in lock_ref_oid_basic() about the
state of the ref: what they check has either been checked already by
the callers, or is something we're about to discover anyway by trying
to lock the ref with raceproof_create_file().

The improved diagnostics here were added in
5b2d8d6f218 (lock_ref_sha1_basic(): improve diagnostics for ref D/F
conflicts, 2015-05-11), and then much of the surrounding code went
away recently in my 245fbba46d6 (refs/files: remove unused "errno ==
EISDIR" code, 2021-08-23).

The refs_resolve_ref_unsafe() code being removed here looks like it's
responsible for those diagnostics, but it's actually redundant with
other code.

The reason for that is that, as noted in 245fbba46d6, this once widely
used function now only has a handful of callers left, all of which
handle this case themselves.

To the extent that we're racy between their check and ours, removing
this check actually improves the situation, as we'll be doing fewer
things between the not-under-lock initial check and acquiring the
lock.

Why this is OK for all the remaining callers of lock_ref_oid_basic()
is noted below. There are only two of those callers:

* "git branch -[cm] <oldbranch> <newbranch>":

  In files_copy_or_rename_ref() we'll call this when we copy or rename
  refs via rename_ref() and copy_ref(), but only after we've checked
  if the refname exists already via its own call to
  refs_resolve_ref_unsafe() and refs_rename_ref_available().

  As the updated comment to the latter here notes, neither of those is
  actually needed. If we delete not only this code but also
  refs_rename_ref_available() we'll do just fine; we'll just emit a
  less friendly error message if e.g. "git branch -m A B/C" would have
  a D/F conflict with a "B" file.

  Actually we'd probably die before that if reflogs for the branch
  existed, i.e. when we try to rename() or copy_file() the relevant
  reflog, since if we've got a D/F conflict with a branch name we'll
  probably also have the same with its reflogs (though not
  necessarily: the branch might have reflogs that don't conflict).

  As some #leftoverbits: that code seems buggy to me. The reflog
  "protocol" should be to get a lock on the main ref, and then perform
  ref and/or reflog operations. That code dates back to
  c976d415e53 (git-branch: add options and tests for branch renaming,
  2006-11-28) and probably predates the solidifying of that
  convention. But in any case, that edge case is not our bug or
  problem right now.

* "git reflog expire <ref>":

  In files_reflog_expire() we'll call this without previous ref
  existence checking in files-backend.c, but that code is in turn
  called by code that's just finished checking if the refname whose
  reflog we're expiring exists.

  See ae35e16cd43 (reflog expire: don't lock reflogs using previously
  seen OID, 2021-08-23) for the current state of that code.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 refs/files-backend.c | 20 +++++---------------
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0af6ee44552..0dd21b2e205 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1013,16 +1013,6 @@ static struct ref_lock *lock_ref_oid_basic(struct files_ref_store *refs,
 	CALLOC_ARRAY(lock, 1);

 	files_ref_path(refs, &ref_file, refname);
-	if (!refs_resolve_ref_unsafe(&refs->base, refname,
-				     RESOLVE_REF_NO_RECURSE,
-				     &lock->old_oid, type)) {
-		if (!refs_verify_refname_available(&refs->base, refname,
-						   NULL, NULL, err))
-			strbuf_addf(err, "unable to resolve reference '%s': %s",
-				    refname, strerror(errno));
-
-		goto error_return;
-	}

 	/*
 	 * If the ref did not exist and we are creating it, make sure
@@ -1364,14 +1354,14 @@ static int commit_ref_update(struct files_ref_store *refs,
 			     struct strbuf *err);

 /*
- * Check whether an attempt to rename old_refname to new_refname would
- * cause a D/F conflict with any existing reference (other than
- * possibly old_refname). If there would be a conflict, emit an error
+ * Emit a better error message than lockfile.c's
+ * unable_to_lock_message() would in case there is a D/F conflict with
+ * another existing reference. If there would be a conflict, emit an error
  * message and return false; otherwise, return true.
  *
  * Note that this function is not safe against all races with other
- * processes (though rename_ref() catches some races that might get by
- * this check).
+ * processes, and that's not its job. We'll emit a more verbose error on D/F
+ * conflicts if we get past it into lock_ref_oid_basic().
  */
 static int refs_rename_ref_available(struct ref_store *refs,
 			      const char *old_refname,
-- 
2.33.1.1346.g48288c3c089

^ permalink raw reply related	[relevance 10%]

* [PATCH v2 1/2] object.[ch]: mark object type names for translation
  2021-10-04 14:27  7% ` [PATCH v2 " Ævar Arnfjörð Bjarmason
@ 2021-10-04 14:27 13%   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2021-10-04 14:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Mark the "commit", "tree", "blob" and "tag" types for translation, and
add an extern "unknown type" string for the OBJ_NONE case.

It is usually bad practice to translate individual words like this,
but for e.g. the list output emitted by the "short object ID dead
is ambiguous" advice, it makes sense.

A subsequent commit will make that output translatable, and use these
translation markings to do so. Well, we won't use "commit", but let's
mark it up anyway for consistency. It'll probably come in handy sooner
than later to have it already be translated, and it's not much of a
burden to place on translators if they're translating the other three
object types anyway.

Aside: I think it would probably make sense to change the "NULL" entry
for type_name() to be the "unknown type". I've run into cases where
type_name() was unconditionally interpolated in e.g. an sprintf()
format, but let's leave that for #leftoverbits as that would be
changing the behavior of the type_name() function.

All of these will be new in the git.pot file, except "blob" which will
be shared with a "cat-file" command-line option, see
7bcf3414535 (cat-file --textconv/--filters: allow specifying the path
separately, 2016-09-09) for its introduction.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object.c | 27 +++++++++++++++++++++++----
 object.h |  1 +
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/object.c b/object.c
index 4e85955a941..47dbe0d8a2a 100644
--- a/object.c
+++ b/object.c
@@ -22,12 +22,31 @@ struct object *get_indexed_object(unsigned int idx)

 static const char *object_type_strings[] = {
 	NULL,		/* OBJ_NONE = 0 */
-	"commit",	/* OBJ_COMMIT = 1 */
-	"tree",		/* OBJ_TREE = 2 */
-	"blob",		/* OBJ_BLOB = 3 */
-	"tag",		/* OBJ_TAG = 4 */
+	/*
+	 * TRANSLATORS: "commit", "tree", "blob" and "tag" are the
+	 * name of Git's object types. These names are interpolated
+	 * stand-alone when doing so is unambiguous for translation
+	 * and doesn't require extra context. E.g. as part of an
+	 * already-translated string that needs to have a type name
+	 * quoted verbatim, or the short description of a command-line
+	 * option expecting a given type.
+	 */
+	N_("commit"),	/* OBJ_COMMIT = 1 */
+	N_("tree"),	/* OBJ_TREE = 2 */
+	N_("blob"),	/* OBJ_BLOB = 3 */
+	N_("tag"),	/* OBJ_TAG = 4 */
 };

+/*
+ * TRANSLATORS: This is the short type name of an object that's not
+ * one of Git's known object types, as opposed to "commit", "tree",
+ * "blob" and "tag" above.
+ *
+ * A user is unlikely to ever encounter these, but they can be
+ * manually created with "git hash-object --literally".
+ */
+const char *unknown_type = N_("unknown type");
+
 const char *type_name(unsigned int type)
 {
 	if (type >= ARRAY_SIZE(object_type_strings))
diff --git a/object.h b/object.h
index 549f2d256bc..0510dc4b3ea 100644
--- a/object.h
+++ b/object.h
@@ -91,6 +91,7 @@ struct object {
 	struct object_id oid;
 };

+extern const char *unknown_type;
 const char *type_name(unsigned int type);
 int type_from_string_gently(const char *str, ssize_t, int gentle);
 #define type_from_string(str) type_from_string_gently(str, -1, 0)
-- 
2.33.0.1409.ge73c1ecc5b4

^ permalink raw reply related	[relevance 13%]

* [PATCH v2 0/2] i18n: improve translatability of ambiguous object output
  @ 2021-10-04 14:27  7% ` Ævar Arnfjörð Bjarmason
  2021-10-04 14:27 13%   ` [PATCH v2 1/2] object.[ch]: mark object type names for translation Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2021-10-04 14:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

A mostly-rewritten version in response to the discussion concluding at
http://lore.kernel.org/git/YVrudGOcUxblsfPY@coredump.intra.peff.net;
thanks a lot for the thorough review, Jeff!

Ævar Arnfjörð Bjarmason (2):
  object.[ch]: mark object type names for translation
  object-name: make ambiguous object output translatable

 object-name.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++-----
 object.c      | 27 ++++++++++++++++---
 object.h      |  1 +
 3 files changed, 90 insertions(+), 10 deletions(-)

Range-diff against v1:
1:  7085f951a12 ! 1:  55bde16aa23 object-name tests: tighten up advise() output test
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    object-name tests: tighten up advise() output test
    +    object.[ch]: mark object type names for translation
     
    -    Change tests added in 1ffa26c4614 (get_short_sha1: list ambiguous
    -    objects on error, 2016-09-26) to only care about the OIDs that are
    -    listed, which is what the test is trying to check for.
    +    Mark the "commit", "tree", "blob" and "tag" types for translation, and
    +    add an extern "unknown type" string for the OBJ_NONE case.
     
    -    This isn't needed by the subsequent commit, which won't change any of
    -    the output, but a mere tightening of the tests assertions to more
    -    closely match what we really want to test for here.
    +    It is usually bad practice to translate individual words like this,
    +    but for e.g. the list list output emitted by the "short object ID dead
    +    is ambiguous" advice it makes sense.
     
    -    Now if the advise() message itself were change the phrasing around the
    -    list of OIDs we won't have this test break. We're assuming that such
    -    output won't have a need to indent anything except the OIDs.
    +    A subsequent commit will make that output translatable, and use these
    +    translation markings to do so. Well, we won't use "commit", but let's
    +    mark it up anyway for consistency. It'll probably come in handy sooner
    +    than later to have it already be translated, and it's to much of a
    +    burden to place on translators if they're translating the other three
    +    object types anyway.
    +
    +    Aside: I think it would probably make sense to change the "NULL" entry
    +    for type_name() to be the "unknown type". I've ran into cases where
    +    type_name() was unconditionally interpolated in e.g. an sprintf()
    +    format, but let's leave that for #leftoverbits as that would be
    +    changing the behavior of the type_name() function.
    +
    +    All of these will be new in the git.pot file, except "blob" which will
    +    be shared with a "cat-file" command-line option, see
    +    7bcf3414535 (cat-file --textconv/--filters: allow specifying the path
    +    separately, 2016-09-09) for its introduction.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    - ## t/t1512-rev-parse-disambiguation.sh ##
    -@@ t/t1512-rev-parse-disambiguation.sh: test_expect_success 'ambiguity errors are not repeated (peel)' '
    + ## object.c ##
    +@@ object.c: struct object *get_indexed_object(unsigned int idx)
      
    - test_expect_success 'ambiguity hints' '
    - 	test_must_fail git rev-parse 000000000 2>stderr &&
    --	grep ^hint: stderr >hints &&
    --	# 16 candidates, plus one intro line
    --	test_line_count = 17 hints
    -+	grep "^hint:   " stderr >hints &&
    -+	# 16 candidates, minus surrounding prose
    -+	test_line_count = 16 hints
    - '
    + static const char *object_type_strings[] = {
    + 	NULL,		/* OBJ_NONE = 0 */
    +-	"commit",	/* OBJ_COMMIT = 1 */
    +-	"tree",		/* OBJ_TREE = 2 */
    +-	"blob",		/* OBJ_BLOB = 3 */
    +-	"tag",		/* OBJ_TAG = 4 */
    ++	/*
    ++	 * TRANSLATORS: "commit", "tree", "blob" and "tag" are the
    ++	 * name of Git's object types. These names are interpolated
    ++	 * stand-alone when doing so is unambiguous for translation
    ++	 * and doesn't require extra context. E.g. as part of an
    ++	 * already-translated string that needs to have a type name
    ++	 * quoted verbatim, or the short description of a command-line
    ++	 * option expecting a given type.
    ++	 */
    ++	N_("commit"),	/* OBJ_COMMIT = 1 */
    ++	N_("tree"),	/* OBJ_TREE = 2 */
    ++	N_("blob"),	/* OBJ_BLOB = 3 */
    ++	N_("tag"),	/* OBJ_TAG = 4 */
    + };
      
    - test_expect_success 'ambiguity hints respect type' '
    - 	test_must_fail git rev-parse 000000000^{commit} 2>stderr &&
    --	grep ^hint: stderr >hints &&
    --	# 5 commits, 1 tag (which is a committish), plus intro line
    --	test_line_count = 7 hints
    -+	grep "^hint:   " stderr >hints &&
    -+	# 5 commits, 1 tag (which is a committish), minus surrounding prose
    -+	test_line_count = 6 hints
    - '
    - 
    - test_expect_success 'failed type-selector still shows hint' '
    -@@ t/t1512-rev-parse-disambiguation.sh: test_expect_success 'failed type-selector still shows hint' '
    - 	echo 851 | git hash-object --stdin -w &&
    - 	echo 872 | git hash-object --stdin -w &&
    - 	test_must_fail git rev-parse ee3d^{commit} 2>stderr &&
    --	grep ^hint: stderr >hints &&
    --	test_line_count = 3 hints
    -+	grep "^hint:   " stderr >hints &&
    -+	test_line_count = 2 hints
    - '
    ++/*
    ++ * TRANSLATORS: This is the short type name of an object that's not
    ++ * one of Git's known object types, as opposed to "commit", "tree",
    ++ * "blob" and "tag" above.
    ++ *
    ++ * A user is unlikely to ever encounter these, but they can be
    ++ * manually created with "git hash-object --literally".
    ++ */
    ++const char *unknown_type = N_("unknown type");
    ++
    + const char *type_name(unsigned int type)
    + {
    + 	if (type >= ARRAY_SIZE(object_type_strings))
    +
    + ## object.h ##
    +@@ object.h: struct object {
    + 	struct object_id oid;
    + };
      
    - test_expect_success 'core.disambiguate config can prefer types' '
    ++extern const char *unknown_type;
    + const char *type_name(unsigned int type);
    + int type_from_string_gently(const char *str, ssize_t, int gentle);
    + #define type_from_string(str) type_from_string_gently(str, -1, 0)
2:  b6136380c28 ! 2:  c0e873543f5 object-name: make ambiguous object output translatable
    @@ Commit message
         tweaked in [2] to be more friendly to translators. By being able to
         customize the sprintf formats we're even ready for RTL languages.
     
    -    1. ef9b0370da6 (sha1-name.c: store and use repo in struct
    -       disambiguate_state, 2019-04-16)
    +    The "unknown type" message here is unreachable, and has been since
    +    [1], i.e. that code has never worked. If we craft an object of a bogus
    +    type with a conflicting prefix we'll just die:
    +
    +        $ git rev-parse 8315
    +        error: short object ID 8315 is ambiguous
    +        hint: The candidates are:
    +        fatal: invalid object type
    +
    +    But let's continue to pretend that this works, we can eventually use
    +    the API improvements in my ab/fsck-unexpected-type (once it lands) to
    +    inspect these objects and emit the actual type here, or at least not
    +    die as we emit "unknown type".
    +
    +    1. 1ffa26c461 (get_short_sha1: list ambiguous objects on error,
    +       2016-09-26)
         2. 5cc044e0257 (get_short_oid: sort ambiguous objects by type,
            then SHA-1, 2018-05-10)
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## object-name.c ##
    -@@ object-name.c: static int init_object_disambiguation(struct repository *r,
    - 	return 0;
    - }
    - 
    -+struct show_ambiguous_state {
    -+	const struct disambiguate_state *ds;
    -+	struct strbuf *advice;
    -+};
    -+
    - static int show_ambiguous_object(const struct object_id *oid, void *data)
    +@@ object-name.c: static int show_ambiguous_object(const struct object_id *oid, void *data)
      {
    --	const struct disambiguate_state *ds = data;
    -+	struct show_ambiguous_state *state = data;
    -+	const struct disambiguate_state *ds = state->ds;
    -+	struct strbuf *advice = state->advice;
    + 	const struct disambiguate_state *ds = data;
      	struct strbuf desc = STRBUF_INIT;
    ++	struct strbuf ci_ad = STRBUF_INIT;
    ++	struct strbuf ci_s = STRBUF_INIT;
      	int type;
    ++	const char *tag_desc = NULL;
    ++	const char *abbrev;
      
    + 	if (ds->fn && !ds->fn(ds->repo, oid, ds->cb_data))
    + 		return 0;
     @@ object-name.c: static int show_ambiguous_object(const struct object_id *oid, void *data)
      		if (commit) {
      			struct pretty_print_context pp = {0};
      			pp.date_mode.type = DATE_SHORT;
     -			format_commit_message(commit, " %ad - %s", &desc, &pp);
    -+			format_commit_message(commit, _(" %ad - %s"), &desc, &pp);
    ++			format_commit_message(commit, "%ad", &ci_ad, &pp);
    ++			format_commit_message(commit, "%s", &ci_s, &pp);
      		}
      	} else if (type == OBJ_TAG) {
      		struct tag *tag = lookup_tag(ds->repo, oid);
      		if (!parse_tag(tag) && tag->tag)
     -			strbuf_addf(&desc, " %s", tag->tag);
    -+			strbuf_addf(&desc, _(" %s"), tag->tag);
    ++			tag_desc = tag->tag;
      	}
      
     -	advise("  %s %s%s",
     -	       repo_find_unique_abbrev(ds->repo, oid, DEFAULT_ABBREV),
     -	       type_name(type) ? type_name(type) : "unknown type",
     -	       desc.buf);
    -+	strbuf_addf(advice,
    -+		    /*
    -+		     * TRANSLATORS: This is a line of ambiguous object
    -+		     * output. E.g.:
    -+		     *
    -+		     *    "deadbeef commit 2021-01-01 - Some Commit Message\n"
    -+		     *    "deadbeef tag Some Tag Message\n"
    -+		     *    "deadbeef tree\n"
    -+		     *
    -+		     * I.e. the first argument is a short OID, the
    -+		     * second is the type name of the object, and the
    -+		     * third a description of the object, if it's a
    -+		     * commit or tag. In that case the " %ad - %s" and
    -+		     * " %s" formats above will be used for the third
    -+		     * argument.
    -+		     */
    -+		    _("  %s %s%s\n"),
    -+		    repo_find_unique_abbrev(ds->repo, oid, DEFAULT_ABBREV),
    -+		    type_name(type) ? type_name(type) : "unknown type",
    -+		    desc.buf);
    - 
    - 	strbuf_release(&desc);
    - 	return 0;
    -@@ object-name.c: static enum get_oid_result get_short_oid(struct repository *r,
    - 	}
    - 
    - 	if (!quietly && (status == SHORT_NAME_AMBIGUOUS)) {
    -+		struct strbuf sb = STRBUF_INIT;
    - 		struct oid_array collect = OID_ARRAY_INIT;
    -+		struct show_ambiguous_state as = {
    -+			.ds = &ds,
    -+			.advice = &sb,
    -+		};
    - 
    - 		error(_("short object ID %s is ambiguous"), ds.hex_pfx);
    - 
    -@@ object-name.c: static enum get_oid_result get_short_oid(struct repository *r,
    - 		if (!ds.ambiguous)
    - 			ds.fn = NULL;
    - 
    --		advise(_("The candidates are:"));
    - 		repo_for_each_abbrev(r, ds.hex_pfx, collect_ambiguous, &collect);
    - 		sort_ambiguous_oid_array(r, &collect);
    - 
    --		if (oid_array_for_each(&collect, show_ambiguous_object, &ds))
    -+		if (oid_array_for_each(&collect, show_ambiguous_object, &as))
    - 			BUG("show_ambiguous_object shouldn't return non-zero");
    -+
    ++	abbrev = repo_find_unique_abbrev(ds->repo, oid, DEFAULT_ABBREV);
    ++	if (type == OBJ_COMMIT) {
     +		/*
    -+		 * TRANSLATORS: The argument is the list of ambiguous
    -+		 * objects composed in show_ambiguous_object(). See
    -+		 * its "TRANSLATORS" comment for details.
    ++		 * TRANSLATORS: This is a line of ambiguous commit
    ++		 * object output. E.g.:
    ++		 *
    ++		 *    "deadbeef commit 2021-01-01 - Some Commit Message"
    ++		 *
    ++		 * The second argument is the "commit" string from
    ++		 * object.c, it should (hopefully) already be
    ++		 * translated.
     +		 */
    -+		advise(_("The candidates are:\n\n%s"), sb.buf);
    ++		strbuf_addf(&desc, _("%s %s %s - %s"), abbrev, ci_ad.buf,
    ++			    _(type_name(type)), ci_s.buf);
    ++	} else if (tag_desc) {
    ++		/*
    ++		 * TRANSLATORS: This is a line of
    ++		 * ambiguous tag object output. E.g.:
    ++		 *
    ++		 *    "deadbeef tag Some Tag Message"
    ++		 *
    ++		 * The second argument is the "tag" string from
    ++		 * object.c, it should (hopefully) already be
    ++		 * translated.
    ++		 */
    ++		strbuf_addf(&desc, _("%s %s %s"), abbrev, _(type_name(type)),
    ++			    tag_desc);
    ++	} else {
    ++		const char *tname = type_name(type) ? _(type_name(type)) :
    ++			_(unknown_type);
    ++		/*
    ++		 * TRANSLATORS: This is a line of ambiguous <type>
    ++		 * object output. Where <type> is one of the object
    ++		 * types of "tree", "blob", "tag" ("commit" is handled
    ++		 * above).
    ++		 *
    ++		 *    "deadbeef tree"
    ++		 *    "deadbeef blob"
    ++		 *    "deadbeef tag"
    ++		 *    "deadbeef unknown type"
    ++		 *
    ++		 * Note that annotated tags use a separate format
    ++		 * outlined above.
    ++		 *
    ++		 * The second argument is the "tree", "blob" or "tag"
    ++		 * string from object.c, or the "unknown type" string
    ++		 * in the case of an unknown type. All of them should
    ++		 * (hopefully) already be translated.
    ++		 */
    ++		strbuf_addf(&desc, _("%s %s"), abbrev, tname);
    ++	}
     +
    - 		oid_array_clear(&collect);
    - 	}
    ++	/*
    ++	 * TRANSLATORS: This is line item of ambiguous object output,
    ++	 * translated above.
    ++	 */
    ++	advise(_("  %s\n"), desc.buf);
    + 
    + 	strbuf_release(&desc);
    ++	strbuf_release(&ci_ad);
    ++	strbuf_release(&ci_s);
    + 	return 0;
    + }
      
-- 
2.33.0.1409.ge73c1ecc5b4


^ permalink raw reply	[relevance 7%]

* [PATCH] http: check CURLE_SSL_PINNEDPUBKEYNOTMATCH when emitting errors
@ 2021-09-24 10:08 12% Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2021-09-24 10:08 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Change the error shown when an http.pinnedPubKey doesn't match so that
it points to the http.pinnedPubKey variable added in aeff8a61216 (http:
implement public key pinning, 2016-02-15), e.g.:

    git -c http.pinnedPubKey=sha256/someNonMatchingKey ls-remote https://github.com/git/git.git
    fatal: unable to access 'https://github.com/git/git.git/' with http.pinnedPubkey configuration: SSL: public key does not match pinned public key!

Before this we'd emit the exact same thing without the " with
http.pinnedPubkey configuration". The advantage of doing this is that
we're going to get a translated message (everything after the ":" is
hardcoded in English in libcurl), and we've got a reference to the
git-specific configuration variable that's causing the error.

Unfortunately we can't test this easily, as there are no tests that
require https:// in the test suite, and t/lib-httpd.sh doesn't know
how to set up such tests. See [1] for the start of a discussion about
what it would take to have divergent "t/lib-httpd/apache.conf" test
setups. #leftoverbits

1. https://lore.kernel.org/git/YUonS1uoZlZEt+Yd@coredump.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

I had this waiting on the now-landed ab/http-drop-old-curl-plus due to
adding a new entry to git-curl-compat.h.

 git-curl-compat.h | 3 ++-
 http.c            | 4 ++++
 http.h            | 1 +
 remote-curl.c     | 4 ++++
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/git-curl-compat.h b/git-curl-compat.h
index a308bdb3b9b..56a83b6bbd8 100644
--- a/git-curl-compat.h
+++ b/git-curl-compat.h
@@ -64,16 +64,17 @@
 #if LIBCURL_VERSION_NUM >= 0x072200
 #define GIT_CURL_HAVE_CURL_SSLVERSION_TLSv1_0
 #endif
 
 /**
  * CURLOPT_PINNEDPUBLICKEY was added in 7.39.0, released in November
- * 2014.
+ * 2014. CURLE_SSL_PINNEDPUBKEYNOTMATCH was added in that same version.
  */
 #if LIBCURL_VERSION_NUM >= 0x072c00
 #define GIT_CURL_HAVE_CURLOPT_PINNEDPUBLICKEY 1
+#define GIT_CURL_HAVE_CURLE_SSL_PINNEDPUBKEYNOTMATCH 1
 #endif
 
 /**
  * CURL_HTTP_VERSION_2 was added in 7.43.0, released in June 2015.
  *
  * The CURL_HTTP_VERSION_2 alias (but not CURL_HTTP_VERSION_2_0) has
diff --git a/http.c b/http.c
index d7c20493d7f..b6735b51c31 100644
--- a/http.c
+++ b/http.c
@@ -1486,12 +1486,16 @@ static int handle_curl_result(struct slot_results *results)
 		 * certificate, bad password, or something else wrong
 		 * with the certificate.  So we reject the credential to
 		 * avoid caching or saving a bad password.
 		 */
 		credential_reject(&cert_auth);
 		return HTTP_NOAUTH;
+#ifdef GIT_CURL_HAVE_CURLE_SSL_PINNEDPUBKEYNOTMATCH
+	} else if (results->curl_result == CURLE_SSL_PINNEDPUBKEYNOTMATCH) {
+		return HTTP_NOMATCHPUBLICKEY;
+#endif
 	} else if (missing_target(results))
 		return HTTP_MISSING_TARGET;
 	else if (results->http_code == 401) {
 		if (http_auth.username && http_auth.password) {
 			credential_reject(&http_auth);
 			return HTTP_NOAUTH;
diff --git a/http.h b/http.h
index 3db5a0cf320..df1590e53a4 100644
--- a/http.h
+++ b/http.h
@@ -151,12 +151,13 @@ struct http_get_options {
 #define HTTP_OK			0
 #define HTTP_MISSING_TARGET	1
 #define HTTP_ERROR		2
 #define HTTP_START_FAILED	3
 #define HTTP_REAUTH	4
 #define HTTP_NOAUTH	5
+#define HTTP_NOMATCHPUBLICKEY	6
 
 /*
  * Requests a URL and stores the result in a strbuf.
  *
  * If the result pointer is NULL, a HTTP HEAD request is made instead of GET.
  */
diff --git a/remote-curl.c b/remote-curl.c
index 598cff7cde6..8700dbdc0ac 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -496,12 +496,16 @@ static struct discovery *discover_refs(const char *service, int for_push)
 		die(_("repository '%s' not found"),
 		    transport_anonymize_url(url.buf));
 	case HTTP_NOAUTH:
 		show_http_message(&type, &charset, &buffer);
 		die(_("Authentication failed for '%s'"),
 		    transport_anonymize_url(url.buf));
+	case HTTP_NOMATCHPUBLICKEY:
+		show_http_message(&type, &charset, &buffer);
+		die(_("unable to access '%s' with http.pinnedPubkey configuration: %s"),
+		    transport_anonymize_url(url.buf), curl_errorstr);
 	default:
 		show_http_message(&type, &charset, &buffer);
 		die(_("unable to access '%s': %s"),
 		    transport_anonymize_url(url.buf), curl_errorstr);
 	}
 
-- 
2.33.0.1231.g24d802460a8


^ permalink raw reply related	[relevance 12%]

* Re: [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s
  @ 2021-09-16 22:52 15%       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2021-09-16 22:52 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak


On Thu, Sep 16 2021, Taylor Blau wrote:

> On Tue, Sep 07, 2021 at 12:57:58PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Test for what happens when the -t and -s flags are asked to operate on
>> a missing object, this extends tests added in 3e370f9faf0 (t1006: add
>> tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
>> -s flags are the only ones that can be combined with
>> --allow-unknown-type, so let's test with and without that flag.
>
> I'm a little skeptical to have tests for all four pairs of `-t` or `-s`
> and "with `--allow-unknown-type` and without `--allow-unknown-type`".
>
> Testing both the presence and absence of `--allow-unknown-type` seems
> useful to me, but I'm not sure what testing `-t` and `-s` separately
> buys us.
>
> (If you really feel the need test both, I'd encourage looping like:

Thanks, I'll try to simplify it.

>     for arg in -t -s
>     do
>       test_must_fail git cat-file $arg $missing_oid >out 2>err &&
>       test_must_be_empty out &&
>       test_cmp expect.err err &&
>
>       test_must_fail git cat-file $arg --allow-unknown-type $missing_oid >out 2>err &&
>       test_must_be_empty out &&
>       test_cmp expect.err err
>     done &&
>
> but I would be equally or perhaps even happier to just have one of the
> two tests).

A loop like that can be further simplified to (i.e. just inlining
arg=-s):

	test_must_fail git cat-file -s $missing_oid >out 2>err &&
	test_must_be_empty out &&
	test_cmp expect.err err &&

	test_must_fail git cat-file -s --allow-unknown-type $missing_oid >out 2>err &&
	test_must_be_empty out &&
	test_cmp expect.err err

:)

I.e. unless you end &&-chains in loops in the test framework with an ||
return 1, you're only testing your last iteration. Aside from whatever
I'm doing here, I generally prefer to either just spell it out twice (if
small enough), or:

    for arg in -t -s
    do
        test_expect_success '...' "[... use $arg ...]"
    done

Which both nicely get around the issue of that easy-to-make mistake.

We've got some in-tree tests that are broken this way, well, at least
4cf67869b2a (list-objects.c: don't segfault for missing cmdline objects,
2018-12-05). But I think I'll leave that for a #leftoverbits submission
given my outstanding patch queue..., oh there's another one in
t1010-mktree.sh ... :)

* Oddities in the .mailmap parser & future syntax extensions
  @ 2021-09-10 16:48 11%   ` Ævar Arnfjörð Bjarmason
From: Ævar Arnfjörð Bjarmason @ 2021-09-10 16:48 UTC (permalink / raw)
  To: Gwyneth Morgan
  Cc: Fangyi Zhou, git, Birger Skogeng Pedersen,
	Birger Skogeng Pedersen, Brandon Williams, Brandon Williams,
	CB Bailey, Christopher Díaz Riveros,
	Christopher Díaz Riveros, Ed Maste, Jean-Noël Avila,
	Jean-Noel Avila, Jessica Clarke, Jiang Xin, Jiang Xin,
	Kazuhiro Kato, Kazuhiro Kato, Kevin Willford, Kevin Willford,
	Peter Kaestle, Peter Kaestle, Sibi Siddharthan, Sibi Siddharthan,
	Slavica Đukić, Slavica Djukic

[Changed subject]

On Fri, Sep 10 2021, Gwyneth Morgan wrote:

> On 2021-09-10 14:02:36+0100, Fangyi Zhou wrote:
>> Similar to a35b13fce0 (Update .mailmap, 2018-11-09).
>> 
>> This patch makes the output of `git shortlog -nse v2.10.0..master`
>> duplicate-free by taking/guessing the current and preferred
>> addresses for authors that appear with more than one address.
>
> The line for Jessica Clarke should probably just be
>
> Jessica Clarke <jrtc27@jrtc27.com>
>
> That works the same and doesn't put a reference to an old name.

It does work exactly the same!

More specifically, this is an unintentional bug/misfeature/looseness in
the .mailmap parser. An entry like:

    Foo <foo@example.com> Bar

Is exactly equivalent to:

    Foo <foo@example.com>

I.e. we simply ignore the " Bar" part. The reason for this is that we're
internally treating nonsense input as if the line simply ended there.

Even having documented and tested some of this recently in 05b5ff219c2
(mailmap doc + tests: add better examples & test them, 2021-01-12), I
found this a bit surprising. I probably found out at the time, but
forgot and had to go source spelunking again.

I'd expect:

    Foo <foo@example.com> Bar

To be an alias/shorthand for:

    Foo <foo@example.com> Bar <foo@example.com>

Which is something that might be applicable / useful in some
cases.

E.g. a name might change over time from "Foo", to "Bar", to "Zar", but
just because we're at "Bar" and want to map "Foo" to "Bar", that might
not mean that we'd like to map any future name at the same address
(i.e. the future "Zar") to the same "Foo".

In practice I suspect that's more commonly what people do want to do.
Maybe we should warn about it; I did mean to hook some pedantic mode of
the parser up to git-fsck at some point.
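To make the proposed shorthand concrete, here is a minimal C sketch of
a line parser in which a bare trailing name with no second address
reuses the first address, i.e. "Foo <foo@example.com> Bar" parses like
"Foo <foo@example.com> Bar <foo@example.com>". The struct and function
names are hypothetical and this is not git's mailmap.c; fixed-size
buffers stand in for real allocation:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of one .mailmap line (not git's mailmap.c). */
struct mailmap_line {
	char new_name[128], new_email[128];
	char old_name[128], old_email[128];
};

/*
 * Copy the whitespace-trimmed text before '<' into name and the text
 * between '<' and '>' into email. Returns a pointer just past the '>',
 * or NULL if s has no "<...>" pair (in which case nothing is written).
 */
static const char *parse_ident(const char *s, char *name, size_t name_sz,
			       char *email, size_t email_sz)
{
	const char *lt = strchr(s, '<');
	const char *gt = lt ? strchr(lt + 1, '>') : NULL;
	const char *start = s, *end;

	if (!gt)
		return NULL;
	while (start < lt && isspace((unsigned char)*start))
		start++;
	end = lt;
	while (end > start && isspace((unsigned char)end[-1]))
		end--;
	snprintf(name, name_sz, "%.*s", (int)(end - start), start);
	snprintf(email, email_sz, "%.*s", (int)(gt - lt - 1), lt + 1);
	return gt + 1;
}

/*
 * Parse "New <new@x> [Old] [<old@x>]" using the *proposed* shorthand:
 * a bare trailing name with no second address reuses the first address.
 */
static int parse_mailmap_line(const char *line, struct mailmap_line *out)
{
	const char *rest, *end;

	memset(out, 0, sizeof(*out));
	rest = parse_ident(line, out->new_name, sizeof(out->new_name),
			   out->new_email, sizeof(out->new_email));
	if (!rest)
		return -1;
	if (parse_ident(rest, out->old_name, sizeof(out->old_name),
			out->old_email, sizeof(out->old_email)))
		return 0; /* two addresses: anything after the 2nd is ignored */

	/* No second address: trim whatever trailing text is left. */
	while (isspace((unsigned char)*rest))
		rest++;
	end = rest + strlen(rest);
	while (end > rest && isspace((unsigned char)end[-1]))
		end--;
	if (end > rest && !memchr(rest, '<', (size_t)(end - rest))) {
		snprintf(out->old_name, sizeof(out->old_name), "%.*s",
			 (int)(end - rest), rest);
		snprintf(out->old_email, sizeof(out->old_email), "%s",
			 out->new_email);
	}
	return 0;
}
```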

More annoying is that this:

    New <foo@example.com> <bar@example.com>
    <foo@example.com> <zar@example.com>

Doesn't mean the same as:

    New <foo@example.com> <bar@example.com>
    New <foo@example.com> <zar@example.com>

I.e. I'd expect the name to map to the empty string, *unless* we saw an
earlier address, i.e. just as we do for the first bar -> foo line (we
map it to a name of "New", we don't map it to an empty name).

So that's some #leftoverbits, perhaps someone somewhere relies on that,
but it seems like an obvious shorthand to have. I can't imagine it being
useful to map to empty names, and much of e.g. git.git's mailmap is
repeated entries with the same name over and over again.
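A sketch of that "reuse the name from an earlier entry for the same
address" shorthand follows. It assumes the lines have already been
parsed into a hypothetical struct mm_entry, which is not how git's
mailmap.c actually stores its entries:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, already-parsed "New <new@x> <old@x>" entry. */
struct mm_entry {
	const char *new_name;  /* "" when the line has no proper name */
	const char *new_email;
	const char *old_email;
};

/*
 * Proposed shorthand: an entry with an empty new name inherits the name
 * of the first earlier entry that maps to the same new address. Entries
 * with no such predecessor keep their empty name.
 */
static void inherit_mailmap_names(struct mm_entry *e, size_t n)
{
	assert(e || !n);
	for (size_t i = 0; i < n; i++) {
		if (e[i].new_name[0])
			continue;
		for (size_t j = 0; j < i; j++) {
			if (e[j].new_name[0] &&
			    !strcmp(e[j].new_email, e[i].new_email)) {
				e[i].new_name = e[j].new_name;
				break;
			}
		}
	}
}
```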

I suppose we could also extend it to new syntax such as:

    New <foo@example.com> <bar@example.com> <zar@example.com>

Doing that would be strictly backwards compatible, since we currently
ignore the 3rd E-Mail address entirely. It does mean we also
accidentally support things like:

    New <foo@example.com> <bar@example.com> # A comment, because we ignore everything after the 2nd address

But don't tell anyone I told you that :) That might technically have
inadvertently closed the door to future syntax extensions, but we could
probably do them anyway, or at worst use some heuristic.

Another useful thing might be to support:

    New <> Old <>

As an explicit mapping of the name "Old" wherever we see it to "New", or:

    New <> Old <>

To change just the name "Old" to "New" everywhere, without considering
the E-Mail address. Both of those are probably too crazy to be useful,
especially since if we supported that we'd logically also support:

    New <> <>

To assign all the commits to the name "New", but retain the address.

* Re: [PATCH v2 4/4] pack-write: rename *.idx file into place last (really!)
  @ 2021-09-08  1:14 15%     ` Ævar Arnfjörð Bjarmason
From: Ævar Arnfjörð Bjarmason @ 2021-09-08  1:14 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Taylor Blau,
	Ævar Arnfjörð Bjarmason

On Wed, Sep 08 2021, Ævar Arnfjörð Bjarmason wrote:

> Follow up on a preceding commit (pack-write.c: rename `.idx` file into
> place last, 2021-08-16)[1] and rename the *.idx file in-place after we
> write the *.bitmap. The preceding commit fixed the issue of *.idx
> being written before *.rev files, but did not do so for *.bitmap files.
>
> See 7cc8f971085 (pack-objects: implement bitmap writing, 2013-12-21)
> for commentary at the time when *.bitmap was implemented about how
> those files are written out, nothing in that commit contradicts what's
> being done here.
>
> Note that the referenced earlier commit[1] is overly optimistic about
> "clos[ing the] race", i.e. yes we'll now write the files in the right
> order, but we might still race due to our sloppy use of fsync(). See
> the thread at [2] for a rabbit hole of various discussions about
> filesystem races in the face of doing and not doing fsync() (and if
> doing fsync(), not doing it properly).

Actually I think it's a bit worse than that: we unconditionally
fsync() the *.pack we write out, but in stage_tmp_packfiles() (the
behavior pre-dates both this series and its parent; I just think my
stage_tmp_packfiles() is easier to follow) we don't fsync() the *.idx
file, since we don't pass WRITE_IDX_VERIFY.

The same goes for *.rev (which oddly makes its fsync() conditional on
WRITE_IDX_VERIFY), but not *.bitmap, which fsyncs unconditionally just
like *.pack does.

And then of course we do all these in-place renames, but nothing
fsyncs the fd of the directory, so whether the metadata and new names
are committed to disk (i.e. survive a crash) is anyone's guess.

And not only is that metadata commit not made, but due to the
inconsistent fsync() we might also end up with a partially written
*.idx that has been renamed in-place.
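For reference, the write, fsync, rename, fsync-the-directory sequence
whose last step is described as missing above looks roughly like this
in plain POSIX C. replace_file_durably() and the "tmp_" file prefix are
hypothetical; this is not git's pack-write.c or its tempfile API:

```c
#define _XOPEN_SOURCE 700 /* declares fsync() even under strict -std= */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes. */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);
		if (n < 0)
			return -1;
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/*
 * Hypothetical helper: durably replace dir/name with buf[0..len).
 *   1. write a temporary file and fsync() it, so its data is on disk;
 *   2. rename() it into place, so readers see the old or the new file,
 *      never a partial one;
 *   3. fsync() the directory, so the rename itself survives a crash.
 */
static int replace_file_durably(const char *dir, const char *name,
				const char *buf, size_t len)
{
	char tmp[4096], dst[4096];
	int fd, dfd, ret;

	assert(dir && name && (buf || !len));
	if (snprintf(tmp, sizeof(tmp), "%s/tmp_%s", dir, name) >= (int)sizeof(tmp) ||
	    snprintf(dst, sizeof(dst), "%s/%s", dir, name) >= (int)sizeof(dst))
		return -1;

	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	if (close(fd) < 0 || rename(tmp, dst) < 0) {
		unlink(tmp);
		return -1;
	}

	/* Step 3, the one the message above says nothing currently does. */
	dfd = open(dir, O_RDONLY);
	if (dfd < 0)
		return -1;
	ret = fsync(dfd);
	close(dfd);
	return ret;
}
```

The directory fsync() is what makes the new name durable across a
crash; rename() on its own only guarantees that readers never see a
half-replaced name.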

In any case, any such issues pre-date this series and the series by
Taylor it depends on; I'm just adding some #leftoverbits for future
fsync() fixes since I spent time looking into it.

* Re: [PATCH] refs file backend: remove dead "errno == EISDIR" code
  @ 2021-07-14 19:07 12%   ` Ævar Arnfjörð Bjarmason
From: Ævar Arnfjörð Bjarmason @ 2021-07-14 19:07 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Han-Wen Nienhuys, Michael Haggerty


On Wed, Jul 14 2021, Jeff King wrote:

> On Wed, Jul 14, 2021 at 01:17:14PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Since a1c1d8170d (refs_resolve_ref_unsafe: handle d/f conflicts for
>> writes, 2017-10-06) we don't, because our callstack will look
>> something like:
>> 
>>     files_copy_or_rename_ref() -> lock_ref_oid_basic() -> refs_resolve_ref_unsafe()
>> 
>> And then the refs_resolve_ref_unsafe() call here will in turn (in the
>> code added in a1c1d8170d) do the equivalent of this (via a call to
>> refs_read_raw_ref()):
>> 
>> 	/* Via refs_read_raw_ref() */
>> 	fd = open(path, O_RDONLY);
>> 	if (fd < 0)
>> 		/* get errno == EISDIR */
>> 	/* later, in refs_resolve_ref_unsafe() */
>> 	if ([...] && errno != EISDIR)
>> 		return NULL;
>> 	[...]
>> 	/* returns the refs/heads/foo to the caller, even though it's a directory */
>> 	return refname;
>
> Isn't that pseudo-code missing a conditional that's there in the real
> code? In refs_resolve_ref_unsafe(), I see:
>
>        if (refs_read_raw_ref(refs, refname,
>                              oid, &sb_refname, &read_flags)) {
>                *flags |= read_flags;
>
>                /* In reading mode, refs must eventually resolve */
>                if (resolve_flags & RESOLVE_REF_READING)
>                        return NULL;
>
>                /*
>                 * Otherwise a missing ref is OK. But the files backend
>                 * may show errors besides ENOENT if there are
>                 * similarly-named refs.
>                 */
>                if (errno != ENOENT &&
>                    errno != EISDIR &&
>                    errno != ENOTDIR)
>                        return NULL;
>
> So if RESOLVE_REF_READING is set, we can return NULL immediately, with
> errno set to EISDIR. Which contradicts this:

I opted (perhaps unwisely) to elide that since as you note above we
don't take that path in relation to the removed code. I.e. I'm
describing the relevant codepath we take nowadays given the code & its
callers.

But will reword etc., thanks.

>> I.e. even though we got an "errno == EISDIR" we won't take this
>> branch, since in cases of EISDIR "resolved" is always
>> non-NULL. I.e. we pretend at this point as though everything's OK and
>> there is no "foo" directory.
>
> So when is RESOLVE_REF_READING set? The resolve_flags parameter is
> passed in by the caller. In lock_ref_oid_basic(), it comes from this:
>
>     int mustexist = (old_oid && !is_null_oid(old_oid));
>     [...]
>     if (mustexist)
>             resolve_flags |= RESOLVE_REF_READING;
>
> So do any callers pass in old_oid? Surprisingly few. It used to be
> called from other locking functions, but these days it looks like it is
> only files_reflog_expire().

In general (and not being too familiar with this area) and per:

    7521cc4611 (refs.c: make delete_ref use a transaction, 2014-04-30)
    92b1551b1d (refs: resolve symbolic refs first, 2016-04-25)
    029cdb4ab2 (refs.c: make prune_ref use a transaction to delete the ref, 2014-04-30)

And:

    https://lore.kernel.org/git/20140902205841.GA18279@google.com/    

I wonder if these remaining cases can be migrated over to lock_raw_ref()
or the transaction API, as many other similar callers have been already.

But that's a bigger change that I won't be doing now; I'm just wondering
if these are some #leftoverbits or if there's a good reason they were left.

> I'm not sure if this case is important or not. If we're expecting the
> ref to exist, then an in-the-way directory is going to mean failure
> either way. It could still exist within the packed-refs file, but then
> refs_read_raw_ref() would not return failure.
>
> So...I think it's fine? But the argument in your commit message seems to
> have missed this case entirely.

Perhaps more succinctly: If we have a directory in the way, it's going
to be impossible for the "old_oid" condition to be satisfied in any case
in the file backend.

Even if we still had a caller that did "care" about that, what could they
hope to get from an "old_oid=<some-OID>" for a lock on "foo/bar" where
"foo" is an empty directory?

Except of course for the case where it's not a directory but packed, but
as you noted that's handled in another case.

Perhaps it's informative that the below diff-on-top also passes all
tests, i.e. we have largely the same
"refs_read_raw_ref(refs->packed_ref_store" call copy/pasted in two
adjacent places in files_read_raw_ref(), and the two differ only in
which errno we pass upwards.

It thoroughly tramples on Han-Wen's series, and it's easier to deal with
(if at all) once his lands, just thought it might be interesting:

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 7e4963fd07..4a97cd48d9 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -356,6 +356,8 @@ static int files_read_raw_ref(struct ref_store *ref_store,
 	int ret = -1;
 	int save_errno;
 	int remaining_retries = 3;
+	int lstat_bad_or_not_file = 0;
+	int lstat_errno = 0;
 
 	*type = 0;
 	strbuf_reset(&sb_path);
@@ -382,11 +384,28 @@ static int files_read_raw_ref(struct ref_store *ref_store,
 		goto out;
 
 	if (lstat(path, &st) < 0) {
-		if (errno != ENOENT)
+		lstat_bad_or_not_file = 1;
+		lstat_errno = errno;
+	} else if (S_ISDIR(st.st_mode)) {
+		/*
+		 * Maybe it's an empty directory, maybe it's not, in
+		 * either case this ref does not exist in the files
+		 * backend (but may be packet), later code will handle
+		 * the "create and maybe remove_empty_directories()"
+		 * case if needed, or die otherwise.
+		 */
+		lstat_bad_or_not_file = 1;
+	}
+
+	if (lstat_bad_or_not_file) {
+		if (lstat_errno && lstat_errno != ENOENT)
 			goto out;
 		if (refs_read_raw_ref(refs->packed_ref_store, refname,
 				      oid, referent, type)) {
-			errno = ENOENT;
+			if (lstat_errno)
+				errno = ENOENT;
+			else
+				errno = EISDIR;
 			goto out;
 		}
 		ret = 0;
@@ -417,22 +436,6 @@ static int files_read_raw_ref(struct ref_store *ref_store,
 		 */
 	}
 
-	/* Is it a directory? */
-	if (S_ISDIR(st.st_mode)) {
-		/*
-		 * Even though there is a directory where the loose
-		 * ref is supposed to be, there could still be a
-		 * packed ref:
-		 */
-		if (refs_read_raw_ref(refs->packed_ref_store, refname,
-				      oid, referent, type)) {
-			errno = EISDIR;
-			goto out;
-		}
-		ret = 0;
-		goto out;
-	}
-
 	/*
 	 * Anything else, just open it and try to use it as
 	 * a ref

* Re: [PATCH v8 1/2] [GSOC] commit: add --trailer option
  @ 2021-03-17  8:08  6%         ` Ævar Arnfjörð Bjarmason
From: Ævar Arnfjörð Bjarmason @ 2021-03-17  8:08 UTC (permalink / raw)
  To: ZheNing Hu
  Cc: ZheNing Hu via GitGitGadget, Git List, Bradley M. Kuhn,
	Junio C Hamano, Brandon Casey, Shourya Shukla, Christian Couder,
	Rafael Silva


On Wed, Mar 17 2021, ZheNing Hu wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote on Tue, Mar 16, 2021 at 8:52 PM:
>> > +             if (run_command(&run_trailer))
>> > +                     strvec_clear(&run_trailer.args);
>>
>> This is git-commit, shouldn't we die() here instead of ignoring errors
>> in sub-processes?
>
> After thinking about it carefully, your opinion is more
> reasonable, because if the user uses the wrong `--trailer`
> and does not get the information he needs, I think he will
> have to use `--amend` to modify, and `die()` can exit
> this commit directly.

Yeah, we don't want to silently lose data.

>>
>> > +             strvec_clear(&trailer_args);
>> > +     }
>> > +
>> >       /*
>> >        * Reject an attempt to record a non-merge empty commit without
>> >        * explicit --allow-empty. In the cherry-pick case, it may be
>> > @@ -1507,6 +1529,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
>> >               OPT_STRING(0, "fixup", &fixup_message, N_("commit"), N_("use autosquash formatted message to fixup specified commit")),
>> >               OPT_STRING(0, "squash", &squash_message, N_("commit"), N_("use autosquash formatted message to squash specified commit")),
>> >               OPT_BOOL(0, "reset-author", &renew_authorship, N_("the commit is authored by me now (used with -C/-c/--amend)")),
>> > +             OPT_CALLBACK_F(0, "trailer", NULL, N_("trailer"), N_("trailer(s) to add"), PARSE_OPT_NONEG, opt_pass_trailer),
>> >               OPT_BOOL('s', "signoff", &signoff, N_("add a Signed-off-by trailer")),
>>
>> Not required for this change, but perhaps a change here to N_() (if we
>> can get it to fit) + doc update saying that we prefer
>> --trailer="Signed-off-by: [...]" to --signoff? More on that later.
>>
>> >               OPT_FILENAME('t', "template", &template_file, N_("use specified template file")),
>> >               OPT_BOOL('e', "edit", &edit_flag, N_("force edit of commit")),
>> > diff --git a/t/t7502-commit-porcelain.sh b/t/t7502-commit-porcelain.sh
>> > index 6396897cc818..0acf23799931 100755
>> > --- a/t/t7502-commit-porcelain.sh
>> > +++ b/t/t7502-commit-porcelain.sh
>> > @@ -154,6 +154,26 @@ test_expect_success 'sign off' '
>> >
>> >  '
>> >
>> > +test_expect_success 'trailer' '
>> > +     >file1 &&
>> > +     git add file1 &&
>> > +     git commit -s --trailer "Signed-off-by:C O Mitter1 <committer1@example.com>" \
>> > +             --trailer "Helped-by:C O Mitter2 <committer2@example.com>"  \
>> > +             --trailer "Reported-by:C O Mitter3 <committer3@example.com>" \
>> > +             --trailer "Mentored-by:C O Mitter4 <committer4@example.com>" \
>> > +             -m "hello" &&
>> > +     git cat-file commit HEAD >commit.msg &&
>> > +     sed -e "1,7d" commit.msg >actual &&
>> > +     cat >expected <<-\EOF &&
>> > +     Signed-off-by: C O Mitter <committer@example.com>
>> > +     Signed-off-by: C O Mitter1 <committer1@example.com>
>> > +     Helped-by: C O Mitter2 <committer2@example.com>
>> > +     Reported-by: C O Mitter3 <committer3@example.com>
>> > +     Mentored-by: C O Mitter4 <committer4@example.com>
>> > +     EOF
>> > +     test_cmp expected actual
>> > +'
>> > +
>>
>> How does this interact with cases where the user has configured
>> "trailer.separators" to have a value that doesn't contain ":"?  I
>> haven't tested, but my reading of git-interpret-trailers(1) is that if
>> you supplied "=" instead that case would just work:
>>
>>     By default only : is recognized as a trailer separator, except that
>>     = is always accepted on the command line for compatibility with
>>     other git commands.
>>
> But the interpret_trailers interface allows us to use "=" instead of other separators.
>
> I did a simple test and modified the configuration "trailer.separators"
> and it still works. Now things are good here:
>
> $ git -c trailer.separators="@" commit --trailer="Signed-off-by=C O <email>"
>
> or
>
> $ git -c trailer.separators="@" commit --trailer="Signed-off-by@C O <email>"
>
> Both can work normally,
>
> --trailer="Signed-off-by@ C O <email>"
>
> will output in the commit message.
>
>> I don't know if that does the right thing in the presence of
>> --if-exists=add.
>>
>
> Yesterday, Christian Couder and I had already discussed this issue:
> Your idea is correct: I should not add "--if-exists=add", as this would override
> the user's ability to configure it with `git -c trailer.ifexists`.
>
>> So it would be good to update these tests so you test:
>>
>>  * For the --if-exists=add case at all, there are no tests for it
>>    now. I.e. add some trailers manually to the commit (via -F or
>>    whatever) and then see if they get added to, replaced etc.
>>
>>  * Ditto but for the user having configured trailer.separators (see the
>>    test_config helper for how to set config in a test). I.e. if it's "="
>>    does adding trailers work, how about if it's "=" on the CLI but the
>>    config/commit message has ";" instead of ":" or something?
>>
>
> As mentioned above, it works normally.
>
>>  * Hrm, actually I think tweaking "-c trailer.ifexists" won't work at
>>    all, since the CLI switch would override it. I honestly don't know,
>>    but why not omit it and keep the addIfDifferentNeighbor
>>    default?
>>
>>    If it's essential that seems like a good test / documentation
>>    addition...
>>
>>  * For the above -c ... case I can't think of a good way to deal with it
>>    that doesn't involve pulling in git_trailer_config() into
>>    git_commit_config(), but perhaps the least nasty way is to just set a
>>    flag in git_commit_config() if we see a "trailer.ifexists" key (this
>>    will include "git -c ... commit"), and if so not provide
>>    "--if-exists=add"; if there's no such config we provide
>>    "--if-exists=add". Or, as noted above, maybe we can skip the whole
>>    thing and use the addIfDifferentNeighbor default.
>>
>
> Has been restored to the default settings.

To clarify: what I really mean is that for all these things you've
tested, let's add tests as part of the patch.

>> And, not needed for this patch but worth thinking about:
>>
>>  * We pass through --trailer to git-interpret-trailers, what should we
>>    do about the other options? Should git-commit eventually support
>>    --trailer-where and pass it along as --where to
>>    git-interpret-trailers, or is "git -c trailer.where=... commit" good
>>    enough?
>>
> Logically speaking, `interpret_trailers` should be dedicated to `commit`
> or other sub-commands that require trailers.
>
> But I think that in the later stage, the parse_options of the `cmd_commit`
> can keep the unrecognized options, and then these choices can be directly
> passed to the `interpret_trailers` backend.

We have this interaction with e.g. range-diff and "log", and it's often
surprising: you add an option to one command and it appears in the
other.

>>  * It would be good to test for and document if that "-c trailer.*"
>>    trick works (no reason it shouldn't). I.e. to add something like this
>>    after what you have (along with tests, and check if it's even true):
>>
>
> I haven't tested them for the time being, but I will do it.
>
>>        Only the `--trailer` argument to
>>        linkgit:git-interpret-trailers[1] is supported. Other
>>        pass-through switches may be added in the future, but currently
>>        you'll need to pass arguments to
>>        linkgit:git-interpret-trailers[1] along as config, e.g. `git -c
>>        trailer.where=start commit [...] --trailer=[...]`.
>>
>
> I think this is worth writing in the documentation.
>
>>  * We have a longer-term goal of having the .mailmap apply to trailers,
>>    it would be nice if git-interpret-trailers had some fuzzy-matching to
>>    check if the RHS of a trailer is a name/E-Mail pair, and if so did
>>    stricter validation on it with the ident functions we use for fsck
>>    etc. (that's copied & subtly different in several different places in
>>    the codebase, unfortunately[1]).
>>
>
> I may not know much about fuzzy-matching, which may be worth studying later.
>
>> More thoughts:
>>
>>  * Having written all the above I checked how --signoff is implemented.
>>
>>    It seems to me to be a good idea to (at least for testing) convert
>>    the --signoff trailer to your implementation. We have plenty of tests
>>    for it, does migrating it over pass or fail those?
>>
> I don't know how to migrate it yet; it may take a long time.
> I think I can leave it as a #leftoverbit for later.

Sure, I mean (having looked at it) that at least for your own local
testing it would make sense to change it (even if just search-replacing
the --signoff in the test suite) to see if it behaves as you
expect. I.e. does the --trailer behavior mirror --signoff?

>>  * I also agree with Junio that we shouldn't have a --fixed-by or
>>    whatever and wouldn't add --signoff today, but it seems very useful
>>    to me to have a shortcut like:
>>
>>        --trailer "Signed-off-by"
>>
>>    I.e. omitting the value, or:
>>
>>       --trailer "Signed-off-by="
>>
>>    Or some other thing we deem sufficiently useful/sane
>>    syntax/unambiguous.
>>
>>    Then the value would be provided by fmt_name(WANT_COMMITTER_IDENT)
>>    just as we do in append_signoff() now. I think a *very common* case
>>    for this would be something like:
>>
>>        git commit --amend -v --trailer "Reviewed-by"
>>
>>    And it would be useful to help that along and not have to do:
>>
>>        git commit --amend -v --trailer "Reviewed-by=$(git config user.name) <$(git config user.email)>"
>>
>>    Or worse yet, manually typo your name/e-mail address, as I'm sure I
>>    and many others will inevitably do when using this option...
>>
> I think this idea is very good and easy to implement.
> We only need to do a simple string match when we get the "trailer" string,
> If it can be completed, it can indeed bring great convenience to users.
>
>> 1. https://lore.kernel.org/git/87bld8ov9q.fsf@evledraar.gmail.com/
>
> Thanks, Ævar Arnfjörð Bjarmason!

And thanks for working on this.

* Re: [PATCH 05/11] merge-ort: let renormalization change modify/delete into clean delete
  @ 2021-03-08 12:55 16%   ` Ævar Arnfjörð Bjarmason
From: Ævar Arnfjörð Bjarmason @ 2021-03-08 12:55 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Jonathan Nieder, Derrick Stolee, Junio C Hamano,
	Elijah Newren


On Fri, Mar 05 2021, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> When we have a modify/delete conflict, but the only change to the
> modification is e.g. change of line endings, then if renormalization is
> requested then we should be able to recognize such a case as a
> not-modified/delete and resolve the conflict automatically.
>
> This fixes t6418.10 under GIT_TEST_MERGE_ALGORITHM=ort.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  merge-ort.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 61 insertions(+), 2 deletions(-)
>
> diff --git a/merge-ort.c b/merge-ort.c
> index 87c553c0882c..c4bd88b9d3db 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -2416,6 +2416,60 @@ static int string_list_df_name_compare(const char *one, const char *two)
>  	return onelen - twolen;
>  }
>  
> +static int read_oid_strbuf(struct merge_options *opt,
> +			   const struct object_id *oid,
> +			   struct strbuf *dst)
> +{
> +	void *buf;
> +	enum object_type type;
> +	unsigned long size;
> +	buf = read_object_file(oid, &type, &size);
> +	if (!buf)
> +		return err(opt, _("cannot read object %s"), oid_to_hex(oid));
> +	if (type != OBJ_BLOB) {
> +		free(buf);
> +		return err(opt, _("object %s is not a blob"), oid_to_hex(oid));

As an aside, I've got another series I'll submit soon which refactors all
these "object is not xyz" calls to a utility function, so in this case
we'd also say what type the object was, not just that it wasn't a blob.

Fine to keep this here, just a #leftoverbits note to myself to
eventually migrate this.

> +	}
> +	strbuf_attach(dst, buf, size, size + 1);
> +	return 0;
> +}
> +
> +static int blob_unchanged(struct merge_options *opt,
> +			  const struct version_info *base,
> +			  const struct version_info *side,
> +			  const char *path)
> +{
> +	struct strbuf basebuf = STRBUF_INIT;
> +	struct strbuf sidebuf = STRBUF_INIT;
> +	int ret = 0; /* assume changed for safety */
> +	const struct index_state *idx = &opt->priv->attr_index;
> +
> +	initialize_attr_index(opt);
> +
> +	if (base->mode != side->mode)
> +		return 0;
> +	if (oideq(&base->oid, &side->oid))
> +		return 1;
> +
> +	if (read_oid_strbuf(opt, &base->oid, &basebuf) ||
> +	    read_oid_strbuf(opt, &side->oid, &sidebuf))
> +		goto error_return;
> +	/*
> +	 * Note: binary | is used so that both renormalizations are
> +	 * performed.  Comparison can be skipped if both files are
> +	 * unchanged since their sha1s have already been compared.
> +	 */
> +	if (renormalize_buffer(idx, path, basebuf.buf, basebuf.len, &basebuf) |
> +	    renormalize_buffer(idx, path, sidebuf.buf, sidebuf.len, &sidebuf))
> +		ret = (basebuf.len == sidebuf.len &&
> +		       !memcmp(basebuf.buf, sidebuf.buf, basebuf.len));
> +
> +error_return:
> +	strbuf_release(&basebuf);
> +	strbuf_release(&sidebuf);
> +	return ret;
> +}
> +
>
>  struct directory_versions {
>  	/*
>  	 * versions: list of (basename -> version_info)
> @@ -3003,8 +3057,13 @@ static void process_entry(struct merge_options *opt,
>  		modify_branch = (side == 1) ? opt->branch1 : opt->branch2;
>  		delete_branch = (side == 1) ? opt->branch2 : opt->branch1;
>  
> -		if (ci->path_conflict &&
> -		    oideq(&ci->stages[0].oid, &ci->stages[side].oid)) {
> +		if (opt->renormalize &&
> +		    blob_unchanged(opt, &ci->stages[0], &ci->stages[side],
> +				   path)) {
> +			ci->merged.is_null = 1;
> +			ci->merged.clean = 1;
> +		} else if (ci->path_conflict &&
> +			   oideq(&ci->stages[0].oid, &ci->stages[side].oid)) {

Small note (no need for re-roll or whatever) on having read a bit of
merge-ort.c code recently: I'd find this thing a bit easier on the eyes
if ci->stages[0] and ci->stages[side] were split into variables before
the if/else, i.e. used as "side_0.oid and side_n.oid" and "side_0 and
side_n" in this case.

That would also avoid the wrapping of at least one argument list here.

* [PATCH 1/2] remote: add camel-cased *.tagOpt key, like clone
@ 2021-02-25  1:21 13% Ævar Arnfjörð Bjarmason
From: Ævar Arnfjörð Bjarmason @ 2021-02-25  1:21 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Bert Wesarg,
	Ævar Arnfjörð Bjarmason

Change "git remote add" so that it adds a *.tagOpt key, and not the
lower-cased *.tagopt on "git remote add --no-tags", just as "git clone
--no-tags" would do.

This doesn't matter for anything that reads the config. It's just
prettier if we write config keys in their documented camelCase form to
user-readable config files.

When I added support for "clone --no-tags" in 0dab2468ee5 (clone: add a
--no-tags option to clone without tags, 2017-04-26) I made it use
the *.tagOpt form, but the older "git remote add" added in
111fb858654 (remote add: add a --[no-]tags option, 2010-04-20) has
been using *.tagopt all this time.

It's easy enough to add a test for this, so let's do that. We can't
use "git config -l" there, because it'll normalize the keys to their
lower-cased form. Let's add the test for "git clone" too for good
measure, not just to the "git remote" codepath we're fixing.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

I also noticed that we write e.g. init.objectformat instead of
init.objectFormat, and core.logallrefupdates instead of
core.logAllRefUpdates, etc. If anyone's got an even worse case of OCD,
there's an interesting #leftoverbits project there of scouring the code
for more cases of this sort of thing...
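As an illustration of why "git config -l" can't be used in the test:
git-config(1) documents section and variable names as case-insensitive
and subsection names as case-sensitive, so a reader canonicalizes a key
roughly as in this sketch. canonical_config_key() is a hypothetical
helper, not git's config.c:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch: canonicalize "section[.subsection].key". The section and the
 * final key are lowercased; the subsection (everything between the
 * first and the last '.') is kept as-is. Returns -1 if there is no '.'
 * or if out is too small.
 */
static int canonical_config_key(const char *key, char *out, size_t out_sz)
{
	const char *first_dot = strchr(key, '.');
	const char *last_dot = strrchr(key, '.');
	size_t len = strlen(key);

	assert(out);
	if (!first_dot || len + 1 > out_sz)
		return -1;
	for (size_t i = 0; i < len; i++) {
		const char *p = key + i;
		int in_subsection = p > first_dot && p < last_dot;

		out[i] = in_subsection ? *p : (char)tolower((unsigned char)*p);
	}
	out[len] = '\0';
	return 0;
}
```

On "remote.origin.tagOpt" this yields "remote.origin.tagopt", so only
the raw .git/config file still shows which spelling was written.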

 builtin/remote.c         | 2 +-
 t/t5505-remote.sh        | 1 +
 t/t5612-clone-refspec.sh | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index d11a5589e49..f286ae97538 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -221,7 +221,7 @@ static int add(int argc, const char **argv)
 
 	if (fetch_tags != TAGS_DEFAULT) {
 		strbuf_reset(&buf);
-		strbuf_addf(&buf, "remote.%s.tagopt", name);
+		strbuf_addf(&buf, "remote.%s.tagOpt", name);
 		git_config_set(buf.buf,
 			       fetch_tags == TAGS_SET ? "--tags" : "--no-tags");
 	}
diff --git a/t/t5505-remote.sh b/t/t5505-remote.sh
index 045398b94e6..2a7b5cd00a0 100755
--- a/t/t5505-remote.sh
+++ b/t/t5505-remote.sh
@@ -594,6 +594,7 @@ test_expect_success 'add --no-tags' '
 		cd add-no-tags &&
 		git init &&
 		git remote add -f --no-tags origin ../one &&
+		grep tagOpt .git/config &&
 		git tag -l some-tag >../test/output &&
 		git tag -l foobar-tag >../test/output &&
 		git config remote.origin.tagopt >>../test/output
diff --git a/t/t5612-clone-refspec.sh b/t/t5612-clone-refspec.sh
index 6a6af7449ca..3126cfd7e9d 100755
--- a/t/t5612-clone-refspec.sh
+++ b/t/t5612-clone-refspec.sh
@@ -97,6 +97,7 @@ test_expect_success 'by default no tags will be kept updated' '
 test_expect_success 'clone with --no-tags' '
 	(
 		cd dir_all_no_tags &&
+		grep tagOpt .git/config &&
 		git fetch &&
 		git for-each-ref refs/tags >../actual
 	) &&
-- 
2.30.0.284.gd98b1dd5eaa7


* [PATCH v4 00/20] make "mktag" use fsck_tag() & more
  2020-12-09 20:01  6% ` [PATCH v3 " Ævar Arnfjörð Bjarmason
@ 2020-12-23  1:35  7%   ` Ævar Arnfjörð Bjarmason
From: Ævar Arnfjörð Bjarmason @ 2020-12-23  1:35 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, brian m . carlson, Eric Sunshine,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

So, when re-rolling this with Junio's small fixup this grew in scope a
bit, but should paradoxically be easier to deal with even though it's
2x the size now. Read on:

Ævar Arnfjörð Bjarmason (20):
  mktag doc: say <hash> not <sha1>
  mktag doc: grammar fix, when exists -> when it exists
  mktag doc: update to explain why to use this
  mktag tests: don't needlessly use a subshell
  mktag tests: remove needless SHA-1 hardcoding
  mktag tests: improve verify_object() test coverage
  mktag tests: don't pipe to stderr needlessly
  mktag tests: don't create "mytag" twice
  mktag tests: stress test whitespace handling
  mktag tests: test "hash-object" compatibility

I re-arranged this series so the doc/test patches for existing
behavior all come first now. There are some new patches there (see
range-diff), but they're all rather easy-to-review fixes or tests for
existing behavior.

  mktag: use default strbuf_read() hint
  mktag: remove redundant braces in one-line body "if"
  mktag: use puts(str) instead of printf("%s\n", str)

Trivial coding style changes; the puts() patch is new.

  mktag: use fsck instead of custom verify_tag()

Still the real meat of the series, and unchanged in any meaningful way
except for (as seen in the range-diff) carrying forward doc/test
changes made earlier.

  fsck: make fsck_config() re-usable
  mktag: allow turning off fsck.extraHeaderEntry

Ditto, unchanged.

  mktag: allow omitting the header/body \n separator

I discovered a regression in mktag, present since 2008, where it refuses
to accept input without an empty line separating the headers & message
in cases where there's no message.

Now we again accept the same input as hash-object, and with the new
"hash-object" test integration earlier in the series we're confident
that mktag & hash-object do the same thing in all these cases.
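
The rule being restored can be modelled as a check on whatever follows the "tagger" line. A blank line, followed by a possibly empty message, is fine. With this fix, so is the end of input when there is no message at all. Anything else looks like an unknown extra header. The sketch below is hypothetical and is not the real mktag or fsck code. Its allow_missing_separator parameter stands in for the old (0) versus fixed (1) behavior.

```c
/*
 * Hypothetical model of the header/body separator check.
 * "rest" points just past the newline that ends the "tagger" line.
 * Returns 1 if the tag object should be accepted, 0 otherwise.
 */
int tag_rest_ok(const char *rest, int allow_missing_separator)
{
	if (*rest == '\n')
		return 1;                       /* blank line, then the (possibly empty) message */
	if (*rest == '\0')
		return allow_missing_separator; /* no separator and no message at all */
	return 0;                               /* anything else looks like an extra header */
}
```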

  mktag: convert to parse-options
  mktag: mark strings for translation
  mktag: add a --no-strict option

These are the #leftoverbits I suggested in v3: converting to
parse-options & doing i18n for mktag, and finally supporting
--no-strict so you can make it behave like "fsck" does in its default
mode.
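
The comment in mktag_fsck_error_func() says mktag treats both warnings and errors as errors, because things like a missing "tagger" line are "only" warnings. By contrast, --no-strict is meant to match fsck's default mode. The sketch below models that decision. The names (enum reported_type, mktag_is_fatal()) are hypothetical and do not match the real callback signature.

```c
/* Hypothetical stand-in for the severities that reach mktag's error callback. */
enum reported_type { REPORTED_WARN, REPORTED_ERROR };

/*
 * Return 1 if mktag should refuse to write the tag.
 * strict != 0 models the default; strict == 0 models --no-strict.
 */
int mktag_is_fatal(enum reported_type type, int strict)
{
	if (type == REPORTED_ERROR)
		return 1;
	/* warnings (e.g. a missing "tagger" line) are fatal only when strict */
	return strict;
}
```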

 Documentation/git-hash-object.txt |   4 +
 Documentation/git-mktag.txt       |  42 +++++-
 builtin/fsck.c                    |  20 +--
 builtin/mktag.c                   | 235 +++++++++++-------------------
 fsck.c                            |  59 +++++++-
 fsck.h                            |  16 ++
 parse-options.h                   |   1 +
 t/t1006-cat-file.sh               |   2 +-
 t/t3800-mktag.sh                  | 211 +++++++++++++++++++++------
 9 files changed, 361 insertions(+), 229 deletions(-)

Range-diff:
 1:  aee3f52a47 =  1:  a31c305cfc mktag doc: say <hash> not <sha1>
 -:  ---------- >  2:  81cb4cba5c mktag doc: grammar fix, when exists -> when it exists
 8:  fa04664f7f !  3:  b4bc6f894c mktag doc: update to explain why to use this
    @@ Commit message
         documentation wouldn't have much of an idea what the difference
         was.
     
    -    Let's make it clear that it's to do with slightly different fsck
    -    validation logic, and cross-link the "mktag" and "hash-object"
    -    documentation to aid discover-ability.
    +    Let's allude to our own validation logic, and cross-link the "mktag"
    +    and "hash-object" documentation to aid discover-ability. A follow-up
    +    change to migrate "mktag" to use "fsck" validation will make the part
    +    about validation logic clearer.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ Documentation/git-mktag.txt: SYNOPSIS
     +    git hash-object -t tag -w --stdin <my-tag
     +
     +The difference is that mktag will die before writing the tag if the
    -+tag doesn't pass a linkgit:git-fsck[1] check.
    -+
    -+The "fsck" check done mktag is is stricter than what
    -+linkgit:git-fsck[1] would run by default in that all `fsck.<msg-id>`
    -+messages are promoted from warnings to errors (so e.g. a missing
    -+"tagger" line is an error). Extra headers in the object are also an
    -+error under mktag, but ignored by linkgit:git-fsck[1].
    ++tag doesn't pass a sanity check.
      
      Tag Format
      ----------
 4:  1f06b9c0cf =  4:  acb94e0289 mktag tests: don't needlessly use a subshell
 5:  5d1cb73ca3 =  5:  4ae76ec5e3 mktag tests: remove needless SHA-1 hardcoding
 6:  cf86f4ca37 =  6:  9effb4532b mktag tests: improve verify_object() test coverage
 -:  ---------- >  7:  b81d31a917 mktag tests: don't pipe to stderr needlessly
 -:  ---------- >  8:  11f59718b4 mktag tests: don't create "mytag" twice
 -:  ---------- >  9:  dd6b012b0c mktag tests: stress test whitespace handling
 -:  ---------- > 10:  56c6b562fd mktag tests: test "hash-object" compatibility
 2:  6e98557709 = 11:  1e2e4ec269 mktag: use default strbuf_read() hint
 3:  8e5fe08f15 = 12:  be2ab3edab mktag: remove redundant braces in one-line body "if"
 -:  ---------- > 13:  d8514df970 mktag: use puts(str) instead of printf("%s\n", str)
 7:  5812ee53c9 ! 14:  346d73cc97 mktag: use fsck instead of custom verify_tag()
    @@ Commit message
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    + ## Documentation/git-mktag.txt ##
    +@@ Documentation/git-mktag.txt: write a tag found in `my-tag`:
    +     git hash-object -t tag -w --stdin <my-tag
    + 
    + The difference is that mktag will die before writing the tag if the
    +-tag doesn't pass a sanity check.
    ++tag doesn't pass a linkgit:git-fsck[1] check.
    ++
    ++The "fsck" check done mktag is stricter than what linkgit:git-fsck[1]
    ++would run by default in that all `fsck.<msg-id>` messages are promoted
    ++from warnings to errors (so e.g. a missing "tagger" line is an error).
    ++
    ++Extra headers in the object are also an error under mktag, but ignored
    ++by linkgit:git-fsck[1]
    + 
    + Tag Format
    + ----------
    +
      ## builtin/mktag.c ##
     @@
      #include "tag.h"
    @@ builtin/mktag.c
     +
     +	buffer = read_object_file(tagged_oid, &type, &size);
     +	if (!buffer)
    -+		die("could not read tagged object '%s'\n",
    ++		die("could not read tagged object '%s'",
     +		    oid_to_hex(tagged_oid));
     +	if (type != *tagged_type)
    -+		die("object '%s' tagged as '%s', but is a '%s' type\n",
    ++		die("object '%s' tagged as '%s', but is a '%s' type",
     +		    oid_to_hex(tagged_oid),
     +		    type_name(*tagged_type), type_name(type));
     +
    @@ t/t3800-mktag.sh: tagger  <> 0 +0000
      
     -check_verify_failure 'disallow missing tag author name' \
     -	'^error: char.*: missing tagger name$'
    -+test_expect_success 'allow missing tag author name' '
    -+	git mktag <tag.sig
    -+'
    ++test_expect_mktag_success 'allow missing tag author name'
      
      ############################################################
      # 14. disallow missing tag author name
    @@ t/t3800-mktag.sh: tagger T A Gger <
      
      ############################################################
      # 15. allow empty tag email
    -@@ t/t3800-mktag.sh: test_expect_success \
    -     'git mktag <tag.sig >.git/refs/tags/mytag 2>message'
    +@@ t/t3800-mktag.sh: EOF
    + test_expect_mktag_success 'allow empty tag email'
      
      ############################################################
     -# 16. disallow spaces in tag email
    @@ t/t3800-mktag.sh: tagger T A Gger <tag ger@example.com> 0 +0000
      
     -check_verify_failure 'disallow spaces in tag email' \
     -	'^error: char.*: malformed tagger field$'
    -+test_expect_success 'allow spaces in tag email like fsck' '
    -+	git mktag <tag.sig
    -+'
    ++test_expect_mktag_success 'allow spaces in tag email like fsck'
      
      ############################################################
      # 17. disallow missing tag timestamp
    @@ t/t3800-mktag.sh: tagger T A Gger <tagger@example.com> 1206478233 -1430
      
     -check_verify_failure 'detect invalid tag timezone3' \
     -	'^error: char.*: malformed tag timezone$'
    -+test_expect_success 'allow invalid tag timezone' '
    -+	git mktag <tag.sig
    -+'
    ++test_expect_mktag_success 'allow invalid tag timezone'
      
      ############################################################
      # 23. detect invalid header entry
    @@ t/t3800-mktag.sh: this line should not be here
      check_verify_failure 'detect invalid header entry' \
     -	'^error: char.*: trailing garbage in tag header$'
     +	'^error:.* extraHeaderEntry:'
    -+
    -+cat >tag.sig <<EOF
    -+object $head
    -+type commit
    -+tag mytag
    -+tagger T A Gger <tagger@example.com> 1206478233 -0500
    -+
    -+
    -+this line comes after an extra newline
    -+EOF
    -+
    -+test_expect_success \
    -+    'allow extra newlines at start of body' \
    -+    'git mktag <tag.sig >.git/refs/tags/mytag 2>message'
    + 
    + cat >tag.sig <<EOF
    + object $head
    +@@ t/t3800-mktag.sh: tagger T A Gger <tagger@example.com> 1206478233 -0500$space
    + EOF
    + 
    + check_verify_failure 'extra whitespace at end of headers' \
    +-	'^error: char.*: malformed tag timezone$'
    ++	'^error:.* badTimezone:'
    + 
    + cat >tag.sig <<EOF
    + object $head
    +@@ t/t3800-mktag.sh: tagger T A Gger <tagger@example.com> 1206478233 -0500
    + EOF
    + 
    + check_verify_failure 'disallow no header / body newline separator' \
    +-	'^error: char.*: trailing garbage in tag header$'
    ++	'^error:.* extraHeaderEntry:'
      
      ############################################################
      # 24. create valid tag
 9:  30eff9170f = 15:  0e7994d8fc fsck: make fsck_config() re-usable
10:  11139ec2b8 ! 16:  5e8046022b mktag: allow turning off fsck.extraHeaderEntry
    @@ Commit message
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Documentation/git-mktag.txt ##
    -@@ Documentation/git-mktag.txt: tag doesn't pass a linkgit:git-fsck[1] check.
    - The "fsck" check done mktag is is stricter than what
    - linkgit:git-fsck[1] would run by default in that all `fsck.<msg-id>`
    - messages are promoted from warnings to errors (so e.g. a missing
    --"tagger" line is an error). Extra headers in the object are also an
    --error under mktag, but ignored by linkgit:git-fsck[1].
    -+"tagger" line is an error).
    -+
    -+Extra headers in the object are also an error under mktag, but ignored
    +@@ Documentation/git-mktag.txt: would run by default in that all `fsck.<msg-id>` messages are promoted
    + from warnings to errors (so e.g. a missing "tagger" line is an error).
    + 
    + Extra headers in the object are also an error under mktag, but ignored
    +-by linkgit:git-fsck[1]
     +by linkgit:git-fsck[1]. This extra check can be turned off by setting
     +the appropriate `fsck.<msg-id>` varible:
     +
 -:  ---------- > 17:  32698e1d00 mktag: allow omitting the header/body \n separator
 -:  ---------- > 18:  b6a22f2f99 mktag: convert to parse-options
 -:  ---------- > 19:  7fc0b81df7 mktag: mark strings for translation
 -:  ---------- > 20:  6fa443d528 mktag: add a --no-strict option
-- 
2.29.2.222.g5d2a92d10f8



* [PATCH v3 00/10] make "mktag" use fsck_tag()
  @ 2020-12-09 20:01  6% ` Ævar Arnfjörð Bjarmason
  2020-12-23  1:35  7%   ` [PATCH v4 00/20] make "mktag" use fsck_tag() & more Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2020-12-09 20:01 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, brian m . carlson, Eric Sunshine,
	Johannes Schindelin, Ævar Arnfjörð Bjarmason

This version should address all the comments Junio made on v2. Changes:

 * The whole "extra" fsck option is gone; I just didn't realize I
   could set the new check to "ignore", and then manually promote it.

 * Ejected "mktag: reword write_object_file() error". It used the same
   phrasing as "git tag" does, so let's just keep it.

 * Clarifications in docs/commit messages

 * There are 2 extra patches at the end now which take the first steps
   into making "git mktag" more of a normal builtin. It reads fsck.*
   config variables, so you can turn off that "no extra headers" check
   through the normal fsck.<msg-id>=ignore config.

   It should also be moved to getopts, and we could make it support
   --no-strict to have the same idea of error/warning as fsck itself,
   but that's #leftoverbits, along with moving it to i18n.

   It would be nice to have patches 1-8 merged down if they're deemed
   ready, and if 9-10 aren't deemed wanted just discard them. I think
   it makes sense though...
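
The "set it to ignore, then promote it" approach depends on how a message's severity gets resolved. The built-in default applies unless fsck.<msg-id> or the caller overrides it. Under --strict, fsck turns warnings into errors but leaves ignored messages alone, so a check that defaults to "ignore" stays invisible to existing users. The sketch below is a simplified model with hypothetical names. It is not fsck.c's actual API, and it leaves the "info" level out.

```c
/* Hypothetical, simplified severities (no "info" level modelled here). */
enum severity { SEV_IGNORE, SEV_WARN, SEV_ERROR };

/*
 * Resolve the severity of one message id.
 *   deflt        built-in default, e.g. SEV_IGNORE for extraHeaderEntry
 *   has_override nonzero if fsck.<msg-id> (or the caller) set it explicitly
 *   override     the explicitly requested severity
 *   strict       nonzero under --strict
 * In this sketch an explicit setting wins outright; only the built-in
 * default is subject to --strict promotion.
 */
enum severity effective_severity(enum severity deflt, int has_override,
				 enum severity override, int strict)
{
	if (has_override)
		return override;
	if (strict && deflt == SEV_WARN)
		return SEV_ERROR;
	return deflt;
}
```

In these terms, mktag explicitly asks for a warning on extraHeaderEntry, which its error callback then treats as fatal. Every other user keeps the "ignore" default.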

Ævar Arnfjörð Bjarmason (10):
  mktag doc: say <hash> not <sha1>
  mktag: use default strbuf_read() hint
  mktag: remove redundant braces in one-line body "if"
  mktag tests: don't needlessly use a subshell
  mktag tests: remove needless SHA-1 hardcoding
  mktag tests: improve verify_object() test coverage
  mktag: use fsck instead of custom verify_tag()
  mktag doc: update to explain why to use this
  fsck: make fsck_config() re-usable
  mktag: allow turning off fsck.extraHeaderEntry

 Documentation/git-hash-object.txt |   4 +
 Documentation/git-mktag.txt       |  34 ++++-
 builtin/fsck.c                    |  20 +--
 builtin/mktag.c                   | 204 +++++++++---------------------
 fsck.c                            |  57 ++++++++-
 fsck.h                            |  16 +++
 t/t1006-cat-file.sh               |   2 +-
 t/t3800-mktag.sh                  | 132 ++++++++++++++-----
 8 files changed, 261 insertions(+), 208 deletions(-)

Range-diff:
 1:  f46abb37df9 =  1:  aee3f52a478 mktag doc: say <hash> not <sha1>
 2:  1b4d9a53302 =  2:  6e98557709a mktag: use default strbuf_read() hint
 3:  83f4af6013e <  -:  ----------- mktag: reword write_object_file() error
 4:  bca1484ed96 =  3:  8e5fe08f155 mktag: remove redundant braces in one-line body "if"
 5:  ac7c4097c90 =  4:  1f06b9c0cf9 mktag tests: don't needlessly use a subshell
 6:  5e076659e45 !  5:  5d1cb73ca35 mktag tests: remove needless SHA-1 hardcoding
    @@ t/t3800-mktag.sh: EOF
      
      ############################################################
     -#  3. object line SHA1 check
    -+#  3. object line SHA check
    ++#  3. object line hash check
      
      cat >tag.sig <<EOF
     -object zz9e9b33986b1c2670fff52c5067603117b3e895
 7:  a048c3e6401 !  6:  cf86f4ca37d mktag tests: improve verify_object() test coverage
    @@ t/t3800-mktag.sh: check_verify_failure '"type" line type-name length check' \
      
      ############################################################
     -#  9. verify object (SHA1/type) check
    -+#  9. verify object (SHA/type) check
    ++#  9. verify object (hash/type) check
      
      cat >tag.sig <<EOF
      object $(test_oid deadbeef)
    @@ t/t3800-mktag.sh: check_verify_failure '"type" line type-name length check' \
     +
     +EOF
     +
    -+check_verify_failure 'verify object (SHA/type) check -- correct type, nonexisting object' \
    ++check_verify_failure 'verify object (hash/type) check -- correct type, nonexisting object' \
     +	'^error: char7: could not verify object.*$'
     +
     +cat >tag.sig <<EOF
    @@ t/t3800-mktag.sh: check_verify_failure '"type" line type-name length check' \
      EOF
      
     -check_verify_failure 'verify object (SHA1/type) check' \
    -+check_verify_failure 'verify object (SHA/type) check -- made-up type, nonexisting object' \
    ++check_verify_failure 'verify object (hash/type) check -- made-up type, nonexisting object' \
     +	'^fatal: invalid object type'
     +
     +cat >tag.sig <<EOF
    @@ t/t3800-mktag.sh: check_verify_failure '"type" line type-name length check' \
     +
     +EOF
     +
    -+check_verify_failure 'verify object (SHA/type) check -- incorrect type, valid object' \
    ++check_verify_failure 'verify object (hash/type) check -- incorrect type, valid object' \
      	'^error: char7: could not verify object.*$'
      
     +cat >tag.sig <<EOF
    @@ t/t3800-mktag.sh: check_verify_failure '"type" line type-name length check' \
     +
     +EOF
     +
    -+check_verify_failure 'verify object (SHA/type) check -- incorrect type, valid object' \
    ++check_verify_failure 'verify object (hash/type) check -- incorrect type, valid object' \
     +	'^error: char7: could not verify object'
     +
      ############################################################
 8:  dab44d32359 <  -:  ----------- fsck: add new "extra" checks for "mktag"
 9:  8ff853caeea !  7:  5812ee53c97 mktag: use fsck instead of custom verify_tag()
    @@ Commit message
         back to the same commit[1]. Let's unify them so we're not maintaining
         two sets functions to verify that a tag is OK.
     
    -    Moving to fsck_tag() required teaching it to optionally use some
    -    validations that only the old mktag code could perform. That was done
    -    in an earlier commit, the "extraHeaderEntry" and
    -    "extraHeaderBodyNewline" tests being added here make use of that
    -    logic.
    +    The behavior of fsck_tag() and the old "mktag" code being removed here
    +    is different in few aspects.
     
    -    There was other "mktag" validation logic that I think makes sense to
    -    just remove. Namely:
    +    I think it makes sense to remove some of those checks, namely:
     
          A. fsck only cares that the timezone matches [-+][0-9]{4}. The mktag
             code disallowed values larger than 1400.
    @@ Commit message
          C. Like B, but "mktag" disallowed spaces in the <email> part, fsck
             allows it.
     
    -    We didn't only lose obscure validation logic, we also gained some:
    +    In some ways fsck_tag() is stricter than "mktag" was, namely:
     
          D. fsck disallows zero-padded dates, but mktag didn't care. So
             e.g. the timestamp "0000000000 +0000" produces an error now. A
             test in "t1006-cat-file.sh" relied on this, it's been changed to
             use "hash-object" (without fsck) instead.
     
    +    There was one check I deemed worth keeping by porting it over to
    +    fsck_tag():
    +
    +     E. "mktag" did not allow any custom headers, and by extension (as an
    +        empty commit is allowed) also forbade an extra stray trailing
    +        newline after the headers it knew about.
    +
    +        Add a new check in the "ignore" category to fsck and use it. This
    +        somewhat abuses the facility added in efaba7cc77f (fsck:
    +        optionally ignore specific fsck issues completely, 2015-06-22).
    +
    +        This is somewhat of hack, but probably the least invasive change
    +        we can make here. The fsck command will shuffle these categories
    +        around, e.g. under --strict the "info" becomes a "warn" and "warn"
    +        becomes "error". Existing users of fsck's (and others,
    +        e.g. index-pack) --strict option rely on this.
    +
    +        So we need to put something into a category that'll be ignored by
    +        all existing users of the API. Pretending that
    +        fsck.extraHeaderEntry=error ("ignore" by default) was set serves
    +        to do this for us.
    +
         1. ec4465adb38 (Add "tag" objects that can be used to sign other
            objects., 2005-04-25)
     
    @@ builtin/mktag.c
     +	switch (msg_type) {
     +	case FSCK_WARN:
     +	case FSCK_ERROR:
    -+	case FSCK_EXTRA:
     +		/*
     +		 * We treat both warnings and errors as errors, things
     +		 * like missing "tagger" lines are "only" warnings
    @@ builtin/mktag.c: int cmd_mktag(int argc, const char **argv, const char *prefix)
     -	   "object <sha1>\ntype\ntagger " */
     -	if (verify_tag(buf.buf, buf.len) < 0)
     -		die("invalid tag signature file");
    -+	fsck_options.extra = 1;
     +	fsck_options.error_func = mktag_fsck_error_func;
    ++	fsck_set_msg_type(&fsck_options, "extraheaderentry", "warn");
     +	if (fsck_tag_standalone(NULL, buf.buf, buf.len, &fsck_options,
     +				&tagged_oid, &tagged_type))
     +		die("tag on stdin did not pass our strict fsck check");
    @@ builtin/mktag.c: int cmd_mktag(int argc, const char **argv, const char *prefix)
     +		die("tag on stdin did not refer to a valid object");
      
      	if (write_object_file(buf.buf, buf.len, tag_type, &result) < 0)
    - 		die("unable to write annotated tag object");
    + 		die("unable to write tag file");
     
      ## fsck.c ##
    +@@ fsck.c: static struct oidset gitmodules_done = OIDSET_INIT;
    + 	/* infos (reported as warnings, but ignored by default) */ \
    + 	FUNC(GITMODULES_PARSE, INFO) \
    + 	FUNC(BAD_TAG_NAME, INFO) \
    +-	FUNC(MISSING_TAGGER_ENTRY, INFO)
    ++	FUNC(MISSING_TAGGER_ENTRY, INFO) \
    ++	/* ignored (elevated when requested) */ \
    ++	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
    + 
    + #define MSG_ID(id, msg_type) FSCK_MSG_##id,
    + enum fsck_msg_id {
     @@ fsck.c: static int fsck_tag(const struct object_id *oid, const char *buffer,
      		    unsigned long size, struct fsck_options *options)
      {
    @@ fsck.c: static int fsck_tag(const struct object_id *oid, const char *buffer,
      		ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_TYPE, "invalid 'type' value");
      	if (ret)
      		goto done;
    +@@ fsck.c: static int fsck_tag(const struct object_id *oid, const char *buffer,
    + 	else
    + 		ret = fsck_ident(&buffer, oid, OBJ_TAG, options);
    + 
    ++	if (!starts_with(buffer, "\n")) {
    ++		/*
    ++		 * The verify_headers() check will allow
    ++		 * e.g. "[...]tagger <tagger>\nsome
    ++		 * garbage\n\nmessage" to pass, thinking "some
    ++		 * garbage" could be a custom header. E.g. "mktag"
    ++		 * doesn't want any unknown headers.
    ++		 */
    ++		ret = report(options, oid, OBJ_TAG, FSCK_MSG_EXTRA_HEADER_ENTRY, "invalid format - extra header(s) after 'tagger'");
    ++		if (ret)
    ++			goto done;
    ++	}
    ++
    + done:
    + 	strbuf_release(&sb);
    + 	return ret;
     
      ## fsck.h ##
     @@ fsck.h: int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
    @@ t/t3800-mktag.sh: tagger . <> 0 +0000
     +check_verify_failure '"object" line label check' '^error:.* missingObject:'
      
      ############################################################
    - #  3. object line SHA check
    + #  3. object line hash check
     @@ t/t3800-mktag.sh: tagger . <> 0 +0000
      
      EOF
    @@ t/t3800-mktag.sh: tag mytag
     +	'^error:.* badType:'
      
      ############################################################
    - #  9. verify object (SHA/type) check
    + #  9. verify object (hash/type) check
     @@ t/t3800-mktag.sh: tagger . <> 0 +0000
      EOF
      
    - check_verify_failure 'verify object (SHA/type) check -- correct type, nonexisting object' \
    + check_verify_failure 'verify object (hash/type) check -- correct type, nonexisting object' \
     -	'^error: char7: could not verify object.*$'
     +	'^fatal: could not read tagged object'
      
    @@ t/t3800-mktag.sh: tagger . <> 0 +0000
     @@ t/t3800-mktag.sh: tagger . <> 0 +0000
      EOF
      
    - check_verify_failure 'verify object (SHA/type) check -- made-up type, nonexisting object' \
    + check_verify_failure 'verify object (hash/type) check -- made-up type, nonexisting object' \
     -	'^fatal: invalid object type'
     +	'^error:.* badType:'
      
    @@ t/t3800-mktag.sh: tagger . <> 0 +0000
     @@ t/t3800-mktag.sh: tagger . <> 0 +0000
      EOF
      
    - check_verify_failure 'verify object (SHA/type) check -- incorrect type, valid object' \
    + check_verify_failure 'verify object (hash/type) check -- incorrect type, valid object' \
     -	'^error: char7: could not verify object.*$'
     +	'^error:.* badType:'
      
    @@ t/t3800-mktag.sh: tagger . <> 0 +0000
     @@ t/t3800-mktag.sh: tagger . <> 0 +0000
      EOF
      
    - check_verify_failure 'verify object (SHA/type) check -- incorrect type, valid object' \
    + check_verify_failure 'verify object (hash/type) check -- incorrect type, valid object' \
     -	'^error: char7: could not verify object'
     +	'^fatal: object.*tagged as.*tree.*but is.*commit'
      
    @@ t/t3800-mktag.sh: this line should not be here
     +tagger T A Gger <tagger@example.com> 1206478233 -0500
     +
     +
    -+this line should be one line up
    ++this line comes after an extra newline
     +EOF
     +
    -+check_verify_failure 'detect invalid header entry' \
    -+	'^error:.* extraHeaderBodyNewline:'
    ++test_expect_success \
    ++    'allow extra newlines at start of body' \
    ++    'git mktag <tag.sig >.git/refs/tags/mytag 2>message'
      
      ############################################################
      # 24. create valid tag
10:  e38feefd3f8 !  8:  fa04664f7f1 mktag doc: update to explain why to use this
    @@ Documentation/git-mktag.txt: SYNOPSIS
     +Reads a tag contents on standard input and creates a tag object. The
     +output is the new tag's <object> identifier.
     +
    -+This command accepts a subset of what linkgit:git-hash-object[1] would
    -+accept with `-t tag --stdin`. I.e. both of these work:
    ++This command is mostly equivalent to linkgit:git-hash-object[1]
    ++invoked with `-t tag -w --stdin`. I.e. both of these will create and
    ++write a tag found in `my-tag`:
     +
     +    git mktag <my-tag
    -+    git hash-object -t tag --stdin <my-tag
    ++    git hash-object -t tag -w --stdin <my-tag
     +
    -+The difference between the two is that mktag does the equivalent of a
    -+linkgit:git-fsck(1) check on its input, and furthermore disallows some
    -+thing linkgit:git-hash-object[1] would pass, e.g. extra headers in the
    -+object before the message.
    ++The difference is that mktag will die before writing the tag if the
    ++tag doesn't pass a linkgit:git-fsck[1] check.
    ++
    ++The "fsck" check done mktag is is stricter than what
    ++linkgit:git-fsck[1] would run by default in that all `fsck.<msg-id>`
    ++messages are promoted from warnings to errors (so e.g. a missing
    ++"tagger" line is an error). Extra headers in the object are also an
    ++error under mktag, but ignored by linkgit:git-fsck[1].
      
      Tag Format
      ----------
    @@ Documentation/git-mktag.txt: exists, is separated by a blank line from the heade
      message part may contain a signature that Git itself doesn't
      care about, but that can be verified with gpg.
      
    -+HISTORY
    -+-------
    -+
    -+In versions of Git before v2.30.0 the "mktag" command's validation
    -+logic was subtly different than that of linkgit:git-fsck[1]. It is now
    -+a strict superset of linkgit:git-fsck[1]'s validation logic.
    -+
     +SEE ALSO
     +--------
     +linkgit:git-hash-object[1],
 -:  ----------- >  9:  30eff9170fb fsck: make fsck_config() re-usable
 -:  ----------- > 10:  11139ec2b8d mktag: allow turning off fsck.extraHeaderEntry
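
Points A and D of the commit message differ in three ways. First, fsck only checks that the timezone has the shape [-+][0-9]{4}. Second, the old mktag code also rejected values larger than 1400. Third, fsck (unlike old mktag) rejects zero-padded timestamps such as "0000000000". The sketch below shows these checks using hypothetical helper names on already-split fields. It is not the real parsing code in fsck.c, and treating a bare "0" as acceptable is an assumption of the sketch.

```c
#include <ctype.h>
#include <stdlib.h>

/* fsck-style: the timezone must be exactly [-+][0-9]{4}. */
int tz_ok_fsck(const char *tz)
{
	if (tz[0] != '+' && tz[0] != '-')
		return 0;
	for (int i = 1; i <= 4; i++)
		if (!isdigit((unsigned char)tz[i]))
			return 0; /* also stops at a premature '\0' */
	return tz[5] == '\0';
}

/* Old mktag style: same shape, but values above 1400 were rejected. */
int tz_ok_old_mktag(const char *tz)
{
	return tz_ok_fsck(tz) && atoi(tz + 1) <= 1400;
}

/* fsck-style (as assumed here): a timestamp may be "0" but must not otherwise start with '0'. */
int date_zero_padded(const char *date)
{
	return date[0] == '0' && date[1] != '\0';
}
```

Under this model, the "-1430" timezone from the 'allow invalid tag timezone' test passes tz_ok_fsck() but fails tz_ok_old_mktag().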
-- 
2.29.2.222.g5d2a92d10f8



* Re: git-log: documenting pathspec usage
  @ 2020-11-16 12:37 16% ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2020-11-16 12:37 UTC (permalink / raw)
  To: Adam Spiers; +Cc: git mailing list


On Mon, Nov 16 2020, Adam Spiers wrote:

> Hi all,
>
> I just noticed that git-log.txt has: 
>
>     SYNOPSIS
>     --------
>     [verse]
>     'git log' [<options>] [<revision range>] [[--] <path>...]
>
> and builtin/log.c has: 
>
>     static const char * const builtin_log_usage[] = {
>             N_("git log [<options>] [<revision-range>] [[--] <path>...]"),
>
> IIUC, the references to <path> should actually be <pathspec> instead,
> as seen with other pathspec-supporting commands such as git add/rm
> whose man pages are extra helpful in explicitly calling out how
> pathspecs can be used, e.g.:
>
>     OPTIONS
>     -------
>     <pathspec>...::
>             Files to add content from.  Fileglobs (e.g. `*.c`) can
>             be given to add all matching files.  Also a
>             leading directory name (e.g. `dir` to add `dir/file1`
>             and `dir/file2`) can be given to update the index to
>             match the current state of the directory as a whole (e.g.
>             specifying `dir` will record not just a file `dir/file1`
>             modified in the working tree, a file `dir/file2` added to
>             the working tree, but also a file `dir/file3` removed from
>             the working tree). Note that older versions of Git used
>             to ignore removed files; use `--no-all` option if you want
>             to add modified or new files but ignore removed ones.
>     +
>     For more details about the <pathspec> syntax, see the 'pathspec' entry
>     in linkgit:gitglossary[7].
>
> Would it be fair to say the git-log usage syntax and man page should
> be updated to match?  If so perhaps I can volunteer for that.

It seems like a good idea to make these consistent, if you're feeling
more ambitious than just git-log's manpage then:
    
    $ git grep '<pathspec>' -- Documentation/git-*.txt|wc -l
    54
    $ git grep '<path>' -- Documentation/git-*.txt|wc -l
    161

Most/all of these should probably be changed to one or the other.

I've also long wanted (but haven't come up with a patch for) that part
of gitglossary to be ripped out into its own manual page,
e.g. "gitpathspec(5)". And if possible for "PATTERN FORMAT" in
"gitignore" to be unified with that/other docs that describe how our
wildmatch.c works.

There's also the "Conditional includes" section in git-config(1) that
repeats some of that, and probably other stuff I'm forgetting
#leftoverbits.
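
Unifying these docs matters partly because the glob rules differ between places. gitglossary describes plain pathspec wildcards as fnmatch(3) without FNM_PATHNAME, so "*" can match "/", whereas the :(glob) magic uses FNM_PATHNAME. The sketch below uses POSIX fnmatch() only to illustrate that difference. Git itself matches with its own wildmatch.c, and glob_matches() is a hypothetical helper.

```c
#include <fnmatch.h>

/*
 * Hypothetical helper: return 1 if path matches pattern.
 * pathname_mode != 0 passes FNM_PATHNAME, so '*' and '?' do not match '/'.
 */
int glob_matches(const char *pattern, const char *path, int pathname_mode)
{
	return fnmatch(pattern, path, pathname_mode ? FNM_PATHNAME : 0) == 0;
}
```

Here "Documentation/*.jpg" matches "Documentation/chapter_1/figure_1.jpg" only in the non-FNM_PATHNAME mode.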


* Re: [PATCH v3 00/10] grep: move from kwset to optional PCRE v2
  @ 2019-07-02 11:10 14%     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2019-07-02 11:10 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, git-packagers, gitgitgadget, johannes.schindelin, peff,
	sandals, szeder.dev

On Mon, Jul 01 2019, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> This v3 has a new patch (3/10) that I believe fixes the regression on
>> MinGW Johannes noted in
>> https://public-inbox.org/git/nycvar.QRO.7.76.6.1907011515150.44@tvgsbejvaqbjf.bet/
>>
>> As noted in the updated commit message in 10/10 I believe just
>> skipping this test & documenting this in a commit message is the least
>> amount of suck for now. It's really an existing issue with us doing
>> nothing sensible when the log/grep haystack encoding doesn't match the
>> needle encoding supplied via the command line.
>
> Is that quite the case?  If they do not match, not finding the match
> is the right answer, because we are byte-for-byte matching/searching
> IIUC.
>
>> We swept that under the carpet with the kwset backend, but PCRE v2
>> exposes it.
>
> Is it exposing, or just showing the limitation of the rewritten
> implementation where it cannot do byte-for-byte matching/searching
> as we used to be able to?
>
> Without having a way to know what encoding is used on the command
> line, there is no sensible way to reencode them to match the
> haystack encoding (even when it is known), so "you got to feed the
> strings in the same encoding, as we are going to match/search
> byte-for-byte" is the only sensible way to work, given the design
> space, I would think.
>
> Not that it is all that useful to be able to match/search
> byte-for-byte, of course, so I am OK if we punt with these tests,
> but I'd prefer to see us admit we are punting when we do ;-).

I'm guilty as charged in punting this larger encoding issue. As it
pertains to this patch series it unearths an obscure case I think nobody
cares about in practice, and I'd like to move on with the "remove kwset"
optimization.

But I strongly believe that the new behavior with the PCRE v2
optimization is the only sane thing to do, and to the extent we have
anything left to do (#leftoverbits) it's that we should modify git more
generally (aside from string searching) to do the same thing where
appropriate.

Remember, this only happens if the user has set a UTF-8 locale and thus
promised that they're going to give us UTF-8. We then take that promise
and make e.g. "æ" match "Æ" under --ignore-case.

Just falling back on raw byte matching isn't going to cut it, because
then "æ<invalid utf8>" won't match "Æ<same invalid utf8>" under
--ignore-case, and there are other cases like that with matching word
boundaries & other Unicode gotchas.

The best that can be hoped for at that point is some "loose UTF-8"
mode. Both perl & GNU grep seem to support that (although I'm sure
it falls apart at some point). GNU grep will also die in the same way
that we now die with --perl-regexp (since it also uses PCRE).

I think that's saner: if the user thinks they're feeding us UTF-8 but
they're not, I think they'd like to know rather than having the string
matching library fall back.

* Re: [RFC/PATCH] refs: tone down the dwimmery in refname_match() for {heads,tags,remotes}/*
  @ 2019-05-27 14:29 13%     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2019-05-27 14:29 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: git, Linus Torvalds, Junio C Hamano, Linux List Kernel Mailing,
	Radim Krčmář, KVM list, Michael Haggerty

On Mon, May 27 2019, Paolo Bonzini wrote:

> On 27/05/19 00:54, Ævar Arnfjörð Bjarmason wrote:
>> This resulted in a case[1] where someone on LKML did:
>>
>>     git push kvm +HEAD:tags/for-linus
>>
>> Which would have created a new "tags/for-linus" branch in their "kvm"
>> repository, except because they happened to have an existing
>> "refs/tags/for-linus" reference we pushed there instead, and replaced
>> an annotated tag with a lightweight tag.
>
> Actually, I would not be surprised even if "git push foo
> someref:tags/foo" _always_ created a lightweight tag (i.e. push to
> refs/tags/foo).

That's not the intention (I think), and not what we document.

It mostly (and I believe always should) works by looking at whether
"someref" is a named ref, and e.g. looking at whether it's "master". We
then see that it lives in "refs/heads/master" locally, and thus
correspondingly add a "refs/heads/" to your <dst> "tags/foo", making it
"refs/heads/tags/foo".

*Or* we take e.g. <some random SHA-1>:master, the <some random...> is
ambiguous, but we see that "master" unambiguously refers to
"refs/heads/master" on the remote (so e.g. a refs/tags/master doesn't
exist). If you had both refs/{heads,tags}/master refs on the remote we'd
emit:

    error: dst refspec master matches more than one

(We should improve that error to note what conflicted, #leftoverbits)

So your HEAD:tags/for-linus resulted in pushing a HEAD that referred to
some refs/heads/* to refs/tags/for-linus. I believe that's an unintended
emergent effect in how we try to apply these two rules. We should apply
one, not both in combination.

And as an aside none of these rules have to do with whether the <src> is
a lightweight or annotated tag, and both types live in the refs/tags/*
namespace.

> In my opinion, the bug is that "git request-pull" should warn if the tag
> is lightweight remotely but not locally, and possibly even vice versa.
> Here is a simple testcase:
>
>   # setup "local" repo
>   mkdir -p testdir/a
>   cd testdir/a
>   git init
>   echo a > test
>   git add test
>   git commit -minitial
>
>   # setup "remote" repo
>   git clone --bare . ../b
>
>   # setup "local" tag
>   echo b >> test
>   git commit -msecond test
>   git tag -mtag tag1
>
>   # create remote lightweight tag and prepare a pull request
>   git push ../b HEAD:refs/tags/tag1
>   git request-pull HEAD^ ../b tags/tag1

Yeah, maybe. I don't use git-request-pull. So maybe this is a simple
mitigation for that tool since you supply a <remote> to it already.

I was more interested and surprised by HEAD being implicitly resolved to
refs/tags/* in a way that would be *different* than if you didn't have
an existing tag there, but of course if we errored on that you might
have just done "+HEAD:refs/tags/for-linus" and ended up with the same
thing.

As an aside, in *general* tags, unlike branches, don't have "remote
tracking". That's something we'd eventually want, but we're nowhere near
the refstore and porcelain supporting that.

Thus such a check is hard to support in general: we'd always need a
remote name and a network round-trip. Otherwise we couldn't do anything
sensible if you have 10 remotes of fellow LKML developers, all of whom
have a "for-linus" tag, which I'm assuming is a common use-case.

But since git-request-pull gets the remote, it can (and does) check on
that remote; it just seems satisfied to see that the ref exists somewhere
on that remote.

* Re: [GSoC] Some #leftoverbits for anyone looking for little projects
  @ 2019-05-20 23:49 19%   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2019-05-20 23:49 UTC (permalink / raw)
  To: Matheus Tavares
  Cc: git, Christian Couder,
	Оля Тележная,
	Johannes Schindelin


On Mon, May 20 2019, Matheus Tavares wrote:

> Hi, Ævar
>
>> Give "rebase -i" some option so when you "reword" the patch is
>> included in the message.
>>
>> I keep going to the shell because I have no idea what change I'm
>> describing.
>
> I have the same problem, so I wanted to try solving this. The patch
> below creates a "rebase.verboseCommit" configuration that includes
> a diff when rewording or squashing. I'd appreciate knowing your thoughts
> on it.
>
> As Christian wisely pointed out to me, though, we can also achieve this
> behavior by setting "commit.verbose" to true. The only "downside" of it
> is that users cannot choose to see the diff only when rebasing. Despite
> that, if we decide not to go with this patch, what do you think of
> adding a "commit.verbose" entry to git-rebase's man page?

Thanks for working on this. I'd somehow missed the addition of the
commit.verbose option, so the problem I had is 100% solved by it (and
I've turned it on).

I think it's better to just document it in the rebase docs. Rather
than mentioning that option specifically (though that would also be
fine), perhaps we should promise that we support "commit" options in
general.

Do we promise anywhere that interactive rebase is going to run the
"normal" git-commit command? From a quick skim of the docs it
doesn't seem so. Perhaps we should explicitly promise that, and then
test for it if we don't already (e.g. by stealing the tests you added).

Aside from that: I see commit.verbose is a bool-or-int option, but
yours is maybe-bool, so there's no way with rebase.verboseCommit to
turn on the higher level of verbosity. If this option is kept, perhaps
it would be better to just grab whatever "X" rebase.verboseCommit=X is
set to and pass it down to git-commit as commit.verbose=X, letting
git-commit deal with the validation?

> diff --git a/Documentation/config/rebase.txt b/Documentation/config/rebase.txt
> index d98e32d812..ae50b3e05d 100644
> --- a/Documentation/config/rebase.txt
> +++ b/Documentation/config/rebase.txt
> @@ -62,3 +62,8 @@ rebase.rescheduleFailedExec::
>  	Automatically reschedule `exec` commands that failed. This only makes
>  	sense in interactive mode (or when an `--exec` option was provided).
>  	This is the same as specifying the `--reschedule-failed-exec` option.
> +
> +rebase.verboseCommit::
> +	When rewording or squashing commits, during an interactive rebase, show
> +	the commits' diff to help describe the modifications they bring. False
> +	by default.
> diff --git a/sequencer.c b/sequencer.c
> index f88a97fb10..1596fc4cd0 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -914,6 +914,7 @@ N_("you have staged changes in your working tree\n"
>  #define CLEANUP_MSG (1<<3)
>  #define VERIFY_MSG  (1<<4)
>  #define CREATE_ROOT_COMMIT (1<<5)
> +#define VERBOSE_COMMIT (1<<6)
>
>  static int run_command_silent_on_success(struct child_process *cmd)
>  {
> @@ -1007,6 +1008,8 @@ static int run_git_commit(struct repository *r,
>  		argv_array_push(&cmd.args, "-n");
>  	if ((flags & AMEND_MSG))
>  		argv_array_push(&cmd.args, "--amend");
> +	if ((flags & VERBOSE_COMMIT))
> +		argv_array_push(&cmd.args, "-v");
>  	if (opts->gpg_sign)
>  		argv_array_pushf(&cmd.args, "-S%s", opts->gpg_sign);
>  	if (defmsg)
> @@ -1782,7 +1785,7 @@ static int do_pick_commit(struct repository *r,
>  	char *author = NULL;
>  	struct commit_message msg = { NULL, NULL, NULL, NULL };
>  	struct strbuf msgbuf = STRBUF_INIT;
> -	int res, unborn = 0, allow;
> +	int res, unborn = 0, allow, verbose_commit = 0;
>
>  	if (opts->no_commit) {
>  		/*
> @@ -1843,6 +1846,9 @@ static int do_pick_commit(struct repository *r,
>  		return error(_("cannot get commit message for %s"),
>  			oid_to_hex(&commit->object.oid));
>
> +	if (git_config_get_maybe_bool("rebase.verbosecommit", &verbose_commit) < 0)
> +		warning("Invalid value for rebase.verboseCommit. Using 'false' instead.");
> +
>  	if (opts->allow_ff && !is_fixup(command) &&
>  	    ((parent && oideq(&parent->object.oid, &head)) ||
>  	     (!parent && unborn))) {
> @@ -1853,6 +1859,8 @@ static int do_pick_commit(struct repository *r,
>  		if (res || command != TODO_REWORD)
>  			goto leave;
>  		flags |= EDIT_MSG | AMEND_MSG | VERIFY_MSG;
> +		if (verbose_commit)
> +			flags |= VERBOSE_COMMIT;
>  		msg_file = NULL;
>  		goto fast_forward_edit;
>  	}
> @@ -1909,12 +1917,17 @@ static int do_pick_commit(struct repository *r,
>  			author = get_author(msg.message);
>  	}
>
> -	if (command == TODO_REWORD)
> +	if (command == TODO_REWORD) {
>  		flags |= EDIT_MSG | VERIFY_MSG;
> +		if (verbose_commit)
> +			flags |= VERBOSE_COMMIT;
> +	}
>  	else if (is_fixup(command)) {
>  		if (update_squash_messages(r, command, commit, opts))
>  			return -1;
>  		flags |= AMEND_MSG;
> +		if (verbose_commit)
> +			flags |= VERBOSE_COMMIT;
>  		if (!final_fixup)
>  			msg_file = rebase_path_squash_msg();
>  		else if (file_exists(rebase_path_fixup_msg())) {
> diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
> index 1723e1a858..9b410d31e2 100755
> --- a/t/t3404-rebase-interactive.sh
> +++ b/t/t3404-rebase-interactive.sh
> @@ -1477,4 +1477,60 @@ test_expect_success 'valid author header when author contains single quote' '
>  	test_cmp expected actual
>  '
>
> +write_script "reword-and-check-for-diff" <<\EOF &&
> +case "$1" in
> +*/git-rebase-todo)
> +	sed s/pick/reword/ "$1" > "$1.tmp"
> +	mv -f "$1.tmp" "$1"
> +	;;
> +*)
> +	grep '^diff --git' "$1" >has-diff
> +	;;
> +esac
> +exit 0
> +EOF
> +
> +test_expect_success 'rebase -i does not show diff by default when rewording' '
> +	rebase_setup_and_clean no-verbose-commit-reword &&
> +	test_set_editor "$PWD/reword-and-check-for-diff" &&
> +	git rebase -i HEAD~1 &&
> +	test_line_count = 0 has-diff
> +'
> +
> +test_expect_success 'rebase -i respects rebase.verboseCommit when rewording' '
> +	rebase_setup_and_clean verbose-commit-reword &&
> +	test_config rebase.verboseCommit true &&
> +	test_set_editor "$PWD/reword-and-check-for-diff" &&
> +	git rebase -i HEAD~1 &&
> +	test_line_count -gt 0 has-diff
> +'
> +
> +write_script "squash-and-check-for-diff" <<\EOF &&
> +case "$1" in
> +*/git-rebase-todo)
> +	sed "s/pick \([0-9a-f]*\) E/squash \1 E/" "$1" > "$1.tmp"
> +	mv -f "$1.tmp" "$1"
> +	;;
> +*)
> +	grep '^diff --git' "$1" >has-diff
> +	;;
> +esac
> +exit 0
> +EOF
> +
> +test_expect_success 'rebase -i does not show diff by default when squashing' '
> +	rebase_setup_and_clean no-verbose-commit-squash &&
> +	test_set_editor "$PWD/squash-and-check-for-diff" &&
> +	git rebase -i HEAD~2 &&
> +	test_line_count = 0 has-diff
> +'
> +
> +test_expect_success 'rebase -i respects rebase.verboseCommit when squashing' '
> +	rebase_setup_and_clean verbose-commit-squash &&
> +	test_config rebase.verboseCommit true &&
> +	test_set_editor "$PWD/squash-and-check-for-diff" &&
> +	git rebase -i HEAD~2 &&
> +	test_line_count -gt 0 has-diff
> +'
> +
>  test_done

* [PATCH 0/3] hash-object doc: small fixes
@ 2019-05-20 21:53 16% Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2019-05-20 21:53 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Adam Roben, Bryan Larsen,
	Matthias Urlichs, Eric Sunshine,
	Ævar Arnfjörð Bjarmason

Small doc fixes. Maybe trivial enough to land in 2.22, but there's no
rush.

A pair of #leftoverbits I noticed is that we've implemented the
"--stdin-paths" option via unquote_c_style() from day one, so our
current docs lie (and still do with this series) about wanting
\n-delimited files. You can't hash a file called '"foo"' as you'd
expect; you need to pass '"\"foo\""'.

I wonder if we should document this at this point, or just change it
and add a "-z" option. None of our tests fail if I remove this
unquote_c_style() codepath, and it's never been documented, but
someone in the wild may have organically depended on it.

Ævar Arnfjörð Bjarmason (3):
  hash-object doc: stop mentioning git-cvsimport
  hash-object doc: elaborate on -w and --literally promises
  hash-object doc: point to ls-files and rev-parse

 Documentation/git-hash-object.txt | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

-- 
2.21.0.1020.gf2820cf01a

* Re: Resolving deltas dominates clone time
  @ 2019-04-30 18:48 17%               ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2019-04-30 18:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Duy Nguyen, Martin Fick, Git Mailing List


On Tue, Apr 30 2019, Jeff King wrote:

> On Tue, Apr 23, 2019 at 05:08:40PM +0700, Duy Nguyen wrote:
>
>> On Tue, Apr 23, 2019 at 11:45 AM Jeff King <peff@peff.net> wrote:
>> >
>> > On Mon, Apr 22, 2019 at 09:55:38PM -0400, Jeff King wrote:
>> >
>> > > Here are my p5302 numbers on linux.git, by the way.
>> > >
>> > >   Test                                           jk/p5302-repeat-fix
>> > >   ------------------------------------------------------------------
>> > >   5302.2: index-pack 0 threads                   307.04(303.74+3.30)
>> > >   5302.3: index-pack 1 thread                    309.74(306.13+3.56)
>> > >   5302.4: index-pack 2 threads                   177.89(313.73+3.60)
>> > >   5302.5: index-pack 4 threads                   117.14(344.07+4.29)
>> > >   5302.6: index-pack 8 threads                   112.40(607.12+5.80)
>> > >   5302.7: index-pack default number of threads   135.00(322.03+3.74)
>> > >
>> > > which still imply that "4" is a win over "3" ("8" is slightly better
>> > > still in wall-clock time, but the total CPU rises dramatically; that's
>> > > probably because this is a quad-core with hyperthreading, so by that
>> > > point we're just throttling down the CPUs).
>> >
>> > And here's a similar test run on a 20-core Xeon w/ hyperthreading (I
>> > tweaked the test to keep going after eight threads):
>> >
>> > Test                            HEAD
>> > ----------------------------------------------------
>> > 5302.2: index-pack 1 threads    376.88(364.50+11.52)
>> > 5302.3: index-pack 2 threads    228.13(371.21+17.86)
>> > 5302.4: index-pack 4 threads    151.41(387.06+21.12)
>> > 5302.5: index-pack 8 threads    113.68(413.40+25.80)
>> > 5302.6: index-pack 16 threads   100.60(511.85+37.53)
>> > 5302.7: index-pack 32 threads   94.43(623.82+45.70)
>> > 5302.8: index-pack 40 threads   93.64(702.88+47.61)
>> >
>> > I don't think any of this is _particularly_ relevant to your case, but
>> > it really seems to me that the default of capping at 3 threads is too
>> > low.
>>
>> Looking back at the multithread commit, I think the trend was the same
>> and I capped it because the gain was not proportional to the number of
>> cores we threw at index-pack anymore. I would not be opposed to
>> raising the cap though (or maybe just remove it)
>
> I'm not sure what the right cap would be. I don't think it's static;
> we'd want ~4 threads on the top case, and 10-20 on the bottom one.
>
> It does seem like there's an inflection point in the graph at N/2
> threads. But then maybe that's just because these are hyper-threaded
> machines, so "N/2" is the actual number of physical cores, and the
> inflated CPU times above that are just because we can't turbo-boost
> then, so we're actually clocking slower. Multi-threaded profiling and
> measurement is such a mess. :)
>
> So I'd say the right answer is probably either online_cpus() or half
> that. The latter would be more appropriate for the machines I have, but
> I'd worry that it would leave performance on the table for non-Intel
> machines.

It would be a nice #leftoverbits project to do this dynamically at
runtime, i.e. hook up the throughput code in progress.c to some new
utility functions where the current code using pthreads would
occasionally stop and try to find some (local) maximum throughput given
N threads.

You could then dynamically save that optimum for next time, or adjust
threading at runtime every X seconds, e.g. on a server with N=24 cores
you might want 24 threads if you have one index-pack, but if you have 24
index-packs you probably don't want each with 24 threads, for a total of
576.

* Re: [PATCH 2/2] describe doc: remove '7-char' abbreviation reference
  @ 2019-04-07 20:05 13%   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2019-04-07 20:05 UTC (permalink / raw)
  To: Philip Oakley; +Cc: GitList, Linus Torvalds, Jeff King


On Sat, Apr 06 2019, Philip Oakley wrote:

> While the minimum is 7-char, the unambiguous length can be longer.
>
> Signed-off-by: Philip Oakley <philipoakley@iee.org>
> ---
> noticed while looking into the Git-for-Windows patch thicket -
> was looking for the ~n^m style!
> ---
>  Documentation/git-describe.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
> index ccdc5f83d6..a88f6ae2c6 100644
> --- a/Documentation/git-describe.txt
> +++ b/Documentation/git-describe.txt
> @@ -139,7 +139,7 @@ at the end.
>
>  The number of additional commits is the number
>  of commits which would be displayed by "git log v1.0.4..parent".
> -The hash suffix is "-g" + 7-char abbreviation for the tip commit
> +The hash suffix is "-g" + unambiguous abbreviation for the tip commit
>  of parent (which was `2414721b194453f058079d897d13c4e377f92dc6`).
>  The "g" prefix stands for "git" and is used to allow describing the version of
>  a software depending on the SCM the software is managed with. This is useful

Both the old and new versions are subtly wrong. Whether the new one is
better is another matter.

First, there are more places we mention the now-incorrect 7 characters, at
least these (one of which you're fixing). Found by grepping for ' 7 '
and '7.*abbr':

    Documentation/git-branch.txt-181---abbrev=<length>::
    Documentation/git-branch.txt-182-       Alter the sha1's minimum display length in the output listing.
    Documentation/git-branch.txt:183:       The default value is 7 and can be overridden by the `core.abbrev`
    Documentation/git-branch.txt-184-       config option.
    Documentation/git-describe.txt-65---abbrev=<n>::
    Documentation/git-describe.txt:66:      Instead of using the default 7 hexadecimal digits as the
    Documentation/git-describe.txt-67-      abbreviated object name, use <n> digits, or as many digits
    Documentation/git-ls-tree.txt-93-Object size identified by <object> is given in bytes, and right-justified
    Documentation/git-ls-tree.txt:94:with minimum width of 7 characters.  Object size is given only for blobs
    Documentation/git-ls-tree.txt-95-(file) entries; for other entries `-` character is used in place of size.
    Documentation/gittutorial-2.txt-44-
    Documentation/gittutorial-2.txt:45:What are the 7 digits of hex that Git responded to the commit with?
    Documentation/gittutorial-2.txt-46-
    [...]
    Documentation/gittutorial-2.txt-52-name), and that the contents of a Git object will never change (since
    Documentation/gittutorial-2.txt:53:that would change the object's name as well). The 7 char hex strings
    Documentation/gittutorial-2.txt-54-here are simply the abbreviation of such 40 character long strings.

It was never correct that we'd pick 7 characters. Before e6c587c733
("abbrev: auto size the default abbreviation", 2016-09-30) we'd *try*
7, but would pick a longer one if 7 was ambiguous.

Whereas "unambiguous abbreviation" isn't correct either, and arguably
less correct. At least 7 is what we *still* pick as a fallback in lieu
of the auto-sizing, but just "unambiguous abbreviation" implies that in
a repo with some 10 objects we might show just one character, or that
we'd post-e6c587c733 pick say 7 characters in a repository where it *is*
unambiguous but where we've auto-sized to 12.

I've been meaning to follow-up on
https://public-inbox.org/git/20190204161217.20047-1-avarab@gmail.com/
where I among other things wanted to just have these instances all say
"commits will be abbreviated as described in XYZ in linkgit:<something>"
and summarize what happens there.

I don't mind if this goes in, I mainly wrote this E-Mail as a brain dump
since it jolted my memory on the topic, and so that I could dig it up
later & see how I intended to follow up on those #leftoverbits.

* Re: [PATCH v2 1/3] Move init_skiplist() outside of fsck
  @ 2019-01-22  9:46 15%                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2019-01-22  9:46 UTC (permalink / raw)
  To: Jeff King
  Cc: Johannes Schindelin, Junio C Hamano, Barret Rhoden, git,
	David Kastrup, Jeff Smith, René Scharfe, Stefan Beller


On Tue, Jan 22 2019, Jeff King wrote:

> On Fri, Jan 18, 2019 at 11:26:29PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> I stand corrected, I thought these still needed to be updated to parse
>> anything that wasn't 40 chars, since I hadn't seen anything about these
>> formats in the hash transition document.
>>
>> So fair enough, let's change that while we're at it, but this seems like
>> something that needs to be planned for in more detail / documented in
>> the hash transition doc.
>>
>> I.e. many (e.g. me) maintain some system-wide skiplist for strict fsck
>> cloning of legacy repos. So I can see there being some need for a
>> SHA1<->SHA256 map in this case, but since these files might stretch
>> across repo boundaries and not be checked into the repo itself this is a
>> new use-case that needs thinking about.
>
> My assumption had been that changing your local repository would be a
> (local) flag day, and you'd update any ancillary files like skiplists,
> mailmap.blob, etc at the same time. I'm not opposed to making those
> features more clever, though.
>
>> But now that I think about it this sort of thing would be a good
>> use-case for just fixing these various historical fsck issues while
>> we're at it when possible, e.g. "missing space before email" (probably
>> not all could be unambiguously fixed). So instead of sha256<->sha1
>> fn(sha256)<->fn(sha1)[1]?
>
> That is a very tempting thing to do, but I think it comes with its own
> complications. We do not want to do fn(sha1), I don't think; the reason
> we care about sha1 at all is that those hashes are already set in stone.
>
> There could be a "clean up the data as we convert to sha256" operation,
> but:
>
>   - it needs to be set in stone from day 1, I'd think. The last thing we
>     want is to modify it after conversions are in the wild
>
>   - I think we need to be bi-directional. So it must be a mapping that
>     can be undone to retrieve the original bytes, so we can compute
>     their "real" sha1.

It needing to be bidirectional is a very good point, and I think that
makes my suggestion a non-starter. Thanks.

> At which point, I think it might be simpler to just make git more
> permissive with respect to those minor data errors (and in fact, we are
> already pretty permissive for the most part in non-fsck operations).

Yeah it's probably better to make some of these "errors" softer
warnings.

The X-Y issue I have is that I turned on transfer.fsckObjects, so then I
can't clone repos with various minor historical issues in commit headers
etc., so I maintain a big skip list. But what I was actually after was
fsck checks like the .gitmodules security check.

Of course I could chase them all down and turn them into
warn/error/ignore individually, but it would be better if we e.g. had
some way to say "serious things error, minor things warn", maybe with
the option of only having the looser version on fetch but not recieve
with the principle that we should be loose in what we accept from
existing data but strict with new data #leftoverbits

* Re: Students projects: looking for small and medium project ideas
  @ 2019-01-14 23:04 17% ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2019-01-14 23:04 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git


On Mon, Jan 14 2019, Matthieu Moy wrote:

> I haven't been active for a while on this list, but for those who don't
> know me, I'm a CS teacher and I'm regularly offering my students to
> contribute to open-source projects as part of their school projects. A
> few nice features like "git rebase -i --exec" or many of the hints in
> "git status" were implemented as part of these projects.
>
> I'm starting another instance of such project next week.

Good to hear!

> Part of the work of students is to choose which feature they want to
> work on, but I try to prepare this for them. I'm keeping a list of ideas
> here:
>
>   https://git.wiki.kernel.org/index.php/SmallProjectsIdeas
>
> (At some point, I should probably migrate this to git.github.io, since
> the wiki only seems half-alive these days).
>
> I'm looking for small to medium size projects (typically, a GSoC project
> is far too big in comparison, but we may expect more than just
> microprojects).
>
> You may suggest ideas by editing the wiki page, or just by replying to
> this email (I'll point my students to the thread). Don't hesitate to
> remove entries (or ask me to do so) on the wiki page if you think they
> are not relevant anymore.

Some #leftoverbits I've noted on-list before would qualify, some of
these (e.g. grep --only-matching) have been implemented, but others not:

https://public-inbox.org/git/87in9ucsbb.fsf@evledraar.gmail.com/
https://public-inbox.org/git/87bmcyfh67.fsf@evledraar.gmail.com/

* Re: How de-duplicate similar repositories with alternates
  2018-11-29 14:59 11% How de-duplicate similar repositories with alternates Ævar Arnfjörð Bjarmason
  2018-11-29 16:09  6% ` Ævar Arnfjörð Bjarmason
  @ 2018-12-04 13:35  4% ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2018-12-04 13:35 UTC (permalink / raw)
  To: git, Git for human beings; +Cc: Christian Couder, Derrick Stolee


On Thu, Nov 29 2018, Ævar Arnfjörð Bjarmason wrote:

> A co-worker asked me today how space could be saved when you have
> multiple checkouts of the same repository (at different revs) on the
> same machine. I said since these won't block-level de-duplicate well[1]
> one way to do this is with alternates.
>
> However, once you have an existing clone I didn't know how to get the
> gains without a full re-clone, but I hadn't looked deeply into it. As it
> turns out I'm wrong about that, which I found when writing the following
> test-case which shows that it works:
>
>     (
>         cd /tmp &&
>         rm -rf /tmp/git-{master,pu,pu-alt}.git &&
>
>         # Normal clones
>         git clone --bare --no-tags --single-branch --branch master https://github.com/git/git.git /tmp/git-master.git &&
>         git clone --bare --no-tags --single-branch --branch pu https://github.com/git/git.git /tmp/git-pu.git &&
>
>         # An 'alternate' clone using 'master' objects from another repo
>         git --bare init /tmp/git-pu-alt.git &&
>         for git in git-pu.git git-pu-alt.git
>         do
>             echo /tmp/git-master.git/objects >/tmp/$git/objects/info/alternates
>         done &&
>         git -C git-pu-alt.git fetch --no-tags https://github.com/git/git.git pu:pu
>
>         # Respective sizes, 'alternate' clone much smaller
>         du -shc /tmp/git-*.git &&
>
>         # GC them all. Compacts the git-pu.git to git-pu-alt.git's size
>         for repo in git-*.git
>         do
>             git -C $repo gc
>         done &&
>         du -shc /tmp/git-*.git
>
>         # Add another big history (GFW) to git-{pu,master}.git (in that order!)
>         for repo in $(ls -d /tmp/git-*.git | sort -r)
>         do
>             git -C $repo fetch --no-tags https://github.com/git-for-windows/git master:master-gfw
>         done &&
>         du -shc /tmp/git-*.git &&
>
>         # Another GC. The objects now in git-master.git will be de-duped by all
>         for repo in git-*.git
>         do
>             git -C $repo gc
>         done &&
>         du -shc /tmp/git-*.git
>     )
>
> This shows a scenario where we clone git.git at "master" and "pu" in
> different places. After clone the relevant sizes are:
>
>     108M    /tmp/git-master.git
>     3.2M    /tmp/git-pu-alt.git
>     109M    /tmp/git-pu.git
>     219M    total
>
> I.e. git-pu-alt.git is much smaller since it points via alternates to
> git-master.git, and the history of "pu" shares most of the objects with
> "master". But then how do you get those gains for git-pu.git? Turns out
> you just "git gc":
>
>     111M    /tmp/git-master.git
>     2.1M    /tmp/git-pu-alt.git
>     2.1M    /tmp/git-pu.git
>     115M    total
>
> This is the thing I was wrong about, in retrospect probably because I'd
> been putting PATH_TO_REPO in objects/info/alternates, but we actually
> need PATH_TO_REPO/objects, and "git gc" won't warn about this (or "git
> fsck"). Probably a good idea to patch that at some point, i.e. whine
> about paths in alternates that don't have objects, or at the very least
> those that don't exist. #leftoverbits
>
> Then when we fetch git-for-windows:master to all the repos they all grow
> by the amount git-for-windows has diverged:
>
>     144M    /tmp/git-master.git
>     36M     /tmp/git-pu-alt.git
>     36M     /tmp/git-pu.git
>     214M    total
>
> Note that the "sort -r" is critical here. If we fetched git-master.git
> first (at this point the alternate for git-pu*.git) we wouldn't get the
> duplication in the first place, but instead:
>
>     144M    /tmp/git-master.git
>     2.1M    /tmp/git-pu-alt.git
>     2.1M    /tmp/git-pu.git
>     148M    total
>
> This shows the importance of keeping such an 'alternate' repo
> up-to-date, i.e. we don't get the duplication in the first place, but
> regardless (this from a run with sort -r) a "git gc" will coalesce them:
>
>     131M    /tmp/git-master.git
>     2.1M    /tmp/git-pu-alt.git
>     2.2M    /tmp/git-pu.git
>     135M    total
>
> If you find this interesting make sure to read my
> https://public-inbox.org/git/87k1s3bomt.fsf@evledraar.gmail.com/ and
> https://public-inbox.org/git/87in7nbi5b.fsf@evledraar.gmail.com/ for the
> caveats, i.e. if this is something intended for users then no ref in the
> alternate can ever be rewound, that'll potentially result in repository
> corruption.
>
> 1. https://public-inbox.org/git/87bmhiykvw.fsf@evledraar.gmail.com/

Maybe this is useful to someone. Here's a script I've written since
starting this thread; it runs as a daily cron job on some of our systems.

It expects repositories in /var/lib/git_tree-for-alternates like
/var/lib/git_tree-for-alternates/git/git.git to exist, then scours /home
and /etc/puppet/environments (which we had a lot of) for "config" files
containing the string "git/git" (a cheap pre-filter that saves us some
work), and then tries to find a git repository relative to each such
"config" file with "git rev-parse --absolute-git-dir".

If there is one, we check whether that repository contains the root
commit of our /var/lib/git_tree-for-alternates/git/git.git (if there's
more than one root commit, we pick the oldest). If so, this is a
repository that can benefit from using
/var/lib/git_tree-for-alternates/git/git.git/objects as an alternate, so
we add the appropriate alternates entry, set gc.bigPackThreshold to 0 so
GC will actually do its work, and run "git gc" via sudo as the user who
owns the repository.

On one server the .git directories in /home went from ~2TB to ~100GB
using this script; on another, from ~250GB to ~5GB. The remaining space
is taken up by the commit-graph (which isn't de-duplicated like objects
are) and by whatever divergence (mainly topic branches) those repos have
accumulated relative to the HEAD branch of the alternate store.

#!/bin/bash

set -euo pipefail

ALTERNATES_STORE=/var/lib/git_tree-for-alternates

if ! test -d $ALTERNATES_STORE
then
    echo 'We have no alternates repositories here to point to!' >&2
    exit 0
fi


find_owning_user() {
    path=$1
    case $path in
        /home/*|/etc/puppet/environments/*)
            who=$(echo $path | perl -pe 's[^
                (?:
                    /home
                    |
                    /etc/puppet/environments
                )
                /
                ([^/]+)
                /
                .*
            ][$1]gx')
            if getent passwd $who >/dev/null
            then
                echo $who
            else
                echo "Know how to get user from path '$path', but '$who' is not a valid user!" >&2
            fi
            ;;
        *)
            echo "Don't know how to get user from path '$path' yet!" >&2
            ;;
    esac
}

find $ALTERNATES_STORE -type d -name '*.git' -printf "%P\n" |
while read -r alternate
do
    alternate_no_git=$(echo $alternate | sed 's/\.git$//')
    ALTERNATES_STORE_OBJECTS=$ALTERNATES_STORE/$alternate/objects

    # If these repositories we're finding don't share a root commit
    # with the repo we have this is not going to work and we have the
    # wrong match. Note that we can have more than one root commit
    # and try to find the oldest one. Pretty sure bet that that's
    # the "real" root.
    root_commit=$(git -C $ALTERNATES_STORE/$alternate log --max-parents=0 --date-order --reverse --pretty=format:%H | head -n 1)
    echo "> Finding repositories on the system that share the $root_commit commit with $alternate" >&2

    find \
        /home \
        $(if test -d /etc/puppet/environments; then echo /etc/puppet/environments; fi) \
        -type f -name 'config' -exec grep -Hl $alternate_no_git {} \; 2>/dev/null |
    while read -r config
    do
        dirname=$(dirname $config)
        echo ">> Checking if $dirname is in a $alternate git repository..." >&2
        if git_dir=$(git -C $dirname rev-parse --absolute-git-dir) &&
                git -C $git_dir cat-file -e $root_commit
        then
            echo ">>> ...Yes it was, at $git_dir" >&2
            echo ">>>> Is it already migrated?..." >&2
            if test -e $git_dir/objects/info/alternates &&
                    grep -x -F -q $ALTERNATES_STORE_OBJECTS $git_dir/objects/info/alternates
            then
                echo ">>>> ...yes, nothing to do here" >&2
                continue
            else
                echo ">>>> ...no, doing migration" >&2

                who=$(find_owning_user $git_dir)
                if test -z "$who"
                then
                    echo ">>>>> unable to find who owns $git_dir" >&2
                    continue
                else
                    echo ">>>>> found that $who owns $git_dir" >&2
                fi

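                # DRY_RUN=1 only reports what would be migrated. Default it
                # to 0 so "set -u" doesn't abort when it isn't exported.
                if test "${DRY_RUN:-0}" = "1"
                then
                    echo ">>>>>> Would have run commands migrating $git_dir"
                else
                    if ! sudo -u $who stat $git_dir >/dev/null 2>&1
                    then
                        echo ">>>>>> The '$who' user can't access their own '$git_dir'. Could be e.g. an ex-employee. Using 'root'"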
                        who=root
                    fi

                    echo ">>>>>> Migrating $git_dir is now $(sudo -u $who du -sh $git_dir | cut -f1)"
                    sudo -u $who git -C $git_dir config gc.bigPackThreshold 0
                    echo $ALTERNATES_STORE_OBJECTS | sudo tee -a $git_dir/objects/info/alternates >/dev/null
                    sudo -u $who git -C $git_dir gc
                    echo ">>>>>> Migrated $git_dir is now $(sudo -u $who du -sh $git_dir | cut -f1)"
                fi
            fi
        else
            echo ">>> No it isn't. Skipping it" >&2
            continue
        fi
    done
done
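
If perl isn't available on a host, the user lookup in find_owning_user
can be approximated with plain shell parameter expansion. This is a
sketch, assuming the same two layouts (/home/USER/... and
/etc/puppet/environments/USER/...); the name owner_from_path is
hypothetical, and unlike the script it skips the "getent passwd"
validity check, so a caller would still need to do that.

```shell
# Hypothetical perl-free take on find_owning_user's path parsing.
# Prints the first directory component below /home/ or
# /etc/puppet/environments/, e.g. /home/alice/src/git/.git -> alice.
# Unlike the original it does NOT verify the name with "getent passwd".
owner_from_path() {
    case $1 in
        /home/*/*)
            rest=${1#/home/} ;;
        /etc/puppet/environments/*/*)
            rest=${1#/etc/puppet/environments/} ;;
        *)
            echo "Don't know how to get user from path '$1' yet!" >&2
            return 1 ;;
    esac
    # Drop everything from the first "/" on, leaving just the user name.
    echo "${rest%%/*}"
}
```

Using case patterns instead of a regex keeps the two supported layouts
explicit, and anything else falls through to the same error the
original prints.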


* Re: How de-duplicate similar repositories with alternates
From: Ævar Arnfjörð Bjarmason @ 2018-12-04 10:43 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Git for human beings, Christian Couder


On Tue, Dec 04 2018, Jeff King wrote:

> On Thu, Nov 29, 2018 at 03:59:26PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> This is the thing I was wrong about, in retrospect probably because I'd
>> been putting PATH_TO_REPO in objects/info/alternates, but we actually
>> need PATH_TO_REPO/objects, and "git gc" won't warn about this (or "git
>> fsck"). Probably a good idea to patch that at some point, i.e. whine
>> about paths in alternates that don't have objects, or at the very least
>> those that don't exist. #leftoverbits
>
> We do complain about missing directories; see alt_odb_usable().
> Pointing to a real directory that doesn't happen to contain any objects
> is harder. If there are no loose objects, there might not be any hashed
> object directories. For a "real" object database, there should always be
> a "pack/" directory. But technically the object storage directory does
> not even need to have that; it can just be a directory full of loose
> objects that happens not to have any at this moment.
>
> That said, I suspect if we issued a warning for "woah, it looks like
> this doesn't have any objects in it, nor does it even have a pack
> directory" that nobody would complain.

Yeah, although see my <87sgzjyif2.fsf@evledraar.gmail.com>; I also ran
into a different issue.

I think a warning (or even error) like this would be more useful:

    test ! -d $objdir && error... # current behavior
    test -d $objdir/objects && error "Did you mean $objdir/objects, silly?" # new error

I.e. I suspect I'm not the only one who hasn't read the documentation
carefully enough, thought it was a path to the root of the repo, and
wondered why it silently didn't work.
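
A minimal shell sketch of the proposed check, run against an existing
objects/info/alternates file. This is not something git ships; the name
check_alternates is hypothetical, and it assumes one absolute path per
line (git also accepts paths relative to the objects directory, which
this sketch does not resolve). A missing directory or a repository root
is an error, while a missing pack/ directory is only a warning, since as
noted above a valid object store may legitimately lack one.

```shell
# Hypothetical checker for an objects/info/alternates file.
# Returns non-zero if any entry is missing or looks like a repository
# root (i.e. has an "objects" subdirectory) rather than an object store.
check_alternates() {
    status=0
    # "|| test -n" also handles a last line without a trailing newline.
    while IFS= read -r dir || test -n "$dir"
    do
        if test -z "$dir"
        then
            continue  # skip blank lines
        fi
        if test ! -d "$dir"
        then
            echo "error: alternate '$dir' does not exist" >&2
            status=1
        elif test -d "$dir/objects"
        then
            echo "error: '$dir' looks like a repository; did you mean '$dir/objects'?" >&2
            status=1
        elif test ! -d "$dir/pack"
        then
            # Legal, but suspicious: a real object store usually has pack/.
            echo "warning: '$dir' has no pack/ directory" >&2
        fi
    done <"$1"
    return $status
}
```

For example, "check_alternates /tmp/git-pu.git/objects/info/alternates"
would flag the PATH_TO_REPO vs. PATH_TO_REPO/objects mistake described
in this thread.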


* Re: How de-duplicate similar repositories with alternates
From: Ævar Arnfjörð Bjarmason @ 2018-11-29 16:09 UTC (permalink / raw)
  To: git, Git for human beings; +Cc: Christian Couder, Duy Nguyen


On Thu, Nov 29 2018, Ævar Arnfjörð Bjarmason wrote:

> A co-worker asked me today how space could be saved when you have
> multiple checkouts of the same repository (at different revs) on the
> same machine. I said since these won't block-level de-duplicate well[1]
> one way to do this is with alternates.
>
> However, once you have an existing clone I didn't know how to get the
> gains without a full re-clone, but I hadn't looked deeply into it. As it
> turns out I'm wrong about that, which I found when writing the following
> test-case which shows that it works:
>
>     (
>         cd /tmp &&
>         rm -rf /tmp/git-{master,pu,pu-alt}.git &&
>
>         # Normal clones
>         git clone --bare --no-tags --single-branch --branch master https://github.com/git/git.git /tmp/git-master.git &&
>         git clone --bare --no-tags --single-branch --branch pu https://github.com/git/git.git /tmp/git-pu.git &&
>
>         # An 'alternate' clone using 'master' objects from another repo
>         git --bare init /tmp/git-pu-alt.git &&
>         for git in git-pu.git git-pu-alt.git
>         do
>             echo /tmp/git-master.git/objects >/tmp/$git/objects/info/alternates
>         done &&
>         git -C git-pu-alt.git fetch --no-tags https://github.com/git/git.git pu:pu &&
>
>         # Respective sizes, 'alternate' clone much smaller
>         du -shc /tmp/git-*.git &&
>
>         # GC them all. Compacts the git-pu.git to git-pu-alt.git's size
>         for repo in git-*.git
>         do
>             git -C $repo gc
>         done &&
>         du -shc /tmp/git-*.git &&
>
>         # Add another big history (GFW) to git-{pu,master}.git (in that order!)
>         for repo in $(ls -d /tmp/git-*.git | sort -r)
>         do
>             git -C $repo fetch --no-tags https://github.com/git-for-windows/git master:master-gfw
>         done &&
>         du -shc /tmp/git-*.git &&
>
>         # Another GC. The objects now in git-master.git will be de-duped by all
>         for repo in git-*.git
>         do
>             git -C $repo gc
>         done &&
>         du -shc /tmp/git-*.git
>     )
>
> This shows a scenario where we clone git.git at "master" and "pu" in
> different places. After clone the relevant sizes are:
>
>     108M    /tmp/git-master.git
>     3.2M    /tmp/git-pu-alt.git
>     109M    /tmp/git-pu.git
>     219M    total
>
> I.e. git-pu-alt.git is much smaller since it points via alternates to
> git-master.git, and the history of "pu" shares most of the objects with
> "master". But then how do you get those gains for git-pu.git? Turns out
> you just "git gc"
>
>     111M    /tmp/git-master.git
>     2.1M    /tmp/git-pu-alt.git
>     2.1M    /tmp/git-pu.git
>     115M    total
>
> This is the thing I was wrong about, in retrospect probably because I'd
> been putting PATH_TO_REPO in objects/info/alternates, but we actually
> need PATH_TO_REPO/objects, and "git gc" won't warn about this (or "git
> fsck"). Probably a good idea to patch that at some point, i.e. whine
> about paths in alternates that don't have objects, or at the very least
> those that don't exist. #leftoverbits

Actually, looking at this again, the thing that may have stumped me last
time is that this interacts badly with gc.bigPackThreshold. If you have
an alternate that would otherwise house most of your objects *and* you
have a pack that's larger than gc.bigPackThreshold, your mostly
redundant pack won't be removed.

That's understandable in terms of implementation, but unfortunate. It
would be nice if we learned some way to detect this, i.e. "I have this
10GB pack, but with this alternate I can extract the 100MB that's unique
to it and throw the rest away". Right now we just keep the 10GB pack even
if it's mostly redundant with what's in the alternate.

> Then when we fetch git-for-windows:master to all the repos they all grow
> by the amount git-for-windows has diverged:
>
>     144M    /tmp/git-master.git
>     36M     /tmp/git-pu-alt.git
>     36M     /tmp/git-pu.git
>     214M    total
>
> Note that the "sort -r" is critical here. If we fetched git-master.git
> first (at this point the alternate for git-pu*.git) we wouldn't get the
> duplication in the first place, but instead:
>
>     144M    /tmp/git-master.git
>     2.1M    /tmp/git-pu-alt.git
>     2.1M    /tmp/git-pu.git
>     148M    total
>
> This shows the importance of keeping such an 'alternate' repo
> up-to-date, i.e. we don't get the duplication in the first place, but
> regardless (this from a run with sort -r) a "git gc" will coalesce them:
>
>     131M    /tmp/git-master.git
>     2.1M    /tmp/git-pu-alt.git
>     2.2M    /tmp/git-pu.git
>     135M    total
>
> If you find this interesting make sure to read my
> https://public-inbox.org/git/87k1s3bomt.fsf@evledraar.gmail.com/ and
> https://public-inbox.org/git/87in7nbi5b.fsf@evledraar.gmail.com/ for the
> caveats, i.e. if this is something intended for users then no ref in the
> alternate can ever be rewound, that'll potentially result in repository
> corruption.
>
> 1. https://public-inbox.org/git/87bmhiykvw.fsf@evledraar.gmail.com/


* How de-duplicate similar repositories with alternates
From: Ævar Arnfjörð Bjarmason @ 2018-11-29 14:59 UTC (permalink / raw)
  To: git, Git for human beings; +Cc: Christian Couder

A co-worker asked me today how space could be saved when you have
multiple checkouts of the same repository (at different revs) on the
same machine. I said that since these won't de-duplicate well at the
block level[1], one way to do this is with alternates.

However, once you have an existing clone, I didn't think you could get
the gains without a full re-clone, though I hadn't looked deeply into
it. As it turns out I was wrong about that, which I found when writing
the following test-case showing that it works:

    (
        cd /tmp &&
        rm -rf /tmp/git-{master,pu,pu-alt}.git &&

        # Normal clones
        git clone --bare --no-tags --single-branch --branch master https://github.com/git/git.git /tmp/git-master.git &&
        git clone --bare --no-tags --single-branch --branch pu https://github.com/git/git.git /tmp/git-pu.git &&

        # An 'alternate' clone using 'master' objects from another repo
        git --bare init /tmp/git-pu-alt.git &&
        for git in git-pu.git git-pu-alt.git
        do
            echo /tmp/git-master.git/objects >/tmp/$git/objects/info/alternates
        done &&
        git -C git-pu-alt.git fetch --no-tags https://github.com/git/git.git pu:pu &&

        # Respective sizes, 'alternate' clone much smaller
        du -shc /tmp/git-*.git &&

        # GC them all. Compacts the git-pu.git to git-pu-alt.git's size
        for repo in git-*.git
        do
            git -C $repo gc
        done &&
        du -shc /tmp/git-*.git &&

        # Add another big history (GFW) to git-{pu,master}.git (in that order!)
        for repo in $(ls -d /tmp/git-*.git | sort -r)
        do
            git -C $repo fetch --no-tags https://github.com/git-for-windows/git master:master-gfw
        done &&
        du -shc /tmp/git-*.git &&

        # Another GC. The objects now in git-master.git will be de-duped by all
        for repo in git-*.git
        do
            git -C $repo gc
        done &&
        du -shc /tmp/git-*.git
    )

This shows a scenario where we clone git.git at "master" and "pu" in
different places. After cloning, the relevant sizes are:

    108M    /tmp/git-master.git
    3.2M    /tmp/git-pu-alt.git
    109M    /tmp/git-pu.git
    219M    total

I.e. git-pu-alt.git is much smaller since it points via alternates to
git-master.git, and the history of "pu" shares most of its objects with
"master". But then how do you get those gains for git-pu.git? Turns out
you just run "git gc":

    111M    /tmp/git-master.git
    2.1M    /tmp/git-pu-alt.git
    2.1M    /tmp/git-pu.git
    115M    total

This is the thing I was wrong about, in retrospect probably because I'd
been putting PATH_TO_REPO in objects/info/alternates, when we actually
need PATH_TO_REPO/objects, and neither "git gc" nor "git fsck" warns
about this. Probably a good idea to patch that at some point, i.e. whine
about paths in alternates that don't have objects, or at the very least
those that don't exist. #leftoverbits
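
The same mistake can be avoided when setting up the alternate in the
first place. Below is a sketch of an idempotent helper, assuming bare
repositories laid out as in the test-case above; add_alternate is a
hypothetical name, not a git command. It appends ALT_REPO/objects (not
ALT_REPO itself) and does nothing if that exact line is already present.

```shell
# Hypothetical helper: make REPO_GIT_DIR borrow objects from ALT_REPO by
# appending ALT_REPO/objects to REPO_GIT_DIR/objects/info/alternates.
# Run "git gc" in REPO_GIT_DIR afterwards to drop now-redundant objects.
add_alternate() {
    repo_git_dir=$1
    alt_objects=$2/objects
    file=$repo_git_dir/objects/info/alternates
    if test ! -d "$alt_objects"
    then
        echo "error: '$alt_objects' is not a directory" >&2
        return 1
    fi
    # Exact, fixed-string, whole-line match: don't add the entry twice.
    if test -f "$file" && grep -qxF "$alt_objects" "$file"
    then
        return 0
    fi
    mkdir -p "$repo_git_dir/objects/info"
    printf '%s\n' "$alt_objects" >>"$file"
}
```

For example, "add_alternate /tmp/git-pu.git /tmp/git-master.git && git
-C /tmp/git-pu.git gc" would redo the git-pu.git step of the test-case.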

Then when we fetch git-for-windows:master into all the repos, they all
grow by the amount git-for-windows has diverged:

    144M    /tmp/git-master.git
    36M     /tmp/git-pu-alt.git
    36M     /tmp/git-pu.git
    214M    total

Note that the "sort -r" is critical here. If we had fetched into
git-master.git first (at this point the alternate for git-pu*.git), we
wouldn't get the duplication in the first place, and would instead see:

    144M    /tmp/git-master.git
    2.1M    /tmp/git-pu-alt.git
    2.1M    /tmp/git-pu.git
    148M    total

This shows the importance of keeping such an 'alternate' repo
up-to-date, so that we don't get the duplication in the first place.
Regardless, a "git gc" (this is from a run with "sort -r") will coalesce
them:

    131M    /tmp/git-master.git
    2.1M    /tmp/git-pu-alt.git
    2.2M    /tmp/git-pu.git
    135M    total

If you find this interesting, make sure to read my
https://public-inbox.org/git/87k1s3bomt.fsf@evledraar.gmail.com/ and
https://public-inbox.org/git/87in7nbi5b.fsf@evledraar.gmail.com/ for the
caveats, i.e. if this is something intended for users, then no ref in
the alternate can ever be rewound, as that can potentially result in
repository corruption.

1. https://public-inbox.org/git/87bmhiykvw.fsf@evledraar.gmail.com/


* Re: [PATCH v2] read-cache: write all indexes with the same permissions
From: Ævar Arnfjörð Bjarmason @ 2018-11-17 21:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christian Couder, git, Jeff King, Nguyen Thai Ngoc Duy,
	Michael Haggerty, Christian Couder


On Sat, Nov 17 2018, Junio C Hamano wrote:

> Christian Couder <christian.couder@gmail.com> writes:
>
>> "However, as noted in those commits we'd still create the file as
>> 0600, and would just re-chmod it only if core.sharedRepository is set
>> to "true" or "all". If core.sharedRepository is unset or set to
>> "false", then the file mode will not be changed, so without
>> core.splitIndex a system with e.g. the umask set to group writeability
>> would work for a group member, but not with core.splitIndex set, as
>> group members would not be able to access the shared index file.
>
> That is irrelevant.  The repository needs to be configured properly
> if it wanted to be used by the members of the group, period.
>
>> It is unfortunately not short lived when core.sharedrepository is
>> unset for example as adjust_shared_perm() starts with:
>>
>> int adjust_shared_perm(const char *path)
>> {
>>         int old_mode, new_mode;
>>
>>         if (!get_shared_repository())
>>                 return 0;
>>
>> but get_shared_repository() will return PERM_UMASK which is 0 when
>> git_config_get_value("core.sharedrepository", ...) returns a non zero
>> value which happens when "core.sharedrepository" is unset.
>
> Which is to say, you get an unwanted result when your repository is
> not configured properly.  It is not a news, and I have no sympathy.
>
> Just configure your repository properly and you'll be fine.
>
>>> > Ideally we'd split up the adjust_shared_perm() function to one that
>>> > can give us the mode we want so we could just call open() instead of
>>> > open() followed by chmod(), but that's an unrelated cleanup.
>>>
>>> I would drop this paragraph, as I think this is totally incorrect.
>>> Imagine your umask is tighter than the target permission.  You ask
>>> such a helper function and get "you want 0660".  Doing open(0660)
>>> would not help you an iota---you'd need chmod() or fchmod() to
>>> adjust the result anyway, which already is done by
>>> adjust-shared-perm.
>>
>> It seems to me that it is not done when "core.sharedrepository" is unset.
>
> So?  You are assuming that the repository is misconfigured and it is
> not set to widen the perm bit in the first place, no?
>
>>> > We already have that minor issue with the "index" file
>>> > #leftoverbits.
>>>
>>> The above "Ideally", which I suspect is totally bogus, would show up
>>> whey people look for that keyword in the list archive.  This is one
>>> of the reasons why I try to write it after at least one person
>>> sanity checks that an idea floated is worth remembering.
>>
>> It was in Ævar's commit message and I thought it might be better to
>> keep it so that people looking for that keyword could find the above
>> as well as the previous RFC patch.
>
> So do you agree that open(0660) does not guarantee the result will
> be group writable, the above "Ideally" is misguided nonsense, and
> giving the #leftoverbits label to it will clutter the search result
> and harm readers?  That's good.

Aside from the issues with the clarity of the commit message, which I'll
fix (thanks for pointing them out), I think we may have stumbled on
something more important here.

Do you mean that you don't agree that the following should always
create both "foo" and e.g. ".git/refs/heads/master" with the same 644
(-rw-r--r--) mode:

    (
        rm -rf /tmp/repo &&
        umask 022 &&
        git init /tmp/repo &&
        cd /tmp/repo &&
        echo hi >foo &&
        git add foo &&
        git commit -m"first"
    )
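
The mode arithmetic behind this can be checked in the shell: on a
typical POSIX system (ignoring default ACLs), a newly created file gets
the mode passed to open() with the umask bits cleared. Below is a sketch
(effective_mode is a hypothetical helper) covering the 0666 mode that
ordinary file creation usually requests, the 0600 that mks_tempfile()
uses per the RFC patch, and the 0660-under-umask-077 case raised above.

```shell
# Hypothetical helper: print requested_mode & ~umask as a 3-digit octal.
# Arguments are substituted textually, so a leading 0 makes them octal.
effective_mode() {
    printf '%03o\n' $(( $1 & ~$2 ))
}
```

So umask 022 turns a 0666 request into 644 (-rw-r--r--) but leaves
mks_tempfile()'s 0600 at 600, and open() with 0660 under umask 077 still
yields 600, which is why a chmod() step like adjust_shared_perm() is
needed to widen the bits.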

To me, what we should do with the standard umask and what
core.sharedRepository is for are completely different things.

Git should be creating files such that if I set my umask to e.g. 022,
all users on the system can read what I'm creating.

E.g. I tend to use this on something like a production server, where
others might want to look at my .bash_history as a last resort (if I'm
asleep), or at some one-off repo I've created without setting
core.sharedRepository.

I've yet to run into a case where this doesn't just work, aside from
core.splitIndex, where (before the patch here) we're using a tempfile API
for something that isn't a tempfile.

This is distinct from the core.sharedRepository use-case, where you'd
like to override, on a per-repo basis, what you'd otherwise get from the
umask. E.g. if you have a shared server hosting a shared git repo, users
with umask 077 will still be forced to create group-rw files.


* [RFC/PATCH] read-cache: write all indexes with the same permissions
From: Ævar Arnfjörð Bjarmason @ 2018-11-13 15:32 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Christian Couder,
	Nguyễn Thái Ngọc Duy, Michael Haggerty,
	Ævar Arnfjörð Bjarmason

Change the code that writes out the shared index to use
create_tempfile() instead of mks_tempfile().

The create_tempfile() function is used to write out the main
.git/index (via .git/index.lock) using lock_file(). The
create_tempfile() function respects the umask, whereas the
mks_tempfile() function will create files with 0600 permissions.

A bug related to this was spotted, fixed and tested for in
df801f3f9f ("read-cache: use shared perms when writing shared index",
2017-06-25) and 3ee83f48e5 ("t1700: make sure split-index respects
core.sharedrepository", 2017-06-25).

However, as noted in those commits, we'd still create the file as 0600
and would just re-chmod it depending on the setting of
core.sharedRepository. So without core.splitIndex, a system with
e.g. the umask set to group writeability would work, but not with
core.splitIndex set.

Let's instead make the two consistent by using create_tempfile(). This
allows us to remove the code added in df801f3f9f (subsequently
modified in 59f9d2dd60 ("read-cache.c: move tempfile creation/cleanup
out of write_shared_index", 2018-01-14)) as redundant. The
create_tempfile() function itself calls adjust_shared_perm().

Now we're not leaking the implementation detail that we're using a
mkstemp()-like API for something that's not really a mkstemp()
use-case. See c18b80a0e8 ("update-index: new options to enable/disable
split index mode", 2014-06-13) for the initial implementation which
used mkstemp() without a wrapper.

One thing I was paranoid about when making this change was introducing
a race condition where, with e.g. core.sharedRepository=0600, we'd do
something different for "index" vs. "sharedindex.*", as the former has
a *.lock file and the latter doesn't.

But I'm confident that we're exposing no such edge-case. With a user
umask of e.g. 0022 and core.sharedRepository=0600, we initially create
both "index" and "sharedindex.*" files that are globally readable, but
re-chmod them while they're still empty.

Ideally we'd split up the adjust_shared_perm() function to one that
can give us the mode we want so we could just call open() instead of
open() followed by chmod(), but that's an unrelated cleanup. We
already have that minor issue with the "index" file #leftoverbits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

I won't have time to finish this today, as noted in
https://public-inbox.org/git/874lcl2e9t.fsf@evledraar.gmail.com/
there's a pretty major bug here in that we're now writing out literal
sharedindex_XXXXXX files.

Obviously that needs to be fixed, and the fix is trivial: I can use
another one of the mks_*() functions with the same mode we use to
create the index.

But we really ought to have tests for the bug this patch introduces,
and as noted in the E-Mail linked above we don't.

So hopefully Duy or someone with more knowledge of the split index
will chime in to say what's missing there...

 read-cache.c           |  7 +------
 t/t1700-split-index.sh | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index f3a848d61c..7135537554 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -3074,11 +3074,6 @@ static int write_shared_index(struct index_state *istate,
 	ret = do_write_index(si->base, *temp, 1);
 	if (ret)
 		return ret;
-	ret = adjust_shared_perm(get_tempfile_path(*temp));
-	if (ret) {
-		error("cannot fix permission bits on %s", get_tempfile_path(*temp));
-		return ret;
-	}
 	ret = rename_tempfile(temp,
 			      git_path("sharedindex.%s", oid_to_hex(&si->base->oid)));
 	if (!ret) {
@@ -3159,7 +3154,7 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock,
 		struct tempfile *temp;
 		int saved_errno;

-		temp = mks_tempfile(git_path("sharedindex_XXXXXX"));
+		temp = create_tempfile(git_path("sharedindex_XXXXXX"));
 		if (!temp) {
 			oidclr(&si->base_oid);
 			ret = do_write_locked_index(istate, lock, flags);
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 2ac47aa0e4..fa1d3d468b 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -381,6 +381,26 @@ test_expect_success 'check splitIndex.sharedIndexExpire set to "never" and "now"
 	test $(ls .git/sharedindex.* | wc -l) -le 2
 '

+test_expect_success POSIXPERM 'same mode for index & split index' '
+	git init same-mode &&
+	(
+		cd same-mode &&
+		test_commit A &&
+		test_modebits .git/index >index_mode &&
+		test_must_fail git config core.sharedRepository &&
+		git -c core.splitIndex=true status &&
+		shared=$(ls .git/sharedindex.*) &&
+		case "$shared" in
+		*" "*)
+			# we have more than one???
+			false ;;
+		*)
+			test_modebits "$shared" >split_index_mode &&
+			test_cmp index_mode split_index_mode ;;
+		esac
+	)
+'
+
 while read -r mode modebits
 do
 	test_expect_success POSIXPERM "split index respects core.sharedrepository $mode" '
-- 
2.19.1.1182.g4ecb1133ce


* Re: [PATCH 9/9] fetch-pack: drop custom loose object cache
From: Ævar Arnfjörð Bjarmason @ 2018-11-12 19:32 UTC (permalink / raw)
  To: René Scharfe
  Cc: Jeff King, Geert Jansen, Junio C Hamano, git@vger.kernel.org,
	Takuto Ikuta


On Mon, Nov 12 2018, René Scharfe wrote:

> Am 12.11.2018 um 15:55 schrieb Jeff King:
>> Commit 024aa4696c (fetch-pack.c: use oidset to check existence of loose
>> object, 2018-03-14) added a cache to avoid calling stat() for a bunch of
>> loose objects we don't have.
>>
>> Now that OBJECT_INFO_QUICK handles this caching itself, we can drop the
>> custom solution.
>>
>> Note that this might perform slightly differently, as the original code
>> stopped calling readdir() when we saw more loose objects than there were
>> refs. So:
>>
>>   1. The old code might have spent work on readdir() to fill the cache,
>>      but then decided there were too many loose objects, wasting that
>>      effort.
>>
>>   2. The new code might spend a lot of time on readdir() if you have a
>>      lot of loose objects, even though there are very few objects to
>>      ask about.
>
> Plus the old code used an oidset while the new one uses an oid_array.
>
>> In practice it probably won't matter either way; see the previous commit
>> for some discussion of the tradeoff.
>>
>> Signed-off-by: Jeff King <peff@peff.net>
>> ---
>>  fetch-pack.c | 39 ++-------------------------------------
>>  1 file changed, 2 insertions(+), 37 deletions(-)
>>
>> diff --git a/fetch-pack.c b/fetch-pack.c
>> index b3ed7121bc..25a88f4eb2 100644
>> --- a/fetch-pack.c
>> +++ b/fetch-pack.c
>> @@ -636,23 +636,6 @@ struct loose_object_iter {
>>  	struct ref *refs;
>>  };
>>
>> -/*
>> - *  If the number of refs is not larger than the number of loose objects,
>> - *  this function stops inserting.
>> - */
>> -static int add_loose_objects_to_set(const struct object_id *oid,
>> -				    const char *path,
>> -				    void *data)
>> -{
>> -	struct loose_object_iter *iter = data;
>> -	oidset_insert(iter->loose_object_set, oid);
>> -	if (iter->refs == NULL)
>> -		return 1;
>> -
>> -	iter->refs = iter->refs->next;
>> -	return 0;
>> -}
>> -
>>  /*
>>   * Mark recent commits available locally and reachable from a local ref as
>>   * COMPLETE. If args->no_dependents is false, also mark COMPLETE remote refs as
>> @@ -670,30 +653,14 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
>>  	struct ref *ref;
>>  	int old_save_commit_buffer = save_commit_buffer;
>>  	timestamp_t cutoff = 0;
>> -	struct oidset loose_oid_set = OIDSET_INIT;
>> -	int use_oidset = 0;
>> -	struct loose_object_iter iter = {&loose_oid_set, *refs};
>> -
>> -	/* Enumerate all loose objects or know refs are not so many. */
>> -	use_oidset = !for_each_loose_object(add_loose_objects_to_set,
>> -					    &iter, 0);
>>
>>  	save_commit_buffer = 0;
>>
>>  	for (ref = *refs; ref; ref = ref->next) {
>>  		struct object *o;
>> -		unsigned int flags = OBJECT_INFO_QUICK;
>>
>> -		if (use_oidset &&
>> -		    !oidset_contains(&loose_oid_set, &ref->old_oid)) {
>> -			/*
>> -			 * I know this does not exist in the loose form,
>> -			 * so check if it exists in a non-loose form.
>> -			 */
>> -			flags |= OBJECT_INFO_IGNORE_LOOSE;
>
> This removes the only user of OBJECT_INFO_IGNORE_LOOSE.  #leftoverbits

With this series applied, there's still a use of it left in
oid_object_info_extended().


* [PATCH v2 2/3] pack-objects tests: don't leave test .git corrupt at end
From: Ævar Arnfjörð Bjarmason @ 2018-10-30 18:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Geert Jansen, Christian Couder,
	Nicolas Pitre, Linus Torvalds, Petr Baudis,
	Ævar Arnfjörð Bjarmason

Change the pack-objects tests to not leave their .git directory
corrupt at the end.

In 2fca19fbb5 ("fix multiple issues with t5300", 2010-02-03) a comment
was added warning against adding any subsequent tests, but since
4614043c8f ("index-pack: use streaming interface for collision test on
large blobs", 2012-05-24) the comment has drifted away from the code,
mentioning two tests when we actually have three.

Instead of having this warning let's just create a new .git directory
specifically for these tests.

As an aside, it would be interesting to instrument the test suite to
run a "git fsck" at the very end (in "test_done"). That would have
errored before this change, and may find other issues #leftoverbits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t5300-pack-object.sh | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index a0309e4bab..410a09b0dd 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -468,29 +468,32 @@ test_expect_success 'pack-objects in too-many-packs mode' '
 	git fsck
 '
 
-#
-# WARNING!
-#
-# The following test is destructive.  Please keep the next
-# two tests at the end of this file.
-#
-
-test_expect_success 'fake a SHA1 hash collision' '
-	long_a=$(git hash-object a | sed -e "s!^..!&/!") &&
-	long_b=$(git hash-object b | sed -e "s!^..!&/!") &&
-	test -f	.git/objects/$long_b &&
-	cp -f	.git/objects/$long_a \
-		.git/objects/$long_b
+test_expect_success 'setup: fake a SHA1 hash collision' '
+	git init corrupt &&
+	(
+		cd corrupt &&
+		long_a=$(git hash-object -w ../a | sed -e "s!^..!&/!") &&
+		long_b=$(git hash-object -w ../b | sed -e "s!^..!&/!") &&
+		test -f	.git/objects/$long_b &&
+		cp -f	.git/objects/$long_a \
+			.git/objects/$long_b
+	)
 '
 
 test_expect_success 'make sure index-pack detects the SHA1 collision' '
-	test_must_fail git index-pack -o bad.idx test-3.pack 2>msg &&
-	test_i18ngrep "SHA1 COLLISION FOUND" msg
+	(
+		cd corrupt &&
+		test_must_fail git index-pack -o ../bad.idx ../test-3.pack 2>msg &&
+		test_i18ngrep "SHA1 COLLISION FOUND" msg
+	)
 '
 
 test_expect_success 'make sure index-pack detects the SHA1 collision (large blobs)' '
-	test_must_fail git -c core.bigfilethreshold=1 index-pack -o bad.idx test-3.pack 2>msg &&
-	test_i18ngrep "SHA1 COLLISION FOUND" msg
+	(
+		cd corrupt &&
+		test_must_fail git -c core.bigfilethreshold=1 index-pack -o ../bad.idx ../test-3.pack 2>msg &&
+		test_i18ngrep "SHA1 COLLISION FOUND" msg
+	)
 '
 
 test_done
-- 
2.19.1.899.g0250525e69


^ permalink raw reply related	[relevance 13%]

* Re: [PATCH v3 7/8] push: add DWYM support for "git push refs/remotes/...:<dst>"
  @ 2018-10-29  8:05 15%     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2018-10-29  8:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Stefan Beller


On Mon, Oct 29 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> This is the first use of the %N$<fmt> style of printf format in
>> the *.[ch] files in our codebase. It's supported by POSIX[2] and
>> there's existing uses for it in po/*.po files,...
>
> For now, I'll eject this from 'pu', as I had spent way too much time
> trying to make it and other topics work there.

I was compiling with DEVELOPER=1 but as it turns out:

    CFLAGS="-O0" DEVELOPER=1

Wasn't doing what I thought, i.e. we just take 'CFLAGS' from the
command-line and don't add any of the DEVELOPER flags to it
(#leftoverbits). Will fix this and other issues raised.

>     CC remote.o
> remote.c: In function 'show_push_unqualified_ref_name_error':
> remote.c:1035:2: error: $ operand number used after format without operand number [-Werror=format=]
>   error(_("The destination you provided is not a full refname (i.e.,\n"
>   ^~~~~
> cc1: all warnings being treated as errors
> Makefile:2323: recipe for target 'remote.o' failed
> make: *** [remote.o] Error 1

Will fix this and other issues raised. FWIW clang gives a much better
error about the actual issue:

    remote.c:1042:46: error: cannot mix positional and non-positional arguments in format string [-Werror,-Wformat]
                    "- Checking if the <src> being pushed ('%2$s')\n"

I.e. this on top fixes it:

    -               "- Looking for a ref that matches '%s' on the remote side.\n"
    -               "- Checking if the <src> being pushed ('%s')\n"
    +               "- Looking for a ref that matches '%1$s' on the remote side.\n"
    +               "- Checking if the <src> being pushed ('%2$s')\n"

Maybe this whole thing isn't worth it and I should just do:

    @@ -1042 +1042 @@ static void show_push_unqualified_ref_name_error(const char *dst_value,
    -               "- Checking if the <src> being pushed ('%2$s')\n"
    +               "- Checking if the <src> being pushed ('%s')\n"
    @@ -1047 +1047 @@ static void show_push_unqualified_ref_name_error(const char *dst_value,
    -             dst_value, matched_src_name);
    +             dst_value, matched_src_name, matched_src_name);

But I'm leaning towards keeping it for the self-documenting aspect of
"this is a repeated parameter", your objections to this whole thing
being a stupid idea notwithstanding.
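
A minimal standalone sketch of the two options above, assuming a POSIX
libc (the %N$ form is a POSIX extension, not ISO C). The function names
and message wording are hypothetical and are not git's actual error
text. The key rule is the one GCC and clang enforced: within one format
string, either every conversion is positional or none is.

```c
#include <stdio.h>

/*
 * Hypothetical example, not git's code: every conversion is positional,
 * so '%2$s' can be referenced twice while passing `src` only once.
 */
int format_push_hint(char *buf, size_t len, const char *dst, const char *src)
{
	return snprintf(buf, len,
			"'%1$s' is not a full refname; "
			"checked for a remote ref matching '%2$s' "
			"and whether '%2$s' is a branch",
			dst, src);
}

/* Non-positional equivalent: the repeated value is passed twice. */
int format_push_hint_plain(char *buf, size_t len, const char *dst, const char *src)
{
	return snprintf(buf, len,
			"'%s' is not a full refname; "
			"checked for a remote ref matching '%s' "
			"and whether '%s' is a branch",
			dst, src, src);
}
```

Both produce identical output; the positional form documents that the
second and third values are the same parameter, at the cost of relying
on POSIX printf semantics and on translators keeping the %N$ markers.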

^ permalink raw reply	[relevance 15%]

* [PATCH 2/4] pack-objects tests: don't leave test .git corrupt at end
  @ 2018-10-28 22:50 13% ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2018-10-28 22:50 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Geert Jansen, Christian Couder,
	Nicolas Pitre, Linus Torvalds, Petr Baudis,
	Ævar Arnfjörð Bjarmason

Change the pack-objects tests to not leave their .git directory
corrupt at the end.

In 2fca19fbb5 ("fix multiple issues with t5300", 2010-02-03) a comment
was added warning against adding any subsequent tests, but since
4614043c8f ("index-pack: use streaming interface for collision test on
large blobs", 2012-05-24) the comment has drifted away from the code,
mentioning two tests, when we actually have three.

Instead of having this warning let's just create a new .git directory
specifically for these tests.

As an aside, it would be interesting to instrument the test suite to
run a "git fsck" at the very end (in "test_done"). That would have
errored before this change, and may find other issues #leftoverbits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t5300-pack-object.sh | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index a0309e4bab..410a09b0dd 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -468,29 +468,32 @@ test_expect_success 'pack-objects in too-many-packs mode' '
 	git fsck
 '
 
-#
-# WARNING!
-#
-# The following test is destructive.  Please keep the next
-# two tests at the end of this file.
-#
-
-test_expect_success 'fake a SHA1 hash collision' '
-	long_a=$(git hash-object a | sed -e "s!^..!&/!") &&
-	long_b=$(git hash-object b | sed -e "s!^..!&/!") &&
-	test -f	.git/objects/$long_b &&
-	cp -f	.git/objects/$long_a \
-		.git/objects/$long_b
+test_expect_success 'setup: fake a SHA1 hash collision' '
+	git init corrupt &&
+	(
+		cd corrupt &&
+		long_a=$(git hash-object -w ../a | sed -e "s!^..!&/!") &&
+		long_b=$(git hash-object -w ../b | sed -e "s!^..!&/!") &&
+		test -f	.git/objects/$long_b &&
+		cp -f	.git/objects/$long_a \
+			.git/objects/$long_b
+	)
 '
 
 test_expect_success 'make sure index-pack detects the SHA1 collision' '
-	test_must_fail git index-pack -o bad.idx test-3.pack 2>msg &&
-	test_i18ngrep "SHA1 COLLISION FOUND" msg
+	(
+		cd corrupt &&
+		test_must_fail git index-pack -o ../bad.idx ../test-3.pack 2>msg &&
+		test_i18ngrep "SHA1 COLLISION FOUND" msg
+	)
 '
 
 test_expect_success 'make sure index-pack detects the SHA1 collision (large blobs)' '
-	test_must_fail git -c core.bigfilethreshold=1 index-pack -o bad.idx test-3.pack 2>msg &&
-	test_i18ngrep "SHA1 COLLISION FOUND" msg
+	(
+		cd corrupt &&
+		test_must_fail git -c core.bigfilethreshold=1 index-pack -o ../bad.idx ../test-3.pack 2>msg &&
+		test_i18ngrep "SHA1 COLLISION FOUND" msg
+	)
 '
 
 test_done
-- 
2.19.1.759.g500967bb5e


^ permalink raw reply related	[relevance 13%]

* Re: We should add a "git gc --auto" after "git clone" due to commit graph
  @ 2018-10-03 14:01  6%   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2018-10-03 14:01 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Derrick Stolee, Git List, Nguyễn Thái Ngọc Duy


On Wed, Oct 03 2018, SZEDER Gábor wrote:

> On Wed, Oct 03, 2018 at 03:23:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Don't have time to patch this now, but thought I'd send a note / RFC
>> about this.
>>
>> Now that we have the commit graph it's nice to be able to set
>> e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
>> /etc/gitconfig to apply them to all repos.
>>
>> But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
>> until whenever my first "gc" kicks in, which may be quite some time if
>> I'm just using it passively.
>>
>> So we should make "git gc --auto" be run on clone,
>
> There is no garbage after 'git clone'...

"git gc" is really "git gc-or-create-indexes" these days.

>> and change the
>> need_to_gc() / cmd_gc() behavior so that we detect that the
>> gc.writeCommitGraph=true setting is on, but we have no commit graph, and
>> then just generate that without doing a full repack.
>
> Or just teach 'git clone' to run 'git commit-graph write ...'

Then when adding something like the commit graph we'd need to patch both
git-clone and git-gc, it's much more straightforward to make
need_to_gc() more granular.

>> As an aside such more granular "gc" would be nice for e.g. pack-refs
>> too. It's possible for us to just have one pack, but to have 100k loose
>> refs.
>>
>> It might also be good to have some gc.autoDetachOnClone option and have
>> it false by default, so we don't have a race condition where "clone
>> linux && git -C linux tag --contains" is slow because the graph hasn't
>> been generated yet, and generating the graph initially doesn't take that
>> long compared to the time to clone a large repo (and on a small one it
>> won't matter either way).
>>
>> I was going to say "also for midx", but of course after clone we have
>> just one pack, so I can't imagine us needing this. But I can see us
>> having other such optional side-indexes in the future generated by gc,
>> and they'd also benefit from this.
>>
>> #leftoverbits

^ permalink raw reply	[relevance 6%]

* We should add a "git gc --auto" after "git clone" due to commit graph
@ 2018-10-03 13:23 14% Ævar Arnfjörð Bjarmason
    0 siblings, 1 reply; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2018-10-03 13:23 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Git List, Nguyễn Thái Ngọc Duy

Don't have time to patch this now, but thought I'd send a note / RFC
about this.

Now that we have the commit graph it's nice to be able to set
e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or
/etc/gitconfig to apply them to all repos.

But when I clone e.g. linux.git stuff like 'tag --contains' will be slow
until whenever my first "gc" kicks in, which may be quite some time if
I'm just using it passively.

So we should make "git gc --auto" be run on clone, and change the
need_to_gc() / cmd_gc() behavior so that we detect that the
gc.writeCommitGraph=true setting is on, but we have no commit graph, and
then just generate that without doing a full repack.

As an aside such more granular "gc" would be nice for e.g. pack-refs
too. It's possible for us to just have one pack, but to have 100k loose
refs.
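
A rough sketch of what a more granular need_to_gc() could report,
assuming a bitmask of independent tasks instead of one yes/no that
implies a full repack. Every name here (gc_tasks_needed(), the structs,
the loose-ref threshold) is hypothetical; only the idea of a gc.auto
style object threshold and gc.writeCommitGraph come from git itself.

```c
#include <stdbool.h>

/* Hypothetical: independent maintenance steps gc could run. */
enum gc_task {
	GC_TASK_NONE         = 0,
	GC_TASK_REPACK       = 1 << 0,
	GC_TASK_COMMIT_GRAPH = 1 << 1,
	GC_TASK_PACK_REFS    = 1 << 2,
};

struct repo_state {
	bool write_commit_graph; /* e.g. gc.writeCommitGraph=true */
	bool has_commit_graph;   /* is a commit-graph file present? */
	int  loose_objects;      /* estimated number of loose objects */
	int  loose_refs;         /* number of loose (unpacked) refs */
};

struct gc_limits {
	int auto_objects; /* threshold in the spirit of gc.auto */
	int auto_refs;    /* hypothetical threshold for loose refs */
};

unsigned gc_tasks_needed(const struct repo_state *st, const struct gc_limits *lim)
{
	unsigned tasks = GC_TASK_NONE;

	if (lim->auto_objects > 0 && st->loose_objects > lim->auto_objects)
		tasks |= GC_TASK_REPACK;
	/* Right after a clone: one pack, no graph yet, but config wants one. */
	if (st->write_commit_graph && !st->has_commit_graph)
		tasks |= GC_TASK_COMMIT_GRAPH;
	/* A single pack can coexist with many loose refs. */
	if (lim->auto_refs > 0 && st->loose_refs > lim->auto_refs)
		tasks |= GC_TASK_PACK_REFS;
	return tasks;
}
```

With this shape, "git gc --auto" after a fresh clone would only write
the commit graph, without a full repack.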

It might also be good to have some gc.autoDetachOnClone option and have
it false by default, so we don't have a race condition where "clone
linux && git -C linux tag --contains" is slow because the graph hasn't
been generated yet, and generating the graph initially doesn't take that
long compared to the time to clone a large repo (and on a small one it
won't matter either way).

I was going to say "also for midx", but of course after clone we have
just one pack, so I can't imagine us needing this. But I can see us
having other such optional side-indexes in the future generated by gc,
and they'd also benefit from this.

#leftoverbits

^ permalink raw reply	[relevance 14%]

* Re: Trivial enhancement: All commands which require an author should accept --author
  @ 2018-09-03 13:18 18%           ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2018-09-03 13:18 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Ulrich Gemkow, git


On Thu, Aug 30 2018, Johannes Schindelin wrote:

> Hi Ævar,
>
> On Thu, 30 Aug 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> On Thu, Aug 30 2018, Johannes Schindelin wrote:
>>
>> > On Wed, 29 Aug 2018, Junio C Hamano wrote:
>> >
>> >> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>> >>
>> >> > The `stash` command only incidentally requires that the author is set, as
>> >> > it calls `git commit` internally (which records the author). As stashes
>> >> > are intended to be local only, that author information was never meant to
>> >> > be a vital part of the `stash`.
>> >> >
>> >> > I could imagine that an even better enhancement request would ask for `git
>> >> > stash` to work even if `user.name` is not configured.
>> >>
>> >> This would make a good bite-sized microproject, worth marking it as
>> >> #leftoverbits unless somebody is already working on it ;-)
>> >
>> > Right.
>> >
>> > What is our currently-favored approach to this, again? Do we have a
>> > favorite wiki page to list those, or do we have a bug tracker for such
>> > mini-projects?
>> >
>> > Once I know, I will add this, with enough information to get anybody
>> > interested started.
>>
>> I believe the "official" way, such as it is, is you just put
>> #leftoverbits in your E-Mail, then search the list archives,
>> e.g. https://public-inbox.org/git/?q=%23leftoverbits
>>
>> So e.g. I've taken to putting this in my own E-Mails where I spot
>> something I'd like to note as a TODO that I (or someone else) could work
>> on later:
>> https://public-inbox.org/git/?q=%23leftoverbits+f%3Aavarab%40gmail.com
>
> That is a poor way to list the current micro-projects, as it is totally
> non-obvious to the casual interested person which projects are still
> relevant, and which ones have been addressed already.

I don't think this is ideal. To be clear, and in reply to both your and
Junio's E-Mails: I meant "official" in scare quotes in the least official
way possible.

I.e. that you need to search the mailing list archive if you want to see
what these #leftoverbits are, because the full set is stored nowhere
else.

> In a bug tracker, you can at least add a comment stating that something
> has been addressed, or made a lot easier by another topic.

Yeah, a bunch of things suck about it, although I will say at least for
notes I'm leaving for myself I'm using it in a way that I wouldn't
bother to use a bugtracker, so in many cases it's the difference between
offhandedly saying "oh b.t.w. we should fix xyz in way abc
#leftoverbits" and not having a bug at all, because filing a bug /
curating a tracker etc. is a lot more work.

> In a mailing list archive, those mails are immutable, and you cannot
> update squat.

In a lot of bugtrackers you can't update existing comments either, you
make a new one noting some new status. Similarly you can send a new mail
with the correct In-Reply-To.

That doesn't solve all the issues, but helps in many cases.

^ permalink raw reply	[relevance 18%]

* Re: Trivial enhancement: All commands which require an author should accept --author
  @ 2018-08-30 12:29 17%       ` Ævar Arnfjörð Bjarmason
    0 siblings, 1 reply; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2018-08-30 12:29 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Ulrich Gemkow, git


On Thu, Aug 30 2018, Johannes Schindelin wrote:

> Hi Junio,
>
> On Wed, 29 Aug 2018, Junio C Hamano wrote:
>
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>> > The `stash` command only incidentally requires that the author is set, as
>> > it calls `git commit` internally (which records the author). As stashes
>> > are intended to be local only, that author information was never meant to
>> > be a vital part of the `stash`.
>> >
>> > I could imagine that an even better enhancement request would ask for `git
>> > stash` to work even if `user.name` is not configured.
>>
>> This would make a good bite-sized microproject, worth marking it as
>> #leftoverbits unless somebody is already working on it ;-)
>
> Right.
>
> What is our currently-favored approach to this, again? Do we have a
> favorite wiki page to list those, or do we have a bug tracker for such
> mini-projects?
>
> Once I know, I will add this, with enough information to get anybody
> interested started.

I believe the "official" way, such as it is, is you just put
#leftoverbits in your E-Mail, then search the list archives,
e.g. https://public-inbox.org/git/?q=%23leftoverbits

So e.g. I've taken to putting this in my own E-Mails where I spot
something I'd like to note as a TODO that I (or someone else) could work
on later:
https://public-inbox.org/git/?q=%23leftoverbits+f%3Aavarab%40gmail.com
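
For reference, the "?q=" value in those URLs is just the search string
form-encoded: a space becomes '+', and characters such as '#', ':' and
'@' become %XX escapes. A minimal sketch that reproduces that encoding
(this is not public-inbox's code; the function name is made up):

```c
#include <stddef.h>

/*
 * Form-encode `in` into `out`: unreserved characters pass through,
 * ' ' becomes '+', and every other byte becomes %XX (uppercase hex).
 * Returns the encoded length, or (size_t)-1 if `out` is too small.
 */
size_t encode_query(const char *in, char *out, size_t outlen)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t n = 0;

	for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
		unsigned char c = *p;
		int unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
				 (c >= '0' && c <= '9') || c == '-' || c == '_' ||
				 c == '.' || c == '~';

		if (unreserved || c == ' ') {
			if (n + 1 >= outlen)   /* room for 1 byte plus NUL */
				return (size_t)-1;
			out[n++] = (c == ' ') ? '+' : (char)c;
		} else {
			if (n + 3 >= outlen)   /* room for %XX plus NUL */
				return (size_t)-1;
			out[n++] = '%';
			out[n++] = hex[c >> 4];
			out[n++] = hex[c & 0x0F];
		}
	}
	if (n >= outlen)
		return (size_t)-1;
	out[n] = '\0';
	return n;
}
```

Encoding "#leftoverbits f:avarab@gmail.com" this way yields the query
string used in the second URL above.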

^ permalink raw reply	[relevance 17%]

* Re: Is origin/HEAD only being created on clone a bug? #leftoverbits
  @ 2018-05-31  7:42 20%   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2018-05-31  7:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Johannes Schindelin


On Wed, May 30 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> If you make an initial commit and push to a remote repo "origin", you
>> don't get a remote origin/HEAD reference, and a "fetch" won't create it
>> either.
>> ...
>> Some code spelunking reveals remote_head_points_at, guess_remote_head()
>> etc. in builtin/clone.c. I.e. this is special-cased as part of the
>> "clone".
>
> Correct.  Originally, there was *no* way in the protocol to carry
> the information, so the code always had to guess.  The point of
> setting origin/HEAD was mostly so that you can say "log origin.."
> and rely on it getting dwimmed down to "refs/remotes/%s/HEAD..",
> and it wasn't a common practice to interact with multiple remotes
> with remote tracking branches (integrator interacting with dozens
> of remotes, responding to pull requests using explicit URL but
> without configured remotes was not uncommon), so it was sufficient
> for "git clone" to create it, and "git remote add" did not exist
> back then anyway.
>
> There are two aspects in my answer to your question.
>
>  - If we create additional remote (that is, other than the one we
>    get when we create a repository via "clone", so if your "origin"
>    is from "git init there && cd there && git remote add origin", it
>    does count in this category), should we get a remote-tracking
>    symref $name/HEAD so that we can say "log $name.."?
>
>    We absolutely should.  We (eh, rather, those who added "remote
>    add"; this was not my itch and I am using "royal we" in this
>    sentence) just did not bother to and I think it is a bug that you
>    cannot say "log $name.."  Of course, it is just a "git symbolic-ref"
>    away to make it possible locally, so it is understandable if
>    "remote add" did not bother to.
>
>  - When we fetch from a remote that has refs/remotes/$name/HEAD, and
>    if the protocol notices that their HEAD today is pointing to a
>    branch different from what our side has, should we repoint ours
>    to match?
>
>    I am leaning against doing this, but mostly out of superstition.
>    Namely, I feel uneasy about the fact that the meaning of "log
>    ..origin" changes across a fetch in this sequence:
>
>      log ..origin && fetch origin && log ..origin
>
>    Without repointing origin/HEAD, two occurrences of "log ..origin"
>    both means "how much ahead the primary branch we have been
>    interested in from this remote is, relative to our effort?".
>    Even though we fully expect that two "log ..origin" would report
>    different results (after all, that is the whole point of doing
>    another one after "fetch" in such a sequence like this example),
>    our question is about the same "primary branch we have been
>    interested in".  But once fetch starts messing with where
>    origin/HEAD points at, that would no longer be the case, which is
>    why I am against doing something magical like that.

We already have to deal with this special case of origin/HEAD being
re-pointed in a repository that we "clone", so we would just do whatever
happens to a repository that's cloned.

I.e. the "clone" sets the origin/HEAD up as a one-off, and then keeps
updating it on the basis of updating existing refs. We'd similarly set
it up as a one-off if we ever "fetch" and notice that the ref doesn't
exist yet, and then we'd update it in the same way we update it now.

So this seems like a non-issue to me as far as me coming up with some
patch to one-off write the origin/HEAD on the first "fetch", or am I
missing something?
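
A hypothetical sketch of the "one-off on first fetch" rule described
above: create refs/remotes/<name>/HEAD only when it is missing and the
remote advertised where its HEAD points; never repoint an existing one,
which sidesteps the "log ..origin" concern. The function is invented for
illustration, assumes the default refspec (refs/heads/<b> tracked as
refs/remotes/<name>/<b>), and does not attempt clone's
guess_remote_head() fallback.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical, not git's code. `remote_head_target` is what the remote
 * advertised HEAD to point at (e.g. "refs/heads/master"), or NULL.
 * On success, fills `ref` (the symref to create) and `target`.
 */
bool plan_remote_head(const char *remote, bool local_head_exists,
		      const char *remote_head_target,
		      char *ref, size_t reflen,
		      char *target, size_t targetlen)
{
	static const char heads[] = "refs/heads/";
	const size_t heads_len = sizeof(heads) - 1;
	int n;

	if (local_head_exists || !remote_head_target)
		return false; /* already set up, or nothing advertised */
	if (strncmp(remote_head_target, heads, heads_len) != 0)
		return false; /* detached or unusual HEAD: leave it alone */

	n = snprintf(ref, reflen, "refs/remotes/%s/HEAD", remote);
	if (n < 0 || (size_t)n >= reflen)
		return false;
	n = snprintf(target, targetlen, "refs/remotes/%s/%s",
		     remote, remote_head_target + heads_len);
	if (n < 0 || (size_t)n >= targetlen)
		return false;
	return true;
}
```

After that, later fetches only update refs/remotes/<name>/<branch>,
just as they do in a cloned repository.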

^ permalink raw reply	[relevance 20%]

* Is origin/HEAD only being created on clone a bug? #leftoverbits
@ 2018-05-29 18:30 29% Ævar Arnfjörð Bjarmason
    0 siblings, 1 reply; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2018-05-29 18:30 UTC (permalink / raw)
  To: Git List; +Cc: Johannes Schindelin

Here's some more #leftoverbits where we have a clone/fetch feature
discrepancy and where clone is magical in ways that "fetch" isn't.

If you make an initial commit and push to a remote repo "origin", you
don't get a remote origin/HEAD reference, and a "fetch" won't create it
either.

You will get it if you subsequently "clone" the repo, but not if you use
"git init / remote add / fetch / git checkout -t" which should otherwise
be equivalent.

If you push to "master" (or whatever HEAD is) from the clone the
origin/HEAD will be updated accordingly, but from the repo you pushed
from & the one you did init+fetch instead of clone you'll never see it.

Some code spelunking reveals remote_head_points_at, guess_remote_head()
etc. in builtin/clone.c. I.e. this is special-cased as part of the
"clone".

Can anyone think of a reason why this shouldn't be fixed as a bug?
I've tried searching the archives but "origin/HEAD" comes up with too
many results.
https://public-inbox.org/git/alpine.LSU.1.00.0803020556380.22527@racer.site/#t
seems to be the patch that initially added it, but it is not discussed
why this should be a clone-only special case that doesn't apply to
"fetch".

^ permalink raw reply	[relevance 29%]

* [GSoC] Some #leftoverbits for anyone looking for little projects
@ 2018-03-17 21:20 15% Ævar Arnfjörð Bjarmason
    0 siblings, 1 reply; 55+ results
From: Ævar Arnfjörð Bjarmason @ 2018-03-17 21:20 UTC (permalink / raw)
  To: Git Mailing List

In lieu of sending a PR to https://git.github.io/SoC-2018-Microprojects/
I thought I'd list a few more suggestions, and hopefully others will
chime in.

This is all TODO stuff I've been meaning to do myself, but wouldn't mind
at all if someone else tackled.

I'm not interested in mentoring GSoC, but these are all small enough to
need no special help from me (or anyone in particular), and if nobody
picks them up I can refer back to this mail for my own use.

 * Having grep support the -o option like GNU grep et al.

   We have most of the code for this already in the form of our color
   highlighting, it would mostly just be a matter of "just print out the
   stuff you'd have colored", with the small exception that if you have
   more than one match on a line they should be printed out on their own
   lines.

 * Give "rebase -i" some option so when you "reword" the patch is
   included in the message.

   I keep going to the shell because I have no idea what change I'm
   describing.

 * Add more config IncludeIf conditions.

   Recently there was a mention on git-users to extend the includeIf
   statement to read config:
   https://groups.google.com/forum/?fromgroups#!searchin/git-users/includeif%7Csort:date/git-users/SHd506snwSk/UdVCsCILBwAJ

   Now that seems like a nasty circular dependency but there's other
   low-hanging fruit there, like make it match a given env name to a
   value (or glob?).

 * Add another set of GIT_{AUTHOR,COMMITTER}_{NAME,EMAIL} with lower
   priorities.

   There is a script at work which I have to manually blacklist which
   sets git author names & e-mails via LDAP for all logged in users via
   /etc/profile (and gets my name wrong)[1].

   It would be nice if git supported a way to do this that didn't either
   involve overriding everything (as the current env vars do) or munging
   the user's ~ config (ew!). I.e. the priority of these new env vars
   would come after reading from the config, not overriding the config
   as the current ones do. So it could be used to make a suggestion if
   no other value was found.

 * Write git-unpack-{refs,objects}

   I don't know if this is small enough (maybe the refs part?). This
   would give you purely loose objects & refs. This is a terrible idea
   for any "real" use, but very useful for testing.

   Now when I'm testing repack I need to keep an old copy of the repo
   around, because there's no easy way (that I know of) to pack things
   and then get back to loose object state. Ditto for packing refs.

 * I had a previous TODO list of "small" things at
   https://public-inbox.org/git/CACBZZX5wdnA-96e11edE7xRnAHo19RFTrZmqFQj-0ogLOJTncQ@mail.gmail.com/
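
For the first item (grep -o), the output rule is easy to state in code:
print each non-overlapping match on its own line, even when one line
has several. A rough sketch using POSIX <regex.h>, not git's grep
machinery (which has its own pattern backends); the function name is
hypothetical, and empty matches are skipped as GNU grep -o does.

```c
#include <regex.h>
#include <stdio.h>

/*
 * Print every non-empty, non-overlapping match of `pattern` (an ERE)
 * in `line` to `out`, one per line. Returns the number of matches
 * printed, or -1 if the pattern does not compile.
 */
int print_only_matching(const char *pattern, const char *line, FILE *out)
{
	regex_t re;
	regmatch_t m;
	const char *p = line;
	int count = 0;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;

	/* After the first match, '^' must not match mid-line. */
	while (regexec(&re, p, 1, &m, p == line ? 0 : REG_NOTBOL) == 0) {
		if (m.rm_eo == m.rm_so) {
			/* Empty match: step one character to make progress. */
			if (p[m.rm_eo] == '\0')
				break;
			p += m.rm_eo + 1;
			continue;
		}
		fprintf(out, "%.*s\n", (int)(m.rm_eo - m.rm_so), p + m.rm_so);
		count++;
		p += m.rm_eo;
	}
	regfree(&re);
	return count;
}
```

For example, "[0-9]+" against "a1 b22 c333" prints "1", "22" and "333"
on three separate lines.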

1. At work like in so many companies LDAP is synced everywhere, but of
   course that means catering to the lowest common denominator. Last I
   heard attempts to give me a non-ASCII name (in the GECOS field) had
   failed because some phone or printer somewhere refused to accept it.
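
For the lower-priority GIT_{AUTHOR,COMMITTER}_* item, a minimal sketch of
the proposed precedence, using only the author name. The variable name
GIT_AUTHOR_NAME_FALLBACK is invented here, since the proposal does not
name one, and this is not git's ident code (which also consults
author.name and system defaults). Empty values are treated as unset in
this sketch.

```c
#include <stdlib.h>

/*
 * Hypothetical precedence:
 *   1. GIT_AUTHOR_NAME (existing; overrides config)
 *   2. the configured value, e.g. user.name
 *   3. GIT_AUTHOR_NAME_FALLBACK (invented name; only a suggestion,
 *      e.g. what an /etc/profile LDAP script could set)
 * Returns NULL when nothing is set.
 */
const char *resolve_author_name(const char *config_value)
{
	const char *v = getenv("GIT_AUTHOR_NAME");

	if (v && *v)
		return v;
	if (config_value && *config_value)
		return config_value;
	v = getenv("GIT_AUTHOR_NAME_FALLBACK");
	if (v && *v)
		return v;
	return NULL;
}
```

With this ordering, a site-wide script could suggest a name without
overriding what the user put in ~/.gitconfig.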

^ permalink raw reply	[relevance 15%]

2018-03-17 21:20 15% [GSoC] Some #leftoverbits for anyone looking for little projects Ævar Arnfjörð Bjarmason
2019-05-20 18:23     ` Matheus Tavares
2019-05-20 23:49 19%   ` Ævar Arnfjörð Bjarmason
2018-05-29 18:30 29% Is origin/HEAD only being created on clone a bug? #leftoverbits Ævar Arnfjörð Bjarmason
2018-05-30  1:24     ` Junio C Hamano
2018-05-31  7:42 20%   ` Ævar Arnfjörð Bjarmason
2018-08-28 21:05     Trivial enhancement: All commands which require an author should accept --author Ulrich Gemkow
2018-08-29 16:14     ` Johannes Schindelin
2018-08-29 19:09       ` Junio C Hamano
2018-08-30 11:51         ` Johannes Schindelin
2018-08-30 12:29 17%       ` Ævar Arnfjörð Bjarmason
2018-08-30 14:08             ` Johannes Schindelin
2018-09-03 13:18 18%           ` Ævar Arnfjörð Bjarmason
2018-10-03 13:23 14% We should add a "git gc --auto" after "git clone" due to commit graph Ævar Arnfjörð Bjarmason
2018-10-03 13:36     ` SZEDER Gábor
2018-10-03 14:01  6%   ` Ævar Arnfjörð Bjarmason
2018-10-26 19:27     [PATCH v2 0/7] fixes for unqualified <dst> push Ævar Arnfjörð Bjarmason
2018-10-26 23:07     ` [PATCH v3 7/8] push: add DWYM support for "git push refs/remotes/...:<dst>" Ævar Arnfjörð Bjarmason
2018-10-29  7:06       ` Junio C Hamano
2018-10-29  8:05 15%     ` Ævar Arnfjörð Bjarmason
2018-10-27 11:22     [RFC PATCH] index-pack: improve performance on NFS Ævar Arnfjörð Bjarmason
2018-10-28 22:50 13% ` [PATCH 2/4] pack-objects tests: don't leave test .git corrupt at end Ævar Arnfjörð Bjarmason
2018-10-28 22:50     [PATCH 0/4] index-pack: optionally turn off SHA-1 collision checking Ævar Arnfjörð Bjarmason
2018-10-30 18:43 13% ` [PATCH v2 2/3] pack-objects tests: don't leave test .git corrupt at end Ævar Arnfjörð Bjarmason
2018-11-12 14:46     [PATCH 0/9] caching loose objects Jeff King
2018-11-12 14:55     ` [PATCH 9/9] fetch-pack: drop custom loose object cache Jeff King
2018-11-12 19:25       ` René Scharfe
2018-11-12 19:32  6%     ` Ævar Arnfjörð Bjarmason
2018-11-13 15:22     [PATCH 1/3] read-cache: use shared perms when writing shared index Ævar Arnfjörð Bjarmason
2018-11-13 15:32 11% ` [RFC/PATCH] read-cache: write all indexes with the same permissions Ævar Arnfjörð Bjarmason
2018-11-16 17:31     [PATCH v2] " Christian Couder
2018-11-17  9:29     ` Junio C Hamano
2018-11-17 11:19       ` Christian Couder
2018-11-17 13:05         ` Junio C Hamano
2018-11-17 21:14  5%       ` Ævar Arnfjörð Bjarmason
2018-11-29 14:59 11% How de-duplicate similar repositories with alternates Ævar Arnfjörð Bjarmason
2018-11-29 16:09  6% ` Ævar Arnfjörð Bjarmason
2018-12-04  6:59     ` Jeff King
2018-12-04 10:43  6%   ` Ævar Arnfjörð Bjarmason
2018-12-04 13:35  4% ` Ævar Arnfjörð Bjarmason
2019-01-07 21:30     [PATCH] blame: add the ability to ignore commits Barret Rhoden
2019-01-17 20:29     ` [PATCH v2 0/3] " Barret Rhoden
2019-01-17 20:29       ` [PATCH v2 1/3] Move init_skiplist() outside of fsck Barret Rhoden
2019-01-18  9:45         ` Ævar Arnfjörð Bjarmason
2019-01-18 17:36           ` Junio C Hamano
2019-01-18 20:59             ` Johannes Schindelin
2019-01-18 21:30               ` Jeff King
2019-01-18 22:26                 ` Ævar Arnfjörð Bjarmason
2019-01-22  7:12                   ` Jeff King
2019-01-22  9:46 15%                 ` Ævar Arnfjörð Bjarmason
2019-01-14 17:53     Students projects: looking for small and medium project ideas Matthieu Moy
2019-01-14 23:04 17% ` Ævar Arnfjörð Bjarmason
2019-04-06 13:27     [PATCH 0/2] Minor document fixes Philip Oakley
2019-04-06 13:27     ` [PATCH 2/2] describe doc: remove '7-char' abbreviation reference Philip Oakley
2019-04-07 20:05 13%   ` Ævar Arnfjörð Bjarmason
2019-04-19 21:47     Resolving deltas dominates clone time Martin Fick
2019-04-22 20:21     ` Martin Fick
2019-04-22 20:56       ` Jeff King
2019-04-22 22:32         ` Martin Fick
2019-04-23  1:55           ` Jeff King
2019-04-23  4:21             ` Jeff King
2019-04-23 10:08               ` Duy Nguyen
2019-04-30 17:50                 ` Jeff King
2019-04-30 18:48 17%               ` Ævar Arnfjörð Bjarmason
2019-05-20 21:53 16% [PATCH 0/3] hash-object doc: small fixes Ævar Arnfjörð Bjarmason
2019-05-26 20:49     [GIT PULL] KVM changes for Linux 5.2-rc2 Linus Torvalds
2019-05-26 22:54     ` [RFC/PATCH] refs: tone down the dwimmery in refname_match() for {heads,tags,remotes}/* Ævar Arnfjörð Bjarmason
2019-05-27 12:33       ` Paolo Bonzini
2019-05-27 14:29 13%     ` Ævar Arnfjörð Bjarmason
2019-06-27 23:39     [PATCH v2 0/9] grep: move from kwset to optional PCRE v2 Ævar Arnfjörð Bjarmason
2019-07-01 21:20     ` [PATCH v3 00/10] " Ævar Arnfjörð Bjarmason
2019-07-01 21:31       ` Junio C Hamano
2019-07-02 11:10 14%     ` Ævar Arnfjörð Bjarmason
2020-11-16 12:22     git-log: documenting pathspec usage Adam Spiers
2020-11-16 12:37 16% ` Ævar Arnfjörð Bjarmason
2020-11-26 22:22     [PATCH v2 00/10] make "mktag" use fsck_tag() Ævar Arnfjörð Bjarmason
2020-12-09 20:01  6% ` [PATCH v3 " Ævar Arnfjörð Bjarmason
2020-12-23  1:35  7%   ` [PATCH v4 00/20] make "mktag" use fsck_tag() & more Ævar Arnfjörð Bjarmason
2021-02-25  1:21 13% [PATCH 1/2] remote: add camel-cased *.tagOpt key, like clone Ævar Arnfjörð Bjarmason
2021-03-05  0:55     [PATCH 00/11] Complete merge-ort implementation...almost Elijah Newren via GitGitGadget
2021-03-05  0:55     ` [PATCH 05/11] merge-ort: let renormalization change modify/delete into clean delete Elijah Newren via GitGitGadget
2021-03-08 12:55 16%   ` Ævar Arnfjörð Bjarmason
2021-03-15  9:08     [PATCH v7] [GSOC] commit: add --trailer option ZheNing Hu via GitGitGadget
2021-03-15 13:07     ` [PATCH v8 0/2] " ZheNing Hu via GitGitGadget
2021-03-15 13:07       ` [PATCH v8 1/2] " ZheNing Hu via GitGitGadget
2021-03-16 12:52         ` Ævar Arnfjörð Bjarmason
2021-03-17  2:01           ` ZheNing Hu
2021-03-17  8:08  6%         ` Ævar Arnfjörð Bjarmason
2021-07-10 13:37     [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
2021-09-07 10:57     ` [PATCH v6 00/22] " Ævar Arnfjörð Bjarmason
2021-09-07 10:57       ` [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
2021-09-16 19:57         ` Taylor Blau
2021-09-16 22:52 15%       ` Ævar Arnfjörð Bjarmason
2021-07-14 11:17     [PATCH] refs file backend: remove dead "errno == EISDIR" code Ævar Arnfjörð Bjarmason
2021-07-14 16:21     ` Jeff King
2021-07-14 19:07 12%   ` Ævar Arnfjörð Bjarmason
2021-09-01  2:05     [PATCH 0/2] pack-write,repack: prevent opening packs too early Taylor Blau
2021-09-08  0:38     ` [PATCH v2 0/4] rename *.idx file into place last (also after *.bitmap) Ævar Arnfjörð Bjarmason
2021-09-08  0:38       ` [PATCH v2 4/4] pack-write: rename *.idx file into place last (really!) Ævar Arnfjörð Bjarmason
2021-09-08  1:14 15%     ` Ævar Arnfjörð Bjarmason
2021-09-10 13:02     [PATCH] .mailmap: Update mailmap Fangyi Zhou
2021-09-10 15:22     ` Gwyneth Morgan
2021-09-10 16:48 11%   ` Oddidies in the .mailmap parser & future syntax extensions Ævar Arnfjörð Bjarmason
2021-09-24 10:08 12% [PATCH] http: check CURLE_SSL_PINNEDPUBKEYNOTMATCH when emitting errors Ævar Arnfjörð Bjarmason
2021-10-04  1:42     [PATCH 0/2] i18n: improve translatability of ambiguous object output Ævar Arnfjörð Bjarmason
2021-10-04 14:27  7% ` [PATCH v2 " Ævar Arnfjörð Bjarmason
2021-10-04 14:27 13%   ` [PATCH v2 1/2] object.[ch]: mark object type names for translation Ævar Arnfjörð Bjarmason
2021-10-14  0:06     [PATCH 00/20] refs: stop having the API set "errno" Ævar Arnfjörð Bjarmason
2021-10-14  0:06 10% ` [PATCH 06/20] refs/files: remove "name exist?" check in lock_ref_oid_basic() Ævar Arnfjörð Bjarmason
2021-10-16  9:39     ` [PATCH v2 00/21] refs: stop having the API set "errno" Ævar Arnfjörð Bjarmason
2021-10-16  9:39  8%   ` [PATCH v2 07/21] refs/files: remove "name exist?" check in lock_ref_oid_basic() Ævar Arnfjörð Bjarmason
2021-10-14  0:47 16% [PATCH 0/2] test-lib.sh: add BAIL_OUT function, use it for SANITIZE=leak Ævar Arnfjörð Bjarmason
2021-11-16 19:31     [PATCH] t0006: date_mode can leak .strftime_fmt member Jeff King
2022-02-02 21:03     ` [PATCH 0/5] date.[ch] API: split from cache.h, add API docs, stop leaking memory Ævar Arnfjörð Bjarmason
2022-02-02 21:03 14%   ` [PATCH 4/5] date API: add basic API docs Ævar Arnfjörð Bjarmason
2022-02-04 23:53       ` [PATCH v2 0/5] date.[ch] API: split from cache.h, add API docs, stop leaking memory Ævar Arnfjörð Bjarmason
2022-02-04 23:53 14%     ` [PATCH v2 4/5] date API: add basic API docs Ævar Arnfjörð Bjarmason
2022-02-16  8:14         ` [PATCH v3 0/5] date.[ch] API: split from cache.h, add API docs, stop leaking memory Ævar Arnfjörð Bjarmason
2022-02-16  8:14 14%       ` [PATCH v3 4/5] date API: add basic API docs Ævar Arnfjörð Bjarmason
2022-07-14 17:44     [PATCH 0/3] doc: unify config info on some cmds Matheus Tavares
2022-07-14 21:17  2% ` Ævar Arnfjörð Bjarmason
2022-07-22 19:42     [PATCH 0/2] t0021: convert perl script to C test-tool helper Matheus Tavares
2022-07-22 19:42     ` [PATCH 1/2] t/t0021: convert the rot13-filter.pl script to C Matheus Tavares
2022-07-23  4:59 14%   ` Ævar Arnfjörð Bjarmason
2022-11-04  1:02     [PATCH 0/4] worktree: Support `--orphan` when creating new worktrees Jacob Abel
2022-11-04 21:34     ` [PATCH v2 0/2] " Jacob Abel
2022-11-10 23:32       ` [PATCH v3 " Jacob Abel
2022-11-10 23:32         ` [PATCH v3 2/2] worktree add: add --orphan flag Jacob Abel
2022-11-15 21:08           ` Ævar Arnfjörð Bjarmason
2022-11-15 21:29             ` Eric Sunshine
2022-11-15 22:35 10%           ` Ævar Arnfjörð Bjarmason
2022-11-19  3:09                 ` Jacob Abel
2022-11-19 11:50  6%               ` Ævar Arnfjörð Bjarmason
2022-11-15 18:53     [PATCH] builtin/gc.c: fix use-after-free in maintenance_unregister() Taylor Blau
2022-11-15 19:00     ` Derrick Stolee
2022-11-15 19:51 15%   ` Ævar Arnfjörð Bjarmason
2022-11-15 19:41     ` Ævar Arnfjörð Bjarmason
2022-11-15 19:54       ` Taylor Blau
2022-11-16 13:44         ` Derrick Stolee
2022-11-16 15:14 17%       ` Ævar Arnfjörð Bjarmason
2022-11-21  3:00     [PATCH 0/3] fix t1509-root-work-tree failure Eric Sunshine via GitGitGadget
2022-11-21  3:00     ` [PATCH 3/3] t1509: facilitate repeated script invocations Eric Sunshine via GitGitGadget
2022-12-06  2:42       ` Ævar Arnfjörð Bjarmason
2022-12-06  3:23         ` Eric Sunshine
2022-12-08 12:04           ` Johannes Schindelin
2022-12-08 13:14 14%         ` "test_atexit" v.s. "test_when_finished" (was: [PATCH 3/3] t1509: facilitate repeated script invocations) Ævar Arnfjörð Bjarmason
2022-11-26 20:21     [PATCH v2] send-email: relay '-v N' to format-patch Kyle Meyer
2022-11-27  1:25     ` Junio C Hamano
2022-11-28 12:34 17%   ` Ævar Arnfjörð Bjarmason
2022-12-02 17:02     [PATCH] maintenance: compare output of pthread functions for inequality with 0 Rose via GitGitGadget
2022-12-02 18:10 16% ` Ævar Arnfjörð Bjarmason
2023-02-04 19:10     [PATCH] cache-tree: fix strbuf growth in prime_cache_tree_rec() René Scharfe
2023-02-05 21:12     ` Ævar Arnfjörð Bjarmason
2023-02-06 15:27       ` Derrick Stolee
2023-02-06 16:18 13%     ` Ævar Arnfjörð Bjarmason
2023-02-09  0:02     [PATCH v8 0/6] submodule: parallelize diff Calvin Wan
2023-03-02 21:52     ` [PATCH v9 " Calvin Wan
2023-03-02 22:02       ` [PATCH v9 6/6] diff-lib: parallelize run_diff_files for submodules Calvin Wan
2023-03-07  8:41  7%     ` Ævar Arnfjörð Bjarmason
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git