git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] multi-pack-index: use real paths for --object-dir
@ 2022-04-21 14:57 Derrick Stolee via GitGitGadget
  2022-04-21 14:57 ` [PATCH 1/2] midx: use real paths in lookup_multi_pack_index() Derrick Stolee via GitGitGadget
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-21 14:57 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, me, Derrick Stolee

A VFS for Git user reported [1] that the background maintenance did not seem
to be doing anything. At least, the 'git multi-pack-index expire' and 'git
multi-pack-index repack' steps did nothing. The 'write' subcommand worked to
collect all pack-files into a single multi-pack-index file, but the packs
were only growing in number instead of being maintained carefully.

[1] https://github.com/microsoft/git/issues/497

The root issue is about path comparisons between Windows and Unix path
styles.

The fix is two-fold:

 * Patch 1 performs real-path normalization at a low-level where it is
   required.
 * Patch 2 performs real-path normalization at a high-level to be extra
   safe.

I'm not exactly sure how to test this properly in the Git codebase, but I'm
looking to add some tests into VFS for Git to ensure this doesn't regress in
the future.

Thanks, -Stolee

Derrick Stolee (2):
  midx: use real paths in lookup_multi_pack_index()
  multi-pack-index: use --object-dir real path

 builtin/multi-pack-index.c | 19 ++++++++++++++++---
 midx.c                     | 17 +++++++++++++----
 2 files changed, 29 insertions(+), 7 deletions(-)


base-commit: 6cd33dceed60949e2dbc32e3f0f5e67c4c882e1e
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1221%2Fderrickstolee%2Fmidx-normalize-object-dir-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1221/derrickstolee/midx-normalize-object-dir-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1221
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] midx: use real paths in lookup_multi_pack_index()
  2022-04-21 14:57 [PATCH 0/2] multi-pack-index: use real paths for --object-dir Derrick Stolee via GitGitGadget
@ 2022-04-21 14:57 ` Derrick Stolee via GitGitGadget
  2022-04-21 14:57 ` [PATCH 2/2] multi-pack-index: use --object-dir real path Derrick Stolee via GitGitGadget
  2022-04-25 18:27 ` [PATCH v2 0/3] multi-pack-index: use real paths for --object-dir Derrick Stolee via GitGitGadget
  2 siblings, 0 replies; 11+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-21 14:57 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, me, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

This helper looks for a parsed multi-pack-index whose object directory
matches the given object_dir. Before going into the loop over the parsed
multi-pack-indexes, it calls find_odb() to ensure that the given
object_dir is actually a known object directory.

However, find_odb() uses real-path manipulations to compare the input to
the alternate directories. This same real-path comparison is not used in
the loop, leading to potential issues with the strcmp().

Update the method to use the real-path values instead.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 midx.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/midx.c b/midx.c
index 107365d2114..3db0e47735f 100644
--- a/midx.c
+++ b/midx.c
@@ -1132,17 +1132,26 @@ cleanup:
 static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
 							const char *object_dir)
 {
+	struct multi_pack_index *result = NULL;
 	struct multi_pack_index *cur;
+	char *obj_dir_real = real_pathdup(object_dir, 1);
+	struct strbuf cur_path_real = STRBUF_INIT;
 
 	/* Ensure the given object_dir is local, or a known alternate. */
-	find_odb(r, object_dir);
+	find_odb(r, obj_dir_real);
 
 	for (cur = get_multi_pack_index(r); cur; cur = cur->next) {
-		if (!strcmp(object_dir, cur->object_dir))
-			return cur;
+		strbuf_realpath(&cur_path_real, cur->object_dir, 1);
+		if (!strcmp(obj_dir_real, cur_path_real.buf)) {
+			result = cur;
+			goto cleanup;
+		}
 	}
 
-	return NULL;
+cleanup:
+	free(obj_dir_real);
+	strbuf_release(&cur_path_real);
+	return result;
 }
 
 static int write_midx_internal(const char *object_dir,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/2] multi-pack-index: use --object-dir real path
  2022-04-21 14:57 [PATCH 0/2] multi-pack-index: use real paths for --object-dir Derrick Stolee via GitGitGadget
  2022-04-21 14:57 ` [PATCH 1/2] midx: use real paths in lookup_multi_pack_index() Derrick Stolee via GitGitGadget
@ 2022-04-21 14:57 ` Derrick Stolee via GitGitGadget
  2022-04-21 19:50   ` Victoria Dye
  2022-04-25 18:27 ` [PATCH v2 0/3] multi-pack-index: use real paths for --object-dir Derrick Stolee via GitGitGadget
  2 siblings, 1 reply; 11+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-21 14:57 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, me, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The --object-dir argument to 'git multi-pack-index' allows a user to
specify an alternate to use instead of the local $GITDIR. This is used
by third-party tools like VFS for Git to maintain the pack-files in a
"shared object cache" used by multiple clones.

On Windows, the user can specify a path using a Windows-style file path
with backslashes such as "C:\Path\To\ObjectDir". This same path style is
used in the .git/objects/info/alternates file, so it already matches the
path of that alternate. However, find_odb() converts these paths to
real-paths for the comparison, which use forward slashes. As of the
previous change, lookup_multi_pack_index() uses real-paths, so it
correctly finds the target multi-pack-index when given these paths.

Some commands such as 'git multi-pack-index repack' call child processes
using the object_dir value, so it can be helpful to convert the path to
the real-path before sending it to those locations.

Adding the normalization in builtin/multi-pack-index.c is a little
complicated because of how the sub-commands were split in 60ca94769
(builtin/multi-pack-index.c: split sub-commands, 2021-03-30). The
--object-dir argument could be parsed before the sub-command name _or_
after it. Thus, create a normalize_object_dir() helper to call after all
arguments are parsed, but before any logic is run on that object dir.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/multi-pack-index.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 4480ba39827..3853960f9ba 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -90,6 +90,14 @@ static void read_packs_from_stdin(struct string_list *to)
 	strbuf_release(&buf);
 }
 
+static void normalize_object_dir(void)
+{
+	if (!opts.object_dir)
+		opts.object_dir = get_object_directory();
+	else
+		opts.object_dir = real_pathdup(opts.object_dir, 1);
+}
+
 static int cmd_multi_pack_index_write(int argc, const char **argv)
 {
 	struct option *options;
@@ -127,6 +135,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 
 	FREE_AND_NULL(options);
 
+	normalize_object_dir();
+
 	if (opts.stdin_packs) {
 		struct string_list packs = STRING_LIST_INIT_DUP;
 		int ret;
@@ -169,6 +179,8 @@ static int cmd_multi_pack_index_verify(int argc, const char **argv)
 
 	FREE_AND_NULL(options);
 
+	normalize_object_dir();
+
 	return verify_midx_file(the_repository, opts.object_dir, opts.flags);
 }
 
@@ -195,6 +207,8 @@ static int cmd_multi_pack_index_expire(int argc, const char **argv)
 
 	FREE_AND_NULL(options);
 
+	normalize_object_dir();
+
 	return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
 }
 
@@ -225,6 +239,8 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv)
 
 	FREE_AND_NULL(options);
 
+	normalize_object_dir();
+
 	return midx_repack(the_repository, opts.object_dir,
 			   (size_t)opts.batch_size, opts.flags);
 }
@@ -241,9 +257,6 @@ int cmd_multi_pack_index(int argc, const char **argv,
 			     builtin_multi_pack_index_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
-	if (!opts.object_dir)
-		opts.object_dir = get_object_directory();
-
 	if (!argc)
 		goto usage;
 
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] multi-pack-index: use --object-dir real path
  2022-04-21 14:57 ` [PATCH 2/2] multi-pack-index: use --object-dir real path Derrick Stolee via GitGitGadget
@ 2022-04-21 19:50   ` Victoria Dye
  2022-04-21 19:55     ` Derrick Stolee
  2022-04-21 20:28     ` Junio C Hamano
  0 siblings, 2 replies; 11+ messages in thread
From: Victoria Dye @ 2022-04-21 19:50 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git; +Cc: gitster, me, Derrick Stolee

Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> The --object-dir argument to 'git multi-pack-index' allows a user to
> specify an alternate to use instead of the local $GITDIR. This is used
> by third-party tools like VFS for Git to maintain the pack-files in a
> "shared object cache" used by multiple clones.
> 
> On Windows, the user can specify a path using a Windows-style file path
> with backslashes such as "C:\Path\To\ObjectDir". This same path style is
> used in the .git/objects/info/alternates file, so it already matches the
> path of that alternate. However, find_odb() converts these paths to
> real-paths for the comparison, which use forward slashes. As of the
> previous change, lookup_multi_pack_index() uses real-paths, so it
> correctly finds the target multi-pack-index when given these paths.
> 
> Some commands such as 'git multi-pack-index repack' call child processes
> using the object_dir value, so it can be helpful to convert the path to
> the real-path before sending it to those locations.
> 
> Adding the normalization in builtin/multi-pack-index.c is a little
> complicated because of how the sub-commands were split in 60ca94769
> (builtin/multi-pack-index.c: split sub-commands, 2021-03-30). The
> --object-dir argument could be parsed before the sub-command name _or_
> after it. Thus, create a normalize_object_dir() helper to call after all
> arguments are parsed, but before any logic is run on that object dir.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  builtin/multi-pack-index.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> index 4480ba39827..3853960f9ba 100644
> --- a/builtin/multi-pack-index.c
> +++ b/builtin/multi-pack-index.c
> @@ -90,6 +90,14 @@ static void read_packs_from_stdin(struct string_list *to)
>  	strbuf_release(&buf);
>  }
>  
> +static void normalize_object_dir(void)
> +{
> +	if (!opts.object_dir)
> +		opts.object_dir = get_object_directory();
> +	else
> +		opts.object_dir = real_pathdup(opts.object_dir, 1);
> +}
> +

Rather than copy the 'normalize_object_dir()' calls to every subcommand, you
could "centralize" this by making the 'object_dir' option an 'OPT_CALLBACK'
option, something like:

static struct option common_opts[] = {
	OPT_CALLBACK(0, "object-dir", &opts.object_dir, N_("file"),
		     N_("object directory containing set of packfile and pack-index pairs"),
		     normalize_object_dir),
	OPT_END(),
};

It would require changing the function signature of 'normalize_object_dir'
to match what's shown in 'Documentation/technical/api-parse-options.txt',
and it potentially needs prefix handling similar to what's done in
parse-options.c:get_value() (which internally calls 'fix_filename()' for
filename opts), but I think it's probably worth reducing duplication here
and avoiding the need to add 'normalize_object_dir()' to any new subcommand
in the future.

>  static int cmd_multi_pack_index_write(int argc, const char **argv)
>  {
>  	struct option *options;
> @@ -127,6 +135,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
>  
>  	FREE_AND_NULL(options);
>  
> +	normalize_object_dir();
> +
>  	if (opts.stdin_packs) {
>  		struct string_list packs = STRING_LIST_INIT_DUP;
>  		int ret;
> @@ -169,6 +179,8 @@ static int cmd_multi_pack_index_verify(int argc, const char **argv)
>  
>  	FREE_AND_NULL(options);
>  
> +	normalize_object_dir();
> +
>  	return verify_midx_file(the_repository, opts.object_dir, opts.flags);
>  }
>  
> @@ -195,6 +207,8 @@ static int cmd_multi_pack_index_expire(int argc, const char **argv)
>  
>  	FREE_AND_NULL(options);
>  
> +	normalize_object_dir();
> +
>  	return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
>  }
>  
> @@ -225,6 +239,8 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv)
>  
>  	FREE_AND_NULL(options);
>  
> +	normalize_object_dir();
> +
>  	return midx_repack(the_repository, opts.object_dir,
>  			   (size_t)opts.batch_size, opts.flags);
>  }
> @@ -241,9 +257,6 @@ int cmd_multi_pack_index(int argc, const char **argv,
>  			     builtin_multi_pack_index_usage,
>  			     PARSE_OPT_STOP_AT_NON_OPTION);
>  
> -	if (!opts.object_dir)
> -		opts.object_dir = get_object_directory();
> -
>  	if (!argc)
>  		goto usage;
>  


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] multi-pack-index: use --object-dir real path
  2022-04-21 19:50   ` Victoria Dye
@ 2022-04-21 19:55     ` Derrick Stolee
  2022-04-21 20:28     ` Junio C Hamano
  1 sibling, 0 replies; 11+ messages in thread
From: Derrick Stolee @ 2022-04-21 19:55 UTC (permalink / raw)
  To: Victoria Dye, Derrick Stolee via GitGitGadget, git; +Cc: gitster, me

On 4/21/2022 3:50 PM, Victoria Dye wrote:
> Derrick Stolee via GitGitGadget wrote:
>> +static void normalize_object_dir(void)
>> +{
>> +	if (!opts.object_dir)
>> +		opts.object_dir = get_object_directory();
>> +	else
>> +		opts.object_dir = real_pathdup(opts.object_dir, 1);
>> +}
>> +
> 
> Rather than copy the 'normalize_object_dir()' calls to every subcommand, you
> could "centralize" this by making the 'object_dir' option an 'OPT_CALLBACK'
> option, something like:
> 
> static struct option common_opts[] = {
> 	OPT_CALLBACK(0, "object-dir", &opts.object_dir, N_("file"),
> 		     N_("object directory containing set of packfile and pack-index pairs"),
> 		     normalize_object_dir),
> 	OPT_END(),
> };
> 
> It would require changing the function signature of 'normalize_object_dir'
> to match what's shown in 'Documentation/technical/api-parse-options.txt',
> and it potentially needs prefix handling similar to what's done in
> parse-options.c:get_value() (which internally calls 'fix_filename()' for
> filename opts), but I think it's probably worth reducing duplication here
> and avoiding the need to add 'normalize_object_dir()' to any new subcommand
> in the future.

Thanks! I agree that that would be a cleaner approach, especially if
a new subcommand is added in the future.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] multi-pack-index: use --object-dir real path
  2022-04-21 19:50   ` Victoria Dye
  2022-04-21 19:55     ` Derrick Stolee
@ 2022-04-21 20:28     ` Junio C Hamano
  1 sibling, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2022-04-21 20:28 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Derrick Stolee via GitGitGadget, git, me, Derrick Stolee

Victoria Dye <vdye@github.com> writes:

>> +static void normalize_object_dir(void)
>> +{
>> +	if (!opts.object_dir)
>> +		opts.object_dir = get_object_directory();
>> +	else
>> +		opts.object_dir = real_pathdup(opts.object_dir, 1);
>> +}
>> +
>
> Rather than copy the 'normalize_object_dir()' calls to every subcommand, you
> could "centralize" this by making the 'object_dir' option an 'OPT_CALLBACK'
> option, something like:
>
> static struct option common_opts[] = {
> 	OPT_CALLBACK(0, "object-dir", &opts.object_dir, N_("file"),
> 		     N_("object directory containing set of packfile and pack-index pairs"),
> 		     normalize_object_dir),
> 	OPT_END(),
> };
>
> It would require changing the function signature of 'normalize_object_dir'
> to match what's shown in 'Documentation/technical/api-parse-options.txt',
> and it potentially needs prefix handling similar to what's done in
> parse-options.c:get_value() (which internally calls 'fix_filename()' for
> filename opts), but I think it's probably worth reducing duplication here
> and avoiding the need to add 'normalize_object_dir()' to any new subcommand
> in the future.

Good suggestion.  Thanks, both, for taking care of this.

Are there other places that we take end-user input and treat it as a
pathname without necessary normalization, I wonder.  The codepath
fixed by this series is relatively new, and I am not surprised such
a bug was still there and hopefully it was an isolated remaining bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 0/3] multi-pack-index: use real paths for --object-dir
  2022-04-21 14:57 [PATCH 0/2] multi-pack-index: use real paths for --object-dir Derrick Stolee via GitGitGadget
  2022-04-21 14:57 ` [PATCH 1/2] midx: use real paths in lookup_multi_pack_index() Derrick Stolee via GitGitGadget
  2022-04-21 14:57 ` [PATCH 2/2] multi-pack-index: use --object-dir real path Derrick Stolee via GitGitGadget
@ 2022-04-25 18:27 ` Derrick Stolee via GitGitGadget
  2022-04-25 18:27   ` [PATCH v2 1/3] midx: use real paths in lookup_multi_pack_index() Derrick Stolee via GitGitGadget
                     ` (2 more replies)
  2 siblings, 3 replies; 11+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-25 18:27 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, me, Derrick Stolee

A VFS for Git user reported [1] that the background maintenance did not seem
to be doing anything. At least, the 'git multi-pack-index expire' and 'git
multi-pack-index repack' steps did nothing. The 'write' subcommand worked to
collect all pack-files into a single multi-pack-index file, but the packs
were only growing in number instead of being maintained carefully.

[1] https://github.com/microsoft/git/issues/497

The root issue is about path comparisons between Windows and Unix path
styles.

The fix is two-fold:

 * Patch 1 performs real-path normalization at a low-level where it is
   required.
 * Patch 2 performs real-path normalization at a high-level to be extra
   safe.
 * Patch 3 adds an extra API hardening that I noticed while making Patch 2
   for v2.

I'm not exactly sure how to test this properly in the Git codebase, but I'm
looking to add some tests into VFS for Git to ensure this doesn't regress in
the future.


Updates in v2
=============

 * Patch 2 is rewritten to use an option callback, as recommended by
   Victoria. This is more robust to future changes to the CLI.
 * Patch 3 is new, and something I found while working on patch 2.

Thanks, -Stolee

Derrick Stolee (3):
  midx: use real paths in lookup_multi_pack_index()
  multi-pack-index: use --object-dir real path
  cache: use const char * for get_object_directory()

 builtin/multi-pack-index.c | 45 ++++++++++++++++++++++++++++----------
 cache.h                    |  2 +-
 environment.c              |  2 +-
 midx.c                     | 17 ++++++++++----
 4 files changed, 49 insertions(+), 17 deletions(-)


base-commit: 6cd33dceed60949e2dbc32e3f0f5e67c4c882e1e
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1221%2Fderrickstolee%2Fmidx-normalize-object-dir-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1221/derrickstolee/midx-normalize-object-dir-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1221

Range-diff vs v1:

 1:  34785a0c7cc = 1:  34785a0c7cc midx: use real paths in lookup_multi_pack_index()
 2:  0435406e2db ! 2:  fd580e79477 multi-pack-index: use --object-dir real path
     @@ Commit message
          using the object_dir value, so it can be helpful to convert the path to
          the real-path before sending it to those locations.
      
     -    Adding the normalization in builtin/multi-pack-index.c is a little
     -    complicated because of how the sub-commands were split in 60ca94769
     -    (builtin/multi-pack-index.c: split sub-commands, 2021-03-30). The
     -    --object-dir argument could be parsed before the sub-command name _or_
     -    after it. Thus, create a normalize_object_dir() helper to call after all
     -    arguments are parsed, but before any logic is run on that object dir.
     +    Add a callback to convert the real path immediately upon parsing the
     +    argument. We need to be careful that we don't store the exact value out
     +    of get_object_directory() and free it, or we could corrupt a later use
     +    of the_repository->objects->odb->path.
     +
     +    We don't use get_object_directory() for the initial instantiation in
     +    cmd_multi_pack_index() because we need 'git multi-pack-index -h' to work
     +    without a Git repository.
      
          Signed-off-by: Derrick Stolee <derrickstolee@github.com>
      
       ## builtin/multi-pack-index.c ##
     -@@ builtin/multi-pack-index.c: static void read_packs_from_stdin(struct string_list *to)
     - 	strbuf_release(&buf);
     - }
     +@@ builtin/multi-pack-index.c: static char const * const builtin_multi_pack_index_usage[] = {
     + };
     + 
     + static struct opts_multi_pack_index {
     +-	const char *object_dir;
     ++	char *object_dir;
     + 	const char *preferred_pack;
     + 	const char *refs_snapshot;
     + 	unsigned long batch_size;
     +@@ builtin/multi-pack-index.c: static struct opts_multi_pack_index {
     + 	int stdin_packs;
     + } opts;
       
     -+static void normalize_object_dir(void)
     ++
     ++static int parse_object_dir(const struct option *opt, const char *arg,
     ++			    int unset)
      +{
     -+	if (!opts.object_dir)
     -+		opts.object_dir = get_object_directory();
     ++	free(opts.object_dir);
     ++	if (unset)
     ++		opts.object_dir = xstrdup(get_object_directory());
      +	else
     -+		opts.object_dir = real_pathdup(opts.object_dir, 1);
     ++		opts.object_dir = real_pathdup(arg, 1);
     ++	return 0;
      +}
      +
     - static int cmd_multi_pack_index_write(int argc, const char **argv)
     - {
     - 	struct option *options;
     -@@ builtin/multi-pack-index.c: static int cmd_multi_pack_index_write(int argc, const char **argv)
     - 
     - 	FREE_AND_NULL(options);
     - 
     -+	normalize_object_dir();
     -+
     - 	if (opts.stdin_packs) {
     - 		struct string_list packs = STRING_LIST_INIT_DUP;
     - 		int ret;
     -@@ builtin/multi-pack-index.c: static int cmd_multi_pack_index_verify(int argc, const char **argv)
     - 
     - 	FREE_AND_NULL(options);
     - 
     -+	normalize_object_dir();
     -+
     - 	return verify_midx_file(the_repository, opts.object_dir, opts.flags);
     - }
     - 
     -@@ builtin/multi-pack-index.c: static int cmd_multi_pack_index_expire(int argc, const char **argv)
     - 
     - 	FREE_AND_NULL(options);
     - 
     -+	normalize_object_dir();
     -+
     - 	return expire_midx_packs(the_repository, opts.object_dir, opts.flags);
     - }
     + static struct option common_opts[] = {
     +-	OPT_FILENAME(0, "object-dir", &opts.object_dir,
     +-	  N_("object directory containing set of packfile and pack-index pairs")),
     ++	OPT_CALLBACK(0, "object-dir", &opts.object_dir,
     ++	  N_("directory"),
     ++	  N_("object directory containing set of packfile and pack-index pairs"),
     ++	  parse_object_dir),
     + 	OPT_END(),
     + };
       
      @@ builtin/multi-pack-index.c: static int cmd_multi_pack_index_repack(int argc, const char **argv)
     + int cmd_multi_pack_index(int argc, const char **argv,
     + 			 const char *prefix)
     + {
     ++	int res;
     + 	struct option *builtin_multi_pack_index_options = common_opts;
       
     - 	FREE_AND_NULL(options);
     + 	git_config(git_default_config, NULL);
       
     -+	normalize_object_dir();
     ++	if (the_repository &&
     ++	    the_repository->objects &&
     ++	    the_repository->objects->odb)
     ++		opts.object_dir = xstrdup(the_repository->objects->odb->path);
      +
     - 	return midx_repack(the_repository, opts.object_dir,
     - 			   (size_t)opts.batch_size, opts.flags);
     - }
     -@@ builtin/multi-pack-index.c: int cmd_multi_pack_index(int argc, const char **argv,
     + 	argc = parse_options(argc, argv, prefix,
     + 			     builtin_multi_pack_index_options,
       			     builtin_multi_pack_index_usage,
       			     PARSE_OPT_STOP_AT_NON_OPTION);
       
     @@ builtin/multi-pack-index.c: int cmd_multi_pack_index(int argc, const char **argv
       	if (!argc)
       		goto usage;
       
     + 	if (!strcmp(argv[0], "repack"))
     +-		return cmd_multi_pack_index_repack(argc, argv);
     ++		res = cmd_multi_pack_index_repack(argc, argv);
     + 	else if (!strcmp(argv[0], "write"))
     +-		return cmd_multi_pack_index_write(argc, argv);
     ++		res =  cmd_multi_pack_index_write(argc, argv);
     + 	else if (!strcmp(argv[0], "verify"))
     +-		return cmd_multi_pack_index_verify(argc, argv);
     ++		res =  cmd_multi_pack_index_verify(argc, argv);
     + 	else if (!strcmp(argv[0], "expire"))
     +-		return cmd_multi_pack_index_expire(argc, argv);
     ++		res =  cmd_multi_pack_index_expire(argc, argv);
     ++	else {
     ++		error(_("unrecognized subcommand: %s"), argv[0]);
     ++		goto usage;
     ++	}
     ++
     ++	free(opts.object_dir);
     ++	return res;
     + 
     +-	error(_("unrecognized subcommand: %s"), argv[0]);
     + usage:
     + 	usage_with_options(builtin_multi_pack_index_usage,
     + 			   builtin_multi_pack_index_options);
 -:  ----------- > 3:  c9b13f0e146 cache: use const char * for get_object_directory()

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/3] midx: use real paths in lookup_multi_pack_index()
  2022-04-25 18:27 ` [PATCH v2 0/3] multi-pack-index: use real paths for --object-dir Derrick Stolee via GitGitGadget
@ 2022-04-25 18:27   ` Derrick Stolee via GitGitGadget
  2022-04-25 18:27   ` [PATCH v2 2/3] multi-pack-index: use --object-dir real path Derrick Stolee via GitGitGadget
  2022-04-25 18:27   ` [PATCH v2 3/3] cache: use const char * for get_object_directory() Derrick Stolee via GitGitGadget
  2 siblings, 0 replies; 11+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-25 18:27 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, me, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

This helper looks for a parsed multi-pack-index whose object directory
matches the given object_dir. Before going into the loop over the parsed
multi-pack-indexes, it calls find_odb() to ensure that the given
object_dir is actually a known object directory.

However, find_odb() uses real-path manipulations to compare the input to
the alternate directories. This same real-path comparison is not used in
the loop, leading to potential issues with the strcmp().

Update the method to use the real-path values instead.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 midx.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/midx.c b/midx.c
index 107365d2114..3db0e47735f 100644
--- a/midx.c
+++ b/midx.c
@@ -1132,17 +1132,26 @@ cleanup:
 static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
 							const char *object_dir)
 {
+	struct multi_pack_index *result = NULL;
 	struct multi_pack_index *cur;
+	char *obj_dir_real = real_pathdup(object_dir, 1);
+	struct strbuf cur_path_real = STRBUF_INIT;
 
 	/* Ensure the given object_dir is local, or a known alternate. */
-	find_odb(r, object_dir);
+	find_odb(r, obj_dir_real);
 
 	for (cur = get_multi_pack_index(r); cur; cur = cur->next) {
-		if (!strcmp(object_dir, cur->object_dir))
-			return cur;
+		strbuf_realpath(&cur_path_real, cur->object_dir, 1);
+		if (!strcmp(obj_dir_real, cur_path_real.buf)) {
+			result = cur;
+			goto cleanup;
+		}
 	}
 
-	return NULL;
+cleanup:
+	free(obj_dir_real);
+	strbuf_release(&cur_path_real);
+	return result;
 }
 
 static int write_midx_internal(const char *object_dir,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/3] multi-pack-index: use --object-dir real path
  2022-04-25 18:27 ` [PATCH v2 0/3] multi-pack-index: use real paths for --object-dir Derrick Stolee via GitGitGadget
  2022-04-25 18:27   ` [PATCH v2 1/3] midx: use real paths in lookup_multi_pack_index() Derrick Stolee via GitGitGadget
@ 2022-04-25 18:27   ` Derrick Stolee via GitGitGadget
  2022-04-25 18:58     ` Junio C Hamano
  2022-04-25 18:27   ` [PATCH v2 3/3] cache: use const char * for get_object_directory() Derrick Stolee via GitGitGadget
  2 siblings, 1 reply; 11+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-25 18:27 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, me, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The --object-dir argument to 'git multi-pack-index' allows a user to
specify an alternate to use instead of the local $GITDIR. This is used
by third-party tools like VFS for Git to maintain the pack-files in a
"shared object cache" used by multiple clones.

On Windows, the user can specify a path using a Windows-style file path
with backslashes such as "C:\Path\To\ObjectDir". This same path style is
used in the .git/objects/info/alternates file, so it already matches the
path of that alternate. However, find_odb() converts these paths to
real-paths for the comparison, which use forward slashes. As of the
previous change, lookup_multi_pack_index() uses real-paths, so it
correctly finds the target multi-pack-index when given these paths.

Some commands such as 'git multi-pack-index repack' call child processes
using the object_dir value, so it can be helpful to convert the path to
the real-path before sending it to those locations.

Add a callback to convert the real path immediately upon parsing the
argument. We need to be careful that we don't store the exact value out
of get_object_directory() and free it, or we could corrupt a later use
of the_repository->objects->odb->path.

We don't use get_object_directory() for the initial instantiation in
cmd_multi_pack_index() because we need 'git multi-pack-index -h' to work
without a Git repository.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/multi-pack-index.c | 45 ++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 11 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 4480ba39827..5edbb7fe86e 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -44,7 +44,7 @@ static char const * const builtin_multi_pack_index_usage[] = {
 };
 
 static struct opts_multi_pack_index {
-	const char *object_dir;
+	char *object_dir;
 	const char *preferred_pack;
 	const char *refs_snapshot;
 	unsigned long batch_size;
@@ -52,9 +52,23 @@ static struct opts_multi_pack_index {
 	int stdin_packs;
 } opts;
 
+
+static int parse_object_dir(const struct option *opt, const char *arg,
+			    int unset)
+{
+	free(opts.object_dir);
+	if (unset)
+		opts.object_dir = xstrdup(get_object_directory());
+	else
+		opts.object_dir = real_pathdup(arg, 1);
+	return 0;
+}
+
 static struct option common_opts[] = {
-	OPT_FILENAME(0, "object-dir", &opts.object_dir,
-	  N_("object directory containing set of packfile and pack-index pairs")),
+	OPT_CALLBACK(0, "object-dir", &opts.object_dir,
+	  N_("directory"),
+	  N_("object directory containing set of packfile and pack-index pairs"),
+	  parse_object_dir),
 	OPT_END(),
 };
 
@@ -232,31 +246,40 @@ static int cmd_multi_pack_index_repack(int argc, const char **argv)
 int cmd_multi_pack_index(int argc, const char **argv,
 			 const char *prefix)
 {
+	int res;
 	struct option *builtin_multi_pack_index_options = common_opts;
 
 	git_config(git_default_config, NULL);
 
+	if (the_repository &&
+	    the_repository->objects &&
+	    the_repository->objects->odb)
+		opts.object_dir = xstrdup(the_repository->objects->odb->path);
+
 	argc = parse_options(argc, argv, prefix,
 			     builtin_multi_pack_index_options,
 			     builtin_multi_pack_index_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
-	if (!opts.object_dir)
-		opts.object_dir = get_object_directory();
-
 	if (!argc)
 		goto usage;
 
 	if (!strcmp(argv[0], "repack"))
-		return cmd_multi_pack_index_repack(argc, argv);
+		res = cmd_multi_pack_index_repack(argc, argv);
 	else if (!strcmp(argv[0], "write"))
-		return cmd_multi_pack_index_write(argc, argv);
+		res =  cmd_multi_pack_index_write(argc, argv);
 	else if (!strcmp(argv[0], "verify"))
-		return cmd_multi_pack_index_verify(argc, argv);
+		res =  cmd_multi_pack_index_verify(argc, argv);
 	else if (!strcmp(argv[0], "expire"))
-		return cmd_multi_pack_index_expire(argc, argv);
+		res =  cmd_multi_pack_index_expire(argc, argv);
+	else {
+		error(_("unrecognized subcommand: %s"), argv[0]);
+		goto usage;
+	}
+
+	free(opts.object_dir);
+	return res;
 
-	error(_("unrecognized subcommand: %s"), argv[0]);
 usage:
 	usage_with_options(builtin_multi_pack_index_usage,
 			   builtin_multi_pack_index_options);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 3/3] cache: use const char * for get_object_directory()
  2022-04-25 18:27 ` [PATCH v2 0/3] multi-pack-index: use real paths for --object-dir Derrick Stolee via GitGitGadget
  2022-04-25 18:27   ` [PATCH v2 1/3] midx: use real paths in lookup_multi_pack_index() Derrick Stolee via GitGitGadget
  2022-04-25 18:27   ` [PATCH v2 2/3] multi-pack-index: use --object-dir real path Derrick Stolee via GitGitGadget
@ 2022-04-25 18:27   ` Derrick Stolee via GitGitGadget
  2 siblings, 0 replies; 11+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-04-25 18:27 UTC (permalink / raw)
  To: git; +Cc: gitster, vdye, me, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The get_object_directory() method returns the exact string stored at
the_repository->objects->odb->path. The return type of "char *" implies
that the caller must keep track of the buffer and free() it when
complete. This causes significant problems later when the ODB is
accessed.

Use "const char *" as the return type to avoid this confusion. There are
no current callers that care about the non-const definition.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 cache.h       | 2 +-
 environment.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 6226f6a8a53..595582becc8 100644
--- a/cache.h
+++ b/cache.h
@@ -566,7 +566,7 @@ extern char *git_work_tree_cfg;
 int is_inside_work_tree(void);
 const char *get_git_dir(void);
 const char *get_git_common_dir(void);
-char *get_object_directory(void);
+const char *get_object_directory(void);
 char *get_index_file(void);
 char *get_graft_file(struct repository *r);
 void set_git_dir(const char *path, int make_realpath);
diff --git a/environment.c b/environment.c
index 5bff1b386fd..b3296ce7d15 100644
--- a/environment.c
+++ b/environment.c
@@ -273,7 +273,7 @@ const char *get_git_work_tree(void)
 	return the_repository->worktree;
 }
 
-char *get_object_directory(void)
+const char *get_object_directory(void)
 {
 	if (!the_repository->objects->odb)
 		BUG("git environment hasn't been setup");
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/3] multi-pack-index: use --object-dir real path
  2022-04-25 18:27   ` [PATCH v2 2/3] multi-pack-index: use --object-dir real path Derrick Stolee via GitGitGadget
@ 2022-04-25 18:58     ` Junio C Hamano
  0 siblings, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2022-04-25 18:58 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, vdye, me, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> The --object-dir argument to 'git multi-pack-index' allows a user to
> specify an alternate to use instead of the local $GITDIR. This is used
> by third-party tools like VFS for Git to maintain the pack-files in a
> "shared object cache" used by multiple clones.
>
> On Windows, the user can specify a path using a Windows-style file path
> with backslashes such as "C:\Path\To\ObjectDir". This same path style is
> used in the .git/objects/info/alternates file, so it already matches the
> path of that alternate. However, find_odb() converts these paths to
> real-paths for the comparison, which use forward slashes. As of the
> previous change, lookup_multi_pack_index() uses real-paths, so it
> correctly finds the target multi-pack-index when given these paths.
>
> Some commands such as 'git multi-pack-index repack' call child processes
> using the object_dir value, so it can be helpful to convert the path to
> the real-path before sending it to those locations.
>
> Add a callback to convert the real path immediately upon parsing the
> argument. We need to be careful that we don't store the exact value out
> of get_object_directory() and free it, or we could corrupt a later use
> of the_repository->objects->odb->path.
>
> We don't use get_object_directory() for the initial instantiation in
> cmd_multi_pack_index() because we need 'git multi-pack-index -h' to work
> without a Git repository.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  builtin/multi-pack-index.c | 45 ++++++++++++++++++++++++++++----------
>  1 file changed, 34 insertions(+), 11 deletions(-)

Much nicer compared to the previous round, by doing the
normalization early.

Will queue.  thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-04-25 18:59 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-21 14:57 [PATCH 0/2] multi-pack-index: use real paths for --object-dir Derrick Stolee via GitGitGadget
2022-04-21 14:57 ` [PATCH 1/2] midx: use real paths in lookup_multi_pack_index() Derrick Stolee via GitGitGadget
2022-04-21 14:57 ` [PATCH 2/2] multi-pack-index: use --object-dir real path Derrick Stolee via GitGitGadget
2022-04-21 19:50   ` Victoria Dye
2022-04-21 19:55     ` Derrick Stolee
2022-04-21 20:28     ` Junio C Hamano
2022-04-25 18:27 ` [PATCH v2 0/3] multi-pack-index: use real paths for --object-dir Derrick Stolee via GitGitGadget
2022-04-25 18:27   ` [PATCH v2 1/3] midx: use real paths in lookup_multi_pack_index() Derrick Stolee via GitGitGadget
2022-04-25 18:27   ` [PATCH v2 2/3] multi-pack-index: use --object-dir real path Derrick Stolee via GitGitGadget
2022-04-25 18:58     ` Junio C Hamano
2022-04-25 18:27   ` [PATCH v2 3/3] cache: use const char * for get_object_directory() Derrick Stolee via GitGitGadget

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).