git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] prune: quiet ENOENT on missing directories
@ 2022-11-19 20:12 Eric Wong
  2022-11-21  6:02 ` Junio C Hamano
  2022-11-21 11:16 ` [PATCH] prune: quiet ENOENT on missing directories Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 10+ messages in thread
From: Eric Wong @ 2022-11-19 20:12 UTC (permalink / raw)
  To: git

$GIT_DIR/objects/pack may be removed to save inodes in shared
repositories.  Quiet down prune in cases where either
$GIT_DIR/objects or $GIT_DIR/objects/pack is non-existent,
but emit the system error in other cases to help users diagnose
permissions problems or resource constraints.

Signed-off-by: Eric Wong <e@80x24.org>
---
 builtin/prune.c  | 4 +++-
 t/t5304-prune.sh | 8 ++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/builtin/prune.c b/builtin/prune.c
index df376b2ed1..2719220108 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -127,7 +127,9 @@ static void remove_temporary_files(const char *path)
 
 	dir = opendir(path);
 	if (!dir) {
-		fprintf(stderr, "Unable to open directory %s\n", path);
+		if (errno != ENOENT)
+			fprintf(stderr, "Unable to open directory %s: %s\n",
+				path, strerror(errno));
 		return;
 	}
 	while ((de = readdir(dir)) != NULL)
diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 8ae314af58..d65a5f94b4 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -29,6 +29,14 @@ test_expect_success setup '
 	git gc
 '
 
+test_expect_success 'bare repo prune is quiet without $GIT_DIR/objects/pack' '
+	git clone -q --shared --template= --bare . bare.git &&
+	rmdir bare.git/objects/pack &&
+	git --git-dir=bare.git prune --no-progress 2>prune.err &&
+	test_must_be_empty prune.err &&
+	rm -r bare.git prune.err
+'
+
 test_expect_success 'prune stale packs' '
 	orig_pack=$(echo .git/objects/pack/*.pack) &&
 	>.git/objects/tmp_1.pack &&

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] prune: quiet ENOENT on missing directories
  2022-11-19 20:12 [PATCH] prune: quiet ENOENT on missing directories Eric Wong
@ 2022-11-21  6:02 ` Junio C Hamano
  2022-11-21 10:44   ` Eric Wong
  2022-11-21 11:16 ` [PATCH] prune: quiet ENOENT on missing directories Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2022-11-21  6:02 UTC (permalink / raw)
  To: Eric Wong; +Cc: git

Eric Wong <e@80x24.org> writes:

> $GIT_DIR/objects/pack may be removed to save inodes in shared
> repositories.  Quiet down prune in cases where either
> $GIT_DIR/objects or $GIT_DIR/objects/pack is non-existent,

Wouldn't setup.c::is_git_directory() say "nope, you do not have a
repository there" if you are missing $GIT_DIR/objects?  So I suspect
that the only case this matters in practice is a missing pack/
subdirectory.

I agree that silently ignoring missing objects/pack/ is perfectly
fine, whether we auto-vivify it when we actually create a pack.

> but emit the system error in other cases to help users diagnose
> permissions problems or resource constraints.

OK.

> @@ -127,7 +127,9 @@ static void remove_temporary_files(const char *path)
>  
>  	dir = opendir(path);
>  	if (!dir) {
> -		fprintf(stderr, "Unable to open directory %s\n", path);
> +		if (errno != ENOENT)
> +			fprintf(stderr, "Unable to open directory %s: %s\n",
> +				path, strerror(errno));
>  		return;
>  	}

This is called twice, with $GIT_OBJECT_DIRECTORY and its pack
subdirectory, as it does not recurse.  

This is a tangent, I have to wonder how effective the first call
would be, though.  When writing a loose object file, we compute its
object name first in-core and determine the final filename, create a
temporary file in the same directory as the final file, write into
it and then finally rename the temporary to the final name.  The
fan-out $GIT_OBJECT_DIRECTORY/??/ directories may have temporary
files left when such a process crashed, but do we create cruft "git
prune" should remove in $GIT_OBJECT_DIRECTORY/ itself?

> diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
> index 8ae314af58..d65a5f94b4 100755
> --- a/t/t5304-prune.sh
> +++ b/t/t5304-prune.sh
> @@ -29,6 +29,14 @@ test_expect_success setup '
>  	git gc
>  '
>  
> +test_expect_success 'bare repo prune is quiet without $GIT_DIR/objects/pack' '
> +	git clone -q --shared --template= --bare . bare.git &&
> +	rmdir bare.git/objects/pack &&
> +	git --git-dir=bare.git prune --no-progress 2>prune.err &&
> +	test_must_be_empty prune.err &&
> +	rm -r bare.git prune.err
> +'
> +
>  test_expect_success 'prune stale packs' '
>  	orig_pack=$(echo .git/objects/pack/*.pack) &&
>  	>.git/objects/tmp_1.pack &&

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] prune: quiet ENOENT on missing directories
  2022-11-21  6:02 ` Junio C Hamano
@ 2022-11-21 10:44   ` Eric Wong
  2022-11-21 13:08     ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2022-11-21 10:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <e@80x24.org> writes:
> 
> > $GIT_DIR/objects/pack may be removed to save inodes in shared
> > repositories.  Quiet down prune in cases where either
> > $GIT_DIR/objects or $GIT_DIR/objects/pack is non-existent,
> 
> Wouldn't setup.c::is_git_directory() say "nope, you do not have a
> repository there" if you are missing $GIT_DIR/objects?  So I suspect
> that the only case this matters in practice is a missing pack/
> subdirectory.

Right.  Removing $GIT_DIR/objects isn't currently OK, but maybe
someday it could be...  Supporting missing pack/ is the primary
reason for this change, but making a small step towards allowing
objects/-free $GIT_DIR doesn't seem harmful.

> I agree that silently ignoring missing objects/pack/ is perfectly
> fine, whether we auto-vivify it when we actually create a pack.
> 
> > but emit the system error in other cases to help users diagnose
> > permissions problems or resource constraints.
> 
> OK.
> 
> > @@ -127,7 +127,9 @@ static void remove_temporary_files(const char *path)
> >  
> >  	dir = opendir(path);
> >  	if (!dir) {
> > -		fprintf(stderr, "Unable to open directory %s\n", path);
> > +		if (errno != ENOENT)
> > +			fprintf(stderr, "Unable to open directory %s: %s\n",
> > +				path, strerror(errno));
> >  		return;
> >  	}
> 
> This is called twice, with $GIT_OBJECT_DIRECTORY and its pack
> subdirectory, as it does not recurse.  

Right.

> This is a tangent, I have to wonder how effective the first call
> would be, though.  When writing a loose object file, we compute its
> object name first in-core and determine the final filename, create a
> temporary file in the same directory as the final file, write into
> it and then finally rename the temporary to the final name.  The
> fan-out $GIT_OBJECT_DIRECTORY/??/ directories may have temporary
> files left when such a process crashed, but do we create cruft "git
> prune" should remove in $GIT_OBJECT_DIRECTORY/ itself?

Good question, perhaps this could be a followup:

diff --git a/builtin/prune.c b/builtin/prune.c
index 2719220108..041c45ecbe 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -188,7 +188,6 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 				      prune_cruft, prune_subdir, &revs);
 
 	prune_packed_objects(show_only ? PRUNE_PACKED_DRY_RUN : 0);
-	remove_temporary_files(get_object_directory());
 	s = mkpathdup("%s/pack", get_object_directory());
 	remove_temporary_files(s);
 	free(s);

OTOH, perhaps there's some 3rd-party tools (e.g. backup tools)
that leave stuff in top-level objects/ and we'd risk breaking
a rare setup via ENOSPC.

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] prune: quiet ENOENT on missing directories
  2022-11-19 20:12 [PATCH] prune: quiet ENOENT on missing directories Eric Wong
  2022-11-21  6:02 ` Junio C Hamano
@ 2022-11-21 11:16 ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-21 11:16 UTC (permalink / raw)
  To: Eric Wong; +Cc: git


On Sat, Nov 19 2022, Eric Wong wrote:

> $GIT_DIR/objects/pack may be removed to save inodes in shared
> repositories.  Quiet down prune in cases where either
> $GIT_DIR/objects or $GIT_DIR/objects/pack is non-existent,
> but emit the system error in other cases to help users diagnose
> permissions problems or resource constraints.
>
> Signed-off-by: Eric Wong <e@80x24.org>
> ---
>  builtin/prune.c  | 4 +++-
>  t/t5304-prune.sh | 8 ++++++++
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/prune.c b/builtin/prune.c
> index df376b2ed1..2719220108 100644
> --- a/builtin/prune.c
> +++ b/builtin/prune.c
> @@ -127,7 +127,9 @@ static void remove_temporary_files(const char *path)
>  
>  	dir = opendir(path);
>  	if (!dir) {
> -		fprintf(stderr, "Unable to open directory %s\n", path);
> +		if (errno != ENOENT)
> +			fprintf(stderr, "Unable to open directory %s: %s\n",
> +				path, strerror(errno));

We sometimes use fprintf() instead of "error" or "warning" for output
compatibility with an older version, or because it's written in an old
style.

But as you're changing the anyway let's not re-invent error_errno() or
warning_errno(), but just use those.

We could also s/^Unable/unable/ in the message while at it, per
CodingGuidelines.

>  		return;
>  	}
>  	while ((de = readdir(dir)) != NULL)
> diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
> index 8ae314af58..d65a5f94b4 100755
> --- a/t/t5304-prune.sh
> +++ b/t/t5304-prune.sh
> @@ -29,6 +29,14 @@ test_expect_success setup '
>  	git gc
>  '
>  
> +test_expect_success 'bare repo prune is quiet without $GIT_DIR/objects/pack' '
> +	git clone -q --shared --template= --bare . bare.git &&
> +	rmdir bare.git/objects/pack &&
> +	git --git-dir=bare.git prune --no-progress 2>prune.err &&
> +	test_must_be_empty prune.err &&
> +	rm -r bare.git prune.err
> +'
> +
>  test_expect_success 'prune stale packs' '
>  	orig_pack=$(echo .git/objects/pack/*.pack) &&
>  	>.git/objects/tmp_1.pack &&

This seems like a good isolated change, but FWIW I think what we really
should be doing here is using the "report_garbage" facility added in
543c5caa6c9 (count-objects: report garbage files in pack directory too,
2013-02-15) and 478f34d2b6e (gc: remove garbage .idx files from pack
dir, 2015-11-03) for "pack".

I.e. we have already iterated over "pack" and found all the files
therein, and in packfile.c error_errno() etc. That we're
re-opendir()-ing the "pack", walking it again etc. doesn't make much
sense, or does it?

Then the:

	remove_temporary_files(get_object_directory());

Also seems odd, just a few lines above we passed "prune_cruft" to
"for_each_loose_file_in_objdir()", haven't we already walked the loose
object dir & removed temporary cruft there?




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] prune: quiet ENOENT on missing directories
  2022-11-21 10:44   ` Eric Wong
@ 2022-11-21 13:08     ` Junio C Hamano
  2022-11-21 23:09       ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2022-11-21 13:08 UTC (permalink / raw)
  To: Eric Wong; +Cc: git

Eric Wong <e@80x24.org> writes:

> Good question, perhaps this could be a followup:
>
> diff --git a/builtin/prune.c b/builtin/prune.c
> index 2719220108..041c45ecbe 100644
> --- a/builtin/prune.c
> +++ b/builtin/prune.c
> @@ -188,7 +188,6 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
>  				      prune_cruft, prune_subdir, &revs);
>  
>  	prune_packed_objects(show_only ? PRUNE_PACKED_DRY_RUN : 0);
> -	remove_temporary_files(get_object_directory());
>  	s = mkpathdup("%s/pack", get_object_directory());
>  	remove_temporary_files(s);
>  	free(s);

I actually was hinting at making the remove_temporary_files()
recurse, so that you do not need the separate invocation in pack/
subdirectory.

Or make 256 calls for each of the fan-out subdirectory, in which
case the ENOENT silencing you did would really matter and shine.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] prune: quiet ENOENT on missing directories
  2022-11-21 13:08     ` Junio C Hamano
@ 2022-11-21 23:09       ` Junio C Hamano
  2022-11-22  0:09         ` [PATCH] prune: recursively prune objects directory Eric Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2022-11-21 23:09 UTC (permalink / raw)
  To: Eric Wong; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

>>  	prune_packed_objects(show_only ? PRUNE_PACKED_DRY_RUN : 0);
>> -	remove_temporary_files(get_object_directory());
>>  	s = mkpathdup("%s/pack", get_object_directory());
>>  	remove_temporary_files(s);
>>  	free(s);
>
> I actually was hinting at making the remove_temporary_files()
> recurse, so that you do not need the separate invocation in pack/
> subdirectory.
>
> Or make 256 calls for each of the fan-out subdirectory, in which
> case the ENOENT silencing you did would really matter and shine.

But of course, neither is any part of this topic.  They are possible
follow-on works.

Thanks and sorry for making a confusing statement that could be
mistaken as "let's do this too", which wasn't what I meant.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] prune: recursively prune objects directory
  2022-11-21 23:09       ` Junio C Hamano
@ 2022-11-22  0:09         ` Eric Wong
  2022-11-22  1:28           ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2022-11-22  0:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> >>  	prune_packed_objects(show_only ? PRUNE_PACKED_DRY_RUN : 0);
> >> -	remove_temporary_files(get_object_directory());
> >>  	s = mkpathdup("%s/pack", get_object_directory());
> >>  	remove_temporary_files(s);
> >>  	free(s);
> >
> > I actually was hinting at making the remove_temporary_files()
> > recurse, so that you do not need the separate invocation in pack/
> > subdirectory.
> >
> > Or make 256 calls for each of the fan-out subdirectory, in which
> > case the ENOENT silencing you did would really matter and shine.
> 
> But of course, neither is any part of this topic.  They are possible
> follow-on works.
> 
> Thanks and sorry for making a confusing statement that could be
> mistaken as "let's do this too", which wasn't what I meant.

Oh, no worries.  I already wrote this earlier and got distracted
with something else while waiting for tests :x.  Anyways, the
below supercedes my original patch and I think it's better in
every way.

I am unsure about duplicating ishex() from name-rev.c, however...

------8<-----
Subject: [PATCH] prune: recursively prune objects directory

$GIT_DIR/objects/pack may be removed to save inodes in shared
repositories, so avoid scanning it if it does not exist.  Loose
object directories ($GIT_DIR/objects/??) may have old temporary
files, so we now prune those, too.

Recursion is limited to a single level since git doesn't use
deeper levels.  This avoids the risk of stack overflows via
infinite recursion when pruning untrusted repos.

We'll also emit the system error in case a directory cannot be
opened to help users diagnose permissions problems or resource
constraints.

Signed-off-by: Eric Wong <e@80x24.org>
---
 builtin/prune.c  | 28 ++++++++++++++++++++--------
 t/t5304-prune.sh | 16 ++++++++++++++++
 2 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/builtin/prune.c b/builtin/prune.c
index df376b2ed1..0f6a33690a 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -114,25 +114,41 @@ static int prune_subdir(unsigned int nr, const char *path, void *data)
 	return 0;
 }
 
+/*
+ * XXX ishex is duplicated in builtin/name-rev.c, perhaps git-compat-util.h
+ * is a better home for it
+ */
+#define ishex(x) (isdigit((x)) || ((x) >= 'a' && (x) <= 'f'))
+static int is_loose_prefix(const char *d_name)
+{
+	return strlen(d_name) == 2 && ishex(d_name[0]) && ishex(d_name[1]);
+}
+
 /*
  * Write errors (particularly out of space) can result in
  * failed temporary packs (and more rarely indexes and other
  * files beginning with "tmp_") accumulating in the object
  * and the pack directories.
  */
-static void remove_temporary_files(const char *path)
+static void remove_temporary_files(const char *path, int recurse)
 {
 	DIR *dir;
 	struct dirent *de;
 
 	dir = opendir(path);
 	if (!dir) {
-		fprintf(stderr, "Unable to open directory %s\n", path);
+		warning_errno(_("unable to open directory %s"), path);
 		return;
 	}
 	while ((de = readdir(dir)) != NULL)
-		if (starts_with(de->d_name, "tmp_"))
+		if (starts_with(de->d_name, "tmp_")) {
 			prune_tmp_file(mkpath("%s/%s", path, de->d_name));
+		} else if (recurse && (strcmp(de->d_name, "packs") == 0 ||
+					is_loose_prefix(de->d_name))) {
+			char *s = mkpathdup("%s/%s", path, de->d_name);
+			remove_temporary_files(s, 0);
+			free(s);
+		}
 	closedir(dir);
 }
 
@@ -150,7 +166,6 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 			 N_("limit traversal to objects outside promisor packfiles")),
 		OPT_END()
 	};
-	char *s;
 
 	expire = TIME_MAX;
 	save_commit_buffer = 0;
@@ -186,10 +201,7 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 				      prune_cruft, prune_subdir, &revs);
 
 	prune_packed_objects(show_only ? PRUNE_PACKED_DRY_RUN : 0);
-	remove_temporary_files(get_object_directory());
-	s = mkpathdup("%s/pack", get_object_directory());
-	remove_temporary_files(s);
-	free(s);
+	remove_temporary_files(get_object_directory(), 1);
 
 	if (is_repository_shallow(the_repository)) {
 		perform_reachability_traversal(&revs);
diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 8ae314af58..8c2278035e 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -29,6 +29,22 @@ test_expect_success setup '
 	git gc
 '
 
+test_expect_success 'prune stale loose objects' '
+	mkdir .git/objects/aa &&
+	>.git/objects/aa/tmp_foo &&
+	test-tool chmtime =-86501 .git/objects/aa/tmp_foo &&
+	git prune --expire 1.day &&
+	test_path_is_missing .git/objects/aa/tmp_foo
+'
+
+test_expect_success 'bare repo prune is quiet without $GIT_DIR/objects/pack' '
+	git clone -q --shared --template= --bare . bare.git &&
+	rmdir bare.git/objects/pack &&
+	git --git-dir=bare.git prune --no-progress 2>prune.err &&
+	test_must_be_empty prune.err &&
+	rm -r bare.git prune.err
+'
+
 test_expect_success 'prune stale packs' '
 	orig_pack=$(echo .git/objects/pack/*.pack) &&
 	>.git/objects/tmp_1.pack &&

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] prune: recursively prune objects directory
  2022-11-22  0:09         ` [PATCH] prune: recursively prune objects directory Eric Wong
@ 2022-11-22  1:28           ` Junio C Hamano
  2022-11-22  9:59             ` Eric Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2022-11-22  1:28 UTC (permalink / raw)
  To: Eric Wong; +Cc: git

Eric Wong <e@80x24.org> writes:

> I am unsure about duplicating ishex() from name-rev.c, however...

Yeah, I wonder why name-rev.c does not use isxdigit() in the first
place.

> ------8<-----
> Subject: [PATCH] prune: recursively prune objects directory
>
> $GIT_DIR/objects/pack may be removed to save inodes in shared
> repositories, so avoid scanning it if it does not exist.  Loose
> object directories ($GIT_DIR/objects/??) may have old temporary
> files, so we now prune those, too.
>
> Recursion is limited to a single level since git doesn't use
> deeper levels.  This avoids the risk of stack overflows via
> infinite recursion when pruning untrusted repos.
>
> We'll also emit the system error in case a directory cannot be
> opened to help users diagnose permissions problems or resource
> constraints.
>
> Signed-off-by: Eric Wong <e@80x24.org>
> ---
>  builtin/prune.c  | 28 ++++++++++++++++++++--------
>  t/t5304-prune.sh | 16 ++++++++++++++++
>  2 files changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/prune.c b/builtin/prune.c
> index df376b2ed1..0f6a33690a 100644
> --- a/builtin/prune.c
> +++ b/builtin/prune.c
> @@ -114,25 +114,41 @@ static int prune_subdir(unsigned int nr, const char *path, void *data)
>  	return 0;
>  }
>  
> +/*
> + * XXX ishex is duplicated in builtin/name-rev.c, perhaps git-compat-util.h
> + * is a better home for it
> + */
> +#define ishex(x) (isdigit((x)) || ((x) >= 'a' && (x) <= 'f'))
> +static int is_loose_prefix(const char *d_name)
> +{
> +	return strlen(d_name) == 2 && ishex(d_name[0]) && ishex(d_name[1]);
> +}
> +
>  /*
>   * Write errors (particularly out of space) can result in
>   * failed temporary packs (and more rarely indexes and other
>   * files beginning with "tmp_") accumulating in the object
>   * and the pack directories.
>   */
> -static void remove_temporary_files(const char *path)
> +static void remove_temporary_files(const char *path, int recurse)
>  {
>  	DIR *dir;
>  	struct dirent *de;
>  
>  	dir = opendir(path);
>  	if (!dir) {
> -		fprintf(stderr, "Unable to open directory %s\n", path);
> +		warning_errno(_("unable to open directory %s"), path);
>  		return;
>  	}
>  	while ((de = readdir(dir)) != NULL)
> -		if (starts_with(de->d_name, "tmp_"))
> +		if (starts_with(de->d_name, "tmp_")) {
>  			prune_tmp_file(mkpath("%s/%s", path, de->d_name));
> +		} else if (recurse && (strcmp(de->d_name, "packs") == 0 ||
> +					is_loose_prefix(de->d_name))) {

OK, the intent is to be careful and deal only with the fan-out
directories objects/[0-9a-f]{2}/ and objects/pack/ and leave crufts
in objects/info and any other unknown subdirectories, which makes
sense.

Two nits are:

 - "packs" wants to be "pack".
 - "strcmp() == 0" wants to be "!strcmp()".

> +			char *s = mkpathdup("%s/%s", path, de->d_name);
> +			remove_temporary_files(s, 0);
> +			free(s);
> +		}
>  	closedir(dir);
>  }
>  
> @@ -150,7 +166,6 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
>  			 N_("limit traversal to objects outside promisor packfiles")),
>  		OPT_END()
>  	};
> -	char *s;
>  
>  	expire = TIME_MAX;
>  	save_commit_buffer = 0;
> @@ -186,10 +201,7 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
>  				      prune_cruft, prune_subdir, &revs);
>  
>  	prune_packed_objects(show_only ? PRUNE_PACKED_DRY_RUN : 0);
> -	remove_temporary_files(get_object_directory());
> -	s = mkpathdup("%s/pack", get_object_directory());
> -	remove_temporary_files(s);
> -	free(s);
> +	remove_temporary_files(get_object_directory(), 1);
>  
>  	if (is_repository_shallow(the_repository)) {
>  		perform_reachability_traversal(&revs);
> diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
> index 8ae314af58..8c2278035e 100755
> --- a/t/t5304-prune.sh
> +++ b/t/t5304-prune.sh
> @@ -29,6 +29,22 @@ test_expect_success setup '
>  	git gc
>  '
>  
> +test_expect_success 'prune stale loose objects' '
> +	mkdir .git/objects/aa &&
> +	>.git/objects/aa/tmp_foo &&
> +	test-tool chmtime =-86501 .git/objects/aa/tmp_foo &&
> +	git prune --expire 1.day &&
> +	test_path_is_missing .git/objects/aa/tmp_foo
> +'
> +
> +test_expect_success 'bare repo prune is quiet without $GIT_DIR/objects/pack' '
> +	git clone -q --shared --template= --bare . bare.git &&
> +	rmdir bare.git/objects/pack &&
> +	git --git-dir=bare.git prune --no-progress 2>prune.err &&
> +	test_must_be_empty prune.err &&
> +	rm -r bare.git prune.err
> +'

Is the last "clean-up" step necessary?

> +
>  test_expect_success 'prune stale packs' '
>  	orig_pack=$(echo .git/objects/pack/*.pack) &&
>  	>.git/objects/tmp_1.pack &&

Other than that, looks like a good idea.

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] prune: recursively prune objects directory
  2022-11-22  1:28           ` Junio C Hamano
@ 2022-11-22  9:59             ` Eric Wong
  2022-11-22 23:16               ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2022-11-22  9:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <gitster@pobox.com> wrote:
> Eric Wong <e@80x24.org> writes:
> 
> > I am unsure about duplicating ishex() from name-rev.c, however...
> 
> Yeah, I wonder why name-rev.c does not use isxdigit() in the first
> place.

isxdigit includes uppercase [A-F].  I think being strict is
better, here.  I don't want to open up a can of worms if we
become tolerant of 3rd-party git implementations developed on
case-insensitive FSes.

> > -static void remove_temporary_files(const char *path)
> > +static void remove_temporary_files(const char *path, int recurse)
> >  {
> >  	DIR *dir;
> >  	struct dirent *de;
> >  
> >  	dir = opendir(path);
> >  	if (!dir) {
> > -		fprintf(stderr, "Unable to open directory %s\n", path);
> > +		warning_errno(_("unable to open directory %s"), path);
> >  		return;
> >  	}
> >  	while ((de = readdir(dir)) != NULL)
> > -		if (starts_with(de->d_name, "tmp_"))
> > +		if (starts_with(de->d_name, "tmp_")) {
> >  			prune_tmp_file(mkpath("%s/%s", path, de->d_name));
> > +		} else if (recurse && (strcmp(de->d_name, "packs") == 0 ||
> > +					is_loose_prefix(de->d_name))) {
> 
> OK, the intent is to be careful and deal only with the fan-out
> directories objects/[0-9a-f]{2}/ and objects/pack/ and leave crufts
> in objects/info and any other unknown subdirectories, which makes
> sense.
> 
> Two nits are:
> 
>  - "packs" wants to be "pack".

OK, fixed.  Along with existing test cases, since packs handling
wasn't being tested properly.

>  - "strcmp() == 0" wants to be "!strcmp()".

OK

> > diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
> > index 8ae314af58..8c2278035e 100755
> > --- a/t/t5304-prune.sh
> > +++ b/t/t5304-prune.sh
> > @@ -29,6 +29,22 @@ test_expect_success setup '
> >  	git gc
> >  '
> >  
> > +test_expect_success 'prune stale loose objects' '
> > +	mkdir .git/objects/aa &&
> > +	>.git/objects/aa/tmp_foo &&
> > +	test-tool chmtime =-86501 .git/objects/aa/tmp_foo &&
> > +	git prune --expire 1.day &&
> > +	test_path_is_missing .git/objects/aa/tmp_foo
> > +'
> > +
> > +test_expect_success 'bare repo prune is quiet without $GIT_DIR/objects/pack' '
> > +	git clone -q --shared --template= --bare . bare.git &&
> > +	rmdir bare.git/objects/pack &&
> > +	git --git-dir=bare.git prune --no-progress 2>prune.err &&
> > +	test_must_be_empty prune.err &&
> > +	rm -r bare.git prune.err
> > +'
> 
> Is the last "clean-up" step necessary?

Guess not, removed in v2 below.

> > +
> >  test_expect_success 'prune stale packs' '
> >  	orig_pack=$(echo .git/objects/pack/*.pack) &&
> >  	>.git/objects/tmp_1.pack &&
> 
> Other than that, looks like a good idea.

'prune stale packs' was actually insufficient for catching
the extraneous `s' in `pack'.  I've kept existing checks against
objects/tmp_*, but added extra checks for objects/pack/tmp_*

v2 fixes:
* `pack' directory fixed, tests added
* !strcmp
* remove needless cleanup step in test

-----8<-----
Subject: [PATCH] prune: recursively prune objects directory

$GIT_DIR/objects/pack may be removed to save inodes in shared
repositories, so avoid scanning it if it does not exist.  Loose
object directories ($GIT_DIR/objects/??) may have old temporary
files, so we now prune those, too.

Recursion is limited to a single level since git doesn't use
deeper levels.  This avoids the risk of stack overflows via
infinite recursion when pruning untrusted repos.

We'll also emit the system error in case a directory cannot be
opened to help users diagnose permissions problems or resource
constraints.

Signed-off-by: Eric Wong <e@80x24.org>
---
Interdiff:
  diff --git a/builtin/prune.c b/builtin/prune.c
  index 0f6a33690a..a05f1a2704 100644
  --- a/builtin/prune.c
  +++ b/builtin/prune.c
  @@ -143,7 +143,7 @@ static void remove_temporary_files(const char *path, int recurse)
   	while ((de = readdir(dir)) != NULL)
   		if (starts_with(de->d_name, "tmp_")) {
   			prune_tmp_file(mkpath("%s/%s", path, de->d_name));
  -		} else if (recurse && (strcmp(de->d_name, "packs") == 0 ||
  +		} else if (recurse && (!strcmp(de->d_name, "pack") ||
   					is_loose_prefix(de->d_name))) {
   			char *s = mkpathdup("%s/%s", path, de->d_name);
   			remove_temporary_files(s, 0);
  diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
  index 8c2278035e..64d5f4e5b3 100755
  --- a/t/t5304-prune.sh
  +++ b/t/t5304-prune.sh
  @@ -41,19 +41,23 @@ test_expect_success 'bare repo prune is quiet without $GIT_DIR/objects/pack' '
   	git clone -q --shared --template= --bare . bare.git &&
   	rmdir bare.git/objects/pack &&
   	git --git-dir=bare.git prune --no-progress 2>prune.err &&
  -	test_must_be_empty prune.err &&
  -	rm -r bare.git prune.err
  +	test_must_be_empty prune.err
   '
   
   test_expect_success 'prune stale packs' '
   	orig_pack=$(echo .git/objects/pack/*.pack) &&
   	>.git/objects/tmp_1.pack &&
   	>.git/objects/tmp_2.pack &&
  -	test-tool chmtime =-86501 .git/objects/tmp_1.pack &&
  +	>.git/objects/pack/tmp_3.pack &&
  +	>.git/objects/pack/tmp_4.pack &&
  +	test-tool chmtime =-86501 .git/objects/tmp_1.pack \
  +		.git/objects/pack/tmp_3.pack &&
   	git prune --expire 1.day &&
   	test_path_is_file $orig_pack &&
   	test_path_is_file .git/objects/tmp_2.pack &&
  -	test_path_is_missing .git/objects/tmp_1.pack
  +	test_path_is_file .git/objects/pack/tmp_4.pack &&
  +	test_path_is_missing .git/objects/tmp_1.pack &&
  +	test_path_is_missing .git/objects/pack/tmp_3.pack
   '
   
   test_expect_success 'prune --expire' '

 builtin/prune.c  | 28 ++++++++++++++++++++--------
 t/t5304-prune.sh | 24 ++++++++++++++++++++++--
 2 files changed, 42 insertions(+), 10 deletions(-)

diff --git a/builtin/prune.c b/builtin/prune.c
index df376b2ed1..a05f1a2704 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -114,25 +114,41 @@ static int prune_subdir(unsigned int nr, const char *path, void *data)
 	return 0;
 }
 
+/*
+ * XXX ishex is duplicated in builtin/name-rev.c, perhaps git-compat-util.h
+ * is a better home for it
+ */
+#define ishex(x) (isdigit((x)) || ((x) >= 'a' && (x) <= 'f'))
+static int is_loose_prefix(const char *d_name)
+{
+	return strlen(d_name) == 2 && ishex(d_name[0]) && ishex(d_name[1]);
+}
+
 /*
  * Write errors (particularly out of space) can result in
  * failed temporary packs (and more rarely indexes and other
  * files beginning with "tmp_") accumulating in the object
  * and the pack directories.
  */
-static void remove_temporary_files(const char *path)
+static void remove_temporary_files(const char *path, int recurse)
 {
 	DIR *dir;
 	struct dirent *de;
 
 	dir = opendir(path);
 	if (!dir) {
-		fprintf(stderr, "Unable to open directory %s\n", path);
+		warning_errno(_("unable to open directory %s"), path);
 		return;
 	}
 	while ((de = readdir(dir)) != NULL)
-		if (starts_with(de->d_name, "tmp_"))
+		if (starts_with(de->d_name, "tmp_")) {
 			prune_tmp_file(mkpath("%s/%s", path, de->d_name));
+		} else if (recurse && (!strcmp(de->d_name, "pack") ||
+					is_loose_prefix(de->d_name))) {
+			char *s = mkpathdup("%s/%s", path, de->d_name);
+			remove_temporary_files(s, 0);
+			free(s);
+		}
 	closedir(dir);
 }
 
@@ -150,7 +166,6 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 			 N_("limit traversal to objects outside promisor packfiles")),
 		OPT_END()
 	};
-	char *s;
 
 	expire = TIME_MAX;
 	save_commit_buffer = 0;
@@ -186,10 +201,7 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 				      prune_cruft, prune_subdir, &revs);
 
 	prune_packed_objects(show_only ? PRUNE_PACKED_DRY_RUN : 0);
-	remove_temporary_files(get_object_directory());
-	s = mkpathdup("%s/pack", get_object_directory());
-	remove_temporary_files(s);
-	free(s);
+	remove_temporary_files(get_object_directory(), 1);
 
 	if (is_repository_shallow(the_repository)) {
 		perform_reachability_traversal(&revs);
diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 8ae314af58..64d5f4e5b3 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -29,15 +29,35 @@ test_expect_success setup '
 	git gc
 '
 
+test_expect_success 'prune stale loose objects' '
+	mkdir .git/objects/aa &&
+	>.git/objects/aa/tmp_foo &&
+	test-tool chmtime =-86501 .git/objects/aa/tmp_foo &&
+	git prune --expire 1.day &&
+	test_path_is_missing .git/objects/aa/tmp_foo
+'
+
+test_expect_success 'bare repo prune is quiet without $GIT_DIR/objects/pack' '
+	git clone -q --shared --template= --bare . bare.git &&
+	rmdir bare.git/objects/pack &&
+	git --git-dir=bare.git prune --no-progress 2>prune.err &&
+	test_must_be_empty prune.err
+'
+
 test_expect_success 'prune stale packs' '
 	orig_pack=$(echo .git/objects/pack/*.pack) &&
 	>.git/objects/tmp_1.pack &&
 	>.git/objects/tmp_2.pack &&
-	test-tool chmtime =-86501 .git/objects/tmp_1.pack &&
+	>.git/objects/pack/tmp_3.pack &&
+	>.git/objects/pack/tmp_4.pack &&
+	test-tool chmtime =-86501 .git/objects/tmp_1.pack \
+		.git/objects/pack/tmp_3.pack &&
 	git prune --expire 1.day &&
 	test_path_is_file $orig_pack &&
 	test_path_is_file .git/objects/tmp_2.pack &&
-	test_path_is_missing .git/objects/tmp_1.pack
+	test_path_is_file .git/objects/pack/tmp_4.pack &&
+	test_path_is_missing .git/objects/tmp_1.pack &&
+	test_path_is_missing .git/objects/pack/tmp_3.pack
 '
 
 test_expect_success 'prune --expire' '

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] prune: recursively prune objects directory
  2022-11-22  9:59             ` Eric Wong
@ 2022-11-22 23:16               ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2022-11-22 23:16 UTC (permalink / raw)
  To: Eric Wong; +Cc: git

Eric Wong <e@80x24.org> writes:

> Junio C Hamano <gitster@pobox.com> wrote:
>> Eric Wong <e@80x24.org> writes:
>> 
>> > I am unsure about duplicating ishex() from name-rev.c, however...
>> 
>> Yeah, I wonder why name-rev.c does not use isxdigit() in the first
>> place.
>
> isxdigit includes uppercase [A-F].  I think being strict is
> better, here.  I don't want to open up a can of worms if we
> become tolerant of 3rd-party git implementations developed on
> case-insensitive FSes.

OK, we do not recurse into .git/objects/AA/ for the same reason why
we do not recurse into .git/objects/info/.  We do expect [0-9a-f]{2}
and pack to be directories, so we go silent if they are missing, but
we do complain if somebody creates a regular file .git/objects/aa
for fun.

I agree that isxdigit() is not a good match.  I also agree with what
you said about it belong to git-compat-util.h but let's leave it for
a future clean-up patch to remove both copies of ishex(), introduce
islxdigit() in git-compat-util.h and use it as its replacement.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-11-22 23:16 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-19 20:12 [PATCH] prune: quiet ENOENT on missing directories Eric Wong
2022-11-21  6:02 ` Junio C Hamano
2022-11-21 10:44   ` Eric Wong
2022-11-21 13:08     ` Junio C Hamano
2022-11-21 23:09       ` Junio C Hamano
2022-11-22  0:09         ` [PATCH] prune: recursively prune objects directory Eric Wong
2022-11-22  1:28           ` Junio C Hamano
2022-11-22  9:59             ` Eric Wong
2022-11-22 23:16               ` Junio C Hamano
2022-11-21 11:16 ` [PATCH] prune: quiet ENOENT on missing directories Ævar Arnfjörð Bjarmason

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).