git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Eric Wong <e@80x24.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: [PATCH] prune: recursively prune objects directory
Date: Tue, 22 Nov 2022 00:09:27 +0000	[thread overview]
Message-ID: <20221122000927.M873500@dcvr> (raw)
In-Reply-To: <xmqqleo3vraj.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> >>  	prune_packed_objects(show_only ? PRUNE_PACKED_DRY_RUN : 0);
> >> -	remove_temporary_files(get_object_directory());
> >>  	s = mkpathdup("%s/pack", get_object_directory());
> >>  	remove_temporary_files(s);
> >>  	free(s);
> >
> > I actually was hinting at making the remove_temporary_files()
> > recurse, so that you do not need the separate invocation in pack/
> > subdirectory.
> >
> > Or make 256 calls for each of the fan-out subdirectory, in which
> > case the ENOENT silencing you did would really matter and shine.
> 
> But of course, neither is any part of this topic.  They are possible
> follow-on works.
> 
> Thanks and sorry for making a confusing statement that could be
> mistaken as "let's do this too", which wasn't what I meant.

Oh, no worries.  I already wrote this earlier and got distracted
with something else while waiting for tests :x.  Anyways, the
below supercedes my original patch and I think it's better in
every way.

I am unsure about duplicating ishex() from name-rev.c, however...

------8<-----
Subject: [PATCH] prune: recursively prune objects directory

$GIT_DIR/objects/pack may be removed to save inodes in shared
repositories, so avoid scanning it if it does not exist.  Loose
object directories ($GIT_DIR/objects/??) may have old temporary
files, so we now prune those, too.

Recursion is limited to a single level since git doesn't use
deeper levels.  This avoids the risk of stack overflows via
infinite recursion when pruning untrusted repos.

We'll also emit the system error in case a directory cannot be
opened to help users diagnose permissions problems or resource
constraints.

Signed-off-by: Eric Wong <e@80x24.org>
---
 builtin/prune.c  | 28 ++++++++++++++++++++--------
 t/t5304-prune.sh | 16 ++++++++++++++++
 2 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/builtin/prune.c b/builtin/prune.c
index df376b2ed1..0f6a33690a 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -114,25 +114,41 @@ static int prune_subdir(unsigned int nr, const char *path, void *data)
 	return 0;
 }
 
+/*
+ * XXX ishex is duplicated in builtin/name-rev.c, perhaps git-compat-util.h
+ * is a better home for it
+ */
+#define ishex(x) (isdigit((x)) || ((x) >= 'a' && (x) <= 'f'))
+static int is_loose_prefix(const char *d_name)
+{
+	return strlen(d_name) == 2 && ishex(d_name[0]) && ishex(d_name[1]);
+}
+
 /*
  * Write errors (particularly out of space) can result in
  * failed temporary packs (and more rarely indexes and other
  * files beginning with "tmp_") accumulating in the object
  * and the pack directories.
  */
-static void remove_temporary_files(const char *path)
+static void remove_temporary_files(const char *path, int recurse)
 {
 	DIR *dir;
 	struct dirent *de;
 
 	dir = opendir(path);
 	if (!dir) {
-		fprintf(stderr, "Unable to open directory %s\n", path);
+		warning_errno(_("unable to open directory %s"), path);
 		return;
 	}
 	while ((de = readdir(dir)) != NULL)
-		if (starts_with(de->d_name, "tmp_"))
+		if (starts_with(de->d_name, "tmp_")) {
 			prune_tmp_file(mkpath("%s/%s", path, de->d_name));
+		} else if (recurse && (strcmp(de->d_name, "packs") == 0 ||
+					is_loose_prefix(de->d_name))) {
+			char *s = mkpathdup("%s/%s", path, de->d_name);
+			remove_temporary_files(s, 0);
+			free(s);
+		}
 	closedir(dir);
 }
 
@@ -150,7 +166,6 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 			 N_("limit traversal to objects outside promisor packfiles")),
 		OPT_END()
 	};
-	char *s;
 
 	expire = TIME_MAX;
 	save_commit_buffer = 0;
@@ -186,10 +201,7 @@ int cmd_prune(int argc, const char **argv, const char *prefix)
 				      prune_cruft, prune_subdir, &revs);
 
 	prune_packed_objects(show_only ? PRUNE_PACKED_DRY_RUN : 0);
-	remove_temporary_files(get_object_directory());
-	s = mkpathdup("%s/pack", get_object_directory());
-	remove_temporary_files(s);
-	free(s);
+	remove_temporary_files(get_object_directory(), 1);
 
 	if (is_repository_shallow(the_repository)) {
 		perform_reachability_traversal(&revs);
diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 8ae314af58..8c2278035e 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -29,6 +29,22 @@ test_expect_success setup '
 	git gc
 '
 
+test_expect_success 'prune stale loose objects' '
+	mkdir .git/objects/aa &&
+	>.git/objects/aa/tmp_foo &&
+	test-tool chmtime =-86501 .git/objects/aa/tmp_foo &&
+	git prune --expire 1.day &&
+	test_path_is_missing .git/objects/aa/tmp_foo
+'
+
+test_expect_success 'bare repo prune is quiet without $GIT_DIR/objects/pack' '
+	git clone -q --shared --template= --bare . bare.git &&
+	rmdir bare.git/objects/pack &&
+	git --git-dir=bare.git prune --no-progress 2>prune.err &&
+	test_must_be_empty prune.err &&
+	rm -r bare.git prune.err
+'
+
 test_expect_success 'prune stale packs' '
 	orig_pack=$(echo .git/objects/pack/*.pack) &&
 	>.git/objects/tmp_1.pack &&

  reply	other threads:[~2022-11-22  0:09 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-19 20:12 [PATCH] prune: quiet ENOENT on missing directories Eric Wong
2022-11-21  6:02 ` Junio C Hamano
2022-11-21 10:44   ` Eric Wong
2022-11-21 13:08     ` Junio C Hamano
2022-11-21 23:09       ` Junio C Hamano
2022-11-22  0:09         ` Eric Wong [this message]
2022-11-22  1:28           ` [PATCH] prune: recursively prune objects directory Junio C Hamano
2022-11-22  9:59             ` Eric Wong
2022-11-22 23:16               ` Junio C Hamano
2022-11-21 11:16 ` [PATCH] prune: quiet ENOENT on missing directories Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221122000927.M873500@dcvr \
    --to=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).