git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/3] repack: don't move existing packs out of the way
@ 2020-11-16 18:41 Taylor Blau
  2020-11-16 18:41 ` [PATCH 1/3] repack: make "exts" array available outside cmd_repack() Taylor Blau
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Taylor Blau @ 2020-11-16 18:41 UTC (permalink / raw)
  To: git; +Cc: gitster, peff

Hi,

Here are a few patches that GitHub has been running to get rid of the
rename-to-old behavior that 'git repack' has when its invocation of
'pack-objects' produced a pack that already exists.

This was developed in the context of a circular dependency involving
writing a multi-pack index during 'git repack' (we have some patches
that do so by adding a '--write-midx' in the repack builtin), but it
should be generally useful.

The idea (which is explained in detail in the final patch) is that prior
to 1190a1acf8 (pack-objects: name pack files after trailer hash,
2013-12-05), 'git repack' had to move existing packs out of the way for
safety, but after 1190a1acf8 no longer needs to do so.

This makes 'git repack' a little simpler since it no longer has to deal
with any failures encountered during this rename-to-old behavior. It
also paves the way for sending the MIDX-during-repack patches.

Thanks,
Taylor

Jeff King (1):
  repack: make "exts" array available outside cmd_repack()

Taylor Blau (2):
  builtin/repack.c: keep track of what pack-objects wrote
  builtin/repack.c: don't move existing packs out of the way

 builtin/repack.c | 153 +++++++++++++++++------------------------------
 1 file changed, 54 insertions(+), 99 deletions(-)

--
2.29.2.312.gabc4d358d8

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] repack: make "exts" array available outside cmd_repack()
  2020-11-16 18:41 [PATCH 0/3] repack: don't move existing packs out of the way Taylor Blau
@ 2020-11-16 18:41 ` Taylor Blau
  2020-11-16 18:41 ` [PATCH 2/3] builtin/repack.c: keep track of what pack-objects wrote Taylor Blau
  2020-11-16 18:41 ` [PATCH 3/3] builtin/repack.c: don't move existing packs out of the way Taylor Blau
  2 siblings, 0 replies; 11+ messages in thread
From: Taylor Blau @ 2020-11-16 18:41 UTC (permalink / raw)
  To: git; +Cc: gitster, peff

From: Jeff King <peff@peff.net>

We'll use it in a helper function soon.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 01e7767c79..03e2c2c44b 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -202,6 +202,16 @@ static int write_oid(const struct object_id *oid, struct packed_git *pack,
 	return 0;
 }
 
+static struct {
+	const char *name;
+	unsigned optional:1;
+} exts[] = {
+	{".pack"},
+	{".idx"},
+	{".bitmap", 1},
+	{".promisor", 1},
+};
+
 static void repack_promisor_objects(const struct pack_objects_args *args,
 				    struct string_list *names)
 {
@@ -265,15 +275,6 @@ static void repack_promisor_objects(const struct pack_objects_args *args,
 
 int cmd_repack(int argc, const char **argv, const char *prefix)
 {
-	struct {
-		const char *name;
-		unsigned optional:1;
-	} exts[] = {
-		{".pack"},
-		{".idx"},
-		{".bitmap", 1},
-		{".promisor", 1},
-	};
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	struct string_list_item *item;
 	struct string_list names = STRING_LIST_INIT_DUP;
-- 
2.29.2.312.gabc4d358d8


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 2/3] builtin/repack.c: keep track of what pack-objects wrote
  2020-11-16 18:41 [PATCH 0/3] repack: don't move existing packs out of the way Taylor Blau
  2020-11-16 18:41 ` [PATCH 1/3] repack: make "exts" array available outside cmd_repack() Taylor Blau
@ 2020-11-16 18:41 ` Taylor Blau
  2020-11-16 18:41 ` [PATCH 3/3] builtin/repack.c: don't move existing packs out of the way Taylor Blau
  2 siblings, 0 replies; 11+ messages in thread
From: Taylor Blau @ 2020-11-16 18:41 UTC (permalink / raw)
  To: git; +Cc: gitster, peff

In the subsequent commit, it will become useful to keep track of which
metadata files were written by pack-objects. We already do this to an
extent with the 'exts' array, which only is used in the context of
existing packs.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 03e2c2c44b..bb839180da 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -212,6 +212,27 @@ static struct {
 	{".promisor", 1},
 };
 
+static unsigned populate_pack_exts(char *name)
+{
+	struct stat statbuf;
+	struct strbuf path = STRBUF_INIT;
+	unsigned ret = 0;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(exts); i++) {
+		strbuf_reset(&path);
+		strbuf_addf(&path, "%s-%s%s", packtmp, name, exts[i].name);
+
+		if (stat(path.buf, &statbuf))
+			continue;
+
+		ret |= (1 << i);
+	}
+
+	strbuf_release(&path);
+	return ret;
+}
+
 static void repack_promisor_objects(const struct pack_objects_args *args,
 				    struct string_list *names)
 {
@@ -240,11 +261,12 @@ static void repack_promisor_objects(const struct pack_objects_args *args,
 
 	out = xfdopen(cmd.out, "r");
 	while (strbuf_getline_lf(&line, out) != EOF) {
+		struct string_list_item *item;
 		char *promisor_name;
 		int fd;
 		if (line.len != the_hash_algo->hexsz)
 			die(_("repack: Expecting full hex object ID lines only from pack-objects."));
-		string_list_append(names, line.buf);
+		item = string_list_append(names, line.buf);
 
 		/*
 		 * pack-objects creates the .pack and .idx files, but not the
@@ -263,6 +285,9 @@ static void repack_promisor_objects(const struct pack_objects_args *args,
 		if (fd < 0)
 			die_errno(_("unable to create '%s'"), promisor_name);
 		close(fd);
+
+		item->util = (void *)(uintptr_t)populate_pack_exts(item->string);
+
 		free(promisor_name);
 	}
 	fclose(out);
@@ -430,6 +455,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (!names.nr && !po_args.quiet)
 		printf_ln(_("Nothing new to pack."));
 
+	for_each_string_list_item(item, &names) {
+		item->util = (void *)(uintptr_t)populate_pack_exts(item->string);
+	}
+
 	close_object_store(the_repository->objects);
 
 	/*
-- 
2.29.2.312.gabc4d358d8


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 3/3] builtin/repack.c: don't move existing packs out of the way
  2020-11-16 18:41 [PATCH 0/3] repack: don't move existing packs out of the way Taylor Blau
  2020-11-16 18:41 ` [PATCH 1/3] repack: make "exts" array available outside cmd_repack() Taylor Blau
  2020-11-16 18:41 ` [PATCH 2/3] builtin/repack.c: keep track of what pack-objects wrote Taylor Blau
@ 2020-11-16 18:41 ` Taylor Blau
  2020-11-16 23:29   ` Junio C Hamano
  2 siblings, 1 reply; 11+ messages in thread
From: Taylor Blau @ 2020-11-16 18:41 UTC (permalink / raw)
  To: git; +Cc: gitster, peff

When 'git repack' creates a pack with the same name as any existing
pack, it moves the existing one to 'old-pack-xxx.{pack,idx,...}' and
then renames the new one into place.

Eventually, it would be nice to have 'git repack' allow for writing a
multi-pack index at the critical time (after the new packs have been
written / moved into place, but before the old ones have been deleted).
Guessing that this option might be called '--write-midx', this makes the
following situation (where repacks are issued back-to-back without any
new objects) impossible:

    $ git repack -adb
    $ git repack -adb --write-midx

In the second repack, the existing packs are overwritten verbatim with
the same rename-to-old sequence. At that point, the current MIDX is
invalidated, since it refers to now-missing packs. So that code wants to
be run after the MIDX is re-written. But (prior to this patch) the new
MIDX can't be written until the new packs are moved into place. So, we
have a circular dependency.

This is all hypothetical, since no code currently exists to write a MIDX
safely during a 'git repack' (the 'GIT_TEST_MULTI_PACK_INDEX' does so
unsafely). Putting hypothetical aside, though: why do we need to rename
existing packs to be prefixed with 'old-' anyway?

This behavior dates all the way back to 2ad47d6 (git-repack: Be
careful when updating the same pack as an existing one., 2006-06-25).
2ad47d6 is mainly concerned about a case where a newly written pack
would have a different structure than its index. This used to be
possible when the pack name was a hash of the set of objects. Under this
naming scheme, two packs that store the same set of objects could differ
in delta selection, object positioning, or both. If this happened, then
any such packs would be unreadable in the instant between copying the
new pack and new index (i.e., either the index or pack will be stale
depending on the order that they were copied).

But since 1190a1a (pack-objects: name pack files after trailer hash,
2013-12-05), this is no longer possible, since pack files are named not
after their logical contents (i.e., the set of objects), but by the
actual checksum of their contents. So, this old- behavior can safely go,
which allows us to avoid our circular dependency above.

In addition to avoiding the circular dependency, this patch also makes
'git repack' a lot simpler, since we don't have to deal with failures
encountered when renaming existing packs to be prefixed with 'old-'.

This patch is mostly limited to removing code paths that deal with the
'old' prefixing, with the exception of pack metadata. t7700.14 ensures
that 'git repack' will, for example, remove existing bitmaps even if the
pack written is verbatim the same (when repacking without '-b' in a
repository unchanged from the time 'git repack -b' was run). So, we have
to handle metadata that we didn't write, by unlinking any existing
metadata that our invocation of pack-objects didn't generate itself.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c | 103 +++++++----------------------------------------
 1 file changed, 14 insertions(+), 89 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index bb839180da..279be11a16 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -306,7 +306,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	struct string_list rollback = STRING_LIST_INIT_NODUP;
 	struct string_list existing_packs = STRING_LIST_INIT_DUP;
 	struct strbuf line = STRBUF_INIT;
-	int i, ext, ret, failed;
+	int i, ext, ret;
 	FILE *out;
 
 	/* variables to be filled by option parsing */
@@ -463,109 +463,34 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 
 	/*
 	 * Ok we have prepared all new packfiles.
-	 * First see if there are packs of the same name and if so
-	 * if we can move them out of the way (this can happen if we
-	 * repacked immediately after packing fully.
 	 */
-	failed = 0;
 	for_each_string_list_item(item, &names) {
 		for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
 			char *fname, *fname_old;
 
-			fname = mkpathdup("%s/pack-%s%s", packdir,
-						item->string, exts[ext].name);
-			if (!file_exists(fname)) {
-				free(fname);
-				continue;
-			}
-
-			fname_old = mkpathdup("%s/old-%s%s", packdir,
-						item->string, exts[ext].name);
-			if (file_exists(fname_old))
-				if (unlink(fname_old))
-					failed = 1;
-
-			if (!failed && rename(fname, fname_old)) {
-				free(fname);
-				free(fname_old);
-				failed = 1;
-				break;
-			} else {
-				string_list_append(&rollback, fname);
-				free(fname_old);
-			}
-		}
-		if (failed)
-			break;
-	}
-	if (failed) {
-		struct string_list rollback_failure = STRING_LIST_INIT_DUP;
-		for_each_string_list_item(item, &rollback) {
-			char *fname, *fname_old;
-			fname = mkpathdup("%s/%s", packdir, item->string);
-			fname_old = mkpathdup("%s/old-%s", packdir, item->string);
-			if (rename(fname_old, fname))
-				string_list_append(&rollback_failure, fname);
-			free(fname);
-			free(fname_old);
-		}
-
-		if (rollback_failure.nr) {
-			int i;
-			fprintf(stderr,
-				_("WARNING: Some packs in use have been renamed by\n"
-				  "WARNING: prefixing old- to their name, in order to\n"
-				  "WARNING: replace them with the new version of the\n"
-				  "WARNING: file.  But the operation failed, and the\n"
-				  "WARNING: attempt to rename them back to their\n"
-				  "WARNING: original names also failed.\n"
-				  "WARNING: Please rename them in %s manually:\n"), packdir);
-			for (i = 0; i < rollback_failure.nr; i++)
-				fprintf(stderr, "WARNING:   old-%s -> %s\n",
-					rollback_failure.items[i].string,
-					rollback_failure.items[i].string);
-		}
-		exit(1);
-	}
-
-	/* Now the ones with the same name are out of the way... */
-	for_each_string_list_item(item, &names) {
-		for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
-			char *fname, *fname_old;
-			struct stat statbuffer;
-			int exists = 0;
 			fname = mkpathdup("%s/pack-%s%s",
 					packdir, item->string, exts[ext].name);
 			fname_old = mkpathdup("%s-%s%s",
 					packtmp, item->string, exts[ext].name);
-			if (!stat(fname_old, &statbuffer)) {
-				statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
-				chmod(fname_old, statbuffer.st_mode);
-				exists = 1;
-			}
-			if (exists || !exts[ext].optional) {
+
+			if (((uintptr_t)item->util) & (1 << ext)) {
+				struct stat statbuffer;
+				if (!stat(fname_old, &statbuffer)) {
+					statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
+					chmod(fname_old, statbuffer.st_mode);
+				}
+
 				if (rename(fname_old, fname))
 					die_errno(_("renaming '%s' failed"), fname_old);
-			}
+			} else if (!exts[ext].optional)
+				die(_("missing required file: %s"), fname_old);
+			else if (unlink(fname) < 0 && errno != ENOENT)
+				die_errno(_("could not unlink: %s"), fname);
+
 			free(fname);
 			free(fname_old);
 		}
 	}
-
-	/* Remove the "old-" files */
-	for_each_string_list_item(item, &names) {
-		for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
-			char *fname;
-			fname = mkpathdup("%s/old-%s%s",
-					  packdir,
-					  item->string,
-					  exts[ext].name);
-			if (remove_path(fname))
-				warning(_("failed to remove '%s'"), fname);
-			free(fname);
-		}
-	}
-
 	/* End of pack replacement. */
 
 	reprepare_packed_git(the_repository);
-- 
2.29.2.312.gabc4d358d8

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] builtin/repack.c: don't move existing packs out of the way
  2020-11-16 18:41 ` [PATCH 3/3] builtin/repack.c: don't move existing packs out of the way Taylor Blau
@ 2020-11-16 23:29   ` Junio C Hamano
  2020-11-17  0:02     ` Jeff King
  2020-11-17  0:25     ` Taylor Blau
  0 siblings, 2 replies; 11+ messages in thread
From: Junio C Hamano @ 2020-11-16 23:29 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff

Taylor Blau <me@ttaylorr.com> writes:

> This behavior dates all the way back to 2ad47d6 (git-repack: Be
> careful when updating the same pack as an existing one., 2006-06-25).
> 2ad47d6 is mainly concerned about a case where a newly written pack
> would have a different structure than its index. This used to be
> possible when the pack name was a hash of the set of objects. Under this
> naming scheme, two packs that store the same set of objects could differ
> in delta selection, object positioning, or both. If this happened, then
> any such packs would be unreadable in the instant between copying the
> new pack and new index (i.e., either the index or pack will be stale
> depending on the order that they were copied).

True.  So the idea is that we can now pretend that we never wrote
the new packfile and leave the existing one as-is?

> This patch is mostly limited to removing code paths that deal with the
> 'old' prefixing, with the exception of pack metadata.

... "pack metadata" meaning?  We do not remove and replace the file,
but we update their mtime to keep these packfiles more fresh than
other packfiles, or something?

> t7700.14 ensures
> that 'git repack' will, for example, remove existing bitmaps even if the
> pack written is verbatim the same (when repacking without '-b' in a
> repository unchanged from the time 'git repack -b' was run). So, we have
> to handle metadata that we didn't write, by unlinking any existing
> metadata that our invocation of pack-objects didn't generate itself.

Hmph, t7700.14 wants it that way because?

If we were told to generate a packfile, and we ended up regenerating
the exactly the same one, it appears to me that we can just pretend
nothing happened and leave things as they were?  Puzzled...

> @@ -463,109 +463,34 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  
>  	/*
>  	 * Ok we have prepared all new packfiles.
>  	 */
>  	for_each_string_list_item(item, &names) {
>  		for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
>  			char *fname, *fname_old;
>  
>  			fname = mkpathdup("%s/pack-%s%s",
>  					packdir, item->string, exts[ext].name);
>  			fname_old = mkpathdup("%s-%s%s",
>  					packtmp, item->string, exts[ext].name);
> +
> +			if (((uintptr_t)item->util) & (1 << ext)) {
> +				struct stat statbuffer;
> +				if (!stat(fname_old, &statbuffer)) {
> +					statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
> +					chmod(fname_old, statbuffer.st_mode);
> +				}
> +
>  				if (rename(fname_old, fname))
>  					die_errno(_("renaming '%s' failed"), fname_old);

OK, so this is where the previous step matters.  We do the same as
before (i.e. stat+chmod and rename) only for paths we have created.

We don't reuse the old one because we have already written a new
file so we won't save anything by doing so, and checking if the
target of rename exists already before deciding not to rename cannot
be made atomic, so just relying on rename() to do the right thing is
a good idea anyway.

> +			} else if (!exts[ext].optional)
> +				die(_("missing required file: %s"), fname_old);

If one branch of if/else if.../else requires multi-statement block,
use {} on all of them, for consistency.

So, if we wrote a few .$ext for a packfile but not some .$ext, if
the one we didn't write is among the necessary one, we are in
trouble?  OK.

> +			else if (unlink(fname) < 0 && errno != ENOENT)
> +				die_errno(_("could not unlink: %s"), fname);

And if we wrote .pack and .idx but not .bitmap, the old .bitmpa that
has the same pack hash may be stale and we discard it for safety?
That sounds "prudent" but it is not immdiately clear from what
danger we are protecting ourselves.

In any case, much of what I speculated while reading the proposed
log message turned out to be false, which may be a sign that the log
message did not explain the approach clearly enough.  I thought that
a newly created file that happened to be identical to existing ones
would be discarded without getting renamed to their final location,
but the code does not do such special casing.  I thought the
'metadata' it talks about were to compensate for side effects of
reusing the old files, but that was not what the 'metadata' was even
about.

Other than that, the change in [2/3] and [3/3] look quite sensible
(I am not saying [1/3] is bad---I haven't looked at it yet).

Thanks.

>  			free(fname);
>  			free(fname_old);
>  		}
>  	}
>  	/* End of pack replacement. */
>  
>  	reprepare_packed_git(the_repository);

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] builtin/repack.c: don't move existing packs out of the way
  2020-11-16 23:29   ` Junio C Hamano
@ 2020-11-17  0:02     ` Jeff King
  2020-11-17  0:26       ` Taylor Blau
  2020-11-17  0:25     ` Taylor Blau
  1 sibling, 1 reply; 11+ messages in thread
From: Jeff King @ 2020-11-17  0:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git

On Mon, Nov 16, 2020 at 03:29:05PM -0800, Junio C Hamano wrote:

> > t7700.14 ensures
> > that 'git repack' will, for example, remove existing bitmaps even if the
> > pack written is verbatim the same (when repacking without '-b' in a
> > repository unchanged from the time 'git repack -b' was run). So, we have
> > to handle metadata that we didn't write, by unlinking any existing
> > metadata that our invocation of pack-objects didn't generate itself.
> 
> Hmph, t7700.14 wants it that way because?
> 
> If we were told to generate a packfile, and we ended up regenerating
> the exactly the same one, it appears to me that we can just pretend
> nothing happened and leave things as they were?  Puzzled...

We definitely could, and the outcome would be correct in the sense that
the bitmap would still work. I'm not sure that the test in t7700 cares
particularly about this "making the same pack" case. It only wants to
know that if we disable bitmaps and repack, we end up without a bitmap.
And normally, if you added in even a single extra object, you'd get a
new pack and that would happen.

But of course the test just repacks back-to-back, so it does trigger
the "making the same pack" case.

I think you could make an argument either way:

  - we have an existing bitmap for free, and bitmaps make things faster,
    so why not keep it?

  - the user did not ask for bitmaps, so we should make the outcome
    consistent whether a pack of the exact name existed before or not

The second one seems less surprising to me. But I think if we did the
first, the code would be shorter (it would not need any of this "keep
track of what pack-objects generated" stuff at all, but would just copy
whatever files we see into place).

> >  				if (rename(fname_old, fname))
> >  					die_errno(_("renaming '%s' failed"), fname_old);
> 
> OK, so this is where the previous step matters.  We do the same as
> before (i.e. stat+chmod and rename) only for paths we have created.
> 
> We don't reuse the old one because we have already written a new
> file so we won't save anything by doing so, and checking if the
> target of rename exists already before deciding not to rename cannot
> be made atomic, so just relying on rename() to do the right thing is
> a good idea anyway.

Even though the pack is byte-for-byte identical, the new .idx, etc,
might not be. And those could be affected by options. E.g.:

  git -c pack.writeBitmapHashCache=false repack -adb
  git -c pack.writeBitmapHashCache=true  repack -adb

should probably end up with a bitmap file that contains a hash cache.

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] builtin/repack.c: don't move existing packs out of the way
  2020-11-16 23:29   ` Junio C Hamano
  2020-11-17  0:02     ` Jeff King
@ 2020-11-17  0:25     ` Taylor Blau
  2020-11-17  0:46       ` Junio C Hamano
  1 sibling, 1 reply; 11+ messages in thread
From: Taylor Blau @ 2020-11-17  0:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, peff

On Mon, Nov 16, 2020 at 03:29:05PM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > This behavior dates all the way back to 2ad47d6 (git-repack: Be
> > careful when updating the same pack as an existing one., 2006-06-25).
> > 2ad47d6 is mainly concerned about a case where a newly written pack
> > would have a different structure than its index. This used to be
> > possible when the pack name was a hash of the set of objects. Under this
> > naming scheme, two packs that store the same set of objects could differ
> > in delta selection, object positioning, or both. If this happened, then
> > any such packs would be unreadable in the instant between copying the
> > new pack and new index (i.e., either the index or pack will be stale
> > depending on the order that they were copied).
>
> True.  So the idea is that we can now pretend that we never wrote
> the new packfile and leave the existing one as-is?

True for packs, which are guaranteed to be unchanged by having the same
checksum in their filename. For other files that use their pack's
checksum in their filename, we want to overwrite the existing copy with
the one we just wrote.

> > This patch is mostly limited to removing code paths that deal with the
> > 'old' prefixing, with the exception of pack metadata.
>
> ... "pack metadata" meaning?  We do not remove and replace the file,
> but we update their mtime to keep these packfiles more fresh than
> other packfiles, or something?

Meaning: .idx, .bitmap, .promisor files. I probably ought to be more
clear in what "metadata" means, since it could easily be confused with
mtime and others.

> > @@ -463,109 +463,34 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
> >
> >  	/*
> >  	 * Ok we have prepared all new packfiles.
> >  	 */
> >  	for_each_string_list_item(item, &names) {
> >  		for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
> >  			char *fname, *fname_old;
> >
> >  			fname = mkpathdup("%s/pack-%s%s",
> >  					packdir, item->string, exts[ext].name);
> >  			fname_old = mkpathdup("%s-%s%s",
> >  					packtmp, item->string, exts[ext].name);
> > +
> > +			if (((uintptr_t)item->util) & (1 << ext)) {
> > +				struct stat statbuffer;
> > +				if (!stat(fname_old, &statbuffer)) {
> > +					statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
> > +					chmod(fname_old, statbuffer.st_mode);
> > +				}
> > +
> >  				if (rename(fname_old, fname))
> >  					die_errno(_("renaming '%s' failed"), fname_old);
>
> OK, so this is where the previous step matters.  We do the same as
> before (i.e. stat+chmod and rename) only for paths we have created.
>
> We don't reuse the old one because we have already written a new
> file so we won't save anything by doing so, and checking if the
> target of rename exists already before deciding not to rename cannot
> be made atomic, so just relying on rename() to do the right thing is
> a good idea anyway.
>
> > +			} else if (!exts[ext].optional)
> > +				die(_("missing required file: %s"), fname_old);
>
> If one branch of if/else if.../else requires multi-statement block,
> use {} on all of them, for consistency.

Thanks.

> So, if we wrote a few .$ext for a packfile but not some .$ext, if
> the one we didn't write is among the necessary one, we are in
> trouble?  OK.

Yes, exactly.

> > +			else if (unlink(fname) < 0 && errno != ENOENT)
> > +				die_errno(_("could not unlink: %s"), fname);
>
> And if we wrote .pack and .idx but not .bitmap, the old .bitmpa that
> has the same pack hash may be stale and we discard it for safety?
> That sounds "prudent" but it is not immdiately clear from what
> danger we are protecting ourselves.
>
> In any case, much of what I speculated while reading the proposed
> log message turned out to be false, which may be a sign that the log
> message did not explain the approach clearly enough.  I thought that
> a newly created file that happened to be identical to existing ones
> would be discarded without getting renamed to their final location,
> but the code does not do such special casing.  I thought the
> 'metadata' it talks about were to compensate for side effects of
> reusing the old files, but that was not what the 'metadata' was even
> about.

It's more about: we trust what pack-objects wrote to be the current
state of things. So, if a bitmap already exists, but the caller did a
back-to-back repack and this time generates their bitmap with the
hash cache extension, we prefer the newer bitmap to the one that already
existed.

That goes both ways: if pack-objects _didn't_ write a file that already
exists, we want to drop the existing file. Peff seems to talk more about
this in his email, which I'll respond to now...

> Other than that, the change in [2/3] and [3/3] look quite sensible
> (I am not saying [1/3] is bad---I haven't looked at it yet).
>
> Thanks.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] builtin/repack.c: don't move existing packs out of the way
  2020-11-17  0:02     ` Jeff King
@ 2020-11-17  0:26       ` Taylor Blau
  0 siblings, 0 replies; 11+ messages in thread
From: Taylor Blau @ 2020-11-17  0:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Taylor Blau, git

On Mon, Nov 16, 2020 at 07:02:52PM -0500, Jeff King wrote:
> I think you could make an argument either way:
>
>   - we have an existing bitmap for free, and bitmaps make things faster,
>     so why not keep it?
>
>   - the user did not ask for bitmaps, so we should make the outcome
>     consistent whether a pack of the exact name existed before or not
>
> The second one seems less surprising to me. But I think if we did the
> first, the code would be shorter (it would not need any of this "keep
> track of what pack-objects generated" stuff at all, but would just copy
> whatever files we see into place).

I think that the second one is _far_ less surprising to me, so I'd
prefer that we did that instead of keeping the .bitmap from the last run
regardless of whether or not the user asked for it.

> > >  				if (rename(fname_old, fname))
> > >  					die_errno(_("renaming '%s' failed"), fname_old);
> >
> > OK, so this is where the previous step matters.  We do the same as
> > before (i.e. stat+chmod and rename) only for paths we have created.
> >
> > We don't reuse the old one because we have already written a new
> > file so we won't save anything by doing so, and checking if the
> > target of rename exists already before deciding not to rename cannot
> > be made atomic, so just relying on rename() to do the right thing is
> > a good idea anyway.
>
> Even though the pack is byte-for-byte identical, the new .idx, etc,
> might not be. And those could be affected by options. E.g.:
>
>   git -c pack.writeBitmapHashCache=false repack -adb
>   git -c pack.writeBitmapHashCache=true  repack -adb
>
> should probably end up with a bitmap file that contains a hash cache.

Agreed completely.

> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] builtin/repack.c: don't move existing packs out of the way
  2020-11-17  0:25     ` Taylor Blau
@ 2020-11-17  0:46       ` Junio C Hamano
  2020-11-17 20:15         ` Taylor Blau
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2020-11-17  0:46 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff

Taylor Blau <me@ttaylorr.com> writes:

>> In any case, much of what I speculated while reading the proposed
>> log message turned out to be false, which may be a sign that the log
>> message did not explain the approach clearly enough.  I thought that
>> a newly created file that happened to be identical to existing ones
>> would be discarded without getting renamed to their final location,
>> but the code does not do such special casing.  I thought the
>> 'metadata' it talks about were to compensate for side effects of
>> reusing the old files, but that was not what the 'metadata' was even
>> about.
>
> It's more about: ...

You do not have to explain that to me here.  Instead explain that to
future readers of our history in the commit log message.

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] builtin/repack.c: don't move existing packs out of the way
  2020-11-17  0:46       ` Junio C Hamano
@ 2020-11-17 20:15         ` Taylor Blau
  2020-11-17 21:28           ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Taylor Blau @ 2020-11-17 20:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff

On Mon, Nov 16, 2020 at 04:46:06PM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> > It's more about: ...
>
> You do not have to explain that to me here.  Instead explain that to
> future readers of our history in the commit log message.
>
> Thanks.

Understood. Here's a replacement for the final patch (the log message is
updated, but the patch contents are not):

--- >8 ---

Subject: [PATCH] builtin/repack.c: don't move existing packs out of the way

When 'git repack' creates a pack with the same name as any existing
pack, it moves the existing one to 'old-pack-xxx.{pack,idx,...}' and
then renames the new one into place.

Eventually, it would be nice to have 'git repack' allow for writing a
multi-pack index at the critical time (after the new packs have been
written / moved into place, but before the old ones have been deleted).
Guessing that this option might be called '--write-midx', this makes the
following situation (where repacks are issued back-to-back without any
new objects) impossible:

    $ git repack -adb
    $ git repack -adb --write-midx

In the second repack, the existing packs are overwritten verbatim with
the same rename-to-old sequence. At that point, the current MIDX is
invalidated, since it refers to now-missing packs. So that code wants to
be run after the MIDX is re-written. But (prior to this patch) the new
MIDX can't be written until the new packs are moved into place. So, we
have a circular dependency.

This is all hypothetical, since no code currently exists to write a MIDX
safely during a 'git repack' (the 'GIT_TEST_MULTI_PACK_INDEX' does so
unsafely). Putting hypothetical aside, though: why do we need to rename
existing packs to be prefixed with 'old-' anyway?

This behavior dates all the way back to 2ad47d6 (git-repack: Be
careful when updating the same pack as an existing one., 2006-06-25).
2ad47d6 is mainly concerned about a case where a newly written pack
would have a different structure than its index. This used to be
possible when the pack name was a hash of the set of objects. Under this
naming scheme, two packs that store the same set of objects could differ
in delta selection, object positioning, or both. If this happened, then
any such packs would be unreadable in the instant between copying the
new pack and new index (i.e., either the index or pack will be stale
depending on the order that they were copied).

But since 1190a1a (pack-objects: name pack files after trailer hash,
2013-12-05), this is no longer possible, since pack files are named not
after their logical contents (i.e., the set of objects), but by the
actual checksum of their contents. So, this old- behavior can safely go,
which allows us to avoid our circular dependency above.

In addition to avoiding the circular dependency, this patch also makes
'git repack' a lot simpler, since we don't have to deal with failures
encountered when renaming existing packs to be prefixed with 'old-'.

This patch is mostly limited to removing code paths that deal with the
'old' prefixing, with the exception of files that include the pack's
name in their own filename, like .idx, .bitmap, and related files. The
exception is that we want to continue to trust what pack-objects wrote.
That is, it is not the case that we pretend as if pack-objects didn't
write files identical to ones that already exist, but rather that we
respect what pack-objects wrote as the source of truth. That cuts two
ways:

  - If pack-objects produced an identical pack to one that already
    exists with a bitmap, but did not produce a bitmap, we remove the
    bitmap that already exists. (This behavior is codified in t7700.14).

  - If pack-objects produced an identical pack to one that already
    exists, we trust the just-written version of the coresponding .idx,
    .promisor, and other files over the ones that already exist. This
    ensures that we use the most up-to-date versions of this files,
    which is safe even in the face of format changes in, say, the .idx
    file (which would not be reflected in the .idx file's name).

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c | 103 +++++++----------------------------------------
 1 file changed, 14 insertions(+), 89 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index bb839180da..279be11a16 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -306,7 +306,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	struct string_list rollback = STRING_LIST_INIT_NODUP;
 	struct string_list existing_packs = STRING_LIST_INIT_DUP;
 	struct strbuf line = STRBUF_INIT;
-	int i, ext, ret, failed;
+	int i, ext, ret;
 	FILE *out;

 	/* variables to be filled by option parsing */
@@ -463,109 +463,34 @@ int cmd_repack(int argc, const char **argv, const char *prefix)

 	/*
 	 * Ok we have prepared all new packfiles.
-	 * First see if there are packs of the same name and if so
-	 * if we can move them out of the way (this can happen if we
-	 * repacked immediately after packing fully.
 	 */
-	failed = 0;
 	for_each_string_list_item(item, &names) {
 		for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
 			char *fname, *fname_old;

-			fname = mkpathdup("%s/pack-%s%s", packdir,
-						item->string, exts[ext].name);
-			if (!file_exists(fname)) {
-				free(fname);
-				continue;
-			}
-
-			fname_old = mkpathdup("%s/old-%s%s", packdir,
-						item->string, exts[ext].name);
-			if (file_exists(fname_old))
-				if (unlink(fname_old))
-					failed = 1;
-
-			if (!failed && rename(fname, fname_old)) {
-				free(fname);
-				free(fname_old);
-				failed = 1;
-				break;
-			} else {
-				string_list_append(&rollback, fname);
-				free(fname_old);
-			}
-		}
-		if (failed)
-			break;
-	}
-	if (failed) {
-		struct string_list rollback_failure = STRING_LIST_INIT_DUP;
-		for_each_string_list_item(item, &rollback) {
-			char *fname, *fname_old;
-			fname = mkpathdup("%s/%s", packdir, item->string);
-			fname_old = mkpathdup("%s/old-%s", packdir, item->string);
-			if (rename(fname_old, fname))
-				string_list_append(&rollback_failure, fname);
-			free(fname);
-			free(fname_old);
-		}
-
-		if (rollback_failure.nr) {
-			int i;
-			fprintf(stderr,
-				_("WARNING: Some packs in use have been renamed by\n"
-				  "WARNING: prefixing old- to their name, in order to\n"
-				  "WARNING: replace them with the new version of the\n"
-				  "WARNING: file.  But the operation failed, and the\n"
-				  "WARNING: attempt to rename them back to their\n"
-				  "WARNING: original names also failed.\n"
-				  "WARNING: Please rename them in %s manually:\n"), packdir);
-			for (i = 0; i < rollback_failure.nr; i++)
-				fprintf(stderr, "WARNING:   old-%s -> %s\n",
-					rollback_failure.items[i].string,
-					rollback_failure.items[i].string);
-		}
-		exit(1);
-	}
-
-	/* Now the ones with the same name are out of the way... */
-	for_each_string_list_item(item, &names) {
-		for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
-			char *fname, *fname_old;
-			struct stat statbuffer;
-			int exists = 0;
 			fname = mkpathdup("%s/pack-%s%s",
 					packdir, item->string, exts[ext].name);
 			fname_old = mkpathdup("%s-%s%s",
 					packtmp, item->string, exts[ext].name);
-			if (!stat(fname_old, &statbuffer)) {
-				statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
-				chmod(fname_old, statbuffer.st_mode);
-				exists = 1;
-			}
-			if (exists || !exts[ext].optional) {
+
+			if (((uintptr_t)item->util) & (1 << ext)) {
+				struct stat statbuffer;
+				if (!stat(fname_old, &statbuffer)) {
+					statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
+					chmod(fname_old, statbuffer.st_mode);
+				}
+
 				if (rename(fname_old, fname))
 					die_errno(_("renaming '%s' failed"), fname_old);
-			}
+			} else if (!exts[ext].optional)
+				die(_("missing required file: %s"), fname_old);
+			else if (unlink(fname) < 0 && errno != ENOENT)
+				die_errno(_("could not unlink: %s"), fname);
+
 			free(fname);
 			free(fname_old);
 		}
 	}
-
-	/* Remove the "old-" files */
-	for_each_string_list_item(item, &names) {
-		for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
-			char *fname;
-			fname = mkpathdup("%s/old-%s%s",
-					  packdir,
-					  item->string,
-					  exts[ext].name);
-			if (remove_path(fname))
-				warning(_("failed to remove '%s'"), fname);
-			free(fname);
-		}
-	}
-
 	/* End of pack replacement. */

 	reprepare_packed_git(the_repository);
--
2.29.2.312.gabc4d358d8


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] builtin/repack.c: don't move existing packs out of the way
  2020-11-17 20:15         ` Taylor Blau
@ 2020-11-17 21:28           ` Junio C Hamano
  0 siblings, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2020-11-17 21:28 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff

Taylor Blau <me@ttaylorr.com> writes:

> That is, it is not the case that we pretend as if pack-objects didn't
> write files identical to ones that already exist, but rather that we
> respect what pack-objects wrote as the source of truth. That cuts two
> ways:
>
>   - If pack-objects produced an identical pack to one that already
>     exists with a bitmap, but did not produce a bitmap, we remove the
>     bitmap that already exists. (This behavior is codified in t7700.14).
>
>   - If pack-objects produced an identical pack to one that already
>     exists, we trust the just-written version of the coresponding .idx,
>     .promisor, and other files over the ones that already exist. This
>     ensures that we use the most up-to-date versions of this files,
>     which is safe even in the face of format changes in, say, the .idx
>     file (which would not be reflected in the .idx file's name).

Very clearly written.  I see no room for confusion, unlike the
original one.

Thanks, will replace.

>
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  builtin/repack.c | 103 +++++++----------------------------------------
>  1 file changed, 14 insertions(+), 89 deletions(-)
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index bb839180da..279be11a16 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -306,7 +306,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	struct string_list rollback = STRING_LIST_INIT_NODUP;
>  	struct string_list existing_packs = STRING_LIST_INIT_DUP;
>  	struct strbuf line = STRBUF_INIT;
> -	int i, ext, ret, failed;
> +	int i, ext, ret;
>  	FILE *out;
>
>  	/* variables to be filled by option parsing */
> @@ -463,109 +463,34 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>
>  	/*
>  	 * Ok we have prepared all new packfiles.
> -	 * First see if there are packs of the same name and if so
> -	 * if we can move them out of the way (this can happen if we
> -	 * repacked immediately after packing fully.
>  	 */
> -	failed = 0;
>  	for_each_string_list_item(item, &names) {
>  		for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
>  			char *fname, *fname_old;
>
> -			fname = mkpathdup("%s/pack-%s%s", packdir,
> -						item->string, exts[ext].name);
> -			if (!file_exists(fname)) {
> -				free(fname);
> -				continue;
> -			}
> -
> -			fname_old = mkpathdup("%s/old-%s%s", packdir,
> -						item->string, exts[ext].name);
> -			if (file_exists(fname_old))
> -				if (unlink(fname_old))
> -					failed = 1;
> -
> -			if (!failed && rename(fname, fname_old)) {
> -				free(fname);
> -				free(fname_old);
> -				failed = 1;
> -				break;
> -			} else {
> -				string_list_append(&rollback, fname);
> -				free(fname_old);
> -			}
> -		}
> -		if (failed)
> -			break;
> -	}
> -	if (failed) {
> -		struct string_list rollback_failure = STRING_LIST_INIT_DUP;
> -		for_each_string_list_item(item, &rollback) {
> -			char *fname, *fname_old;
> -			fname = mkpathdup("%s/%s", packdir, item->string);
> -			fname_old = mkpathdup("%s/old-%s", packdir, item->string);
> -			if (rename(fname_old, fname))
> -				string_list_append(&rollback_failure, fname);
> -			free(fname);
> -			free(fname_old);
> -		}
> -
> -		if (rollback_failure.nr) {
> -			int i;
> -			fprintf(stderr,
> -				_("WARNING: Some packs in use have been renamed by\n"
> -				  "WARNING: prefixing old- to their name, in order to\n"
> -				  "WARNING: replace them with the new version of the\n"
> -				  "WARNING: file.  But the operation failed, and the\n"
> -				  "WARNING: attempt to rename them back to their\n"
> -				  "WARNING: original names also failed.\n"
> -				  "WARNING: Please rename them in %s manually:\n"), packdir);
> -			for (i = 0; i < rollback_failure.nr; i++)
> -				fprintf(stderr, "WARNING:   old-%s -> %s\n",
> -					rollback_failure.items[i].string,
> -					rollback_failure.items[i].string);
> -		}
> -		exit(1);
> -	}
> -
> -	/* Now the ones with the same name are out of the way... */
> -	for_each_string_list_item(item, &names) {
> -		for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
> -			char *fname, *fname_old;
> -			struct stat statbuffer;
> -			int exists = 0;
>  			fname = mkpathdup("%s/pack-%s%s",
>  					packdir, item->string, exts[ext].name);
>  			fname_old = mkpathdup("%s-%s%s",
>  					packtmp, item->string, exts[ext].name);
> -			if (!stat(fname_old, &statbuffer)) {
> -				statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
> -				chmod(fname_old, statbuffer.st_mode);
> -				exists = 1;
> -			}
> -			if (exists || !exts[ext].optional) {
> +
> +			if (((uintptr_t)item->util) & (1 << ext)) {
> +				struct stat statbuffer;
> +				if (!stat(fname_old, &statbuffer)) {
> +					statbuffer.st_mode &= ~(S_IWUSR | S_IWGRP | S_IWOTH);
> +					chmod(fname_old, statbuffer.st_mode);
> +				}
> +
>  				if (rename(fname_old, fname))
>  					die_errno(_("renaming '%s' failed"), fname_old);
> -			}
> +			} else if (!exts[ext].optional)
> +				die(_("missing required file: %s"), fname_old);
> +			else if (unlink(fname) < 0 && errno != ENOENT)
> +				die_errno(_("could not unlink: %s"), fname);
> +
>  			free(fname);
>  			free(fname_old);
>  		}
>  	}
> -
> -	/* Remove the "old-" files */
> -	for_each_string_list_item(item, &names) {
> -		for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
> -			char *fname;
> -			fname = mkpathdup("%s/old-%s%s",
> -					  packdir,
> -					  item->string,
> -					  exts[ext].name);
> -			if (remove_path(fname))
> -				warning(_("failed to remove '%s'"), fname);
> -			free(fname);
> -		}
> -	}
> -
>  	/* End of pack replacement. */
>
>  	reprepare_packed_git(the_repository);
> --
> 2.29.2.312.gabc4d358d8

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-11-17 21:30 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-16 18:41 [PATCH 0/3] repack: don't move existing packs out of the way Taylor Blau
2020-11-16 18:41 ` [PATCH 1/3] repack: make "exts" array available outside cmd_repack() Taylor Blau
2020-11-16 18:41 ` [PATCH 2/3] builtin/repack.c: keep track of what pack-objects wrote Taylor Blau
2020-11-16 18:41 ` [PATCH 3/3] builtin/repack.c: don't move existing packs out of the way Taylor Blau
2020-11-16 23:29   ` Junio C Hamano
2020-11-17  0:02     ` Jeff King
2020-11-17  0:26       ` Taylor Blau
2020-11-17  0:25     ` Taylor Blau
2020-11-17  0:46       ` Junio C Hamano
2020-11-17 20:15         ` Taylor Blau
2020-11-17 21:28           ` Junio C Hamano

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).