git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v2 0/2] Avoid rewriting "packed-refs" unnecessarily
@ 2017-10-28  9:16 Michael Haggerty
  2017-10-28  9:16 ` [PATCH v2 1/2] t1409: check that `packed-refs` is not rewritten unnecessarily Michael Haggerty
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Michael Haggerty @ 2017-10-28  9:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Eric Sunshine, git, Michael Haggerty

This reroll make some logically small changes to v1 [1] that are
textually very big:

* Invert the sense of `is_packed_transaction_noop()` and rename it to
  `is_packed_transaction_needed()`. This makes the logic easier to
  follow and document.

* Add a big comment to that function, describing the cases when it
  returns false positives and explaining why that isn't a problem.

* In the commit message for patch 02, gives a lot more information
  about the regression that it is fixing. Thanks to Eric for the
  suggestion.

These patches are also available as branch
`avoid-rewriting-packed-refs` on my GitHub fork [2]. They now use
`mh/packed-ref-transactions` as the base, since that is where Junio
chose to apply v1.

Michael

[1] https://public-inbox.org/git/cover.1508924577.git.mhagger@alum.mit.edu/
[2] https://github.com/mhagger/git

Michael Haggerty (2):
  t1409: check that `packed-refs` is not rewritten unnecessarily
  files-backend: don't rewrite the `packed-refs` file unnecessarily

 refs/files-backend.c          |  18 ++++++-
 refs/packed-backend.c         |  94 +++++++++++++++++++++++++++++++++
 refs/packed-backend.h         |   9 ++++
 t/t1409-avoid-packing-refs.sh | 118 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 238 insertions(+), 1 deletion(-)
 create mode 100755 t/t1409-avoid-packing-refs.sh

-- 
2.14.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2 1/2] t1409: check that `packed-refs` is not rewritten unnecessarily
  2017-10-28  9:16 [PATCH v2 0/2] Avoid rewriting "packed-refs" unnecessarily Michael Haggerty
@ 2017-10-28  9:16 ` Michael Haggerty
  2017-10-28  9:16 ` [PATCH v2 2/2] files-backend: don't rewrite the `packed-refs` file unnecessarily Michael Haggerty
  2017-11-01  7:34 ` [PATCH v2 0/2] Avoid rewriting "packed-refs" unnecessarily Jeff King
  2 siblings, 0 replies; 5+ messages in thread
From: Michael Haggerty @ 2017-10-28  9:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Eric Sunshine, git, Michael Haggerty

There is no need to rewrite the `packed-refs` file except for the case
that we are deleting a reference that has a packed version. Verify
that `packed-refs` is not rewritten when it shouldn't be.

In fact, two of these tests fail:

* A new (empty) `packed-refs` file is created when deleting any loose
  reference and no `packed-refs` file previously existed.

* The `packed-refs` file is rewritten unnecessarily when deleting a
  loose reference that has no packed counterpart.

Both problems will be fixed in the next commit.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 t/t1409-avoid-packing-refs.sh | 118 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)
 create mode 100755 t/t1409-avoid-packing-refs.sh

diff --git a/t/t1409-avoid-packing-refs.sh b/t/t1409-avoid-packing-refs.sh
new file mode 100755
index 0000000000..a2397c7b71
--- /dev/null
+++ b/t/t1409-avoid-packing-refs.sh
@@ -0,0 +1,118 @@
+#!/bin/sh
+
+test_description='avoid rewriting packed-refs unnecessarily'
+
+. ./test-lib.sh
+
+# Add an identifying mark to the packed-refs file header line. This
+# shouldn't upset readers, and it should be omitted if the file is
+# ever rewritten.
+mark_packed_refs () {
+	sed -e "s/^\(#.*\)/\1 t1409 /" <.git/packed-refs >.git/packed-refs.new &&
+	mv .git/packed-refs.new .git/packed-refs
+}
+
+# Verify that the packed-refs file is still marked.
+check_packed_refs_marked () {
+	grep -q '^#.* t1409 ' .git/packed-refs
+}
+
+test_expect_success 'setup' '
+	git commit --allow-empty -m "Commit A" &&
+	A=$(git rev-parse HEAD) &&
+	git commit --allow-empty -m "Commit B" &&
+	B=$(git rev-parse HEAD) &&
+	git commit --allow-empty -m "Commit C" &&
+	C=$(git rev-parse HEAD)
+'
+
+test_expect_failure 'do not create packed-refs file gratuitously' '
+	test_must_fail test -f .git/packed-refs &&
+	git update-ref refs/heads/foo $A &&
+	test_must_fail test -f .git/packed-refs &&
+	git update-ref refs/heads/foo $B &&
+	test_must_fail test -f .git/packed-refs &&
+	git update-ref refs/heads/foo $C $B &&
+	test_must_fail test -f .git/packed-refs &&
+	git update-ref -d refs/heads/foo &&
+	test_must_fail test -f .git/packed-refs
+'
+
+test_expect_success 'check that marking the packed-refs file works' '
+	git for-each-ref >expected &&
+	git pack-refs --all &&
+	mark_packed_refs &&
+	check_packed_refs_marked &&
+	git for-each-ref >actual &&
+	test_cmp expected actual &&
+	git pack-refs --all &&
+	test_must_fail check_packed_refs_marked &&
+	git for-each-ref >actual2 &&
+	test_cmp expected actual2
+'
+
+test_expect_success 'leave packed-refs untouched on update of packed' '
+	git update-ref refs/heads/packed-update $A &&
+	git pack-refs --all &&
+	mark_packed_refs &&
+	git update-ref refs/heads/packed-update $B &&
+	check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on checked update of packed' '
+	git update-ref refs/heads/packed-checked-update $A &&
+	git pack-refs --all &&
+	mark_packed_refs &&
+	git update-ref refs/heads/packed-checked-update $B $A &&
+	check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on verify of packed' '
+	git update-ref refs/heads/packed-verify $A &&
+	git pack-refs --all &&
+	mark_packed_refs &&
+	echo "verify refs/heads/packed-verify $A" | git update-ref --stdin &&
+	check_packed_refs_marked
+'
+
+test_expect_success 'touch packed-refs on delete of packed' '
+	git update-ref refs/heads/packed-delete $A &&
+	git pack-refs --all &&
+	mark_packed_refs &&
+	git update-ref -d refs/heads/packed-delete &&
+	test_must_fail check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on update of loose' '
+	git pack-refs --all &&
+	git update-ref refs/heads/loose-update $A &&
+	mark_packed_refs &&
+	git update-ref refs/heads/loose-update $B &&
+	check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on checked update of loose' '
+	git pack-refs --all &&
+	git update-ref refs/heads/loose-checked-update $A &&
+	mark_packed_refs &&
+	git update-ref refs/heads/loose-checked-update $B $A &&
+	check_packed_refs_marked
+'
+
+test_expect_success 'leave packed-refs untouched on verify of loose' '
+	git pack-refs --all &&
+	git update-ref refs/heads/loose-verify $A &&
+	mark_packed_refs &&
+	echo "verify refs/heads/loose-verify $A" | git update-ref --stdin &&
+	check_packed_refs_marked
+'
+
+test_expect_failure 'leave packed-refs untouched on delete of loose' '
+	git pack-refs --all &&
+	git update-ref refs/heads/loose-delete $A &&
+	mark_packed_refs &&
+	git update-ref -d refs/heads/loose-delete &&
+	check_packed_refs_marked
+'
+
+test_done
-- 
2.14.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 2/2] files-backend: don't rewrite the `packed-refs` file unnecessarily
  2017-10-28  9:16 [PATCH v2 0/2] Avoid rewriting "packed-refs" unnecessarily Michael Haggerty
  2017-10-28  9:16 ` [PATCH v2 1/2] t1409: check that `packed-refs` is not rewritten unnecessarily Michael Haggerty
@ 2017-10-28  9:16 ` Michael Haggerty
  2017-10-30  4:52   ` Junio C Hamano
  2017-11-01  7:34 ` [PATCH v2 0/2] Avoid rewriting "packed-refs" unnecessarily Jeff King
  2 siblings, 1 reply; 5+ messages in thread
From: Michael Haggerty @ 2017-10-28  9:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Eric Sunshine, git, Michael Haggerty

Even when we are deleting references, we needn't overwrite the
`packed-refs` file if the references that we are deleting only exist
as loose references. Implement this optimization as follows:

* Add a function `is_packed_transaction_needed()`, which checks
  whether a given packed-refs transaction actually needs to be carried
  out (i.e., it returns false if the transaction obviously wouldn't
  have any effect). This function must be called while holding the
  `packed-refs` lock to avoid races.

* Change `files_transaction_prepare()` to check whether the
  packed-refs transaction is actually needed. If not, squelch it, but
  continue holding the `packed-refs` lock until the end of the
  transaction to avoid races.

This fixes a mild regression caused by dc39e09942 (files_ref_store:
use a transaction to update packed refs, 2017-09-08). Before that
commit, unnecessary rewrites of `packed-refs` were suppressed by
`repack_without_refs()`. But the transaction-based writing introduced
by that commit didn't perform that optimization.

Note that the pre-dc39e09942 code still had to *read* the whole
`packed-refs` file to determine that the rewrite could be skipped, so
the performance for the cases that the write could be elided was
`O(N)` in the number of packed references both before and after
dc39e09942. But after that commit the constant factor increased.

This commit reimplements the optimization of eliding unnecessary
`packed-refs` rewrites. That, plus the fact that since
cfa2e29c34 (packed_ref_store: get rid of the `ref_cache` entirely,
2017-03-17) we don't necessarily have to read the whole `packed-refs`
file at all, means that deletes of one or a few loose references can
now be done with `O(n lg N)` effort, where `n` is the number of loose
references being deleted and `N` is the total number of packed
references.

This commit fixes two tests in t1409.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c          | 18 ++++++++-
 refs/packed-backend.c         | 94 +++++++++++++++++++++++++++++++++++++++++++
 refs/packed-backend.h         |  9 +++++
 t/t1409-avoid-packing-refs.sh |  4 +-
 4 files changed, 122 insertions(+), 3 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 961424a4ea..da8a986697 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2562,7 +2562,23 @@ static int files_transaction_prepare(struct ref_store *ref_store,
 			goto cleanup;
 		}
 		backend_data->packed_refs_locked = 1;
-		ret = ref_transaction_prepare(packed_transaction, err);
+
+		if (is_packed_transaction_needed(refs->packed_ref_store,
+						 packed_transaction)) {
+			ret = ref_transaction_prepare(packed_transaction, err);
+		} else {
+			/*
+			 * We can skip rewriting the `packed-refs`
+			 * file. But we do need to leave it locked, so
+			 * that somebody else doesn't pack a reference
+			 * that we are trying to delete.
+			 */
+			if (ref_transaction_abort(packed_transaction, err)) {
+				ret = TRANSACTION_GENERIC_ERROR;
+				goto cleanup;
+			}
+			backend_data->packed_transaction = NULL;
+		}
 	}
 
 cleanup:
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 0279aeceea..0b0a17ca8e 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -754,6 +754,100 @@ static int write_with_updates(struct packed_ref_store *refs,
 	return -1;
 }
 
+int is_packed_transaction_needed(struct ref_store *ref_store,
+				 struct ref_transaction *transaction)
+{
+	struct packed_ref_store *refs = packed_downcast(
+			ref_store,
+			REF_STORE_READ,
+			"is_packed_transaction_needed");
+	struct strbuf referent = STRBUF_INIT;
+	size_t i;
+	int ret;
+
+	if (!is_lock_file_locked(&refs->lock))
+		BUG("is_packed_transaction_needed() called while unlocked");
+
+	/*
+	 * We're only going to bother returning false for the common,
+	 * trivial case that references are only being deleted, their
+	 * old values are not being checked, and the old `packed-refs`
+	 * file doesn't contain any of those reference(s). This gives
+	 * false positives for some other cases that could
+	 * theoretically be optimized away:
+	 *
+	 * 1. It could be that the old value is being verified without
+	 *    setting a new value. In this case, we could verify the
+	 *    old value here and skip the update if it agrees. If it
+	 *    disagrees, we could either let the update go through
+	 *    (the actual commit would re-detect and report the
+	 *    problem), or come up with a way of reporting such an
+	 *    error to *our* caller.
+	 *
+	 * 2. It could be that a new value is being set, but that it
+	 *    is identical to the current packed value of the
+	 *    reference.
+	 *
+	 * Neither of these cases will come up in the current code,
+	 * because the only caller of this function passes to it a
+	 * transaction that only includes `delete` updates with no
+	 * `old_id`. Even if that ever changes, false positives only
+	 * cause an optimization to be missed; they do not affect
+	 * correctness.
+	 */
+
+	/*
+	 * Start with the cheap checks that don't require old
+	 * reference values to be read:
+	 */
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *update = transaction->updates[i];
+
+		if (update->flags & REF_HAVE_OLD)
+			/* Have to check the old value -> needed. */
+			return 1;
+
+		if ((update->flags & REF_HAVE_NEW) && !is_null_oid(&update->new_oid))
+			/* Have to set a new value -> needed. */
+			return 1;
+	}
+
+	/*
+	 * The transaction isn't checking any old values nor is it
+	 * setting any nonzero new values, so it still might be able
+	 * to be skipped. Now do the more expensive check: the update
+	 * is needed if any of the updates is a delete, and the old
+	 * `packed-refs` file contains a value for that reference.
+	 */
+	ret = 0;
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *update = transaction->updates[i];
+		unsigned int type;
+		struct object_id oid;
+
+		if (!(update->flags & REF_HAVE_NEW))
+			/*
+			 * This reference isn't being deleted -> not
+			 * needed.
+			 */
+			continue;
+
+		if (!refs_read_raw_ref(ref_store, update->refname,
+				       oid.hash, &referent, &type) ||
+		    errno != ENOENT) {
+			/*
+			 * We have to actually delete that reference
+			 * -> this transaction is needed.
+			 */
+			ret = 1;
+			break;
+		}
+	}
+
+	strbuf_release(&referent);
+	return ret;
+}
+
 struct packed_transaction_backend_data {
 	/* True iff the transaction owns the packed-refs lock. */
 	int own_lock;
diff --git a/refs/packed-backend.h b/refs/packed-backend.h
index 61687e408a..640245d3b9 100644
--- a/refs/packed-backend.h
+++ b/refs/packed-backend.h
@@ -23,4 +23,13 @@ int packed_refs_lock(struct ref_store *ref_store, int flags, struct strbuf *err)
 void packed_refs_unlock(struct ref_store *ref_store);
 int packed_refs_is_locked(struct ref_store *ref_store);
 
+/*
+ * Return true if `transaction` really needs to be carried out against
+ * the specified packed_ref_store, or false if it can be skipped
+ * (i.e., because it is an obvious NOOP). `ref_store` must be locked
+ * before calling this function.
+ */
+int is_packed_transaction_needed(struct ref_store *ref_store,
+				 struct ref_transaction *transaction);
+
 #endif /* REFS_PACKED_BACKEND_H */
diff --git a/t/t1409-avoid-packing-refs.sh b/t/t1409-avoid-packing-refs.sh
index a2397c7b71..e5cb8a252d 100755
--- a/t/t1409-avoid-packing-refs.sh
+++ b/t/t1409-avoid-packing-refs.sh
@@ -26,7 +26,7 @@ test_expect_success 'setup' '
 	C=$(git rev-parse HEAD)
 '
 
-test_expect_failure 'do not create packed-refs file gratuitously' '
+test_expect_success 'do not create packed-refs file gratuitously' '
 	test_must_fail test -f .git/packed-refs &&
 	git update-ref refs/heads/foo $A &&
 	test_must_fail test -f .git/packed-refs &&
@@ -107,7 +107,7 @@ test_expect_success 'leave packed-refs untouched on verify of loose' '
 	check_packed_refs_marked
 '
 
-test_expect_failure 'leave packed-refs untouched on delete of loose' '
+test_expect_success 'leave packed-refs untouched on delete of loose' '
 	git pack-refs --all &&
 	git update-ref refs/heads/loose-delete $A &&
 	mark_packed_refs &&
-- 
2.14.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 2/2] files-backend: don't rewrite the `packed-refs` file unnecessarily
  2017-10-28  9:16 ` [PATCH v2 2/2] files-backend: don't rewrite the `packed-refs` file unnecessarily Michael Haggerty
@ 2017-10-30  4:52   ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2017-10-30  4:52 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Jeff King, Eric Sunshine, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> +int is_packed_transaction_needed(struct ref_store *ref_store,
> +				 struct ref_transaction *transaction)
> +{
> +	struct packed_ref_store *refs = packed_downcast(
> +			ref_store,
> +			REF_STORE_READ,
> +			"is_packed_transaction_needed");
> +	struct strbuf referent = STRBUF_INIT;
> +	size_t i;
> +	int ret;
> +
> +	if (!is_lock_file_locked(&refs->lock))
> +		BUG("is_packed_transaction_needed() called while unlocked");
> +
> +	/*
> +	 * We're only going to bother returning false for the common,
> +	 * trivial case that references are only being deleted, their
> +	 * old values are not being checked, and the old `packed-refs`
> +	 * file doesn't contain any of those reference(s). This gives
> +	 * false positives for some other cases that could
> +	 * theoretically be optimized away:

The way I understand "the old file does not contain these
references" part of the condition is "if there were any of these
refs, removing them from the loose ref storage may expose them,
which necessitates us to remove them from the packed-refs (and if
there is no loose ref for them, we do noeed to remove them from the
packed-refs)---so that definitely is not a no-op".

I was confused by the "is_noop?" version, especially about "do we
check the old value?" condition.  The above does not help me all
that much to reach the same level of understanding as I have for the
other condition; sorry.

Is the reason why we know we want to play safe when the caller wants
to check the old value because that could cause the transaction to
abort if it does not match?

Thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 0/2] Avoid rewriting "packed-refs" unnecessarily
  2017-10-28  9:16 [PATCH v2 0/2] Avoid rewriting "packed-refs" unnecessarily Michael Haggerty
  2017-10-28  9:16 ` [PATCH v2 1/2] t1409: check that `packed-refs` is not rewritten unnecessarily Michael Haggerty
  2017-10-28  9:16 ` [PATCH v2 2/2] files-backend: don't rewrite the `packed-refs` file unnecessarily Michael Haggerty
@ 2017-11-01  7:34 ` Jeff King
  2 siblings, 0 replies; 5+ messages in thread
From: Jeff King @ 2017-11-01  7:34 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, Eric Sunshine, git

On Sat, Oct 28, 2017 at 11:16:00AM +0200, Michael Haggerty wrote:

> This reroll make some logically small changes to v1 [1] that are
> textually very big:
> 
> * Invert the sense of `is_packed_transaction_noop()` and rename it to
>   `is_packed_transaction_needed()`. This makes the logic easier to
>   follow and document.
> 
> * Add a big comment to that function, describing the cases when it
>   returns false positives and explaining why that isn't a problem.
> 
> * In the commit message for patch 02, gives a lot more information
>   about the regression that it is fixing. Thanks to Eric for the
>   suggestion.
> 
> These patches are also available as branch
> `avoid-rewriting-packed-refs` on my GitHub fork [2]. They now use
> `mh/packed-ref-transactions` as the base, since that is where Junio
> chose to apply v1.

This all makes sense to me. I agree that the "is_needed" logic-flip in
v2 makes it a little easier to think about.

Like Junio, I was thrown off at first by the HAVE_OLD check. Especially
since we would not ever set that flag for the transaction we care about
here.  But I think the crux of it is that the packed_ref store code
could in theory operate independently of the loose ref code, where we
feed it more exotic inputs. And what you've written here is
future-proofing against the more general case, even though it would not
be strictly necessary.

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-11-01  7:38 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-28  9:16 [PATCH v2 0/2] Avoid rewriting "packed-refs" unnecessarily Michael Haggerty
2017-10-28  9:16 ` [PATCH v2 1/2] t1409: check that `packed-refs` is not rewritten unnecessarily Michael Haggerty
2017-10-28  9:16 ` [PATCH v2 2/2] files-backend: don't rewrite the `packed-refs` file unnecessarily Michael Haggerty
2017-10-30  4:52   ` Junio C Hamano
2017-11-01  7:34 ` [PATCH v2 0/2] Avoid rewriting "packed-refs" unnecessarily Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).