git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v1 0/1] pack-refs: pack expired loose refs to packed_refs
@ 2019-07-21 18:17 16657101987
  2019-07-21 18:17 ` [PATCH v1 1/1] " 16657101987
  0 siblings, 1 reply; 8+ messages in thread
From: 16657101987 @ 2019-07-21 18:17 UTC (permalink / raw)
  Cc: git, gitster, worldhello.net, Sun Chao

From: Sun Chao <sunchao9@huawei.com>

When a packed ref is deleted, the whole packed-refs file is
rewrite and omit the ref that no longer exists. However if
another gc command is running and call `pack-refs --all`
simultaneously, there is a change that a ref just updated
will lost the newly commits.

There are two valid methods to avoid losting commit of ref:
  - force `update-ref -d` to update the snapshot before
    rewrite packed-refs.
  - do not pack a recently updated ref, where *recently*
    could be set by *pack.looserefsexpire* option.

I prefer **do not pack a recently updated ref**, here is the
reasons:

  1. It could avoid losting the newly commit of a ref which I
     described upon.

  2. Sometime, the git server will do `pack-refs --all` and
     `update-ref` the same time, and the two commands have
     chances to trying lock the same ref such as master, if
     this happends one command will fail with such error:

     **cannot lock ref 'refs/heads/master'**

     This could happen if a ref is updated frequently, and
     avoid pack the ref which is update recently could avoid
     this error most of the time.

Sun Chao (1):
  pack-refs: pack expired loose refs to packed_refs

 builtin/pack-refs.c       | 13 ++++++++++++-
 refs.c                    |  4 ++--
 refs.h                    |  2 +-
 refs/files-backend.c      | 18 +++++++++++++++++-
 refs/packed-backend.c     |  2 +-
 refs/refs-internal.h      |  2 +-
 t/helper/test-ref-store.c |  2 +-
 7 files changed, 35 insertions(+), 8 deletions(-)

-- 
2.22.0.214.g8dca754b1e



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v1 1/1] pack-refs: pack expired loose refs to packed_refs
  2019-07-21 18:17 [PATCH v1 0/1] pack-refs: pack expired loose refs to packed_refs 16657101987
@ 2019-07-21 18:17 ` 16657101987
  2019-07-30  6:36   ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: 16657101987 @ 2019-07-21 18:17 UTC (permalink / raw)
  Cc: git, gitster, worldhello.net, Sun Chao

From: Sun Chao <sunchao9@huawei.com>

When a packed ref is deleted, the whole packed-refs file is
rewrite and omit the ref that no longer exists. However if
another gc command is running and call `pack-refs --all`
simultaneously, there is a change that a ref just updated
will lost the newly commits.

Through these steps, losting commits of newly updated refs
could be demonstrated:

  # step 1: compile git without `USE_NSEC` option
  Some kernerl releases does enable it by default while some
  does not. And if we compile git without `USE_NSEC`, it
  will be easier demonstrated by the following steps.

  # step 2: setup a bare repository and clone it as child
  git init --bare parent &&
  (cd parent && git config core.logallrefupdates true) &&
  git clone parent child

  # step 3: in one terminal, repack the bare refs repeatedly
  cd parent &&
  while true; do
    git pack-refs --all
  done

  # step 4: in another terminal, simultaneously update the
  # master with update-ref, and create and delete an
  # unrelated ref also with update-ref
  cd child &&
  while true; do
    git commit --allow-empty -m foo &&
    us=`git rev-parse master` &&
    pushd ../parent &&
      git fetch ../child/.git master &&
      git update-ref refs/heads/newbranch $us &&
      git update-ref refs/heads/master $us &&
      git update-ref -d refs/heads/newbranch &&
      them=`git rev-parse master` &&
      if test "$them" != "$us"; then
        echo >&2 "lost commit: $us"
        exit 1
      fi
    popd
  done

Though we have the packed-refs lock file and loose refs lock
files to avoid updating conflicts, a ref will lost its newly
commits if the situation which is described as racy-git by
`Documentation/technical/racy-git.txt` happens, it comes like
this:

  1. Call `pack-refs --all` to pack all the loose refs to
     packed-refs, and let say the modify time of the
     packed-refs is DATE_M.

  2. Call `update-ref` to update a new commit to master while
     it is already packed.  the old value (let us call it
     OID_A) remains in the packed-refs file and write the new
     value (let us call it OID_B) to $GIT_DIR/refs/heads/master.

  3. Call `update-ref -d` within the same DATE_M from the 1th
     step to delete a different ref newbranch which is packed
     in the packed-refs file. It check newbranch's oid from
     packed-refs file without locking it.

     Meanwhile it keeps a snapshot of the packed-refs file in
     memory and record the file's timestamp with the snapshot.
     The oid of master in the packed-refs's snapshot is OID_A.

  4. Redo the 1th step, after `pack-refs --all` finished, the
     oid of master in packe-refs file is OID_B, and the loose
     refs $GIT_DIR/refs/heads/master is removed. Let's say
     the `pack-refs --all` is very quickly done and the new
     packed-refs file's modify time is stille DATE_M.

  5. 3th step now going on, after checking the newbranch, it
     begin to rewrite the packed-refs file, after get the lock
     file of packed-ref file, it checks the timestamp of it's
     snapshot in memory with the packed-refs file's time,
     they are both the same DATE_M, so the snapshot is not
     refreshed.

     Because the loose ref of master is removed by 4th step,
     `update-ref -d` will updates the new packed-ref to disk
     which contains master with the oid OID-A. So now the
     newly commit OID-B of master is lost.

There are two valid methods to avoid losting commit of ref:
  - force `update-ref -d` to update the snapshot before
    rewrite packed-refs.
  - do not pack a recently updated ref, where *recently*
    could be set by *pack.looserefsexpire* option.

I prefer **do not pack a recently updated ref**, here is the
reasons:

  1. It could avoid losting the newly commit of a ref which I
     described upon.

  2. Sometime, the git server will do `pack-refs --all` and
     `update-ref` the same time, and the two commands have
     chances to trying lock the same ref such as master, if
     this happends one command will fail with such error:

     **cannot lock ref 'refs/heads/master'**

     This could happen if a ref is updated frequently, and
     avoid pack the ref which is update recently could avoid
     this error most of the time.

Signed-off-by: Sun Chao <sunchao9@huawei.com>
---
 builtin/pack-refs.c       | 13 ++++++++++++-
 refs.c                    |  4 ++--
 refs.h                    |  2 +-
 refs/files-backend.c      | 18 +++++++++++++++++-
 refs/packed-backend.c     |  2 +-
 refs/refs-internal.h      |  2 +-
 t/helper/test-ref-store.c |  2 +-
 7 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/builtin/pack-refs.c b/builtin/pack-refs.c
index cfbd5c36c7..7baced5788 100644
--- a/builtin/pack-refs.c
+++ b/builtin/pack-refs.c
@@ -9,16 +9,27 @@ static char const * const pack_refs_usage[] = {
 	NULL
 };
 
+static const char *pack_loose_refs_expire = "now";
+
 int cmd_pack_refs(int argc, const char **argv, const char *prefix)
 {
 	unsigned int flags = PACK_REFS_PRUNE;
 	struct option opts[] = {
 		OPT_BIT(0, "all",   &flags, N_("pack everything"), PACK_REFS_ALL),
 		OPT_BIT(0, "prune", &flags, N_("prune loose refs (default)"), PACK_REFS_PRUNE),
+		{ OPTION_STRING, 0, "expire", &pack_loose_refs_expire, N_("date"),
+			N_("pack unactive loose refs"),
+			PARSE_OPT_OPTARG, NULL, (intptr_t)pack_loose_refs_expire },
 		OPT_END(),
 	};
+	static timestamp_t expire;
+
+	git_config_get_expiry("pack.looserefsexpire", &pack_loose_refs_expire);
 	git_config(git_default_config, NULL);
 	if (parse_options(argc, argv, prefix, opts, pack_refs_usage, 0))
 		usage_with_options(pack_refs_usage, opts);
-	return refs_pack_refs(get_main_ref_store(the_repository), flags);
+
+	if (parse_expiry_date(pack_loose_refs_expire, &expire))
+		die(_("failed to parse '%s' value '%s'"), "pack.looserefsexpire", pack_loose_refs_expire);
+	return refs_pack_refs(get_main_ref_store(the_repository), flags, expire);
 }
diff --git a/refs.c b/refs.c
index cd297ee4bd..e72f4b05dd 100644
--- a/refs.c
+++ b/refs.c
@@ -1947,9 +1947,9 @@ void base_ref_store_init(struct ref_store *refs,
 }
 
 /* backend functions */
-int refs_pack_refs(struct ref_store *refs, unsigned int flags)
+int refs_pack_refs(struct ref_store *refs, unsigned int flags, timestamp_t expire)
 {
-	return refs->be->pack_refs(refs, flags);
+	return refs->be->pack_refs(refs, flags, expire);
 }
 
 int refs_peel_ref(struct ref_store *refs, const char *refname,
diff --git a/refs.h b/refs.h
index 730d05ad91..0270aa9efc 100644
--- a/refs.h
+++ b/refs.h
@@ -378,7 +378,7 @@ void warn_dangling_symrefs(FILE *fp, const char *msg_fmt,
  * Write a packed-refs file for the current repository.
  * flags: Combination of the above PACK_REFS_* flags.
  */
-int refs_pack_refs(struct ref_store *refs, unsigned int flags);
+int refs_pack_refs(struct ref_store *refs, unsigned int flags, timestamp_t expire);
 
 /*
  * Setup reflog before using. Fill in err and return -1 on failure.
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 63e55e6773..7e6676cece 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1144,7 +1144,7 @@ static int should_pack_ref(const char *refname,
 	return 1;
 }
 
-static int files_pack_refs(struct ref_store *ref_store, unsigned int flags)
+static int files_pack_refs(struct ref_store *ref_store, unsigned int flags, timestamp_t expire)
 {
 	struct files_ref_store *refs =
 		files_downcast(ref_store, REF_STORE_WRITE | REF_STORE_ODB,
@@ -1152,13 +1152,19 @@ static int files_pack_refs(struct ref_store *ref_store, unsigned int flags)
 	struct ref_iterator *iter;
 	int ok;
 	struct ref_to_prune *refs_to_prune = NULL;
+	struct stat st;
+	struct strbuf path = STRBUF_INIT;
 	struct strbuf err = STRBUF_INIT;
 	struct ref_transaction *transaction;
+	size_t path_baselen;
 
 	transaction = ref_store_transaction_begin(refs->packed_ref_store, &err);
 	if (!transaction)
 		return -1;
 
+	files_ref_path(refs, &path, "");
+	path_baselen = path.len;
+
 	packed_refs_lock(refs->packed_ref_store, LOCK_DIE_ON_ERROR, &err);
 
 	iter = cache_ref_iterator_begin(get_loose_ref_cache(refs), NULL, 0);
@@ -1172,6 +1178,16 @@ static int files_pack_refs(struct ref_store *ref_store, unsigned int flags)
 				     flags))
 			continue;
 
+		/*
+		 * If the loose reference is active (not expired), do not pack it.
+		 */
+		strbuf_setlen(&path, path_baselen);
+		strbuf_addstr(&path, iter->refname);
+		if (stat(path.buf, &st) == 0) {
+			if (st.st_mtime > expire)
+				continue;
+		}
+
 		/*
 		 * Add a reference creation for this reference to the
 		 * packed-refs transaction:
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index c01c7f5901..2c6e6ee990 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1550,7 +1550,7 @@ static int packed_delete_refs(struct ref_store *ref_store, const char *msg,
 	return ret;
 }
 
-static int packed_pack_refs(struct ref_store *ref_store, unsigned int flags)
+static int packed_pack_refs(struct ref_store *ref_store, unsigned int flags, timestamp_t expire)
 {
 	/*
 	 * Packed refs are already packed. It might be that loose refs
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index f2d8c0123a..81f00e4c04 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -530,7 +530,7 @@ typedef int ref_transaction_commit_fn(struct ref_store *refs,
 				      struct ref_transaction *transaction,
 				      struct strbuf *err);
 
-typedef int pack_refs_fn(struct ref_store *ref_store, unsigned int flags);
+typedef int pack_refs_fn(struct ref_store *ref_store, unsigned int flags, timestamp_t expire);
 typedef int create_symref_fn(struct ref_store *ref_store,
 			     const char *ref_target,
 			     const char *refs_heads_master,
diff --git a/t/helper/test-ref-store.c b/t/helper/test-ref-store.c
index 799fc00aa1..b2be2e311d 100644
--- a/t/helper/test-ref-store.c
+++ b/t/helper/test-ref-store.c
@@ -69,7 +69,7 @@ static int cmd_pack_refs(struct ref_store *refs, const char **argv)
 {
 	unsigned int flags = arg_flags(*argv++, "flags");
 
-	return refs_pack_refs(refs, flags);
+	return refs_pack_refs(refs, flags, TIME_MAX);
 }
 
 static int cmd_peel_ref(struct ref_store *refs, const char **argv)
-- 
2.22.0.214.g8dca754b1e



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v1 1/1] pack-refs: pack expired loose refs to packed_refs
  2019-07-21 18:17 ` [PATCH v1 1/1] " 16657101987
@ 2019-07-30  6:36   ` Jeff King
  2019-07-31 18:35     ` [PATCH v2 0/1] pack-refs: always refreshing after take the lock file 16657101987
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2019-07-30  6:36 UTC (permalink / raw)
  To: 16657101987; +Cc: Michael Haggerty, git, gitster, worldhello.net, Sun Chao

On Mon, Jul 22, 2019 at 02:17:39AM +0800, 16657101987@163.com wrote:

> From: Sun Chao <sunchao9@huawei.com>
> 
> When a packed ref is deleted, the whole packed-refs file is
> rewrite and omit the ref that no longer exists. However if
> another gc command is running and call `pack-refs --all`
> simultaneously, there is a change that a ref just updated
> will lost the newly commits.
> 
> Through these steps, losting commits of newly updated refs
> could be demonstrated:

Thanks for the report and an easy-to-follow recipe. I was able to
reproduce the problem.

>   # step 4: in another terminal, simultaneously update the
>   # master with update-ref, and create and delete an
>   # unrelated ref also with update-ref
>   cd child &&
>   while true; do
>     git commit --allow-empty -m foo &&
>     us=`git rev-parse master` &&
>     pushd ../parent &&
>       git fetch ../child/.git master &&
>       git update-ref refs/heads/newbranch $us &&
>       git update-ref refs/heads/master $us &&
>       git update-ref -d refs/heads/newbranch &&
>       them=`git rev-parse master` &&
>       if test "$them" != "$us"; then
>         echo >&2 "lost commit: $us"
>         exit 1
>       fi
>     popd
>   done

You can do this step without the fetch, which makes it hit the race more
quickly. :) Try this:

  # prime it with a single commit
  git commit --allow-empty -m foo
  while true; do
    us=$(git commit-tree -m foo -p HEAD HEAD^{tree}) &&
    git update-ref refs/heads/newbranch $us &&
    git update-ref refs/heads/master $us &&
    git update-ref -d refs/heads/newbranch &&
    them=$(git rev-parse master) &&
    if test "$them" != "$us"; then
      echo >&2 "lost commit: $us"
      exit 1
    fi
    # eye candy
    printf .
  done

> Though we have the packed-refs lock file and loose refs lock
> files to avoid updating conflicts, a ref will lost its newly
> commits if the situation which is described as racy-git by
> `Documentation/technical/racy-git.txt` happens, it comes like
> this:

I don't think this is quite the same as racy-git. There we are comparing
stat entries for a file X to the timestamp of the index (and we are
concerned they were written in the same second).

But here we have no on-disk stat information to compare to. It's all
happening in-process. But you're right that it's a racy stat-validity
problem.

>   4. Redo the 1th step, after `pack-refs --all` finished, the
>      oid of master in packe-refs file is OID_B, and the loose
>      refs $GIT_DIR/refs/heads/master is removed. Let's say
>      the `pack-refs --all` is very quickly done and the new
>      packed-refs file's modify time is stille DATE_M.
> 
>   5. 3th step now going on, after checking the newbranch, it
>      begin to rewrite the packed-refs file, after get the lock
>      file of packed-ref file, it checks the timestamp of it's
>      snapshot in memory with the packed-refs file's time,
>      they are both the same DATE_M, so the snapshot is not
>      refreshed.

The stat-validity check here is actually more than the timestamp.
Specifically it's checking the inode and size. But because of the
specific set of operations you're performing, this ends up correlating
quite often:

  - because our operations involve updating a single ref or
    adding/deleting another ref, we'll oscillate between two sizes
    (either one ref or two)

  - likewise if nothing else is happening on the filesystem, pack-refs
    may flip back and forth between two inodes (not the same one,
    because our tempfile-and-rename strategy means we're still using the
    old one while we write the new packed-refs file).

So I actually find this to be a fairly unlikely case in the real world,
but as your script demonstrates, it's not that hard to trigger it if
you're trying.

> There are two valid methods to avoid losting commit of ref:
>   - force `update-ref -d` to update the snapshot before
>     rewrite packed-refs.
>   - do not pack a recently updated ref, where *recently*
>     could be set by *pack.looserefsexpire* option.

I'm not sure the second one actually fixes things entirely. What if I
have an older refs/heads/foo, and I do this:

  git pack-refs
  git pack-refs --all --prune

We still might hit the race here. The first pack-refs does not pack foo
(because we didn't say --all), then a simultaneous "update-ref -d" opens
`packed-refs`, then the second pack-refs packs it all in the same
second. Now "update-ref -d" uses the old packed-refs file, and we lose
the ref.

Admittedly this is even more unlikely than your original case, because
it involves quickly running pack-refs in two different modes.

But I think if we want the same solution as racy-git, the timestamp we
want to compare to is not the ref itself, but rather for the
stat-validity code to see if packed-refs has the same timestamp as the
moment when we called stat().

Unfortunately that's hard to do robustly, because the filesystem time
and the OS clock time do not necessarily match up. I don't know of a way
to record the current filesystem time without modifying a file.

I do agree that simply removing the stat-validity check and _always_
re-opening the packed-refs file when we take the lock would work.
Traditionally we avoided that because refreshing it implied parsing the
whole file. But these days we mmap it, so it really is just an extra
open()/mmap() and a quick read of the header. That doesn't seem like an
outrageous cost to pay when we're already taking the lock.

Another option would be to put an increasing counter into the file
header itself. We could then record the old counter instead of the
stat-validity info, and always re-open(), mmap(), and parse the header.
But at that point I don't think it saves anything over just refreshing
the file as above.

So I actually think the best path forward is just always refreshing when
we take the lock, something like:

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index c01c7f5901..0c8fdce7be 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1019,7 +1019,7 @@ int packed_refs_lock(struct ref_store *ref_store, int flags, struct strbuf *err)
 	 * is still valid. We've just locked the file, but it might
 	 * have changed the moment *before* we locked it.
 	 */
-	validate_snapshot(refs);
+	clear_snapshot(refs);
 
 	/*
 	 * Now make sure that the packed-refs file as it exists in the

We could still see an old packed-refs file on reading (somebody packs a
ref and then deletes it, but we still see the old packed-refs), but
that's an inherent race (we can't know that somebody didn't update
packed-refs after we checked its stat, because we're not holding the
lock). It's also just one of many ways that the filesystem ref storage
is not atomic (e.g., renaming X to Y, a reader might see neither ref!).

Ultimately the best solution there is to move to a better format (like
the reftables proposal).

> I prefer **do not pack a recently updated ref**, here is the
> reasons:
> 
>   1. It could avoid losting the newly commit of a ref which I
>      described upon.
> 
>   2. Sometime, the git server will do `pack-refs --all` and
>      `update-ref` the same time, and the two commands have
>      chances to trying lock the same ref such as master, if
>      this happends one command will fail with such error:
> 
>      **cannot lock ref 'refs/heads/master'**
> 
>      This could happen if a ref is updated frequently, and
>      avoid pack the ref which is update recently could avoid
>      this error most of the time.

It can also happen if you simply get unlucky with a ref that isn't
updated frequently. We may pack an older ref, but then collide with
somebody updating the ref when we take the lock to delete the loose
version.

-Peff

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 0/1] pack-refs: always refreshing after take the lock file
  2019-07-30  6:36   ` Jeff King
@ 2019-07-31 18:35     ` 16657101987
  2019-07-31 18:35       ` [PATCH v2 1/1] " 16657101987
  2019-08-16 20:49       ` [PATCH v2 0/1] " Jeff King
  0 siblings, 2 replies; 8+ messages in thread
From: 16657101987 @ 2019-07-31 18:35 UTC (permalink / raw)
  To: peff; +Cc: 16657101987, git, gitster, mhagger, sunchao9, worldhello.net

From: Sun Chao <16657101987@163.com>

On Tue, 30 Jul 2019 02:36:35 -0400, Jeff King wrote:

> You can do this step without the fetch, which makes it hit the race more
> quickly. :) Try this:
> 
>   # prime it with a single commit
>   git commit --allow-empty -m foo
>   while true; do
>     us=$(git commit-tree -m foo -p HEAD HEAD^{tree}) &&
>     git update-ref refs/heads/newbranch $us &&
>     git update-ref refs/heads/master $us &&
>     git update-ref -d refs/heads/newbranch &&
>     them=$(git rev-parse master) &&
>     if test "$them" != "$us"; then
>       echo >&2 "lost commit: $us"
>       exit 1
>     fi
>     # eye candy
>     printf .
>   done

Thanks, this could hit the race more quickly and I update it to the
commit log.

> I don't think this is quite the same as racy-git. There we are comparing
> stat entries for a file X to the timestamp of the index (and we are
> concerned they were written in the same second).
> 
> But here we have no on-disk stat information to compare to. It's all
> happening in-process. But you're right that it's a racy stat-validity
> problem.

Yes, I agree with you.

> The stat-validity check here is actually more than the timestamp.
> Specifically it's checking the inode and size. But because of the
> specific set of operations you're performing, this ends up correlating
> quite often:
> 
>   - because our operations involve updating a single ref or
>     adding/deleting another ref, we'll oscillate between two sizes
>     (either one ref or two)
> 
>   - likewise if nothing else is happening on the filesystem, pack-refs
>     may flip back and forth between two inodes (not the same one,
>     because our tempfile-and-rename strategy means we're still using the
>     old one while we write the new packed-refs file).
> 
> So I actually find this to be a fairly unlikely case in the real world,
> but as your script demonstrates, it's not that hard to trigger it if
> you're trying.
>
> I'm not sure the second one actually fixes things entirely. What if I
> have an older refs/heads/foo, and I do this:
> 
>   git pack-refs
>   git pack-refs --all --prune
> 
> We still might hit the race here. The first pack-refs does not pack foo
> (because we didn't say --all), then a simultaneous "update-ref -d" opens
> `packed-refs`, then the second pack-refs packs it all in the same
> second. Now "update-ref -d" uses the old packed-refs file, and we lose
> the ref.

Yes, I agree with you. And in the real word if the git servers has some
3rd-service which update repositories refs or pack-refs frequently may
have this problem, my company's git servers works like this unfortunately.

> So I actually think the best path forward is just always refreshing when
> we take the lock, something like:
> 
> Ultimately the best solution there is to move to a better format (like
> the reftables proposal).

I do not know if we could get the new reftables in the next few versions,
So I commit the changes as you suggested, which is also the same as
another way I metioned in `PATCH v1`:

**force `update-ref -d` to update the snapshot before rewrite packed-refs.**

But if the reftables is comeing soon, please just ignore my PATCH :)

**And thank a lot for your reply, it's great to me, because it's my first
PATCh to git myself :)**

Sun Chao (1):
  pack-refs: always refreshing after take the lock file

 refs/packed-backend.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

-- 
2.22.0.214.g8dca754b1e



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/1] pack-refs: always refreshing after take the lock file
  2019-07-31 18:35     ` [PATCH v2 0/1] pack-refs: always refreshing after take the lock file 16657101987
@ 2019-07-31 18:35       ` 16657101987
  2019-08-16 20:49       ` [PATCH v2 0/1] " Jeff King
  1 sibling, 0 replies; 8+ messages in thread
From: 16657101987 @ 2019-07-31 18:35 UTC (permalink / raw)
  To: peff; +Cc: 16657101987, git, gitster, mhagger, sunchao9, worldhello.net

From: Sun Chao <16657101987@163.com>

When a packed ref is deleted, the whole packed-refs file is
rewrite and omit the ref that no longer exists. However if
another gc command is running and call `pack-refs --all`
simultaneously, there is a chance that a ref just updated
will lost the newly commits.

Through these steps, losting commits of newly updated refs
could be demonstrated:

  # step 1: compile git without `USE_NSEC` option
  Some kernerl releases does enable it by default while some
  does not. And if we compile git without `USE_NSEC`, it
  will be easier demonstrated by the following steps.

  # step 2: setup a repository and add first commit
  git init repo &&
  (cd repo &&
   git config core.logallrefupdates true &&
   git commit --allow-empty -m foo)

  # step 3: in one terminal, repack the refs repeatedly
  cd repo &&
  while true; do
    git pack-refs --all
  done

  # step 4: in another terminal, simultaneously update the
  # master with update-ref, and create and delete an
  # unrelated ref also with update-ref
  cd repo &&
  while true; do
    us=$(git commit-tree -m foo -p HEAD HEAD^{tree}) &&
    git update-ref refs/heads/newbranch $us &&
    git update-ref refs/heads/master $us &&
    git update-ref -d refs/heads/newbranch &&
    them=$(git rev-parse master) &&
    if test "$them" != "$us"; then
      echo >&2 "lost commit: $us"
      exit 1
    fi
    # eye candy
    printf .
  done

Though we have the packed-refs lock file and loose refs lock
files to avoid updating conflicts, a ref will lost its newly
commits if racy stat-validity of `packed-refs` file happens
(which is quite same as the racy-git described in
`Documentation/technical/racy-git.txt`), the following
specific set of operations demonstrates the problem:

  1. Call `pack-refs --all` to pack all the loose refs to
     packed-refs, and let say the modify time of the
     packed-refs is DATE_M.

  2. Call `update-ref` to update a new commit to master while
     it is already packed.  the old value (let us call it
     OID_A) remains in the packed-refs file and write the new
     value (let us call it OID_B) to $GIT_DIR/refs/heads/master.

  3. Call `update-ref -d` within the same DATE_M from the 1th
     step to delete a different ref newbranch which is packed
     in the packed-refs file. It check newbranch's oid from
     packed-refs file without locking it.

     Meanwhile it keeps a snapshot of the packed-refs file in
     memory and record the file's attributes with the snapshot.
     The oid of master in the packed-refs's snapshot is OID_A.

  4. Call a new `pack-refs --all` to pack the loose refs, the
     oid of master in packe-refs file is OID_B, and the loose
     refs $GIT_DIR/refs/heads/master is removed. Let's say
     the `pack-refs --all` is very quickly done and the new
     packed-refs file's modify time is still DATE_M, and it
     has the same file size, even the same inode.

  5. 3th step now goes on after checking the newbranch, it
     begin to rewrite the packed-refs file. After get the
     lock file of packed-ref file, it checks it's on-disk
     file attributes with the snapshot, suck as the timestamp,
     the file size and the inode value. If they are both the
     same values, and the snapshot is not refreshed.

     Because the loose ref of master is removed by 4th step,
     `update-ref -d` will updates the new packed-ref to disk
     which contains master with the oid OID_A. So now the
     newly commit OID_B of master is lost.

The best path forward is just always refreshing after take
the lock file of `packed-refs` file. Traditionally we avoided
that because refreshing it implied parsing the whole file.
But these days we mmap it, so it really is just an extra
open()/mmap() and a quick read of the header. That doesn't seem
like an outrageous cost to pay when we're already taking the lock.

Signed-off-by: Sun Chao <sunchao9@huawei.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Sun Chao <sunchao9@huawei.com>
---
 refs/packed-backend.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index c01c7f5901..4458a0f69c 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1012,14 +1012,23 @@ int packed_refs_lock(struct ref_store *ref_store, int flags, struct strbuf *err)
 	}
 
 	/*
-	 * Now that we hold the `packed-refs` lock, make sure that our
-	 * snapshot matches the current version of the file. Normally
-	 * `get_snapshot()` does that for us, but that function
-	 * assumes that when the file is locked, any existing snapshot
-	 * is still valid. We've just locked the file, but it might
-	 * have changed the moment *before* we locked it.
+	 * There is a stat-validity problem might cause `update-ref -d`
+	 * lost the newly commit of a ref, because a new `packed-refs`
+	 * file might has the same on-disk file attributes such as
+	 * timestamp, file size and inode value, but has a changed
+	 * ref value.
+	 *
+	 * This could happen with a very small chance when
+	 * `update-ref -d` is called and at the same time another
+	 * `pack-refs --all` process is running.
+	 *
+	 * Now that we hold the `packed-refs` lock, it is important
+	 * to make sure we could read the latest version of
+	 * `packed-refs` file no matter we have just mmap it or not.
+	 * So what need to do is clear the snapshot if we hold it
+	 * already.
 	 */
-	validate_snapshot(refs);
+	clear_snapshot(refs);
 
 	/*
 	 * Now make sure that the packed-refs file as it exists in the
-- 
2.22.0.214.g8dca754b1e



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/1] pack-refs: always refreshing after take the lock file
  2019-07-31 18:35     ` [PATCH v2 0/1] pack-refs: always refreshing after take the lock file 16657101987
  2019-07-31 18:35       ` [PATCH v2 1/1] " 16657101987
@ 2019-08-16 20:49       ` Jeff King
  2019-08-19 17:36         ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff King @ 2019-08-16 20:49 UTC (permalink / raw)
  To: 16657101987; +Cc: git, gitster, mhagger, sunchao9, worldhello.net

On Thu, Aug 01, 2019 at 02:35:43AM +0800, 16657101987@163.com wrote:

> > So I actually think the best path forward is just always refreshing when
> > we take the lock, something like:
> > 
> > Ultimately the best solution there is to move to a better format (like
> > the reftables proposal).
> 
> I do not know if we could get the new reftables in the next few versions,
> So I commit the changes as you suggested, which is also the same as
> another way I metioned in `PATCH v1`:
> 
> **force `update-ref -d` to update the snapshot before rewrite packed-refs.**
> 
> But if the reftables is comeing soon, please just ignore my PATCH :)

I'm undecided on this. I think reftables are still a while off, and even
once they are here, many people will still be using the older format. So
it makes sense to still apply fixes to the old code.

What I wonder, though, is whether always refreshing will cause a
noticeable performance impact (and that's why I was so slow in
responding -- I had hoped to try to come up with some numbers, but I
just hadn't gotten around to it).

My gut says it's _probably_ not an issue, but it would be nice to have
some data to back it up.

> **And thank a lot for your reply, it's great to me, because it's my first
> PATCh to git myself :)**

You're welcome. Thanks for diagnosing a rather tricky case. :)

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/1] pack-refs: always refreshing after take the lock file
  2019-08-16 20:49       ` [PATCH v2 0/1] " Jeff King
@ 2019-08-19 17:36         ` Junio C Hamano
  2019-08-20 15:14           ` 16657101987
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2019-08-19 17:36 UTC (permalink / raw)
  To: Jeff King; +Cc: 16657101987, git, mhagger, sunchao9, worldhello.net

Jeff King <peff@peff.net> writes:

> I'm undecided on this. I think reftables are still a while off, and even
> once they are here, many people will still be using the older format. So
> it makes sense to still apply fixes to the old code.

Yeah.

> What I wonder, though, is whether always refreshing will cause a
> noticeable performance impact (and that's why I was so slow in
> responding -- I had hoped to try to come up with some numbers, but I
> just hadn't gotten around to it).
>
> My gut says it's _probably_ not an issue, but it would be nice to have
> some data to back it up.

I am tempted to let correctness (and ease-of-reasoning about the
code) take precedence over potential and unknown performance issue,
at least for now.  A single liner is rather simple to revert (or in
the worst case we could add "allow pack-refs to efficiently lose a
ref to a race" configuration option) anyway.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Re: [PATCH v2 0/1] pack-refs: always refreshing after take the lock file
  2019-08-19 17:36         ` Junio C Hamano
@ 2019-08-20 15:14           ` 16657101987
  0 siblings, 0 replies; 8+ messages in thread
From: 16657101987 @ 2019-08-20 15:14 UTC (permalink / raw)
  To: gitster; +Cc: 16657101987, git, mhagger, peff, sunchao9, worldhello.net

From: Sun Chao <sunchao9@huawei.com>

---

Jeff King <peff@peff.net> writes:

> I'm undecided on this. I think reftables are still a while off, and even
> once they are here, many people will still be using the older format. So
> it makes sense to still apply fixes to the old code.

Got it, thanks for explainning.

> What I wonder, though, is whether always refreshing will cause a
> noticeable performance impact (and that's why I was so slow in
> responding -- I had hoped to try to come up with some numbers, but I
> just hadn't gotten around to it).
>
> My gut says it's _probably_ not an issue, but it would be nice to have
> some data to back it up.

Sorry for responding after 4 days because I have been away on official
business.

Tody I have tryied some tools like trace logs, time, and strace, tring
to figure out if there are some noticeable numbers. I tried different
repositories with different ref numbers and blob numbers, I also can
not recognize how much the refreshing impact the performance, perhaps
I need to find a better computer for benchmark testing.

---

Junio C Hamano <gitster@pobox.com> writes:

> I am tempted to let correctness (and ease-of-reasoning about the
> code) take precedence over potential and unknown performance issue,
> at least for now.  A single liner is rather simple to revert (or in
> the worst case we could add "allow pack-refs to efficiently lose a
> ref to a race" configuration option) anyway.

Thanks a lot :)

-- 
2.17.2 (Apple Git-113)



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2019-08-20 15:14 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-21 18:17 [PATCH v1 0/1] pack-refs: pack expired loose refs to packed_refs 16657101987
2019-07-21 18:17 ` [PATCH v1 1/1] " 16657101987
2019-07-30  6:36   ` Jeff King
2019-07-31 18:35     ` [PATCH v2 0/1] pack-refs: always refreshing after take the lock file 16657101987
2019-07-31 18:35       ` [PATCH v2 1/1] " 16657101987
2019-08-16 20:49       ` [PATCH v2 0/1] " Jeff King
2019-08-19 17:36         ` Junio C Hamano
2019-08-20 15:14           ` 16657101987

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).