git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 1/2] pack-objects: break out of want_object loop early
Date: Wed, 27 Jul 2016 18:04:38 -0400
Message-ID: <20160727220437.GA2792@sigill.intra.peff.net>
In-Reply-To: <CAPc5daWjNBazzVrH3jdgur2wC19upjM9A_pWp-u=tPUM_qaK0Q@mail.gmail.com>

On Wed, Jul 27, 2016 at 02:28:32PM -0700, Junio C Hamano wrote:

> On Wed, Jul 27, 2016 at 2:13 PM, Jeff King <peff@peff.net> wrote:
> > ... But two is that I've
> > wondered if we can do even better with a most-recently-used cache
> > instead of the last_pack_found hack. So I'm trying to implement and
> > measure that (both for this loop, and to see if it does better in
> > find_pack_entry).
> 
> It is always delightful to hear a well constructed description of a
> thought process. Thanks.
> 
> One thing that made me wonder was what would happen to the
> last_found that is static to has_sha1_pack_kept_or_nonlocal()
> function, when we invalidate the packed_git list, but within the
> context of pack-objects it is not likely?

I wondered that, too, and that's part of what I'm working on. :)

We leave the old packed_git structs in place when we re-read the pack
directory. That works because either we have them already opened (so
even though they're gone, we can still access them, and that's why we
_must_ keep the structs in place), or we will call is_pack_valid() and
notice that it went away (and quietly skip to checking the next pack).
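To make that lookup pattern concrete, here is a minimal C sketch. It assumes a toy `struct pack` in place of `struct packed_git`, plus hypothetical helpers `pack_is_valid()` (standing in for is_pack_valid()) and `pack_contains()` (standing in for the real .idx lookup). Stale entries stay on the list and are skipped, and the last hit is cached in a static pointer the way last_found is. That cache is only safe because no entry is ever freed.

```c
#include <stddef.h>
#include <string.h>

#define MAX_OBJS 4

/* Hypothetical stand-in for struct packed_git. */
struct pack {
	const char *name;
	int valid;                       /* 0 once the file vanished from disk */
	const char *objects[MAX_OBJS];   /* toy object list, NULL-terminated */
	struct pack *next;
};

/* Hypothetical stand-in for searching the pack's .idx file. */
static int pack_contains(const struct pack *p, const char *oid)
{
	for (int i = 0; i < MAX_OBJS && p->objects[i]; i++)
		if (!strcmp(p->objects[i], oid))
			return 1;
	return 0;
}

/* Hypothetical analogue of is_pack_valid(). */
static int pack_is_valid(const struct pack *p)
{
	return p->valid;
}

static struct pack *find_pack(struct pack *list, const char *oid)
{
	/* Dangling if entries could be freed; safe since they never are. */
	static struct pack *last_found;
	struct pack *p;

	/* Fast path: try the pack that satisfied the previous lookup. */
	if (last_found && pack_is_valid(last_found) &&
	    pack_contains(last_found, oid))
		return last_found;

	for (p = list; p; p = p->next) {
		if (p == last_found)
			continue;   /* already checked above */
		if (!pack_is_valid(p))
			continue;   /* stale entry: quietly skip to the next pack */
		if (pack_contains(p, oid))
			return last_found = p;
	}
	return NULL;
}
```

Note that the stale entry is never unlinked; the validity check alone is what keeps it from being used.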

There is one place where we do free the packed_git, but I actually think
we can stop doing so. Here's what I'm planning as part of my series:


-- >8 --
Subject: [PATCH] sha1_file: drop free_pack_by_name

The point of this function is to drop an entry from the
"packed_git" cache that points to a file we might be
overwriting, because our contents may not be the same (and
hence the only caller was pack-objects as it moved a
temporary packfile into place).

In older versions of git, this could happen because the
names of packfiles were derived from the set of objects they
contained, not the actual bits on disk. But since 1190a1a
(pack-objects: name pack files after trailer hash,
2013-12-05), the name reflects the actual bits on disk, and
any two packfiles with the same name can be used
interchangeably.

Dropping this function not only saves a few lines of code,
it makes the lifetime of "struct packed_git" much easier to
reason about: namely, we now do not ever free these structs.

Signed-off-by: Jeff King <peff@peff.net>
---
I won't be surprised if this fixes some obscure bug, because there may
be parts of the code that store a pointer to our packed_git that could
be invalidated (I'm certain the bitmap code does). But perhaps it
doesn't matter because it's rare for this function to trigger at all.
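For comparison with the last_found hack, the most-recently-used idea mentioned earlier in the thread can be sketched as a move-to-front search. This is a hypothetical sketch, not code from this series: `struct pack`, `pack_contains()`, and `mru_find()` are made-up names, and it reorders the list in place rather than keeping a separate MRU structure. Holding and relinking raw pointers like this is only reasonable because, with this patch, the structs are never freed.

```c
#include <stddef.h>
#include <string.h>

#define MAX_OBJS 4

/* Hypothetical pack entry; never freed once added to the list. */
struct pack {
	const char *name;
	const char *objects[MAX_OBJS];   /* toy object list, NULL-terminated */
	struct pack *next;
};

static int pack_contains(const struct pack *p, const char *oid)
{
	for (int i = 0; i < MAX_OBJS && p->objects[i]; i++)
		if (!strcmp(p->objects[i], oid))
			return 1;
	return 0;
}

/*
 * Search the list; on a hit, move that entry to the front so the
 * next lookup checks it first. *head is updated in place.
 */
static struct pack *mru_find(struct pack **head, const char *oid)
{
	struct pack **pp;

	for (pp = head; *pp; pp = &(*pp)->next) {
		struct pack *p = *pp;

		if (!pack_contains(p, oid))
			continue;
		if (pp != head) {
			*pp = p->next;     /* unlink from current position */
			p->next = *head;   /* relink at the front */
			*head = p;
		}
		return p;
	}
	return NULL;
}
```

With move-to-front, a run of lookups that all hit the same pack costs one pack check each after the first hit, while a lookup that misses everywhere still walks the whole list.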

 cache.h      |  1 -
 pack-write.c |  1 -
 sha1_file.c  | 30 ------------------------------
 3 files changed, 32 deletions(-)

diff --git a/cache.h b/cache.h
index 3855ddf..57ef726 100644
--- a/cache.h
+++ b/cache.h
@@ -1416,7 +1416,6 @@ extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t
 extern void close_pack_windows(struct packed_git *);
 extern void close_all_packs(void);
 extern void unuse_pack(struct pack_window **);
-extern void free_pack_by_name(const char *);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, int local);
 
diff --git a/pack-write.c b/pack-write.c
index 33293ce..ea0b788 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -354,7 +354,6 @@ void finish_tmp_packfile(struct strbuf *name_buffer,
 		die_errno("unable to make temporary index file readable");
 
 	strbuf_addf(name_buffer, "%s.pack", sha1_to_hex(sha1));
-	free_pack_by_name(name_buffer->buf);
 
 	if (rename(pack_tmp_name, name_buffer->buf))
 		die_errno("unable to rename temporary pack file");
diff --git a/sha1_file.c b/sha1_file.c
index d5e1121..e045d2f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -891,36 +891,6 @@ void close_pack_index(struct packed_git *p)
 	}
 }
 
-/*
- * This is used by git-repack in case a newly created pack happens to
- * contain the same set of objects as an existing one.  In that case
- * the resulting file might be different even if its name would be the
- * same.  It is best to close any reference to the old pack before it is
- * replaced on disk.  Of course no index pointers or windows for given pack
- * must subsist at this point.  If ever objects from this pack are requested
- * again, the new version of the pack will be reinitialized through
- * reprepare_packed_git().
- */
-void free_pack_by_name(const char *pack_name)
-{
-	struct packed_git *p, **pp = &packed_git;
-
-	while (*pp) {
-		p = *pp;
-		if (strcmp(pack_name, p->pack_name) == 0) {
-			clear_delta_base_cache();
-			close_pack(p);
-			free(p->bad_object_sha1);
-			*pp = p->next;
-			if (last_found_pack == p)
-				last_found_pack = NULL;
-			free(p);
-			return;
-		}
-		pp = &p->next;
-	}
-}
-
 static unsigned int get_max_fd_limit(void)
 {
 #ifdef RLIMIT_NOFILE
-- 
2.9.2.607.g98dce7b


Thread overview: 13+ messages
2016-07-25 18:49 [PATCH 0/2] speed up "Counting objects" when there are many packs Jeff King
2016-07-25 18:50 ` [PATCH 1/2] pack-objects: break out of want_object loop early Jeff King
2016-07-25 19:56   ` Junio C Hamano
2016-07-25 21:41     ` Jeff King
2016-07-25 21:52       ` Junio C Hamano
2016-07-25 22:14         ` Jeff King
2016-07-26 20:38           ` Junio C Hamano
2016-07-26 20:48             ` Jeff King
2016-07-26 21:38               ` Junio C Hamano
2016-07-27 21:13                 ` Jeff King
2016-07-27 21:28                   ` Junio C Hamano
2016-07-27 22:04                     ` Jeff King [this message]
2016-07-25 18:50 ` [PATCH 2/2] pack-objects: compute local/ignore_pack_keep early Jeff King