git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Christian Couder <christian.couder@gmail.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Christian Couder <chriscool@tuxfamily.org>,
	Ramsay Jones <ramsay@ramsayjones.plus.com>
Subject: Re: [RFC PATCH 10/10] pack-objects: improve partial packfile reuse
Date: Fri, 13 Sep 2019 15:29:00 -0700	[thread overview]
Message-ID: <xmqq7e6bde4z.fsf@gitster-ct.c.googlers.com> (raw)
In-Reply-To: <20190913130226.7449-11-chriscool@tuxfamily.org> (Christian Couder's message of "Fri, 13 Sep 2019 15:02:26 +0200")

Christian Couder <christian.couder@gmail.com> writes:

> +/*
> + * Record the offsets needed in our reused packfile chunks due to
> + * "gaps" where we omitted some objects.
> + */

The meaning of 'start' and 'offset' is unclear from the first
reading.  Is it "starting offset" and "for how many bytes the region
lasts"?  If so, 'offset', which is usually a location (unless you
always measure from the beginning, in which case you could say it
names the byte-size of a region), may be a misnomer (side note: I'll
pretend that I haven't realized the 'offset' may be the end offset
of a region for now---this is a good illustration why a better
comment should be here anyway ;-).

> +static struct reused_chunk {
> +	off_t start;
> +	off_t offset;
> +} *reused_chunks;
> +static int reused_chunks_nr;
> +static int reused_chunks_alloc;
> +
> +static void record_reused_object(off_t where, off_t offset)

And here, 'start' is called 'where'; either is a good word for a
location; we'd want to pick either one to be consistent, perhaps?

>  {
> -	unsigned char buffer[8192];
> -	off_t to_write, total;
> -	int fd;
> +	if (reused_chunks_nr && reused_chunks[reused_chunks_nr-1].offset == offset)
> +		return;

The reason why the above code works is because this function will
always be called in the ascending order of the 'offset'?

Hmmm, perhaps 'offset' is not a region-size after all.  Is it the
end offset, as opposed to 'start' which is the starting offset, and
the two offsets sandwitch a region?

> -	if (!is_pack_valid(reuse_packfile))
> -		die(_("packfile is invalid: %s"), reuse_packfile->pack_name);
> +	ALLOC_GROW(reused_chunks, reused_chunks_nr + 1,
> +		   reused_chunks_alloc);
> +	reused_chunks[reused_chunks_nr].start = where;
> +	reused_chunks[reused_chunks_nr].offset = offset;
> +	reused_chunks_nr++;
> +}
>  
> -	fd = git_open(reuse_packfile->pack_name);
> -	if (fd < 0)
> -		die_errno(_("unable to open packfile for reuse: %s"),
> -			  reuse_packfile->pack_name);
> +/*
> + * Binary search to find the chunk that "where" is in. Note
> + * that we're not looking for an exact match, just the first
> + * chunk that contains it (which implicitly ends at the start
> + * of the next chunk.
> + */
> +static off_t find_reused_offset(off_t where)
> +{
> +	int lo = 0, hi = reused_chunks_nr;
> +	while (lo < hi) {
> +		int mi = lo + ((hi - lo) / 2);
> +		if (where == reused_chunks[mi].start)
> +			return reused_chunks[mi].offset;
> +		if (where < reused_chunks[mi].start)
> +			hi = mi;
> +		else
> +			lo = mi + 1;
> +	}
>  
> -	if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1)
> -		die_errno(_("unable to seek in reused packfile"));
> +	/*
> +	 * The first chunk starts at zero, so we can't have gone below
> +	 * there.
> +	 */
> +	assert(lo);
> +	return reused_chunks[lo-1].offset;
> +}

This comment has nothing to do with the change, but the way the
patch is presented is quite hard to follow, in that the preimage or
the common context lines do not help understand what the new code is
doing at all ;-)

I'll come back to the remainder of the patch later.  Thanks.

  reply	other threads:[~2019-09-13 22:29 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-13 13:02 [RFC PATCH 00/10] Rewrite packfile reuse code Christian Couder
2019-09-13 13:02 ` [RFC PATCH 01/10] builtin/pack-objects: report reused packfile objects Christian Couder
2019-09-13 13:02 ` [RFC PATCH 02/10] packfile: expose get_delta_base() Christian Couder
2019-09-13 13:02 ` [RFC PATCH 03/10] ewah/bitmap: introduce bitmap_word_alloc() Christian Couder
2019-09-13 13:02 ` [RFC PATCH 04/10] ewah/bitmap: always allocate 2 more words Christian Couder
2019-10-10 23:40   ` Jonathan Tan
2019-10-11  7:49     ` Christian Couder
2019-10-11 18:05       ` Jeff King
2019-09-13 13:02 ` [RFC PATCH 05/10] pack-bitmap: don't rely on bitmap_git->reuse_objects Christian Couder
2019-10-10 23:44   ` Jonathan Tan
2019-10-11  7:50     ` Christian Couder
2019-09-13 13:02 ` [RFC PATCH 06/10] pack-bitmap: introduce bitmap_walk_contains() Christian Couder
2019-09-13 13:02 ` [RFC PATCH 07/10] csum-file: introduce hashfile_total() Christian Couder
2019-09-13 13:02 ` [RFC PATCH 08/10] pack-objects: introduce pack.allowPackReuse Christian Couder
2019-09-13 21:37   ` Junio C Hamano
2019-09-13 13:02 ` [RFC PATCH 09/10] builtin/pack-objects: introduce obj_is_packed() Christian Couder
2019-09-13 13:02 ` [RFC PATCH 10/10] pack-objects: improve partial packfile reuse Christian Couder
2019-09-13 22:29   ` Junio C Hamano [this message]
2019-09-14  2:02     ` Jeff King
2019-09-14  3:06       ` Junio C Hamano
2019-10-02 15:57         ` Jeff King
2019-10-03  2:06           ` Junio C Hamano
2019-10-03  6:55             ` Christian Couder
2019-10-10 23:59   ` Jonathan Tan
2019-10-11  7:39     ` Christian Couder
2019-10-11 18:01     ` Jeff King
2019-10-11 21:04       ` Jonathan Tan
2019-10-12  0:04       ` Junio C Hamano
2019-10-13  7:38         ` Jeff King
2019-10-17  7:03           ` Junio C Hamano
2019-10-17  7:23             ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq7e6bde4z.fsf@gitster-ct.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=chriscool@tuxfamily.org \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).