git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: gitgitgadget@gmail.com
Cc: git@vger.kernel.org, sbeller@google.com, peff@peff.net,
	jrnieder@gmail.com, avarab@gmail.com, gitster@pobox.com,
	dstolee@microsoft.com, Jonathan Tan <jonathantanmy@google.com>
Subject: Re: [PATCH v3 5/9] midx: refactor permutation logic and pack sorting
Date: Wed, 23 Jan 2019 13:00:54 -0800	[thread overview]
Message-ID: <20190123210054.118647-1-jonathantanmy@google.com> (raw)
In-Reply-To: <a0d4cc6cb3fbca0de732dfd34cbec4d765b55228.1547047269.git.gitgitgadget@gmail.com>

Following Stolee's wishes [1], I'll stick to the technical aspects here.
Patches 1-4 look correct technically to me, so let me start here.

[1] https://public-inbox.org/git/3aa0a7ea-6c30-2c61-0815-2b9ab8304564@gmail.com/

> +struct pack_info {
> +	uint32_t orig_pack_int_id;
> +	char *pack_name;
> +	struct packed_git *p;
> +};
> +
> +static int pack_info_compare(const void *_a, const void *_b)
> +{
> +	struct pack_info *a = (struct pack_info *)_a;
> +	struct pack_info *b = (struct pack_info *)_b;
> +	return strcmp(a->pack_name, b->pack_name);
> +}
> +
>  struct pack_list {
> -	struct packed_git **list;
> -	char **names;
> +	struct pack_info *info;
>  	uint32_t nr;
> -	uint32_t alloc_list;
> -	uint32_t alloc_names;
> +	uint32_t alloc;
>  	struct multi_pack_index *m;
>  };

"list" and "names" look like a parallel array, but as far as I can tell,
they are actually not, because "names" is eventually sorted but not
"list". So combining these two arrays is great, and removes a lot of
confusion.

Now the packed_git list will be sorted too when the names are sorted; I
was surprised that code did not need to be changed to take this into
account, but I see now that it's because the struct packed_gits don't
need to be referenced anymore after get_sorted_entries().

> -struct pack_pair {
> -	uint32_t pack_int_id;
> -	char *pack_name;
> -};
> -
> -static int pack_pair_compare(const void *_a, const void *_b)
> -{
> -	struct pack_pair *a = (struct pack_pair *)_a;
> -	struct pack_pair *b = (struct pack_pair *)_b;
> -	return strcmp(a->pack_name, b->pack_name);
> -}
> -
> -static void sort_packs_by_name(char **pack_names, uint32_t nr_packs, uint32_t *perm)
> -{
> -	uint32_t i;
> -	struct pack_pair *pairs;
> -
> -	ALLOC_ARRAY(pairs, nr_packs);
> -
> -	for (i = 0; i < nr_packs; i++) {
> -		pairs[i].pack_int_id = i;
> -		pairs[i].pack_name = pack_names[i];
> -	}
> -
> -	QSORT(pairs, nr_packs, pack_pair_compare);
> -
> -	for (i = 0; i < nr_packs; i++) {
> -		pack_names[i] = pairs[i].pack_name;
> -		perm[pairs[i].pack_int_id] = i;
> -	}
> -
> -	free(pairs);
> -}

It's nice that this gets removed.

>  static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
> -				      uint32_t *pack_perm,
>  				      struct pack_midx_entry *e,
>  				      uint32_t pos)
>  {
> @@ -488,7 +464,7 @@ static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
>  		return 1;
>  
>  	nth_midxed_object_oid(&e->oid, m, pos);
> -	e->pack_int_id = pack_perm[nth_midxed_pack_int_id(m, pos)];
> +	e->pack_int_id = nth_midxed_pack_int_id(m, pos);
>  	e->offset = nth_midxed_offset(m, pos);

nth_midxed_pack_midx_entry() is only called from get_sorted_entries(),
which is now called *before* any sorting of pack_info, so there is no
longer any need for considering pack_perm. As to why
get_sorted_entries() needs to be called before sorting of pack_info and
not after, that question will be answered later.

> @@ -522,8 +498,7 @@ static void fill_pack_entry(uint32_t pack_int_id,
>   * of a packfile containing the object).
>   */
>  static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
> -						  struct packed_git **p,
> -						  uint32_t *perm,
> +						  struct pack_info *info,
>  						  uint32_t nr_packs,
>  						  uint32_t *nr_objects)

[snip rest of get_sorted_entries]

As stated above, get_sorted_entries() is now called before any sorting
of pack_info, so there is no need for perm.

>  static size_t write_midx_object_offsets(struct hashfile *f, int large_offset_needed,
> +					uint32_t *perm,
>  					struct pack_midx_entry *objects, uint32_t nr_objects)
>  {
>  	struct pack_midx_entry *list = objects;
> @@ -699,7 +675,7 @@ static size_t write_midx_object_offsets(struct hashfile *f, int large_offset_nee
>  	for (i = 0; i < nr_objects; i++) {
>  		struct pack_midx_entry *obj = list++;
>  
> -		hashwrite_be32(f, obj->pack_int_id);
> +		hashwrite_be32(f, perm[obj->pack_int_id]);
>  
>  		if (large_offset_needed && obj->offset >> 31)
>  			hashwrite_be32(f, MIDX_LARGE_OFFSET_NEEDED | nr_large_offset++);

This was unexpected to me - why is perm introduced here when it wasn't
needed before? This is because get_sorted_entries() is now called before
any sorting of pack_info. get_sorted_entries() generates entries using
the old pack_int_id, but the resulting packfile is written using new
pack numbers, so when writing the entries to the file, they must be
written using the new numbers.

The mapping of perm is: perm[old number] = new number.

> @@ -795,10 +768,7 @@ int write_midx_file(const char *object_dir)
>  	if (packs.m && packs.nr == packs.m->num_packs)
>  		goto cleanup;
>  
> -	ALLOC_ARRAY(pack_perm, packs.nr);
> -	sort_packs_by_name(packs.names, packs.nr, pack_perm);
> -
> -	entries = get_sorted_entries(packs.m, packs.list, pack_perm, packs.nr, &nr_entries);
> +	entries = get_sorted_entries(packs.m, packs.info, packs.nr, &nr_entries);
>  
>  	for (i = 0; i < nr_entries; i++) {
>  		if (entries[i].offset > 0x7fffffff)
> @@ -807,8 +777,15 @@ int write_midx_file(const char *object_dir)
>  			large_offsets_needed = 1;
>  	}
>  
> +	QSORT(packs.info, packs.nr, pack_info_compare);
> +
> +	ALLOC_ARRAY(pack_perm, packs.nr);
> +	for (i = 0; i < packs.nr; i++) {
> +		pack_perm[packs.info[i].orig_pack_int_id] = i;
> +	}
> +
>  	for (i = 0; i < packs.nr; i++)
> -		pack_name_concat_len += strlen(packs.names[i]) + 1;
> +		pack_name_concat_len += strlen(packs.info[i].pack_name) + 1;
>  
>  	if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
>  		pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -

Indeed, the sorting of pack_info is moved to after get_sorted_entries().
Also, pack_perm[old number] = new number, as expected.

I think a comment explaining why the perm is needed would be helpful -
something explaining that the entries were generated using the old pack
numbers, so we need this mapping to be able to write them using the new
numbers.

As part of this review, I did attempt to make everything use the sorted
pack_info order, but failed. I got as far as making get_sorted_entries()
always use start_pack=0 and skip the pack_infos that didn't have the p
pointer (to match the existing behavior of get_sorted_entries() that
only operates on new pack_infos being added by add_pack_to_midx), but it
still didn't work, and I didn't investigate further.

  reply	other threads:[~2019-01-23 21:01 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-10 18:06 [PATCH 0/5] Create 'expire' and 'repack' verbs for git-multi-pack-index Derrick Stolee via GitGitGadget
2018-12-10 18:06 ` [PATCH 1/5] multi-pack-index: prepare for 'expire' verb Derrick Stolee via GitGitGadget
2018-12-11  1:35   ` Stefan Beller
2018-12-11  1:59     ` SZEDER Gábor
2018-12-11 12:32       ` Derrick Stolee
2018-12-10 18:06 ` [PATCH 2/5] midx: refactor permutation logic Derrick Stolee via GitGitGadget
2018-12-10 18:06 ` [PATCH 3/5] multi-pack-index: implement 'expire' verb Derrick Stolee via GitGitGadget
2018-12-10 18:06 ` [PATCH 4/5] multi-pack-index: prepare 'repack' verb Derrick Stolee via GitGitGadget
2018-12-11  1:54   ` Stefan Beller
2018-12-11 12:45     ` Derrick Stolee
2018-12-10 18:06 ` [PATCH 5/5] midx: implement midx_repack() Derrick Stolee via GitGitGadget
2018-12-11  2:32   ` Stefan Beller
2018-12-11 13:00     ` Derrick Stolee
2018-12-12  7:40   ` Junio C Hamano
2018-12-13  4:23     ` Junio C Hamano
2018-12-21 16:28 ` [PATCH v2 0/7] Create 'expire' and 'repack' verbs for git-multi-pack-index Derrick Stolee via GitGitGadget
2018-12-21 16:28   ` [PATCH v2 1/7] repack: refactor pack deletion for future use Derrick Stolee via GitGitGadget
2018-12-21 16:28   ` [PATCH v2 2/7] Docs: rearrange subcommands for multi-pack-index Derrick Stolee via GitGitGadget
2018-12-21 16:28   ` [PATCH v2 3/7] multi-pack-index: prepare for 'expire' subcommand Derrick Stolee via GitGitGadget
2018-12-21 16:28   ` [PATCH v2 4/7] midx: refactor permutation logic Derrick Stolee via GitGitGadget
2018-12-21 16:28   ` [PATCH v2 5/7] multi-pack-index: implement 'expire' verb Derrick Stolee via GitGitGadget
2018-12-21 16:28   ` [PATCH v2 7/7] midx: implement midx_repack() Derrick Stolee via GitGitGadget
2018-12-21 16:28   ` [PATCH v2 6/7] multi-pack-index: prepare 'repack' subcommand Derrick Stolee via GitGitGadget
2019-01-09 15:21   ` [PATCH v3 0/9] Create 'expire' and 'repack' verbs for git-multi-pack-index Derrick Stolee via GitGitGadget
2019-01-09 15:21     ` [PATCH v3 1/9] repack: refactor pack deletion for future use Derrick Stolee via GitGitGadget
2019-01-09 15:21     ` [PATCH v3 2/9] Docs: rearrange subcommands for multi-pack-index Derrick Stolee via GitGitGadget
2019-01-09 15:21     ` [PATCH v3 3/9] multi-pack-index: prepare for 'expire' subcommand Derrick Stolee via GitGitGadget
2019-01-09 15:21     ` [PATCH v3 4/9] midx: simplify computation of pack name lengths Derrick Stolee via GitGitGadget
2019-01-09 15:21     ` [PATCH v3 5/9] midx: refactor permutation logic and pack sorting Derrick Stolee via GitGitGadget
2019-01-23 21:00       ` Jonathan Tan [this message]
2019-01-24 17:34         ` Derrick Stolee
2019-01-24 19:17           ` Derrick Stolee
2019-01-09 15:21     ` [PATCH v3 6/9] multi-pack-index: implement 'expire' verb Derrick Stolee via GitGitGadget
2019-01-09 15:54       ` SZEDER Gábor
2019-01-10 18:05         ` Junio C Hamano
2019-01-23 22:13       ` Jonathan Tan
2019-01-24 17:36         ` Derrick Stolee
2019-01-09 15:21     ` [PATCH v3 7/9] multi-pack-index: prepare 'repack' subcommand Derrick Stolee via GitGitGadget
2019-01-09 15:56       ` SZEDER Gábor
2019-01-23 22:38       ` Jonathan Tan
2019-01-24 19:36         ` Derrick Stolee
2019-01-24 21:38           ` Jonathan Tan
2019-01-09 15:21     ` [PATCH v3 8/9] midx: implement midx_repack() Derrick Stolee via GitGitGadget
2019-01-23 22:33       ` Jonathan Tan
2019-01-09 15:21     ` [PATCH v3 9/9] multi-pack-index: test expire while adding packs Derrick Stolee via GitGitGadget
2019-01-17 15:27     ` [PATCH v3 0/9] Create 'expire' and 'repack' verbs for git-multi-pack-index Derrick Stolee
2019-01-23 22:44     ` Jonathan Tan
2019-01-24 21:51     ` [PATCH v4 00/10] " Derrick Stolee via GitGitGadget
2019-01-24 21:51       ` [PATCH v4 01/10] repack: refactor pack deletion for future use Derrick Stolee via GitGitGadget
2019-01-24 21:51       ` [PATCH v4 02/10] Docs: rearrange subcommands for multi-pack-index Derrick Stolee via GitGitGadget
2019-01-24 21:51       ` [PATCH v4 03/10] multi-pack-index: prepare for 'expire' subcommand Derrick Stolee via GitGitGadget
2019-01-24 21:51       ` [PATCH v4 04/10] midx: simplify computation of pack name lengths Derrick Stolee via GitGitGadget
2019-01-24 21:51       ` [PATCH v4 05/10] midx: refactor permutation logic and pack sorting Derrick Stolee via GitGitGadget
2019-01-24 21:51       ` [PATCH v4 06/10] multi-pack-index: implement 'expire' subcommand Derrick Stolee via GitGitGadget
2019-01-24 21:51       ` [PATCH v4 07/10] multi-pack-index: prepare 'repack' subcommand Derrick Stolee via GitGitGadget
2019-01-25 23:24         ` Josh Steadmon
2019-01-24 21:52       ` [PATCH v4 08/10] midx: implement midx_repack() Derrick Stolee via GitGitGadget
2019-01-26 17:10         ` Derrick Stolee
2019-01-27 22:50           ` Junio C Hamano
2019-01-24 21:52       ` [PATCH v4 09/10] multi-pack-index: test expire while adding packs Derrick Stolee via GitGitGadget
2019-01-24 21:52       ` [PATCH v4 10/10] midx: add test that 'expire' respects .keep files Derrick Stolee via GitGitGadget
2019-01-24 22:14       ` [PATCH v4 00/10] Create 'expire' and 'repack' verbs for git-multi-pack-index Jonathan Tan
2019-01-25 23:49       ` Josh Steadmon
2019-04-24 15:14       ` [PATCH v5 00/11] " Derrick Stolee
2019-04-24 15:14         ` [PATCH v5 01/11] repack: refactor pack deletion for future use Derrick Stolee
2019-04-24 15:14         ` [PATCH v5 02/11] Docs: rearrange subcommands for multi-pack-index Derrick Stolee
2019-04-24 15:14         ` [PATCH v5 03/11] multi-pack-index: prepare for 'expire' subcommand Derrick Stolee
2019-04-24 15:14         ` [PATCH v5 04/11] midx: simplify computation of pack name lengths Derrick Stolee
2019-04-24 15:14         ` [PATCH v5 05/11] midx: refactor permutation logic and pack sorting Derrick Stolee
2019-04-24 15:14         ` [PATCH v5 06/11] multi-pack-index: implement 'expire' subcommand Derrick Stolee
2019-04-24 15:14         ` [PATCH v5 07/11] multi-pack-index: prepare 'repack' subcommand Derrick Stolee
2019-04-24 15:14         ` [PATCH v5 08/11] midx: implement midx_repack() Derrick Stolee
2019-04-24 15:14         ` [PATCH v5 09/11] multi-pack-index: test expire while adding packs Derrick Stolee
2019-04-24 15:14         ` [PATCH v5 10/11] midx: add test that 'expire' respects .keep files Derrick Stolee
2019-04-24 15:14         ` [PATCH v5 11/11] t5319-multi-pack-index.sh: test batch size zero Derrick Stolee
2019-04-25  5:38         ` [PATCH v5 00/11] Create 'expire' and 'repack' verbs for git-multi-pack-index Junio C Hamano
2019-04-25 11:06           ` Derrick Stolee
2019-05-14 18:47         ` [PATCH v6 " Derrick Stolee
2019-05-14 18:47           ` [PATCH v6 01/11] repack: refactor pack deletion for future use Derrick Stolee
2019-05-14 18:47           ` [PATCH v6 02/11] Docs: rearrange subcommands for multi-pack-index Derrick Stolee
2019-05-14 18:47           ` [PATCH v6 03/11] multi-pack-index: prepare for 'expire' subcommand Derrick Stolee
2019-05-14 18:47           ` [PATCH v6 04/11] midx: simplify computation of pack name lengths Derrick Stolee
2019-05-14 18:47           ` [PATCH v6 05/11] midx: refactor permutation logic and pack sorting Derrick Stolee
2019-05-14 18:47           ` [PATCH v6 06/11] multi-pack-index: implement 'expire' subcommand Derrick Stolee
2019-05-14 18:47           ` [PATCH v6 07/11] multi-pack-index: prepare 'repack' subcommand Derrick Stolee
2019-05-14 18:47           ` [PATCH v6 08/11] midx: implement midx_repack() Derrick Stolee
2019-05-14 18:47           ` [PATCH v6 09/11] multi-pack-index: test expire while adding packs Derrick Stolee
2019-05-14 18:47           ` [PATCH v6 10/11] midx: add test that 'expire' respects .keep files Derrick Stolee
2019-05-14 18:47           ` [PATCH v6 11/11] t5319-multi-pack-index.sh: test batch size zero Derrick Stolee
2019-06-10 14:15           ` [PATCH v6 00/11] Create 'expire' and 'repack' verbs for git-multi-pack-index Derrick Stolee
2019-06-10 17:31             ` Junio C Hamano
2019-06-10 17:57               ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190123210054.118647-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).