From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
Elijah Newren <newren@gmail.com>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v2 09/23] pseudo-merge: implement support for selecting pseudo-merge commits
Date: Mon, 6 May 2024 13:53:01 +0200 [thread overview]
Message-ID: <ZjjEnVukDDpuP6X6@tanuki> (raw)
In-Reply-To: <86a1e4b8b9be99563836d1539fbf2ed4c4a6920d.1714422410.git.me@ttaylorr.com>
[-- Attachment #1: Type: text/plain, Size: 5482 bytes --]
On Mon, Apr 29, 2024 at 04:43:37PM -0400, Taylor Blau wrote:
[snip]
> +static uint32_t pseudo_merge_group_size(const struct pseudo_merge_group *group,
> + const struct pseudo_merge_matches *matches,
> + uint32_t i)
> +{
> + float C = 0.0f;
> + uint32_t n;
> +
> + /*
> + * The size of pseudo-merge groups decays according to a power series,
> + * which looks like:
> + *
> + * f(n) = C * n^-k
> + *
> + * , where 'n' is the n-th pseudo-merge group, 'f(n)' is its size, 'k'
> + * is the decay rate, and 'C' is a scaling value.
> + *
> + * The value of C depends on the number of groups, decay rate, and total
> + * number of commits. It is computed such that if there are M and N
> + * total groups and commits, respectively, that:
> + *
> + * N = f(0) + f(1) + ... f(M-1)
> + *
> + * Rearranging to isolate C, we get:
> + *
> + * N = \sum_{n=1}^M C / n^k
> + *
> + * N / C = \sum_{n=1}^M n^-k
> + *
> + * C = N / \sum_{n=1}^M n^-k
> + *
> + * For example, if we have a decay rate of 'k' being equal to 1.5, 'N'
> + * total commits equal to 10,000, and 'M' being equal to 6 groups, then
> + * the (rounded) group sizes are:
> + *
> + * { 5469, 1934, 1053, 684, 489, 372 }
> + *
> + * increasing the number of total groups, say to 10, scales the group
> + * sizes appropriately:
> + *
> + * { 5012, 1772, 964, 626, 448, 341, 271, 221, 186, 158 }
> + */
> + for (n = 0; n < group->max_merges; n++)
> + C += 1.0f / gitexp(n + 1, group->decay);
> + C = matches->unstable_nr / C;
> +
> + return (int)((C / gitexp(i + 1, group->decay)) + 0.5);
Why do we cast the return to `int` when the function returns a
`uint32_t`?
> +}
> +
> +static void init_pseudo_merge_group(struct pseudo_merge_group *group)
Nit: Should't the name rather be `pseudo_merge_group_init()`?
[snip]
> + } else if (!strcmp(key, "decay")) {
> + group->decay = git_config_int(var, value, ctx->kvi);
> + if (group->decay < 0) {
> + warning(_("%s must be non-negative, using default"), var);
> + group->decay = DEFAULT_PSEUDO_MERGE_DECAY;
> + }
The decay is a float, and your decay rate examples mention a rate of
1.5f. It's impossible to specify fractional rates though because we use
`git_config_int()`. Should we introduce a new `git_config_float()`
function to implement this properly?
> + } else if (!strcmp(key, "samplerate")) {
> + group->sample_rate = git_config_int(var, value, ctx->kvi);
> + if (!(0 <= group->sample_rate && group->sample_rate <= 100)) {
> + warning(_("%s must be between 0 and 100, using default"), var);
> + group->sample_rate = DEFAULT_PSEUDO_MERGE_SAMPLE_RATE;
> + }
> + } else if (!strcmp(key, "threshold")) {
> + if (git_config_expiry_date(&group->threshold, var, value)) {
> + strbuf_release(&buf);
Instead of having multiple exit paths where we need to release `buf` we
should likely have a comment exit path.
[snip]
> +static struct commit *push_pseudo_merge(struct pseudo_merge_group *group)
> +{
> + struct commit *merge;
> +
> + ALLOC_GROW(group->merges, group->merges_nr + 1, group->merges_alloc);
> +
> + merge = alloc_commit_node(the_repository);
> + merge->object.parsed = 1;
Why can we mark the object as parsed here?
> + merge->object.flags |= BITMAP_PSEUDO_MERGE;
> +
> + group->merges[group->merges_nr++] = merge;
> +
> + return merge;
> +}
> +
> +static struct pseudo_merge_commit_idx *pseudo_merge_idx(kh_oid_map_t *pseudo_merge_commits,
> + const struct object_id *oid)
> +
> +{
> + struct pseudo_merge_commit_idx *pmc;
> + khiter_t hash_pos;
> +
> + hash_pos = kh_get_oid_map(pseudo_merge_commits, *oid);
> + if (hash_pos == kh_end(pseudo_merge_commits)) {
> + int hash_ret;
> + hash_pos = kh_put_oid_map(pseudo_merge_commits, *oid, &hash_ret);
> + CALLOC_ARRAY(pmc, 1);
> +
> + kh_value(pseudo_merge_commits, hash_pos) = pmc;
> + } else {
> + pmc = kh_value(pseudo_merge_commits, hash_pos);
> + }
> +
> + return pmc;
> +}
Can't we simplify this to the following (untested):
static struct pseudo_merge_commit_idx *pseudo_merge_idx(kh_oid_map_t *pseudo_merge_commits,
const struct object_id *oid)
{
struct pseudo_merge_commit_idx *pmc;
khiter_t hash_pos;
int hash_ret;
hash_pos = kh_put_oid_map(pseudo_merge_commits, *oid, &hash_ret);
if (hash_ret) {
CALLOC_ARRAY(pmc, 1);
kh_value(pseudo_merge_commits, hash_pos) = pmc;
} else {
pmc = kh_value(pseudo_merge_commits, hash_pos);
}
return pmc;
}
> +
> +#define MIN_PSEUDO_MERGE_SIZE 8
> +
> +static void select_pseudo_merges_1(struct pseudo_merge_group *group,
> + struct pseudo_merge_matches *matches,
> + kh_oid_map_t *pseudo_merge_commits,
> + uint32_t *pseudo_merges_nr)
> +{
> + uint32_t i, j;
> + uint32_t stable_merges_nr;
> +
> + if (!matches->stable_nr && !matches->unstable_nr)
> + return; /* all tips in this group already have bitmaps */
It's nice that there are some comments, but there are quite a lot of
non-obvious things going on in this function that would warrant an
explanation that expands a bit more into what exactly it is that we are
doing here.
I may only be speaking for myself, but I basically have no clue what we
do here :) Something something pseudo merges, I guess. But there is no
in-code explanation at all what a "stable" or "unstable" commit is, how
exactly we match commits and other higher-level ideas.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2024-05-06 11:53 UTC|newest]
Thread overview: 84+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
2024-03-20 22:05 ` [PATCH 01/24] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
2024-03-21 21:24 ` Junio C Hamano
2024-03-21 22:13 ` Taylor Blau
2024-03-21 22:22 ` Junio C Hamano
2024-03-20 22:05 ` [PATCH 02/24] config: repo_config_get_expiry() Taylor Blau
2024-04-10 17:54 ` Jeff King
2024-04-29 19:39 ` Taylor Blau
2024-03-20 22:05 ` [PATCH 03/24] ewah: implement `ewah_bitmap_is_subset()` Taylor Blau
2024-04-10 18:05 ` Jeff King
2024-04-29 19:47 ` Taylor Blau
2024-03-20 22:05 ` [PATCH 04/24] pack-bitmap: drop unused `max_bitmaps` parameter Taylor Blau
2024-04-10 18:06 ` Jeff King
2024-03-20 22:05 ` [PATCH 05/24] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
2024-04-10 18:10 ` Jeff King
2024-03-20 22:05 ` [PATCH 06/24] pseudo-merge.ch: initial commit Taylor Blau
2024-03-20 22:05 ` [PATCH 07/24] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
2024-03-20 22:05 ` [PATCH 08/24] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Taylor Blau
2024-03-20 22:05 ` [PATCH 09/24] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Taylor Blau
2024-03-20 22:05 ` [PATCH 10/24] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
2024-03-20 22:05 ` [PATCH 11/24] pack-bitmap-write.c: select " Taylor Blau
2024-03-20 22:05 ` [PATCH 12/24] pack-bitmap-write.c: write pseudo-merge table Taylor Blau
2024-03-20 22:05 ` [PATCH 13/24] pack-bitmap: extract `read_bitmap()` function Taylor Blau
2024-03-20 22:05 ` [PATCH 14/24] pseudo-merge: scaffolding for reads Taylor Blau
2024-03-20 22:05 ` [PATCH 15/24] pack-bitmap.c: read pseudo-merge extension Taylor Blau
2024-03-20 22:05 ` [PATCH 16/24] pseudo-merge: implement support for reading pseudo-merge commits Taylor Blau
2024-03-20 22:05 ` [PATCH 17/24] ewah: implement `ewah_bitmap_popcount()` Taylor Blau
2024-03-20 22:05 ` [PATCH 18/24] pack-bitmap: implement test helpers for pseudo-merge Taylor Blau
2024-03-20 22:05 ` [PATCH 19/24] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` Taylor Blau
2024-03-20 22:05 ` [PATCH 20/24] pack-bitmap.c: use pseudo-merges during traversal Taylor Blau
2024-03-20 22:06 ` [PATCH 21/24] pack-bitmap: extra trace2 information Taylor Blau
2024-03-20 22:06 ` [PATCH 22/24] ewah: `bitmap_equals_ewah()` Taylor Blau
2024-03-20 22:06 ` [PATCH 23/24] pseudo-merge: implement support for finding existing merges Taylor Blau
2024-03-20 22:06 ` [PATCH 24/24] t/perf: implement performace tests for pseudo-merge bitmaps Taylor Blau
2024-03-21 19:50 ` [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Junio C Hamano
2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
2024-04-29 20:42 ` [PATCH v2 01/23] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
2024-05-06 11:52 ` Patrick Steinhardt
2024-05-06 16:37 ` Taylor Blau
2024-05-10 11:46 ` Patrick Steinhardt
2024-05-13 19:47 ` Taylor Blau
2024-05-14 6:33 ` Patrick Steinhardt
2024-04-29 20:43 ` [PATCH v2 02/23] ewah: implement `ewah_bitmap_is_subset()` Taylor Blau
2024-04-29 20:43 ` [PATCH v2 03/23] pack-bitmap: drop unused `max_bitmaps` parameter Taylor Blau
2024-04-29 20:43 ` [PATCH v2 04/23] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
2024-05-06 11:52 ` Patrick Steinhardt
2024-05-06 18:24 ` Taylor Blau
2024-04-29 20:43 ` [PATCH v2 05/23] pseudo-merge.ch: initial commit Taylor Blau
2024-04-29 20:43 ` [PATCH v2 06/23] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
2024-05-06 11:52 ` Patrick Steinhardt
2024-05-06 18:48 ` Taylor Blau
2024-05-10 11:47 ` Patrick Steinhardt
2024-05-13 18:42 ` Jeff King
2024-05-13 20:19 ` Taylor Blau
2024-04-29 20:43 ` [PATCH v2 07/23] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Taylor Blau
2024-04-29 20:43 ` [PATCH v2 08/23] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Taylor Blau
2024-05-13 18:50 ` Jeff King
2024-05-14 0:54 ` Taylor Blau
2024-04-29 20:43 ` [PATCH v2 09/23] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
2024-05-06 11:53 ` Patrick Steinhardt [this message]
2024-05-06 19:58 ` Taylor Blau
2024-05-13 19:03 ` Jeff King
2024-05-14 0:58 ` Taylor Blau
2024-05-16 8:07 ` Jeff King
2024-05-16 22:43 ` Junio C Hamano
2024-04-29 20:43 ` [PATCH v2 10/23] pack-bitmap-write.c: select " Taylor Blau
2024-05-06 11:53 ` Patrick Steinhardt
2024-05-06 20:05 ` Taylor Blau
2024-05-10 11:47 ` Patrick Steinhardt
2024-04-29 20:43 ` [PATCH v2 11/23] pack-bitmap-write.c: write pseudo-merge table Taylor Blau
2024-04-29 20:43 ` [PATCH v2 12/23] pack-bitmap: extract `read_bitmap()` function Taylor Blau
2024-04-29 20:43 ` [PATCH v2 13/23] pseudo-merge: scaffolding for reads Taylor Blau
2024-04-29 20:43 ` [PATCH v2 14/23] pack-bitmap.c: read pseudo-merge extension Taylor Blau
2024-04-29 20:44 ` [PATCH v2 15/23] pseudo-merge: implement support for reading pseudo-merge commits Taylor Blau
2024-04-29 20:44 ` [PATCH v2 16/23] ewah: implement `ewah_bitmap_popcount()` Taylor Blau
2024-04-29 20:44 ` [PATCH v2 17/23] pack-bitmap: implement test helpers for pseudo-merge Taylor Blau
2024-04-29 20:44 ` [PATCH v2 18/23] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` Taylor Blau
2024-04-29 20:44 ` [PATCH v2 19/23] pack-bitmap.c: use pseudo-merges during traversal Taylor Blau
2024-04-29 20:44 ` [PATCH v2 20/23] pack-bitmap: extra trace2 information Taylor Blau
2024-04-29 20:44 ` [PATCH v2 21/23] ewah: `bitmap_equals_ewah()` Taylor Blau
2024-04-29 20:44 ` [PATCH v2 22/23] pseudo-merge: implement support for finding existing merges Taylor Blau
2024-04-29 20:44 ` [PATCH v2 23/23] t/perf: implement performace tests for pseudo-merge bitmaps Taylor Blau
2024-04-30 20:03 ` [PATCH v2 00/23] pack-bitmap: pseudo-merge reachability bitmaps Junio C Hamano
2024-05-01 14:40 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZjjEnVukDDpuP6X6@tanuki \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).