git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: John Cai via GitGitGadget <gitgitgadget@gmail.com>,
	git@vger.kernel.org, John Cai <johncai86@gmail.com>
Subject: Re: [PATCH] promisor-remote.c: use oidset for deduplication
Date: Fri, 14 Jan 2022 11:14:56 -0800	[thread overview]
Message-ID: <xmqqzgnyb03z.fsf@gitster.g> (raw)
In-Reply-To: <220114.86wnj2se41.gmgdl@evledraar.gmail.com> ("Ævar Arnfjörð Bjarmason"'s message of "Fri, 14 Jan 2022 13:11:57 +0100")

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> From what I understand of GGG I think you updated only the summary on
> the PR but not the commit itself, the latter is what would go into
> git.git. Here the commit should be updated so we get this message.

Ah, I have long learned to ignore the blurb after the three-dash
line when a patch comes via GGG (because the PR text is often
useless and most often people seem to use text identical to the
commit log message).  I didn't realize that the proposed commit log
message was useless in this particular patch.  I was wondering why
you were talking about an extra blank line after the title there ;-)

> The part after the "---" is usually just used in this project for ad-hoc
> list-only comment.

True.  "What I'm not sure about is ..." however does belong to the
list-only comment, though.  If it is not 

> The difference between these two APIs is thaht oidset is hash-backed,
> and you'd insert into it and we de-duplicate any duplicate OIDs on-the-fly.
> 
> The oid_array is just an realloc()'d "struct object_id *oid". On
> insertion you can insert duplicates, but it has the ability to track
> "I've sorted these", and "let's iterate over this sorted, and de-dup any
> duplicates".
>
> We have the two APIs for a reason, but I don't know in any of these
> cases whether this change is safe.
>
> Does e.g. index-pack.c always receive de-duplicated OIDs and we were
> wasting CPU cycles using an oidset?
>
> Do some of these like pack-objects.c receive de-duplicated OIDs from
> e.g. "git repack" *now*, but we just lack test coverage to see that
> they're happy to get duplicate OIDs on stdin (e.g. manually from a
> user), and this would introduce a bug?

Also, if oid_array is used to produce a de-duplicated list of object
names in the current code, it is very likely that oid_array is
sorted (perhaps the objects are fed in sorted order), and the
callers depend on the order of the objects they find in the array.
Throwing sorted list of object names at oidset and then iterating
over what is in the oidset would likely to destroy the original
ordering.  I do not offhand know if the callers are broken by such a
change (either correctness-wise or performance-wise).

> But most importantly is it worth it? What's the rationale for the
> change? Less CPU/memory use? Getting e.g. "hyperfine" or "/usr/bin/time
> -v" output for those (if so) would be valuable.

Thanks.

  reply	other threads:[~2022-01-14 19:15 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-13 20:32 [PATCH] promisor-remote.c: use oidset for deduplication John Cai via GitGitGadget
2022-01-13 23:45 ` Junio C Hamano
2022-01-14 12:11 ` Ævar Arnfjörð Bjarmason
2022-01-14 19:14   ` Junio C Hamano [this message]
2022-01-14 23:12     ` Junio C Hamano
2022-01-24 22:55   ` John Cai
2022-01-25 19:17     ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqzgnyb03z.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=johncai86@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).