git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Philip Oakley <philipoakley@iee.email>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Jeff King <peff@peff.net>, Erik Faye-Lund <kusmabite@gmail.com>,
	Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: [RFC PATCH 00/10] range-diff: fix segfault due to integer overflow
Date: Fri, 24 Dec 2021 18:31:50 +0000	[thread overview]
Message-ID: <54e16578-93f3-33e7-bd31-8909992e3407@iee.email> (raw)
In-Reply-To: <211224.86zgoqgd7q.gmgdl@evledraar.gmail.com>

On 24/12/2021 16:46, Ævar Arnfjörð Bjarmason wrote:
> On Fri, Dec 24 2021, Philip Oakley wrote:
>
>> On 21/12/2021 23:36, Ævar Arnfjörð Bjarmason wrote:
>>> On Tue, Dec 21 2021, Philip Oakley wrote:
>>>
>>>> Sorry for the late comment..
>>>>
>>>> On 10/12/2021 14:31, Johannes Schindelin wrote:
>>>>> Hi Ævar,
>>>>>
>>>>> On Thu, 9 Dec 2021, Ævar Arnfjörð Bjarmason wrote:
>>>>>
>>>>>> The difference between "master" and "git-for-windows/main" is large
>>>>>> enough that comparing the two will segfault on my system. This is
>>>>>> because the range-diff code does some expensive calculations and will
>>>>>> overflow the "int" type.
>>>>> You are holding this thing wrong.
>>>>>
>>>>> The `main` branch of Git for Windows uses merging rebases, therefore you
>>>>> need to use a commit range like
>>>>> `git-for-windows/main^{/^Start.the.merging}..git-for-windows/main` and
>>>>> compare it to `git-for-windows/main..master`.
>>>> I'm not sure that a Git repo has an established way of indicating to how
>>>> it's branching/merging/releasing workflow is set up, especially for
>>>> projects with non-normative use cases, such as Git for Windows. We don't
>>>> have a git document for covering  the different workflows in common use
>>>> for easy reference and consistent terminology.
>>>>
>>>> The merging rebase flow, with 'fake' merge does solve a problem that
>>>> git.git doesn't have but could easily be a common process for 'friendly
>>>> forks' that follow an upstream with local patches. The choice of
>>>> '{/^Start.the.merging}' is currently specific to the Git-for-Windows
>>>> case making it harder to discover this useful maintainer method.
>>> Yes, but let's not get lost in the weeds here. As I noted I just picked
>>> GFW as a handy example of a large history & that command as a handy
>>> example of something that segfaults on "master".
>> Had you already experienced the segfault locally, without using the GFW
>> example? How many commits were present in that case?
> Yes, I ran into it "orginally" just range-diffing as part of a local
> build process.
>
> I could dig up what revision range it was exactly, but does it matter?
No the particular range-diff doesn't matter, I was checking that this
wasn't a confusion about the Git for Windows workflow.

>
>> The GFW example seems like it's taken the discussion in the wrong direction.
>>
>> For me:
>> $ git log git/master..origin/main --pretty=oneline | wc -l
>> 62105
>>
>> That's a lot of commits to have in a range diff. It's almost as big as
>> the whole of git/master
>>
>> $ git log git/master --pretty=oneline | wc -l
>> 65400
>>
>> Personally I'd like a way of trimming 'deadheads' that's a bit easier
>> that needing to remember Dscho's magic string [1], but time will tell.
> There are some repos that move forward by 500-1k commits/day, and people
> do cherry-pick patches etc. So wanting to range-diff after a couple of
> months is something you might do...

It feels to me that in such cases that maybe the algorithm may need
tweaking for the needle in a haystack case ;-)
>
>>> So the point really isn't to say that we should fix range-diff becase
>>> it'll allow us to run this practically useful command on a git.git fork.
>>>
>>>> I fully agree that the range-diff should probably have a patch limit at
>>>> some sensible value.
>>> Why would it? If I'm willing to spend the CPU to produce a range-diff of
>>> an absurdly large range and I've got the memory why shouldn't we support
>>> it?
>> There will always be a limit somewhere, and if it's not commit count or
>> other easily explained & checked limit it will be hard to rationalise
>> about why Git suddenly fails with an error (or segfault) in those
>> humungous case.
> I think it's fairly easy to explain the "your system wouldn't let us
> malloc more, we're dying" that we get from xmalloc(), st_*() and the
> like.
That's better than a segfault, but does it give actionable information
to the user as to what (& how much) they should do? Hence the comment
about a commit count measure.

>>> We don't in cases like xdiff where it's not trivial to just raise the
>>> limits, but here it seems relatively easy.
>>>
>>> I think limits to save users from spending CPU time they didn't expect
>>> are reasonable, but then we can handle them like the diff/merge rename
>>> detection limits, i.e. print a warning/advice, and allow the user to
>>> opt-out.
>>>
>>> That also doesn't really apply here since "diff/merge" will/might still
>>> do something useful in those scenarios, whereas range-diff would just
>>> have truncated output.
>>>
>>>> The 'confusion' between the types size_t, long and int, does ripple
>>>> through a lot of portable code, as shown in the series. Not an easy problem.
>>> Yes, although here we're not just casting and overflowing types, but
>>> overflowing on multiplication and addition, whereas usually we'd just
>>> overflow on "nr" being too big for "int" or similar.
>> I've been very slowly looking at the `long` limits on GFW which have
>> very similar arithmetic issues for pointers, often with no clear answers.
> Right, that's to do with the whole "long" or whatever use in the
> object.c and related code, but I don't think that's applicable here, is
> it?

That question was more about the policy aspects of ensuring that any
proposals aren't 'head against a brick wall' when it comes to the
potential intrusiveness.

Thanks for the clarifications.

Philip


      reply	other threads:[~2021-12-24 18:31 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-09 19:19 [RFC PATCH 00/10] range-diff: fix segfault due to integer overflow Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 01/10] string-list API: change "nr" and "alloc" to "size_t" Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 02/10] range-diff.c: don't use st_mult() for signed "int" Ævar Arnfjörð Bjarmason
2021-12-10  3:39   ` Jeff King
2021-12-10 10:22     ` Ævar Arnfjörð Bjarmason
2021-12-10 11:41       ` Jeff King
2021-12-10 12:31         ` Ævar Arnfjörð Bjarmason
2021-12-10 19:24           ` Phillip Wood
2021-12-14 14:34           ` Jeff King
2021-12-10 14:27         ` Johannes Schindelin
2021-12-10 14:58           ` Ævar Arnfjörð Bjarmason
2021-12-11 14:01             ` Johannes Schindelin
2021-12-12 17:44               ` Ævar Arnfjörð Bjarmason
2021-12-14 14:42           ` Jeff King
2021-12-09 19:19 ` [RFC PATCH 03/10] range-diff.c: use "size_t" to refer to "struct string_list"'s "nr" Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 04/10] range-diff: zero out elements in "cost" first Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 05/10] linear-assignment.c: split up compute_assignment() function Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 06/10] linear-assignment.c: take "size_t", not "int" for *_count Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 07/10] linear-assignment.c: convert a macro to a "static inline" function Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 08/10] linear-assignment.c: detect signed add/mul on GCC and Clang Ævar Arnfjörð Bjarmason
2021-12-10  3:56   ` Jeff King
2021-12-09 19:19 ` [RFC PATCH 09/10] linear-assignment.c: add and use intprops.h from Gnulib Ævar Arnfjörð Bjarmason
2021-12-09 19:19 ` [RFC PATCH 10/10] linear-assignment.c: use "intmax_t" instead of "int" Ævar Arnfjörð Bjarmason
2021-12-10  4:00   ` Jeff King
2021-12-10 12:30 ` [RFC PATCH v2 0/5] range-diff: fix segfault due to integer overflow Ævar Arnfjörð Bjarmason
2021-12-10 12:30   ` [RFC PATCH v2 1/5] range-diff: zero out elements in "cost" first Ævar Arnfjörð Bjarmason
2021-12-14 13:36     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 2/5] linear-assignment.c: split up compute_assignment() function Ævar Arnfjörð Bjarmason
2021-12-14 13:39     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 3/5] linear-assignment.c: take "size_t", not "int" for *_count Ævar Arnfjörð Bjarmason
2021-12-14 13:40     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 4/5] range-diff.c: rename "n" to "column_count" in get_correspondences() Ævar Arnfjörð Bjarmason
2021-12-14 13:42     ` Jeff King
2021-12-10 12:30   ` [RFC PATCH v2 5/5] range-diff: fix integer overflow & segfault on cost[i + n * j] Ævar Arnfjörð Bjarmason
2021-12-14 14:04     ` Jeff King
2021-12-10 14:31 ` [RFC PATCH 00/10] range-diff: fix segfault due to integer overflow Johannes Schindelin
2021-12-10 15:07   ` Ævar Arnfjörð Bjarmason
2021-12-21 23:22   ` Philip Oakley
2021-12-21 23:36     ` Ævar Arnfjörð Bjarmason
2021-12-22 20:50       ` Johannes Schindelin
2021-12-22 21:11         ` Jeff King
2021-12-24 11:15       ` Philip Oakley
2021-12-24 16:46         ` Ævar Arnfjörð Bjarmason
2021-12-24 18:31           ` Philip Oakley [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54e16578-93f3-33e7-bd31-8909992e3407@iee.email \
    --to=philipoakley@iee.email \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=kusmabite@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).