From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS53758 23.128.96.0/24 X-Spam-Status: No, score=-3.9 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED, SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by dcvr.yhbt.net (Postfix) with ESMTP id B145E1F8C6 for ; Fri, 3 Sep 2021 11:06:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1348628AbhICLHH (ORCPT ); Fri, 3 Sep 2021 07:07:07 -0400 Received: from cloud.peff.net ([104.130.231.41]:38616 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235007AbhICLHG (ORCPT ); Fri, 3 Sep 2021 07:07:06 -0400 Received: (qmail 27065 invoked by uid 109); 3 Sep 2021 11:06:06 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Fri, 03 Sep 2021 11:06:06 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 7465 invoked by uid 111); 3 Sep 2021 11:06:05 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 03 Sep 2021 07:06:05 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 3 Sep 2021 07:06:05 -0400 From: Jeff King To: =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason Cc: Junio C Hamano , git@vger.kernel.org, Emily Shaffer , Derrick Stolee Subject: Re: [PATCH v3 2/5] strvec: add a strvec_pushvec() Message-ID: References: <87v93i8svd.fsf@evledraar.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <87v93i8svd.fsf@evledraar.gmail.com> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Fri, Sep 03, 2021 at 01:19:33AM +0200, Ævar Arnfjörð Bjarmason wrote: > > One thing that did surprise me: the use of "int" here for iterating, > > rather than size_t. But it seems that strvec is already storing ints, > > which is an accident! > > Is it really? If you temporarily try to say convert that to "size_t *nr" > to have the compiler catch all the cases where we use "nr", and then > s/size_t/int/g those all, you'll find that e.g. setup_revisions() and > the like expect to take either "int argc" or the strvec equivalent. It most definitely is an accident, in the sense that the commit which changed it did not mean to. :) > We can sensibly convert some of those to size_t, but not all, and the > int v.s. size_t inconsistency as a result feels weird. > > Since the main point of this API is to be a wrapper for what a C main() > would take, shouldn't its prototype mirror its prototype? I.e. we should > stick to "int" here? That isn't the main point of this API. The reason the name changed away from argv_array was so that it would be more obviously a generally useful array of strings. But even if that were the use, the point isn't that we expect to see 2B or even 4B strings in any reasonable use case. The point is that we don't want to accidentally overflow the integer counter, leading to reads and writes of random bits of memory. And all it takes to do that is some code like: while (strbuf_readline(stdin, &buf)) strvec_push(&foo, buf.buf); On a system with 32-bit size_t, you are not likely to actually allocate 2B strings before you'd fail. But on most 64-bit systems, you have plenty of address space (and these days, often RAM) to do so, but you still only need 2B strings, because "int" is 32 bits. So code like setup_revisions() which uses an int is fine. It is reading from a source with that much smaller "int" limit in the first place. Whether strvec can handle bigger arrays or not does not matter either way. And when it later uses ints to operate on such a strvec, it's OK in practice; even if the size_t gets truncated on the fly to an int, we know we did not put more than an int's worth of items into the array in the first place. But code which has potentially larger inputs (either because it reads iteratively into the strvec as above, or because it is itself using a larger data type) is now also OK, because it's using size_t. What's not OK is code which operates on a potentially-unbounded strvec using an int. E.g., following the code I showed above with something like: int nr = foo.nr; foo.v[nr] = xstrdup("replacement string"); which may write to memory outside of foo.v. Obviously that's nonsense nobody would ever write directly. It is possible for some code to do this inadvertently (say, passing a ptr/int pair through a function interface), but I think it's fairly unlikely in practice. So while I do think more code ought to be using size_t in general, I don't think it's necessary to audit this carefully. The important thing to me is protecting strvec itself. -Peff