git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] fail compilation with strcpy
@ 2018-07-19 20:33 Jeff King
  2018-07-19 20:39 ` [PATCH 1/2] introduce "banned function" list Jeff King
                   ` (3 more replies)
  0 siblings, 4 replies; 42+ messages in thread
From: Jeff King @ 2018-07-19 20:33 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

This is a patch series to address the discussion in the thread at:

  https://public-inbox.org/git/20180713204350.GA16999@sigill.intra.peff.net/

Basically, the question was: can we declare strcpy banned and have a
linter save us the trouble of finding it in review. The answer is yes,
the compiler is good at that. ;)

There are probably as many lists of banned functions as there are coding
style documents. I don't agree with every entry in the ones I've seen.
And in many cases coccinelle is a better choice, because the problem is
not "this function is so bad your patch should not even make it to the
list with it", but "don't do it like A; we prefer to do it like B
instead". And coccinelle does the latter more flexibly and
automatically.

So I tried to pick some obvious and uncontroversial candidates here.
gets() could be another one, but it's mostly banned already (it's out of
the standard, and most libcs mark it with a deprecated attribute).

Note that this needs to be applied on top of 022d2ac1f3 (blame: prefer
xsnprintf to strcpy for colors, 2018-07-13) or it will complain loudly. :)

  [1/2]: introduce "banned function" list
  [2/2]: banned.h: mark strncpy as banned

 Documentation/CodingGuidelines |  3 +++
 banned.h                       | 20 ++++++++++++++++++++
 git-compat-util.h              |  2 ++
 3 files changed, 25 insertions(+)
 create mode 100644 banned.h

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 1/2] introduce "banned function" list
  2018-07-19 20:33 [PATCH 0/2] fail compilation with strcpy Jeff King
@ 2018-07-19 20:39 ` Jeff King
  2018-07-19 21:11   ` Eric Sunshine
                     ` (4 more replies)
  2018-07-19 20:39 ` [PATCH 2/2] banned.h: mark strncpy as banned Jeff King
                   ` (2 subsequent siblings)
  3 siblings, 5 replies; 42+ messages in thread
From: Jeff King @ 2018-07-19 20:39 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

There are a few standard C functions (like strcpy) which are
easy to misuse. We generally discourage these in reviews,
but we haven't put advice in CodingGuidelines, nor provided
any automated enforcement. The latter is especially
important because it's more consistent, and it can often
save a round-trip of review.

Let's start by banning strcpy() and sprintf(). It's not
impossible to use these correctly, but it's easy to do so
incorrectly, and there's always a better option.

For enforcement, we can rely on the compiler and just
introduce code which breaks compilation when it sees these
functions. This has a few advantages:

  1. We know it's run as part of a build cycle, so it's
     hard to ignore. Whereas an external linter is an extra
     step the developer needs to remember to do.

  2. Likewise, it's basically free since the compiler is
     parsing the code anyway.

  3. We know it's robust against false positives (unlike a
     grep-based linter).

The one big disadvantage is that it will only check code
that is actually compiled, so it may miss code that isn't
triggered on your particular system. But since presumably
people don't add new code without compiling it (and if they
do, the banned function list is the least of their worries),
we really only care about failing to clean up old code when
adding new functions to the list. And that's easy enough to
address with a manual audit when adding a new function
(which is what I did for the functions here).

That leaves only the question of how to trigger the
compilation error. The goals are:

 - it should be portable; ideally this would trigger
   everywhere, and does not need to be part of a DEVELOPER=1
   setup (because unlike warnings which may depend on the
   compiler or system, this is a clear indicator of
   something wrong in the code).

 - it should generate a readable error that gives the
   developer a clue what happened

 - it should avoid generating too much other cruft that
   makes it hard to see the actual error

 - it should mention the original callsite in the error

The technique used here is to convert the banned function
into a call to a descriptive but non-existent function. The
output from gcc 7 looks like this (on a checkout without
022d2ac1f3, which removes the final strcpy from blame.c):

      CC builtin/blame.o
  In file included from ./git-compat-util.h:1242:0,
                   from ./cache.h:4,
                   from builtin/blame.c:8:
  builtin/blame.c: In function ‘cmd_blame’:
  ./banned.h:11:22: error: implicit declaration of function ‘sorry_strcpy_is_a_banned_function’ [-Werror=implicit-function-declaration]
   #define BANNED(func) sorry_##func##_is_a_banned_function()
                        ^
  ./banned.h:13:21: note: in expansion of macro ‘BANNED’
   #define strcpy(x,y) BANNED(strcpy)
                       ^~~~~~
  builtin/blame.c:1074:4: note: in expansion of macro ‘strcpy’
      strcpy(repeated_meta_color, GIT_COLOR_CYAN);
      ^~~~~~

This pretty clearly shows the original callsite along with
the descriptive function name, and it mentions banned.h,
which contains more information.

What's not perfectly ideal is:

  1. We'll only trigger with -Wimplicit-function-declaration
     (and only stop compilation with -Werror). These are
     generally enabled by DEVELOPER=1. If you _don't_ have
     that set, we'll still catch the problem, but only at
     link-time, with a slightly less useful message:

       /usr/bin/ld: builtin/blame.o: in function `cmd_blame':
       /home/peff/tmp/builtin/blame.c:1074: undefined reference to `sorry_strcpy_is_a_banned_function'
       collect2: error: ld returned 1 exit status

     If instead we convert this to a reference to an
     undefined variable, that always dies immediately. But
     gcc seems to print the set of errors twice, which
     clutters things up.

  2. The expansion through BANNED() adds an extra layer of
     error. Curiously, though, removing it (and just
     expanding strcpy directly to the bogus function name)
     causes gcc _not_ to report the original line of code.

So experimentally, this seems to behave pretty well on gcc.
Clang looks OK, too, though:

 - it actually produces a better message for the undeclared
   identifier than for the implicit function (it doesn't
   double the errors, and for the implicit function it
   actually produces an extra complaint about
   strict-prototypes).

 - the BANNED indirection has no effect (it shows the
   original line of code either way)

So if we want to optimize for clang's output, we could
switch both of those decisions.

Signed-off-by: Jeff King <peff@peff.net>
---
So you might guess that I experimented mostly with gcc
(which is what I normally use), and then just double-checked
clang at the end. After seeing those results, I actually
think clang does a better job of producing good error
messages overall, and perhaps we should not cater to gcc. I
dunno. It probably doesn't matter that much either way.

I'd also be happy if somebody showed an alternative that
is even cleaner. Sadly I don't think you can trigger an
#error from the _expansion_ of a macro.

 Documentation/CodingGuidelines |  3 +++
 banned.h                       | 19 +++++++++++++++++++
 git-compat-util.h              |  2 ++
 3 files changed, 24 insertions(+)
 create mode 100644 banned.h

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 48aa4edfbd..232f17becd 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -391,6 +391,9 @@ For C programs:
    must be declared with "extern" in header files. However, function
    declarations should not use "extern", as that is already the default.
 
+ - A few easy-to-misuse functions like `strcpy` are banned; see
+   `banned.h` for the complete list.
+
 For Perl programs:
 
  - Most of the C guidelines above apply.
diff --git a/banned.h b/banned.h
new file mode 100644
index 0000000000..fe81020e0f
--- /dev/null
+++ b/banned.h
@@ -0,0 +1,19 @@
+#ifndef BANNED_H
+#define BANNED_H
+
+/*
+ * This header lists functions that have been banned from our code base,
+ * because they're too easy to misuse (and even if used correctly,
+ * complicate audits). Including this header turns them into compile-time
+ * errors.
+ */
+
+#define BANNED(func) sorry_##func##_is_a_banned_function()
+
+#define strcpy(x,y) BANNED(strcpy)
+
+#ifdef HAVE_VARIADIC_MACROS
+#define sprintf(...) BANNED(sprintf)
+#endif
+
+#endif /* BANNED_H */
diff --git a/git-compat-util.h b/git-compat-util.h
index 9a64998b24..51bece30ac 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1239,4 +1239,6 @@ extern void unleak_memory(const void *ptr, size_t len);
 #define UNLEAK(var) do {} while (0)
 #endif
 
+#include "banned.h"
+
 #endif
-- 
2.18.0.540.g6c38643a7b


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH 2/2] banned.h: mark strncpy as banned
  2018-07-19 20:33 [PATCH 0/2] fail compilation with strcpy Jeff King
  2018-07-19 20:39 ` [PATCH 1/2] introduce "banned function" list Jeff King
@ 2018-07-19 20:39 ` Jeff King
  2018-07-19 21:12   ` Ævar Arnfjörð Bjarmason
  2018-07-20 18:58 ` [PATCH 0/2] fail compilation with strcpy Junio C Hamano
  2018-07-24  9:23 ` [PATCH v2 0/4] " Jeff King
  3 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2018-07-19 20:39 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller

The strncpy() function is less horrible than strcpy(). But
it's still pretty easy to misuse because of its funny
termination semantics. And we already have a ready-made
alternative in strlcpy. So let's ban it, to make sure uses
don't creep in.

Note that there is one instance of strncpy in
compat/regex/regcomp.c. But this doesn't trigger the
ban-list even when compiling with NO_REGEX=1, because we
don't use git-compat-util.h when compiling it (instead we
rely on the system includes from the upstream library).

Since this use of strncpy was verified by manual inspection
and since it doesn't trigger the automated ban-list, we're
better off leaving it to keep our divergence from glibc
minimal.

Signed-off-by: Jeff King <peff@peff.net>
---
 banned.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/banned.h b/banned.h
index fe81020e0f..ae6aaaa4a9 100644
--- a/banned.h
+++ b/banned.h
@@ -11,6 +11,7 @@
 #define BANNED(func) sorry_##func##_is_a_banned_function()
 
 #define strcpy(x,y) BANNED(strcpy)
+#define strncpy(x,y,n) BANNED(strncpy)
 
 #ifdef HAVE_VARIADIC_MACROS
 #define sprintf(...) BANNED(sprintf)
-- 
2.18.0.540.g6c38643a7b

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-19 20:39 ` [PATCH 1/2] introduce "banned function" list Jeff King
@ 2018-07-19 21:11   ` Eric Sunshine
  2018-07-19 21:27     ` Jeff King
  2018-07-19 21:15   ` Stefan Beller
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 42+ messages in thread
From: Eric Sunshine @ 2018-07-19 21:11 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, Stefan Beller

On Thu, Jul 19, 2018 at 4:39 PM Jeff King <peff@peff.net> wrote:
> [...]
> Let's start by banning strcpy() and sprintf(). It's not
> impossible to use these correctly, but it's easy to do so
> incorrectly, and there's always a better option.
> [...]
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/banned.h b/banned.h
> @@ -0,0 +1,19 @@
> +/*
> + * This header lists functions that have been banned from our code base,
> + * because they're too easy to misuse (and even if used correctly,
> + * complicate audits). Including this header turns them into compile-time
> + * errors.
> + */

When the above talks about "including this header", the implication is
that it must be included _after_ the system header(s) which declare
the banned functions. I wonder if that requirement should be stated
here explicitly.

(Probably not worth a re-roll.)

> +#define BANNED(func) sorry_##func##_is_a_banned_function()
> +
> +#define strcpy(x,y) BANNED(strcpy)
> diff --git a/git-compat-util.h b/git-compat-util.h
> @@ -1239,4 +1239,6 @@ extern void unleak_memory(const void *ptr, size_t len);
>  #define UNLEAK(var) do {} while (0)
>  #endif
>
> +#include "banned.h"

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] banned.h: mark strncpy as banned
  2018-07-19 20:39 ` [PATCH 2/2] banned.h: mark strncpy as banned Jeff King
@ 2018-07-19 21:12   ` Ævar Arnfjörð Bjarmason
  2018-07-19 21:33     ` Jeff King
  0 siblings, 1 reply; 42+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-07-19 21:12 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Stefan Beller


On Thu, Jul 19 2018, Jeff King wrote:

> Since this use of strncpy was verified by manual inspection
> and since it doesn't trigger the automated ban-list, we're
> better off leaving it to keep our divergence from glibc
> minimal.

FWIW it's s/glibc/gawk/. It's originally from glibc, but gawk
perma-forked it long ago, and an ancient copy of thath is the one we
use.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-19 20:39 ` [PATCH 1/2] introduce "banned function" list Jeff King
  2018-07-19 21:11   ` Eric Sunshine
@ 2018-07-19 21:15   ` Stefan Beller
  2018-07-19 21:32     ` Jeff King
  2018-07-19 22:46   ` Junio C Hamano
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 42+ messages in thread
From: Stefan Beller @ 2018-07-19 21:15 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Thu, Jul 19, 2018 at 1:39 PM Jeff King <peff@peff.net> wrote:
>
> There are a few standard C functions (like strcpy) which are
> easy to misuse. We generally discourage these in reviews,
> but we haven't put advice in CodingGuidelines, nor provided
> any automated enforcement. The latter is especially
> important because it's more consistent, and it can often
> save a round-trip of review.

Thanks for writing these patches. I would think this is a
good idea to have as I certainly would use banned functions
if not told by the compiler not to.

>  Documentation/CodingGuidelines |  3 +++

I'd prefer to not add more text to our documentation
(It is already long and in case you run into this problem
the error message is clear enough IMHO)


> @@ -0,0 +1,19 @@
> +#ifndef BANNED_H
> +#define BANNED_H
> +
> +/*
> + * This header lists functions that have been banned from our code base,
> + * because they're too easy to misuse (and even if used correctly,
> + * complicate audits). Including this header turns them into compile-time
> + * errors.
> + */
> +
> +#define BANNED(func) sorry_##func##_is_a_banned_function()
> +
> +#define strcpy(x,y) BANNED(strcpy)
> +
> +#ifdef HAVE_VARIADIC_MACROS

In a split second I thought you forgot sprintf that was
mentioned in the commit message, but then I kept on reading
just to find it here. I wonder if we want put this #ifdef at the
beginning of the file (after the guard) as then we can have
a uncluttered list of banned functions here. The downside is that
the use of strcpy would not be banned in case you have no
variadic macros, but we'd still catch it quickly as most people
have them. Undecided.

Regarding the introduction of the functions to this list,
I would imagine people would find the commit that introduced
a function to the ban list to look for a reason why.
Can we include a link[1] to explain why we discourage
strcpy and sprintf in this commit?


[1] https://www.thegeekstuff.com/2013/06/buffer-overflow/?utm_source=feedly
  This is the best I found. So not sure if it worth it.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-19 21:11   ` Eric Sunshine
@ 2018-07-19 21:27     ` Jeff King
  2018-07-19 21:59       ` Eric Sunshine
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2018-07-19 21:27 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, Stefan Beller

On Thu, Jul 19, 2018 at 05:11:15PM -0400, Eric Sunshine wrote:

> On Thu, Jul 19, 2018 at 4:39 PM Jeff King <peff@peff.net> wrote:
> > [...]
> > Let's start by banning strcpy() and sprintf(). It's not
> > impossible to use these correctly, but it's easy to do so
> > incorrectly, and there's always a better option.
> > [...]
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > diff --git a/banned.h b/banned.h
> > @@ -0,0 +1,19 @@
> > +/*
> > + * This header lists functions that have been banned from our code base,
> > + * because they're too easy to misuse (and even if used correctly,
> > + * complicate audits). Including this header turns them into compile-time
> > + * errors.
> > + */
> 
> When the above talks about "including this header", the implication is
> that it must be included _after_ the system header(s) which declare
> the banned functions. I wonder if that requirement should be stated
> here explicitly.

Hmm, does it need to be? I had originally intended it to be included
before, actually, though in the end I put it later.

I guess it would yield declarations like strcpy_is_banned(), which would
cause _different_ errors (probably link-time ones).

> (Probably not worth a re-roll.)

Yeah, I doubt it matters much either way, since the inclusion is done
automatically in git-compat-util.h.

I had also originally imagined this to be triggered via DEVELOPER=1,
with something like "-include banned.h" in CFLAGS. But I think it
probably is appropriate for everybody to run it, since it shouldn't
cause any false positives or other compilation issues.

The one I brainstormed (but forgot to mention) is that it might be
possible for a platform to have strcpy as a macro already? In which case
we'd need to #undef it or risk a compilation error (even if the macro
isn't actually used).

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-19 21:15   ` Stefan Beller
@ 2018-07-19 21:32     ` Jeff King
  2018-07-19 21:47       ` Stefan Beller
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2018-07-19 21:32 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

On Thu, Jul 19, 2018 at 02:15:54PM -0700, Stefan Beller wrote:

> >  Documentation/CodingGuidelines |  3 +++
> 
> I'd prefer to not add more text to our documentation
> (It is already long and in case you run into this problem
> the error message is clear enough IMHO)

I'm fine with that too. I just wondered if somebody would complain in
the opposite way: your patch enforces this, but we never made it an
"official" guideline.

But that may be overly paranoid.  Once upon a time there was some rules
lawyering around CodingGuidelines, but I think that was successfully
shut down and hasn't reared its head for several years.

> > +#define strcpy(x,y) BANNED(strcpy)
> > +
> > +#ifdef HAVE_VARIADIC_MACROS
> 
> In a split second I thought you forgot sprintf that was
> mentioned in the commit message, but then I kept on reading
> just to find it here. I wonder if we want put this #ifdef at the
> beginning of the file (after the guard) as then we can have
> a uncluttered list of banned functions here. The downside is that
> the use of strcpy would not be banned in case you have no
> variadic macros, but we'd still catch it quickly as most people
> have them. Undecided.

Right, exactly. We should catch what we can on lesser platforms, and
everything on modern ones. My hope is that people would not generally
have to touch this file. I don't think we'll be adding banned functions
often.

> Regarding the introduction of the functions to this list,
> I would imagine people would find the commit that introduced
> a function to the ban list to look for a reason why.
> Can we include a link[1] to explain why we discourage
> strcpy and sprintf in this commit?

I hoped that it was mostly common knowledge, but that's probably not a
good assumption. I agree if there's a good reference, we should link to
it.

> [1] https://www.thegeekstuff.com/2013/06/buffer-overflow/?utm_source=feedly
>   This is the best I found. So not sure if it worth it.

Yeah, this one is so-so, because it goes into a lot more detail. I think
we can assume that people know that overflowing a buffer is bad. Maybe
just a short paragraph like:

  We'll include strcpy() and sprintf() in the initial list of banned
  functions. While these _can_ be used carefully by surrounding them
  with extra code, there's no inherent check that the size of the
  destination buffer is big enough, it's very easy to overflow the
  buffer.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2/2] banned.h: mark strncpy as banned
  2018-07-19 21:12   ` Ævar Arnfjörð Bjarmason
@ 2018-07-19 21:33     ` Jeff King
  0 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2018-07-19 21:33 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Stefan Beller

On Thu, Jul 19, 2018 at 11:12:49PM +0200, Ævar Arnfjörð Bjarmason wrote:

> 
> On Thu, Jul 19 2018, Jeff King wrote:
> 
> > Since this use of strncpy was verified by manual inspection
> > and since it doesn't trigger the automated ban-list, we're
> > better off leaving it to keep our divergence from glibc
> > minimal.
> 
> FWIW it's s/glibc/gawk/. It's originally from glibc, but gawk
> perma-forked it long ago, and an ancient copy of thath is the one we
> use.

Thanks, I didn't know that. Not materially different, but it's worth
correcting the commit message.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-19 21:32     ` Jeff King
@ 2018-07-19 21:47       ` Stefan Beller
  2018-07-20  0:54         ` Jeff King
  0 siblings, 1 reply; 42+ messages in thread
From: Stefan Beller @ 2018-07-19 21:47 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Thu, Jul 19, 2018 at 2:32 PM Jeff King <peff@peff.net> wrote:
>
> On Thu, Jul 19, 2018 at 02:15:54PM -0700, Stefan Beller wrote:
>
> > >  Documentation/CodingGuidelines |  3 +++
> >
> > I'd prefer to not add more text to our documentation
> > (It is already long and in case you run into this problem
> > the error message is clear enough IMHO)
>
> I'm fine with that too. I just wondered if somebody would complain in
> the opposite way: your patch enforces this, but we never made it an
> "official" guideline.
>
> But that may be overly paranoid.  Once upon a time there was some rules
> lawyering around CodingGuidelines, but I think that was successfully
> shut down and hasn't reared its head for several years.

Heh; My preference would be to keep docs as short and concise as
possible (and lawyering sounds like "verbosity is a feature" :-P) but
still giving advice on controversial (i.e. not enforced by a machine in
a deterministic fashion) things such as having braces around one
statement for example or leaving them out.

> > Regarding the introduction of the functions to this list,
> > I would imagine people would find the commit that introduced
> > a function to the ban list to look for a reason why.
> > Can we include a link[1] to explain why we discourage
> > strcpy and sprintf in this commit?
>
> I hoped that it was mostly common knowledge, but that's probably not a
> good assumption. I agree if there's a good reference, we should link to
> it.

In this case I am just an advocate for a better history. Countless
times I wanted
to know *why* something is the way it is and mailing list and history are not
very good at that, as my specific question was not addressed there.

Either it was obvious to all people involved at the time or not written down
why that solution (which -- in hindsight -- sucks), was superior to the other
solution (which may or may not have been known at the time).

So maybe I would have rather asked, why we start out with these two
functions. And you seem to call them "obviously bad", and you take both
of them because they need to be handled differently due to the variadic macros.
(And those two are "obviously worse" than strcat, as they are used more often.
But strcat, being on MS ban list[1], would do just as fine)

(I agree that) Any other function can be added on top with its own
commit message on *why* that function. Over time I would expect that the
reason is that we got bitten by it, or some other project got famously bitten
by it or that some smart people came up with a list[1]. The exception is
this one, which basically says: "Here is the mechanism how to do it and
$X is the obvious thing to put in first", which I agree with the mechanism
as well as that $X seems bad.

[1] https://msdn.microsoft.com/en-us/library/bb288454.aspx

Now that I am deep down the rathole of discussing a tangent, I just found
[2], when searching for "how much common knowledge is wrong", do you
know if there is such a list for (CS) security related things?

[2] http://www.marcandangel.com/2008/06/12/60-popular-pieces-of-false-knowledge/

>
> > [1] https://www.thegeekstuff.com/2013/06/buffer-overflow/?utm_source=feedly
> >   This is the best I found. So not sure if it worth it.
>
> Yeah, this one is so-so, because it goes into a lot more detail. I think
> we can assume that people know that overflowing a buffer is bad. Maybe
> just a short paragraph like:
>
>   We'll include strcpy() and sprintf() in the initial list of banned
>   functions. While these _can_ be used carefully by surrounding them
>   with extra code, there's no inherent check that the size of the
>   destination buffer is big enough, it's very easy to overflow the
>   buffer.

Sounds good to me, maybe even add "We've had problems with that
in the past, see 'git log -S strcpy'", but that may be too much again.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-19 21:27     ` Jeff King
@ 2018-07-19 21:59       ` Eric Sunshine
  2018-07-20  0:55         ` Jeff King
  0 siblings, 1 reply; 42+ messages in thread
From: Eric Sunshine @ 2018-07-19 21:59 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, Stefan Beller

On Thu, Jul 19, 2018 at 5:27 PM Jeff King <peff@peff.net> wrote:
> On Thu, Jul 19, 2018 at 05:11:15PM -0400, Eric Sunshine wrote:
> > On Thu, Jul 19, 2018 at 4:39 PM Jeff King <peff@peff.net> wrote:
> > > + * This header lists functions that have been banned from our code base,
> > > + * because they're too easy to misuse (and even if used correctly,
> > > + * complicate audits). Including this header turns them into compile-time
> > > + * errors.
> >
> > When the above talks about "including this header", the implication is
> > that it must be included _after_ the system header(s) which declare
> > the banned functions. I wonder if that requirement should be stated
> > here explicitly.
>
> Hmm, does it need to be? I had originally intended it to be included
> before, actually, though in the end I put it later.
> I guess it would yield declarations like strcpy_is_banned(), which would
> cause _different_ errors (probably link-time ones).

Yes, that's what I meant. You'd only get link-time errors if banned.h
was included before the system headers (assuming I'm thinking about
this correctly).

> > (Probably not worth a re-roll.)
>
> Yeah, I doubt it matters much either way, since the inclusion is done
> automatically in git-compat-util.h.

Exactly.

> I had also originally imagined this to be triggered via DEVELOPER=1,
> with something like "-include banned.h" in CFLAGS. But I think it
> probably is appropriate for everybody to run it, since it shouldn't
> cause any false positives or other compilation issues.

Agreed.

> The one I brainstormed (but forgot to mention) is that it might be
> possible for a platform to have strcpy as a macro already? In which case
> we'd need to #undef it or risk a compilation error (even if the macro
> isn't actually used).

I have some recollection (perhaps long outdated or just wrong) of
Microsoft headers spewing deprecation warnings about "unsafe"
functions. I don't know whether they did that by covering functions
with macros or by decorating the function with a deprecation attribute
or by some other mechanism, but such concern seems well-founded.
#undef'ing them might indeed be a very good preventative tactic.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-19 20:39 ` [PATCH 1/2] introduce "banned function" list Jeff King
  2018-07-19 21:11   ` Eric Sunshine
  2018-07-19 21:15   ` Stefan Beller
@ 2018-07-19 22:46   ` Junio C Hamano
  2018-07-19 23:55     ` Randall S. Becker
  2018-07-20  1:08     ` Jeff King
  2018-07-20 12:42   ` Derrick Stolee
  2018-07-20 14:41   ` Duy Nguyen
  4 siblings, 2 replies; 42+ messages in thread
From: Junio C Hamano @ 2018-07-19 22:46 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Stefan Beller

Jeff King <peff@peff.net> writes:

> For enforcement, we can rely on the compiler and just
> introduce code which breaks compilation when it sees these
> functions. This has a few advantages:
>
>   1. We know it's run as part of a build cycle, so it's
>      hard to ignore. Whereas an external linter is an extra
>      step the developer needs to remember to do.
>
>   2. Likewise, it's basically free since the compiler is
>      parsing the code anyway.
>
>   3. We know it's robust against false positives (unlike a
>      grep-based linter).
>
> The one big disadvantage is that it will only check code
> that is actually compiled, so it may miss code that isn't
> triggered on your particular system. But since presumably
> people don't add new code without compiling it (and if they
> do, the banned function list is the least of their worries),
> we really only care about failing to clean up old code when
> adding new functions to the list. And that's easy enough to
> address with a manual audit when adding a new function
> (which is what I did for the functions here).
>
> That leaves only the question of how to trigger the
> compilation error. The goals are:

I actually have another question, though.

Is it a downside that it is cumbersome to override?  This is not a
rhetorical question.  I am not convinced there will not be a legit
circumstance that calling strcpy (or whatever we are going to ban)
is the best solution and it is safe.  By "best", what I mean is "you
could instead use memcpy/strncpy/whatever" can legitimately be
argued with "but still using memcpy/strncpy/whatever is inferior
than using strcpy in this case for such and such reasons".


^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH 1/2] introduce "banned function" list
  2018-07-19 22:46   ` Junio C Hamano
@ 2018-07-19 23:55     ` Randall S. Becker
  2018-07-20  1:08     ` Jeff King
  1 sibling, 0 replies; 42+ messages in thread
From: Randall S. Becker @ 2018-07-19 23:55 UTC (permalink / raw)
  To: 'Junio C Hamano', 'Jeff King'
  Cc: git, 'Stefan Beller'

On July 19, 2018 6:46 PM, Junio wrote:
> Jeff King <peff@peff.net> writes:
> 
> > For enforcement, we can rely on the compiler and just introduce code
> > which breaks compilation when it sees these functions. This has a few
> > advantages:
> >
> >   1. We know it's run as part of a build cycle, so it's
> >      hard to ignore. Whereas an external linter is an extra
> >      step the developer needs to remember to do.
> >
> >   2. Likewise, it's basically free since the compiler is
> >      parsing the code anyway.
> >
> >   3. We know it's robust against false positives (unlike a
> >      grep-based linter).
> >
> > The one big disadvantage is that it will only check code that is
> > actually compiled, so it may miss code that isn't triggered on your
> > particular system. But since presumably people don't add new code
> > without compiling it (and if they do, the banned function list is the
> > least of their worries), we really only care about failing to clean up
> > old code when adding new functions to the list. And that's easy enough
> > to address with a manual audit when adding a new function (which is
> > what I did for the functions here).
> >
> > That leaves only the question of how to trigger the compilation error.
> > The goals are:
> 
> I actually have another question, though.
> 
> Is it a downside that it is cumbersome to override?  This is not a
rhetorical
> question.  I am not convinced there will not be a legit circumstance that
> calling strcpy (or whatever we are going to ban) is the best solution and
it is
> safe.  By "best", what I mean is "you could instead use
> memcpy/strncpy/whatever" can legitimately be argued with "but still using
> memcpy/strncpy/whatever is inferior than using strcpy in this case for
such
> and such reasons".

Putting on my old-guy compiler hat, this sounds like a more complex activity
that something akin to lint might be useful at handling. Having a
post-processor that searches for offending functions but also supports
annotations explaining exceptions (why you really had to use strncpy because
the NULL was hiding in a bad place and you promise to fix it), might be
useful. Personally, I'd rather know that that code compiles first and then
violates rules that I can fix following basic prototyping than getting
yelled at up front - but that's just me. I can't suggest a good thing to
handle this, short of augmenting lint, and if we were in java, annotations
would be the way to go, but this seems like a problem that other products
have solved.

Cheers,
Randall

-- Brief whoami:
  NonStop developer since approximately NonStop(211288444200000000)
  UNIX developer since approximately 421664400
-- In my real life, I talk too much.




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-19 21:47       ` Stefan Beller
@ 2018-07-20  0:54         ` Jeff King
  0 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2018-07-20  0:54 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

On Thu, Jul 19, 2018 at 02:47:35PM -0700, Stefan Beller wrote:

> > But that may be overly paranoid.  Once upon a time there was some rules
> > lawyering around CodingGuidelines, but I think that was successfully
> > shut down and hasn't reared its head for several years.
> 
> Heh; My preference would be to keep docs as short and concise as
> possible (and lawyering sounds like "verbosity is a feature" :-P) but
> still giving advice on controversial (i.e. not enforced by a machine in
> a deterministic fashion) things such as having braces around one
> statement for example or leaving them out.

I think we literally had somebody say "I don't have to abide by this in
a review because it wasn't in CodingGuidelines." But then, that is
perhaps indicative of other problems.

> So maybe I would have rather asked, why we start out with these two
> functions. And you seem to call them "obviously bad", and you take both
> of them because they need to be handled differently due to the variadic macros.
> (And those two are "obviously worse" than strcat, as they are used more often.
> But strcat, being on MS ban list[1], would do just as fine)

Ooh, strcat is another one that should be added.

I actually thought about splitting it into three commits (introduce
mechanism, then one per function), but it felt like stringing it out.
You are probably right, though, that each function deserves its own
explanation. And the commit message is already quite long.

> >   We'll include strcpy() and sprintf() in the initial list of banned
> >   functions. While these _can_ be used carefully by surrounding them
> >   with extra code, there's no inherent check that the size of the
> >   destination buffer is big enough, it's very easy to overflow the
> >   buffer.
> 
> Sounds good to me, maybe even add "We've had problems with that
> in the past, see 'git log -S strcpy'", but that may be too much again.

Actually, that's a good point. We've had actual bugs due strcpy(). I can
similarly point to bad uses of strcat().

I think I have sufficient fodder for a re-roll along these lines
(assuming we like the idea at all; Junio seemed to have some
reservations, but I'll reply there separately).

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-19 21:59       ` Eric Sunshine
@ 2018-07-20  0:55         ` Jeff King
  0 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2018-07-20  0:55 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, Stefan Beller

On Thu, Jul 19, 2018 at 05:59:47PM -0400, Eric Sunshine wrote:

> > The one I brainstormed (but forgot to mention) is that it might be
> > possible for a platform to have strcpy as a macro already? In which case
> > we'd need to #undef it or risk a compilation error (even if the macro
> > isn't actually used).
> 
> I have some recollection (perhaps long outdated or just wrong) of
> Microsoft headers spewing deprecation warnings about "unsafe"
> functions. I don't know whether they did that by covering functions
> with macros or by decorating the function with a deprecation attribute
> or by some other mechanism, but such concern seems well-founded.
> #undef'ing them might indeed be a very good preventative tactic.

Yeah, these functions are definitely on their "SDL banned list". I don't
know how they implement that. At that level, I'd really expect it to be
done with a deprecated attribute next to the declaration (I also
considered trying to add deprecation attributes, too, but I think it's
hard to do without re-declaring the function, and anyway it's "just" a
warning).

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-19 22:46   ` Junio C Hamano
  2018-07-19 23:55     ` Randall S. Becker
@ 2018-07-20  1:08     ` Jeff King
  2018-07-20  1:12       ` Jeff King
                         ` (2 more replies)
  1 sibling, 3 replies; 42+ messages in thread
From: Jeff King @ 2018-07-20  1:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Stefan Beller

On Thu, Jul 19, 2018 at 03:46:08PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > For enforcement, we can rely on the compiler and just
> > introduce code which breaks compilation when it sees these
> > functions. This has a few advantages:
> >
> >   1. We know it's run as part of a build cycle, so it's
> >      hard to ignore. Whereas an external linter is an extra
> >      step the developer needs to remember to do.
> >
> >   2. Likewise, it's basically free since the compiler is
> >      parsing the code anyway.
> >
> >   3. We know it's robust against false positives (unlike a
> >      grep-based linter).
> >
> > The one big disadvantage is that it will only check code
> > that is actually compiled, so it may miss code that isn't
> > triggered on your particular system. But since presumably
> > people don't add new code without compiling it (and if they
> > do, the banned function list is the least of their worries),
> > we really only care about failing to clean up old code when
> > adding new functions to the list. And that's easy enough to
> > address with a manual audit when adding a new function
> > (which is what I did for the functions here).
> >
> > That leaves only the question of how to trigger the
> > compilation error. The goals are:
> 
> I actually have another question, though.
> 
> Is it a downside that it is cumbersome to override?  This is not a
> rhetorical question.  I am not convinced there will not be a legit
> circumstance that calling strcpy (or whatever we are going to ban)
> is the best solution and it is safe.  By "best", what I mean is "you
> could instead use memcpy/strncpy/whatever" can legitimately be
> argued with "but still using memcpy/strncpy/whatever is inferior
> than using strcpy in this case for such and such reasons".

In my opinion, no, this is not a problem.

My plan is to only add functions which are truly worthless. So with
strcpy(), for example, you _can_ make sure the buffer is big enough
before calling strcpy. But to do so, you by definition must have just
called strlen(), at which point you may as well use memcpy(). It's more
obviously correct, and it's more efficient.

Ditto for sprintf, where you should _always_ be using at least xsnprintf
(or some better tool, depending on the situation).  And for strncpy,
strlcpy (or again, some better tool) is strictly an improvement.  If
these were our functions, I would literally delete them from our
codebase. This is the moral equivalent. ;)

Contrast this with memcpy(). This is on Microsoft's SDL banned list[1],
but I think it's silly for it to be. I would never add it to this list.

Likewise snprintf(). That is a function that _can_ be dangerous due to
unexpected truncation, and I think we should avoid it in general. But I
would not ban it, as it is the correct tool in a few situations (you
have a fixed buffer and returning a truncation error code is fine --
many of our syscall wrappers are in this exact situation). So I would
probably not add it to the ban list (but I feel less strongly than I do
about memcpy).

The nuclear option for overriding is "#undef that-function" in a
particular instance. That covers the rest of the translation unit, but I
think that's probably fine. If we have a function which is mostly
questionable, but we need it sometimes, then we ought to be putting a
sane wrapper around it. And _that_ wrapper can sit in its own file and
#undef as it pleases, keeping the unsafety contained within.

Don't get me wrong, if you have another suggestion that allows easier
one-off overrides, I'm happy to hear it. But I think building this into
the compilation step and having it on-by-default is a huge advantage in
simplicity and having people remember to run it. And I couldn't come up
with a better way to do that.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-20  1:08     ` Jeff King
@ 2018-07-20  1:12       ` Jeff King
  2018-07-20  9:32       ` Junio C Hamano
  2018-07-20 13:22       ` Theodore Y. Ts'o
  2 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2018-07-20  1:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Stefan Beller

On Thu, Jul 19, 2018 at 09:08:08PM -0400, Jeff King wrote:

> Contrast this with memcpy(). This is on Microsoft's SDL banned list[1],
> but I think it's silly for it to be. I would never add it to this list.

I forgot my footnote, which was going to be:

  I'm bringing up that list not because I think it's necessarily a good
  list, but because it's _a_ list. And as I was recently subjected to an
  audit that referenced it, I've been thinking a lot about ban-lists and
  whether they are useful (and specifically for which functions).

  It's at https://msdn.microsoft.com/en-us/library/bb288454.aspx if
  you're curious, but again, that is absolutely not the ban-list I am
  working towards. To what I posted already, I'd probably add strcat()
  and vsprintf() based on discussions here, and then call it done.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-20  1:08     ` Jeff King
  2018-07-20  1:12       ` Jeff King
@ 2018-07-20  9:32       ` Junio C Hamano
  2018-07-20 17:45         ` Jeff King
  2018-07-20 13:22       ` Theodore Y. Ts'o
  2 siblings, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2018-07-20  9:32 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Stefan Beller

Jeff King <peff@peff.net> writes:

>> Is it a downside that it is cumbersome to override?  This is not a
>> rhetorical question.  I am not convinced there will not be a legit
>> circumstance that calling strcpy (or whatever we are going to ban)
>> is the best solution and it is safe.  By "best", what I mean is "you
>> could instead use memcpy/strncpy/whatever" can legitimately be
>> argued with "but still using memcpy/strncpy/whatever is inferior
>> than using strcpy in this case for such and such reasons".
>
> In my opinion, no, this is not a problem.
>
> My plan is to only add functions which are truly worthless.

OK.

> Contrast this with memcpy(). This is on Microsoft's SDL banned list[1],
> but I think it's silly for it to be. I would never add it to this list.

A tangent, but is that because they want you to use memmove()
instead so that you do not have to worry about overlapping copies,
perhaps?


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-19 20:39 ` [PATCH 1/2] introduce "banned function" list Jeff King
                     ` (2 preceding siblings ...)
  2018-07-19 22:46   ` Junio C Hamano
@ 2018-07-20 12:42   ` Derrick Stolee
  2018-07-20 14:41   ` Duy Nguyen
  4 siblings, 0 replies; 42+ messages in thread
From: Derrick Stolee @ 2018-07-20 12:42 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Stefan Beller

On 7/19/2018 4:39 PM, Jeff King wrote:
> There are a few standard C functions (like strcpy) which are
> easy to misuse. We generally discourage these in reviews,
> but we haven't put advice in CodingGuidelines, nor provided
> any automated enforcement. The latter is especially
> important because it's more consistent, and it can often
> save a round-trip of review.
>
> Let's start by banning strcpy() and sprintf(). It's not
> impossible to use these correctly, but it's easy to do so
> incorrectly, and there's always a better option.
>
> For enforcement, we can rely on the compiler and just
> introduce code which breaks compilation when it sees these
> functions. This has a few advantages:
>
>    1. We know it's run as part of a build cycle, so it's
>       hard to ignore. Whereas an external linter is an extra
>       step the developer needs to remember to do.
>
>    2. Likewise, it's basically free since the compiler is
>       parsing the code anyway.
>
>    3. We know it's robust against false positives (unlike a
>       grep-based linter).

I know I'm late to the party, but I just wanted to chime in that I think 
this is a great idea. The more checks we have as part of the developer 
cycle means we have fewer checks that need to be caught manually in 
review (or can be missed in review).

Thanks,

-Stolee


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-20  1:08     ` Jeff King
  2018-07-20  1:12       ` Jeff King
  2018-07-20  9:32       ` Junio C Hamano
@ 2018-07-20 13:22       ` Theodore Y. Ts'o
  2018-07-20 17:56         ` Jeff King
  2 siblings, 1 reply; 42+ messages in thread
From: Theodore Y. Ts'o @ 2018-07-20 13:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Stefan Beller

On Thu, Jul 19, 2018 at 09:08:08PM -0400, Jeff King wrote:
> Ditto for sprintf, where you should _always_ be using at least xsnprintf
> (or some better tool, depending on the situation).  And for strncpy,
> strlcpy (or again, some better tool) is strictly an improvement.

Nitpick: this may be true for git, but it's not strictly true in all
cases.  I actually have a (non-git) case where strncpy *is* the right
tool.  And this is one where I'm copying a NUL-terminated string into
a fixed-length charater array (in the ext4 superblock) which is *not*
NUL-terminated.  In that case, strncpy works(); but strlcpy() does not
do what I want.

So I used strncpy() advisedly, and I ignore people running Coccinelle
scripts and blindly sending patches to "fix" ext4.

But perhaps that's also a solution for git?  You don't have to
necessarily put them on a banned list; you could instead have some
handy, pre-set scripts that scan the entire code base looking for
"bad" functions with cleanups automatically suggested.  This could be
run at patch review time, and/or periodically (especially before a
release).

						- Ted

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-19 20:39 ` [PATCH 1/2] introduce "banned function" list Jeff King
                     ` (3 preceding siblings ...)
  2018-07-20 12:42   ` Derrick Stolee
@ 2018-07-20 14:41   ` Duy Nguyen
  2018-07-20 17:48     ` Elijah Newren
  2018-07-20 18:00     ` Jeff King
  4 siblings, 2 replies; 42+ messages in thread
From: Duy Nguyen @ 2018-07-20 14:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, Stefan Beller

On Thu, Jul 19, 2018 at 10:40 PM Jeff King <peff@peff.net> wrote:
>
> There are a few standard C functions (like strcpy) which are
> easy to misuse. We generally discourage these in reviews,
> but we haven't put advice in CodingGuidelines, nor provided
> any automated enforcement. The latter is especially
> important because it's more consistent, and it can often
> save a round-trip of review.
>
> Let's start by banning strcpy() and sprintf(). It's not
> impossible to use these correctly, but it's easy to do so
> incorrectly, and there's always a better option.

Is it possible to extend this to ban variables as well? I'm still
going to delete the_index from library code. Once that work is done I
will ban it from entering again (it's only allowed in builtin/ for
example). The next target after that would be the_repository.

Right now I will do this by not declaring the variable [2], which ends
up with a much less friendly compile warning. I did something similar
[1] in an earlier iteration but did not do extensive research on this
topic like you did.

[1] https://public-inbox.org/git/20180606073933.14755-1-pclouds@gmail.com/T/
[2] https://public-inbox.org/git/20180616054157.32433-16-pclouds@gmail.com/
-- 
Duy

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-20  9:32       ` Junio C Hamano
@ 2018-07-20 17:45         ` Jeff King
  0 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2018-07-20 17:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Stefan Beller

On Fri, Jul 20, 2018 at 02:32:29AM -0700, Junio C Hamano wrote:

> > Contrast this with memcpy(). This is on Microsoft's SDL banned list[1],
> > but I think it's silly for it to be. I would never add it to this list.
> 
> A tangent, but is that because they want you to use memmove()
> instead so that you do not have to worry about overlapping copies,
> perhaps?

That was my first thought, too, but nope. They recommend memcpy_s()
instead. Which in my opinion adds very little value, while missing the
most common misuse of memcpy I've seen in practice (the overlapping
thing).

Helpers like our COPY_ARRAY() are much more useful for preventing sizing
mistakes, IMHO. But again, I'd never ban memcpy. The right tool for
encouraging COPY_ARRAY() is coccinelle (because the matching is
complicated, but also because we can mechanically turn it into the right
thing, whereas a strcpy is going to require some manual reworking).

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-20 14:41   ` Duy Nguyen
@ 2018-07-20 17:48     ` Elijah Newren
  2018-07-20 18:04       ` Jeff King
  2018-07-20 18:00     ` Jeff King
  1 sibling, 1 reply; 42+ messages in thread
From: Elijah Newren @ 2018-07-20 17:48 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Jeff King, Git Mailing List, Stefan Beller

On Fri, Jul 20, 2018 at 7:41 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Thu, Jul 19, 2018 at 10:40 PM Jeff King <peff@peff.net> wrote:
>>
>> There are a few standard C functions (like strcpy) which are
>> easy to misuse. We generally discourage these in reviews,
>> but we haven't put advice in CodingGuidelines, nor provided
>> any automated enforcement. The latter is especially
>> important because it's more consistent, and it can often
>> save a round-trip of review.
>>
>> Let's start by banning strcpy() and sprintf(). It's not
>> impossible to use these correctly, but it's easy to do so
>> incorrectly, and there's always a better option.
>
> Is it possible to extend this to ban variables as well? I'm still
> going to delete the_index from library code. Once that work is done I

Or perhaps constants, such as PATH_MAX to avoid problems like this one
from 2.18.0 timeframe:
https://public-inbox.org/git/7d1237c7-5a83-d766-7d93-5f0d59166067@web.de/

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-20 13:22       ` Theodore Y. Ts'o
@ 2018-07-20 17:56         ` Jeff King
  2018-07-20 19:03           ` Junio C Hamano
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2018-07-20 17:56 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: Junio C Hamano, git, Stefan Beller

On Fri, Jul 20, 2018 at 09:22:41AM -0400, Theodore Y. Ts'o wrote:

> On Thu, Jul 19, 2018 at 09:08:08PM -0400, Jeff King wrote:
> > Ditto for sprintf, where you should _always_ be using at least xsnprintf
> > (or some better tool, depending on the situation).  And for strncpy,
> > strlcpy (or again, some better tool) is strictly an improvement.
> 
> Nitpick: this may be true for git, but it's not strictly true in all
> cases.  I actually have a (non-git) case where strncpy *is* the right
> tool.  And this is one where I'm copying a NUL-terminated string into
> a fixed-length charater array (in the ext4 superblock) which is *not*
> NUL-terminated.  In that case, strncpy works(); but strlcpy() does not
> do what I want.

Thanks for an interesting (and exotic) counter-point. For strcpy(), you
always have to run strlen() anyway to know you're not going to overflow,
so you might as well memcpy() at that point. But if you really are OK
with truncation, then using strncpy() lets you copy while walking over
the string only once, which is more efficient.  On the other hand,
strncpy() fills out NULs to the end of the destination array, so you are
paying an extra cost for small strings.

> So I used strncpy() advisedly, and I ignore people running Coccinelle
> scripts and blindly sending patches to "fix" ext4.

Even after hearing your counter-point, I think I'm still OK with the
ban. Because as you've noticed, even if the call is fine, it carries an
ongoing maintenance cost. Every time you (or somebody else) audits, you
have to skip over that known-good call. And you have to deal with
well-meaning patches. So to me there's value in eliminating it even if
it's an acceptable tool.

We don't have any remaining calls to strncpy() in Git, so this lets us
punt on deciding if the ban is worth changing what's there. We can wait
for somebody to decide they _really_ need strncpy, and we can discuss it
then with a concrete case.

> But perhaps that's also a solution for git?  You don't have to
> necessarily put them on a banned list; you could instead have some
> handy, pre-set scripts that scan the entire code base looking for
> "bad" functions with cleanups automatically suggested.  This could be
> run at patch review time, and/or periodically (especially before a
> release).

We do this already with coccinelle for some transformations. But I think
there's real value in linting that's always run, and is cheap. Catching
these things early means we don't have to spend time on them in review,
or dealing with follow-up patches.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-20 14:41   ` Duy Nguyen
  2018-07-20 17:48     ` Elijah Newren
@ 2018-07-20 18:00     ` Jeff King
  1 sibling, 0 replies; 42+ messages in thread
From: Jeff King @ 2018-07-20 18:00 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Stefan Beller

On Fri, Jul 20, 2018 at 04:41:34PM +0200, Duy Nguyen wrote:

> > Let's start by banning strcpy() and sprintf(). It's not
> > impossible to use these correctly, but it's easy to do so
> > incorrectly, and there's always a better option.
> 
> Is it possible to extend this to ban variables as well? I'm still
> going to delete the_index from library code. Once that work is done I
> will ban it from entering again (it's only allowed in builtin/ for
> example). The next target after that would be the_repository.
> 
> Right now I will do this by not declaring the variable [2], which ends
> up with a much less friendly compile warning. I did something similar
> [1] in an earlier iteration but did not do extensive research on this
> topic like you did.

IMHO just not declaring the variable is the right move (and is what I
would do with these functions if libc wasn't already declaring them).

The compile error may be less friendly, but these things are temporary
as topics-in-flight catch up. Whereas strcpy() will be with us forever.

I also think that removing variables is a special case of a larger
technique: when we change function semantics, we try to change the
signatures, too. So there you'll get a type error, or not enough
arguments, or whatever. And the next step is usually "git log" or "git
blame" to see what happened, and what the conversion should look like.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-20 17:48     ` Elijah Newren
@ 2018-07-20 18:04       ` Jeff King
  0 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2018-07-20 18:04 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Duy Nguyen, Git Mailing List, Stefan Beller

On Fri, Jul 20, 2018 at 10:48:37AM -0700, Elijah Newren wrote:

> > Is it possible to extend this to ban variables as well? I'm still
> > going to delete the_index from library code. Once that work is done I
> 
> Or perhaps constants, such as PATH_MAX to avoid problems like this one
> from 2.18.0 timeframe:
> https://public-inbox.org/git/7d1237c7-5a83-d766-7d93-5f0d59166067@web.de/

I've been slowly trying to eradicate PATH_MAX from our code base. And I
would be happy with an eventual automated ban there. Unlike the_index,
it comes from the system, so it's in the same boat as strcpy() etc.

That said, I think it's less urgent. The urgent problem fixed by the
patch you linked was the use of strcpy() to overflow the buffer. Without
that, it just becomes a normal bug where we do not handle long paths
well on some operating systems.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/2] fail compilation with strcpy
  2018-07-19 20:33 [PATCH 0/2] fail compilation with strcpy Jeff King
  2018-07-19 20:39 ` [PATCH 1/2] introduce "banned function" list Jeff King
  2018-07-19 20:39 ` [PATCH 2/2] banned.h: mark strncpy as banned Jeff King
@ 2018-07-20 18:58 ` Junio C Hamano
  2018-07-20 19:18   ` Jeff King
  2018-07-24  9:23 ` [PATCH v2 0/4] " Jeff King
  3 siblings, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2018-07-20 18:58 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Stefan Beller

Jeff King <peff@peff.net> writes:

> This is a patch series to address the discussion in the thread at:
>
>   https://public-inbox.org/git/20180713204350.GA16999@sigill.intra.peff.net/
>
> Basically, the question was: can we declare strcpy banned and have a
> linter save us the trouble of finding it in review. The answer is yes,
> the compiler is good at that. ;)
>
> There are probably as many lists of banned functions as there are coding
> style documents. I don't agree with every entry in the ones I've seen.
> And in many cases coccinelle is a better choice, because the problem is
> not "this function is so bad your patch should not even make it to the
> list with it", but "don't do it like A; we prefer to do it like B
> instead". And coccinelle does the latter more flexibly and
> automatically.
>
> So I tried to pick some obvious and uncontroversial candidates here.
> gets() could be another one, but it's mostly banned already (it's out of
> the standard, and most libcs mark it with a deprecated attribute).
>
> Note that this needs to be applied on top of 022d2ac1f3 (blame: prefer
> xsnprintf to strcpy for colors, 2018-07-13) or it will complain loudly. :)
>
>   [1/2]: introduce "banned function" list
>   [2/2]: banned.h: mark strncpy as banned

Hmph, there is no use of any banned function in hex.c, but when
this topic is merged to 'pu', I seem to get this:

$ make DEVELOPER=1 hex.o
GIT_VERSION = 2.18.0.758.g18f90b35b8
    CC hex.o
In file included from git-compat-util.h:1250:0,
                 from cache.h:4,
                 from hex.c:1:
banned.h:14:0: error: "strncpy" redefined [-Werror]
 #define strncpy(x,y,n) BANNED(strncpy)
 
In file included from /usr/include/string.h:630:0,
                 from git-compat-util.h:165,
                 from cache.h:4,
                 from hex.c:1:
/usr/include/x86_64-linux-gnu/bits/string2.h:84:0: note: this is the location of the previous definition
 # define strncpy(dest, src, n) __builtin_strncpy (dest, src, n)
 
cc1: all warnings being treated as errors
Makefile:2279: recipe for target 'hex.o' failed
make: *** [hex.o] Error 1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/2] introduce "banned function" list
  2018-07-20 17:56         ` Jeff King
@ 2018-07-20 19:03           ` Junio C Hamano
  0 siblings, 0 replies; 42+ messages in thread
From: Junio C Hamano @ 2018-07-20 19:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Theodore Y. Ts'o, git, Stefan Beller

Jeff King <peff@peff.net> writes:

> Thanks for an interesting (and exotic) counter-point. For strcpy(), you
> always have to run strlen() anyway to know you're not going to overflow,
> so you might as well memcpy() at that point. But if you really are OK
> with truncation, then using strncpy() lets you copy while walking over
> the string only once, which is more efficient.  On the other hand,
> strncpy() fills out NULs to the end of the destination array, so you are
> paying an extra cost for small strings.
>
>> So I used strncpy() advisedly, and I ignore people running Coccinelle
>> scripts and blindly sending patches to "fix" ext4.

Given that strncpy() was invented in V7 days for specifically the
purpose of filling the filename field, it is not surprising at all
that it is the perfect tool for the same purpose in ext4 ;-)

> We don't have any remaining calls to strncpy() in Git, so this lets us
> punt on deciding if the ban is worth changing what's there. We can wait
> for somebody to decide they _really_ need strncpy, and we can discuss it
> then with a concrete case.

Yes, and the plan (or at least your plan) is to limit the banned
list to things that really has no reason to be used, the above
sounds like a good approach.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/2] fail compilation with strcpy
  2018-07-20 18:58 ` [PATCH 0/2] fail compilation with strcpy Junio C Hamano
@ 2018-07-20 19:18   ` Jeff King
  2018-07-20 21:50     ` Junio C Hamano
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2018-07-20 19:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Stefan Beller

On Fri, Jul 20, 2018 at 11:58:10AM -0700, Junio C Hamano wrote:

> > Note that this needs to be applied on top of 022d2ac1f3 (blame: prefer
> > xsnprintf to strcpy for colors, 2018-07-13) or it will complain loudly. :)
> >
> >   [1/2]: introduce "banned function" list
> >   [2/2]: banned.h: mark strncpy as banned
> 
> Hmph, there is no use of any banned function in hex.c, but when
> this topic is merged to 'pu', I seem to get this:

Interesting. Builds fine for me even merged to the latest push-out of
pu. But...

> $ make DEVELOPER=1 hex.o
> GIT_VERSION = 2.18.0.758.g18f90b35b8
>     CC hex.o
> In file included from git-compat-util.h:1250:0,
>                  from cache.h:4,
>                  from hex.c:1:
> banned.h:14:0: error: "strncpy" redefined [-Werror]
>  #define strncpy(x,y,n) BANNED(strncpy)
>  
> In file included from /usr/include/string.h:630:0,
>                  from git-compat-util.h:165,
>                  from cache.h:4,
>                  from hex.c:1:
> /usr/include/x86_64-linux-gnu/bits/string2.h:84:0: note: this is the location of the previous definition
>  # define strncpy(dest, src, n) __builtin_strncpy (dest, src, n)

I suspect it has more to do with system/libc differences between our
machines, anyway. There was discussion elsewhere in the thread about the
need to #undef before redefining. I guess this answers that question.

I'll include that in the re-roll, and you can just ignore the v1 patches
I sent for now.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/2] fail compilation with strcpy
  2018-07-20 19:18   ` Jeff King
@ 2018-07-20 21:50     ` Junio C Hamano
  0 siblings, 0 replies; 42+ messages in thread
From: Junio C Hamano @ 2018-07-20 21:50 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Stefan Beller

Jeff King <peff@peff.net> writes:

> I suspect it has more to do with system/libc differences between our
> machines, anyway. There was discussion elsewhere in the thread about the
> need to #undef before redefining. I guess this answers that question.

Yes, that is it.  For now I can squash this in before pushing it out
on 'pu'.

 banned.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/banned.h b/banned.h
index ae6aaaa4a9..d138f3ecf2 100644
--- a/banned.h
+++ b/banned.h
@@ -10,11 +10,17 @@
 
 #define BANNED(func) sorry_##func##_is_a_banned_function()
 
+#undef strcpy
 #define strcpy(x,y) BANNED(strcpy)
+
+#undef strncpy
 #define strncpy(x,y,n) BANNED(strncpy)
 
 #ifdef HAVE_VARIADIC_MACROS
+
+#undef sprintf
 #define sprintf(...) BANNED(sprintf)
+
 #endif
 
 #endif /* BANNED_H */

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 0/4] fail compilation with strcpy
  2018-07-19 20:33 [PATCH 0/2] fail compilation with strcpy Jeff King
                   ` (2 preceding siblings ...)
  2018-07-20 18:58 ` [PATCH 0/2] fail compilation with strcpy Junio C Hamano
@ 2018-07-24  9:23 ` Jeff King
  2018-07-24  9:26   ` [PATCH v2 1/4] automatically ban strcpy() Jeff King
                     ` (3 more replies)
  3 siblings, 4 replies; 42+ messages in thread
From: Jeff King @ 2018-07-24  9:23 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller, Eric Sunshine, Junio C Hamano

On Thu, Jul 19, 2018 at 04:32:59PM -0400, Jeff King wrote:

> This is a patch series to address the discussion in the thread at:
> 
>   https://public-inbox.org/git/20180713204350.GA16999@sigill.intra.peff.net/
> 
> Basically, the question was: can we declare strcpy banned and have a
> linter save us the trouble of finding it in review. The answer is yes,
> the compiler is good at that. ;)

Here's a v2 that I think addresses the comments on the earlier series.
Thanks everybody for your review.

Changes here include:

 - drop the mention in CodingGuidelines. That list is already long, and
   we don't need to to waste mental effort on things that will be
   enforced automatically

 - we now #undef all macros before defining them to avoid redefinition
   warnings

 - the first patch now covers _just_ strcpy(), and each function gets
   its own patch with an explanation (and suggested alternatives). My
   thought is that these should be easy to dig up via blame, "log -S",
   or "log --grep". Though another option would be comments in banned.h.

 - added strcat and vsprintf to the banned list

 - tweaked the first patch's commit message for clarity and to address
   points raised in discussion

As before, this needs to go on top of 022d2ac1f3 (which I hope should
hit master soon anyway).

  [1/4]: automatically ban strcpy()
  [2/4]: banned.h: mark strcat() as banned
  [3/4]: banned.h: mark sprintf() as banned
  [4/4]: banned.h: mark strncpy() as banned

 banned.h          | 30 ++++++++++++++++++++++++++++++
 git-compat-util.h |  6 ++++++
 2 files changed, 36 insertions(+)
 create mode 100644 banned.h

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 1/4] automatically ban strcpy()
  2018-07-24  9:23 ` [PATCH v2 0/4] " Jeff King
@ 2018-07-24  9:26   ` Jeff King
  2018-07-24 17:20     ` Eric Sunshine
  2018-07-24  9:26   ` [PATCH v2 2/4] banned.h: mark strcat() as banned Jeff King
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2018-07-24  9:26 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller, Eric Sunshine, Junio C Hamano

There are a few standard C functions (like strcpy) which are
easy to misuse. E.g.:

  char path[PATH_MAX];
  strcpy(path, arg);

may overflow the "path" buffer. Sometimes there's an earlier
constraint on the size of "arg", but even in such a case
it's hard to verify that the code is correct. If the size
really is unbounded, you're better off using a dynamic
helper like strbuf:

  struct strbuf path = STRBUF_INIT;
  strbuf_addstr(path, arg);

or if it really is bounded, then use xsnprintf to show your
expectation (and get a run-time assertion):

  char path[PATH_MAX];
  xsnprintf(path, sizeof(path), "%s", arg);

which makes further auditing easier.

We'd usually catch undesirable code like this in a review,
but there's no automated enforcement. Adding that
enforcement can help us be more consistent and save effort
(and a round-trip) during review.

This patch teaches the compiler to report an error when it
sees strcpy (and will become a model for banning a few other
functions). This has a few advantages over a separate
linting tool:

  1. We know it's run as part of a build cycle, so it's
     hard to ignore. Whereas an external linter is an extra
     step the developer needs to remember to do.

  2. Likewise, it's basically free since the compiler is
     parsing the code anyway.

  3. We know it's robust against false positives (unlike a
     grep-based linter).

The two big disadvantages are:

  1. We'll only check code that is actually compiled, so it
     may miss code that isn't triggered on your particular
     system. But since presumably people don't add new code
     without compiling it (and if they do, the banned
     function list is the least of their worries), we really
     only care about failing to clean up old code when
     adding new functions to the list. And that's easy
     enough to address with a manual audit when adding a new
     function.

  2. If this ends up generating false positives, it's going
     to be harder to disable (as opposed to a separate
     linter, which may have mechanisms for overriding a
     particular case).

     But the intent is to only ban functions which are
     obviously bad, and for which we accept using an
     alternative even when this particular use isn't buggy
     (e.g., the xsnprintf alternative above).

The implementation here is simple: we'll define a macro for
the banned function which replaces it with a descriptively
named but undefined function.  Replacing it with any invalid
code would work (since we just want to break compilation).
But ideally we'd meet these goals:

 - it should be portable; ideally this would trigger
   everywhere, and does not need to be part of a DEVELOPER=1
   setup (because unlike warnings which may depend on the
   compiler or system, this is a clear indicator of
   something wrong in the code).

 - it should generate a readable error that gives the
   developer a clue what happened

 - it should avoid generating too much other cruft that
   makes it hard to see the actual error

 - it should mention the original callsite in the error

The output with this patch looks like this (using gcc 7, on
a checkout without 022d2ac1f3, which removes the final
strcpy from blame.c):

      CC builtin/blame.o
  In file included from ./git-compat-util.h:1242:0,
                   from ./cache.h:4,
                   from builtin/blame.c:8:
  builtin/blame.c: In function ‘cmd_blame’:
  ./banned.h:11:22: error: implicit declaration of function ‘sorry_strcpy_is_a_banned_function’ [-Werror=implicit-function-declaration]
   #define BANNED(func) sorry_##func##_is_a_banned_function()
                        ^
  ./banned.h:13:21: note: in expansion of macro ‘BANNED’
   #define strcpy(x,y) BANNED(strcpy)
                       ^~~~~~
  builtin/blame.c:1074:4: note: in expansion of macro ‘strcpy’
      strcpy(repeated_meta_color, GIT_COLOR_CYAN);
      ^~~~~~

This pretty clearly shows the original callsite along with
the descriptive function name, and it mentions banned.h,
which contains more information.

What's not perfectly ideal is:

  1. We'll only trigger with -Wimplicit-function-declaration
     (and only stop compilation with -Werror). These are
     generally enabled by DEVELOPER=1. If you _don't_ have
     that set, we'll still catch the problem, but only at
     link-time, with a slightly less useful message:

       /usr/bin/ld: builtin/blame.o: in function `cmd_blame':
       /home/peff/tmp/builtin/blame.c:1074: undefined reference to `sorry_strcpy_is_a_banned_function'
       collect2: error: ld returned 1 exit status

     If instead we convert this to a reference to an
     undefined variable, that always dies immediately. But
     gcc seems to print the set of errors twice, which
     clutters things up.

  2. The expansion through BANNED() adds an extra layer of
     error. Curiously, though, removing it (and just
     expanding strcpy directly to the bogus function name)
     causes gcc _not_ to report the original line of code.

So experimentally, this seems to behave pretty well on gcc.
Clang looks OK, too, though:

 - it actually produces a better message for the undeclared
   identifier than for the implicit function (it doesn't
   double the errors, and for the implicit function it
   actually produces an extra complaint about
   strict-prototypes).

 - the BANNED indirection has no effect (it shows the
   original line of code either way)

So if we want to optimize for clang's output, we could
switch both of those decisions.

Note that the linux kernel has a similar mechanism which
goes by BUILD_BUG_ON_MSG(). It works by declaring a one-off
function with gcc's error attribute. That doesn't work for
us here, because we'd like to work even on non-gcc
compilers (and in my opinion the compiler output is actually
_less_ readable).

Signed-off-by: Jeff King <peff@peff.net>
---
 banned.h          | 16 ++++++++++++++++
 git-compat-util.h |  6 ++++++
 2 files changed, 22 insertions(+)
 create mode 100644 banned.h

diff --git a/banned.h b/banned.h
new file mode 100644
index 0000000000..c50091ad7d
--- /dev/null
+++ b/banned.h
@@ -0,0 +1,16 @@
+#ifndef BANNED_H
+#define BANNED_H
+
+/*
+ * This header lists functions that have been banned from our code base,
+ * because they're too easy to misuse (and even if used correctly,
+ * complicate audits). Including this header turns them into compile-time
+ * errors.
+ */
+
+#define BANNED(func) sorry_##func##_is_a_banned_function()
+
+#undef strcpy
+#define strcpy(x,y) BANNED(strcpy)
+
+#endif /* BANNED_H */
diff --git a/git-compat-util.h b/git-compat-util.h
index 9a64998b24..89d37095c7 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1239,4 +1239,10 @@ extern void unleak_memory(const void *ptr, size_t len);
 #define UNLEAK(var) do {} while (0)
 #endif
 
+/*
+ * This include must come after system headers, since it introduces macros that
+ * replace system names.
+ */
+#include "banned.h"
+
 #endif
-- 
2.18.0.542.g2bf2fc4f7e


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 2/4] banned.h: mark strcat() as banned
  2018-07-24  9:23 ` [PATCH v2 0/4] " Jeff King
  2018-07-24  9:26   ` [PATCH v2 1/4] automatically ban strcpy() Jeff King
@ 2018-07-24  9:26   ` Jeff King
  2018-07-24  9:27   ` [PATCH v2 3/4] banned.h: mark sprintf() " Jeff King
  2018-07-24  9:28   ` [PATCH v2 4/4] banned.h: mark strncpy() " Jeff King
  3 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2018-07-24  9:26 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller, Eric Sunshine, Junio C Hamano

The strcat() function has all of the same overflow problems
as strcpy(). And as a bonus, it's easy to end up
accidentally quadratic, as each subsequent call has to walk
through the existing string.

The last strcat() call went away in f063d38b80 (daemon: use
cld->env_array when re-spawning, 2015-09-24). In general,
strcat() can be replaced either with a dynamic string
(strbuf or xstrfmt), or with xsnprintf if you know the
length is bounded.

Signed-off-by: Jeff King <peff@peff.net>
---
 banned.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/banned.h b/banned.h
index c50091ad7d..32e0bae01d 100644
--- a/banned.h
+++ b/banned.h
@@ -12,5 +12,7 @@
 
 #undef strcpy
 #define strcpy(x,y) BANNED(strcpy)
+#undef strcat
+#define strcat(x,y) BANNED(strcat)
 
 #endif /* BANNED_H */
-- 
2.18.0.542.g2bf2fc4f7e


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 3/4] banned.h: mark sprintf() as banned
  2018-07-24  9:23 ` [PATCH v2 0/4] " Jeff King
  2018-07-24  9:26   ` [PATCH v2 1/4] automatically ban strcpy() Jeff King
  2018-07-24  9:26   ` [PATCH v2 2/4] banned.h: mark strcat() as banned Jeff King
@ 2018-07-24  9:27   ` Jeff King
  2018-07-24  9:28   ` [PATCH v2 4/4] banned.h: mark strncpy() " Jeff King
  3 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2018-07-24  9:27 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller, Eric Sunshine, Junio C Hamano

The sprintf() function (and its variadic form vsprintf) make
it easy to accidentally introduce a buffer overflow. If
you're thinking of using them, you're better off either
using a dynamic string (strbuf or xstrfmt), or xsnprintf if
you really know that you won't overflow. The last sprintf()
call went away quite a while ago in f0766bf94e (fsck: use
for_each_loose_file_in_objdir, 2015-09-24).

Note that we respect HAVE_VARIADIC_MACROS here, which some
ancient platforms lack. As a fallback, we can just "guess"
that the caller will provide 3 arguments. If they do, then
the macro will work as usual. If not, then they'll get a
slightly less useful error, like:

  git.c:718:24: error: macro "sprintf" passed 3 arguments, but takes just 2

That's not ideal, but it at least alerts them to the problem
area. And anyway, we're primarily targeting people adding
new code. Most developers should be on modern enough
platforms to see the normal "good" error message.

Signed-off-by: Jeff King <peff@peff.net>
---
 banned.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/banned.h b/banned.h
index 32e0bae01d..c83d6cb9df 100644
--- a/banned.h
+++ b/banned.h
@@ -15,4 +15,14 @@
 #undef strcat
 #define strcat(x,y) BANNED(strcat)
 
+#undef sprintf
+#undef vsprintf
+#ifdef HAVE_VARIADIC_MACROS
+#define sprintf(...) BANNED(sprintf)
+#define vsprintf(...) BANNED(vsprintf)
+#else
+#define sprintf(buf,fmt,arg) BANNED(sprintf)
+#define vsprintf(buf,fmt,arg) BANNED(sprintf)
+#endif
+
 #endif /* BANNED_H */
-- 
2.18.0.542.g2bf2fc4f7e


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 4/4] banned.h: mark strncpy() as banned
  2018-07-24  9:23 ` [PATCH v2 0/4] " Jeff King
                     ` (2 preceding siblings ...)
  2018-07-24  9:27   ` [PATCH v2 3/4] banned.h: mark sprintf() " Jeff King
@ 2018-07-24  9:28   ` Jeff King
  3 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2018-07-24  9:28 UTC (permalink / raw)
  To: git; +Cc: Stefan Beller, Eric Sunshine, Junio C Hamano

The strncpy() function is less horrible than strcpy(), but
is still pretty easy to misuse because of its funny
termination semantics. Namely, that if it truncates it omits
the NUL terminator, and you must remember to add it
yourself. Even if you use it correctly, it's sometimes hard
for a reader to verify this without hunting through the
code. If you're thinking about using it, consider instead:

  - strlcpy() if you really just need a truncated but
    NUL-terminated string (we provide a compat version, so
    it's always available)

  - xsnprintf() if you're sure that what you're copying
    should fit

  - strbuf or xstrfmt() if you need to handle
    arbitrary-length heap-allocated strings

Note that there is one instance of strncpy in
compat/regex/regcomp.c, which is fine (it allocates a
sufficiently large string before copying). But this doesn't
trigger the ban-list even when compiling with NO_REGEX=1,
because:

  1. we don't use git-compat-util.h when compiling it
     (instead we rely on the system includes from the
     upstream library); and

  2. It's in an "#ifdef DEBUG" block

Since it's doesn't trigger the banned.h code, we're better
off leaving it to keep our divergence from upstream minimal.

Signed-off-by: Jeff King <peff@peff.net>
---
 banned.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/banned.h b/banned.h
index c83d6cb9df..ab96954583 100644
--- a/banned.h
+++ b/banned.h
@@ -14,6 +14,8 @@
 #define strcpy(x,y) BANNED(strcpy)
 #undef strcat
 #define strcat(x,y) BANNED(strcat)
+#undef strncpy
+#define strncpy(x,y,n) BANNED(strncpy)
 
 #undef sprintf
 #undef vsprintf
-- 
2.18.0.542.g2bf2fc4f7e

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/4] automatically ban strcpy()
  2018-07-24  9:26   ` [PATCH v2 1/4] automatically ban strcpy() Jeff King
@ 2018-07-24 17:20     ` Eric Sunshine
  2018-07-26  6:58       ` Jeff King
  0 siblings, 1 reply; 42+ messages in thread
From: Eric Sunshine @ 2018-07-24 17:20 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, Stefan Beller, Junio C Hamano

On Tue, Jul 24, 2018 at 5:26 AM Jeff King <peff@peff.net> wrote:
>   1. We'll only trigger with -Wimplicit-function-declaration
>      (and only stop compilation with -Werror). These are
>      generally enabled by DEVELOPER=1. If you _don't_ have
>      that set, we'll still catch the problem, but only at
>      link-time, with a slightly less useful message:
>
>      If instead we convert this to a reference to an
>      undefined variable, that always dies immediately. But
>      gcc seems to print the set of errors twice, which
>      clutters things up.

The above does a pretty good job of convincing me that this ought to
be implemented via an undefined variable rather than undefined
function, exactly because it is the newcomer or casual contributor who
is most likely to trip over a banned function, and almost certainly
won't have DEVELOPER=1 set. The gcc clutter seems a minor point
against the benefit this provides to that audience.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 1/4] automatically ban strcpy()
  2018-07-24 17:20     ` Eric Sunshine
@ 2018-07-26  6:58       ` Jeff King
  2018-07-26  7:21         ` [PATCH v3 " Jeff King
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2018-07-26  6:58 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, Stefan Beller, Junio C Hamano

On Tue, Jul 24, 2018 at 01:20:58PM -0400, Eric Sunshine wrote:

> On Tue, Jul 24, 2018 at 5:26 AM Jeff King <peff@peff.net> wrote:
> >   1. We'll only trigger with -Wimplicit-function-declaration
> >      (and only stop compilation with -Werror). These are
> >      generally enabled by DEVELOPER=1. If you _don't_ have
> >      that set, we'll still catch the problem, but only at
> >      link-time, with a slightly less useful message:
> >
> >      If instead we convert this to a reference to an
> >      undefined variable, that always dies immediately. But
> >      gcc seems to print the set of errors twice, which
> >      clutters things up.
> 
> The above does a pretty good job of convincing me that this ought to
> be implemented via an undefined variable rather than undefined
> function, exactly because it is the newcomer or casual contributor who
> is most likely to trip over a banned function, and almost certainly
> won't have DEVELOPER=1 set. The gcc clutter seems a minor point
> against the benefit this provides to that audience.

OK. I was on the fence, but it should be pretty trivial to switch. Let
me see if I can just make a replacement for patch 1, or if the whole
thing needs to be rebased on top.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v3 1/4] automatically ban strcpy()
  2018-07-26  6:58       ` Jeff King
@ 2018-07-26  7:21         ` Jeff King
  2018-07-26 17:33           ` Junio C Hamano
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2018-07-26  7:21 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, Stefan Beller, Junio C Hamano

On Thu, Jul 26, 2018 at 02:58:40AM -0400, Jeff King wrote:

> On Tue, Jul 24, 2018 at 01:20:58PM -0400, Eric Sunshine wrote:
> 
> > On Tue, Jul 24, 2018 at 5:26 AM Jeff King <peff@peff.net> wrote:
> > >   1. We'll only trigger with -Wimplicit-function-declaration
> > >      (and only stop compilation with -Werror). These are
> > >      generally enabled by DEVELOPER=1. If you _don't_ have
> > >      that set, we'll still catch the problem, but only at
> > >      link-time, with a slightly less useful message:
> > >
> > >      If instead we convert this to a reference to an
> > >      undefined variable, that always dies immediately. But
> > >      gcc seems to print the set of errors twice, which
> > >      clutters things up.
> > 
> > The above does a pretty good job of convincing me that this ought to
> > be implemented via an undefined variable rather than undefined
> > function, exactly because it is the newcomer or casual contributor who
> > is most likely to trip over a banned function, and almost certainly
> > won't have DEVELOPER=1 set. The gcc clutter seems a minor point
> > against the benefit this provides to that audience.
> 
> OK. I was on the fence, but it should be pretty trivial to switch. Let
> me see if I can just make a replacement for patch 1, or if the whole
> thing needs to be rebased on top.

The rest apply cleanly (because they're far enough away textually from
the definition of BANNED).

So here's a replacement for just patch 1 (I'm assuming this creates less
work than re-posting them all, but it may not be if Junio prefers
dealing with a whole new mbox rather than a "rebase -i", "reset --hard
HEAD^", "git am" -- let me know if you'd prefer it the other way).

-- >8 --
Subject: [PATCH] automatically ban strcpy()

There are a few standard C functions (like strcpy) which are
easy to misuse. E.g.:

  char path[PATH_MAX];
  strcpy(path, arg);

may overflow the "path" buffer. Sometimes there's an earlier
constraint on the size of "arg", but even in such a case
it's hard to verify that the code is correct. If the size
really is unbounded, you're better off using a dynamic
helper like strbuf:

  struct strbuf path = STRBUF_INIT;
  strbuf_addstr(path, arg);

or if it really is bounded, then use xsnprintf to show your
expectation (and get a run-time assertion):

  char path[PATH_MAX];
  xsnprintf(path, sizeof(path), "%s", arg);

which makes further auditing easier.

We'd usually catch undesirable code like this in a review,
but there's no automated enforcement. Adding that
enforcement can help us be more consistent and save effort
(and a round-trip) during review.

This patch teaches the compiler to report an error when it
sees strcpy (and will become a model for banning a few other
functions). This has a few advantages over a separate
linting tool:

  1. We know it's run as part of a build cycle, so it's
     hard to ignore. Whereas an external linter is an extra
     step the developer needs to remember to do.

  2. Likewise, it's basically free since the compiler is
     parsing the code anyway.

  3. We know it's robust against false positives (unlike a
     grep-based linter).

The two big disadvantages are:

  1. We'll only check code that is actually compiled, so it
     may miss code that isn't triggered on your particular
     system. But since presumably people don't add new code
     without compiling it (and if they do, the banned
     function list is the least of their worries), we really
     only care about failing to clean up old code when
     adding new functions to the list. And that's easy
     enough to address with a manual audit when adding a new
     function (which is what I did for the functions here).

  2. If this ends up generating false positives, it's going
     to be harder to disable (as opposed to a separate
     linter, which may have mechanisms for overriding a
     particular case).

     But the intent is to only ban functions which are
     obviously bad, and for which we accept using an
     alternative even when this particular use isn't buggy
     (e.g., the xsnprintf alternative above).

The implementation here is simple: we'll define a macro for
the banned function which replaces it with a reference to a
descriptively named but undeclared identifier.  Replacing it
with any invalid code would work (since we just want to
break compilation).  But ideally we'd meet these goals:

 - it should be portable; ideally this would trigger
   everywhere, and does not need to be part of a DEVELOPER=1
   setup (because unlike warnings which may depend on the
   compiler or system, this is a clear indicator of
   something wrong in the code).

 - it should generate a readable error that gives the
   developer a clue what happened

 - it should avoid generating too much other cruft that
   makes it hard to see the actual error

 - it should mention the original callsite in the error

The output with this patch looks like this (using gcc 7, on
a checkout with 022d2ac1f3 reverted, which removed the final
strcpy from blame.c):

      CC builtin/blame.o
  In file included from ./git-compat-util.h:1246,
                   from ./cache.h:4,
                   from builtin/blame.c:8:
  builtin/blame.c: In function ‘cmd_blame’:
  ./banned.h:11:22: error: ‘sorry_strcpy_is_a_banned_function’ undeclared (first use in this function)
   #define BANNED(func) sorry_##func##_is_a_banned_function
                        ^~~~~~
  ./banned.h:14:21: note: in expansion of macro ‘BANNED’
   #define strcpy(x,y) BANNED(strcpy)
                       ^~~~~~
  builtin/blame.c:1074:4: note: in expansion of macro ‘strcpy’
      strcpy(repeated_meta_color, GIT_COLOR_CYAN);
      ^~~~~~
  ./banned.h:11:22: note: each undeclared identifier is reported only once for each function it appears in
   #define BANNED(func) sorry_##func##_is_a_banned_function
                        ^~~~~~
  ./banned.h:14:21: note: in expansion of macro ‘BANNED’
   #define strcpy(x,y) BANNED(strcpy)
                       ^~~~~~
  builtin/blame.c:1074:4: note: in expansion of macro ‘strcpy’
      strcpy(repeated_meta_color, GIT_COLOR_CYAN);
      ^~~~~~

This prominently shows the phrase "strcpy is a banned
function", along with the original callsite in blame.c and
the location of the ban code in banned.h. Which should be
enough to get even a developer seeing this for the first
time pointed in the right direction.

This doesn't match our ideals perfectly, but it's a pretty
good balance. A few alternatives I tried:

  1. Instead of using an undeclared variable, using an
     undeclared function. This shortens the message, because
     the "each undeclared identifier" message is not needed
     (and as you can see above, it triggers a separate
     mention of each of the expansion points).

     But it doesn't actually stop compilation unless you use
     -Werror=implicit-function-declaration in your CFLAGS.
     This is the case for DEVELOPER=1, but not for a default
     build (on the other hand, we'd eventually produce a
     link error pointing to the correct source line with the
     descriptive name).

  2. The linux kernel uses a similar mechanism in its
     BUILD_BUG_ON_MSG(), where they actually declare the
     function but do so with gcc's error attribute. But
     that's not portable to other compilers (and it also
     runs afoul of our error() macro).

     We could make a gcc-specific technique and fallback on
     other compilers, but it's probably not worth the
     complexity. It also isn't significantly shorter than
     the error message shown above.

  3. We could drop the BANNED() macro, which would shorten
     the number of lines in the error. But curiously,
     removing it (and just expanding strcpy directly to the
     bogus identifier) causes gcc _not_ to report the
     original line of code.

So this strategy seems to be an acceptable mix of
information, portability, simplicity, and robustness,
without _too_ much extra clutter. I also tested it with
clang, and it looks as good (actually, slightly less
cluttered than with gcc).

Signed-off-by: Jeff King <peff@peff.net>
---
 banned.h          | 16 ++++++++++++++++
 git-compat-util.h |  6 ++++++
 2 files changed, 22 insertions(+)
 create mode 100644 banned.h

diff --git a/banned.h b/banned.h
new file mode 100644
index 0000000000..1a3e526570
--- /dev/null
+++ b/banned.h
@@ -0,0 +1,16 @@
+#ifndef BANNED_H
+#define BANNED_H
+
+/*
+ * This header lists functions that have been banned from our code base,
+ * because they're too easy to misuse (and even if used correctly,
+ * complicate audits). Including this header turns them into compile-time
+ * errors.
+ */
+
+#define BANNED(func) sorry_##func##_is_a_banned_function
+
+#undef strcpy
+#define strcpy(x,y) BANNED(strcpy)
+
+#endif /* BANNED_H */
diff --git a/git-compat-util.h b/git-compat-util.h
index 9a64998b24..89d37095c7 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1239,4 +1239,10 @@ extern void unleak_memory(const void *ptr, size_t len);
 #define UNLEAK(var) do {} while (0)
 #endif
 
+/*
+ * This include must come after system headers, since it introduces macros that
+ * replace system names.
+ */
+#include "banned.h"
+
 #endif
-- 
2.18.0.795.g7103b4bd43


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/4] automatically ban strcpy()
  2018-07-26  7:21         ` [PATCH v3 " Jeff King
@ 2018-07-26 17:33           ` Junio C Hamano
  2018-07-27  8:08             ` Jeff King
  0 siblings, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2018-07-26 17:33 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Sunshine, Git List, Stefan Beller

Jeff King <peff@peff.net> writes:

> So here's a replacement for just patch 1 (I'm assuming this creates less
> work than re-posting them all, but it may not be if Junio prefers
> dealing with a whole new mbox rather than a "rebase -i", "reset --hard
> HEAD^", "git am" -- let me know if you'd prefer it the other way).

A single patch replacement that is clearly marked which one to
replace and which other ones to keep, like you did here, is fine.
The amount of work is about the same either way.

0) I would first do these to make sure that I can replace:

 $ git checkout jk/banned-functions
 $ git log --first-parent --oneline master..
 $ git log --first-parent --oneline next..

If 'next' has some patches that are not in 'master' from the topic,
I must refrain from replacing them (in this case, there is no such
issue).

1-a) With a wholesale replacement,

 $ git checkout master...	;# try to keep the same base
 $ git am -s mbox		;# on detached HEAD
 $ git tbdiff ..@{-1} @{-1}..	;# or range-diff

If the range-/tbdiff tells me that earlier N patches are the same,
the above is followed by something like (pardon off-by-one)

 $ git rebase --onto @{-1}~N HEAD~N

to preserve as many original commits as possible.

1-b) With a single patch replacement, it is quite different.

 $ git checkout HEAD~4	        ;# we are replacing 1/4 of the original
 $ git am -s mbox               ;# that single patch
 $ git show-branch HEAD @{-1}

That would give me something like this

    * [HEAD] automatically ban strcpy()
     ! [@{-1}] banned.h: mark strncpy() as banned
    --
    *  [HEAD] automatically ban strcpy()
     + [@{-1}] banned.h: mark strncpy() as banned
     + [@{-1}^] banned.h: mark sprintf() as banned
     + [@{-1}~2] banned.h: mark strcat() as banned
     + [@{-1}~3] automatically ban strcpy()
    -- [HEAD^] Merge branch 'sb/blame-color' into jk/banned-function

The most natural thing to do at this point is

 $ git cherry-pick -3 @{-1}

But we know range-pick is buggy and loses core.rewriteref, so
instead I did this, which I know carries the notes forward:

 $ git rebase --onto HEAD @{-1}~3 @{-1}^0

    Side note: The point of last "^0" is that I do not want to lose
    the topic branch jk/banned-functions not just yet.

If I need to re-apply separate pieces of the original history, it
becomes very painful to emulate these multiple cherry-picks with
multiple "rebase --onto", though.  That is where "the amount of work
is about the same" comes from.  If cherry-pick worked correctly,
selective replacement should be less work for me.

Anyway, we already preserved as many original commits, but unlike
1-a, did so manually when we decided to replace selective patches.
So there is no further rebasing with this approach.

2) I now have two diverged histories in HEAD and @{-1} that I can
compare with range-/tbdiff in either case: 

 $ git tbdiff ..@{-1} @{-1}..

After the usual inspection and testing, replacement is concluded by

 $ git checkout -B @{-1}

which takes me back to (an updated) jk/banned-functions topic.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/4] automatically ban strcpy()
  2018-07-26 17:33           ` Junio C Hamano
@ 2018-07-27  8:08             ` Jeff King
  2018-07-27 17:34               ` Junio C Hamano
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2018-07-27  8:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Sunshine, Git List, Stefan Beller

On Thu, Jul 26, 2018 at 10:33:27AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > So here's a replacement for just patch 1 (I'm assuming this creates less
> > work than re-posting them all, but it may not be if Junio prefers
> > dealing with a whole new mbox rather than a "rebase -i", "reset --hard
> > HEAD^", "git am" -- let me know if you'd prefer it the other way).
> 
> A single patch replacement that is clearly marked which one to
> replace and which other ones to keep, like you did here, is fine.
> The amount of work is about the same either way.
> 
> 0) I would first do these to make sure that I can replace:
> [..]

Thanks. As always, I find it interesting to see your workflows.

> 1-b) With a single patch replacement, it is quite different.
> 
>  $ git checkout HEAD~4	        ;# we are replacing 1/4 of the original
>  $ git am -s mbox               ;# that single patch
>  $ git show-branch HEAD @{-1}
> [...]
> The most natural thing to do at this point is
> 
>  $ git cherry-pick -3 @{-1}
> 
> But we know range-pick is buggy and loses core.rewriteref, so
> instead I did this, which I know carries the notes forward:
> 
>  $ git rebase --onto HEAD @{-1}~3 @{-1}^0

Interesting. I'd have probably done it with an interactive rebase:

  $ git rebase -i HEAD~4
  [change first "pick" to "edit"; after stopping...]
  $ git reset --hard HEAD^ ;# throw away patch 1
  $ git am -s mbox         ;# apply single patch
  $ git rebase --continue

Which is really the same thing, but "cheats" around the cherry-pick
problem by using rebase (which I think handles the rewriteref stuff
correctly even in interactive mode).

I guess if we wanted to be really fancy, just replacing the first "pick"
with "x git am -s mbox" would automate it. That might be handy for the
multi-patch case.

Anyway, thanks for handling it. :)

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/4] automatically ban strcpy()
  2018-07-27  8:08             ` Jeff King
@ 2018-07-27 17:34               ` Junio C Hamano
  2018-07-28  9:24                 ` Jeff King
  0 siblings, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2018-07-27 17:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Eric Sunshine, Git List, Stefan Beller

Jeff King <peff@peff.net> writes:

>>  $ git rebase --onto HEAD @{-1}~3 @{-1}^0
>
> Interesting. I'd have probably done it with an interactive rebase:
>
>   $ git rebase -i HEAD~4
>   [change first "pick" to "edit"; after stopping...]
>   $ git reset --hard HEAD^ ;# throw away patch 1
>   $ git am -s mbox         ;# apply single patch
>   $ git rebase --continue
>
> Which is really the same thing,...

I have unfounded fear for doing anything other than "edit", "commit
--amend", "rebase --continue", or "rebase --abort" during a "rebase
-i" session.  

Especiallly "reset --hard" with anything but HEAD.  I guess that is
because I do not fully trust/understand how the sequencer machinery
replays the remainder of todo tasks on top of the HEAD that is
different from what it left me to futz with when it relinquished the
control.

Also "am" during "rebase -i" is scary to me, as "am" also tries to
keep its own sequencing state.  Do you know if "rebase --continue"
would work correctly in the above sequence if "am" conflicted, I
gave up, and said "am --abort", for example?  I don't offhand know.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v3 1/4] automatically ban strcpy()
  2018-07-27 17:34               ` Junio C Hamano
@ 2018-07-28  9:24                 ` Jeff King
  0 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2018-07-28  9:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Sunshine, Git List, Stefan Beller

On Fri, Jul 27, 2018 at 10:34:20AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >>  $ git rebase --onto HEAD @{-1}~3 @{-1}^0
> >
> > Interesting. I'd have probably done it with an interactive rebase:
> >
> >   $ git rebase -i HEAD~4
> >   [change first "pick" to "edit"; after stopping...]
> >   $ git reset --hard HEAD^ ;# throw away patch 1
> >   $ git am -s mbox         ;# apply single patch
> >   $ git rebase --continue
> >
> > Which is really the same thing,...
> 
> I have unfounded fear for doing anything other than "edit", "commit
> --amend", "rebase --continue", or "rebase --abort" during a "rebase
> -i" session.  
> 
> Especiallly "reset --hard" with anything but HEAD.  I guess that is
> because I do not fully trust/understand how the sequencer machinery
> replays the remainder of todo tasks on top of the HEAD that is
> different from what it left me to futz with when it relinquished the
> control.

I jump around via "reset --hard" all the time during interactive
rebases. I don't recall having any issues, though that does not mean
there isn't a corner case lurking. :)

> Also "am" during "rebase -i" is scary to me, as "am" also tries to
> keep its own sequencing state.  Do you know if "rebase --continue"
> would work correctly in the above sequence if "am" conflicted, I
> gave up, and said "am --abort", for example?  I don't offhand know.

This one is trickier. I _assumed_ it would be fine to "am" during a
rebase, but I didn't actually try it (in fact, I rarely do anything
exotic with "am", since my main use is just test-applying patches on a
detached head).

It _seems_ to work with this example:

-- >8 --
git init repo
cd repo

for i in 1 2 3 4; do
  echo $i >file
  git add file
  git commit -m $i
done
git format-patch --stdout -1 >patch

git reset --hard HEAD^
GIT_EDITOR='perl -i -pe "/2$/ and s/pick/edit/"' git rebase -i HEAD~2
git am patch
git am --abort
-- 8< --

I think because "am" lives in $GIT_DIR/rebase-apply, and "rebase -i"
lives in ".git/rebase-merge". Of course "rebase" can use the
rebase-apply directory, but I think interactive-rebase never will.

So it works, but mostly by luck. :)

In my ideal world, operations like this that can be continued would be
stored in a stack, and each stack element would know its operation type.
So you could do:

  # push a rebase onto the stack
  git rebase foo

  # while stopped, you might do another operation which pushes onto the
  # stack
  git am ~/patch

  # aborting an operation (or finishing it naturally) pops it off the
  # stack; now we just have the rebase on the stack
  git am --abort

  # aborting an operation that's not at the top of the stack would
  # either be an error, or could auto-abort everybody on top
  git am ~/patch
  git rebase --abort ;# aborts the am, too!

  # you could even nest similar operations; by default we find the
  # top-most one in the stack, but you could refer to them by stack
  # position.
  #
  # put us in a rebase that stops at a conflict
  git rebase foo
  # oops, rewrite the last few commits as part of fixing the conflict
  git rebase -i HEAD~3
  # nope, let's abort the whole thing (stack level 0)
  git rebase --abort=0

  # it would also be nice to have generic commands to manipulate the
  # stack
  git op list ;# show the stack
  git op abort ;# abort the top operation, whatever it is
  git op continue ;# continue the top operation, whatever it is

I've hacked something similar to "git op continue" myself and find it
very useful, but:

  - it's intimately aware of all the possible operations, including some
    custom ones that I have. I wouldn't need to if each operation
    touched a well-known directory to push itself on the stack, and
    provided a few commands in the stack directory for things like "run
    this to abort me".

  - it sometimes behaves weirdly, because I "canceled" an operation with
    "reset" or "checkout", and later I expect to continue operation X,
    but find that some other operation Y was waiting. Having a stack
    means it's much easier to see which operations are still hanging
    around (we have this already with the prompt logic that shows things
    like rebases, but again, it has to be intimate with which operations
    are storing data in $GIT_DIR)

Anyway. That's something I've dreamed about for a while, but the thought
of retro-fitting the existing multi-command operations turned me off.
The current systems _usually_ works.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2018-07-28  9:24 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-19 20:33 [PATCH 0/2] fail compilation with strcpy Jeff King
2018-07-19 20:39 ` [PATCH 1/2] introduce "banned function" list Jeff King
2018-07-19 21:11   ` Eric Sunshine
2018-07-19 21:27     ` Jeff King
2018-07-19 21:59       ` Eric Sunshine
2018-07-20  0:55         ` Jeff King
2018-07-19 21:15   ` Stefan Beller
2018-07-19 21:32     ` Jeff King
2018-07-19 21:47       ` Stefan Beller
2018-07-20  0:54         ` Jeff King
2018-07-19 22:46   ` Junio C Hamano
2018-07-19 23:55     ` Randall S. Becker
2018-07-20  1:08     ` Jeff King
2018-07-20  1:12       ` Jeff King
2018-07-20  9:32       ` Junio C Hamano
2018-07-20 17:45         ` Jeff King
2018-07-20 13:22       ` Theodore Y. Ts'o
2018-07-20 17:56         ` Jeff King
2018-07-20 19:03           ` Junio C Hamano
2018-07-20 12:42   ` Derrick Stolee
2018-07-20 14:41   ` Duy Nguyen
2018-07-20 17:48     ` Elijah Newren
2018-07-20 18:04       ` Jeff King
2018-07-20 18:00     ` Jeff King
2018-07-19 20:39 ` [PATCH 2/2] banned.h: mark strncpy as banned Jeff King
2018-07-19 21:12   ` Ævar Arnfjörð Bjarmason
2018-07-19 21:33     ` Jeff King
2018-07-20 18:58 ` [PATCH 0/2] fail compilation with strcpy Junio C Hamano
2018-07-20 19:18   ` Jeff King
2018-07-20 21:50     ` Junio C Hamano
2018-07-24  9:23 ` [PATCH v2 0/4] " Jeff King
2018-07-24  9:26   ` [PATCH v2 1/4] automatically ban strcpy() Jeff King
2018-07-24 17:20     ` Eric Sunshine
2018-07-26  6:58       ` Jeff King
2018-07-26  7:21         ` [PATCH v3 " Jeff King
2018-07-26 17:33           ` Junio C Hamano
2018-07-27  8:08             ` Jeff King
2018-07-27 17:34               ` Junio C Hamano
2018-07-28  9:24                 ` Jeff King
2018-07-24  9:26   ` [PATCH v2 2/4] banned.h: mark strcat() as banned Jeff King
2018-07-24  9:27   ` [PATCH v2 3/4] banned.h: mark sprintf() " Jeff King
2018-07-24  9:28   ` [PATCH v2 4/4] banned.h: mark strncpy() " Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).