git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [GSoC][RFC] Replace use of atoi() with strtol_i(), as a microproject
@ 2024-01-08 17:34 mohitmarathe
  2024-01-09  9:56 ` Christian Couder
  0 siblings, 1 reply; 8+ messages in thread
From: mohitmarathe @ 2024-01-08 17:34 UTC (permalink / raw
  To: git@vger.kernel.org
  Cc: gitster@pobox.com, britton.kerin@gmail.com, peff@peff.net

Hello,

I'm Mohit, an undergrad from India, and I want to start contributing to the Git project. I have already built Git from source and finished `git psuh` tutorial. And I must say the "Hacking Git" documentation is great (very detailed and maybe exhaustive) and easy to follow. So I also read the topic on "microprojects", and while searching for one, I came across this message: https://public-inbox.org/git/xmqqjzpjsbjl.fsf@gitster.g/.
I want to work on this task (if it is not taken up already) as a microproject for GSoC.

Approach:
From what I understood, the idea is to *not* allow non-integer characters in the arguments by printing an error message. So we have to replace `atoi` with `strtol_i`, like it is done in this patch: https://public-inbox.org/git/xmqq5y181fx0.fsf_-_@gitster.g/.
There are some places where we want to continue to parse after the end of the integer (where `strspn` is used as mentioned by Junio), and based on Junio's suggestion, a new helper can be created like this:

> static inline int strtol_i2(char const *s, int base, int *result, char **endp)
> {
> 	long ul;
>         char *dummy = NULL;
> 
>         if (!endp)
> 		endp = &dummy;
> 	errno = 0;
> 	ul = strtol(s, &endp, base);
> 	if (errno ||
> 	    /*
> 	     * if we are told to parse to the end of the string by
> 	     * passing NULL to endp, it is an error to have any
> 	     * remaining character after the digits.
> 	     */
>             (dummy && *dummy) ||
> 	    endp == s || (int) ul != ul)
> 		return -1;
> 	*result = ul;
> 	return 0;
> }


So I searched for all the occurrences of `atoi(` (as suggested by Junio), and I found only two instances (both in `builtin/patch-id.c`) where it is followed by `strspn`. Is it safe to assume that this is the only place where we cannot directly replace `atoi` with `strtol_i`, or should I keep digging?

Also, this seems like a large change due to the number of files involved, while the documentation about the microproject emphasizes keeping the change small. Does it mean a small change per commit or a small change per Pull Request?

Thanks!


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GSoC][RFC] Replace use of atoi() with strtol_i(), as a microproject
  2024-01-08 17:34 [GSoC][RFC] Replace use of atoi() with strtol_i(), as a microproject mohitmarathe
@ 2024-01-09  9:56 ` Christian Couder
  2024-01-10 17:38   ` Mohit Marathe
  0 siblings, 1 reply; 8+ messages in thread
From: Christian Couder @ 2024-01-09  9:56 UTC (permalink / raw
  To: mohitmarathe
  Cc: git@vger.kernel.org, gitster@pobox.com, britton.kerin@gmail.com,
	peff@peff.net

Hi,

On Mon, Jan 8, 2024 at 6:38 PM <mohitmarathe@proton.me> wrote:
>
> Hello,
>
> I'm Mohit, an undergrad from India, and I want to start contributing to the Git project. I have already built Git from source and finished `git psuh` tutorial. And I must say the "Hacking Git" documentation is great (very detailed and maybe exhaustive) and easy to follow. So I also read the topic on "microprojects", and while searching for one, I came across this message: https://public-inbox.org/git/xmqqjzpjsbjl.fsf@gitster.g/.
> I want to work on this task (if it is not taken up already) as a microproject for GSoC.

Thanks for your interest in Git and the GSoC!

> Approach:
> From what I understood, the idea is to *not* allow non-integer characters in the arguments by printing an error message. So we have to replace `atoi` with `strtol_i`, like it is done in this patch: https://public-inbox.org/git/xmqq5y181fx0.fsf_-_@gitster.g/.
> There are some places where we want to continue to parse after the end of the integer (where `strspn` is used as mentioned by Junio), and based on Junio's suggestion, a new helper can be created like this:
>
> > static inline int strtol_i2(char const *s, int base, int *result, char **endp)
> > {
> >       long ul;
> >         char *dummy = NULL;
> >
> >         if (!endp)
> >               endp = &dummy;
> >       errno = 0;
> >       ul = strtol(s, &endp, base);
> >       if (errno ||
> >           /*
> >            * if we are told to parse to the end of the string by
> >            * passing NULL to endp, it is an error to have any
> >            * remaining character after the digits.
> >            */
> >             (dummy && *dummy) ||
> >           endp == s || (int) ul != ul)
> >               return -1;
> >       *result = ul;
> >       return 0;
> > }
>
>
> So I searched for all the occurrences of `atoi(` (as suggested by Junio), and I found only two instances (both in `builtin/patch-id.c`) where it is followed by `strspn`. Is it safe to assume that this is the only place where we cannot directly replace `atoi` with `strtol_i`, or should I keep digging?

In https://public-inbox.org/git/xmqqjzpjsbjl.fsf@gitster.g/ Junio says:

"Some places use atoi() immediately followed by strspn() to skip over
digits, which means they are parsing an integer and want to continue
reading after the integer, which is incompatible with what
strtol_i() wants to do.  They need either a separate helper or an
updated strtol_i() that optionally allows you to parse the prefix
and report where the integer ended, e.g., something like:"

and then he suggests the above helper.

So it seems that the two instances you found look like good places
where Junio says the new helper could be useful.

Now if you want to continue further on this, I think you would need to
take a closer look at those two instances to see if replacing atoi()
there with the new helper would improve something there or not. If you
find it would improve something, be sure to explain what would be
improved in the commit message.

> Also, this seems like a large change due to the number of files involved, while the documentation about the microproject emphasizes keeping the change small. Does it mean a small change per commit or a small change per Pull Request?

I think that adding a new helper in one .c file and its declaration in
the corresponding .h file, and then using it in another .c file would
change around 3 files and would be Ok size wise for a microproject.

Thanks.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GSoC][RFC] Replace use of atoi() with strtol_i(), as a microproject
  2024-01-09  9:56 ` Christian Couder
@ 2024-01-10 17:38   ` Mohit Marathe
  2024-01-10 17:55     ` rsbecker
  2024-01-10 18:44     ` Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Mohit Marathe @ 2024-01-10 17:38 UTC (permalink / raw
  To: Christian Couder
  Cc: git@vger.kernel.org, gitster@pobox.com, britton.kerin@gmail.com,
	peff@peff.net

>In https://public-inbox.org/git/xmqqjzpjsbjl.fsf@gitster.g/ Junio says:
>
>"Some places use atoi() immediately followed by strspn() to skip over
>digits, which means they are parsing an integer and want to continue
>reading after the integer, which is incompatible with what
>strtol_i() wants to do.  They need either a separate helper or an
>updated strtol_i() that optionally allows you to parse the prefix
>and report where the integer ended, e.g., something like:"
>
>and then he suggests the above helper.
>
>So it seems that the two instances you found look like good places
>where Junio says the new helper could be useful.
>
>Now if you want to continue further on this, I think you would need to
>take a closer look at those two instances to see if replacing atoi()
>there with the new helper would improve something there or not. If you
>find it would improve something, be sure to explain what would be
>improved in the commit message.

I took a closer look at `builtin/patch-id.c` and it seems replacing 
`atoi()` (which is used to parse numbers in the hunk header) wouldn't
improve anything, unless I'm missing something.

So then I tried finding other places where `atoi()` can be replaced
but I find it difficult to find any reason that would justify the change.
So far I've only looked at few of the MANY occurrences of `atoi()`.
As far as I understand, the only advantage of `strtol_i()`
over `atoi()` is better error handling. And most of places I've seen either
already takes care of that or does not need that at all.

Thanks!


^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [GSoC][RFC] Replace use of atoi() with strtol_i(), as a microproject
  2024-01-10 17:38   ` Mohit Marathe
@ 2024-01-10 17:55     ` rsbecker
  2024-01-10 18:46       ` Junio C Hamano
  2024-01-10 18:44     ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: rsbecker @ 2024-01-10 17:55 UTC (permalink / raw
  To: 'Mohit Marathe', 'Christian Couder'
  Cc: git, gitster, britton.kerin, peff

On Wednesday, January 10, 2024 12:38 PM, Mohit Marathe wrote:
>>In https://public-inbox.org/git/xmqqjzpjsbjl.fsf@gitster.g/ Junio says:
>>
>>"Some places use atoi() immediately followed by strspn() to skip over
>>digits, which means they are parsing an integer and want to continue
>>reading after the integer, which is incompatible with what
>>strtol_i() wants to do.  They need either a separate helper or an
>>updated strtol_i() that optionally allows you to parse the prefix and
>>report where the integer ended, e.g., something like:"
>>
>>and then he suggests the above helper.
>>
>>So it seems that the two instances you found look like good places
>>where Junio says the new helper could be useful.
>>
>>Now if you want to continue further on this, I think you would need to
>>take a closer look at those two instances to see if replacing atoi()
>>there with the new helper would improve something there or not. If you
>>find it would improve something, be sure to explain what would be
>>improved in the commit message.
>
>I took a closer look at `builtin/patch-id.c` and it seems replacing `atoi()` (which is
>used to parse numbers in the hunk header) wouldn't improve anything, unless I'm
>missing something.
>
>So then I tried finding other places where `atoi()` can be replaced but I find it
>difficult to find any reason that would justify the change.
>So far I've only looked at few of the MANY occurrences of `atoi()`.
>As far as I understand, the only advantage of `strtol_i()` over `atoi()` is better error
>handling. And most of places I've seen either already takes care of that or does not
>need that at all.

I am not sure this is a good idea. The error detection inside strtol_i() reports a -1 if the supplied text value is invalid. This does not differentiate between an invalid value and a valid "-1" supplied. Replacing all instances of atoi() with strtol_i() will likely cause breakages as the error semantics are different between the two.
--Randall



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GSoC][RFC] Replace use of atoi() with strtol_i(), as a microproject
  2024-01-10 17:38   ` Mohit Marathe
  2024-01-10 17:55     ` rsbecker
@ 2024-01-10 18:44     ` Junio C Hamano
  2024-01-12 13:04       ` Mohit Marathe
  2024-01-17  5:28       ` Mohit Marathe
  1 sibling, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2024-01-10 18:44 UTC (permalink / raw
  To: Mohit Marathe
  Cc: Christian Couder, git@vger.kernel.org, britton.kerin@gmail.com,
	peff@peff.net

Mohit Marathe <mohitmarathe@proton.me> writes:

> I took a closer look at `builtin/patch-id.c` and it seems replacing 
> `atoi()` (which is used to parse numbers in the hunk header) wouldn't
> improve anything, unless I'm missing something.

You can of course notice an invalid patch that places non-digit to
the hunk header and error out with such a change.  If we are reading
output from Git that we are invoking, hopefully we will not see such
an invalid patch, but the command is designed to read arbitrary
input like a patch e-mail you received over the network, so I do not
think it is fair to say there is no merit to such a change, even
though I agree that it may not matter all that much.

A corrupt patch may be getting a nonsense patch-ID with the current
code and hopefully is not matching other patches that are not
corrupt, but with such a change, a corrupt patch may not be getting
any patch-ID and a loop that computes patch-ID for many files and
try to match them up might need to be rewritten to take the new
failure case into account.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GSoC][RFC] Replace use of atoi() with strtol_i(), as a microproject
  2024-01-10 17:55     ` rsbecker
@ 2024-01-10 18:46       ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2024-01-10 18:46 UTC (permalink / raw
  To: rsbecker
  Cc: 'Mohit Marathe', 'Christian Couder', git,
	britton.kerin, peff

<rsbecker@nexbridge.com> writes:

> I am not sure this is a good idea. The error detection inside strtol_i() reports a -1 if the supplied text value is invalid. This does not differentiate between an invalid value and a valid "-1" supplied. Replacing all instances of atoi() with strtol_i() will likely cause breakages as the error semantics are different between the two.

atoi() and strtol_i() have totally different function signatures, so
it won't be a straight replacement.  The report of "-1" you mention
is only about strtol_i() reporting "have we successfully read out a
number [yes/no]?".  The value we parsed goes to a separate location,
so there is no risk of confusion as long as the caller knows what it
is doing.

Thanks.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GSoC][RFC] Replace use of atoi() with strtol_i(), as a microproject
  2024-01-10 18:44     ` Junio C Hamano
@ 2024-01-12 13:04       ` Mohit Marathe
  2024-01-17  5:28       ` Mohit Marathe
  1 sibling, 0 replies; 8+ messages in thread
From: Mohit Marathe @ 2024-01-12 13:04 UTC (permalink / raw
  To: Junio C Hamano
  Cc: Christian Couder, git@vger.kernel.org, britton.kerin@gmail.com,
	peff@peff.net

 > > I took a closer look at `builtin/patch-id.c` and it seems replacing
> > `atoi()` (which is used to parse numbers in the hunk header) wouldn't
> > improve anything, unless I'm missing something.
> 
> 
> You can of course notice an invalid patch that places non-digit to
> the hunk header and error out with such a change. If we are reading
> output from Git that we are invoking, hopefully we will not see such
> an invalid patch, but the command is designed to read arbitrary
> input like a patch e-mail you received over the network, so I do not
> think it is fair to say there is no merit to such a change, even
> though I agree that it may not matter all that much.
> 
> A corrupt patch may be getting a nonsense patch-ID with the current
> code and hopefully is not matching other patches that are not
> corrupt, but with such a change, a corrupt patch may not be getting
> any patch-ID and a loop that computes patch-ID for many files and
> try to match them up might need to be rewritten to take the new
> failure case into account.

Oh, I didn't know corrupt patches were a thing. I will start working 
on the patch for this.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [GSoC][RFC] Replace use of atoi() with strtol_i(), as a microproject
  2024-01-10 18:44     ` Junio C Hamano
  2024-01-12 13:04       ` Mohit Marathe
@ 2024-01-17  5:28       ` Mohit Marathe
  1 sibling, 0 replies; 8+ messages in thread
From: Mohit Marathe @ 2024-01-17  5:28 UTC (permalink / raw
  To: Junio C Hamano
  Cc: Christian Couder, git@vger.kernel.org, britton.kerin@gmail.com,
	peff@peff.net

> A corrupt patch may be getting a nonsense patch-ID with the current
> code and hopefully is not matching other patches that are not
> corrupt, but with such a change, a corrupt patch may not be getting
> any patch-ID and a loop that computes patch-ID for many files and
> try to match them up might need to be rewritten to take the new
> failure case into account.

Are you talking about the loop in `get_one_patchid()`? And how should
the failure case be handled? Should it give a warning or an error?


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-01-17  5:29 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-08 17:34 [GSoC][RFC] Replace use of atoi() with strtol_i(), as a microproject mohitmarathe
2024-01-09  9:56 ` Christian Couder
2024-01-10 17:38   ` Mohit Marathe
2024-01-10 17:55     ` rsbecker
2024-01-10 18:46       ` Junio C Hamano
2024-01-10 18:44     ` Junio C Hamano
2024-01-12 13:04       ` Mohit Marathe
2024-01-17  5:28       ` Mohit Marathe

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).