unofficial mirror of libc-alpha@sourceware.org
* [discuss] iconv: what's the purpose of the mtrace in the tst-iconv2.c
@ 2019-12-14  8:44 liqingqing
  2020-01-02 10:54 ` question about regex liqingqing
  0 siblings, 1 reply; 9+ messages in thread
From: liqingqing @ 2019-12-14  8:44 UTC (permalink / raw)
  To: libc-alpha, Florian Weimer, Carlos O'Donell; +Cc: Hushiyuan, Liusirui

I am working on the iconv tests and am confused by the test case tst-iconv2.c:
it calls mtrace, but I don't know exactly what the purpose is.
I would like to remove the mtrace call, but I'm not sure whether that is safe.

static int
do_test (void)
{
  char buf[3];
  const wchar_t wc[1] = L"a";
  iconv_t cd;
  char *inptr;
  size_t inlen;
  char *outptr;
  size_t outlen;
  size_t n;
  int e;
  int result = 0;

  mtrace (); /* Is this test case meant to check for memory leaks?  */

  cd = iconv_open ("UCS4", "WCHAR_T");
  if (cd == (iconv_t) -1)
    {
      printf ("cannot convert from wchar_t to UCS4: %m\n");
      exit (1);
    }

  inptr = (char *) wc;
  inlen = sizeof (wchar_t);
  outptr = buf;
  outlen = 3;

  n = iconv (cd, &inptr, &inlen, &outptr, &outlen);
  e = errno;



* question about regex
  2019-12-14  8:44 [discuss] iconv: what's the purpose of the mtrace in the tst-iconv2.c liqingqing
@ 2020-01-02 10:54 ` liqingqing
  2020-01-02 16:16   ` Tim Rühsen
  0 siblings, 1 reply; 9+ messages in thread
From: liqingqing @ 2020-01-02 10:54 UTC (permalink / raw)
  To: libc-alpha, Florian Weimer, Carlos O'Donell; +Cc: Hushiyuan, Liusirui

Hello Florian and all glibc developers.

Is there any plan, or a good way, to fix the bug below?


https://sourceware.org/bugzilla/show_bug.cgi?id=24269

Dhiraj 2019-02-26 06:24:20 UTC
While fuzzing the regex module via honggfuzz

$ echo D | grep -E "$(printf '(\0|)(\\1\\1)*')"
  bash: warning: command substitution: ignored null byte in input
  Segmentation fault (core dumped)

==6453== Process terminating with default action of signal 13 (SIGPIPE)
==6453==    at 0x4F4C154: write (write.c:27)
==6453==    by 0x4EC71BC: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1203)
==6453==    by 0x4EC8F50: new_do_write (fileops.c:457)
==6453==    by 0x4EC8F50: _IO_do_write@@GLIBC_2.2.5 (fileops.c:433)
==6453==    by 0x4EC6787: _IO_file_sync@@GLIBC_2.2.5 (fileops.c:813)
==6453==    by 0x4EBA87C: fflush (iofflush.c:40)
==6453==    by 0x10CE73: ??? (in /bin/echo)
==6453==    by 0x10C939: ??? (in /bin/echo)
==6453==    by 0x10A221: ??? (in /bin/echo)
==6453==    by 0x4E7F040: __run_exit_handlers (exit.c:108)
==6453==    by 0x4E7F139: exit (exit.c:139)
==6453==    by 0x4E5DB9D: (below main) (libc-start.c:344)
==6453==

OS: Linux ubuntu 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux






* Re: question about regex
  2020-01-02 10:54 ` question about regex liqingqing
@ 2020-01-02 16:16   ` Tim Rühsen
  2020-01-02 22:14     ` Paul Eggert
  0 siblings, 1 reply; 9+ messages in thread
From: Tim Rühsen @ 2020-01-02 16:16 UTC (permalink / raw)
  To: liqingqing, libc-alpha, Florian Weimer, Carlos O'Donell
  Cc: Hushiyuan, Liusirui



On 1/2/20 11:54 AM, liqingqing wrote:
> Hello Florian and all glibc developers.
> 
> Is there any plan, or a good way, to fix the bug below?
> 
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=24269
> 
> Dhiraj 2019-02-26 06:24:20 UTC
> While fuzzing the regex module via honggfuzz
> 
> $ echo D | grep -E "$(printf '(\0|)(\\1\\1)*')"
>   bash: warning: command substitution: ignored null byte in input
>   Segmentation fault (core dumped)
> 
> ==6453== Process terminating with default action of signal 13 (SIGPIPE)
> ==6453==    at 0x4F4C154: write (write.c:27)
> ==6453==    by 0x4EC71BC: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1203)
> ==6453==    by 0x4EC8F50: new_do_write (fileops.c:457)
> ==6453==    by 0x4EC8F50: _IO_do_write@@GLIBC_2.2.5 (fileops.c:433)
> ==6453==    by 0x4EC6787: _IO_file_sync@@GLIBC_2.2.5 (fileops.c:813)
> ==6453==    by 0x4EBA87C: fflush (iofflush.c:40)
> ==6453==    by 0x10CE73: ??? (in /bin/echo)
> ==6453==    by 0x10C939: ??? (in /bin/echo)
> ==6453==    by 0x10A221: ??? (in /bin/echo)
> ==6453==    by 0x4E7F040: __run_exit_handlers (exit.c:108)
> ==6453==    by 0x4E7F139: exit (exit.c:139)
> ==6453==    by 0x4E5DB9D: (below main) (libc-start.c:344)
> ==6453==
> 
> OS: Linux ubuntu 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Meanwhile grep (or libc) seems to exit gracefully:

$ echo D | grep -E "$(printf '(\0|)(\\1\\1)*')"
bash: warning: command substitution: ignored null byte in input
grep: stack overflow

Here: Debian unstable, grep (GNU grep) 3.3

Regards, Tim




* Re: question about regex
  2020-01-02 16:16   ` Tim Rühsen
@ 2020-01-02 22:14     ` Paul Eggert
  2020-01-03  8:09       ` liqingqing
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Eggert @ 2020-01-02 22:14 UTC (permalink / raw)
  To: Tim Rühsen, liqingqing
  Cc: libc-alpha, Florian Weimer, Carlos O'Donell, Hushiyuan,
	Liusirui

On 1/2/20 8:16 AM, Tim Rühsen wrote:
> Meanwhile grep (or libc) seems to exit gracefully:

Yes, there's no core dump if the operating system supports stack 
overflow detection that grep can use. The problem occurs only on OSes 
that don't do that, or on apps that don't try to detect stack overflow 
and simply dump core (or worse).

On 1/2/20 2:54 AM, liqingqing wrote:

> is there any plan, or a good way, to fix the bug below

The best way would be to fix bug#24269, i.e., fix the glibc regex code 
so that it doesn't blow the stack. If you could write a patch for this 
bug (something that doesn't hurt performance for ordinary regexps), that 
would be welcome.

For that particular test case, you can use an OS that does proper stack 
overflow checking that grep can use.

PS. The next version of the grep manual is planned to nearly wash its 
hands of the matter. Here's the current draft:

----

Back-references can greatly slow down matching, as they can generate
exponentially many matching possibilities that can consume both time
and memory to explore.  Also, the POSIX specification for
back-references is at times unclear.  Furthermore, many regular
expression implementations have back-reference bugs that can cause
programs to return incorrect answers or even crash, and fixing these
bugs has often been low-priority: for example, as of 2020 the
@url{https://sourceware.org/bugzilla/,GNU C library bug database}
contained back-reference bugs
@url{https://sourceware.org/bugzilla/show_bug.cgi?id=52,,52},
@url{https://sourceware.org/bugzilla/show_bug.cgi?id=10844,,10844},
@url{https://sourceware.org/bugzilla/show_bug.cgi?id=11053,,11053},
@url{https://sourceware.org/bugzilla/show_bug.cgi?id=24269,,24269}
and @url{https://sourceware.org/bugzilla/show_bug.cgi?id=25322,,25322},
with little sign of forthcoming fixes.  Luckily,
back-references are rarely useful and it should be little trouble to
avoid them in practical applications.
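
As one illustration of the draft's last point, a small back-reference can often be expanded into a plain alternation. The sketch below is a hypothetical example written with Python's `re` module, not glibc regex or grep, and both patterns are made up for illustration. It checks by brute force that `(a|b)\1` and `aa|bb` accept exactly the same short strings.

```python
import itertools
import re

# Hypothetical pattern pair, chosen only for illustration.
WITH_BACKREF = re.compile(r"(a|b)\1")   # same letter twice, via a back-reference
WITHOUT_BACKREF = re.compile(r"aa|bb")  # the same language, spelled out


def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            yield "".join(letters)


def disagreements(max_len=4):
    """Return the strings on which the two patterns give different answers."""
    return [
        s
        for s in all_strings("ab", max_len)
        if bool(WITH_BACKREF.fullmatch(s)) != bool(WITHOUT_BACKREF.fullmatch(s))
    ]


if __name__ == "__main__":
    print(disagreements())  # prints []
```

This rewrite only works because the group has a small, fixed set of alternatives; a back-reference to a group that can match arbitrarily long text cannot in general be expanded this way.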


* Re: question about regex
  2020-01-02 22:14     ` Paul Eggert
@ 2020-01-03  8:09       ` liqingqing
  2020-01-20 10:41         ` liqingqing
  0 siblings, 1 reply; 9+ messages in thread
From: liqingqing @ 2020-01-03  8:09 UTC (permalink / raw)
  To: Paul Eggert, Tim Rühsen
  Cc: libc-alpha, Florian Weimer, Carlos O'Donell, Hushiyuan,
	Liusirui



On 2020/1/3 6:14, Paul Eggert wrote:
> On 1/2/20 8:16 AM, Tim Rühsen wrote:
>> Meanwhile grep (or libc) seems to exit gracefully:
> 
> Yes, there's no core dump if the operating system supports stack overflow detection that grep can use. The problem occurs only on OSes that don't do that, or on apps that don't try to detect stack overflow and simply dump core (or worse).
> 
> On 1/2/20 2:54 AM, liqingqing wrote:
> 
>> do we have any plan or good ways to fix up the bug as below
> 
> The best way would be to fix bug#24269, i.e., fix the glibc regex code so that it doesn't blow the stack. If you could write a patch for this bug (something that doesn't hurt performance for ordinary regexps), that would be welcome.
> 
> For that particular test case, you can use an OS that does proper stack overflow checking that grep can use.

Thank you, Tim and Paul, for your replies. I ran sed with a similar regular expression, and the stack overflow can be reproduced there as well:
[root@localhost liqingqing]# sed --version
sed (GNU sed) 4.5
[root@localhost liqingqing]# echo A  | sed '/\(\)\(\1\1\)*/p'
Segmentation fault (core dumped)


I also tested with Python, and the result looks fine there. I really want to fix this bug, but I don't yet know much about this module,
so I am hoping someone can suggest a solution. Thanks.

The Python test:
[root@localhost liqingqing]# python
Python 2.7.15 (default, Jul 22 2019, 00:00:00)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>  m=re.match("()(\\1\\1)*", "A", 0)
  File "<stdin>", line 1
    m=re.match("()(\\1\\1)*", "A", 0)
    ^
IndentationError: unexpected indent
>>> import re
>>>  m=re.match("()(\\1\\1)*", "A", 0)
  File "<stdin>", line 1
    m=re.match("()(\\1\\1)*", "A", 0)
    ^
IndentationError: unexpected indent
>>> m=re.match("()(\\1\\1)*", "A", 0)
>>> m.group(0)
''
>>> m.group(1)
''
>>> m.group(2)
''
>>> m.group(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: no such group
>>> m=re.match("A", "A", 0)
>>> m.group(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: no such group
>>> m.group(0)
'A'



* Re: question about regex
  2020-01-03  8:09       ` liqingqing
@ 2020-01-20 10:41         ` liqingqing
  2020-01-20 19:25           ` Paul Eggert
  0 siblings, 1 reply; 9+ messages in thread
From: liqingqing @ 2020-01-20 10:41 UTC (permalink / raw)
  To: Paul Eggert, Tim Rühsen
  Cc: libc-alpha, Florian Weimer, Carlos O'Donell, Hushiyuan,
	Liusirui



On 2020/1/3 16:09, liqingqing wrote:
> 
> 
> On 2020/1/3 6:14, Paul Eggert wrote:
>> On 1/2/20 8:16 AM, Tim Rühsen wrote:
>>> Meanwhile grep (or libc) seems to exit gracefully:
>>
>> Yes, there's no core dump if the operating system supports stack overflow detection that grep can use. The problem occurs only on OSes that don't do that, or on apps that don't try to detect stack overflow and simply dump core (or worse).
>>
>> On 1/2/20 2:54 AM, liqingqing wrote:
>>
>>> do we have any plan or good ways to fix up the bug as below
>>
>> The best way would be to fix bug#24269, i.e., fix the glibc regex code so that it doesn't blow the stack. If you could write a patch for this bug (something that doesn't hurt performance for ordinary regexps), that would be welcome.
>>

Hello everyone. I have read the regex source code and found that it is not easy to fix bug#24269 by modifying the regex algorithm.
One possible approach is to add a backtrack limit to avoid the infinite loop. I have tested this approach and it works.
So what do you think about bug#24269: optimize the algorithm, add a backtrack limit, or leave it unfixed?



* Re: question about regex
  2020-01-20 10:41         ` liqingqing
@ 2020-01-20 19:25           ` Paul Eggert
  2020-01-21  1:15             ` liqingqing
  0 siblings, 1 reply; 9+ messages in thread
From: Paul Eggert @ 2020-01-20 19:25 UTC (permalink / raw)
  To: liqingqing, Tim Rühsen
  Cc: libc-alpha, Florian Weimer, Carlos O'Donell, Hushiyuan,
	Liusirui

On 1/20/20 2:41 AM, liqingqing wrote:
> it is not easy to fix bug#24269 by modifying the regex algorithm.
> One possible approach is to add a backtrack limit to avoid the infinite loop.

If there's an arbitrary backtrack limit, won't that cause the code to mishandle 
some regular expressions? That is, wouldn't an arbitrary limit cause the code to 
incorrectly fail to match in some cases? In some sense that would be worse than 
looping.

Perhaps I am misunderstanding what you're proposing. If so, could you give more 
details about the proposal?
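
To make the concern concrete, here is a minimal sketch of a toy backtracking matcher with a step budget. It is not glibc's regex algorithm, and every name in it (`BudgetExceeded`, `match_here`, `search`, `max_steps`) is hypothetical. It supports only literal characters, `.` and `*`. If running out of budget were reported as "no match", the answer would be wrong for inputs that do match, so the sketch raises a separate error instead.

```python
class BudgetExceeded(Exception):
    """The matcher gave up; this is different from 'no match'."""


def match_here(pat, text, budget):
    """Return True if `pat` matches a prefix of `text`.

    `budget` is a one-element list holding the remaining step count, so
    that all recursive calls share it.
    """
    budget[0] -= 1
    if budget[0] < 0:
        raise BudgetExceeded("step budget exhausted")
    if not pat:
        return True
    if len(pat) >= 2 and pat[1] == "*":
        # Try 0, 1, 2, ... repetitions of pat[0], backtracking on failure.
        i = 0
        while True:
            if match_here(pat[2:], text[i:], budget):
                return True
            if i < len(text) and pat[0] in (".", text[i]):
                i += 1
            else:
                return False
    if text and pat[0] in (".", text[0]):
        return match_here(pat[1:], text[1:], budget)
    return False


def search(pat, text, max_steps):
    """Return the index of the first match of `pat` in `text`, or -1."""
    budget = [max_steps]
    for start in range(len(text) + 1):
        if match_here(pat, text[start:], budget):
            return start
    return -1


if __name__ == "__main__":
    pattern = "a*a*a*b"
    text = "a" * 30 + "c" + "ab"  # the first match starts at index 31
    print(search(pattern, text, max_steps=1_000_000))
    try:
        search(pattern, text, max_steps=1_000)
    except BudgetExceeded as exc:
        print("gave up:", exc)
```

With the large budget the match at index 31 is found; with the small one the matcher gives up while still failing at index 0, even though a match exists. A caller that treated the second outcome as "no match" would get an incorrect result.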


* Re: question about regex
  2020-01-20 19:25           ` Paul Eggert
@ 2020-01-21  1:15             ` liqingqing
  2020-01-21  8:57               ` Paul Eggert
  0 siblings, 1 reply; 9+ messages in thread
From: liqingqing @ 2020-01-21  1:15 UTC (permalink / raw)
  To: Paul Eggert, Tim Rühsen
  Cc: libc-alpha, Florian Weimer, Carlos O'Donell, Hushiyuan,
	Liusirui



On 2020/1/21 3:25, Paul Eggert wrote:
> On 1/20/20 2:41 AM, liqingqing wrote:
>> it is not easy to fix bug#24269 by modifying the regex algorithm.
>> One possible approach is to add a backtrack limit to avoid the infinite loop.
> 
> If there's an arbitrary backtrack limit, won't that cause the code to mishandle some regular expressions? That is, wouldn't an arbitrary limit cause the code to incorrectly fail to match in some cases? In some sense that would be worse than looping.
> 
> Perhaps I am misunderstanding what you're proposing. If so, could you give more details about the proposal?

Hello Paul, thanks for your explanation. Yes, you are right, and I completely agree.
What I am proposing is simply to find a way to fix bug#24269 (the infinite-loop bug).
I think it is very hard for me to fix it completely, so I sent this email to ask for everyone's suggestions.



* Re: question about regex
  2020-01-21  1:15             ` liqingqing
@ 2020-01-21  8:57               ` Paul Eggert
  0 siblings, 0 replies; 9+ messages in thread
From: Paul Eggert @ 2020-01-21  8:57 UTC (permalink / raw)
  To: liqingqing, Tim Rühsen
  Cc: libc-alpha, Florian Weimer, Carlos O'Donell, Hushiyuan,
	Liusirui

On 1/20/20 5:15 PM, liqingqing wrote:
> What I am proposing is simply to find a way to fix bug#24269 (the infinite-loop bug).

Why is this urgent? Even if you fix this particular bug correctly (instead of 
merely putting in an arbitrary backtrack limit), there are lots of ways to make 
the regex code explode exponentially and I don't see the practical difference 
between doing that and looping forever.
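
As a rough illustration of that kind of blow-up, the sketch below counts the steps a naive backtracking matcher would take to conclude that `(a|aa)*b` does not match a run of `a`s. It is a toy model with a hypothetical function name (`count_attempts`), not a measurement of glibc's regex code.

```python
def count_attempts(n):
    """Count the recursive calls made by a naive backtracker that tries to
    match (a|aa)*b against "a" * n and fails for lack of a trailing "b".
    """
    def attempts(pos):
        calls = 1              # this call: try "b" here, which fails
        if pos + 1 <= n:       # alternative "a"
            calls += attempts(pos + 1)
        if pos + 2 <= n:       # alternative "aa"
            calls += attempts(pos + 2)
        return calls

    return attempts(0)


if __name__ == "__main__":
    for n in (5, 10, 15, 20, 25):
        print(n, count_attempts(n))
```

The count follows the Fibonacci recurrence, so it grows exponentially with the input length: 20 calls for n = 5, 232 for n = 10. For an input of modest length the matcher already runs for so long that, to the caller, it looks the same as looping forever.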
