sox-devel@lists.sourceforge.net unofficial mirror
 help / color / mirror / code / Atom feed
* [PATCH] formats: disallow seeking in dynamic memory buffers
@ 2021-10-06 18:44 Sven Neumann via SoX-devel
  2021-10-07  2:24 ` Sun Zhenliang
  0 siblings, 1 reply; 5+ messages in thread
From: Sven Neumann via SoX-devel @ 2021-10-06 18:44 UTC (permalink / raw)
  To: sox-devel@lists.sourceforge.net; +Cc: Sven Neumann

[-- Attachment #1: Type: text/plain, Size: 2852 bytes --]

Hi,

in one of our internal applications we are using SoX (the 14.4.2+git20190427 version from Ubuntu) to convert from a variety of audio formats to the WAV file format. We observed that the tests for the conversion occasionally failed and over the last days I found time to dig deeper into this.

We are using sox_open_memstream_write() to write to a dynamically allocated in-memory stream. In our tests sometimes the size of the resulting WAV buffer would have the expected size, sometimes it would be 44 bytes, the size of the WAV header. Valgrind told me that the behavior of is_seekable() in formats.c depends on uninitialized memory. In your git repository I found a fix for this:

commit bb38934e11035c8fab141f70dabda3afdd17da36
Author: Mans Rullgard <mans@mansr.com>
Date:   Tue Aug 4 17:19:49 2020 +0100

    format: improve is_seekable() test
    
    Streams opened with fmemopen() do not have an underlying file descriptor,
    so the fstat() will fail, and a random result is returned.
    
    A simpler method that works regardless of file type is to call fseek()
    and check if it reports success.
    
    Suggested by Stefan Sauer <ensonic@google.com>.


Now with this fix applied valgrind was happy, however now our conversion from MP3 to WAV would always result in only 44 bytes, as read from the buffer_size_ptr location passed to sox_open_memstream_write(). It turns out that with above change the undefined behavior is fixed for streams created with open_memstream() and is_seekable() will now reliably returns sox_true for such streams. This allows the WAV writer code to do an fseek() to the start of the stream followed by a write of the WAV header with correct length information. However such a seek followed by a write causes the dynamically allocated memory stream to be truncated. Thus after calling sox_close() the size reported for the stream will be 44 bytes, that's not what we want. Unfortunately we can not simply fix this by reporting the full buffer size as the buffer will actually have been truncated, and a trailing null byte is appended after the WAV header. It looks like we can indeed not seek and fix data in a dynamically allocated stream. Thus I am attaching a patch that changes the code in formats.c to set ft->seekable to false for streams opened with open_memstream(). With this change applied on top of the improvement for the is_seekable() test, our tests pass reliably and valgrind seems happy as well.

I am attaching the patch here, please consider it for inclusion. I am also attaching a simple test application that writes to a stream, seeks to the front and performs another write. The output of this program illustrates that the buffer is truncated:

  buf = `hello', size = 5
  buf = `hello, world', size = 12
  buf = `heyho', size = 5


Regards,
Sven



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-formats-disallow-seeking-in-dynamic-memory-buffers.patch --]
[-- Type: text/x-patch; name="0001-formats-disallow-seeking-in-dynamic-memory-buffers.patch", Size: 1008 bytes --]

From 9a90484d6c7e23ce709e5e34eec2aec62b6d4cbc Mon Sep 17 00:00:00 2001
From: Sven Neumann <sven.neumann@logmein.com>
Date: Wed, 6 Oct 2021 17:36:26 +0200
Subject: [PATCH] formats: disallow seeking in dynamic memory buffers

Seeking in a dynamic memory buffer stream as provided by
open_memstream() truncates the memory buffer. Seeking back to
the start of the file to write a header will leave the user
with just the header then.
---
 src/formats.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/formats.c b/src/formats.c
index 3fcf4382..45ca79ca 100644
--- a/src/formats.c
+++ b/src/formats.c
@@ -932,7 +932,8 @@ static sox_format_t * open_write(
       lsx_fail("Can't set write buffer");
       goto error;
     }
-    ft->seekable = is_seekable(ft);
+    /* Do not allow seeking in dynamic memory buffers as that would truncate the buffer. */
+    ft->seekable = (buffer_ptr && !buffer) ? sox_false : is_seekable(ft);
   }
 
   ft->filetype = lsx_strdup(filetype);
-- 
2.25.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: test.c --]
[-- Type: text/x-csrc; name="test.c", Size: 464 bytes --]

#include <stdio.h>

int
main (void)
{
  char *bp;
  size_t size;
  FILE *stream;

  stream = open_memstream (&bp, &size);
  fprintf (stream, "hello");
  fflush (stream);
  printf ("buf = `%s', size = %ld\n", bp, size);
  fprintf (stream, ", world");
  fflush (stream);
  printf ("buf = `%s', size = %ld\n", bp, size);
  fseek (stream, 0, SEEK_SET);
  fprintf (stream, "heyho");
  fclose (stream);
  printf ("buf = `%s', size = %ld\n", bp, size);
  
  return 0;
}


[-- Attachment #4: Type: text/plain, Size: 0 bytes --]



[-- Attachment #5: Type: text/plain, Size: 158 bytes --]

_______________________________________________
SoX-devel mailing list
SoX-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sox-devel

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] formats: disallow seeking in dynamic memory buffers
  2021-10-06 18:44 [PATCH] formats: disallow seeking in dynamic memory buffers Sven Neumann via SoX-devel
@ 2021-10-07  2:24 ` Sun Zhenliang
  2021-10-07  6:17   ` .Re: " Sven Neumann via SoX-devel
  0 siblings, 1 reply; 5+ messages in thread
From: Sun Zhenliang @ 2021-10-07  2:24 UTC (permalink / raw)
  To: sox-devel


[-- Attachment #1.1: Type: text/plain, Size: 3912 bytes --]

在 2021年10月7日 +0800 05:35,Sven Neumann via SoX-devel <sox-devel@lists.sourceforge.net>,写道:
> Hi,
>
> in one of our internal applications we are using SoX (the 14.4.2+git20190427 version from Ubuntu) to convert from a variety of audio formats to the WAV file format. We observed that the tests for the conversion occasionally failed and over the last days I found time to dig deeper into this.
>
> We are using sox_open_memstream_write() to write to a dynamically allocated in-memory stream. In our tests sometimes the size of the resulting WAV buffer would have the expected size, sometimes it would be 44 bytes, the size of the WAV header. Valgrind told me that the behavior of is_seekable() in formats.c depends on uninitialized memory. In your git repository I found a fix for this:
>
> commit bb38934e11035c8fab141f70dabda3afdd17da36
> Author: Mans Rullgard <mans@mansr.com>
> Date: Tue Aug 4 17:19:49 2020 +0100
>
> format: improve is_seekable() test
>
> Streams opened with fmemopen() do not have an underlying file descriptor,
> so the fstat() will fail, and a random result is returned.
>
> A simpler method that works regardless of file type is to call fseek()
> and check if it reports success.
>
> Suggested by Stefan Sauer <ensonic@google.com>.
>
>
> Now with this fix applied valgrind was happy, however now our conversion from MP3 to WAV would always result in only 44 bytes, as read from the buffer_size_ptr location passed to sox_open_memstream_write(). It turns out that with above change the undefined behavior is fixed for streams created with open_memstream() and is_seekable() will now reliably returns sox_true for such streams. This allows the WAV writer code to do an fseek() to the start of the stream followed by a write of the WAV header with correct length information. However such a seek followed by a write causes the dynamically allocated memory stream to be truncated. Thus after calling sox_close() the size reported for the stream will be 44 bytes, that's not what we want. Unfortunately we can not simply fix this by reporting the full buffer size as the buffer will actually have been truncated, and a trailing null byte is appended after the WAV header. It looks like we can indeed not seek and fix data in a dynamically allocated stream. Thus I am attaching a patch that changes the code in formats.c to set ft->seekable to false for streams opened with open_memstream(). With this change applied on top of the improvement for the is_seekable() test, our tests pass reliably and valgrind seems happy as well.
>
> I am attaching the patch here, please consider it for inclusion. I am also attaching a simple test application that writes to a stream, seeks to the front and performs another write. The output of this program illustrates that the buffer is truncated:
>
> buf = `hello', size = 5
> buf = `hello, world', size = 12
> buf = `heyho', size = 5
Bug mentioned https://sourceforge.net/p/sox/mailman/message/37325794/.

It is possible to fix the whole data. From my test to open_memstream(), the
dynamically allocated memory would not be free before the ft->fp was closed.
The data written is still there if it’s not covered by other write opt.

In this WAV issue, the seek opt is just in sox_close() to rewrite header, which
is the end of transcoding and will not cover the data area. We can just ftell()
to get the end of file before the fseek and seek back to the end after rewriting.

Output with ftell opt:
buf = `hello', size = 5
buf = `hello, world', size = 12
buf = `heyho, world', size = 12

As for other formats, the situation may be different from WAV and needs specific
analysis.

>
>
> Regards,
> Sven
>
>
>
>
> _______________________________________________
> SoX-devel mailing list
> SoX-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/sox-devel

[-- Attachment #1.2: Type: text/html, Size: 4789 bytes --]

[-- Attachment #2: test.c --]
[-- Type: application/octet-stream, Size: 533 bytes --]

#include <stdio.h>

int
main (void)
{
  char *bp;
  size_t size;
  FILE *stream;

  int end;

  stream = open_memstream (&bp, &size);
  fprintf (stream, "hello");
  fflush (stream);
  printf ("buf = `%s', size = %ld\n", bp, size);
  fprintf (stream, ", world");
  fflush (stream);
  printf ("buf = `%s', size = %ld\n", bp, size);

  end = ftell(stream);

  fseek (stream, 0, SEEK_SET);
  fprintf (stream, "heyho");

  fseek(stream, end, SEEK_SET);

  fclose (stream);
  printf ("buf = `%s', size = %ld\n", bp, size);

  return 0;
}


[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



[-- Attachment #4: Type: text/plain, Size: 158 bytes --]

_______________________________________________
SoX-devel mailing list
SoX-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sox-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* .Re: [PATCH] formats: disallow seeking in dynamic memory buffers
  2021-10-07  2:24 ` Sun Zhenliang
@ 2021-10-07  6:17   ` Sven Neumann via SoX-devel
  2021-10-07 12:22     ` Sun Zhenliang
  0 siblings, 1 reply; 5+ messages in thread
From: Sven Neumann via SoX-devel @ 2021-10-07  6:17 UTC (permalink / raw)
  To: sox-devel@lists.sourceforge.net; +Cc: Sven Neumann

[-- Attachment #1: Type: text/plain, Size: 5070 bytes --]

Hi,

thanks for pointing me to https://sourceforge.net/p/sox/mailman/message/37325794/. This is exactly the problem I am trying to solve.

I am somewhat surprised that we can actually recover the whole buffer after a seek and write because the memory stream maintains a null byte after the data. I was assuming that this means that we can not simply recover by seeking back to the end of the buffer because one byte would have been overwritten by that terminator byte. I tried what you suggested (code attached), and it seems that the terminating null byte is not moved on a seek operation, but only when the file descriptor is closed.

Of course it would be desirable to have a proper WAV header even for streams opened with sox_open_memstream_write(). I can prepare a fix that introduces a seek back, but I guess I would have to do something similar for all formats that seek in the output buffer. Before I start to work on that I would like to get approval from the maintainer that this is the right approach.


Regards,
Sven


From: Sun Zhenliang <hisunzhenliang@outlook.com>
Sent: 07 October 2021 04:24
To: sox-devel@lists.sourceforge.net <sox-devel@lists.sourceforge.net>
Subject: Re: [SoX-devel] [PATCH] formats: disallow seeking in dynamic memory buffers 
 
在 2021年10月7日 +0800 05:35,Sven Neumann via SoX-devel <sox-devel@lists.sourceforge.net>,写道:
Hi,

in one of our internal applications we are using SoX (the 14.4.2+git20190427 version from Ubuntu) to convert from a variety of audio formats to the WAV file format. We observed that the tests for the conversion occasionally failed and over the last days I found time to dig deeper into this.

We are using sox_open_memstream_write() to write to a dynamically allocated in-memory stream. In our tests sometimes the size of the resulting WAV buffer would have the expected size, sometimes it would be 44 bytes, the size of the WAV header. Valgrind told me that the behavior of is_seekable() in formats.c depends on uninitialized memory. In your git repository I found a fix for this:

commit bb38934e11035c8fab141f70dabda3afdd17da36
Author: Mans Rullgard <mans@mansr.com>
Date: Tue Aug 4 17:19:49 2020 +0100

format: improve is_seekable() test

Streams opened with fmemopen() do not have an underlying file descriptor,
so the fstat() will fail, and a random result is returned.

A simpler method that works regardless of file type is to call fseek()
and check if it reports success.

Suggested by Stefan Sauer <ensonic@google.com>.


Now with this fix applied valgrind was happy, however now our conversion from MP3 to WAV would always result in only 44 bytes, as read from the buffer_size_ptr location passed to sox_open_memstream_write(). It turns out that with above change the undefined behavior is fixed for streams created with open_memstream() and is_seekable() will now reliably returns sox_true for such streams. This allows the WAV writer code to do an fseek() to the start of the stream followed by a write of the WAV header with correct length information. However such a seek followed by a write causes the dynamically allocated memory stream to be truncated. Thus after calling sox_close() the size reported for the stream will be 44 bytes, that's not what we want. Unfortunately we can not simply fix this by reporting the full buffer size as the buffer will actually have been truncated, and a trailing null byte is appended after the WAV header. It looks like we can indeed not seek and fix data in a dynamically allocated stream. Thus I am attaching a patch that changes the code in formats.c to set ft->seekable to false for streams opened with open_memstream(). With this change applied on top of the improvement for the is_seekable() test, our tests pass reliably and valgrind seems happy as well.

I am attaching the patch here, please consider it for inclusion. I am also attaching a simple test application that writes to a stream, seeks to the front and performs another write. The output of this program illustrates that the buffer is truncated:

buf = `hello', size = 5
buf = `hello, world', size = 12
buf = `heyho', size = 5
Bug mentioned https://sourceforge.net/p/sox/mailman/message/37325794/.

It is possible to fix the whole data. From my test to open_memstream(), the 
dynamically allocated memory would not be free before the ft->fp was closed. 
The data written is still there if it’s not covered by other write opt. 

In this WAV issue, the seek opt is just in sox_close() to rewrite header, which 
is the end of transcoding and will not cover the data area. We can just ftell() 
to get the end of file before the fseek and seek back to the end after rewriting.

Output with ftell opt:
buf = `hello', size = 5
buf = `hello, world', size = 12
buf = `heyho, world', size = 12

As for other formats, the situation may be different from WAV and needs specific 
analysis.



Regards,
Sven




_______________________________________________
SoX-devel mailing list
SoX-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sox-devel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: test.c --]
[-- Type: text/x-csrc; name="test.c", Size: 611 bytes --]

#include <stdio.h>

int
main (void)
{
  char *bp;
  size_t size;
  off_t end;
  FILE *stream;

  stream = open_memstream (&bp, &size);
  fprintf (stream, "hello");
  fflush (stream);
  printf ("buf = `%s', size = %ld\n", bp, size);
  
  fprintf (stream, ", world");
  fflush (stream);
  printf ("buf = `%s', size = %ld\n", bp, size);
  
  end = ftell (stream);
  fseek (stream, 0, SEEK_SET);
  fprintf (stream, "heyho");
  fflush (stream);
  printf ("buf = `%s', size = %ld\n", bp, size);
  
  fseek (stream, end, SEEK_SET);
  fclose (stream);
  printf ("buf = `%s', size = %ld\n", bp, size);
  
  return 0;
}


[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



[-- Attachment #4: Type: text/plain, Size: 158 bytes --]

_______________________________________________
SoX-devel mailing list
SoX-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sox-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: .Re: [PATCH] formats: disallow seeking in dynamic memory buffers
  2021-10-07  6:17   ` .Re: " Sven Neumann via SoX-devel
@ 2021-10-07 12:22     ` Sun Zhenliang
  2021-10-07 13:20       ` Sven Neumann via SoX-devel
  0 siblings, 1 reply; 5+ messages in thread
From: Sun Zhenliang @ 2021-10-07 12:22 UTC (permalink / raw)
  To: sox-devel


[-- Attachment #1.1: Type: text/plain, Size: 5672 bytes --]

在 2021年10月7日 +0800 14:18,Sven Neumann via SoX-devel <sox-devel@lists.sourceforge.net>,写道:
> Hi,
>
> thanks for pointing me to https://sourceforge.net/p/sox/mailman/message/37325794/. This is exactly the problem I am trying to solve.
>
> I am somewhat surprised that we can actually recover the whole buffer after a seek and write because the memory stream maintains a null byte after the data. I was assuming that this means that we can not simply recover by seeking back to the end of the buffer because one byte would have been overwritten by that terminator byte. I tried what you suggested (code attached), and it seems that the terminating null byte is not moved on a seek operation, but only when the file descriptor is closed.
>
> Of course it would be desirable to have a proper WAV header even for streams opened with sox_open_memstream_write(). I can prepare a fix that introduces a seek back, but I guess I would have to do something similar for all formats that seek in the output buffer. Before I start to work on that I would like to get approval from the maintainer that this is the right approach.
Yes, I guess there are lots of works to do related to this seek opt.  Let me know if you need any help.

Regards,
SunZhenliang
>
>
> Regards,
> Sven
>
>
> From: Sun Zhenliang <hisunzhenliang@outlook.com>
> Sent: 07 October 2021 04:24
> To: sox-devel@lists.sourceforge.net <sox-devel@lists.sourceforge.net>
> Subject: Re: [SoX-devel] [PATCH] formats: disallow seeking in dynamic memory buffers
>
> 在 2021年10月7日 +0800 05:35,Sven Neumann via SoX-devel <sox-devel@lists.sourceforge.net>,写道:
> Hi,
>
> in one of our internal applications we are using SoX (the 14.4.2+git20190427 version from Ubuntu) to convert from a variety of audio formats to the WAV file format. We observed that the tests for the conversion occasionally failed and over the last days I found time to dig deeper into this.
>
> We are using sox_open_memstream_write() to write to a dynamically allocated in-memory stream. In our tests sometimes the size of the resulting WAV buffer would have the expected size, sometimes it would be 44 bytes, the size of the WAV header. Valgrind told me that the behavior of is_seekable() in formats.c depends on uninitialized memory. In your git repository I found a fix for this:
>
> commit bb38934e11035c8fab141f70dabda3afdd17da36
> Author: Mans Rullgard <mans@mansr.com>
> Date: Tue Aug 4 17:19:49 2020 +0100
>
> format: improve is_seekable() test
>
> Streams opened with fmemopen() do not have an underlying file descriptor,
> so the fstat() will fail, and a random result is returned.
>
> A simpler method that works regardless of file type is to call fseek()
> and check if it reports success.
>
> Suggested by Stefan Sauer <ensonic@google.com>.
>
>
> Now with this fix applied valgrind was happy, however now our conversion from MP3 to WAV would always result in only 44 bytes, as read from the buffer_size_ptr location passed to sox_open_memstream_write(). It turns out that with above change the undefined behavior is fixed for streams created with open_memstream() and is_seekable() will now reliably returns sox_true for such streams. This allows the WAV writer code to do an fseek() to the start of the stream followed by a write of the WAV header with correct length information. However such a seek followed by a write causes the dynamically allocated memory stream to be truncated. Thus after calling sox_close() the size reported for the stream will be 44 bytes, that's not what we want. Unfortunately we can not simply fix this by reporting the full buffer size as the buffer will actually have been truncated, and a trailing null byte is appended after the WAV header. It looks like we can indeed not seek and fix data in a dynamically allocated stream. Thus I am attaching a patch that changes the code in formats.c to set ft->seekable to false for streams opened with open_memstream(). With this change applied on top of the improvement for the is_seekable() test, our tests pass reliably and valgrind seems happy as well.
>
> I am attaching the patch here, please consider it for inclusion. I am also attaching a simple test application that writes to a stream, seeks to the front and performs another write. The output of this program illustrates that the buffer is truncated:
>
> buf = `hello', size = 5
> buf = `hello, world', size = 12
> buf = `heyho', size = 5
> Bug mentioned https://sourceforge.net/p/sox/mailman/message/37325794/.
>
> It is possible to fix the whole data. From my test to open_memstream(), the
> dynamically allocated memory would not be free before the ft->fp was closed.
> The data written is still there if it’s not covered by other write opt.
>
> In this WAV issue, the seek opt is just in sox_close() to rewrite header, which
> is the end of transcoding and will not cover the data area. We can just ftell()
> to get the end of file before the fseek and seek back to the end after rewriting.
>
> Output with ftell opt:
> buf = `hello', size = 5
> buf = `hello, world', size = 12
> buf = `heyho, world', size = 12
>
> As for other formats, the situation may be different from WAV and needs specific
> analysis.
>
>
>
> Regards,
> Sven
>
>
>
>
> _______________________________________________
> SoX-devel mailing list
> SoX-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/sox-devel
>
> _______________________________________________
> SoX-devel mailing list
> SoX-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/sox-devel

[-- Attachment #1.2: Type: text/html, Size: 6547 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



[-- Attachment #3: Type: text/plain, Size: 158 bytes --]

_______________________________________________
SoX-devel mailing list
SoX-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sox-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: .Re: [PATCH] formats: disallow seeking in dynamic memory buffers
  2021-10-07 12:22     ` Sun Zhenliang
@ 2021-10-07 13:20       ` Sven Neumann via SoX-devel
  0 siblings, 0 replies; 5+ messages in thread
From: Sven Neumann via SoX-devel @ 2021-10-07 13:20 UTC (permalink / raw)
  To: sox-devel@lists.sourceforge.net; +Cc: Sven Neumann


[-- Attachment #1.1: Type: text/plain, Size: 6877 bytes --]

Hi,

thanks for the feedback. Attached you find a patch for the seeking that occurs when the WAV header is rewritten. This fixes the reported problem for me and it has seen some testing, mostly with dynamic memory streams though.

I should be able to find time to look at other places that might need a similar fix.


Regards,
Sven


________________________________
From: Sun Zhenliang <hisunzhenliang@outlook.com>
Sent: 07 October 2021 14:22
To: sox-devel@lists.sourceforge.net <sox-devel@lists.sourceforge.net>
Subject: Re: [SoX-devel] .Re: [PATCH] formats: disallow seeking in dynamic memory buffers

在 2021年10月7日 +0800 14:18,Sven Neumann via SoX-devel <sox-devel@lists.sourceforge.net>,写道:
Hi,

thanks for pointing me to https://sourceforge.net/p/sox/mailman/message/37325794/<https://urldefense.com/v3/__https://sourceforge.net/p/sox/mailman/message/37325794/__;!!OA8L0MA-!shZZ0FsVq0F9l-rCU2Rs5tb3a6ZMpc9T5GrKQyIPeMAmADELDfUqDjZGdc51LFq0oA$>. This is exactly the problem I am trying to solve.

I am somewhat surprised that we can actually recover the whole buffer after a seek and write because the memory stream maintains a null byte after the data. I was assuming that this means that we can not simply recover by seeking back to the end of the buffer because one byte would have been overwritten by that terminator byte. I tried what you suggested (code attached), and it seems that the terminating null byte is not moved on a seek operation, but only when the file descriptor is closed.

Of course it would be desirable to have a proper WAV header even for streams opened with sox_open_memstream_write(). I can prepare a fix that introduces a seek back, but I guess I would have to do something similar for all formats that seek in the output buffer. Before I start to work on that I would like to get approval from the maintainer that this is the right approach.
Yes, I guess there are lots of works to do related to this seek opt.  Let me know if you need any help.

Regards,
SunZhenliang


Regards,
Sven


From: Sun Zhenliang <hisunzhenliang@outlook.com>
Sent: 07 October 2021 04:24
To: sox-devel@lists.sourceforge.net <sox-devel@lists.sourceforge.net>
Subject: Re: [SoX-devel] [PATCH] formats: disallow seeking in dynamic memory buffers

在 2021年10月7日 +0800 05:35,Sven Neumann via SoX-devel <sox-devel@lists.sourceforge.net>,写道:
Hi,

in one of our internal applications we are using SoX (the 14.4.2+git20190427 version from Ubuntu) to convert from a variety of audio formats to the WAV file format. We observed that the tests for the conversion occasionally failed and over the last days I found time to dig deeper into this.

We are using sox_open_memstream_write() to write to a dynamically allocated in-memory stream. In our tests sometimes the size of the resulting WAV buffer would have the expected size, sometimes it would be 44 bytes, the size of the WAV header. Valgrind told me that the behavior of is_seekable() in formats.c depends on uninitialized memory. In your git repository I found a fix for this:

commit bb38934e11035c8fab141f70dabda3afdd17da36
Author: Mans Rullgard <mans@mansr.com>
Date: Tue Aug 4 17:19:49 2020 +0100

format: improve is_seekable() test

Streams opened with fmemopen() do not have an underlying file descriptor,
so the fstat() will fail, and a random result is returned.

A simpler method that works regardless of file type is to call fseek()
and check if it reports success.

Suggested by Stefan Sauer <ensonic@google.com>.


Now with this fix applied valgrind was happy, however now our conversion from MP3 to WAV would always result in only 44 bytes, as read from the buffer_size_ptr location passed to sox_open_memstream_write(). It turns out that with above change the undefined behavior is fixed for streams created with open_memstream() and is_seekable() will now reliably returns sox_true for such streams. This allows the WAV writer code to do an fseek() to the start of the stream followed by a write of the WAV header with correct length information. However such a seek followed by a write causes the dynamically allocated memory stream to be truncated. Thus after calling sox_close() the size reported for the stream will be 44 bytes, that's not what we want. Unfortunately we can not simply fix this by reporting the full buffer size as the buffer will actually have been truncated, and a trailing null byte is appended after the WAV header. It looks like we can indeed not seek and fix data in a dynamically allocated stream. Thus I am attaching a patch that changes the code in formats.c to set ft->seekable to false for streams opened with open_memstream(). With this change applied on top of the improvement for the is_seekable() test, our tests pass reliably and valgrind seems happy as well.

I am attaching the patch here, please consider it for inclusion. I am also attaching a simple test application that writes to a stream, seeks to the front and performs another write. The output of this program illustrates that the buffer is truncated:

buf = `hello', size = 5
buf = `hello, world', size = 12
buf = `heyho', size = 5
Bug mentioned https://sourceforge.net/p/sox/mailman/message/37325794/<https://urldefense.com/v3/__https://sourceforge.net/p/sox/mailman/message/37325794/__;!!OA8L0MA-!shZZ0FsVq0F9l-rCU2Rs5tb3a6ZMpc9T5GrKQyIPeMAmADELDfUqDjZGdc51LFq0oA$>.

It is possible to fix the whole data. From my test to open_memstream(), the
dynamically allocated memory would not be free before the ft->fp was closed.
The data written is still there if it’s not covered by other write opt.

In this WAV issue, the seek opt is just in sox_close() to rewrite header, which
is the end of transcoding and will not cover the data area. We can just ftell()
to get the end of file before the fseek and seek back to the end after rewriting.

Output with ftell opt:
buf = `hello', size = 5
buf = `hello, world', size = 12
buf = `heyho, world', size = 12

As for other formats, the situation may be different from WAV and needs specific
analysis.



Regards,
Sven




_______________________________________________
SoX-devel mailing list
SoX-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sox-devel<https://urldefense.com/v3/__https://lists.sourceforge.net/lists/listinfo/sox-devel__;!!OA8L0MA-!shZZ0FsVq0F9l-rCU2Rs5tb3a6ZMpc9T5GrKQyIPeMAmADELDfUqDjZGdc4ivaLrLA$>

_______________________________________________
SoX-devel mailing list
SoX-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sox-devel<https://urldefense.com/v3/__https://lists.sourceforge.net/lists/listinfo/sox-devel__;!!OA8L0MA-!shZZ0FsVq0F9l-rCU2Rs5tb3a6ZMpc9T5GrKQyIPeMAmADELDfUqDjZGdc4ivaLrLA$>

[-- Attachment #1.2: Type: text/html, Size: 9524 bytes --]

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-wav-seek-back-to-end-of-file-after-rewriting-the-hea.patch --]
[-- Type: text/x-patch; name="0001-wav-seek-back-to-end-of-file-after-rewriting-the-hea.patch", Size: 1554 bytes --]

From f8fcfe9c9a48a277407e534a3b5d3cf152ac1b9e Mon Sep 17 00:00:00 2001
From: Sven Neumann <sven.neumann@logmein.com>
Date: Thu, 7 Oct 2021 08:18:00 +0200
Subject: [PATCH] wav: seek back to end of file after rewriting the header

This fixes behavior for output streams created with
sox_open_memstream_write(). Without seeking back the buffer would
be truncated to just the header.
---
 src/wav.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/src/wav.c b/src/wav.c
index 3f6beb45..c31ab9f8 100644
--- a/src/wav.c
+++ b/src/wav.c
@@ -1570,13 +1570,25 @@ static int stopwrite(sox_format_t * ft)
         if (!ft->seekable)
           return SOX_EOF;
 
+        off_t end = lsx_tell(ft);
         if (lsx_seeki(ft, (off_t)0, SEEK_SET) != 0)
         {
-                lsx_fail_errno(ft,SOX_EOF,"Can't rewind output file to rewrite .wav header.");
-                return SOX_EOF;
+            lsx_fail_errno(ft, SOX_EOF, "Can't rewind output file to rewrite .wav header.");
+            return SOX_EOF;
+        }
+
+        int result = wavwritehdr(ft, 1);
+        if (result != SOX_SUCCESS)
+            return result;
+
+        /* Seek back to end of stream as a dynamic memory streams would truncate otherwise. */
+        if (lsx_seeki(ft, end, SEEK_SET) != 0)
+        {
+            lsx_fail_errno(ft, SOX_EOF, "Can't seek to end of output file after rewriting .wav header.");
+            return SOX_EOF;
         }
 
-        return (wavwritehdr(ft, 1));
+        return SOX_SUCCESS;
 }
 
 /*
-- 
2.25.1


[-- Attachment #3: Type: text/plain, Size: 0 bytes --]



[-- Attachment #4: Type: text/plain, Size: 158 bytes --]

_______________________________________________
SoX-devel mailing list
SoX-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sox-devel

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-10-07 14:28 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-06 18:44 [PATCH] formats: disallow seeking in dynamic memory buffers Sven Neumann via SoX-devel
2021-10-07  2:24 ` Sun Zhenliang
2021-10-07  6:17   ` .Re: " Sven Neumann via SoX-devel
2021-10-07 12:22     ` Sun Zhenliang
2021-10-07 13:20       ` Sven Neumann via SoX-devel

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/sox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).