git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X
@ 2013-08-17 12:40 Steffen Prohaska
  2013-08-17 15:27 ` John Keeping
                   ` (5 more replies)
  0 siblings, 6 replies; 37+ messages in thread
From: Steffen Prohaska @ 2013-08-17 12:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Steffen Prohaska

Previously, filtering more than 2GB through an external filter (see
test) failed on Mac OS X 10.8.4 (12E55) with:

    error: read from external filter cat failed
    error: cannot feed the input to external filter cat
    error: cat died of signal 13
    error: external filter cat failed 141
    error: external filter cat failed

The reason is that read() immediately returns with EINVAL if len >= 2GB.
I haven't found any information under which specific conditions this
occurs.  My suspicion is that it happens when reading from a pipe, while
reading from a standard file should always be fine.  I haven't tested
any other version of Mac OS X, though I'd expect that other versions are
affected as well.

The problem is fixed by always reading less than 2GB in xread().
xread() doesn't guarantee to read all the requested data at once, and
callers are expected to gracefully handle partial reads.  Slicing large
reads into 2GB pieces should not hurt practical performance.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
---
 t/t0021-conversion.sh | 9 +++++++++
 wrapper.c             | 8 ++++++++
 2 files changed, 17 insertions(+)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e50f0f7..aec1253 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -190,4 +190,13 @@ test_expect_success 'required filter clean failure' '
 	test_must_fail git add test.fc
 '
 
+test_expect_success 'filter large file' '
+	git config filter.largefile.smudge cat &&
+	git config filter.largefile.clean cat &&
+	dd if=/dev/zero of=2GB count=2097152 bs=1024 &&
+	echo "/2GB filter=largefile" >.gitattributes &&
+	git add 2GB 2>err &&
+	! grep -q "error" err
+'
+
 test_done
diff --git a/wrapper.c b/wrapper.c
index 6a015de..2a2f496 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -139,6 +139,14 @@ ssize_t xread(int fd, void *buf, size_t len)
 {
 	ssize_t nr;
 	while (1) {
+#ifdef __APPLE__
+		const size_t twoGB = (1l << 31);
+		/* len >= 2GB immediately fails on Mac OS X with EINVAL when
+		 * reading from pipe. */
+		if (len >= twoGB) {
+			len = twoGB - 1;
+		}
+#endif
 		nr = read(fd, buf, len);
 		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
 			continue;
-- 
1.8.4.rc3.5.gfcb973a

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X
  2013-08-17 12:40 [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X Steffen Prohaska
@ 2013-08-17 15:27 ` John Keeping
  2013-08-17 15:56 ` Torsten Bögershausen
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 37+ messages in thread
From: John Keeping @ 2013-08-17 15:27 UTC (permalink / raw)
  To: Steffen Prohaska; +Cc: Junio C Hamano, git

On Sat, Aug 17, 2013 at 02:40:05PM +0200, Steffen Prohaska wrote:
> Previously, filtering more than 2GB through an external filter (see
> test) failed on Mac OS X 10.8.4 (12E55) with:
> 
>     error: read from external filter cat failed
>     error: cannot feed the input to external filter cat
>     error: cat died of signal 13
>     error: external filter cat failed 141
>     error: external filter cat failed
> 
> The reason is that read() immediately returns with EINVAL if len >= 2GB.
> I haven't found any information under which specific conditions this
> occurs.  My suspicion is that it happens when reading from a pipe, while
> reading from a standard file should always be fine.  I haven't tested
> any other version of Mac OS X, though I'd expect that other versions are
> affected as well.
> 
> The problem is fixed by always reading less than 2GB in xread().
> xread() doesn't guarantee to read all the requested data at once, and
> callers are expected to gracefully handle partial reads.  Slicing large
> reads into 2GB pieces should not hurt practical performance.
> 
> Signed-off-by: Steffen Prohaska <prohaska@zib.de>
> ---
>  t/t0021-conversion.sh | 9 +++++++++
>  wrapper.c             | 8 ++++++++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> index e50f0f7..aec1253 100755
> --- a/t/t0021-conversion.sh
> +++ b/t/t0021-conversion.sh
> @@ -190,4 +190,13 @@ test_expect_success 'required filter clean failure' '
>  	test_must_fail git add test.fc
>  '
>  
> +test_expect_success 'filter large file' '
> +	git config filter.largefile.smudge cat &&
> +	git config filter.largefile.clean cat &&
> +	dd if=/dev/zero of=2GB count=2097152 bs=1024 &&
> +	echo "/2GB filter=largefile" >.gitattributes &&
> +	git add 2GB 2>err &&
> +	! grep -q "error" err
> +'
> +
>  test_done
> diff --git a/wrapper.c b/wrapper.c
> index 6a015de..2a2f496 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -139,6 +139,14 @@ ssize_t xread(int fd, void *buf, size_t len)
>  {
>  	ssize_t nr;
>  	while (1) {
> +#ifdef __APPLE__
> +		const size_t twoGB = (1l << 31);
> +		/* len >= 2GB immediately fails on Mac OS X with EINVAL when
> +		 * reading from pipe. */
> +		if (len >= twoGB) {
> +			len = twoGB - 1;
> +		}

Please don't use unnecessary curly braces here (see
Documentation/CodingGuidelines).

> +#endif
>  		nr = read(fd, buf, len);
>  		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
>  			continue;
> -- 
> 1.8.4.rc3.5.gfcb973a


* Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X
  2013-08-17 12:40 [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X Steffen Prohaska
  2013-08-17 15:27 ` John Keeping
@ 2013-08-17 15:56 ` Torsten Bögershausen
  2013-08-17 17:16 ` Johannes Sixt
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 37+ messages in thread
From: Torsten Bögershausen @ 2013-08-17 15:56 UTC (permalink / raw)
  To: Steffen Prohaska; +Cc: Junio C Hamano, git

On 2013-08-17 14.40, Steffen Prohaska wrote:
> Previously, filtering more than 2GB through an external filter (see
> test) failed on Mac OS X 10.8.4 (12E55) with:
> 
>     error: read from external filter cat failed
>     error: cannot feed the input to external filter cat
>     error: cat died of signal 13
>     error: external filter cat failed 141
>     error: external filter cat failed
> 
> The reason is that read() immediately returns with EINVAL if len >= 2GB.
> I haven't found any information under which specific conditions this
> occurs.  My suspicion is that it happens when reading from a pipe, while
> reading from a standard file should always be fine.  I haven't tested
> any other version of Mac OS X, though I'd expect that other versions are
> affected as well.
> 
> The problem is fixed by always reading less than 2GB in xread().
> xread() doesn't guarantee to read all the requested data at once, and
> callers are expected to gracefully handle partial reads.  Slicing large
> reads into 2GB pieces should not hurt practical performance.
> 
> Signed-off-by: Steffen Prohaska <prohaska@zib.de>
> ---
>  t/t0021-conversion.sh | 9 +++++++++
>  wrapper.c             | 8 ++++++++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> index e50f0f7..aec1253 100755
> --- a/t/t0021-conversion.sh
> +++ b/t/t0021-conversion.sh
> @@ -190,4 +190,13 @@ test_expect_success 'required filter clean failure' '
>  	test_must_fail git add test.fc
>  '
>  
> +test_expect_success 'filter large file' '
> +	git config filter.largefile.smudge cat &&
> +	git config filter.largefile.clean cat &&
> +	dd if=/dev/zero of=2GB count=2097152 bs=1024 &&
> +	echo "/2GB filter=largefile" >.gitattributes &&
> +	git add 2GB 2>err &&
> +	! grep -q "error" err
> +'
> +
>  test_done
> diff --git a/wrapper.c b/wrapper.c
> index 6a015de..2a2f496 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -139,6 +139,14 @@ ssize_t xread(int fd, void *buf, size_t len)
>  {
>  	ssize_t nr;
>  	while (1) {
> +#ifdef __APPLE__
> +		const size_t twoGB = (1l << 31);
> +		/* len >= 2GB immediately fails on Mac OS X with EINVAL when
> +		 * reading from pipe. */
> +		if (len >= twoGB) {
> +			len = twoGB - 1;
> +		}
> +#endif
>  		nr = read(fd, buf, len);
>  		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
>  			continue;
> 
Thanks for the patch.
I think we can replace the __APPLE__ define with a more generic one.
We had a similar patch for write() some time ago:

config.mak.uname
	NEEDS_CLIPPED_WRITE = YesPlease


Makefile
ifdef NEEDS_CLIPPED_WRITE
	BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
	COMPAT_OBJS += compat/clipped-write.o
endif


* Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X
  2013-08-17 12:40 [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X Steffen Prohaska
  2013-08-17 15:27 ` John Keeping
  2013-08-17 15:56 ` Torsten Bögershausen
@ 2013-08-17 17:16 ` Johannes Sixt
  2013-08-17 18:57 ` Jonathan Nieder
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 37+ messages in thread
From: Johannes Sixt @ 2013-08-17 17:16 UTC (permalink / raw)
  To: Steffen Prohaska; +Cc: Junio C Hamano, git

Am 17.08.2013 14:40, schrieb Steffen Prohaska:
> Previously, filtering more than 2GB through an external filter (see
> test) failed on Mac OS X 10.8.4 (12E55) with:
>
>      error: read from external filter cat failed
>      error: cannot feed the input to external filter cat
>      error: cat died of signal 13
>      error: external filter cat failed 141
>      error: external filter cat failed
>
> The reason is that read() immediately returns with EINVAL if len >= 2GB.
> I haven't found any information under which specific conditions this
> occurs.  My suspicion is that it happens when reading from a pipe, while
> reading from a standard file should always be fine.  I haven't tested
> any other version of Mac OS X, though I'd expect that other versions are
> affected as well.
>
> The problem is fixed by always reading less than 2GB in xread().
> xread() doesn't guarantee to read all the requested data at once, and
> callers are expected to gracefully handle partial reads.  Slicing large
> reads into 2GB pieces should not hurt practical performance.
>
> Signed-off-by: Steffen Prohaska <prohaska@zib.de>
> ---
>   t/t0021-conversion.sh | 9 +++++++++
>   wrapper.c             | 8 ++++++++
>   2 files changed, 17 insertions(+)
>
> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> index e50f0f7..aec1253 100755
> --- a/t/t0021-conversion.sh
> +++ b/t/t0021-conversion.sh
> @@ -190,4 +190,13 @@ test_expect_success 'required filter clean failure' '
>   	test_must_fail git add test.fc
>   '
>
> +test_expect_success 'filter large file' '
> +	git config filter.largefile.smudge cat &&
> +	git config filter.largefile.clean cat &&
> +	dd if=/dev/zero of=2GB count=2097152 bs=1024 &&

We don't have /dev/zero on Windows. Even if we get a file slightly over
2GB, we can't handle it on Windows, and other 32-bit architectures will
very likely also be handicapped.

Finally, this test (if it remains in some form) should probably be 
protected by EXPENSIVE.

> +	echo "/2GB filter=largefile" >.gitattributes &&

Drop the slash, please; it may confuse our bash on Windows (it doesn't 
currently because echo is a builtin, but better safe than sorry).

> +	git add 2GB 2>err &&
> +	! grep -q "error" err

Executive summary: drop everything starting at "2>err".

Long story: Can it happen that (1) git add succeeds, but still produces 
something on stderr, and (2) we do not care what this something is as long 
as it does not contain "error"? I don't think this combination of 
conditions makes sense; it's sufficient to check that git add does not fail.

BTW, if you add

	... &&
	rm -f 2GB &&
	git checkout -- 2GB

you would also test the smudge filter code path with a huge file, no?

BTW2, to create a file with slightly over 2GB, you can use

	for i in $(test_seq 0 128); do printf "%16777216d" 1; done >2GB

> +'
> +
>   test_done
> diff --git a/wrapper.c b/wrapper.c
> index 6a015de..2a2f496 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -139,6 +139,14 @@ ssize_t xread(int fd, void *buf, size_t len)
>   {
>   	ssize_t nr;
>   	while (1) {
> +#ifdef __APPLE__
> +		const size_t twoGB = (1l << 31);
> +		/* len >= 2GB immediately fails on Mac OS X with EINVAL when
> +		 * reading from pipe. */
> +		if (len >= twoGB) {
> +			len = twoGB - 1;
> +		}
> +#endif
>   		nr = read(fd, buf, len);
>   		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
>   			continue;
>

-- Hannes


* Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X
  2013-08-17 12:40 [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X Steffen Prohaska
                   ` (2 preceding siblings ...)
  2013-08-17 17:16 ` Johannes Sixt
@ 2013-08-17 18:57 ` Jonathan Nieder
  2013-08-17 20:25 ` Kyle J. McKay
  2013-08-19  6:38 ` [PATCH v2] compat: Fix read() of 2GB and more " Steffen Prohaska
  5 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2013-08-17 18:57 UTC (permalink / raw)
  To: Steffen Prohaska; +Cc: Junio C Hamano, git

Hi,

Steffen Prohaska wrote:

> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -139,6 +139,14 @@ ssize_t xread(int fd, void *buf, size_t len)
>  {
>  	ssize_t nr;
>  	while (1) {
> +#ifdef __APPLE__
> +		const size_t twoGB = (1l << 31);
> +		/* len >= 2GB immediately fails on Mac OS X with EINVAL when
> +		 * reading from pipe. */
> +		if (len >= twoGB) {
> +			len = twoGB - 1;
> +		}
> +#endif
>  		nr = read(fd, buf, len);

See 6c642a87 (compat: large write(2) fails on Mac OS X/XNU,
2013-05-10) for a cleaner way to do this.

Hope that helps,
Jonathan


* Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X
  2013-08-17 12:40 [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X Steffen Prohaska
                   ` (3 preceding siblings ...)
  2013-08-17 18:57 ` Jonathan Nieder
@ 2013-08-17 20:25 ` Kyle J. McKay
  2013-08-17 21:23   ` Jonathan Nieder
  2013-08-19  6:38 ` [PATCH v2] compat: Fix read() of 2GB and more " Steffen Prohaska
  5 siblings, 1 reply; 37+ messages in thread
From: Kyle J. McKay @ 2013-08-17 20:25 UTC (permalink / raw)
  To: Steffen Prohaska; +Cc: Junio C Hamano, git

On Aug 17, 2013, at 05:40, Steffen Prohaska wrote:

> Previously, filtering more than 2GB through an external filter (see
> test) failed on Mac OS X 10.8.4 (12E55) with:
>
>    error: read from external filter cat failed
>    error: cannot feed the input to external filter cat
>    error: cat died of signal 13
>    error: external filter cat failed 141
>    error: external filter cat failed
>
> The reason is that read() immediately returns with EINVAL if len >=  
> 2GB.
> I haven't found any information under which specific conditions this
> occurs.

According to POSIX [1] for read:

If the value of nbyte is greater than {SSIZE_MAX}, the result is  
implementation-defined.


The write function also has the same restriction [2].

Since OS X still supports running 32-bit executables, and SSIZE_MAX is
2GB - 1 when running 32-bit, it would seem the same limit has been
imposed on 64-bit executables.  In any case, for portability we should
avoid "implementation-defined" behavior unless we know the OS we were
compiled on has acceptable "implementation-defined" behavior, and
otherwise never attempt to read or write more than SSIZE_MAX bytes.

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html


* Re: [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X
  2013-08-17 20:25 ` Kyle J. McKay
@ 2013-08-17 21:23   ` Jonathan Nieder
  0 siblings, 0 replies; 37+ messages in thread
From: Jonathan Nieder @ 2013-08-17 21:23 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: Steffen Prohaska, Junio C Hamano, git

Kyle J. McKay wrote:

> According to POSIX [1] for read:
>
> If the value of nbyte is greater than {SSIZE_MAX}, the result is
> implementation-defined.

Sure.

[...]
> Since OS X still supports running 32-bit executables, and SSIZE_MAX is 2GB -
> 1 when running 32-bit it would seem the same limit has been imposed on
> 64-bit executables.  In any case, we should avoid "implementation-defined"
> behavior

Wait --- that's a big leap.

In a 64-bit executable, SSIZE_MAX is 2^63 - 1, so the behavior is not
implementation-defined.  I'm not sure if Steffen's copy of git is
32-bit or 64-bit --- my guess would be 64-bit.  So at first glance
this does look like an XNU-specific bug, not a standards thing.

What about the related case where someone does try to "git add"
a file with a clean filter producing more than SSIZE_MAX and less
than SIZE_MAX bytes?

strbuf_grow() does not directly protect against a strbuf growing to >
SSIZE_MAX bytes, but in practice on most machines realloc() does.  So
in practice we could never read more than SSIZE_MAX characters in the
strbuf_read() codepath, but it might be worth a check for paranoia
anyway.

While we're here, it's easy to wonder: why is git reading into such a
large buffer anyway?  Normally git uses the streaming codepath for
files larger than big_file_threshold (typically 512 MiB).
Unfortunately there are cases where it doesn't.  For example:

  - convert_to_git() has not been taught to stream, so paths
    with a clean filter or requiring crlf conversion are read or
    mapped into memory.

  - deflate_to_pack() relies on seeking backward to retry when
    a pack would grow too large, so "git hash-object --stdin"
    cannot use that codepath.

  - a "clean" filter can make a file bigger.

  Perhaps git needs to learn to write to a temporary file
  when asked to keep track of a blob that is larger than fits
  reasonably in memory.  Or maybe not.

So there is room for related work but the codepaths that read()
indefinitely large files do seem to be needed, at least in the short
term.  Working around this Mac OS X-specific limitation at the read()
level like you've done still sounds like the right thing to do.

Thanks, both, for your work tracking this down.  Hopefully the next
version of the patch will be in good shape and then it can be applied
quickly.

Thanks and hope that helps,
Jonathan


* [PATCH v2] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-17 12:40 [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X Steffen Prohaska
                   ` (4 preceding siblings ...)
  2013-08-17 20:25 ` Kyle J. McKay
@ 2013-08-19  6:38 ` Steffen Prohaska
  2013-08-19  7:54   ` John Keeping
                     ` (4 more replies)
  5 siblings, 5 replies; 37+ messages in thread
From: Steffen Prohaska @ 2013-08-19  6:38 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes Sixt, John Keeping, Jonathan Nieder, Kyle J. McKay,
	Torsten Bögershausen, Steffen Prohaska

Previously, filtering 2GB or more through an external filter (see test)
failed on Mac OS X 10.8.4 (12E55) for a 64-bit executable with:

    error: read from external filter cat failed
    error: cannot feed the input to external filter cat
    error: cat died of signal 13
    error: external filter cat failed 141
    error: external filter cat failed

The reason was that read() immediately returns with EINVAL if nbyte >=
2GB.  According to POSIX [1], if the value of nbyte passed to read() is
greater than SSIZE_MAX, the result is implementation-defined.  The write
function has the same restriction [2].  Since OS X still supports
running 32-bit executables, the 32-bit limit (SSIZE_MAX = INT_MAX
= 2GB - 1) also seems to be imposed on 64-bit executables under certain
conditions.  For write, the problem has been addressed in an earlier
commit [6c642a].

The problem for read() is addressed in a similar way by introducing
a wrapper function in compat that always reads less than 2GB.
Unfortunately, '#undef read' is needed at a few places to avoid
expanding the compat macro in constructs like 'vtbl->read(...)'.

Note that 'git add' exits with 0 even if it prints filtering errors to
stderr.  The test, therefore, checks stderr.  'git add' should probably
be changed (sometime in another commit) to exit with nonzero if
filtering fails.  The test could then be changed to use test_must_fail.

Thanks to the following people for their suggestions:

    Johannes Sixt <j6t@kdbg.org>
    John Keeping <john@keeping.me.uk>
    Jonathan Nieder <jrnieder@gmail.com>
    Kyle J. McKay <mackyle@gmail.com>
    Torsten Bögershausen <tboegi@web.de>

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

[6c642a] 6c642a878688adf46b226903858b53e2d31ac5c3
    compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
---
 Makefile              |  8 ++++++++
 builtin/var.c         |  1 +
 config.mak.uname      |  1 +
 git-compat-util.h     |  5 +++++
 streaming.c           |  1 +
 t/t0021-conversion.sh | 14 ++++++++++++++
 6 files changed, 30 insertions(+)

diff --git a/Makefile b/Makefile
index 3588ca1..0f69e24 100644
--- a/Makefile
+++ b/Makefile
@@ -69,6 +69,9 @@ all::
 # Define NO_MSGFMT_EXTENDED_OPTIONS if your implementation of msgfmt
 # doesn't support GNU extensions like --check and --statistics
 #
+# Define NEEDS_CLIPPED_READ if your read(2) cannot read more than
+# INT_MAX bytes at once (e.g. MacOS X).
+#
 # Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
 # INT_MAX bytes at once (e.g. MacOS X).
 #
@@ -1493,6 +1496,11 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
 	MSGFMT += --check --statistics
 endif
 
+ifdef NEEDS_CLIPPED_READ
+	BASIC_CFLAGS += -DNEEDS_CLIPPED_READ
+	COMPAT_OBJS += compat/clipped-read.o
+endif
+
 ifdef NEEDS_CLIPPED_WRITE
 	BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
 	COMPAT_OBJS += compat/clipped-write.o
diff --git a/builtin/var.c b/builtin/var.c
index aedbb53..e59f5ba 100644
--- a/builtin/var.c
+++ b/builtin/var.c
@@ -38,6 +38,7 @@ static struct git_var git_vars[] = {
 	{ "", NULL },
 };
 
+#undef read
 static void list_vars(void)
 {
 	struct git_var *ptr;
diff --git a/config.mak.uname b/config.mak.uname
index b27f51d..5c10726 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -95,6 +95,7 @@ ifeq ($(uname_S),Darwin)
 	NO_MEMMEM = YesPlease
 	USE_ST_TIMESPEC = YesPlease
 	HAVE_DEV_TTY = YesPlease
+	NEEDS_CLIPPED_READ = YesPlease
 	NEEDS_CLIPPED_WRITE = YesPlease
 	COMPAT_OBJS += compat/precompose_utf8.o
 	BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
diff --git a/git-compat-util.h b/git-compat-util.h
index 115cb1d..a227127 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -185,6 +185,11 @@ typedef unsigned long uintptr_t;
 #define probe_utf8_pathname_composition(a,b)
 #endif
 
+#ifdef NEEDS_CLIPPED_READ
+ssize_t clipped_read(int fd, void *buf, size_t nbyte);
+#define read(x,y,z) clipped_read((x),(y),(z))
+#endif
+
 #ifdef NEEDS_CLIPPED_WRITE
 ssize_t clipped_write(int fildes, const void *buf, size_t nbyte);
 #define write(x,y,z) clipped_write((x),(y),(z))
diff --git a/streaming.c b/streaming.c
index debe904..c1fe34a 100644
--- a/streaming.c
+++ b/streaming.c
@@ -99,6 +99,7 @@ int close_istream(struct git_istream *st)
 	return r;
 }
 
+#undef read
 ssize_t read_istream(struct git_istream *st, void *buf, size_t sz)
 {
 	return st->vtbl->read(st, buf, sz);
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e50f0f7..b92e6cb 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -190,4 +190,18 @@ test_expect_success 'required filter clean failure' '
 	test_must_fail git add test.fc
 '
 
+test -n "$GIT_TEST_LONG" && test_set_prereq EXPENSIVE
+
+test_expect_success EXPENSIVE 'filter large file' '
+	git config filter.largefile.smudge cat &&
+	git config filter.largefile.clean cat &&
+	for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
+	echo "2GB filter=largefile" >.gitattributes &&
+	git add 2GB 2>err &&
+	! test -s err &&
+	rm -f 2GB &&
+	git checkout -- 2GB 2>err &&
+	! test -s err
+'
+
 test_done
-- 
1.8.4.rc3.5.ga3343dd


* Re: [PATCH v2] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19  6:38 ` [PATCH v2] compat: Fix read() of 2GB and more " Steffen Prohaska
@ 2013-08-19  7:54   ` John Keeping
  2013-08-19  8:20     ` Steffen Prohaska
  2013-08-19  8:20   ` Johannes Sixt
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 37+ messages in thread
From: John Keeping @ 2013-08-19  7:54 UTC (permalink / raw)
  To: Steffen Prohaska
  Cc: Junio C Hamano, git, Johannes Sixt, Jonathan Nieder,
	Kyle J. McKay, Torsten Bögershausen

On Mon, Aug 19, 2013 at 08:38:20AM +0200, Steffen Prohaska wrote:
> diff --git a/Makefile b/Makefile
> index 3588ca1..0f69e24 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -69,6 +69,9 @@ all::
>  # Define NO_MSGFMT_EXTENDED_OPTIONS if your implementation of msgfmt
>  # doesn't support GNU extensions like --check and --statistics
>  #
> +# Define NEEDS_CLIPPED_READ if your read(2) cannot read more than
> +# INT_MAX bytes at once (e.g. MacOS X).
> +#
>  # Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
>  # INT_MAX bytes at once (e.g. MacOS X).
>  #
> @@ -1493,6 +1496,11 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
>  	MSGFMT += --check --statistics
>  endif
>  
> +ifdef NEEDS_CLIPPED_READ
> +	BASIC_CFLAGS += -DNEEDS_CLIPPED_READ
> +	COMPAT_OBJS += compat/clipped-read.o

You've created compat/clipped-write.c, but...

>  Makefile              |  8 ++++++++
>  builtin/var.c         |  1 +
>  config.mak.uname      |  1 +
>  git-compat-util.h     |  5 +++++
>  streaming.c           |  1 +
>  t/t0021-conversion.sh | 14 ++++++++++++++
>  6 files changed, 30 insertions(+)

... it's not included here.  Did you forget to "git add"?


* Re: [PATCH v2] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19  6:38 ` [PATCH v2] compat: Fix read() of 2GB and more " Steffen Prohaska
  2013-08-19  7:54   ` John Keeping
@ 2013-08-19  8:20   ` Johannes Sixt
  2013-08-19  8:25     ` Stefan Beller
  2013-08-19  8:28     ` Steffen Prohaska
  2013-08-19  8:21   ` [PATCH v3] " Steffen Prohaska
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 37+ messages in thread
From: Johannes Sixt @ 2013-08-19  8:20 UTC (permalink / raw)
  To: Steffen Prohaska
  Cc: Junio C Hamano, git, John Keeping, Jonathan Nieder, Kyle J. McKay,
	Torsten Bögershausen

Am 19.08.2013 08:38, schrieb Steffen Prohaska:
> +test_expect_success EXPENSIVE 'filter large file' '
> +	git config filter.largefile.smudge cat &&
> +	git config filter.largefile.clean cat &&
> +	for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&

Shouldn't you count to 2049 to get a file that is over 2GB?

> +	echo "2GB filter=largefile" >.gitattributes &&
> +	git add 2GB 2>err &&
> +	! test -s err &&
> +	rm -f 2GB &&
> +	git checkout -- 2GB 2>err &&
> +	! test -s err
> +'

-- Hannes


* Re: [PATCH v2] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19  7:54   ` John Keeping
@ 2013-08-19  8:20     ` Steffen Prohaska
  0 siblings, 0 replies; 37+ messages in thread
From: Steffen Prohaska @ 2013-08-19  8:20 UTC (permalink / raw)
  To: John Keeping
  Cc: Junio C Hamano, git, Johannes Sixt, Jonathan Nieder,
	Kyle J. McKay, Torsten Bögershausen


On Aug 19, 2013, at 9:54 AM, John Keeping <john@keeping.me.uk> wrote:

> You've created compat/clipped-read.c, but...
> 
>> Makefile              |  8 ++++++++
>> builtin/var.c         |  1 +
>> config.mak.uname      |  1 +
>> git-compat-util.h     |  5 +++++
>> streaming.c           |  1 +
>> t/t0021-conversion.sh | 14 ++++++++++++++
>> 6 files changed, 30 insertions(+)
> 
> ... it's not included here.  Did you forget to "git add"?

Indeed.  How embarrassing.  Thanks for spotting this.  I'll send v3 in a minute.

	Steffen


* [PATCH v3] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19  6:38 ` [PATCH v2] compat: Fix read() of 2GB and more " Steffen Prohaska
  2013-08-19  7:54   ` John Keeping
  2013-08-19  8:20   ` Johannes Sixt
@ 2013-08-19  8:21   ` Steffen Prohaska
  2013-08-19 13:59     ` Eric Sunshine
  2013-08-19 15:41     ` [PATCH v4] " Steffen Prohaska
  2013-08-19  8:27   ` [PATCH v2] compat: Fix read() of 2GB and more on Mac OS X Johannes Sixt
  2013-08-19 14:41   ` Torsten Bögershausen
  4 siblings, 2 replies; 37+ messages in thread
From: Steffen Prohaska @ 2013-08-19  8:21 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes Sixt, John Keeping, Jonathan Nieder, Kyle J. McKay,
	Torsten Bögershausen, Steffen Prohaska

Previously, filtering 2GB or more through an external filter (see test)
failed on Mac OS X 10.8.4 (12E55) for a 64-bit executable with:

    error: read from external filter cat failed
    error: cannot feed the input to external filter cat
    error: cat died of signal 13
    error: external filter cat failed 141
    error: external filter cat failed

The reason was that read() immediately returns with EINVAL if nbyte >=
2GB.  According to POSIX [1], if the value of nbyte passed to read() is
greater than SSIZE_MAX, the result is implementation-defined.  The write
function has the same restriction [2].  Since OS X still supports
running 32-bit executables, the 32-bit limit (SSIZE_MAX = INT_MAX
= 2GB - 1) also seems to be imposed on 64-bit executables under certain
conditions.  For write, the problem has been addressed in an earlier
commit [6c642a].

The problem for read() is addressed in a similar way by introducing
a wrapper function in compat that always reads less than 2GB.
Unfortunately, '#undef read' is needed at a few places to avoid
expanding the compat macro in constructs like 'vtbl->read(...)'.

Note that 'git add' exits with 0 even if it prints filtering errors to
stderr.  The test, therefore, checks stderr.  'git add' should probably
be changed (sometime in another commit) to exit with nonzero if
filtering fails.  The test could then be changed to use test_must_fail.

Thanks to the following people for their suggestions:

    Johannes Sixt <j6t@kdbg.org>
    John Keeping <john@keeping.me.uk>
    Jonathan Nieder <jrnieder@gmail.com>
    Kyle J. McKay <mackyle@gmail.com>
    Torsten Bögershausen <tboegi@web.de>

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

[6c642a] 6c642a878688adf46b226903858b53e2d31ac5c3
    compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
---
 Makefile              |  8 ++++++++
 builtin/var.c         |  1 +
 compat/clipped-read.c | 13 +++++++++++++
 config.mak.uname      |  1 +
 git-compat-util.h     |  5 +++++
 streaming.c           |  1 +
 t/t0021-conversion.sh | 14 ++++++++++++++
 7 files changed, 43 insertions(+)
 create mode 100644 compat/clipped-read.c

diff --git a/Makefile b/Makefile
index 3588ca1..0f69e24 100644
--- a/Makefile
+++ b/Makefile
@@ -69,6 +69,9 @@ all::
 # Define NO_MSGFMT_EXTENDED_OPTIONS if your implementation of msgfmt
 # doesn't support GNU extensions like --check and --statistics
 #
+# Define NEEDS_CLIPPED_READ if your read(2) cannot read more than
+# INT_MAX bytes at once (e.g. MacOS X).
+#
 # Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
 # INT_MAX bytes at once (e.g. MacOS X).
 #
@@ -1493,6 +1496,11 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
 	MSGFMT += --check --statistics
 endif
 
+ifdef NEEDS_CLIPPED_READ
+	BASIC_CFLAGS += -DNEEDS_CLIPPED_READ
+	COMPAT_OBJS += compat/clipped-read.o
+endif
+
 ifdef NEEDS_CLIPPED_WRITE
 	BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
 	COMPAT_OBJS += compat/clipped-write.o
diff --git a/builtin/var.c b/builtin/var.c
index aedbb53..e59f5ba 100644
--- a/builtin/var.c
+++ b/builtin/var.c
@@ -38,6 +38,7 @@ static struct git_var git_vars[] = {
 	{ "", NULL },
 };
 
+#undef read
 static void list_vars(void)
 {
 	struct git_var *ptr;
diff --git a/compat/clipped-read.c b/compat/clipped-read.c
new file mode 100644
index 0000000..6962f67
--- /dev/null
+++ b/compat/clipped-read.c
@@ -0,0 +1,13 @@
+#include "../git-compat-util.h"
+#undef read
+
+/*
+ * Version of read() that will read at most INT_MAX bytes.
+ * Works around an XNU bug on Mac OS X.
+ */
+ssize_t clipped_read(int fd, void *buf, size_t nbyte)
+{
+	if (nbyte > INT_MAX)
+		nbyte = INT_MAX;
+	return read(fd, buf, nbyte);
+}
diff --git a/config.mak.uname b/config.mak.uname
index b27f51d..5c10726 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -95,6 +95,7 @@ ifeq ($(uname_S),Darwin)
 	NO_MEMMEM = YesPlease
 	USE_ST_TIMESPEC = YesPlease
 	HAVE_DEV_TTY = YesPlease
+	NEEDS_CLIPPED_READ = YesPlease
 	NEEDS_CLIPPED_WRITE = YesPlease
 	COMPAT_OBJS += compat/precompose_utf8.o
 	BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
diff --git a/git-compat-util.h b/git-compat-util.h
index 115cb1d..a227127 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -185,6 +185,11 @@ typedef unsigned long uintptr_t;
 #define probe_utf8_pathname_composition(a,b)
 #endif
 
+#ifdef NEEDS_CLIPPED_READ
+ssize_t clipped_read(int fd, void *buf, size_t nbyte);
+#define read(x,y,z) clipped_read((x),(y),(z))
+#endif
+
 #ifdef NEEDS_CLIPPED_WRITE
 ssize_t clipped_write(int fildes, const void *buf, size_t nbyte);
 #define write(x,y,z) clipped_write((x),(y),(z))
diff --git a/streaming.c b/streaming.c
index debe904..c1fe34a 100644
--- a/streaming.c
+++ b/streaming.c
@@ -99,6 +99,7 @@ int close_istream(struct git_istream *st)
 	return r;
 }
 
+#undef read
 ssize_t read_istream(struct git_istream *st, void *buf, size_t sz)
 {
 	return st->vtbl->read(st, buf, sz);
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e50f0f7..b92e6cb 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -190,4 +190,18 @@ test_expect_success 'required filter clean failure' '
 	test_must_fail git add test.fc
 '
 
+test -n "$GIT_TEST_LONG" && test_set_prereq EXPENSIVE
+
+test_expect_success EXPENSIVE 'filter large file' '
+	git config filter.largefile.smudge cat &&
+	git config filter.largefile.clean cat &&
+	for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
+	echo "2GB filter=largefile" >.gitattributes &&
+	git add 2GB 2>err &&
+	! test -s err &&
+	rm -f 2GB &&
+	git checkout -- 2GB 2>err &&
+	! test -s err
+'
+
 test_done
-- 
1.8.4.rc3.5.g4f480ff

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v2] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19  8:20   ` Johannes Sixt
@ 2013-08-19  8:25     ` Stefan Beller
  2013-08-19  8:40       ` Johannes Sixt
  2013-08-19  8:28     ` Steffen Prohaska
  1 sibling, 1 reply; 37+ messages in thread
From: Stefan Beller @ 2013-08-19  8:25 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Steffen Prohaska, Junio C Hamano, git, John Keeping,
	Jonathan Nieder, Kyle J. McKay, Torsten Bögershausen

[-- Attachment #1: Type: text/plain, Size: 672 bytes --]

On 08/19/2013 10:20 AM, Johannes Sixt wrote:
> Am 19.08.2013 08:38, schrieb Steffen Prohaska:
>> +test_expect_success EXPENSIVE 'filter large file' '
>> +    git config filter.largefile.smudge cat &&
>> +    git config filter.largefile.clean cat &&
>> +    for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
> 
> Shouldn't you count to 2049 to get a file that is over 2GB?

Would it be possible to offload the looping from the shell to a real
program? For example,
	truncate -s 2049M <filename>
should do the job. That would create a file larger than 2GB whose
bytes all read back as zeros. If truncate is not available, what about dd?
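For illustration, a dd-based sparse-file sketch (hypothetical, not what
was merged; the filename 2GB follows the test, and sparse-file support
depends on the filesystem) could avoid the shell loop entirely:

```shell
# Hypothetical sketch: create a file one byte past 2GB by writing a
# single byte at offset 2^31.  On filesystems with sparse-file support
# this is nearly instant, and the content reads back as all zeros.
dd if=/dev/zero of=2GB bs=1 count=1 seek=2147483648 2>/dev/null
```

The resulting size is 2147483649 bytes, i.e. just past the INT_MAX
boundary under discussion.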

Stefan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 899 bytes --]

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19  6:38 ` [PATCH v2] compat: Fix read() of 2GB and more " Steffen Prohaska
                     ` (2 preceding siblings ...)
  2013-08-19  8:21   ` [PATCH v3] " Steffen Prohaska
@ 2013-08-19  8:27   ` Johannes Sixt
  2013-08-19 14:41   ` Torsten Bögershausen
  4 siblings, 0 replies; 37+ messages in thread
From: Johannes Sixt @ 2013-08-19  8:27 UTC (permalink / raw)
  To: Steffen Prohaska
  Cc: Junio C Hamano, git, John Keeping, Jonathan Nieder, Kyle J. McKay,
	Torsten Bögershausen

Am 19.08.2013 08:38, schrieb Steffen Prohaska:
> Note that 'git add' exits with 0 even if it prints filtering errors to
> stderr.  The test, therefore, checks stderr.  'git add' should probably
> be changed (sometime in another commit) to exit with nonzero if
> filtering fails.  The test could then be changed to use test_must_fail.

Thanks for this hint. I was not aware of this behavior.

Of course, we do *not* want to use test_must_fail because git add
generally must not fail for files larger than 2GB. (Architectures with
a 32-bit size_t are a different matter, of course.)

-- Hannes

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19  8:20   ` Johannes Sixt
  2013-08-19  8:25     ` Stefan Beller
@ 2013-08-19  8:28     ` Steffen Prohaska
  1 sibling, 0 replies; 37+ messages in thread
From: Steffen Prohaska @ 2013-08-19  8:28 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Junio C Hamano, git, John Keeping, Jonathan Nieder, Kyle J. McKay,
	Torsten Bögershausen


On Aug 19, 2013, at 10:20 AM, Johannes Sixt <j6t@kdbg.org> wrote:

> Am 19.08.2013 08:38, schrieb Steffen Prohaska:
>> +test_expect_success EXPENSIVE 'filter large file' '
>> +	git config filter.largefile.smudge cat &&
>> +	git config filter.largefile.clean cat &&
>> +	for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
> 
> Shouldn't you count to 2049 to get a file that is over 2GB?

No.  INT_MAX = 2GB - 1 works.  INT_MAX + 1 = 2GB fails.  It tests exactly at the boundary.

	Steffen

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19  8:25     ` Stefan Beller
@ 2013-08-19  8:40       ` Johannes Sixt
  0 siblings, 0 replies; 37+ messages in thread
From: Johannes Sixt @ 2013-08-19  8:40 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Steffen Prohaska, Junio C Hamano, git, John Keeping,
	Jonathan Nieder, Kyle J. McKay, Torsten Bögershausen

Am 19.08.2013 10:25, schrieb Stefan Beller:
> On 08/19/2013 10:20 AM, Johannes Sixt wrote:
>> Am 19.08.2013 08:38, schrieb Steffen Prohaska:
>>> +test_expect_success EXPENSIVE 'filter large file' '
>>> +    git config filter.largefile.smudge cat &&
>>> +    git config filter.largefile.clean cat &&
>>> +    for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
>>
>> Shouldn't you count to 2049 to get a file that is over 2GB?
>
> Would it be possible to offload the looping from shell to a real
> program? So for example
> 	truncate -s 2049M <filename>
> should do the job. That would create a file larger than 2GB whose
> bytes all read back as zeros. If truncate is not available, what about dd?

The point is exactly to avoid external dependencies. Our dd on Windows 
doesn't do the right thing with seek=2GB (it makes the file twice as large 
as expected).

-- Hannes

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19  8:21   ` [PATCH v3] " Steffen Prohaska
@ 2013-08-19 13:59     ` Eric Sunshine
  2013-08-19 16:33       ` Junio C Hamano
  2013-08-19 15:41     ` [PATCH v4] " Steffen Prohaska
  1 sibling, 1 reply; 37+ messages in thread
From: Eric Sunshine @ 2013-08-19 13:59 UTC (permalink / raw)
  To: Steffen Prohaska
  Cc: Junio C Hamano, Git List, Johannes Sixt, John Keeping,
	Jonathan Nieder, Kyle J. McKay, Torsten Bögershausen

On Mon, Aug 19, 2013 at 4:21 AM, Steffen Prohaska <prohaska@zib.de> wrote:
> Previously, filtering 2GB or more through an external filter (see test)
> failed on Mac OS X 10.8.4 (12E55) for a 64-bit executable with:
>
>     error: read from external filter cat failed
>     error: cannot feed the input to external filter cat
>     error: cat died of signal 13
>     error: external filter cat failed 141
>     error: external filter cat failed
>
>
> Signed-off-by: Steffen Prohaska <prohaska@zib.de>
> ---
>  Makefile              |  8 ++++++++
>  builtin/var.c         |  1 +
>  compat/clipped-read.c | 13 +++++++++++++
>  config.mak.uname      |  1 +
>  git-compat-util.h     |  5 +++++
>  streaming.c           |  1 +
>  t/t0021-conversion.sh | 14 ++++++++++++++
>  7 files changed, 43 insertions(+)
>  create mode 100644 compat/clipped-read.c
>
> diff --git a/Makefile b/Makefile
> index 3588ca1..0f69e24 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -69,6 +69,9 @@ all::
>  # Define NO_MSGFMT_EXTENDED_OPTIONS if your implementation of msgfmt
>  # doesn't support GNU extensions like --check and --statistics
>  #
> +# Define NEEDS_CLIPPED_READ if your read(2) cannot read more than
> +# INT_MAX bytes at once (e.g. MacOS X).
> +#
>  # Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
>  # INT_MAX bytes at once (e.g. MacOS X).

Is it likely that we would see a platform requiring only one or the
other CLIPPED? Would it make sense to combine these into a single
NEEDS_CLIPPED_IO?

>  #
> @@ -1493,6 +1496,11 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
>         MSGFMT += --check --statistics
>  endif
>
> +ifdef NEEDS_CLIPPED_READ
> +       BASIC_CFLAGS += -DNEEDS_CLIPPED_READ
> +       COMPAT_OBJS += compat/clipped-read.o
> +endif
> +
>  ifdef NEEDS_CLIPPED_WRITE
>         BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
>         COMPAT_OBJS += compat/clipped-write.o
> diff --git a/builtin/var.c b/builtin/var.c
> index aedbb53..e59f5ba 100644
> --- a/builtin/var.c
> +++ b/builtin/var.c
> @@ -38,6 +38,7 @@ static struct git_var git_vars[] = {
>         { "", NULL },
>  };

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19  6:38 ` [PATCH v2] compat: Fix read() of 2GB and more " Steffen Prohaska
                     ` (3 preceding siblings ...)
  2013-08-19  8:27   ` [PATCH v2] compat: Fix read() of 2GB and more on Mac OS X Johannes Sixt
@ 2013-08-19 14:41   ` Torsten Bögershausen
  4 siblings, 0 replies; 37+ messages in thread
From: Torsten Bögershausen @ 2013-08-19 14:41 UTC (permalink / raw)
  To: Steffen Prohaska
  Cc: Junio C Hamano, git, Johannes Sixt, John Keeping, Jonathan Nieder,
	Kyle J. McKay, Torsten Bögershausen

On 2013-08-19 08.38, Steffen Prohaska wrote:
[snip]

> diff --git a/builtin/var.c b/builtin/var.c
> index aedbb53..e59f5ba 100644
> --- a/builtin/var.c
> +++ b/builtin/var.c
> @@ -38,6 +38,7 @@ static struct git_var git_vars[] = {
>  	{ "", NULL },
>  };
>  
> +#undef read
This is technically right for this very version of the code,
but not really future proof if someone uses read() further down in the
code (in a later version).

I think the problem comes from further up:
------------------
struct git_var {
	const char *name;
	const char *(*read)(int);
};
-----------------
could the 'read' member be replaced by 'readfn'?

===================
> diff --git a/streaming.c b/streaming.c
> index debe904..c1fe34a 100644
> --- a/streaming.c
> +++ b/streaming.c
> @@ -99,6 +99,7 @@ int close_istream(struct git_istream *st)
>  	return r;
>  }
>  
> +#undef read
Same possible future problem as above:
if someone later uses read(), the original (buggy) read() will be
used instead of the clipped_read() redefined in git-compat-util.h.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v4] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19  8:21   ` [PATCH v3] " Steffen Prohaska
  2013-08-19 13:59     ` Eric Sunshine
@ 2013-08-19 15:41     ` Steffen Prohaska
  2013-08-19 16:04       ` Linus Torvalds
  2013-08-20  6:43       ` [PATCH v5 0/2] Fix IO of >=2GB on Mac OS X by limiting IO chunks Steffen Prohaska
  1 sibling, 2 replies; 37+ messages in thread
From: Steffen Prohaska @ 2013-08-19 15:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Johannes Sixt, John Keeping, Jonathan Nieder, Kyle J. McKay,
	Torsten Bögershausen, Eric Sunshine, Steffen Prohaska

Previously, filtering 2GB or more through an external filter (see test)
failed on Mac OS X 10.8.4 (12E55) for a 64-bit executable with:

    error: read from external filter cat failed
    error: cannot feed the input to external filter cat
    error: cat died of signal 13
    error: external filter cat failed 141
    error: external filter cat failed

The reason was that read() immediately returns with EINVAL if nbyte >=
2GB.  According to POSIX [1], if the value of nbyte passed to read() is
greater than SSIZE_MAX, the result is implementation-defined.  The write
function has the same restriction [2].  Since OS X still supports
running 32-bit executables, the 32-bit limit (SSIZE_MAX = INT_MAX
= 2GB - 1) seems to also be imposed on 64-bit executables under certain
conditions.  For write, the problem has been addressed in an earlier
commit [6c642a].

The problem for read() is addressed in a similar way by introducing
a wrapper function in compat that always reads less than 2GB.  It is
very likely that the read() and write() wrappers are always used
together.  To avoid introducing another option, NEEDS_CLIPPED_WRITE is
changed to NEEDS_CLIPPED_IO and used to activate both wrappers.

To avoid expanding the read compat macro in constructs like
'vtbl->read(...)', 'read' is renamed to 'readfn' in two cases.  The
solution seems more robust than using '#undef read'.

Note that 'git add' exits with 0 even if it prints filtering errors to
stderr.  The test, therefore, checks stderr.  'git add' should probably
be changed (sometime in another commit) to exit with nonzero if
filtering fails.  The test could then be changed to use test_must_fail.

Thanks to the following people for their suggestions:

    Johannes Sixt <j6t@kdbg.org>
    John Keeping <john@keeping.me.uk>
    Jonathan Nieder <jrnieder@gmail.com>
    Kyle J. McKay <mackyle@gmail.com>
    Torsten Bögershausen <tboegi@web.de>
    Eric Sunshine <sunshine@sunshineco.com>

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

[6c642a] 6c642a878688adf46b226903858b53e2d31ac5c3
    compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
---
 Makefile                                 | 10 +++++-----
 builtin/var.c                            | 10 +++++-----
 compat/{clipped-write.c => clipped-io.c} | 11 ++++++++++-
 config.mak.uname                         |  2 +-
 git-compat-util.h                        |  5 ++++-
 streaming.c                              |  4 ++--
 t/t0021-conversion.sh                    | 14 ++++++++++++++
 7 files changed, 41 insertions(+), 15 deletions(-)
 rename compat/{clipped-write.c => clipped-io.c} (53%)

diff --git a/Makefile b/Makefile
index 3588ca1..f54134d 100644
--- a/Makefile
+++ b/Makefile
@@ -69,8 +69,8 @@ all::
 # Define NO_MSGFMT_EXTENDED_OPTIONS if your implementation of msgfmt
 # doesn't support GNU extensions like --check and --statistics
 #
-# Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
-# INT_MAX bytes at once (e.g. MacOS X).
+# Define NEEDS_CLIPPED_IO if your read(2) and/or write(2) cannot handle more
+# than INT_MAX bytes at once (e.g. Mac OS X).
 #
 # Define HAVE_PATHS_H if you have paths.h and want to use the default PATH
 # it specifies.
@@ -1493,9 +1493,9 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
 	MSGFMT += --check --statistics
 endif
 
-ifdef NEEDS_CLIPPED_WRITE
-	BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
-	COMPAT_OBJS += compat/clipped-write.o
+ifdef NEEDS_CLIPPED_IO
+	BASIC_CFLAGS += -DNEEDS_CLIPPED_IO
+	COMPAT_OBJS += compat/clipped-io.o
 endif
 
 ifneq (,$(XDL_FAST_HASH))
diff --git a/builtin/var.c b/builtin/var.c
index aedbb53..06f8459 100644
--- a/builtin/var.c
+++ b/builtin/var.c
@@ -28,7 +28,7 @@ static const char *pager(int flag)
 
 struct git_var {
 	const char *name;
-	const char *(*read)(int);
+	const char *(*readfn)(int);
 };
 static struct git_var git_vars[] = {
 	{ "GIT_COMMITTER_IDENT", git_committer_info },
@@ -43,8 +43,8 @@ static void list_vars(void)
 	struct git_var *ptr;
 	const char *val;
 
-	for (ptr = git_vars; ptr->read; ptr++)
-		if ((val = ptr->read(0)))
+	for (ptr = git_vars; ptr->readfn; ptr++)
+		if ((val = ptr->readfn(0)))
 			printf("%s=%s\n", ptr->name, val);
 }
 
@@ -53,9 +53,9 @@ static const char *read_var(const char *var)
 	struct git_var *ptr;
 	const char *val;
 	val = NULL;
-	for (ptr = git_vars; ptr->read; ptr++) {
+	for (ptr = git_vars; ptr->readfn; ptr++) {
 		if (strcmp(var, ptr->name) == 0) {
-			val = ptr->read(IDENT_STRICT);
+			val = ptr->readfn(IDENT_STRICT);
 			break;
 		}
 	}
diff --git a/compat/clipped-write.c b/compat/clipped-io.c
similarity index 53%
rename from compat/clipped-write.c
rename to compat/clipped-io.c
index b8f98ff..ec3232a 100644
--- a/compat/clipped-write.c
+++ b/compat/clipped-io.c
@@ -1,10 +1,19 @@
 #include "../git-compat-util.h"
+#undef read
 #undef write
 
 /*
- * Version of write that will write at most INT_MAX bytes.
+ * Versions of read() and write() that limit nbyte to INT_MAX.
  * Workaround a xnu bug on Mac OS X
  */
+
+ssize_t clipped_read(int fd, void *buf, size_t nbyte)
+{
+	if (nbyte > INT_MAX)
+		nbyte = INT_MAX;
+	return read(fd, buf, nbyte);
+}
+
 ssize_t clipped_write(int fildes, const void *buf, size_t nbyte)
 {
 	if (nbyte > INT_MAX)
diff --git a/config.mak.uname b/config.mak.uname
index b27f51d..fb39726 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -95,7 +95,7 @@ ifeq ($(uname_S),Darwin)
 	NO_MEMMEM = YesPlease
 	USE_ST_TIMESPEC = YesPlease
 	HAVE_DEV_TTY = YesPlease
-	NEEDS_CLIPPED_WRITE = YesPlease
+	NEEDS_CLIPPED_IO = YesPlease
 	COMPAT_OBJS += compat/precompose_utf8.o
 	BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
 endif
diff --git a/git-compat-util.h b/git-compat-util.h
index 115cb1d..4a875cb 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -185,7 +185,10 @@ typedef unsigned long uintptr_t;
 #define probe_utf8_pathname_composition(a,b)
 #endif
 
-#ifdef NEEDS_CLIPPED_WRITE
+#ifdef NEEDS_CLIPPED_IO
+ssize_t clipped_read(int fd, void *buf, size_t nbyte);
+#define read(x,y,z) clipped_read((x),(y),(z))
+
 ssize_t clipped_write(int fildes, const void *buf, size_t nbyte);
 #define write(x,y,z) clipped_write((x),(y),(z))
 #endif
diff --git a/streaming.c b/streaming.c
index debe904..2ac3047 100644
--- a/streaming.c
+++ b/streaming.c
@@ -20,7 +20,7 @@ typedef ssize_t (*read_istream_fn)(struct git_istream *, char *, size_t);
 
 struct stream_vtbl {
 	close_istream_fn close;
-	read_istream_fn read;
+	read_istream_fn readfn;
 };
 
 #define open_method_decl(name) \
@@ -101,7 +101,7 @@ int close_istream(struct git_istream *st)
 
 ssize_t read_istream(struct git_istream *st, void *buf, size_t sz)
 {
-	return st->vtbl->read(st, buf, sz);
+	return st->vtbl->readfn(st, buf, sz);
 }
 
 static enum input_source istream_source(const unsigned char *sha1,
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e50f0f7..b92e6cb 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -190,4 +190,18 @@ test_expect_success 'required filter clean failure' '
 	test_must_fail git add test.fc
 '
 
+test -n "$GIT_TEST_LONG" && test_set_prereq EXPENSIVE
+
+test_expect_success EXPENSIVE 'filter large file' '
+	git config filter.largefile.smudge cat &&
+	git config filter.largefile.clean cat &&
+	for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
+	echo "2GB filter=largefile" >.gitattributes &&
+	git add 2GB 2>err &&
+	! test -s err &&
+	rm -f 2GB &&
+	git checkout -- 2GB 2>err &&
+	! test -s err
+'
+
 test_done
-- 
1.8.4.rc3.5.g4f480ff

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v4] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19 15:41     ` [PATCH v4] " Steffen Prohaska
@ 2013-08-19 16:04       ` Linus Torvalds
  2013-08-19 16:37         ` Steffen Prohaska
  2013-08-19 17:16         ` Junio C Hamano
  2013-08-20  6:43       ` [PATCH v5 0/2] Fix IO of >=2GB on Mac OS X by limiting IO chunks Steffen Prohaska
  1 sibling, 2 replies; 37+ messages in thread
From: Linus Torvalds @ 2013-08-19 16:04 UTC (permalink / raw)
  To: Steffen Prohaska
  Cc: Junio C Hamano, Git Mailing List, Johannes Sixt, John Keeping,
	Jonathan Nieder, Kyle J. McKay, Torsten Bögershausen,
	Eric Sunshine

[-- Attachment #1: Type: text/plain, Size: 1959 bytes --]

On Mon, Aug 19, 2013 at 8:41 AM, Steffen Prohaska <prohaska@zib.de> wrote:
>
> The reason was that read() immediately returns with EINVAL if nbyte >=
> 2GB.  According to POSIX [1], if the value of nbyte passed to read() is
> greater than SSIZE_MAX, the result is implementation-defined.

Yeah, the OS X filesystem layer is an incredible piece of shit. Not
only doesn't it follow POSIX, it fails *badly*. Because OS X kernel
engineers apparently have the mental capacity of a retarded rodent on
crack.

Linux also refuses to actually read more than a maximum value in one
go (because quite frankly, doing more than 2GB at a time is just not
reasonable, especially in unkillable disk wait), but at least Linux
gives you the partial read, so that the usual "read until you're
happy" works (which you have to do anyway with sockets, pipes, NFS
intr mounts, etc etc). Returning EINVAL is a sign of a diseased mind.
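The "read until you're happy" loop is the usual pattern on top of a
partial read; a minimal sketch modeled on git's read_in_full() (the
helper name and the simplifications here are illustrative):

```c
#include <errno.h>
#include <unistd.h>

/* Keep calling read() until 'count' bytes arrive, EOF, or a hard error.
 * Partial reads (pipes, sockets, kernels that clip large requests) are
 * simply retried for the remainder. */
static ssize_t read_in_full_sketch(int fd, void *buf, size_t count)
{
	char *p = buf;
	size_t total = 0;

	while (count > 0) {
		ssize_t n = read(fd, p, count);
		if (n < 0) {
			if (errno == EAGAIN || errno == EINTR)
				continue;	/* recoverable: retry */
			return -1;		/* hard error */
		}
		if (n == 0)
			break;			/* EOF */
		p += n;
		total += n;
		count -= n;
	}
	return total;
}
```

With a kernel that returns a short read instead of EINVAL, this loop
makes the request size a non-issue; OS X's failure mode breaks it.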

I hate your patch for other reasons, though:

> The problem for read() is addressed in a similar way by introducing
> a wrapper function in compat that always reads less than 2GB.

Why do you do that? We already _have_ wrapper functions for read(),
namely xread().  Exactly because you basically have to, in order to
handle signals on interruptible filesystems (which aren't POSIX
either, but at least sanely so) or from other random sources. And to
handle the "you can't do reads that big" issue.

So why isn't the patch much more straightforward? Like the attached
totally untested one that just limits the read/write size to 8MB
(which is totally arbitrary, but small enough to not have any latency
issues even on slow disks, and big enough that any reasonable IO
subsystem will still get good throughput).

And by "totally untested" I mean that it actually passes the git test
suite, but since I didn't apply your patch nor do I have OS X
anywhere, I can't actually test that it fixes *your* problem. But it
should.


                   Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 1269 bytes --]

 wrapper.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/wrapper.c b/wrapper.c
index 6a015de5f056..e996f3dae467 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -131,6 +131,13 @@ void *xcalloc(size_t nmemb, size_t size)
 }
 
 /*
+ * Doing IO in huge chunks only results in pain. OS X is buggy,
+ * and even in the absence of bugs it can result in bad latencies
+ * when you decide to kill the process.
+ */
+#define MAX_IO_SIZE (8*1024*1024)
+
+/*
  * xread() is the same a read(), but it automatically restarts read()
  * operations with a recoverable error (EAGAIN and EINTR). xread()
  * DOES NOT GUARANTEE that "len" bytes is read even if the data is available.
@@ -139,7 +146,8 @@ ssize_t xread(int fd, void *buf, size_t len)
 {
 	ssize_t nr;
 	while (1) {
-		nr = read(fd, buf, len);
+		nr = len < MAX_IO_SIZE ? len : MAX_IO_SIZE;
+		nr = read(fd, buf, nr);
 		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
 			continue;
 		return nr;
@@ -155,7 +163,8 @@ ssize_t xwrite(int fd, const void *buf, size_t len)
 {
 	ssize_t nr;
 	while (1) {
-		nr = write(fd, buf, len);
+		nr = len < MAX_IO_SIZE ? len : MAX_IO_SIZE;
+		nr = write(fd, buf, nr);
 		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
 			continue;
 		return nr;

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19 13:59     ` Eric Sunshine
@ 2013-08-19 16:33       ` Junio C Hamano
  0 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2013-08-19 16:33 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Steffen Prohaska, Git List, Johannes Sixt, John Keeping,
	Jonathan Nieder, Kyle J. McKay, Torsten Bögershausen

Eric Sunshine <sunshine@sunshineco.com> writes:

>> +# Define NEEDS_CLIPPED_READ if your read(2) cannot read more than
>> +# INT_MAX bytes at once (e.g. MacOS X).
>> +#
>>  # Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
>>  # INT_MAX bytes at once (e.g. MacOS X).
>
> Is it likely that we would see a platform requiring only one or the
> other CLIPPED? Would it make sense to combine these into a single
> NEEDS_CLIPPED_IO?

I am slightly negative to that suggestion for two reasons.

 - Does MacOS X clip other IO operations?  Do we need to invent yet
   another NEEDS_CLIPPED, e.g. NEEDS_CLIPPED_LSEEK?

   A single NEEDS_CLIPPED_IO may sound attractive for its simplicity
   (e.g. on a system that only needs NEEDS_CLIPPED_WRITE, we will
   unnecessarily chop a big read into multiple reads, but that does
   not affect the correctness of the operation, only performance but
   the actual IO cost will dominate it anyway).  If we know there
   are 47 different IO operations that might need clipping, that
   simplicity is certainly a good thing to have.  I somehow do not
   think the set of operations will grow that large, though.

 - NEEDS_CLIPPED_IO essentially says "only those who clip their
   writes would clip their reads (and vice versa)", which is not all
   that different from saying "only Apple would clip their IO",
   which in turn defeats the notion of "let's use a generic
   NEEDS_CLIPPED without limiting the workaround to specific
   platforms" somewhat.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19 16:04       ` Linus Torvalds
@ 2013-08-19 16:37         ` Steffen Prohaska
  2013-08-19 17:24           ` Junio C Hamano
  2013-08-19 17:16         ` Junio C Hamano
  1 sibling, 1 reply; 37+ messages in thread
From: Steffen Prohaska @ 2013-08-19 16:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Git Mailing List, Johannes Sixt, John Keeping,
	Jonathan Nieder, Kyle J. McKay, Torsten Bögershausen,
	Eric Sunshine


On Aug 19, 2013, at 6:04 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:

> I hate your patch for other reasons, though:
> 
>> The problem for read() is addressed in a similar way by introducing
>> a wrapper function in compat that always reads less than 2GB.
> 
> Why do you do that? We already _have_ wrapper functions for read(),
> namely xread().  Exactly because you basically have to, in order to
> handle signals on interruptible filesystems (which aren't POSIX
> either, but at least sanely so) or from other random sources. And to
> handle the "you can't do reads that big" issue.
> 
> So why isn't the patch much more straightforward? 

The first version was more straightforward [1].  But reviewers suggested
that the compat wrappers would be the right way to do it and showed me
that it has been done like this before [2].

I haven't submitted anything in a while, so I tried to be a kind person
and followed the suggestions.  I started to hate the patch a bit (maybe less
than you), but I wasn't brave enough to reject the suggestions.  This is
why the patch became what it is.

I'm happy to rework it again towards your suggestion.  I would also remove
the compat wrapper for write().  But I got a bit tired.  I'd appreciate if
I received more indication whether a version without compat wrappers would
be accepted.

	Steffen

[1] http://article.gmane.org/gmane.comp.version-control.git/232455
[2] 6c642a8 compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19 16:04       ` Linus Torvalds
  2013-08-19 16:37         ` Steffen Prohaska
@ 2013-08-19 17:16         ` Junio C Hamano
  2013-08-19 17:28           ` Linus Torvalds
  2013-08-19 21:56           ` Kyle J. McKay
  1 sibling, 2 replies; 37+ messages in thread
From: Junio C Hamano @ 2013-08-19 17:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steffen Prohaska, Git Mailing List, Johannes Sixt, John Keeping,
	Jonathan Nieder, Kyle J. McKay, Torsten Bögershausen,
	Eric Sunshine

Linus Torvalds <torvalds@linux-foundation.org> writes:

> I hate your patch for other reasons, though:
>
>> The problem for read() is addressed in a similar way by introducing
>> a wrapper function in compat that always reads less than 2GB.
>
> Why do you do that? We already _have_ wrapper functions for read(),
> namely xread().  Exactly because you basically have to, in order to
> handle signals on interruptible filesystems (which aren't POSIX
> either, but at least sanely so) or from other random sources. And to
> handle the "you can't do reads that big" issue.

The same argument applies to xwrite(), but currently we explicitly
catch EINTR and EAGAIN knowing that on sane systems these are the
signs that we got interrupted.

Do we catch EINVAL unconditionally in the same codepath?  Could
EINVAL on saner systems mean completely different thing (like our
caller is passing bogus parameters to underlying read/write, which
is a program bug we would want to catch)?

> So why isn't the patch much more straightforward? Like the attached
> totally untested one that just limits the read/write size to 8MB
> (which is totally arbitrary, but small enough to not have any latency
> issues even on slow disks, and big enough that any reasonable IO
> subsystem will still get good throughput).

Ahh.  OK, not noticing EINVAL unconditionally, but always feed IOs
in chunks that are big enough for sane systems but small enough for
broken ones.

That makes sense.  Could somebody on MacOS X test this?

Thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19 16:37         ` Steffen Prohaska
@ 2013-08-19 17:24           ` Junio C Hamano
  0 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2013-08-19 17:24 UTC (permalink / raw)
  To: Steffen Prohaska
  Cc: Linus Torvalds, Git Mailing List, Johannes Sixt, John Keeping,
	Jonathan Nieder, Kyle J. McKay, Torsten Bögershausen,
	Eric Sunshine

Steffen Prohaska <prohaska@zib.de> writes:

> I'm happy to rework it again towards your suggestion.  I would also remove
> the compat wrapper for write().  But I got a bit tired.  I'd appreciate if
> I received more indication whether a version without compat wrappers would
> be accepted.

I think it is a reasonable way forward to remove the writer side
wrapper and doing large IO in reasonably big (but small enough not to
trigger MacOS X limitations) chunks in both read/write direction.

Linus, thanks for a dose of sanity.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v4] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19 17:16         ` Junio C Hamano
@ 2013-08-19 17:28           ` Linus Torvalds
  2013-08-19 21:56           ` Kyle J. McKay
  1 sibling, 0 replies; 37+ messages in thread
From: Linus Torvalds @ 2013-08-19 17:28 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Steffen Prohaska, Git Mailing List, Johannes Sixt, John Keeping,
	Jonathan Nieder, Kyle J. McKay, Torsten Bögershausen,
	Eric Sunshine

On Mon, Aug 19, 2013 at 10:16 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
> The same argument applies to xwrite(), but currently we explicitly
> catch EINTR and EAGAIN knowing that on sane systems these are the
> signs that we got interrupted.
>
> Do we catch EINVAL unconditionally in the same codepath?

No, and we shouldn't. If EINVAL happens, it will keep happening.

But with the size limiter, it doesn't matter, since we won't hit the
OS X braindamage.

> Could
> EINVAL on saner systems mean completely different thing (like our
> caller is passing bogus parameters to underlying read/write, which
> is a program bug we would want to catch)?

Yes. Even on OS X, it means that - it's just that OS X's notion of what
is "bogus" is pure crap. But the thing is, looping on EINVAL would be
wrong even on OS X, since unless you change the size, it will keep
happening forever.

But with the "limit IO to 8MB" (or whatever) patch, the issue is moot.
If you get an EINVAL, it will be due to something else being horribly
horribly wrong.

                 Linus


* Re: [PATCH v4] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19 17:16         ` Junio C Hamano
  2013-08-19 17:28           ` Linus Torvalds
@ 2013-08-19 21:56           ` Kyle J. McKay
  2013-08-19 22:51             ` Linus Torvalds
  1 sibling, 1 reply; 37+ messages in thread
From: Kyle J. McKay @ 2013-08-19 21:56 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Steffen Prohaska, Git Mailing List, Johannes Sixt,
	John Keeping, Jonathan Nieder, Torsten Bögershausen,
	Eric Sunshine

On Aug 19, 2013, at 10:16, Junio C Hamano wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> So why isn't the patch much more straightforward? Like the attached
>> totally untested one that just limits the read/write size to 8MB
>> (which is totally arbitrary, but small enough to not have any latency
>> issues even on slow disks, and big enough that any reasonable IO
>> subsystem will still get good throughput).
>
> Ahh.  OK, not noticing EINVAL unconditionally, but always feed IOs
> in chunks that are big enough for sane systems but small enough for
> broken ones.
>
> That makes sense.  Could somebody on MacOS X test this?

I tested this on both i386 (OS X 32-bit intel) and x86_64 (OS X 64-bit intel).

What I tested:

1. I started with branch pu:
    (965adb10 Merge branch 'sg/bash-prompt-lf-in-cwd-test' into pu)

2. I added Steffen's additional test (modified to always run) to t0021:

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e50f0f7..b92e6cb 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -190,4 +190,16 @@ test_expect_success 'required filter clean failure' '
	test_must_fail git add test.fc
'

+test_expect_success 'filter large file' '
+	git config filter.largefile.smudge cat &&
+	git config filter.largefile.clean cat &&
+	for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
+	echo "2GB filter=largefile" >.gitattributes &&
+	git add 2GB 2>err &&
+	! test -s err &&
+	rm -f 2GB &&
+	git checkout -- 2GB 2>err &&
+	! test -s err
+'
+
test_done

3. I verified that the test fails with an unpatched build on both
32-bit and 64-bit.

4. I applied Linus's unmodified patch to wrapper.c.

5. I tested again.  The t0021 test now passes on 64-bit.  It still
fails on 32-bit for another reason unrelated to Linus's patch.

It fails when attempting the "git add 2GB" line from the new 'filter
large file' part of the test.  The failure with backtrace:

git(16806,0xa095c720) malloc: *** mmap(size=2147487744) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug

# NOTE: error code 12 is ENOMEM on OS X

Breakpoint 1, 0x97f634b1 in malloc_error_break ()
(gdb) bt
#0  0x97f634b1 in malloc_error_break ()
#1  0x97f5e49f in szone_error ()
#2  0x97e8b876 in allocate_pages ()
#3  0x97e8c062 in large_and_huge_malloc ()
#4  0x97e831c8 in szone_malloc ()
#5  0x97e82fb8 in malloc_zone_malloc ()
#6  0x97e8c7b2 in realloc ()
#7  0x00128abe in xrealloc (ptr=0x0, size=2147483649) at wrapper.c:100
#8  0x00111a8c in strbuf_grow (sb=0xbfffe634, extra=2147483648) at strbuf.c:74
#9  0x00112bb9 in strbuf_read (sb=0xbfffe634, fd=6, hint=2548572518) at strbuf.c:349
#10 0x0009b899 in apply_filter (path=<value temporarily unavailable, due to optimizations>, src=0x1000000 ' ' <repeats 200 times>..., len=2147483648, dst=0xbfffe774, cmd=0x402980 "cat") at convert.c:407
#11 0x0009c6f6 in convert_to_git (path=0x4028b4 "2GB", src=0x1000000 ' ' <repeats 200 times>..., len=2147483648, dst=0xbfffe774, checksafe=SAFE_CRLF_WARN) at convert.c:764
#12 0x0010bb38 in index_mem (sha1=0x402330 "", buf=0x1000000, size=2147483648, type=OBJ_BLOB, path=0x4028b4 "2GB", flags=1) at sha1_file.c:3044
#13 0x0010bf57 in index_core [inlined] () at /private/var/tmp/src/git/sha1_file.c:3101
#14 0x0010bf57 in index_fd (sha1=0x402330 "", fd=5, st=0xbfffe900, type=OBJ_BLOB, path=0x4028b4 "2GB", flags=1) at sha1_file.c:3139
#15 0x0010c05e in index_path (sha1=0x402330 "", path=0x4028b4 "2GB", st=0xbfffe900, flags=1) at sha1_file.c:3157
#16 0x000e82f4 in add_to_index (istate=0x1a8820, path=0x4028b4 "2GB", st=0xbfffe900, flags=0) at read-cache.c:665
#17 0x000e87c8 in add_file_to_index (istate=0x1a8820, path=0x4028b4 "2GB", flags=0) at read-cache.c:694
#18 0x0000440a in cmd_add (argc=<value temporarily unavailable, due to optimizations>, argv=0xbffff584, prefix=0x0) at builtin/add.c:299
#19 0x00002e1f in run_builtin [inlined] () at /private/var/tmp/src/git/git.c:303
#20 0x00002e1f in handle_internal_command (argc=2, argv=0xbffff584) at git.c:466
#21 0x000032d4 in run_argv [inlined] () at /private/var/tmp/src/git/git.c:512
#22 0x000032d4 in main (argc=2, av=0xbfffe28c) at git.c:595

The size 2147487744 is 2GB + 4096 bytes.  Apparently git does not
support a filter for a file unless the file can fit entirely into
git's memory space.  Normally a single 2GB + 4096 byte allocation
works in an OS X 32-bit process, but something else is apparently
eating up a large portion of the memory space in this case (perhaps
an mmap'd copy?).  In any case, if the file being filtered was closer
to 4GB in size it would always fail on 32-bit regardless.

The fact that the entire file is read into memory when applying the
filter does not seem like a good thing (see #7-#10 above).

--Kyle
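(The whole-file read Kyle points out in frames #7-#10 could in principle be avoided by copying between the filter's descriptors in bounded chunks.  A standalone sketch of the shape of such a loop; the helper name is hypothetical and is not git's actual API.)

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Copy everything from one descriptor to another in bounded chunks,
 * so memory use stays O(buffer) instead of O(file size).  Returns 0
 * on success, -1 on IO error.  Illustrative only. */
static int copy_fd_chunked(int from, int to)
{
	char buf[65536];
	for (;;) {
		ssize_t nr = read(from, buf, sizeof(buf));
		if (nr < 0) {
			if (errno == EAGAIN || errno == EINTR)
				continue;	/* transient: retry the read */
			return -1;
		}
		if (nr == 0)
			return 0;		/* EOF: all data copied */
		for (ssize_t off = 0; off < nr; ) {
			ssize_t nw = write(to, buf + off, nr - off);
			if (nw < 0) {
				if (errno == EAGAIN || errno == EINTR)
					continue;	/* retry the write */
				return -1;
			}
			off += nw;
		}
	}
}
```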


* Re: [PATCH v4] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19 21:56           ` Kyle J. McKay
@ 2013-08-19 22:51             ` Linus Torvalds
  2013-08-27  4:59               ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2013-08-19 22:51 UTC (permalink / raw)
  To: Kyle J. McKay
  Cc: Junio C Hamano, Steffen Prohaska, Git Mailing List, Johannes Sixt,
	John Keeping, Jonathan Nieder, Torsten Bögershausen,
	Eric Sunshine

On Mon, Aug 19, 2013 at 2:56 PM, Kyle J. McKay <mackyle@gmail.com> wrote:
>
> The fact that the entire file is read into memory when applying the filter
> does not seem like a good thing (see #7-#10 above).

Yeah, that's horrible. It's likely bad for performance too, because
even if you have enough memory, it blows everything out of the L2/L3
caches, and if you don't have enough memory it obviously causes other
problems.

So it would probably be a great idea to make the filtering code able
to do things in smaller chunks, but I suspect that the patch to chunk
up xread/xwrite is the right thing to do anyway.

              Linus


* [PATCH v5 0/2] Fix IO of >=2GB on Mac OS X by limiting IO chunks
  2013-08-19 15:41     ` [PATCH v4] " Steffen Prohaska
  2013-08-19 16:04       ` Linus Torvalds
@ 2013-08-20  6:43       ` Steffen Prohaska
  2013-08-20  6:43         ` [PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X Steffen Prohaska
                           ` (2 more replies)
  1 sibling, 3 replies; 37+ messages in thread
From: Steffen Prohaska @ 2013-08-20  6:43 UTC (permalink / raw)
  To: Junio C Hamano, Linus Torvalds
  Cc: git, Johannes Sixt, John Keeping, Jonathan Nieder, Kyle J. McKay,
	Torsten Bögershausen, Eric Sunshine, Steffen Prohaska

This is the revised patch taking the considerations about IO chunk size into
account.  The series deletes more than it adds and fixes a bug.  Nice.

Steffen Prohaska (2):
  xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X
  Revert "compate/clipped-write.c: large write(2) fails on Mac OS X/XNU"

 Makefile               |  8 --------
 compat/clipped-write.c | 13 -------------
 config.mak.uname       |  1 -
 git-compat-util.h      |  5 -----
 t/t0021-conversion.sh  | 14 ++++++++++++++
 wrapper.c              | 12 ++++++++++++
 6 files changed, 26 insertions(+), 27 deletions(-)
 delete mode 100644 compat/clipped-write.c

-- 
1.8.4.rc3.5.g4f480ff


* [PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X
  2013-08-20  6:43       ` [PATCH v5 0/2] Fix IO of >=2GB on Mac OS X by limiting IO chunks Steffen Prohaska
@ 2013-08-20  6:43         ` Steffen Prohaska
  2013-08-20 19:37           ` Junio C Hamano
  2013-08-21 19:50           ` Torsten Bögershausen
  2013-08-20  6:43         ` [PATCH v5 2/2] Revert "compate/clipped-write.c: large write(2) fails on Mac OS X/XNU" Steffen Prohaska
  2013-08-21 13:46         ` [PATCH v6 0/2] Fix IO >= 2GB on Mac, fixed typo Steffen Prohaska
  2 siblings, 2 replies; 37+ messages in thread
From: Steffen Prohaska @ 2013-08-20  6:43 UTC (permalink / raw)
  To: Junio C Hamano, Linus Torvalds
  Cc: git, Johannes Sixt, John Keeping, Jonathan Nieder, Kyle J. McKay,
	Torsten Bögershausen, Eric Sunshine, Steffen Prohaska

Previously, filtering 2GB or more through an external filter (see test)
failed on Mac OS X 10.8.4 (12E55) for a 64-bit executable with:

    error: read from external filter cat failed
    error: cannot feed the input to external filter cat
    error: cat died of signal 13
    error: external filter cat failed 141
    error: external filter cat failed

The reason was that read() immediately returns with EINVAL if nbyte >=
2GB.  According to POSIX [1], if the value of nbyte passed to read() is
greater than SSIZE_MAX, the result is implementation-defined.  The write
function has the same restriction [2].  Since OS X still supports
running 32-bit executables, the 32-bit limit (SSIZE_MAX = INT_MAX
= 2GB - 1) seems to be also imposed on 64-bit executables under certain
conditions.  For write, the problem has been addressed earlier [6c642a].

This commit addresses the problem for read() and write() by limiting
size of IO chunks unconditionally on all platforms in xread() and
xwrite().  Large chunks only cause problems, like triggering the OS
X bug or causing latencies when killing the process.  Reasonably sized
smaller chunks have no negative impact on performance.

The compat wrapper clipped_write() introduced earlier [6c642a] is not
needed anymore.  It will be reverted in a separate commit.  The new test
catches read and write problems.

Note that 'git add' exits with 0 even if it prints filtering errors to
stderr.  The test, therefore, checks stderr.  'git add' should probably
be changed (sometime in another commit) to exit with nonzero if
filtering fails.  The test could then be changed to use test_must_fail.

Thanks to the following people for suggestions and testing:

    Johannes Sixt <j6t@kdbg.org>
    John Keeping <john@keeping.me.uk>
    Jonathan Nieder <jrnieder@gmail.com>
    Kyle J. McKay <mackyle@gmail.com>
    Linus Torvalds <torvalds@linux-foundation.org>
    Torsten Bögershausen <tboegi@web.de>

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

[6c642a] commit 6c642a878688adf46b226903858b53e2d31ac5c3
    compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
---
 t/t0021-conversion.sh | 14 ++++++++++++++
 wrapper.c             | 12 ++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e50f0f7..b92e6cb 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -190,4 +190,18 @@ test_expect_success 'required filter clean failure' '
 	test_must_fail git add test.fc
 '
 
+test -n "$GIT_TEST_LONG" && test_set_prereq EXPENSIVE
+
+test_expect_success EXPENSIVE 'filter large file' '
+	git config filter.largefile.smudge cat &&
+	git config filter.largefile.clean cat &&
+	for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
+	echo "2GB filter=largefile" >.gitattributes &&
+	git add 2GB 2>err &&
+	! test -s err &&
+	rm -f 2GB &&
+	git checkout -- 2GB 2>err &&
+	! test -s err
+'
+
 test_done
diff --git a/wrapper.c b/wrapper.c
index 6a015de..97e3cf7 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -131,6 +131,14 @@ void *xcalloc(size_t nmemb, size_t size)
 }
 
 /*
+ * Limit size of IO chunks, because huge chunks only cause pain.  OS X 64-bit
+ * buggy, returning EINVAL if len >= INT_MAX; and even in the absense of bugs,
+ * large chunks can result in bad latencies when you decide to kill the
+ * process.
+ */
+#define MAX_IO_SIZE (8*1024*1024)
+
+/*
  * xread() is the same a read(), but it automatically restarts read()
  * operations with a recoverable error (EAGAIN and EINTR). xread()
  * DOES NOT GUARANTEE that "len" bytes is read even if the data is available.
@@ -138,6 +146,8 @@ void *xcalloc(size_t nmemb, size_t size)
 ssize_t xread(int fd, void *buf, size_t len)
 {
 	ssize_t nr;
+	if (len > MAX_IO_SIZE)
+	    len = MAX_IO_SIZE;
 	while (1) {
 		nr = read(fd, buf, len);
 		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
@@ -154,6 +164,8 @@ ssize_t xread(int fd, void *buf, size_t len)
 ssize_t xwrite(int fd, const void *buf, size_t len)
 {
 	ssize_t nr;
+	if (len > MAX_IO_SIZE)
+	    len = MAX_IO_SIZE;
 	while (1) {
 		nr = write(fd, buf, len);
 		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
-- 
1.8.4.rc3.5.g4f480ff


* [PATCH v5 2/2] Revert "compate/clipped-write.c: large write(2) fails on Mac OS X/XNU"
  2013-08-20  6:43       ` [PATCH v5 0/2] Fix IO of >=2GB on Mac OS X by limiting IO chunks Steffen Prohaska
  2013-08-20  6:43         ` [PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X Steffen Prohaska
@ 2013-08-20  6:43         ` Steffen Prohaska
  2013-08-21 13:46         ` [PATCH v6 0/2] Fix IO >= 2GB on Mac, fixed typo Steffen Prohaska
  2 siblings, 0 replies; 37+ messages in thread
From: Steffen Prohaska @ 2013-08-20  6:43 UTC (permalink / raw)
  To: Junio C Hamano, Linus Torvalds
  Cc: git, Johannes Sixt, John Keeping, Jonathan Nieder, Kyle J. McKay,
	Torsten Bögershausen, Eric Sunshine, Steffen Prohaska

The previous commit introduced a size limit on IO chunks on all
platforms.  The compat clipped_write() is not needed anymore.

This reverts commit 6c642a878688adf46b226903858b53e2d31ac5c3.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
---
 Makefile               |  8 --------
 compat/clipped-write.c | 13 -------------
 config.mak.uname       |  1 -
 git-compat-util.h      |  5 -----
 4 files changed, 27 deletions(-)
 delete mode 100644 compat/clipped-write.c

diff --git a/Makefile b/Makefile
index 3588ca1..4026211 100644
--- a/Makefile
+++ b/Makefile
@@ -69,9 +69,6 @@ all::
 # Define NO_MSGFMT_EXTENDED_OPTIONS if your implementation of msgfmt
 # doesn't support GNU extensions like --check and --statistics
 #
-# Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
-# INT_MAX bytes at once (e.g. MacOS X).
-#
 # Define HAVE_PATHS_H if you have paths.h and want to use the default PATH
 # it specifies.
 #
@@ -1493,11 +1490,6 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
 	MSGFMT += --check --statistics
 endif
 
-ifdef NEEDS_CLIPPED_WRITE
-	BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
-	COMPAT_OBJS += compat/clipped-write.o
-endif
-
 ifneq (,$(XDL_FAST_HASH))
 	BASIC_CFLAGS += -DXDL_FAST_HASH
 endif
diff --git a/compat/clipped-write.c b/compat/clipped-write.c
deleted file mode 100644
index b8f98ff..0000000
--- a/compat/clipped-write.c
+++ /dev/null
@@ -1,13 +0,0 @@
-#include "../git-compat-util.h"
-#undef write
-
-/*
- * Version of write that will write at most INT_MAX bytes.
- * Workaround a xnu bug on Mac OS X
- */
-ssize_t clipped_write(int fildes, const void *buf, size_t nbyte)
-{
-	if (nbyte > INT_MAX)
-		nbyte = INT_MAX;
-	return write(fildes, buf, nbyte);
-}
diff --git a/config.mak.uname b/config.mak.uname
index b27f51d..7d61531 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -95,7 +95,6 @@ ifeq ($(uname_S),Darwin)
 	NO_MEMMEM = YesPlease
 	USE_ST_TIMESPEC = YesPlease
 	HAVE_DEV_TTY = YesPlease
-	NEEDS_CLIPPED_WRITE = YesPlease
 	COMPAT_OBJS += compat/precompose_utf8.o
 	BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
 endif
diff --git a/git-compat-util.h b/git-compat-util.h
index 115cb1d..96d8881 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -185,11 +185,6 @@ typedef unsigned long uintptr_t;
 #define probe_utf8_pathname_composition(a,b)
 #endif
 
-#ifdef NEEDS_CLIPPED_WRITE
-ssize_t clipped_write(int fildes, const void *buf, size_t nbyte);
-#define write(x,y,z) clipped_write((x),(y),(z))
-#endif
-
 #ifdef MKDIR_WO_TRAILING_SLASH
 #define mkdir(a,b) compat_mkdir_wo_trailing_slash((a),(b))
 extern int compat_mkdir_wo_trailing_slash(const char*, mode_t);
-- 
1.8.4.rc3.5.g4f480ff


* Re: [PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X
  2013-08-20  6:43         ` [PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X Steffen Prohaska
@ 2013-08-20 19:37           ` Junio C Hamano
  2013-08-21 19:50           ` Torsten Bögershausen
  1 sibling, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2013-08-20 19:37 UTC (permalink / raw)
  To: Steffen Prohaska
  Cc: Linus Torvalds, git, Johannes Sixt, John Keeping, Jonathan Nieder,
	Kyle J. McKay, Torsten Bögershausen, Eric Sunshine

Steffen Prohaska <prohaska@zib.de> writes:

> diff --git a/wrapper.c b/wrapper.c
> index 6a015de..97e3cf7 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -131,6 +131,14 @@ void *xcalloc(size_t nmemb, size_t size)
>  }
>  
>  /*
> + * Limit size of IO chunks, because huge chunks only cause pain.  OS X 64-bit
> + * buggy, returning EINVAL if len >= INT_MAX; and even in the absense of bugs,

s/buggy/is &/ perhaps?

> + * large chunks can result in bad latencies when you decide to kill the
> + * process.
> + */
> +#define MAX_IO_SIZE (8*1024*1024)
> +
> +/*
>   * xread() is the same a read(), but it automatically restarts read()
>   * operations with a recoverable error (EAGAIN and EINTR). xread()
>   * DOES NOT GUARANTEE that "len" bytes is read even if the data is available.
> @@ -138,6 +146,8 @@ void *xcalloc(size_t nmemb, size_t size)
>  ssize_t xread(int fd, void *buf, size_t len)
>  {
>  	ssize_t nr;
> +	if (len > MAX_IO_SIZE)
> +	    len = MAX_IO_SIZE;
>  	while (1) {
>  		nr = read(fd, buf, len);
>  		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
> @@ -154,6 +164,8 @@ ssize_t xread(int fd, void *buf, size_t len)
>  ssize_t xwrite(int fd, const void *buf, size_t len)
>  {
>  	ssize_t nr;
> +	if (len > MAX_IO_SIZE)
> +	    len = MAX_IO_SIZE;
>  	while (1) {
>  		nr = write(fd, buf, len);
>  		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))


* [PATCH v6 0/2] Fix IO >= 2GB on Mac, fixed typo
  2013-08-20  6:43       ` [PATCH v5 0/2] Fix IO of >=2GB on Mac OS X by limiting IO chunks Steffen Prohaska
  2013-08-20  6:43         ` [PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X Steffen Prohaska
  2013-08-20  6:43         ` [PATCH v5 2/2] Revert "compate/clipped-write.c: large write(2) fails on Mac OS X/XNU" Steffen Prohaska
@ 2013-08-21 13:46         ` Steffen Prohaska
  2013-08-21 13:46           ` [PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X Steffen Prohaska
                             ` (2 more replies)
  2 siblings, 3 replies; 37+ messages in thread
From: Steffen Prohaska @ 2013-08-21 13:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Steffen Prohaska

Fixed typo in comment.

Steffen Prohaska (2):
  xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X
  Revert "compate/clipped-write.c: large write(2) fails on Mac OS X/XNU"

 Makefile               |  8 --------
 compat/clipped-write.c | 13 -------------
 config.mak.uname       |  1 -
 git-compat-util.h      |  5 -----
 t/t0021-conversion.sh  | 14 ++++++++++++++
 wrapper.c              | 12 ++++++++++++
 6 files changed, 26 insertions(+), 27 deletions(-)
 delete mode 100644 compat/clipped-write.c

-- 
1.8.4.rc3.5.g4f480ff


* [PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X
  2013-08-21 13:46         ` [PATCH v6 0/2] Fix IO >= 2GB on Mac, fixed typo Steffen Prohaska
@ 2013-08-21 13:46           ` Steffen Prohaska
  2013-08-21 13:46           ` [PATCH v5 2/2] Revert "compate/clipped-write.c: large write(2) fails on Mac OS X/XNU" Steffen Prohaska
  2013-08-21 15:58           ` [PATCH v6 0/2] Fix IO >= 2GB on Mac, fixed typo Junio C Hamano
  2 siblings, 0 replies; 37+ messages in thread
From: Steffen Prohaska @ 2013-08-21 13:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Steffen Prohaska

Previously, filtering 2GB or more through an external filter (see test)
failed on Mac OS X 10.8.4 (12E55) for a 64-bit executable with:

    error: read from external filter cat failed
    error: cannot feed the input to external filter cat
    error: cat died of signal 13
    error: external filter cat failed 141
    error: external filter cat failed

The reason was that read() immediately returns with EINVAL if nbyte >=
2GB.  According to POSIX [1], if the value of nbyte passed to read() is
greater than SSIZE_MAX, the result is implementation-defined.  The write
function has the same restriction [2].  Since OS X still supports
running 32-bit executables, the 32-bit limit (SSIZE_MAX = INT_MAX
= 2GB - 1) seems to be also imposed on 64-bit executables under certain
conditions.  For write, the problem has been addressed earlier [6c642a].

This commit addresses the problem for read() and write() by limiting
size of IO chunks unconditionally on all platforms in xread() and
xwrite().  Large chunks only cause problems, like triggering the OS
X bug or causing latencies when killing the process.  Reasonably sized
smaller chunks have no negative impact on performance.

The compat wrapper clipped_write() introduced earlier [6c642a] is not
needed anymore.  It will be reverted in a separate commit.  The new test
catches read and write problems.

Note that 'git add' exits with 0 even if it prints filtering errors to
stderr.  The test, therefore, checks stderr.  'git add' should probably
be changed (sometime in another commit) to exit with nonzero if
filtering fails.  The test could then be changed to use test_must_fail.

Thanks to the following people for suggestions and testing:

    Johannes Sixt <j6t@kdbg.org>
    John Keeping <john@keeping.me.uk>
    Jonathan Nieder <jrnieder@gmail.com>
    Kyle J. McKay <mackyle@gmail.com>
    Linus Torvalds <torvalds@linux-foundation.org>
    Torsten Bögershausen <tboegi@web.de>

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html

[6c642a] commit 6c642a878688adf46b226903858b53e2d31ac5c3
    compate/clipped-write.c: large write(2) fails on Mac OS X/XNU

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
---
 t/t0021-conversion.sh | 14 ++++++++++++++
 wrapper.c             | 12 ++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e50f0f7..b92e6cb 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -190,4 +190,18 @@ test_expect_success 'required filter clean failure' '
 	test_must_fail git add test.fc
 '
 
+test -n "$GIT_TEST_LONG" && test_set_prereq EXPENSIVE
+
+test_expect_success EXPENSIVE 'filter large file' '
+	git config filter.largefile.smudge cat &&
+	git config filter.largefile.clean cat &&
+	for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
+	echo "2GB filter=largefile" >.gitattributes &&
+	git add 2GB 2>err &&
+	! test -s err &&
+	rm -f 2GB &&
+	git checkout -- 2GB 2>err &&
+	! test -s err
+'
+
 test_done
diff --git a/wrapper.c b/wrapper.c
index 6a015de..66cc727 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -131,6 +131,14 @@ void *xcalloc(size_t nmemb, size_t size)
 }
 
 /*
+ * Limit size of IO chunks, because huge chunks only cause pain.  OS X 64-bit
+ * is buggy, returning EINVAL if len >= INT_MAX; and even in the absense of
+ * bugs, large chunks can result in bad latencies when you decide to kill the
+ * process.
+ */
+#define MAX_IO_SIZE (8*1024*1024)
+
+/*
  * xread() is the same a read(), but it automatically restarts read()
  * operations with a recoverable error (EAGAIN and EINTR). xread()
  * DOES NOT GUARANTEE that "len" bytes is read even if the data is available.
@@ -138,6 +146,8 @@ void *xcalloc(size_t nmemb, size_t size)
 ssize_t xread(int fd, void *buf, size_t len)
 {
 	ssize_t nr;
+	if (len > MAX_IO_SIZE)
+	    len = MAX_IO_SIZE;
 	while (1) {
 		nr = read(fd, buf, len);
 		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
@@ -154,6 +164,8 @@ ssize_t xread(int fd, void *buf, size_t len)
 ssize_t xwrite(int fd, const void *buf, size_t len)
 {
 	ssize_t nr;
+	if (len > MAX_IO_SIZE)
+	    len = MAX_IO_SIZE;
 	while (1) {
 		nr = write(fd, buf, len);
 		if ((nr < 0) && (errno == EAGAIN || errno == EINTR))
-- 
1.8.4.rc3.5.g4f480ff


* [PATCH v5 2/2] Revert "compate/clipped-write.c: large write(2) fails on Mac OS X/XNU"
  2013-08-21 13:46         ` [PATCH v6 0/2] Fix IO >= 2GB on Mac, fixed typo Steffen Prohaska
  2013-08-21 13:46           ` [PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X Steffen Prohaska
@ 2013-08-21 13:46           ` Steffen Prohaska
  2013-08-21 15:58           ` [PATCH v6 0/2] Fix IO >= 2GB on Mac, fixed typo Junio C Hamano
  2 siblings, 0 replies; 37+ messages in thread
From: Steffen Prohaska @ 2013-08-21 13:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Steffen Prohaska

The previous commit introduced a size limit on IO chunks on all
platforms.  The compat clipped_write() is not needed anymore.

This reverts commit 6c642a878688adf46b226903858b53e2d31ac5c3.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
---
 Makefile               |  8 --------
 compat/clipped-write.c | 13 -------------
 config.mak.uname       |  1 -
 git-compat-util.h      |  5 -----
 4 files changed, 27 deletions(-)
 delete mode 100644 compat/clipped-write.c

diff --git a/Makefile b/Makefile
index 3588ca1..4026211 100644
--- a/Makefile
+++ b/Makefile
@@ -69,9 +69,6 @@ all::
 # Define NO_MSGFMT_EXTENDED_OPTIONS if your implementation of msgfmt
 # doesn't support GNU extensions like --check and --statistics
 #
-# Define NEEDS_CLIPPED_WRITE if your write(2) cannot write more than
-# INT_MAX bytes at once (e.g. MacOS X).
-#
 # Define HAVE_PATHS_H if you have paths.h and want to use the default PATH
 # it specifies.
 #
@@ -1493,11 +1490,6 @@ ifndef NO_MSGFMT_EXTENDED_OPTIONS
 	MSGFMT += --check --statistics
 endif
 
-ifdef NEEDS_CLIPPED_WRITE
-	BASIC_CFLAGS += -DNEEDS_CLIPPED_WRITE
-	COMPAT_OBJS += compat/clipped-write.o
-endif
-
 ifneq (,$(XDL_FAST_HASH))
 	BASIC_CFLAGS += -DXDL_FAST_HASH
 endif
diff --git a/compat/clipped-write.c b/compat/clipped-write.c
deleted file mode 100644
index b8f98ff..0000000
--- a/compat/clipped-write.c
+++ /dev/null
@@ -1,13 +0,0 @@
-#include "../git-compat-util.h"
-#undef write
-
-/*
- * Version of write that will write at most INT_MAX bytes.
- * Workaround a xnu bug on Mac OS X
- */
-ssize_t clipped_write(int fildes, const void *buf, size_t nbyte)
-{
-	if (nbyte > INT_MAX)
-		nbyte = INT_MAX;
-	return write(fildes, buf, nbyte);
-}
diff --git a/config.mak.uname b/config.mak.uname
index b27f51d..7d61531 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -95,7 +95,6 @@ ifeq ($(uname_S),Darwin)
 	NO_MEMMEM = YesPlease
 	USE_ST_TIMESPEC = YesPlease
 	HAVE_DEV_TTY = YesPlease
-	NEEDS_CLIPPED_WRITE = YesPlease
 	COMPAT_OBJS += compat/precompose_utf8.o
 	BASIC_CFLAGS += -DPRECOMPOSE_UNICODE
 endif
diff --git a/git-compat-util.h b/git-compat-util.h
index 115cb1d..96d8881 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -185,11 +185,6 @@ typedef unsigned long uintptr_t;
 #define probe_utf8_pathname_composition(a,b)
 #endif
 
-#ifdef NEEDS_CLIPPED_WRITE
-ssize_t clipped_write(int fildes, const void *buf, size_t nbyte);
-#define write(x,y,z) clipped_write((x),(y),(z))
-#endif
-
 #ifdef MKDIR_WO_TRAILING_SLASH
 #define mkdir(a,b) compat_mkdir_wo_trailing_slash((a),(b))
 extern int compat_mkdir_wo_trailing_slash(const char*, mode_t);
-- 
1.8.4.rc3.5.g4f480ff


* Re: [PATCH v6 0/2] Fix IO >= 2GB on Mac, fixed typo
  2013-08-21 13:46         ` [PATCH v6 0/2] Fix IO >= 2GB on Mac, fixed typo Steffen Prohaska
  2013-08-21 13:46           ` [PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X Steffen Prohaska
  2013-08-21 13:46           ` [PATCH v5 2/2] Revert "compate/clipped-write.c: large write(2) fails on Mac OS X/XNU" Steffen Prohaska
@ 2013-08-21 15:58           ` Junio C Hamano
  2 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2013-08-21 15:58 UTC (permalink / raw)
  To: Steffen Prohaska; +Cc: git

Steffen Prohaska <prohaska@zib.de> writes:

> Fixed typo in comment.

Thanks, and sorry for not being clear that I'll locally tweak before
queuing when I commented on v5 yesterday.


* Re: [PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X
  2013-08-20  6:43         ` [PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X Steffen Prohaska
  2013-08-20 19:37           ` Junio C Hamano
@ 2013-08-21 19:50           ` Torsten Bögershausen
  1 sibling, 0 replies; 37+ messages in thread
From: Torsten Bögershausen @ 2013-08-21 19:50 UTC (permalink / raw)
  To: Steffen Prohaska
  Cc: Junio C Hamano, Linus Torvalds, git, Johannes Sixt, John Keeping,
	Jonathan Nieder, Kyle J. McKay, Torsten Bögershausen,
	Eric Sunshine

On 2013-08-20 08.43, Steffen Prohaska wrote:
[]
Thanks for V5. It was tested OK on my system here.
(And apologies for recommending a wrapper on top of a wrapper).

One question is left: 
As xread() is tolerant against EAGAIN and especially EINTR,
could it make sense to replace read() with xread() everywhere?

(The risk for getting EINTR is smaller when we only read a small amount
of data, but it is more on the safe side)

And s/write/xwrite/

/Torsten
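(For reference, even after such a conversion every caller that needs an exact number of bytes must still loop, because xread() may return short.  A standalone sketch of that pattern; git's own helper for this purpose is read_in_full(), and the name below is illustrative.)

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Read exactly "count" bytes unless EOF or a hard error intervenes.
 * Returns the number of bytes actually read, or -1 on error. */
static ssize_t read_exactly(int fd, void *buf, size_t count)
{
	char *p = buf;
	size_t remaining = count;
	while (remaining > 0) {
		ssize_t nr = read(fd, p, remaining);
		if (nr < 0) {
			if (errno == EAGAIN || errno == EINTR)
				continue;	/* transient: retry */
			return -1;		/* hard error */
		}
		if (nr == 0)
			break;			/* EOF before "count" bytes */
		p += nr;
		remaining -= nr;
	}
	return count - remaining;
}
```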


* Re: [PATCH v4] compat: Fix read() of 2GB and more on Mac OS X
  2013-08-19 22:51             ` Linus Torvalds
@ 2013-08-27  4:59               ` Junio C Hamano
  0 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2013-08-27  4:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kyle J. McKay, Steffen Prohaska, Git Mailing List, Johannes Sixt,
	John Keeping, Jonathan Nieder, Torsten Bögershausen,
	Eric Sunshine

Linus Torvalds <torvalds@linux-foundation.org> writes:

> So it would probably be a great idea to make the filtering code able
> to do things in smaller chunks, but I suspect that the patch to chunk
> up xread/xwrite is the right thing to do anyway.

Yes and yes, but the first yes is a bit tricky for writing things
out, as the recipient of the filter knows the size of the input but
not of the output, and both loose and packed objects need to record
the length of the object at the very beginning.

Even though our streaming API allows writing new objects directly
to a packfile, for user-specified filters, CRLF, and ident can make
the size of the output unknown before processing all the data, so
the best we could do for these would be to stream to a temporary
file and then copy it again with the length header (undeltified
packed object deflates only the payload, so this "copy" can
literally be a byte-for-byte copy, after writing the in-pack header
out).

As reading from the object store and writing it out to the
filesystem (i.e. entry.c::write_entry() codepath) does not need to
know the output size, convert.c::get_stream_filter() might want to
be told in which direction a filter is asked for, and return a
streaming filter even for those filters that are problematic in
the opposite, writing-to-object-store direction.
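
[Editorial note: the "stream to a temporary file, then copy it again
with the length header" fallback described above can be sketched as a
toy illustration; this is not git's object-writing code, and the
function name and "<size>\n" header format are made up for clarity.]

```c
#include <stdio.h>

/* Spool the filtered payload to a temporary file first, so its size
 * becomes known, then emit a length header followed by a
 * byte-for-byte copy of the payload. */
static int copy_with_length_header(FILE *tmp, FILE *out)
{
	char buf[8192];
	size_t n;
	long size;

	/* The spool file holds the full filtered output,
	 * so its size is finally known. */
	if (fseek(tmp, 0, SEEK_END) != 0 || (size = ftell(tmp)) < 0)
		return -1;
	rewind(tmp);

	/* Write the header first; real loose/packed objects encode the
	 * length differently, this toy format is just "<size>\n". */
	if (fprintf(out, "%ld\n", size) < 0)
		return -1;

	/* Byte-for-byte copy of the payload. */
	while ((n = fread(buf, 1, sizeof(buf), tmp)) > 0)
		if (fwrite(buf, 1, n, out) != n)
			return -1;
	return ferror(tmp) ? -1 : 0;
}
```

The price of this fallback is writing the data twice, which is why a
true streaming path for the read-out direction, where no size header
is needed, remains attractive.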


end of thread, other threads:[~2013-08-27  4:59 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-17 12:40 [PATCH] xread(): Fix read error when filtering >= 2GB on Mac OS X Steffen Prohaska
2013-08-17 15:27 ` John Keeping
2013-08-17 15:56 ` Torsten Bögershausen
2013-08-17 17:16 ` Johannes Sixt
2013-08-17 18:57 ` Jonathan Nieder
2013-08-17 20:25 ` Kyle J. McKay
2013-08-17 21:23   ` Jonathan Nieder
2013-08-19  6:38 ` [PATCH v2] compat: Fix read() of 2GB and more " Steffen Prohaska
2013-08-19  7:54   ` John Keeping
2013-08-19  8:20     ` Steffen Prohaska
2013-08-19  8:20   ` Johannes Sixt
2013-08-19  8:25     ` Stefan Beller
2013-08-19  8:40       ` Johannes Sixt
2013-08-19  8:28     ` Steffen Prohaska
2013-08-19  8:21   ` [PATCH v3] " Steffen Prohaska
2013-08-19 13:59     ` Eric Sunshine
2013-08-19 16:33       ` Junio C Hamano
2013-08-19 15:41     ` [PATCH v4] " Steffen Prohaska
2013-08-19 16:04       ` Linus Torvalds
2013-08-19 16:37         ` Steffen Prohaska
2013-08-19 17:24           ` Junio C Hamano
2013-08-19 17:16         ` Junio C Hamano
2013-08-19 17:28           ` Linus Torvalds
2013-08-19 21:56           ` Kyle J. McKay
2013-08-19 22:51             ` Linus Torvalds
2013-08-27  4:59               ` Junio C Hamano
2013-08-20  6:43       ` [PATCH v5 0/2] Fix IO of >=2GB on Mac OS X by limiting IO chunks Steffen Prohaska
2013-08-20  6:43         ` [PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X Steffen Prohaska
2013-08-20 19:37           ` Junio C Hamano
2013-08-21 19:50           ` Torsten Bögershausen
2013-08-20  6:43         ` [PATCH v5 2/2] Revert "compate/clipped-write.c: large write(2) fails on Mac OS X/XNU" Steffen Prohaska
2013-08-21 13:46         ` [PATCH v6 0/2] Fix IO >= 2GB on Mac, fixed typo Steffen Prohaska
2013-08-21 13:46           ` [PATCH v5 1/2] xread, xwrite: Limit size of IO, fixing IO of 2GB and more on Mac OS X Steffen Prohaska
2013-08-21 13:46           ` [PATCH v5 2/2] Revert "compate/clipped-write.c: large write(2) fails on Mac OS X/XNU" Steffen Prohaska
2013-08-21 15:58           ` [PATCH v6 0/2] Fix IO >= 2GB on Mac, fixed typo Junio C Hamano
2013-08-19  8:27   ` [PATCH v2] compat: Fix read() of 2GB and more on Mac OS X Johannes Sixt
2013-08-19 14:41   ` Torsten Bögershausen
