git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Bug? Error during commit
       [not found] <trinity-cb66d9d6-9035-4c98-948e-6857a7bd4de2-1517838396145@3c-app-gmx-bs16>
@ 2018-02-05 13:48 ` Andreas Kalz
  2018-02-05 13:59   ` Duy Nguyen
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Kalz @ 2018-02-05 13:48 UTC (permalink / raw)
  To: git

Hello,
 
I am using git frequently and usually it is running great.
 
I read to write to this eMail address regarding problems and possible bugs.
I am using git version 2.16.1.windows.2 / 64 Bit and during commit the following error message comes up:
e:\Internet>git commit -m 2018-01-27
fatal: unable to generate diffstat for Thunderbird/andreas-kalz.de/Mail/pop.gmx.net/Inbox
[master f74cf30] 2018-01-27
 
I also tried this before with an older git version with same problem.
 
Can you help me with this problem please? Thanks in advance.
 
All the best,
Andreas

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Bug? Error during commit
  2018-02-05 13:48 ` Bug? Error during commit Andreas Kalz
@ 2018-02-05 13:59   ` Duy Nguyen
  2018-02-07 20:45     ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Duy Nguyen @ 2018-02-05 13:59 UTC (permalink / raw)
  To: Andreas Kalz; +Cc: Git Mailing List

On Mon, Feb 5, 2018 at 8:48 PM, Andreas Kalz <andreas-kalz@gmx.de> wrote:
> Hello,
>
> I am using git frequently and usually it is running great.
>
> I read to write to this eMail address regarding problems and possible bugs.
> I am using git version 2.16.1.windows.2 / 64 Bit and during commit the following error message comes up:
> e:\Internet>git commit -m 2018-01-27
> fatal: unable to generate diffstat for Thunderbird/andreas-kalz.de/Mail/pop.gmx.net/Inbox
> [master f74cf30] 2018-01-27
>
> I also tried this before with an older git version with same problem.
>
> Can you help me with this problem please? Thanks in advance.

I think if you add -q to that "git commit" command, diffstat is not
generated and you can get past that. If that particular commit can be
published in public, it'll help us find out why diffstat could not be
generated.
-- 
Duy

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Bug? Error during commit
  2018-02-05 13:59   ` Duy Nguyen
@ 2018-02-07 20:45     ` Jeff King
  2018-02-10 10:21       ` Duy Nguyen
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2018-02-07 20:45 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Andreas Kalz, Git Mailing List

On Mon, Feb 05, 2018 at 08:59:52PM +0700, Duy Nguyen wrote:

> On Mon, Feb 5, 2018 at 8:48 PM, Andreas Kalz <andreas-kalz@gmx.de> wrote:
> > Hello,
> >
> > I am using git frequently and usually it is running great.
> >
> > I read to write to this eMail address regarding problems and possible bugs.
> > I am using git version 2.16.1.windows.2 / 64 Bit and during commit the following error message comes up:
> > e:\Internet>git commit -m 2018-01-27
> > fatal: unable to generate diffstat for Thunderbird/andreas-kalz.de/Mail/pop.gmx.net/Inbox
> > [master f74cf30] 2018-01-27
> >
> > I also tried this before with an older git version with same problem.
> >
> > Can you help me with this problem please? Thanks in advance.
> 
> I think if you add -q to that "git commit" command, diffstat is not
> generated and you can get past that. If that particular commit can be
> published in public, it'll help us find out why diffstat could not be
> generated.

I think that's the first time I've seen that particular error. :)

I think the only reason that xdiff would report failure is if malloc()
failed, or if one of the files exceeds MAX_XDIFF_SIZE, which is ~1GB.
I think we'd usually avoid doing a text diff on anything over
core.bigFileThreshold, though.

But it doesn't seem to work:

  $ yes | head -c $((1024*1024*1024 - 10*1024*1024)) >file
  $ git add file
  $ git commit -m one
  $ yes | head -c $((1024*1024*1024)) >file
  $ git commit -am two
  fatal: unable to generate diffstat for file

What's weird is that if I run "git show --stat" on the same commit, it
works! So there's something about how commit invokes the diff that
doesn't let the big-file check kick in.

It looks like the logic in diff_filespec_is_binary() will only check
big_file_threshold if we haven't already loaded the contents into RAM.
So "commit" does that, but "diff" is more careful about not loading the
file contents.

I think we probably ought to consider anything over big_file_threshold
to be binary, no matter what. Possibly even if the user gave us a
.gitattribute that says "no really, this is text". Because that 1GB
limit is a hard limit that the code can't cope with; our options are
either to generate a binary diff or to die.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Bug? Error during commit
  2018-02-07 20:45     ` Jeff King
@ 2018-02-10 10:21       ` Duy Nguyen
  2018-02-10 12:16         ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Duy Nguyen @ 2018-02-10 10:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Andreas Kalz, Git Mailing List

On Thu, Feb 8, 2018 at 3:45 AM, Jeff King <peff@peff.net> wrote:
> On Mon, Feb 05, 2018 at 08:59:52PM +0700, Duy Nguyen wrote:
>
>> On Mon, Feb 5, 2018 at 8:48 PM, Andreas Kalz <andreas-kalz@gmx.de> wrote:
>> > Hello,
>> >
>> > I am using git frequently and usually it is running great.
>> >
>> > I read to write to this eMail address regarding problems and possible bugs.
>> > I am using git version 2.16.1.windows.2 / 64 Bit and during commit the following error message comes up:
>> > e:\Internet>git commit -m 2018-01-27
>> > fatal: unable to generate diffstat for Thunderbird/andreas-kalz.de/Mail/pop.gmx.net/Inbox
>> > [master f74cf30] 2018-01-27
>> >
>> > I also tried this before with an older git version with same problem.
>> >
>> > Can you help me with this problem please? Thanks in advance.
>>
>> I think if you add -q to that "git commit" command, diffstat is not
>> generated and you can get past that. If that particular commit can be
>> published in public, it'll help us find out why diffstat could not be
>> generated.
>
> I think that's the first time I've seen that particular error. :)
>
> I think the only reason that xdiff would report failure is if malloc()
> failed, or if one of the files exceeds MAX_XDIFF_SIZE, which is ~1GB.
> I think we'd usually avoid doing a text diff on anything over
> core.bigFileThreshold, though.
>
> But it doesn't seem to work:
>
>   $ yes | head -c $((1024*1024*1024 - 10*1024*1024)) >file
>   $ git add file
>   $ git commit -m one
>   $ yes | head -c $((1024*1024*1024)) >file
>   $ git commit -am two
>   fatal: unable to generate diffstat for file
>
> What's weird is that if I run "git show --stat" on the same commit, it
> works! So there's something about how commit invokes the diff that
> doesn't let the big-file check kick in.

I have a flashback about checking big_file_threshold in this code. I
vaguely remember doing something in this code but not sure if it was
merged.

I finally found 6bf3b81348 (diff --stat: mark any file larger than
core.bigfilethreshold binary - 2014-08-16) so it's merged after all,
but this commit is about --stat apparently ;-)

> It looks like the logic in diff_filespec_is_binary() will only check
> big_file_threshold if we haven't already loaded the contents into RAM.
> So "commit" does that, but "diff" is more careful about not loading the
> file contents.
>
> I think we probably ought to consider anything over big_file_threshold
> to be binary, no matter what. Possibly even if the user gave us a
> .gitattribute that says "no really, this is text". Because that 1GB
> limit is a hard limit that the code can't cope with; our options are
> either to generate a binary diff or to die.

Agreed. While at there perhaps we need to improve this "unable to
generate diffstat" message a bit too.
-- 
Duy

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Bug? Error during commit
  2018-02-10 10:21       ` Duy Nguyen
@ 2018-02-10 12:16         ` Jeff King
  2018-02-14 17:31           ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2018-02-10 12:16 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Andreas Kalz, Git Mailing List

On Sat, Feb 10, 2018 at 05:21:05PM +0700, Duy Nguyen wrote:

> > But it doesn't seem to work:
> >
> >   $ yes | head -c $((1024*1024*1024 - 10*1024*1024)) >file
> >   $ git add file
> >   $ git commit -m one
> >   $ yes | head -c $((1024*1024*1024)) >file
> >   $ git commit -am two
> >   fatal: unable to generate diffstat for file
> >
> > What's weird is that if I run "git show --stat" on the same commit, it
> > works! So there's something about how commit invokes the diff that
> > doesn't let the big-file check kick in.
> 
> I have a flashback about checking big_file_threshold in this code. I
> vaguely remember doing something in this code but not sure if it was
> merged.
> 
> I finally found 6bf3b81348 (diff --stat: mark any file larger than
> core.bigfilethreshold binary - 2014-08-16) so it's merged after all,
> but this commit is about --stat apparently ;-)

The confounding factor here is actually the break-detection that
"commit" does. After running the "commit" above (which does succeed
despite the "fatal", since that happens only as part of the diff summary
we print), you can replicate the problem with "git show -B --stat".

The break-detection code unconditionally loads the file data. Then when
the stat code later checks whether it's binary, it just uses the data
as-is. So that leads me to a few questions / lines of thought:

  1. When we're checking binary-ness, we shouldn't assume if the data
     is loaded that the size is sane. We should check it against
     big_file_threshold.

  2. Should break detection (and probably inexact rename detection) skip
     files that are over big_file_threshold?

     I think our code is capable of actually performing these operations
     on large files (at least on systems with 64-bit ulong; I'd be
     willing to bet you get funny results due to integer overflow on
     32-bit systems or on 64-bit Windows). But I'm not sure it's doing
     anybody any good. And it's way faster not to. For example, here's
     "git show" before and after the patch below (running on the repo
     from my example):

          $ time git show --oneline -B --stat | cat
          fatal: unable to generate diffstat for file
          883cbdc two
          
          real	0m10.153s
          user	0m9.929s
          sys	0m0.224s
          
          peff@sigill:~/tmp [master]
          $ time git.compile show --oneline -B --stat | cat
          883cbdc two
           file | Bin 1063256064 -> 1073741824 bytes
           1 file changed, 0 insertions(+), 0 deletions(-)
          
          real	0m0.008s
          user	0m0.002s
          sys	0m0.010s

     We can produce the useful answer in a fraction of the time, since
     we don't even need to load the blob content. The downside is that
     in theory we could break-detect these, and then find renames. So I
     guess it comes down to the philosophy of core.bigfilethreshold: how
     much effort are we willing to put into such files on the off chance
     that they produce a useful output?

     If we go this route, then in theory the fix in (1) becomes
     redundant, though I'd probably do both just as a
     belt-and-suspenders thing.

  3. To what degree should we override the user's config to treat these
     files as binary. If I set core.bigfilethreshold to "10G", or if I
     use gitattributes to mark the file as non-binary, then we're still
     going to feed it to xdiff (which is going to choke and die).

     Should we override these when we approach MAX_DIFF_SIZE, and just
     treat the file as binary? Should we barf and tell the user to fix
     their config?

     I guess I argued for overriding attributes earlier, and that
     probably ought to be independent of core.bigfilethreshold. Perhaps
     we should issue a warning in that case, to let the user know their
     config is nonsense.

> > I think we probably ought to consider anything over big_file_threshold
> > to be binary, no matter what. Possibly even if the user gave us a
> > .gitattribute that says "no really, this is text". Because that 1GB
> > limit is a hard limit that the code can't cope with; our options are
> > either to generate a binary diff or to die.
> 
> Agreed. While at there perhaps we need to improve this "unable to
> generate diffstat" message a bit too.

One reason the message is so vague is that we don't actually have any
kind of error code. Though really the only reason for xdiff to fail is
due to file size. So we could perhaps offer some advice there. But if we
do all the things I suggested above, then we'd automatically handle all
the cases we know about.

so my hope was that we would make it impossible to trigger this message.
In which case it maybe ought to be a BUG(). :)

Here's the patch to make "show -B --stat" work. I'll give some more
thought to whether this is a good idea and prepare a series.

One downside is that in the common case it causes us to look up each
object twice (once to get its size, and then again to load the content).
I wonder if we should have a function for "read this object, unless it's
over N bytes, in which case just tell me the size". That's weirdly
specific, but I think pretty much every user of core.bigfilethreshold
would want it.

---
diff --git a/diffcore-break.c b/diffcore-break.c
index c64359f489..35f5b50bcc 100644
--- a/diffcore-break.c
+++ b/diffcore-break.c
@@ -61,6 +61,13 @@ static int should_break(struct diff_filespec *src,
 	    !oidcmp(&src->oid, &dst->oid))
 		return 0; /* they are the same */
 
+	if (diff_populate_filespec(src, CHECK_SIZE_ONLY) ||
+	    diff_populate_filespec(dst, CHECK_SIZE_ONLY))
+		return 0; /* error but caught downstream */
+
+	if (src->size > big_file_threshold || dst->size > big_file_threshold)
+		return 0; /* too big to be worth computation */
+
 	if (diff_populate_filespec(src, 0) || diff_populate_filespec(dst, 0))
 		return 0; /* error but caught downstream */
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Bug? Error during commit
  2018-02-10 12:16         ` Jeff King
@ 2018-02-14 17:31           ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2018-02-14 17:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Duy Nguyen, Andreas Kalz, Git Mailing List

Jeff King <peff@peff.net> writes:

> Here's the patch to make "show -B --stat" work. I'll give some more
> thought to whether this is a good idea and prepare a series.
>
> One downside is that in the common case it causes us to look up each
> object twice (once to get its size, and then again to load the content).
> I wonder if we should have a function for "read this object, unless it's
> over N bytes, in which case just tell me the size". That's weirdly
> specific, but I think pretty much every user of core.bigfilethreshold
> would want it.

After reading through "git grep" hits for the global variable, I
think it makes sense to have such a helper with a good name without
configurable N (just use big_file_threshold that is global).  The
user of the interface either punt on size like this caller, or
would switch to the streaming interface.

>
> ---
> diff --git a/diffcore-break.c b/diffcore-break.c
> index c64359f489..35f5b50bcc 100644
> --- a/diffcore-break.c
> +++ b/diffcore-break.c
> @@ -61,6 +61,13 @@ static int should_break(struct diff_filespec *src,
>  	    !oidcmp(&src->oid, &dst->oid))
>  		return 0; /* they are the same */
>  
> +	if (diff_populate_filespec(src, CHECK_SIZE_ONLY) ||
> +	    diff_populate_filespec(dst, CHECK_SIZE_ONLY))
> +		return 0; /* error but caught downstream */
> +
> +	if (src->size > big_file_threshold || dst->size > big_file_threshold)
> +		return 0; /* too big to be worth computation */
> +
>  	if (diff_populate_filespec(src, 0) || diff_populate_filespec(dst, 0))
>  		return 0; /* error but caught downstream */
>  

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-02-14 17:31 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <trinity-cb66d9d6-9035-4c98-948e-6857a7bd4de2-1517838396145@3c-app-gmx-bs16>
2018-02-05 13:48 ` Bug? Error during commit Andreas Kalz
2018-02-05 13:59   ` Duy Nguyen
2018-02-07 20:45     ` Jeff King
2018-02-10 10:21       ` Duy Nguyen
2018-02-10 12:16         ` Jeff King
2018-02-14 17:31           ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).