git@vger.kernel.org mailing list mirror (one of many)
* data loss when doing ls-remote and piped to command
@ 2021-09-15 12:43 Rolf Eike Beer
  2021-09-15 18:17 ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Rolf Eike Beer @ 2021-09-15 12:43 UTC (permalink / raw)
  To: git

The given repository is a clone of the vanilla kernel.

/usr/bin/git --git-dir=/home/ebeer/repos/upstream/linux/.git ls-remote origin 2>&1 | less

And I then see things like this:

6f38b5d6cfd43dde3058a10c68baae9cf17af912        refs/tags/v5.0-rc2
1c7fc5cbc33980acd13ae83d0b416db002fe95601e7f97f64b59514d936     refs/tags/v5.7-rc2^{}
d0709bb6da2ab6d49b11643e98abdf79b1a2817f        refs/tags/v5.7-rc3

This also happens when I cd into the repository and just run "git ls-remote" 
on some of my repositories, but much less often.

The remainder of the overly long line is the correct id for that tag. The 
error does not happen on every run, and on some of my repositories the 
affected tag also differs from run to run. Here it quite consistently 
happens on this tag. However, a different user on the same machine running 
the very same command had it happen on v5.7-rc3.

I see the same on my laptop; both machines run openSUSE Tumbleweed, updated at 
the same time this morning. This seems to be quite sensitive to latency or the 
like: I can reproduce it with our internal git server, but not with 
kernel.org. This is not bound to less, we originally observed the error on an 
entirely different tool that tried to parse the output of ls-remote.

Given that there are quite a lot of tags missing, I suspect that the pipe 
handling is broken somewhere, i.e. too much data is written to a pipe that is 
already full. I have not been able to provoke that using pv by rate limiting 
the output so far.

[System Info]
git Version:
git version 2.33.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Linux 5.14.1-1-default #1 SMP Sat Sep 4 08:22:51 UTC 2021 (67af907) x86_64
Compiler Info: gnuc: 11.2
libc Info: glibc: 2.33
$SHELL (typically, interactive shell): /bin/bash

[no hooks]

-- 
Rolf Eike Beer, emlix GmbH, http://www.emlix.com
Phone +49 551 30664-0, Fax +49 551 30664-11
Gothaer Platz 3, 37083 Göttingen, Germany
Registered office: Göttingen, Göttingen District Court HR B 3160
Management: Heike Jordan, Dr. Uwe Kracke – VAT ID: DE 205 198 055

emlix - smart embedded open source


* Re: data loss when doing ls-remote and piped to command
  2021-09-15 12:43 data loss when doing ls-remote and piped to command Rolf Eike Beer
@ 2021-09-15 18:17 ` Junio C Hamano
  2021-09-16  6:38   ` Rolf Eike Beer
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2021-09-15 18:17 UTC (permalink / raw)
  To: git; +Cc: Rolf Eike Beer

Rolf Eike Beer <eb@emlix.com> writes:

> The given repository is a clone of the vanilla kernel.
>
> /usr/bin/git --git-dir=/home/ebeer/repos/upstream/linux/.git ls-remote origin 2>&1 | less
>
> And I then see things like this:
>
> 6f38b5d6cfd43dde3058a10c68baae9cf17af912        refs/tags/v5.0-rc2
> 1c7fc5cbc33980acd13ae83d0b416db002fe95601e7f97f64b59514d936     refs/tags/v5.7-rc2^{}
> d0709bb6da2ab6d49b11643e98abdf79b1a2817f        refs/tags/v5.7-rc3

Not offering any solution, just an observation of the problem and
annotating the report.

What we see on the second line is the beginning of peeled
v5.0-rc2^{} up to the "acd13" (that is, the first 19 bytes of the
line), followed by the full line for peeled v5.7-rc2^{} (which
begins with "ae83d").  12407 bytes in between are missing, which
is even more puzzling as it is not a nice round number.
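
A line damaged in this way can be spotted mechanically. The following is a
minimal sketch, assuming SHA-1 object ids (40 hex digits) and the tab that
ls-remote prints between id and ref name. The function name
find_malformed_refs is made up for illustration and is not part of git.

```shell
# Hypothetical helper (not part of git): read ls-remote-style lines on stdin
# and print, with its line number, every line whose first tab-separated field
# is not exactly 40 lowercase hex digits.
find_malformed_refs() {
    awk -F '\t' '$1 !~ /^[0-9a-f]+$/ || length($1) != 40 { print NR ": " $0 }'
}
```

Something like "git ls-remote origin 2>&1 | find_malformed_refs" would then
list only the suspicious lines, such as the 59-digit one quoted above.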

I wonder if this is "less" misconfigured and misbehaving.  Did the
user after seeing v5.7-* tags scroll back with 'b' or something?

If the output (including the 2>&1 redirection) is sent to a file and
then "cat <that-file" is invoked, does the same thing happen?  How
about "cat <that-file | less"?


* Re: data loss when doing ls-remote and piped to command
  2021-09-15 18:17 ` Junio C Hamano
@ 2021-09-16  6:38   ` Rolf Eike Beer
  2021-09-16 10:12     ` Tobias Ulmer
  0 siblings, 1 reply; 13+ messages in thread
From: Rolf Eike Beer @ 2021-09-16  6:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano


On Wednesday, 15 September 2021, 20:17:42 CEST, Junio C Hamano wrote:
> Rolf Eike Beer <eb@emlix.com> writes:
> > The given repository is a clone of the vanilla kernel.
> > 
> > /usr/bin/git --git-dir=/home/ebeer/repos/upstream/linux/.git ls-remote origin 2>&1 | less
> > 
> > And I then see things like this:
> > 
> > 6f38b5d6cfd43dde3058a10c68baae9cf17af912        refs/tags/v5.0-rc2
> > 1c7fc5cbc33980acd13ae83d0b416db002fe95601e7f97f64b59514d936     refs/tags/v5.7-rc2^{}
> > d0709bb6da2ab6d49b11643e98abdf79b1a2817f        refs/tags/v5.7-rc3
> 
> Not offering any solution, just an observation of the problem and
> annotating the report.
> 
> What we see on the second line is the beginning of peeled
> v5.0-rc2^{} up to the "acd13" (that is, the first 19 bytes of the
> line), followed by the full line for peeled v5.7-rc2^{} (which
> begins with "ae83d").  12407 bytes in between are missing, which
> is even more puzzling as it is not a nice round number.
> 
> I wonder if this is "less" misconfigured and misbehaving.  Did the
> user after seeing v5.7-* tags scroll back with 'b' or something?

To quote myself:

>> This is not bound to less, we originally observed the error on an 
>> entirely different tool that tried to parse the output of ls-remote.

In fact when less opened I just started to scroll down until I visually 
noticed an error.

> If the output (including the 2>&1 redirection) is sent to a file and
> then "cat <that-file" is invoked, does the same thing happen?  How
> about "cat <that-file | less"?

The redirection seems to be an important part of it. I now did:

git ... 2>&1 | sha256sum

This gives different results basically on every run. I also noticed that 
having more tags makes it easier to reproduce, so a stable kernel in contrast 
to vanilla is a better trigger. Doing that without the stderr redirection gave 
the same result every time I tried.

Regards,

Eike

* Re: data loss when doing ls-remote and piped to command
  2021-09-16  6:38   ` Rolf Eike Beer
@ 2021-09-16 10:12     ` Tobias Ulmer
  2021-09-16 12:17       ` Rolf Eike Beer
  0 siblings, 1 reply; 13+ messages in thread
From: Tobias Ulmer @ 2021-09-16 10:12 UTC (permalink / raw)
  To: Rolf Eike Beer, git; +Cc: Junio C Hamano

On 16/09/2021 08:38, Rolf Eike Beer wrote:
...
> The redirection seems to be an important part of it. I now did:
> 
> git ... 2>&1 | sha256sum

I've tried to reproduce this since yesterday, but couldn't until now:

2>&1 made all the difference, took less than a minute.

Different repo, different machine, but also running Tumbleweed 
5.14.1-1-default, git 2.33.0

while [ "`git --git-dir=$PWD/in/linux/.git ls-remote origin 2>&1 | tee failed.out | sha1sum`" = "7fa299e589bacdc908395730beff542b0fc684eb  -" ]; do echo -n .; done
..........

failed.out has multiple lines like this:

--8<--
4e77f7f1261f65cff06918bc5e66d02a418fc842        refs/tags/v3.10.18^{}
f7b8df0cc81cf82a4ac6834225bddbe46a340455a4a5d52f29d08d923ce8d232b0b497da674dd2c        refs/tags/v3.18
b2776bf7149bddd1f4161f14f79520f17fc1d71d        refs/tags/v3.18^{}
--8<--


Running the same on Archlinux (5.13.13-arch1-1, 2.33.0) doesn't show the 
problem.
This may well turn out not to be git, but a kernel issue.

@Eike: I think at this point you should try to downgrade and see whether 
that makes any difference


* Re: data loss when doing ls-remote and piped to command
  2021-09-16 10:12     ` Tobias Ulmer
@ 2021-09-16 12:17       ` Rolf Eike Beer
  2021-09-16 15:49         ` Mike Galbraith
  2021-09-16 17:11         ` Linus Torvalds
  0 siblings, 2 replies; 13+ messages in thread
From: Rolf Eike Beer @ 2021-09-16 12:17 UTC (permalink / raw)
  To: git, Linus Torvalds; +Cc: Tobias Ulmer, Junio C Hamano, linux-kernel


On Thursday, 16 September 2021, 12:12:48 CEST, Tobias Ulmer wrote:
> On 16/09/2021 08:38, Rolf Eike Beer wrote:
> ...
> 
> > The redirection seems to be an important part of it. I now did:
> > 
> > git ... 2>&1 | sha256sum
> 
> I've tried to reproduce this since yesterday, but couldn't until now:
> 
> 2>&1 made all the difference, took less than a minute.
> 
> Different repo, different machine, but also running Tumbleweed
> 5.14.1-1-default, git 2.33.0
> 
> while [ "`git --git-dir=$PWD/in/linux/.git ls-remote origin 2>&1 | tee failed.out | sha1sum`" = "7fa299e589bacdc908395730beff542b0fc684eb  -" ]; do echo -n .; done
> ..........
> 
> failed.out has multiple lines like this:
> 
> --8<--
> 4e77f7f1261f65cff06918bc5e66d02a418fc842        refs/tags/v3.10.18^{}
> f7b8df0cc81cf82a4ac6834225bddbe46a340455a4a5d52f29d08d923ce8d232b0b497da674dd2c        refs/tags/v3.18
> b2776bf7149bddd1f4161f14f79520f17fc1d71d        refs/tags/v3.18^{}
> --8<--
> 
> 
> Running the same on Archlinux (5.13.13-arch1-1, 2.33.0) doesn't show the
> problem.
> This may well turn out not to be git, but a kernel issue.

Linus,

since you have been hacking around in pipe.c recently, I fear this isn't 
entirely impossible. Have you any idea?

For easier reference, the complete thread is at:

https://public-inbox.org/git/85a103f6-8b3c-2f21-cc0f-04f517c0c9a1@emlix.com/T/

Eike

* Re: data loss when doing ls-remote and piped to command
  2021-09-16 12:17       ` Rolf Eike Beer
@ 2021-09-16 15:49         ` Mike Galbraith
  2021-09-17  6:38           ` Mike Galbraith
  2021-09-16 17:11         ` Linus Torvalds
  1 sibling, 1 reply; 13+ messages in thread
From: Mike Galbraith @ 2021-09-16 15:49 UTC (permalink / raw)
  To: Rolf Eike Beer, git, Linus Torvalds
  Cc: Tobias Ulmer, Junio C Hamano, linux-kernel

On Thu, 2021-09-16 at 14:17 +0200, Rolf Eike Beer wrote:
> On Thursday, 16 September 2021, 12:12:48 CEST, Tobias Ulmer wrote:
> > On 16/09/2021 08:38, Rolf Eike Beer wrote:
> > ...
> >
> > > The redirection seems to be an important part of it. I now did:
> > >
> > > git ... 2>&1 | sha256sum
> >
> > I've tried to reproduce this since yesterday, but couldn't until now:
> >
> > 2>&1 made all the difference, took less than a minute.
> >
> > Different repo, different machine, but also running Tumbleweed
> > 5.14.1-1-default, git 2.33.0
> >
> > while [ "`git --git-dir=$PWD/in/linux/.git ls-remote origin 2>&1 | tee failed.out | sha1sum`" = "7fa299e589bacdc908395730beff542b0fc684eb  -" ]; do echo -n .; done
> > ..........
> >
> > failed.out has multiple lines like this:
> >
> > --8<--
> > 4e77f7f1261f65cff06918bc5e66d02a418fc842        refs/tags/v3.10.18^{}
> > f7b8df0cc81cf82a4ac6834225bddbe46a340455a4a5d52f29d08d923ce8d232b0b497da674dd2c        refs/tags/v3.18
> > b2776bf7149bddd1f4161f14f79520f17fc1d71d        refs/tags/v3.18^{}
> > --8<--
> >
> >
> > Running the same on Archlinux (5.13.13-arch1-1, 2.33.0) doesn't show the
> > problem.
> > This may well turn out not to be git, but a kernel issue.
>
> Linus,
>
> since you have been hacking around in pipe.c recently, I fear this isn't
> entirely impossible. Have you any idea?
>
> For easier reference, the complete thread is at:
>
> https://public-inbox.org/git/85a103f6-8b3c-2f21-cc0f-04f517c0c9a1@emlix.com/T/
>

I use git-daemon (2.33) and reference clones for my local pile of
kernel trees (74), so out of curiosity, modified the above ls-remote
loop to fit one of them, and tried to reproduce with both master.today
(ff1ffd71) and SUSE's stable branch (where Tumbleweed gets source,
currently at 5.14.4).  Both kernels failed to reproduce given a few
minutes each (zzzz) to do so.  I'm running Leap-15.3 vs Tumbleweed, but
that shouldn't matter.

	-Mike


* Re: data loss when doing ls-remote and piped to command
  2021-09-16 12:17       ` Rolf Eike Beer
  2021-09-16 15:49         ` Mike Galbraith
@ 2021-09-16 17:11         ` Linus Torvalds
  2021-09-16 20:42           ` Junio C Hamano
  1 sibling, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2021-09-16 17:11 UTC (permalink / raw)
  To: Rolf Eike Beer
  Cc: Git List Mailing, Tobias Ulmer, Junio C Hamano,
	Linux Kernel Mailing List

On Thu, Sep 16, 2021 at 5:17 AM Rolf Eike Beer <eb@emlix.com> wrote:
>
> On Thursday, 16 September 2021, 12:12:48 CEST, Tobias Ulmer wrote:
> > > The redirection seems to be an important part of it. I now did:
> > >
> > > git ... 2>&1 | sha256sum
> >
> > I've tried to reproduce this since yesterday, but couldn't until now:
> >
> > 2>&1 made all the difference, took less than a minute.

So if that redirection is what matters, and what causes problems, I
can almost guarantee that the reason is very simple:

Your git repository (or more likely your upstream) has some problem,
it's getting reported on stderr, and because you mix stdout and stderr
with that '2>&1', you get randomly mixed output.

Then it depends on timing where the mixing happens.

Or rather, it depends on various different factors, like the buffering
done internally by stdio (where stdout generally will be
block-buffered, while stderr is usually line-buffered, which is why
you get odd mixing of the two).

But timing can be an effect particularly with "git ls-remote" and
friends, because you may get errors from the transport asynchronously.

So the different buffering ends up causing the effect of mixing things
in the middle of lines, while the timing differences due to the
asynchronous nature of the remote access pipeline will likely then
cause that odd mixing to be different.

End result: corrupted lines, and different sha256sum every time.

> > Running the same on Archlinux (5.13.13-arch1-1, 2.33.0) doesn't show the
> > problem.
> > This may well turn out not to be git, but a kernel issue.

Much more likely that the other box just doesn't have the error situation.

> since you have been hacking around in pipe.c recently, I fear this isn't
> entirely impossible. Have you any idea?

Almost certainly not the kernel. Kernel - and other - differences
could affect timing, of course, but the whole "2>&1" really is
fundamentally bogus.

If you don't have any errors, then the "2>&1" doesn't matter.

And if you *do* have errors, then by definition the "2>&1" will mix in
the errors with the output randomly and piping them together is
senseless.

Either way, it's wrong.

So what I'd suggest Tobias should do is

    git ... 2> err | sha256sum

which will send the errors to the "err" file. Take a look at that file
afterwards and see what is in it.
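
The suggestion above can be wrapped in a small helper. This is only a sketch:
the function name hash_stdout_keep_stderr and its argument order are made up
for illustration, and the command to run is whatever follows the file name.

```shell
# Hypothetical helper: hash only stdout of a command and keep its stderr in a
# separate file, so the two streams can never be mixed in the pipe.
# $1 = file that receives stderr, remaining arguments = command to run.
hash_stdout_keep_stderr() {
    hsk_errfile=$1
    shift
    "$@" 2>"$hsk_errfile" | sha256sum
    # Point out when the command wrote anything to stderr at all.
    if [ -s "$hsk_errfile" ]; then
        echo "stderr was not empty, see $hsk_errfile" >&2
    fi
}
```

For example, "hash_stdout_keep_stderr err git ls-remote origin" prints the
checksum of the ref list alone and leaves any error text in the "err" file.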

Basically, "2>&1" is almost never the right thing to do, unless you
explicitly don't care about the output and just want to suppress it.

So "> /dev/null 2>&1" is common and natural.

Of course, people also use it when they just want to eyeball the
errors mixed in, so doing that

   ... 2>&1 | less

thing isn't necessarily *wrong*, but it's somewhat dangerous and
confusing. Because when you do it you do need to be very aware of the
fact that the errors and output will be *mixed*. And the mixing will
not necessarily be at all sensible.
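
As a side note on the redirection itself: the shell processes redirections
from left to right, so where "2>&1" is placed relative to "> /dev/null"
changes the result. A minimal sketch, with made-up function names:

```shell
# Stand-in for any command that writes one line to each stream.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# stdout goes to /dev/null first, then stderr follows it: nothing is left.
both_silenced() {
    noisy >/dev/null 2>&1
}

# stderr is pointed at the current stdout first, and only then is stdout
# sent to /dev/null: the error line still arrives on the original stdout.
stderr_kept() {
    noisy 2>&1 >/dev/null
}
```

Only the first form suppresses everything. The second one leaves just the
error stream in a following pipe.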

Finally: pipes on a low level guarantee certain atomicity constraints,
so if you do low-level "write()" calls of size PIPE_BUF or less, the
contents will not be interleaved randomly.  HOWEVER. That's only true
at that "write()" level. The moment you use <stdio> for your IO, you
have buffering inside of the standard IO libraries, and if your code
isn't explicitly very careful about it, using setbuf() and fflush()
and friends, you'll get that random mixing.
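
The PIPE_BUF limit mentioned above can be queried from the shell. A small
sketch follows; the helper name fits_in_pipe_buf is made up, and note that
PIPE_BUF is the atomic write size, not the capacity of the pipe.

```shell
# Hypothetical helper: succeed if a single write() of $1 bytes to a pipe is
# still covered by the PIPE_BUF atomicity guarantee on this system.
fits_in_pipe_buf() {
    [ "$1" -le "$(getconf PIPE_BUF /)" ]
}
```

A single ls-remote line easily fits, but that only helps when each line is
its own write(). A stdio buffer is flushed in chunks that need not end on a
line boundary, which is where mid-line mixing of two streams can come from.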

Anyway. That was a long email just to tell people it's almost
certainly user error, not the kernel.

            Linus


* Re: data loss when doing ls-remote and piped to command
  2021-09-16 17:11         ` Linus Torvalds
@ 2021-09-16 20:42           ` Junio C Hamano
  2021-09-17  6:59             ` Rolf Eike Beer
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2021-09-16 20:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rolf Eike Beer, Git List Mailing, Tobias Ulmer,
	Linux Kernel Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, Sep 16, 2021 at 5:17 AM Rolf Eike Beer <eb@emlix.com> wrote:
>>
>> On Thursday, 16 September 2021, 12:12:48 CEST, Tobias Ulmer wrote:
>> > > The redirection seems to be an important part of it. I now did:
>> > >
>> > > git ... 2>&1 | sha256sum
>> >
>> > I've tried to reproduce this since yesterday, but couldn't until now:
>> >
>> > 2>&1 made all the difference, took less than a minute.
> 
> So if that redirection is what matters, and what causes problems, I
> can almost guarantee that the reason is very simple:
> ...
> Anyway. That was a long email just to tell people it's almost
> certainly user error, not the kernel.

Yes, 2>&1 will mix messages from the standard error stream at random
places in the output, which explains the checksum quite well.

I am not sure if it explains the initial report where

	ls-remote 2>&1 | less

produced

    > 6f38b5d6cfd43dde3058a10c68baae9cf17af912        refs/tags/v5.0-rc2
    > 1c7fc5cbc33980acd13ae83d0b416db002fe95601e7f97f64b59514d936     refs/tags/v5.7-rc2^{}
    > d0709bb6da2ab6d49b11643e98abdf79b1a2817f        refs/tags/v5.7-rc3

    What we see on the second line is the beginning of peeled
    v5.0-rc2^{} up to the "acd13" (that is, the first 19 bytes of the
    line), followed by the full line for peeled v5.7-rc2^{} (which
    begins with "ae83d").  12407 bytes in between are missing, which
    is even more puzzling as it is not a nice round number.

I can sort of guess that the progress display during transfer, which
comes out on the standard error stream and uses terminal control
sequences like "go back to the end of the line without feeding a new
line", "erase to the end of the line", etc., would be contributing,
but because it is piped to "less", which would make it "visible"
(i.e. you do not get the raw escape but see three capital letters
ESC in reverse), it does not quite explain how the display was
broken.

In any case, I do not think the kernel is involved, or more
generally I do not think any "loss of output bytes" is happening
here.  It's just that how "| less" failed to show a range about 12k
bytes long is a mystery to me ;-).




* Re: data loss when doing ls-remote and piped to command
  2021-09-16 15:49         ` Mike Galbraith
@ 2021-09-17  6:38           ` Mike Galbraith
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Galbraith @ 2021-09-17  6:38 UTC (permalink / raw)
  To: Rolf Eike Beer, git, Linus Torvalds
  Cc: Tobias Ulmer, Junio C Hamano, linux-kernel

On Thu, 2021-09-16 at 17:49 +0200, Mike Galbraith wrote:
> Both kernels failed to reproduce...

Nor did the TW kernel (now 5.14.2-1-default) reproduce, neither in my
Leap-15.3 box, nor in a TW KVM set up to play server.  'course that
doesn't mean there's no kernel bug lurking, means with certainty only
that if there is one, the posted reproducer ain't all that wonderful.

	-Mike


* Re: data loss when doing ls-remote and piped to command
  2021-09-16 20:42           ` Junio C Hamano
@ 2021-09-17  6:59             ` Rolf Eike Beer
  2021-09-17 19:13               ` Jeff King
                                 ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Rolf Eike Beer @ 2021-09-17  6:59 UTC (permalink / raw)
  To: Linus Torvalds, Junio C Hamano
  Cc: Git List Mailing, Tobias Ulmer, Linux Kernel Mailing List


On Thursday, 16 September 2021, 22:42:22 CEST, Junio C Hamano wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
> > On Thu, Sep 16, 2021 at 5:17 AM Rolf Eike Beer <eb@emlix.com> wrote:
> >> On Thursday, 16 September 2021, 12:12:48 CEST, Tobias Ulmer wrote:
> >> > > The redirection seems to be an important part of it. I now did:
> >> > > 
> >> > > git ... 2>&1 | sha256sum
> >> > 
> >> > I've tried to reproduce this since yesterday, but couldn't until now:
> >> > 
> >> > 2>&1 made all the difference, took less than a minute.
> > 
> > So if that redirection is what matters, and what causes problems, I
> > can almost guarantee that the reason is very simple:
> > ...
> > Anyway. That was a long email just to tell people it's almost
> > certainly user error, not the kernel.
> 
> Yes, 2>&1 will mix messages from the standard error stream at random
> places in the output, which explains the checksum quite well.

If there were any errors. The point is: if I run the command with 
">/dev/null" a hundred times, so that only stderr reaches the terminal, there 
is never any output on stderr at all. If I redirect stderr into a file, it is 
empty after all of this (yes, I did append, not overwrite).

That the particular construct in this case is sort of nonsense is granted, I 
just hit it because some tool here used some very similar construct and 
suddenly started failing. "less" isn't the original reproducer, it was just 
something I started testing with to be able to easily visually inspect the 
output.

What you need is a _fast_ git server. kernel.org or github.com seem to be too 
slow for this if you don't sit somewhere in their datacenter. Use something in 
your local network, a Xeon E5 with lots of RAM and connected with 1GBit/s 
Ethernet in my case.

And the reader must be "somewhat" slow. Using sha256sum works reliably for me. 
Using "wc -l" does not, also md5sum and sha1sum are too fast as it seems.

When I run the whole thing with strace I can't see the effect, which isn't 
really surprising. But there is a difference between the cases where I run 
with redirection "2>&1":

ioctl(2, TCGETS, 0x7ffd6f119b10)        = -1 ENOTTY (Inappropriate ioctl for device)

and without:

ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0

AFAICT this is the only place where fd 2 is used at all during the whole time.
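
For context, this ioctl is what a terminal check looks like in strace: glibc
implements isatty() with TCGETS, and ENOTTY simply means the descriptor is
not a terminal. The same check can be sketched in shell with "[ -t 2 ]"; the
function name stderr_kind is made up for illustration.

```shell
# Hypothetical helper: report whether file descriptor 2 is a terminal,
# which is the same question the TCGETS ioctl above answers.
stderr_kind() {
    if [ -t 2 ]; then
        echo "tty"
    else
        echo "not a tty"
    fi
}
```

Running "stderr_kind 2>&1 | cat" prints "not a tty", while a plain
"stderr_kind" in an interactive terminal prints "tty". A program may change
its behaviour based on this result, for example by leaving out progress
output.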

Regards,

Eike

* Re: data loss when doing ls-remote and piped to command
  2021-09-17  6:59             ` Rolf Eike Beer
@ 2021-09-17 19:13               ` Jeff King
  2021-09-17 19:28               ` Linus Torvalds
  2021-09-18  6:33               ` Mike Galbraith
  2 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2021-09-17 19:13 UTC (permalink / raw)
  To: Rolf Eike Beer
  Cc: Linus Torvalds, Junio C Hamano, Git List Mailing, Tobias Ulmer,
	Linux Kernel Mailing List

On Fri, Sep 17, 2021 at 08:59:07AM +0200, Rolf Eike Beer wrote:

> What you need is a _fast_ git server. kernel.org or github.com seem to be too 
> slow for this if you don't sit somewhere in their datacenter. Use something in 
> your local network, a Xeon E5 with lots of RAM and connected with 1GBit/s 
> Ethernet in my case.

One thing that puzzled me here: is the bad output between the server and
ls-remote, or between ls-remote and its output pipe?

I'd guess it has to be the latter, since otherwise ls-remote itself
would barf with an error message.

In that case, I'd think "git ls-remote ." would give you the fastest
outcome, because it's talking to upload-pack on the local box. But I'm
also confused how the speed could matter, as ls-remote reads the entire
input into an in-memory array, and then formats it.

We do the write using printf(). Is it possible your libc's stdio may
drop bytes when the pipe is full, rather than blocking? In general, I'd
expect write() to block, so libc doesn't have to care at all. But might
there be something in your environment putting the pipe into
non-blocking mode, and we get EAGAIN or something? If so, I'd expect
stdio to return the error.

Maybe patching Git like this would help:

diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index f4fd823af8..5936b2b42c 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -146,7 +146,8 @@ int cmd_ls_remote(int argc, const char **argv, const char *prefix)
 		const struct ref_array_item *ref = ref_array.items[i];
 		if (show_symref_target && ref->symref)
 			printf("ref: %s\t%s\n", ref->symref, ref->refname);
-		printf("%s\t%s\n", oid_to_hex(&ref->objectname), ref->refname);
+		if (printf("%s\t%s\n", oid_to_hex(&ref->objectname), ref->refname) < 0)
+			die_errno("printf failed");
 		status = 0; /* we found something */
 	}
 

> And the reader must be "somewhat" slow. Using sha256sum works reliably for me. 
> Using "wc -l" does not, also md5sum and sha1sum are too fast as it seems.

If a slow pipe is involved, maybe:

  git ls-remote . | (sleep 5; cat) | sha256sum

would help reproduce. Assuming ls-remote's output is bigger than your
system pipe buffer (which is another interesting thing to check), then
it should block for 5 seconds on write() midway through the output,
which you can verify with strace.
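
The slow-reader idea can be turned into a pass/fail check. This is a sketch
under assumptions: the function name slow_reader_matches is made up, the
producer is expected to print the same data on every run, and the reference
copy goes to a temporary file rather than through a pipe.

```shell
# Hypothetical helper: compare the checksum of a command's output written
# straight to a file with the checksum seen by a reader that starts late.
# $1 = delay in seconds, remaining arguments = producer command.
slow_reader_matches() {
    srm_delay=$1
    shift
    srm_ref=$(mktemp) || return 2
    "$@" >"$srm_ref"                      # reference copy, no pipe involved
    srm_direct=$(sha256sum <"$srm_ref")
    # The reader sleeps first, so a large enough output fills the pipe and
    # the producer has to block in write() until the reader wakes up.
    srm_piped=$("$@" | (sleep "$srm_delay"; cat) | sha256sum)
    rm -f "$srm_ref"
    [ "$srm_direct" = "$srm_piped" ]
}
```

For instance, "slow_reader_matches 5 git ls-remote ." returns success when no
bytes went missing while the writer was blocked.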

-Peff


* Re: data loss when doing ls-remote and piped to command
  2021-09-17  6:59             ` Rolf Eike Beer
  2021-09-17 19:13               ` Jeff King
@ 2021-09-17 19:28               ` Linus Torvalds
  2021-09-18  6:33               ` Mike Galbraith
  2 siblings, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2021-09-17 19:28 UTC (permalink / raw)
  To: Rolf Eike Beer
  Cc: Junio C Hamano, Git List Mailing, Tobias Ulmer,
	Linux Kernel Mailing List

On Thu, Sep 16, 2021 at 11:59 PM Rolf Eike Beer <eb@emlix.com> wrote:
>
> When I run the whole thing with strace I can't see the effect, which isn't
> really surprising. But there is a difference between the cases where I run
> with redirection "2>&1":
>
> ioctl(2, TCGETS, 0x7ffd6f119b10)        = -1 ENOTTY (Inappropriate ioctl for device)

Ehh. That format of strace implies that you didn't use "strace -f"
(which would have the PID in it).

Although maybe you edited it out.

I think the error output would come from the other process (ssh, or
whatever process you use to run "git-upload-pack" on the other end).

I still strongly doubt it's about pipes - we've had changes to them,
but if they are broken we'd see a lot more breakage than some very
incidental use by git.

But I can easily see it being timing-dependent. And yes, sadly
'strace' can often end up hiding any timing issues because it
obviously slows down the target quite a bit.

Doing "strace -o tracefile -f" in a loop would be interesting if you
can reproduce it (and then stop when you reproduce it, so that the
final 'tracefile' is the one for the case that reproduced it).
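
Such a loop could look roughly like this. It is a sketch only: the function
name run_until_mismatch and its arguments are made up, and the known-good
checksum has to come from a run that is trusted to be complete.

```shell
# Hypothetical helper: re-run a command until its piped output no longer
# hashes to the expected value.
# $1 = expected "sha256sum" output line, $2 = maximum number of runs,
# remaining arguments = command to run.
# Prints the number of the first mismatching run, or 0 if all runs matched.
run_until_mismatch() {
    rum_want=$1
    rum_max=$2
    shift 2
    rum_i=1
    while [ "$rum_i" -le "$rum_max" ]; do
        rum_got=$("$@" 2>&1 | sha256sum)
        if [ "$rum_got" != "$rum_want" ]; then
            echo "$rum_i"
            return 0
        fi
        rum_i=$((rum_i + 1))
    done
    echo 0
}
```

Called as "run_until_mismatch "$good" 1000 strace -o tracefile -f git
ls-remote origin", every run overwrites "tracefile", so after the loop stops
the file holds the trace of the run that went wrong.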

            Linus


* Re: data loss when doing ls-remote and piped to command
  2021-09-17  6:59             ` Rolf Eike Beer
  2021-09-17 19:13               ` Jeff King
  2021-09-17 19:28               ` Linus Torvalds
@ 2021-09-18  6:33               ` Mike Galbraith
  2 siblings, 0 replies; 13+ messages in thread
From: Mike Galbraith @ 2021-09-18  6:33 UTC (permalink / raw)
  To: Rolf Eike Beer, Linus Torvalds, Junio C Hamano
  Cc: Git List Mailing, Tobias Ulmer, Linux Kernel Mailing List

On Fri, 2021-09-17 at 08:59 +0200, Rolf Eike Beer wrote:
>
> What you need is a _fast_ git server. kernel.org or github.com seem to be too
> slow for this if you don't sit somewhere in their datacenter. Use something in
> your local network, a Xeon E5 with lots of RAM and connected with 1GBit/s
> Ethernet in my case.

Even faster: what's coming across that wire should be a constant (is?),
variable is only delivery/consumption jitter.  If there's really really
a pipe problem lurking, you should also be able to trigger by saving
the data once, and just catting it, letting interrupts etc provide
jitter.  Which stdout is left of '|' in a script shouldn't matter one
whit to the interpreter/kernel conversation, they're all the same.
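
A replay test along those lines might be sketched as follows. The function
name replay_mismatches is made up, and the saved file is assumed to be a
known-good copy of the ls-remote output.

```shell
# Hypothetical helper: cat a saved file through a pipe several times and
# count how often the reader sees a different checksum than the file has.
# $1 = saved file, $2 = number of runs. Prints the number of mismatches.
replay_mismatches() {
    rp_file=$1
    rp_runs=$2
    rp_want=$(sha256sum <"$rp_file")
    rp_bad=0
    rp_i=0
    while [ "$rp_i" -lt "$rp_runs" ]; do
        rp_got=$(cat "$rp_file" 2>&1 | sha256sum)
        [ "$rp_got" = "$rp_want" ] || rp_bad=$((rp_bad + 1))
        rp_i=$((rp_i + 1))
    done
    echo "$rp_bad"
}
```

Any result other than 0 would point at the pipe or the reader rather than at
git or the network, since no remote is involved in the replay.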

That said, if I had a reproducer I was confident pointed to the kernel,
I'd try to bisect.. boring as hell, but highly effective.

	-Mike
