git@vger.kernel.org mailing list mirror (one of many)
* [BUG/FEATURE] Git pushing and fetching many more objects than strictly required
From: Paul van Loon @ 2019-11-08 14:06 UTC
  To: git

$ git --version
git version 2.21.0

When fetching from or pushing to a forked repo on GitHub, I've noticed several times that many more objects were fetched or pushed than were strictly necessary.
I'm not sure whether it's a bug or just an opportunity for performance improvement.

I got these traces:

$ git fetch --all
Fetching origin
remote: Enumerating objects: 29507, done.
remote: Counting objects: 100% (29507/29507), done.
remote: Compressing objects: 100% (33/33), done.
remote: Total 53914 (delta 29478), reused 29500 (delta 29471), pack-reused 24407
Receiving objects: 100% (53914/53914), 31.90 MiB | 111.00 KiB/s, done.
Resolving deltas: 100% (42462/42462), completed with 7382 local objects.
--

$ git push -v origin 'refs/replace/*:refs/replace/*'
Pushing to XXXX
Enumerating objects: 2681, done.
Counting objects: 100% (2681/2681), done.
Delta compression using up to 8 threads
Compressing objects: 100% (1965/1965), done.
Writing objects: 100% (2582/2582), 1.96 MiB | 1024 bytes/s, done.
Total 2582 (delta 95), reused 1446 (delta 58)
remote: Resolving deltas: 100% (95/95), completed with 33 local objects.
To XXXX
 * [new branch]            refs/replace/XXXX -> refs/replace/XXXX
--

In particular, pushing a single replace commit caused 2582 objects to be written, even though a fetch had been done first.

This hurts most on slow, flaky connections: the more objects that need to be written or read, the greater the chance of the connection failing.
Combined with the inability to resume a fetch or push without downloading or uploading ALL objects again, this can become quite a frustrating experience.

Any thoughts?

Regards,
Paul




* Re: [BUG/FEATURE] Git pushing and fetching many more objects than strictly required
From: Jonathan Tan @ 2019-11-08 18:47 UTC
  To: nospam; +Cc: git, Jonathan Tan

> $ git fetch --all
> Fetching origin
> remote: Enumerating objects: 29507, done.
> remote: Counting objects: 100% (29507/29507), done.
> remote: Compressing objects: 100% (33/33), done.
> remote: Total 53914 (delta 29478), reused 29500 (delta 29471), pack-reused 24407
> Receiving objects: 100% (53914/53914), 31.90 MiB | 111.00 KiB/s, done.
> Resolving deltas: 100% (42462/42462), completed with 7382 local objects.
> --
> 
> $ git push -v origin 'refs/replace/*:refs/replace/*'
> Pushing to XXXX
> Enumerating objects: 2681, done.
> Counting objects: 100% (2681/2681), done.
> Delta compression using up to 8 threads
> Compressing objects: 100% (1965/1965), done.
> Writing objects: 100% (2582/2582), 1.96 MiB | 1024 bytes/s, done.
> Total 2582 (delta 95), reused 1446 (delta 58)
> remote: Resolving deltas: 100% (95/95), completed with 33 local objects.
> To XXXX
>  * [new branch]            refs/replace/XXXX -> refs/replace/XXXX

Could you verify that refs/replace/XXXX (or one of its close ancestors)
was fetched by the "git fetch --all" command? "--all" fetches all
remotes, not all refs.
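
To illustrate the point about "--all": a clone's default refspec covers refs/heads/* (plus tags), so refs/replace/* has to be requested explicitly. The following is only a minimal sketch using two throwaway local repositories; the directory names "upstream" and "clone" and the demo identity are invented for the example and are not the reporter's setup.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Throwaway "remote" with two commits (the identity is a placeholder).
git init -q upstream
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m first
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m second

git clone -q upstream clone

# Create a replace ref on the remote side only (illustrative: HEAD -> HEAD~1).
git -C upstream replace HEAD HEAD~1

# Count the replace refs currently present in the clone.
count_replace_refs() {
    git -C clone for-each-ref --format='%(refname)' refs/replace | wc -l | tr -d ' '
}

# "--all" means all remotes; it still uses each remote's default refspec.
git -C clone fetch -q --all
after_all=$(count_replace_refs)

# Asking for refs/replace/* explicitly is what brings the replace ref over.
git -C clone fetch -q origin 'refs/replace/*:refs/replace/*'
after_explicit=$(count_replace_refs)

echo "replace refs after 'fetch --all': $after_all"
echo "replace refs after explicit refspec: $after_explicit"
```

If the replace refs should always be fetched, the same refspec could presumably be added to remote.origin.fetch in the clone's config.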


* Re: [BUG/FEATURE] Git pushing and fetching many more objects than strictly required
From: Paul van Loon @ 2019-11-08 20:54 UTC
  To: Jonathan Tan; +Cc: git

On 2019-11-08 19:47, Jonathan Tan wrote:
>> $ git fetch --all
>> Fetching origin
>> remote: Enumerating objects: 29507, done.
>> remote: Counting objects: 100% (29507/29507), done.
>> remote: Compressing objects: 100% (33/33), done.
>> remote: Total 53914 (delta 29478), reused 29500 (delta 29471), pack-reused 24407
>> Receiving objects: 100% (53914/53914), 31.90 MiB | 111.00 KiB/s, done.
>> Resolving deltas: 100% (42462/42462), completed with 7382 local objects.
>> --
>>
>> $ git push -v origin 'refs/replace/*:refs/replace/*'
>> Pushing to XXXX
>> Enumerating objects: 2681, done.
>> Counting objects: 100% (2681/2681), done.
>> Delta compression using up to 8 threads
>> Compressing objects: 100% (1965/1965), done.
>> Writing objects: 100% (2582/2582), 1.96 MiB | 1024 bytes/s, done.
>> Total 2582 (delta 95), reused 1446 (delta 58)
>> remote: Resolving deltas: 100% (95/95), completed with 33 local objects.
>> To XXXX
>>  * [new branch]            refs/replace/XXXX -> refs/replace/XXXX
>
> Could you verify that refs/replace/XXXX (or one of its close ancestors)
> was fetched by the "git fetch --all" command? "--all" fetches all
> remotes, not all refs.

No, it was not fetched. HOWEVER, the ONLY thing the replace commit (1 single object) does is point to an existing parent object. No other new objects are referenced.
Those 'ancestor' objects were all fetched.

Paul



* Re: [BUG/FEATURE] Git pushing and fetching many more objects than strictly required
From: Jeff King @ 2019-11-08 21:21 UTC
  To: Paul van Loon; +Cc: Jonathan Tan, git

On Fri, Nov 08, 2019 at 09:54:02PM +0100, Paul van Loon wrote:

> >> $ git push -v origin 'refs/replace/*:refs/replace/*'
> >> Pushing to XXXX
> >> Enumerating objects: 2681, done.
> >> Counting objects: 100% (2681/2681), done.
> >> Delta compression using up to 8 threads
> >> Compressing objects: 100% (1965/1965), done.
> >> Writing objects: 100% (2582/2582), 1.96 MiB | 1024 bytes/s, done.
> >> Total 2582 (delta 95), reused 1446 (delta 58)
> >> remote: Resolving deltas: 100% (95/95), completed with 33 local objects.
> >> To XXXX
> >>  * [new branch]            refs/replace/XXXX -> refs/replace/XXXX
> >
> > Could you verify that refs/replace/XXXX (or one of its close ancestors)
> > was fetched by the "git fetch --all" command? "--all" fetches all
> > remotes, not all refs.
> 
> No, it was not fetched. HOWEVER, the ONLY thing the replace commit (1 single object) does is point to an existing parent object. No other new objects are referenced.
> Those 'ancestor' objects were all fetched.

Was it a parent object at the tip of a ref?

The push protocol, unlike the fetch protocol, doesn't expend any effort
to negotiate to find a common base. It just feeds the ref tips of the
receiver to pack-objects (which then does traverse down to a merge base,
but it can't always do so if the sender doesn't have all of the
objects).
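
As a rough local approximation of the above (not the actual send-pack code path), one can take the ref tips the receiver advertises, keep only those that exist locally, and ask rev-list which objects are reachable from the new tip but not from them. This is a sketch on throwaway repositories; the repository names, file names, and demo identity are made up for illustration.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Helper: commit in repository $1 with message $2 (placeholder identity).
commit() {
    git -C "$1" -c user.name=demo -c user.email=demo@example.com commit -q -m "$2"
}

git init -q upstream
echo one > upstream/a.txt
git -C upstream add a.txt
commit upstream first

git clone -q upstream clone
echo two > clone/b.txt
git -C clone add b.txt
commit clone second

new_tip=$(git -C clone rev-parse HEAD)

# Feed rev-list the tip we want to push plus "^<tip>" for every advertised
# remote tip that we also have locally; tips we lack cannot limit the walk.
to_send=$(
    {
        echo "$new_tip"
        git -C clone ls-remote origin | cut -f1 | sort -u |
        while read -r tip; do
            if git -C clone cat-file -e "$tip^{commit}" 2>/dev/null; then
                echo "^$tip"
            fi
        done
    } | git -C clone rev-list --objects --stdin | wc -l | tr -d ' '
)

echo "objects a push would need to send: $to_send"
```

Here the count covers just the new commit, its root tree, and the new blob. When many of the advertised tips are missing locally, fewer "^" lines are available to bound the traversal, which is one way a push can end up sending more than expected.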

It's hard to say more without having a reproducible case to look at.

Some possible things to poke at:

  - record the stdin from the local push to the local pack-objects,
    which shows which objects we're planning to send and which we're
    claiming the other side has. That would help determine if the push
    isn't feeding enough information to pack-objects, or if pack-objects
    isn't trying hard enough to find the minimal set of objects

    There's not really an easy way to do this, but something like strace
    might help.

  - try building reachability bitmaps (e.g., "git repack -adb") in the
    local clone. When those are present, pack-objects will compute the
    object set more thoroughly (because it can do so efficiently).
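
A minimal sketch of the bitmap suggestion, run against a throwaway repository (the repository name, file name, and demo identity are placeholders). It only shows that "git repack -adb" leaves a .bitmap file next to the pack; whether this reduces the number of objects pushed in the reported case is untested here.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

git init -q repo
echo hello > repo/file.txt
git -C repo add file.txt
git -C repo -c user.name=demo -c user.email=demo@example.com commit -q -m initial

# -a: repack everything into a single pack
# -d: delete packs made redundant by the repack
# -b: also write a reachability bitmap index
git -C repo repack -a -d -b -q

bitmaps=$(find repo/.git/objects/pack -name '*.bitmap' | wc -l | tr -d ' ')
echo "bitmap files: $bitmaps"

# For the strace idea, something along these lines may work on Linux
# (untested assumption; the output file name is arbitrary):
#   strace -f -e trace=write -s 200 -o push.strace git push origin <refspec>
```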

I don't _think_ the fact that it's in refs/replace should matter to push
(in terms of what it feeds to pack-objects). But obviously another thing
to try is whether pushing to or from a different ref has any impact.

-Peff


* Re: [BUG/FEATURE] Git pushing and fetching many more objects than strictly required
From: Paul van Loon @ 2019-11-12 13:39 UTC
  To: peff; +Cc: jonathantanmy, git

On 2019-11-08 22:21, Jeff King wrote:
> On Fri, Nov 08, 2019 at 09:54:02PM +0100, Paul van Loon wrote:
>
>>>> $ git push -v origin 'refs/replace/*:refs/replace/*'
>>>> Pushing to XXXX
>>>> Enumerating objects: 2681, done.
>>>> Counting objects: 100% (2681/2681), done.
>>>> Delta compression using up to 8 threads
>>>> Compressing objects: 100% (1965/1965), done.
>>>> Writing objects: 100% (2582/2582), 1.96 MiB | 1024 bytes/s, done.
>>>> Total 2582 (delta 95), reused 1446 (delta 58)
>>>> remote: Resolving deltas: 100% (95/95), completed with 33 local objects.
>>>> To XXXX
>>>>  * [new branch]            refs/replace/XXXX -> refs/replace/XXXX
>>>
>>> Could you verify that refs/replace/XXXX (or one of its close ancestors)
>>> was fetched by the "git fetch --all" command? "--all" fetches all
>>> remotes, not all refs.
>>
>> No, it was not fetched. HOWEVER, the ONLY thing the replace commit (1 single object) does is point to an existing parent object. No other new objects are referenced.
>> Those 'ancestor' objects were all fetched.
>
> Was it a parent object at the tip of a ref?

No, it was a newly created replace object (created with "git replace --edit").

>
> The push protocol, unlike the fetch protocol, doesn't expend any effort
> to negotiate to find a common base. It just feeds the ref tips of the
> receiver to pack-objects (which then does traverse down to a merge base,
> but it can't always do so if the sender doesn't have all of the
> objects).

So I guess this would be the opportunity for performance improvement.

>
> It's hard to say more without having a reproducible case to look at.
>
> Some possible things to poke at:
>
>   - record the stdin from the local push to the local pack-objects,
>     which shows which objects we're planning to send and which we're
>     claiming the other side has. That would help determine if the push
>     isn't feeding enough information to pack-objects, or if pack-objects
>     isn't trying hard enough to find the minimal set of objects
>
>     There's not really an easy way to do this, but something like strace
>     might help.

That's way above my Git expertise.

>   - try building reachability bitmaps (e.g., "git repack -adb") in the
>     local clone. When those are present, pack-objects will compute the
>     object set more thoroughly (because it can do so efficiently).
>
> I don't _think_ the fact that it's in refs/replace should matter to push
> (in terms of what it feeds to pack-objects). But obviously another thing
> to try is whether pushing to or from a different ref has any impact.

I'll do some additional experiments.

> -Peff
>

