git@vger.kernel.org mailing list mirror (one of many)
* git fetch --dry-run changes the local repo and transfers data
From: Tim Hockin @ 2022-12-26 17:21 UTC
  To: git

I'm trying to test if a remote repo has a given SHA. `ls-remote` does
not work for this unless it is a tag or a HEAD (which is not
guaranteed for this case).
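For reference, the `ls-remote` check described above can be sketched like this. It only answers "is this full object ID the value of some ref the remote advertises" (branches, tags, peeled tag lines, HEAD); a commit that is merely reachable from those refs is reported as absent, which is exactly the limitation being described. `is_advertised_tip` is a hypothetical helper name, and it expects a full 40-hex ID, not an abbreviation.

```shell
# is_advertised_tip URL OID   (hypothetical helper name)
# Succeeds only if OID appears as the value of an advertised ref,
# including peeled tag lines such as "refs/tags/v1^{}". Commits that
# are only reachable from those refs are NOT found.
is_advertised_tip() {
  url=$1
  oid=$2
  # ls-remote prints "<oid><TAB><refname>"; keep the first column and
  # look for an exact, whole-line match.
  git ls-remote "$url" | cut -f1 | grep -qx "$oid"
}
```

For example, `is_advertised_tip https://github.com/git/git <full-oid>` succeeds for a branch or tag tip but fails for an older commit on the same branch.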

`git fetch --dry-run $remote $rev` SEEMS to fit the bill, except it
changes local state. For example:

```
$ git cat-file -t f80f1b23b4cab2a295a091c623bb4746d188bd4a
fatal: git cat-file: could not get object info

$ git rev-parse FETCH_HEAD
42d21cf12aab73ac8bc6245cc74ac9850bdf6989

$ git fetch --dry-run file:///tmp/e2e.2527526915/repo f80f1b23b4cab2a295a091c623bb4746d188bd4a
remote: Enumerating objects: 20, done.
remote: Counting objects: 100% (20/20), done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 18 (delta 6), reused 0 (delta 0), pack-reused 0
Unpacking objects: 100% (18/18), 415.91 KiB | 84.00 KiB/s, done.
From file:///tmp/e2e.2527526915/repo
 * branch            f80f1b23b4cab2a295a091c623bb4746d188bd4a -> FETCH_HEAD

$ git cat-file -t f80f1b23b4cab2a295a091c623bb4746d188bd4a
commit

$ git rev-parse FETCH_HEAD
42d21cf12aab73ac8bc6245cc74ac9850bdf6989
```

FETCH_HEAD was not updated (good) but the rev in question is now
present locally (bad).  I confirmed this by making a very large commit
and watching the fetch be very slow.  The same happens with a non-file://
repo (I tried https).

Am I using --dry-run wrong (or misunderstanding its purpose)?

Tim


* Re: git fetch --dry-run changes the local repo and transfers data
From: Junio C Hamano @ 2022-12-27 12:52 UTC
  To: Tim Hockin; +Cc: git

Tim Hockin <thockin@google.com> writes:

> I'm trying to test if a remote repo has a given SHA. `ls-remote` does
> not work for this unless it is a tag or a HEAD (which is not
> guaranteed for this case).
>
> `git fetch --dry-run $remote $rev` SEEMS to fit the bill, except it
> changes local state. For example:

Well, without changing the "local state", you cannot see if that
remote repository has or does not have the commit.

> FETCH_HEAD was not updated (good)

None of your refs are updated, either.

There are things that are not reachable from your refs (or other
anchoring points like the index and the reflog).  As far as Git is
concerned, they don't exist [*], and that is why Git will happily
remove them with "git gc", for example.  They are cruft, and it is
easier to think of them as not part of "local data".

So I suspect that ...

> but the rev in question is now
> present locally (bad),

... this complaint is a bit misguided.  After all, the procedure you
gave above is exactly how you would ask the "I'm trying to test if a
remote repo has a given SHA" question about commit f80f1b23b4ca.  If
the operation did not transfer data, you would not be able to get
the answer to that question.

> Am I using --dry-run wrong (or misunderstanding its purpose)?

Maybe a (mis)understanding of Git's data model (see above [*]) is the
root cause of this confusion.
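A minimal sketch of the distinction drawn above, assuming you run it inside the local repository and pass a commit ID: a commit can be absent, present but unreachable (cruft that "git gc" may prune), or reachable from a ref or reflog entry. `commit_state` is a hypothetical helper name, and it only consults refs and reflogs, not the index.

```shell
# commit_state OID   (hypothetical helper name)
# Prints "absent", "unreachable" or "reachable" for a commit in the
# current repository. "Reachable" means reachable from any ref or
# reflog entry; the index is not consulted.
commit_state() {
  # Peeling to ^{commit} makes Git read the object, so this fails
  # when the commit is not in the local object store.
  full=$(git rev-parse --verify --quiet "$1^{commit}") || {
    echo absent
    return 0
  }
  if git rev-list --all --reflog | grep -qx "$full"; then
    echo reachable
  else
    echo unreachable
  fi
}
```

Running it before and after `git fetch --dry-run <url> <oid>` should show the commit move from `absent` to `unreachable`, and a subsequent `git gc --prune=now` should move it back to `absent`.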


* Re: git fetch --dry-run changes the local repo and transfers data
From: Tim Hockin @ 2022-12-27 18:42 UTC
  To: Junio C Hamano; +Cc: git

Thanks.  What threw me is that I expected it to be "very fast" and
"very cheap".  If I commit a multi-gig file on the server, I see the
dry-run fetch take several seconds (clearly indicating some work
proportional to the server repo size).  I don't want to transfer that
file on a dry run; I hoped the server and client were both
dry-running, and the server could simply say "here's metadata for
what I _would have_ returned if this was real".  Not possible?

Tim


* Re: git fetch --dry-run changes the local repo and transfers data
From: Jeff King @ 2023-01-03 11:07 UTC
  To: Tim Hockin; +Cc: Junio C Hamano, git

On Tue, Dec 27, 2022 at 10:42:25AM -0800, Tim Hockin wrote:

> Thanks.  What threw me is that I expected it to be "very fast" and
> "very cheap".  If I commit a multi-gig file on the server, I see the
> dry-run fetch take several seconds (clearly indicating some work
> proportional to the server repo size).  I don't want to transfer that
> file on a dry run; I hoped the server and client were both
> dry-running, and the server could simply say "here's metadata for
> what I _would have_ returned if this was real".  Not possible?

No, the server has no notion of a dry run.

I think the best you could do with fetch is to ask for a smaller set of
objects. For example:

  git fetch --depth=1 --filter=tree:0 \
    https://github.com/git/git \
    2e71cbbddd64695d43383c25c7a054ac4ff86882

will grab a single object. You can even "git show -s 2e71cbbd" on the
result to see it (the "-s" is important to avoid it fetching the trees
to do a diff!). Two things to be aware of:

  - this may have some lingering effects in your repository, as the
    shallow and partial features store some metadata locally to make
    sense of the situation. You're probably best off doing it in a
    temporary repository.

  - not all servers will support --filter; it has to be enabled in the
    server's config (uploadpack.allowFilter).
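Following the advice to use a temporary repository, the shallow, filtered fetch can be wrapped so that it runs in a throwaway bare repository and no shallow or partial-clone metadata lingers in your real one. This is a sketch: `remote_has_commit` is a hypothetical name, and a non-zero exit cannot distinguish "the server does not have the commit" from other failures (network or authentication errors, or a server that refuses the request). A server without filter support may ignore `--filter` and send more data.

```shell
# remote_has_commit URL OID   (hypothetical helper name)
# Exit 0 if a depth-1, tree-less fetch of OID from URL succeeds,
# 1 if the fetch fails, 2 if the scratch repository cannot be created.
remote_has_commit() {
  url=$1
  oid=$2
  scratch=$(mktemp -d) || return 2
  if ! git init -q --bare "$scratch"; then
    rm -rf "$scratch"
    return 2
  fi
  # Same request as above: just the commit object (no trees, blobs or
  # history). Any shallow/partial bookkeeping is written to $scratch.
  if git -C "$scratch" fetch -q --depth=1 --filter=tree:0 "$url" "$oid" 2>/dev/null
  then
    status=0
  else
    status=1
  fi
  rm -rf "$scratch"
  return "$status"
}
```

For example, `remote_has_commit https://github.com/git/git 2e71cbbddd64695d43383c25c7a054ac4ff86882 && echo present`.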

There is potentially a more direct option, though. A while back, commit
a2ba162cda (object-info: support for retrieving object info, 2021-04-20)
added an extension that lets you get the size of an object on the
server. Unfortunately I don't think anybody ever wrote client-side
support. So you'd have to rig up something yourself like:

  # write git's packet format: 4-hex length followed by data
  pkt() {
    printf '%04x%s' "$((4+${#1}))" "$1"
  }

  # a sample input; you should be able to query multiple objects if you
  # want by adding more "oid" lines
  {
    pkt "command=object-info"
    printf "0001"
    pkt "size"
    pkt "oid 2e71cbbddd64695d43383c25c7a054ac4ff86882"
    printf "0000"
  } >input

  # this makes a local request; it's important we're in v2 mode, since
  # the extension only applies there. For http, I think you'd want to
  # POST the same input, something like:
  #
  #  curl -H 'Git-Protocol: version=2' \
  #    -H 'Content-Type: application/x-git-upload-pack-request' \
  #    --data-binary @input https://example.com/repo.git/git-upload-pack
  #
  # but I didn't test it.
  GIT_PROTOCOL=version=2 git-upload-pack /path/to/repo.git <input >output
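Because the request is just a sequence of pkt-lines, generating it for several objects at once is straightforward. A sketch, with `object_info_request` as a hypothetical helper name: for one ID it writes the same bytes as the hand-written block above, plus one `oid` line per additional argument.

```shell
# pkt DATA: one pkt-line, i.e. 4 hex digits of length (counting the
# 4 header bytes) followed by DATA. ${#1} counts characters, which
# equals bytes only for ASCII input such as commands and hex IDs.
pkt() {
  printf '%04x%s' "$((4 + ${#1}))" "$1"
}

# object_info_request OID...   (hypothetical helper name)
# Writes a protocol v2 object-info request asking for each OID's size.
object_info_request() {
  pkt "command=object-info"
  printf '0001'            # delim-pkt: separates capabilities from arguments
  pkt "size"
  for oid in "$@"; do
    pkt "oid $oid"
  done
  printf '0000'            # flush-pkt: end of the request
}
```

Usage: `object_info_request "$oid1" "$oid2" >input`, then run the same `git-upload-pack` command.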

I've left parsing the output as an exercise for the reader. But you
should be able to notice whether the object is present or not based on
the result.
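One possible take on that exercise, as a sketch: read pkt-lines from the saved output, skip the capability advertisement and the `size` attribute line, and classify each `<oid> <size>` line. It assumes (not verified against every Git version) that a missing object comes back as its ID without a numeric size. `report_object_info` is a hypothetical name; `dd` is used because it can read an exact number of bytes with plain POSIX tools.

```shell
# report_object_info   (hypothetical helper name)
# Reads pkt-lines on stdin and prints "<oid> present (size N)" or
# "<oid> missing" for each object-info result line.
report_object_info() {
  while :; do
    len=$(dd bs=1 count=4 2>/dev/null)
    case $len in
      '') return 0 ;;                         # clean end of input
      000[0-3]) continue ;;                   # flush/delim/response-end: no payload
      [0-9a-f][0-9a-f][0-9a-f][0-9a-f]) ;;    # ordinary data pkt-line
      *) echo "malformed pkt-line header: $len" >&2; return 1 ;;
    esac
    # Length includes the 4 header bytes; $(...) drops a trailing LF.
    payload=$(dd bs=1 count=$((0x$len - 4)) 2>/dev/null)
    case $payload in
      'ERR '*) echo "server error: ${payload#ERR }" >&2; return 1 ;;
    esac
    oid=${payload%% *}
    size=${payload#"$oid"}
    size=${size# }
    case $oid in ''|*[!0-9a-f]*) continue ;; esac    # capability/attribute line
    case ${#oid} in 40|64) ;; *) continue ;; esac    # SHA-1 or SHA-256 length
    case $size in
      ''|*[!0-9]*) echo "$oid missing" ;;
      *) echo "$oid present (size $size)" ;;
    esac
  done
}
```

Usage: `report_object_info <output`.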

Not all servers may support the extension. For example, I think GitHub's
servers have disabled it.

-Peff
