* Subject: [RFC PATCH] upload_pack.c: make deepen-not more tree-ish
@ 2023-02-10 21:31 Andrew Wansink
2023-02-11 22:23 ` Andrew Wansink
0 siblings, 1 reply; 4+ messages in thread
From: Andrew Wansink @ 2023-02-10 21:31 UTC (permalink / raw)
To: git
This unlocks `git clone --shallow-exclude=<commit-sha1>`
git-clone only accepts --shallow-excude arguments where
the argument is a branch or tag because upload_pack only
searches deepen-not arguments for branches and tags.
Make process_deepen_not search for commit objects if no
branch or tag is found then add them to the deepen_not
list.
Signed-off-by: Andrew Wansink <wansink@uber.com>
---
At Uber we have a lot of patches in CI simultaneously,
the CI jobs will frequently clone the monorepo multiple
times for each patch. They do this to calculate diffs
between a patch and its parent commit.
One optimisation in this flow is to clone only to a specific
depth, this may or may not work, depending on how old the
patch is. In this case we have to --unshallow or discard
the shallow clone and fully clone the repo.
This patch would allow us to clone to exactly the depth we
need to find a patch's parent commit.
t/t5500-fetch-pack.sh | 30 ++++++++++++++++++++++++++++++
upload-pack.c | 35 +++++++++++++++++++++++++++++++----
2 files changed, 61 insertions(+), 4 deletions(-)
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index d18f2823d86..8d5045cc1b9 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -899,6 +899,36 @@ test_expect_success 'shallow clone exclude tag two' '
)
'
+test_expect_success 'shallow clone exclude commit' '
+ test_create_repo shallow-exclude-commit &&
+ (
+ cd shallow-exclude-commit &&
+ test_commit one &&
+ test_commit two &&
+ test_commit three &&
+ commit_two_sha1=$(git log -n 1 --pretty=tformat:%h HEAD^) &&
+ git clone --shallow-exclude=${commit_two_sha1} "file://$(pwd)/."
../shallow3-by-commit &&
+ git -C ../shallow3-by-commit log --pretty=tformat:%s HEAD >actual &&
+ git log -n 1 --pretty=tformat:%s HEAD >expected &&
+ test_cmp expected actual
+ )
+'
+
+test_expect_success 'shallow clone exclude commit^' '
+ test_create_repo shallow-exclude-commit-carat &&
+ (
+ cd shallow-exclude-commit-carat &&
+ test_commit one &&
+ test_commit two &&
+ test_commit three &&
+ commit_two_sha1=$(git log -n 1 --pretty=tformat:%h HEAD^) &&
+ git clone --shallow-exclude=${commit_two_sha1}^ "file://$(pwd)/."
../shallow23-by-commit &&
+ git -C ../shallow23-by-commit log --pretty=tformat:%s HEAD >actual &&
+ git log -n 2 --pretty=tformat:%s HEAD >expected &&
+ test_cmp expected actual
+ )
+'
+
test_expect_success 'fetch exclude tag one' '
git -C shallow12 fetch --shallow-exclude one origin &&
git -C shallow12 log --pretty=tformat:%s origin/main >actual &&
diff --git a/upload-pack.c b/upload-pack.c
index 551f22ffa5d..0c8594f4744 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -985,10 +985,37 @@ static int process_deepen_not(const char *line,
struct string_list *deepen_not,
if (skip_prefix(line, "deepen-not ", &arg)) {
char *ref = NULL;
struct object_id oid;
- if (expand_ref(the_repository, arg, strlen(arg), &oid, &ref) != 1)
- die("git upload-pack: ambiguous deepen-not: %s", line);
- string_list_append(deepen_not, ref);
- free(ref);
+
+ switch (expand_ref(the_repository, arg, strlen(arg), &oid, &ref)) {
+ case 1:
+ // tag or branch matching arg found
+ string_list_append(deepen_not, ref);
+ free(ref);
+ break;
+ case 0: {
+ // no tags or branches matching arg
+ struct object *obj = NULL;
+ struct commit *commit = NULL;
+
+ if (get_oid(arg, &oid))
+ die("git upload-pack: deepen-not: no ref or object %s", arg);
+
+ obj = parse_object(the_repository, &oid);
+ if (!obj)
+ die("git upload-pack: deepen-not: object could not be parsed: %s", arg);
+
+ commit = (struct commit *)peel_to_type(arg, 0, obj, OBJ_COMMIT);
+ if (!commit)
+ die("git upload-pack: deepen-not: object not a commit: %s", arg);
+
+ string_list_append(deepen_not, oid_to_hex(&commit->object.oid));
+ break;
+ }
+ default:
+ // more than 1 tag or branch matches arg
+ die("git upload-pack: ambiguous deepen-not: %s", arg);
+ }
+
*deepen_rev_list = 1;
return 1;
}
--
2.39.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [RFC PATCH] upload_pack.c: make deepen-not more tree-ish
2023-02-10 21:31 Subject: [RFC PATCH] upload_pack.c: make deepen-not more tree-ish Andrew Wansink
@ 2023-02-11 22:23 ` Andrew Wansink
2023-02-11 22:40 ` Andrew Wansink
[not found] ` <CAL3xRKdCkAAR0r3jyKFy+TtUi65LQcHaste=2WCqYHtwi8cUhw@mail.gmail.com>
0 siblings, 2 replies; 4+ messages in thread
From: Andrew Wansink @ 2023-02-11 22:23 UTC (permalink / raw)
To: git; +Cc: Andrew Wansink
This unlocks `git clone --shallow-exclude=<commit-sha1>`
git-clone only accepts --shallow-excude arguments where
the argument is a branch or tag because upload_pack only
searches deepen-not arguments for branches and tags.
Make process_deepen_not search for commit objects if no
branch or tag is found then add them to the deepen_not
list.
Signed-off-by: Andrew Wansink <wansink@uber.com>
---
At Uber we have a lot of patches in CI simultaneously,
the CI jobs will frequently clone the monorepo multiple
times for each patch. They do this to calculate diffs
between a patch and its parent commit.
One optimisation in this flow is to clone only to a specific
depth, this may or may not work, depending on how old the
patch is. In this case we have to --unshallow or discard
the shallow clone and fully clone the repo.
This patch would allow us to clone to exactly the depth we
need to find a patch's parent commit.
t/t5500-fetch-pack.sh | 30 ++++++++++++++++++++++++++++++
upload-pack.c | 35 +++++++++++++++++++++++++++++++----
2 files changed, 61 insertions(+), 4 deletions(-)
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index d18f2823d86..8d5045cc1b9 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -899,6 +899,36 @@ test_expect_success 'shallow clone exclude tag two' '
)
'
+test_expect_success 'shallow clone exclude commit' '
+ test_create_repo shallow-exclude-commit &&
+ (
+ cd shallow-exclude-commit &&
+ test_commit one &&
+ test_commit two &&
+ test_commit three &&
+ commit_two_sha1=$(git log -n 1 --pretty=tformat:%h HEAD^) &&
+ git clone --shallow-exclude=${commit_two_sha1} "file://$(pwd)/." ../shallow3-by-commit &&
+ git -C ../shallow3-by-commit log --pretty=tformat:%s HEAD >actual &&
+ git log -n 1 --pretty=tformat:%s HEAD >expected &&
+ test_cmp expected actual
+ )
+'
+
+test_expect_success 'shallow clone exclude commit^' '
+ test_create_repo shallow-exclude-commit-carat &&
+ (
+ cd shallow-exclude-commit-carat &&
+ test_commit one &&
+ test_commit two &&
+ test_commit three &&
+ commit_two_sha1=$(git log -n 1 --pretty=tformat:%h HEAD^) &&
+ git clone --shallow-exclude=${commit_two_sha1}^ "file://$(pwd)/." ../shallow23-by-commit &&
+ git -C ../shallow23-by-commit log --pretty=tformat:%s HEAD >actual &&
+ git log -n 2 --pretty=tformat:%s HEAD >expected &&
+ test_cmp expected actual
+ )
+'
+
test_expect_success 'fetch exclude tag one' '
git -C shallow12 fetch --shallow-exclude one origin &&
git -C shallow12 log --pretty=tformat:%s origin/main >actual &&
diff --git a/upload-pack.c b/upload-pack.c
index 551f22ffa5d..0c8594f4744 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -985,10 +985,37 @@ static int process_deepen_not(const char *line, struct string_list *deepen_not,
if (skip_prefix(line, "deepen-not ", &arg)) {
char *ref = NULL;
struct object_id oid;
- if (expand_ref(the_repository, arg, strlen(arg), &oid, &ref) != 1)
- die("git upload-pack: ambiguous deepen-not: %s", line);
- string_list_append(deepen_not, ref);
- free(ref);
+
+ switch (expand_ref(the_repository, arg, strlen(arg), &oid, &ref)) {
+ case 1:
+ // tag or branch matching arg found
+ string_list_append(deepen_not, ref);
+ free(ref);
+ break;
+ case 0: {
+ // no tags or branches matching arg
+ struct object *obj = NULL;
+ struct commit *commit = NULL;
+
+ if (get_oid(arg, &oid))
+ die("git upload-pack: deepen-not: no ref or object %s", arg);
+
+ obj = parse_object(the_repository, &oid);
+ if (!obj)
+ die("git upload-pack: deepen-not: object could not be parsed: %s", arg);
+
+ commit = (struct commit *)peel_to_type(arg, 0, obj, OBJ_COMMIT);
+ if (!commit)
+ die("git upload-pack: deepen-not: object not a commit: %s", arg);
+
+ string_list_append(deepen_not, oid_to_hex(&commit->object.oid));
+ break;
+ }
+ default:
+ // more than 1 tag or branch matches arg
+ die("git upload-pack: ambiguous deepen-not: %s", arg);
+ }
+
*deepen_rev_list = 1;
return 1;
}
--
2.39.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [RFC PATCH] upload_pack.c: make deepen-not more tree-ish
2023-02-11 22:23 ` Andrew Wansink
@ 2023-02-11 22:40 ` Andrew Wansink
[not found] ` <CAL3xRKdCkAAR0r3jyKFy+TtUi65LQcHaste=2WCqYHtwi8cUhw@mail.gmail.com>
1 sibling, 0 replies; 4+ messages in thread
From: Andrew Wansink @ 2023-02-11 22:40 UTC (permalink / raw)
To: git
Sorry to spam the list with this patch twice, I failed to follow the
instructions correctly the first time and sent a diff with the
whitespace stipped.
- Andrew
On Sat, Feb 11, 2023 at 2:23 PM Andrew Wansink <andy@halogix.com> wrote:
>
> This unlocks `git clone --shallow-exclude=<commit-sha1>`
>
> git-clone only accepts --shallow-excude arguments where
> the argument is a branch or tag because upload_pack only
> searches deepen-not arguments for branches and tags.
>
> Make process_deepen_not search for commit objects if no
> branch or tag is found then add them to the deepen_not
> list.
>
> Signed-off-by: Andrew Wansink <wansink@uber.com>
> ---
>
> At Uber we have a lot of patches in CI simultaneously,
> the CI jobs will frequently clone the monorepo multiple
> times for each patch. They do this to calculate diffs
> between a patch and its parent commit.
>
> One optimisation in this flow is to clone only to a specific
> depth, this may or may not work, depending on how old the
> patch is. In this case we have to --unshallow or discard
> the shallow clone and fully clone the repo.
>
> This patch would allow us to clone to exactly the depth we
> need to find a patch's parent commit.
>
> t/t5500-fetch-pack.sh | 30 ++++++++++++++++++++++++++++++
> upload-pack.c | 35 +++++++++++++++++++++++++++++++----
> 2 files changed, 61 insertions(+), 4 deletions(-)
>
> diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
> index d18f2823d86..8d5045cc1b9 100755
> --- a/t/t5500-fetch-pack.sh
> +++ b/t/t5500-fetch-pack.sh
> @@ -899,6 +899,36 @@ test_expect_success 'shallow clone exclude tag two' '
> )
> '
>
> +test_expect_success 'shallow clone exclude commit' '
> + test_create_repo shallow-exclude-commit &&
> + (
> + cd shallow-exclude-commit &&
> + test_commit one &&
> + test_commit two &&
> + test_commit three &&
> + commit_two_sha1=$(git log -n 1 --pretty=tformat:%h HEAD^) &&
> + git clone --shallow-exclude=${commit_two_sha1} "file://$(pwd)/." ../shallow3-by-commit &&
> + git -C ../shallow3-by-commit log --pretty=tformat:%s HEAD >actual &&
> + git log -n 1 --pretty=tformat:%s HEAD >expected &&
> + test_cmp expected actual
> + )
> +'
> +
> +test_expect_success 'shallow clone exclude commit^' '
> + test_create_repo shallow-exclude-commit-carat &&
> + (
> + cd shallow-exclude-commit-carat &&
> + test_commit one &&
> + test_commit two &&
> + test_commit three &&
> + commit_two_sha1=$(git log -n 1 --pretty=tformat:%h HEAD^) &&
> + git clone --shallow-exclude=${commit_two_sha1}^ "file://$(pwd)/." ../shallow23-by-commit &&
> + git -C ../shallow23-by-commit log --pretty=tformat:%s HEAD >actual &&
> + git log -n 2 --pretty=tformat:%s HEAD >expected &&
> + test_cmp expected actual
> + )
> +'
> +
> test_expect_success 'fetch exclude tag one' '
> git -C shallow12 fetch --shallow-exclude one origin &&
> git -C shallow12 log --pretty=tformat:%s origin/main >actual &&
> diff --git a/upload-pack.c b/upload-pack.c
> index 551f22ffa5d..0c8594f4744 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -985,10 +985,37 @@ static int process_deepen_not(const char *line, struct string_list *deepen_not,
> if (skip_prefix(line, "deepen-not ", &arg)) {
> char *ref = NULL;
> struct object_id oid;
> - if (expand_ref(the_repository, arg, strlen(arg), &oid, &ref) != 1)
> - die("git upload-pack: ambiguous deepen-not: %s", line);
> - string_list_append(deepen_not, ref);
> - free(ref);
> +
> + switch (expand_ref(the_repository, arg, strlen(arg), &oid, &ref)) {
> + case 1:
> + // tag or branch matching arg found
> + string_list_append(deepen_not, ref);
> + free(ref);
> + break;
> + case 0: {
> + // no tags or branches matching arg
> + struct object *obj = NULL;
> + struct commit *commit = NULL;
> +
> + if (get_oid(arg, &oid))
> + die("git upload-pack: deepen-not: no ref or object %s", arg);
> +
> + obj = parse_object(the_repository, &oid);
> + if (!obj)
> + die("git upload-pack: deepen-not: object could not be parsed: %s", arg);
> +
> + commit = (struct commit *)peel_to_type(arg, 0, obj, OBJ_COMMIT);
> + if (!commit)
> + die("git upload-pack: deepen-not: object not a commit: %s", arg);
> +
> + string_list_append(deepen_not, oid_to_hex(&commit->object.oid));
> + break;
> + }
> + default:
> + // more than 1 tag or branch matches arg
> + die("git upload-pack: ambiguous deepen-not: %s", arg);
> + }
> +
> *deepen_rev_list = 1;
> return 1;
> }
> --
> 2.39.1
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFC PATCH] upload_pack.c: make deepen-not more tree-ish
[not found] ` <CAL3xRKdCkAAR0r3jyKFy+TtUi65LQcHaste=2WCqYHtwi8cUhw@mail.gmail.com>
@ 2023-02-12 14:12 ` Son Luong Ngoc
0 siblings, 0 replies; 4+ messages in thread
From: Son Luong Ngoc @ 2023-02-12 14:12 UTC (permalink / raw)
To: wansink; +Cc: git
Re-send to the Git mailing-list as setting a font on gmail switched
plain-text to HTML and thus, got blocked by mailing-list.
On Sun, Feb 12, 2023 at 3:09 PM Son Luong Ngoc <sluongng@gmail.com> wrote:
>
> Hi Andrew,
>
> On Sat, Feb 11, 2023 at 11:49 PM Andrew Wansink <andy@halogix.com> wrote:
> >
> > This unlocks `git clone --shallow-exclude=<commit-sha1>`
> >
> > git-clone only accepts --shallow-excude arguments where
> > the argument is a branch or tag because upload_pack only
> > searches deepen-not arguments for branches and tags.
> >
> > Make process_deepen_not search for commit objects if no
> > branch or tag is found then add them to the deepen_not
> > list.
> >
> > Signed-off-by: Andrew Wansink <wansink@uber.com>
> > ---
> >
> > At Uber we have a lot of patches in CI simultaneously,
> > the CI jobs will frequently clone the monorepo multiple
> > times for each patch. They do this to calculate diffs
> > between a patch and its parent commit.
> >
>
> I used to manage a CI system that support monorepo use cases not so long ago.
> We had several hosts(VM/Baremetal) on which we spin up containers for CI to run.
>
> We maintain a bare copy of the monorepo on the host level (cron job / systemd / DaemonSet) and mount this as read-only into each of the CI containers.
>
> Each of the CI containers would attempt to clone/fetch the monorepo with `--reference-if-able ./path/to/read-only-mount/repo.git` (1)
> So that most of the needed objects are already on disk in the shared bare repo.
>
>
> +-----------+ +-----------+ +-----------+
> | container | | container | | container |
> +-----------+ +-----------+ +-----------+
> \ | /
> (mount) \ | /
> +------------+ +--------+
> | bare-repo | <-------------- | Remote |
> +------------+ (git-fetch) +--------+
> |
> | (maintain)
> |
> +----------+
> | cron-job |
> +----------+
>
> (forgive my horrible drawing)
>
> With this setup, we did not have a need to shallow clone any longer,
> and our git-clone in each container is simply a combination of git-ls-remote and a very light-weighted git-fetch.
> In some cases, such as a job in the later stages of a CI pipeline,
> the host would already download all the needed objects into the bare copy of the repository.
> This lets us skip git-fetch entirely when the CI container executes.
>
> Compared to the shallow clone approach,
> our "local cache" approach sped up the clone speed drastically
> while allowing developers to interact with git history inside tests a lot easier.
>
> > One optimisation in this flow is to clone only to a specific
> > depth, this may or may not work, depending on how old the
> > patch is. In this case we have to --unshallow or discard
> > the shallow clone and fully clone the repo.
> >
> > This patch would allow us to clone to exactly the depth we
> > need to find a patch's parent commit.
>
> Hope it helps,
> Son Luong.
>
> (1): https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---reference-if-ableltrepositorygt
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-02-12 14:12 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-02-10 21:31 Subject: [RFC PATCH] upload_pack.c: make deepen-not more tree-ish Andrew Wansink
2023-02-11 22:23 ` Andrew Wansink
2023-02-11 22:40 ` Andrew Wansink
[not found] ` <CAL3xRKdCkAAR0r3jyKFy+TtUi65LQcHaste=2WCqYHtwi8cUhw@mail.gmail.com>
2023-02-12 14:12 ` Son Luong Ngoc
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).