git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] revision: add `--ignore-missing-links` user option
@ 2023-09-08 17:42 Karthik Nayak
  2023-09-08 19:19 ` Junio C Hamano
  2023-09-12 15:58 ` [PATCH v2] " Karthik Nayak
  0 siblings, 2 replies; 20+ messages in thread
From: Karthik Nayak @ 2023-09-08 17:42 UTC
  To: git; +Cc: Karthik Nayak

The revision backend is used by multiple porcelain commands such as
git-rev-list(1) and git-log(1). The backend currently supports ignoring
missing links by setting the `ignore_missing_links` bit. This allows the
revision walk to skip any object links which are missing.

Currently there is no way to use git-rev-list(1) to traverse the objects
of the main object directory (GIT_OBJECT_DIRECTORY) and print the
boundary objects when moving from the main object directory to the
alternate object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES).

By exposing this new flag `--ignore-missing-links`, users can set the
required env variables (GIT_OBJECT_DIRECTORY and
GIT_ALTERNATE_OBJECT_DIRECTORIES) along with the `--boundary` flag to
find the boundary objects between object directories.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/rev-list-options.txt |  5 ++++
 revision.c                         |  2 ++
 t/t6022-rev-list-alternates.sh     | 43 ++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+)
 create mode 100755 t/t6022-rev-list-alternates.sh

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index a4a0cb93b2..a0b48db8a8 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -227,6 +227,11 @@ explicitly.
 	Upon seeing an invalid object name in the input, pretend as if
 	the bad input was not given.
 
+--ignore-missing-links::
+	When an object points to another object that is missing, pretend as if the
+	link did not exist. These missing links are not written to stdout unless
+	the --boundary flag is passed.
+
 ifndef::git-rev-list[]
 --bisect::
 	Pretend as if the bad bisection ref `refs/bisect/bad`
diff --git a/revision.c b/revision.c
index 2f4c53ea20..cbfcbf6e28 100644
--- a/revision.c
+++ b/revision.c
@@ -2595,6 +2595,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->limited = 1;
 	} else if (!strcmp(arg, "--ignore-missing")) {
 		revs->ignore_missing = 1;
+	} else if (!strcmp(arg, "--ignore-missing-links")) {
+		revs->ignore_missing_links = 1;
 	} else if (opt && opt->allow_exclude_promisor_objects &&
 		   !strcmp(arg, "--exclude-promisor-objects")) {
 		if (fetch_if_missing)
diff --git a/t/t6022-rev-list-alternates.sh b/t/t6022-rev-list-alternates.sh
new file mode 100755
index 0000000000..626ebb2dce
--- /dev/null
+++ b/t/t6022-rev-list-alternates.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+
+test_description='handling of alternates in rev-list'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+# We create 5 commits and move them to the alt directory and
+# create 5 more commits which will stay in the main odb.
+test_expect_success 'create repository and alternate directory' '
+	git init main &&
+	test_commit_bulk -C main 5 &&
+	mkdir alt &&
+	mv main/.git/objects/* alt &&
+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_commit_bulk --start=6 -C main 5
+'
+
+# When the alternate odb is provided, all commits are listed.
+test_expect_success 'rev-list passes with alternate object directory' '
+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_stdout_line_count = 10 git -C main rev-list HEAD
+'
+
+# When the alternate odb is not provided, rev-list fails since the 5th commit's
+# parent is not present in the main odb.
+test_expect_success 'rev-list fails without alternate object directory' '
+	test_must_fail git -C main rev-list HEAD
+'
+
+# With `--ignore-missing-links`, we stop the traversal when we encounter a
+# missing link.
+test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
+	test_stdout_line_count = 5 git -C main rev-list --ignore-missing-links HEAD
+'
+
+# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
+# commits.
+test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
+	git -C main rev-list --ignore-missing-links --boundary HEAD >list-output &&
+	test_stdout_line_count = 6 cat list-output &&
+	test_stdout_line_count = 1 cat list-output | grep "^-"
+'
+
+test_done
-- 
2.41.0



* Re: [PATCH] revision: add `--ignore-missing-links` user option
  2023-09-08 17:42 [PATCH] revision: add `--ignore-missing-links` user option Karthik Nayak
@ 2023-09-08 19:19 ` Junio C Hamano
  2023-09-12 14:42   ` Karthik Nayak
  2023-09-12 15:58 ` [PATCH v2] " Karthik Nayak
  1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2023-09-08 19:19 UTC
  To: Karthik Nayak; +Cc: git

Karthik Nayak <karthik.188@gmail.com> writes:

> The revision backend is used by multiple porcelain commands such as
> git-rev-list(1) and git-log(1). The backend currently supports ignoring
> missing links by setting the `ignore_missing_links` bit. This allows the
> revision walk to skip any object links which are missing.

> Currently there is no way to use git-rev-list(1) to traverse the objects
> of the main object directory (GIT_OBJECT_DIRECTORY) and print the
> boundary objects when moving from the main object directory to the
> alternate object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES).

The above description needs to be tightened up a bit, I think.

What is left unsaid is that you arranged a repository to borrow from
an alternate object directory (or two), and plan to walk objects
with this bit on in the repository, while leaving the alternates
disabled.  Without stating that you plan to disable the alternates
while this mode of operation happens, nothing would happen when the
traversal goes from the main to the alternate because no links are
broken, no?

> By exposing this new flag `--ignore-missing-links`, users can set the
> required env variables (GIT_OBJECT_DIRECTORY and
> GIT_ALTERNATE_OBJECT_DIRECTORIES) along with the `--boundary` flag to
> find the boundary objects between object directories.

Since this command is plumbing, there is not much reason to object to
surfacing, as a command-line option, a feature that already exists
internally.  Having said that,

 * Suppose your traversal with --ignore-missing-links from the tip
   of a branch reaches a tree object A, and the tree object A has a
   link to a blob B and a blob C.  But B is in a separate object
   store that you usually access via the alternate mechanism.
   Instead of barfing "The repository is corrupt---object A points
   at object B that does not exist", we pretend that A does not have
   the link to B and keep traversing, discovering C and other
   objects.

   That much we can read from the above and also the documentation
   part of the patch.  The interaction with --boundary needs to be
   clarified in this description and the documentation, though.  It
   is unclear if you show 'A' or 'B' in this scenario.

 * Some traversals use the ignore-missing-links bit implicitly and
   currently there is no way to turn it off.  Is it plausible that
   user may want to explicitly toggle it off, with the option
   negated, i.e. --no-ignore-missing-links?  I do not immediately
   see the utility of such an option, but that is only due to my
   lack of imagination.  For now, I think it makes sense not to
   allow negating this option, until somebody comes up with a useful
   use case.
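
The scenario in the first bullet can be sketched with a toy model. This
is not git: the "object store" is a temporary directory, a tree is a
file listing the ids it links to, and `list_objects` is a hypothetical
helper that either fails on a missing link or pretends the link is not
there.

```shell
#!/bin/sh
# Toy model, not git: a "tree" is a file listing the ids it links to,
# one per line; a "blob" is just a file that exists.
odb=$(mktemp -d)
trap 'rm -rf "$odb"' EXIT
printf 'B\nC\n' >"$odb/A"
: >"$odb/C"          # B is deliberately absent from this store

# list_objects TREE IGNORE_MISSING_LINKS
list_objects () {
	tree=$1 ignore=$2
	echo "$tree"
	while read -r entry
	do
		if test -e "$odb/$entry"
		then
			echo "$entry"
		elif test "$ignore" != yes
		then
			echo "fatal: $tree points at missing object $entry" >&2
			return 1
		fi
		# otherwise: pretend the link to $entry does not exist
	done <"$odb/$tree"
}

list_objects A yes                                       # prints A, then C
list_objects A no 2>/dev/null || echo "traversal failed" # prints A, then fails
```

With the flag on, the walk reports A and C and never mentions B; with
it off, the first missing link aborts the walk.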

> +--ignore-missing-links::
> +	When an object points to another object that is missing, pretend as if the
> +	link did not exist. These missing links are not written to stdout unless
> +	the --boundary flag is passed.

Does "git rev-list" ever write "links"?  I thought not.

"These missing objects are not written" would be more sensible, but
we never write missing objects with or without the option, so it
is not even worth saying.

When "--boundary" is passed, do they appear as if they are
available?  If not, then the above description is very misleading.

    During traversal, if an object that is referenced does not
    exist, pretend as if the reference itself does not exist,
    instead of dying of a repository corruption.  Running the
    command with the "--boundary" option makes these missing
    objects, together with the objects on the edge of revision
    ranges (i.e. true boundary objects), appear on the output,
    prefixed with '-'.

or something like that, perhaps?
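
A minimal sketch of the behaviour that wording describes, again as a
toy model rather than git itself. Each "commit" is a file whose content
is its parent's id, `main` and `alt` stand in for the main and alternate
object directories, and `walk` with its four arguments is a hypothetical
helper.

```shell
#!/bin/sh
# Toy model, not git: each "commit" is a file named after its id whose
# content is the id of its parent (empty for the root commit).
store=$(mktemp -d)
trap 'rm -rf "$store"' EXIT
mkdir "$store/main" "$store/alt"
printf ''   >"$store/alt/c1"
printf 'c1' >"$store/alt/c2"
printf 'c2' >"$store/main/c3"
printf 'c3' >"$store/main/c4"

# walk TIP USE_ALTERNATES IGNORE_MISSING_LINKS BOUNDARY
walk () {
	id=$1 use_alt=$2 ignore=$3 boundary=$4
	while test -n "$id"
	do
		if test -f "$store/main/$id"
		then
			file=$store/main/$id
		elif test "$use_alt" = yes && test -f "$store/alt/$id"
		then
			file=$store/alt/$id
		elif test "$ignore" = yes
		then
			# Pretend the link does not exist; optionally report it.
			test "$boundary" = yes && printf '%s\n' "-$id"
			return 0
		else
			echo "fatal: missing object $id" >&2
			return 1
		fi
		echo "$id"
		id=$(cat "$file")
	done
}

walk c4 yes no no     # alternates enabled: c4 c3 c2 c1
walk c4 no yes no     # alternates disabled, links ignored: c4 c3
walk c4 no yes yes    # same, with boundary: c4 c3 -c2
walk c4 no no no 2>/dev/null || echo "walk failed"
```

With the alternate enabled no link is broken, so the ignore flag has
nothing to do; only once the alternate is disabled does the missing
parent show up, prefixed with '-'.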

> +# With `--ignore-missing-links`, we stop the traversal when we encounter a
> +# missing link.
> +test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
> +	test_stdout_line_count = 5 git -C main rev-list --ignore-missing-links HEAD
> +'
> +
> +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
> +# commits.
> +test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
> +	git -C main rev-list --ignore-missing-links --boundary HEAD >list-output &&
> +	test_stdout_line_count = 6 cat list-output &&
> +	test_stdout_line_count = 1 cat list-output | grep "^-"
> +'

These tests are way too loose.  Not only do you want to see a certain
number of boundary objects, you _know_ exactly which object should
be on the boundary, and you should check that instead.  That will
allow you to catch a mistake such as writing commit 'A', which refers
to a missing commit 'B', when the intent was to write the missing
commit 'B' as a boundary object, for example.
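
As a rough illustration of the difference between counting boundary
lines and checking for a specific one, the sketch below uses made-up
ids and a hand-written output file rather than real rev-list output;
`has_boundary` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical stand-ins for real object ids and for rev-list output.
missing=1111111
out=$(mktemp)
trap 'rm -f "$out"' EXIT
printf '%s\n' 3333333 2222222 "-$missing" >"$out"

# has_boundary FILE OID: succeed only if "-OID" is a complete line of FILE.
# -F fixed string, -x whole line, -q quiet; -e keeps the leading '-' of
# the pattern from being parsed as an option.
has_boundary () {
	grep -Fxq -e "-$2" "$1"
}

has_boundary "$out" "$missing" && echo "boundary is $missing"
has_boundary "$out" 2222222 || echo "2222222 is not a boundary"
```

A count of lines starting with '-' would pass for any single boundary
id, whereas the exact match fails if the wrong commit is marked.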

Thanks.



* Re: [PATCH] revision: add `--ignore-missing-links` user option
  2023-09-08 19:19 ` Junio C Hamano
@ 2023-09-12 14:42   ` Karthik Nayak
  0 siblings, 0 replies; 20+ messages in thread
From: Karthik Nayak @ 2023-09-12 14:42 UTC
  To: Junio C Hamano; +Cc: git

On Fri, Sep 8, 2023 at 9:19 PM Junio C Hamano <gitster@pobox.com> wrote:
> The above description needs to be tightened up a bit, I think.
>
> What is left unsaid is that you arranged a repository to borrow from
> an alternate object directory (or two), and plan to walk objects
> with this bit on in the repository, while leaving the alternates
> disabled.  Without stating that you plan to disable the alternates
> while this mode of operation happens, nothing would happen when the
> traversal goes from the main to the alternate because no links are
> broken, no?
>

Fair enough, I agree with your points. I'll amend the message to highlight this
scenario.

> > By exposing this new flag `--ignore-missing-links`, users can set the
> > required env variables (GIT_OBJECT_DIRECTORY and
> > GIT_ALTERNATE_OBJECT_DIRECTORIES) along with the `--boundary` flag to
> > find the boundary objects between object directories.
>
> This command being a plumbing, there is not much reason to object to
> surfacing features that already internally exist to the command line
> option.    Having said that,
>
>  * Suppose your traversal with --ignore-missing-links from the tip
>    of a branch reaches a tree object A, and the tree object A has a
>    link to a blob B and a blob C.  But B is in a separate object
>    store that you usually access via the alternate mechanism.
>    Instead of barfing "The repository is corrupt---object A points
>    at object B that does not exist", we pretend that A does not have
>    the link to B and keep traversing, discovering C and other
>    objects.
>
>    That much we can read from the above and also the documentation
>    part of the patch.  The interaction with --boundary needs to be
>    clarified in this description and the documentation, though.  It
>    is unclear if you show 'A' or 'B' in this scenario.

Do note that the `--boundary` option only works with commits. With that in
mind, `--ignore-missing-links` used together with `--boundary` doesn't even
traverse non-commit objects, which means corrupted trees/blobs shouldn't matter.

But I did realize that `--ignore-missing-links`, as this patch stands, is broken
when used alongside the `--objects` flag (`--boundary` doesn't work with
`--objects` at the moment; this is something I plan to tackle soon after with a
`--boundary-objects` flag). The second version of this patch will have a fix to
ensure that even missing non-commit objects are ignored during traversal if the
`--objects` option is used.

>
>  * Some traversals use the ignore-missing-links bit implicitly and
>    currently there is no way to turn it off.  Is it plausible that
>    user may want to explicitly toggle it off, with the option
>    negated, i.e. --no-ignore-missing-links?  I do not immediately
>    see the utility of such an option, but that is only due to my
>    lack of imagination.  For now, I think it makes sense not to
>    allow negating this option, until somebody comes up with a useful
>    use case.
>

Agreed!

> > +--ignore-missing-links::
> > +     When an object points to another object that is missing, pretend as if the
> > +     link did not exist. These missing links are not written to stdout unless
> > +     the --boundary flag is passed.
>
> Does "git rev-list" ever write "links"?  I thought not.
>
> "These missing objects are not written" would be more sensible, but
> we never write missing objects with or without the option, so it
> is not even worth saying.
>
> When "--boundary" is passed, do they appear as if they are
> available?  If not, then the above description is very misleading.
>
>     During traversal, if an object that is referenced does not
>     exist, pretend as if the reference itself does not exist,
>     instead of dying of a repository corruption.  Running the
>     command with the "--boundary" option makes these missing
>     objects, together with the objects on the edge of revision
>     ranges (i.e. true boundary objects), appear on the output,
>     prefixed with '-'.
>
> or something like that, perhaps?
>

This indeed is better, I've copied and modified it as needed.

> > +# With `--ignore-missing-links`, we stop the traversal when we encounter a
> > +# missing link.
> > +test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
> > +     test_stdout_line_count = 5 git -C main rev-list --ignore-missing-links HEAD
> > +'
> > +
> > +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
> > +# commits.
> > +test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
> > +     git -C main rev-list --ignore-missing-links --boundary HEAD >list-output &&
> > +     test_stdout_line_count = 6 cat list-output &&
> > +     test_stdout_line_count = 1 cat list-output | grep "^-"
> > +'
>
> These tests are way too loose.  Not only do you want to see a certain
> number of boundary objects, you _know_ exactly which object should
> be on the boundary, and you should check that instead.  That will
> allow you to catch a mistake such as writing commit 'A', which refers
> to a missing commit 'B', when the intent was to write the missing
> commit 'B' as a boundary object, for example.
>

Fair enough, I will make them more specific and add some tests for
missing trees/blobs.

> Thanks.

Thank you for the review. Will send the next version of the patch soon :)


* [PATCH v2] revision: add `--ignore-missing-links` user option
  2023-09-08 17:42 [PATCH] revision: add `--ignore-missing-links` user option Karthik Nayak
  2023-09-08 19:19 ` Junio C Hamano
@ 2023-09-12 15:58 ` Karthik Nayak
  2023-09-12 17:07   ` Taylor Blau
  2023-09-15  8:34   ` [PATCH v3] " Karthik Nayak
  1 sibling, 2 replies; 20+ messages in thread
From: Karthik Nayak @ 2023-09-12 15:58 UTC
  To: git; +Cc: gitster, Karthik Nayak

The revision backend is used by multiple porcelain commands such as
git-rev-list(1) and git-log(1). The backend currently supports ignoring
missing links by setting the `ignore_missing_links` bit. This allows the
revision walk to skip any object links which are missing. Expose this
bit via an `--ignore-missing-links` user option.

A scenario where this option would be used is to find the boundary
objects between different object directories. Consider a repository with
a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
repository, enabling this option along with the `--boundary` option
while disabling the alternate object directory allows us to find the
boundary objects between the main and alternate object directory.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---

Changes from v1:
1. Changes in the commit message and option description to be more specific
and list why and what the changes are.
2. Ensure the new option also works with the existing `--objects` options.
3. More specific testing for boundary commit.

Range diff against v1:

1:  c0a4dca9b0 ! 1:  e3f4d85732 revision: add `--ignore-missing-links` user option
    @@ Commit message
         The revision backend is used by multiple porcelain commands such as
         git-rev-list(1) and git-log(1). The backend currently supports ignoring
         missing links by setting the `ignore_missing_links` bit. This allows the
    -    revision walk to skip any objects links which are missing.
    +    revision walk to skip any objects links which are missing. Expose this
    +    bit via an `--ignore-missing-links` user option.
     
    -    Currently there is no way to use git-rev-list(1) to traverse the objects
    -    of the main object directory (GIT_OBJECT_DIRECTORY) and print the
    -    boundary objects when moving from the main object directory to the
    -    alternate object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES).
    -
    -    By exposing this new flag `--ignore-missing-links`, users can set the
    -    required env variables (GIT_OBJECT_DIRECTORY and
    -    GIT_ALTERNATE_OBJECT_DIRECTORIES) along with the `--boundary` flag to
    -    find the boundary objects between object directories.
    +    A scenario where this option would be used is to find the boundary
    +    objects between different object directories. Consider a repository with
    +    a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
    +    object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
    +    repository, enabling this option along with the `--boundary` option for
    +    while disabling the alternate object directory allows us to find the
    +    boundary objects between the main and alternate object directory.
     
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
     
    @@ Documentation/rev-list-options.txt: explicitly.
      	the bad input was not given.
      
     +--ignore-missing-links::
    -+	When an object points to another object that is missing, pretend as if the
    -+	link did not exist. These missing links are not written to stdout unless
    -+	the --boundary flag is passed.
    ++	During traversal, if an object that is referenced does not
    ++	exist, instead of dying of a repository corruption, pretend as
    ++	if the reference itself does not exist. Running the command
    ++	with the `--boundary` option makes these missing commits,
    ++	together with the commits on the edge of revision ranges
    ++	(i.e. true boundary objects), appear on the output, prefixed
    ++	with '-'.
     +
      ifndef::git-rev-list[]
      --bisect::
      	Pretend as if the bad bisection ref `refs/bisect/bad`
     
    + ## builtin/rev-list.c ##
    +@@ builtin/rev-list.c: static int finish_object(struct object *obj, const char *name UNUSED,
    + {
    + 	struct rev_list_info *info = cb_data;
    + 	if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
    +-		finish_object__ma(obj);
    ++		if (!info->revs->ignore_missing_links)
    ++			finish_object__ma(obj);
    + 		return 1;
    + 	}
    + 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
    +
      ## revision.c ##
     @@ revision.c: static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
      		revs->limited = 1;
    @@ t/t6022-rev-list-alternates.sh (new)
     +test_expect_success 'create repository and alternate directory' '
     +	git init main &&
     +	test_commit_bulk -C main 5 &&
    ++	BOUNDARY_COMMIT=$(git -C main rev-parse HEAD) &&
     +	mkdir alt &&
     +	mv main/.git/objects/* alt &&
     +	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_commit_bulk --start=6 -C main 5
     +'
     +
    -+# When the alternate odb is provided, all commits are listed.
    ++# when the alternate odb is provided, all commits are listed along with the boundary
    ++# commit.
     +test_expect_success 'rev-list passes with alternate object directory' '
    -+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_stdout_line_count = 10 git -C main rev-list HEAD
    ++	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
    ++	test_stdout_line_count = 10 cat actual &&
    ++	grep $BOUNDARY_COMMIT actual
     +'
     +
     +# When the alternate odb is not provided, rev-list fails since the 5th commit's
    @@ t/t6022-rev-list-alternates.sh (new)
     +'
     +
     +# With `--ignore-missing-links`, we stop the traversal when we encounter a
    -+# missing link.
    ++# missing link. The boundary commit is not listed as we haven't used the
    ++# `--boundary` options.
     +test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
    -+	test_stdout_line_count = 5 git -C main rev-list --ignore-missing-links HEAD
    ++	git -C main rev-list --ignore-missing-links HEAD >actual &&
    ++	test_stdout_line_count = 5 cat actual &&
    ++	! grep -$BOUNDARY_COMMIT actual
     +'
     +
     +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
     +# commits.
     +test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
    -+	git -C main rev-list --ignore-missing-links --boundary HEAD >list-output &&
    -+	test_stdout_line_count = 6 cat list-output &&
    -+	test_stdout_line_count = 1 cat list-output | grep "^-"
    ++	git -C main rev-list --ignore-missing-links --boundary HEAD >actual &&
    ++	test_stdout_line_count = 6 cat actual &&
    ++	grep -$BOUNDARY_COMMIT actual
    ++'
    ++
    ++# The `--ignore-missing-links` option should ensure that git-rev-list(1) doesn't
    ++# fail when used alongside `--objects` when a tree is missing.
    ++test_expect_success 'rev-list --ignore-missing-links works with missing tree' '
    ++	echo "foo" >main/file &&
    ++	git -C main add file &&
    ++	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 11" &&
    ++	TREE_OID=$(git -C main rev-parse HEAD^{tree}) &&
    ++	mkdir alt/${TREE_OID:0:2} &&
    ++	mv main/.git/objects/${TREE_OID:0:2}/${TREE_OID:2} alt/${TREE_OID:0:2}/ &&
    ++	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
    ++	! grep $TREE_OID actual
    ++'
    ++
    ++# Similar to above, it should also work when a blob is missing.
    ++test_expect_success 'rev-list --ignore-missing-links works with missing blob' '
    ++	echo "bar" >main/file &&
    ++	git -C main add file &&
    ++	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 12" &&
    ++	BLOB_OID=$(git -C main rev-parse HEAD:file) &&
    ++	mkdir alt/${BLOB_OID:0:2} &&
    ++	mv main/.git/objects/${BLOB_OID:0:2}/${BLOB_OID:2} alt/${BLOB_OID:0:2}/ &&
    ++	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
    ++	! grep $BLOB_OID actual
     +'
     +
     +test_done


 Documentation/rev-list-options.txt |  9 ++++
 builtin/rev-list.c                 |  3 +-
 revision.c                         |  2 +
 t/t6022-rev-list-alternates.sh     | 75 ++++++++++++++++++++++++++++++
 4 files changed, 88 insertions(+), 1 deletion(-)
 create mode 100755 t/t6022-rev-list-alternates.sh

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index a4a0cb93b2..8ee713db3d 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -227,6 +227,15 @@ explicitly.
 	Upon seeing an invalid object name in the input, pretend as if
 	the bad input was not given.
 
+--ignore-missing-links::
+	During traversal, if an object that is referenced does not
+	exist, instead of dying of a repository corruption, pretend as
+	if the reference itself does not exist. Running the command
+	with the `--boundary` option makes these missing commits,
+	together with the commits on the edge of revision ranges
+	(i.e. true boundary objects), appear on the output, prefixed
+	with '-'.
+
 ifndef::git-rev-list[]
 --bisect::
 	Pretend as if the bad bisection ref `refs/bisect/bad`
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index ff715d6918..5239d83c76 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -266,7 +266,8 @@ static int finish_object(struct object *obj, const char *name UNUSED,
 {
 	struct rev_list_info *info = cb_data;
 	if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
-		finish_object__ma(obj);
+		if (!info->revs->ignore_missing_links)
+			finish_object__ma(obj);
 		return 1;
 	}
 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
diff --git a/revision.c b/revision.c
index 2f4c53ea20..cbfcbf6e28 100644
--- a/revision.c
+++ b/revision.c
@@ -2595,6 +2595,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->limited = 1;
 	} else if (!strcmp(arg, "--ignore-missing")) {
 		revs->ignore_missing = 1;
+	} else if (!strcmp(arg, "--ignore-missing-links")) {
+		revs->ignore_missing_links = 1;
 	} else if (opt && opt->allow_exclude_promisor_objects &&
 		   !strcmp(arg, "--exclude-promisor-objects")) {
 		if (fetch_if_missing)
diff --git a/t/t6022-rev-list-alternates.sh b/t/t6022-rev-list-alternates.sh
new file mode 100755
index 0000000000..08d9ffde5f
--- /dev/null
+++ b/t/t6022-rev-list-alternates.sh
@@ -0,0 +1,75 @@
+#!/bin/sh
+
+test_description='handling of alternates in rev-list'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+# We create 5 commits and move them to the alt directory and
+# create 5 more commits which will stay in the main odb.
+test_expect_success 'create repository and alternate directory' '
+	git init main &&
+	test_commit_bulk -C main 5 &&
+	BOUNDARY_COMMIT=$(git -C main rev-parse HEAD) &&
+	mkdir alt &&
+	mv main/.git/objects/* alt &&
+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_commit_bulk --start=6 -C main 5
+'
+
+# when the alternate odb is provided, all commits are listed along with the boundary
+# commit.
+test_expect_success 'rev-list passes with alternate object directory' '
+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
+	test_stdout_line_count = 10 cat actual &&
+	grep $BOUNDARY_COMMIT actual
+'
+
+# When the alternate odb is not provided, rev-list fails since the 5th commit's
+# parent is not present in the main odb.
+test_expect_success 'rev-list fails without alternate object directory' '
+	test_must_fail git -C main rev-list HEAD
+'
+
+# With `--ignore-missing-links`, we stop the traversal when we encounter a
+# missing link. The boundary commit is not listed as we haven't used the
+# `--boundary` options.
+test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
+	git -C main rev-list --ignore-missing-links HEAD >actual &&
+	test_stdout_line_count = 5 cat actual &&
+	! grep -$BOUNDARY_COMMIT actual
+'
+
+# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
+# commits.
+test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
+	git -C main rev-list --ignore-missing-links --boundary HEAD >actual &&
+	test_stdout_line_count = 6 cat actual &&
+	grep -$BOUNDARY_COMMIT actual
+'
+
+# The `--ignore-missing-links` option should ensure that git-rev-list(1) doesn't
+# fail when used alongside `--objects` when a tree is missing.
+test_expect_success 'rev-list --ignore-missing-links works with missing tree' '
+	echo "foo" >main/file &&
+	git -C main add file &&
+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 11" &&
+	TREE_OID=$(git -C main rev-parse HEAD^{tree}) &&
+	mkdir alt/${TREE_OID:0:2} &&
+	mv main/.git/objects/${TREE_OID:0:2}/${TREE_OID:2} alt/${TREE_OID:0:2}/ &&
+	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
+	! grep $TREE_OID actual
+'
+
+# Similar to above, it should also work when a blob is missing.
+test_expect_success 'rev-list --ignore-missing-links works with missing blob' '
+	echo "bar" >main/file &&
+	git -C main add file &&
+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 12" &&
+	BLOB_OID=$(git -C main rev-parse HEAD:file) &&
+	mkdir alt/${BLOB_OID:0:2} &&
+	mv main/.git/objects/${BLOB_OID:0:2}/${BLOB_OID:2} alt/${BLOB_OID:0:2}/ &&
+	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
+	! grep $BLOB_OID actual
+'
+
+test_done
-- 
2.41.0



* Re: [PATCH v2] revision: add `--ignore-missing-links` user option
  2023-09-12 15:58 ` [PATCH v2] " Karthik Nayak
@ 2023-09-12 17:07   ` Taylor Blau
  2023-09-13  9:32     ` Karthik Nayak
  2023-09-15  8:34   ` [PATCH v3] " Karthik Nayak
  1 sibling, 1 reply; 20+ messages in thread
From: Taylor Blau @ 2023-09-12 17:07 UTC
  To: Karthik Nayak; +Cc: git, gitster

On Tue, Sep 12, 2023 at 05:58:20PM +0200, Karthik Nayak wrote:
> The revision backend is used by multiple porcelain commands such as
> git-rev-list(1) and git-log(1). The backend currently supports ignoring
> missing links by setting the `ignore_missing_links` bit. This allows the
> revision walk to skip any object links which are missing. Expose this
> bit via an `--ignore-missing-links` user option.
>
> A scenario where this option would be used is to find the boundary
> objects between different object directories. Consider a repository with
> a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
> object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
> repository, enabling this option along with the `--boundary` option
> while disabling the alternate object directory allows us to find the
> boundary objects between the main and alternate object directory.
>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>
> Changes from v1:
> 1. Changes in the commit message and option description to be more specific
> and list why and what the changes are.
> 2. Ensure the new option also works with the existing `--objects` options.
> 3. More specific testing for boundary commit.
>
> Range diff against v1:
>
> 1:  c0a4dca9b0 ! 1:  e3f4d85732 revision: add `--ignore-missing-links` user option
>     @@ Commit message
>          The revision backend is used by multiple porcelain commands such as
>          git-rev-list(1) and git-log(1). The backend currently supports ignoring
>          missing links by setting the `ignore_missing_links` bit. This allows the
>     -    revision walk to skip any objects links which are missing.
>     +    revision walk to skip any objects links which are missing. Expose this
>     +    bit via an `--ignore-missing-links` user option.
>
>     -    Currently there is no way to use git-rev-list(1) to traverse the objects
>     -    of the main object directory (GIT_OBJECT_DIRECTORY) and print the
>     -    boundary objects when moving from the main object directory to the
>     -    alternate object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES).
>     -
>     -    By exposing this new flag `--ignore-missing-links`, users can set the
>     -    required env variables (GIT_OBJECT_DIRECTORY and
>     -    GIT_ALTERNATE_OBJECT_DIRECTORIES) along with the `--boundary` flag to
>     -    find the boundary objects between object directories.
>     +    A scenario where this option would be used is to find the boundary
>     +    objects between different object directories. Consider a repository with
>     +    a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
>     +    object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
>     +    repository, enabling this option along with the `--boundary` option
>     +    while disabling the alternate object directory allows us to find the
>     +    boundary objects between the main and alternate object directory.
>
>          Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
>
>     @@ Documentation/rev-list-options.txt: explicitly.
>       	the bad input was not given.
>
>      +--ignore-missing-links::
>     -+	When an object points to another object that is missing, pretend as if the
>     -+	link did not exist. These missing links are not written to stdout unless
>     -+	the --boundary flag is passed.
>     ++	During traversal, if an object that is referenced does not
>     ++	exist, instead of dying of a repository corruption, pretend as
>     ++	if the reference itself does not exist. Running the command
>     ++	with the `--boundary` option makes these missing commits,
>     ++	together with the commits on the edge of revision ranges
>     ++	(i.e. true boundary objects), appear on the output, prefixed
>     ++	with '-'.
>      +
>       ifndef::git-rev-list[]
>       --bisect::
>       	Pretend as if the bad bisection ref `refs/bisect/bad`
>
>     + ## builtin/rev-list.c ##
>     +@@ builtin/rev-list.c: static int finish_object(struct object *obj, const char *name UNUSED,
>     + {
>     + 	struct rev_list_info *info = cb_data;
>     + 	if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
>     +-		finish_object__ma(obj);
>     ++		if (!info->revs->ignore_missing_links)
>     ++			finish_object__ma(obj);
>     + 		return 1;
>     + 	}
>     + 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
>     +
>       ## revision.c ##
>      @@ revision.c: static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
>       		revs->limited = 1;
>     @@ t/t6022-rev-list-alternates.sh (new)
>      +test_expect_success 'create repository and alternate directory' '
>      +	git init main &&
>      +	test_commit_bulk -C main 5 &&
>     ++	BOUNDARY_COMMIT=$(git -C main rev-parse HEAD) &&
>      +	mkdir alt &&
>      +	mv main/.git/objects/* alt &&
>      +	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_commit_bulk --start=6 -C main 5
>      +'
>      +
>     -+# When the alternate odb is provided, all commits are listed.
>     ++# when the alternate odb is provided, all commits are listed along with the boundary
>     ++# commit.
>      +test_expect_success 'rev-list passes with alternate object directory' '
>     -+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_stdout_line_count = 10 git -C main rev-list HEAD
>     ++	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
>     ++	test_stdout_line_count = 10 cat actual &&
>     ++	grep $BOUNDARY_COMMIT actual
>      +'
>      +
>      +# When the alternate odb is not provided, rev-list fails since the 5th commit's
>     @@ t/t6022-rev-list-alternates.sh (new)
>      +'
>      +
>      +# With `--ignore-missing-links`, we stop the traversal when we encounter a
>     -+# missing link.
>     ++# missing link. The boundary commit is not listed as we haven't used the
>     ++# `--boundary` options.
>      +test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
>     -+	test_stdout_line_count = 5 git -C main rev-list --ignore-missing-links HEAD
>     ++	git -C main rev-list --ignore-missing-links HEAD >actual &&
>     ++	test_stdout_line_count = 5 cat actual &&
>     ++	! grep -$BOUNDARY_COMMIT actual
>      +'
>      +
>      +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
>      +# commits.
>      +test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
>     -+	git -C main rev-list --ignore-missing-links --boundary HEAD >list-output &&
>     -+	test_stdout_line_count = 6 cat list-output &&
>     -+	test_stdout_line_count = 1 cat list-output | grep "^-"
>     ++	git -C main rev-list --ignore-missing-links --boundary HEAD >actual &&
>     ++	test_stdout_line_count = 6 cat actual &&
>     ++	grep -$BOUNDARY_COMMIT actual
>     ++'
>     ++
>     ++# The `--ignore-missing-links` option should ensure that git-rev-list(1) doesn't
>     ++# fail when used alongside `--objects` when a tree is missing.
>     ++test_expect_success 'rev-list --ignore-missing-links works with missing tree' '
>     ++	echo "foo" >main/file &&
>     ++	git -C main add file &&
>     ++	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 11" &&
>     ++	TREE_OID=$(git -C main rev-parse HEAD^{tree}) &&
>     ++	mkdir alt/${TREE_OID:0:2} &&
>     ++	mv main/.git/objects/${TREE_OID:0:2}/${TREE_OID:2} alt/${TREE_OID:0:2}/ &&
>     ++	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
>     ++	! grep $TREE_OID actual
>     ++'
>     ++
>     ++# Similar to above, it should also work when a blob is missing.
>     ++test_expect_success 'rev-list --ignore-missing-links works with missing blob' '
>     ++	echo "bar" >main/file &&
>     ++	git -C main add file &&
>     ++	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 12" &&
>     ++	BLOB_OID=$(git -C main rev-parse HEAD:file) &&
>     ++	mkdir alt/${BLOB_OID:0:2} &&
>     ++	mv main/.git/objects/${BLOB_OID:0:2}/${BLOB_OID:2} alt/${BLOB_OID:0:2}/ &&
>     ++	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
>     ++	! grep $BLOB_OID actual
>      +'
>      +
>      +test_done
>
>
>  Documentation/rev-list-options.txt |  9 ++++
>  builtin/rev-list.c                 |  3 +-
>  revision.c                         |  2 +
>  t/t6022-rev-list-alternates.sh     | 75 ++++++++++++++++++++++++++++++
>  4 files changed, 88 insertions(+), 1 deletion(-)
>  create mode 100755 t/t6022-rev-list-alternates.sh
>
> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index a4a0cb93b2..8ee713db3d 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -227,6 +227,15 @@ explicitly.
>  	Upon seeing an invalid object name in the input, pretend as if
>  	the bad input was not given.
>
> +--ignore-missing-links::
> +	During traversal, if an object that is referenced does not
> +	exist, instead of dying of a repository corruption, pretend as
> +	if the reference itself does not exist. Running the command
> +	with the `--boundary` option makes these missing commits,
> +	together with the commits on the edge of revision ranges
> +	(i.e. true boundary objects), appear on the output, prefixed
> +	with '-'.
> +
>  ifndef::git-rev-list[]
>  --bisect::
>  	Pretend as if the bad bisection ref `refs/bisect/bad`
> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index ff715d6918..5239d83c76 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -266,7 +266,8 @@ static int finish_object(struct object *obj, const char *name UNUSED,
>  {
>  	struct rev_list_info *info = cb_data;
>  	if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
> -		finish_object__ma(obj);
> +		if (!info->revs->ignore_missing_links)
> +			finish_object__ma(obj);
>  		return 1;
>  	}
>  	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)


> diff --git a/t/t6022-rev-list-alternates.sh b/t/t6022-rev-list-alternates.sh
> new file mode 100755
> index 0000000000..08d9ffde5f
> --- /dev/null
> +++ b/t/t6022-rev-list-alternates.sh
> @@ -0,0 +1,75 @@
> +#!/bin/sh
> +
> +test_description='handling of alternates in rev-list'
> +
> +TEST_PASSES_SANITIZE_LEAK=true
> +. ./test-lib.sh
> +
> +# We create 5 commits and move them to the alt directory and
> +# create 5 more commits which will stay in the main odb.
> +test_expect_success 'create repository and alternate directory' '
> +	git init main &&

We don't necessarily have to initialize a repository, as the test suite
already does so for us. So we may want to write this instead as:

    test_commit_bulk 5 &&
    git clone --reference=. --shared . alt &&
    test_commit_bulk -C alt --start=6 5

> +# when the alternate odb is provided, all commits are listed along with the boundary
> +# commit.
> +test_expect_success 'rev-list passes with alternate object directory' '
> +	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
> +	test_stdout_line_count = 10 cat actual &&
> +	grep $BOUNDARY_COMMIT actual
> +'

Here, I think we'd want to make sure that we have not just 10 lines of
output, but that they are the 10 that we expect, like so:

    git -C alt rev-list --all --objects --no-object-names >actual.raw &&
    {
      git rev-list --all --objects --no-object-names &&
      git -C alt rev-list --all --objects --no-object-names --not \
        --alternate-refs
    } >expect.raw &&
    sort actual.raw >actual &&
    sort expect.raw >expect &&
    test_cmp expect actual

When reviewing this and tweaking some of the tests locally, I found it
useful to have some convenience functions like "hide_alternates" and
"show_alternates" to control whether "alt" can see its alternate.

From my review locally, the resulting changes (which can be applied
directly on top of your patch here) look like:

--- 8< ---
diff --git a/t/t6022-rev-list-alternates.sh b/t/t6022-rev-list-alternates.sh
index 08d9ffde5f..ef4231b2de 100755
--- a/t/t6022-rev-list-alternates.sh
+++ b/t/t6022-rev-list-alternates.sh
@@ -8,68 +8,86 @@ TEST_PASSES_SANITIZE_LEAK=true
 # We create 5 commits and move them to the alt directory and
 # create 5 more commits which will stay in the main odb.
 test_expect_success 'create repository and alternate directory' '
-	git init main &&
-	test_commit_bulk -C main 5 &&
-	BOUNDARY_COMMIT=$(git -C main rev-parse HEAD) &&
-	mkdir alt &&
-	mv main/.git/objects/* alt &&
-	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_commit_bulk --start=6 -C main 5
+	test_commit_bulk 5 &&
+	git clone --reference=. --shared . alt &&
+	test_commit_bulk --start=6 -C alt 5
 '

 # when the alternate odb is provided, all commits are listed along with the boundary
 # commit.
 test_expect_success 'rev-list passes with alternate object directory' '
-	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
-	test_stdout_line_count = 10 cat actual &&
-	grep $BOUNDARY_COMMIT actual
+	git -C alt rev-list --all --objects --no-object-names >actual.raw &&
+	{
+		git rev-list --all --objects --no-object-names &&
+		git -C alt rev-list --all --objects --no-object-names --not \
+			--alternate-refs
+	} >expect.raw &&
+	sort actual.raw >actual &&
+	sort expect.raw >expect &&
+	test_cmp expect actual
 '

+alt=alt/.git/objects/info/alternates
+
+hide_alternates () {
+	test -f "$alt.bak" || mv "$alt" "$alt.bak"
+}
+
+show_alternates () {
+	test -f "$alt" || mv "$alt.bak" "$alt"
+}
+
 # When the alternate odb is not provided, rev-list fails since the 5th commit's
 # parent is not present in the main odb.
 test_expect_success 'rev-list fails without alternate object directory' '
-	test_must_fail git -C main rev-list HEAD
+	hide_alternates &&
+	test_must_fail git -C alt rev-list HEAD
 '

 # With `--ignore-missing-links`, we stop the traversal when we encounter a
 # missing link. The boundary commit is not listed as we haven't used the
 # `--boundary` options.
 test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
-	git -C main rev-list --ignore-missing-links HEAD >actual &&
-	test_stdout_line_count = 5 cat actual &&
-	! grep -$BOUNDARY_COMMIT actual
+	hide_alternates &&
+
+	git -C alt rev-list --objects --no-object-names \
+		--ignore-missing-links HEAD >actual.raw &&
+	git -C alt cat-file --batch-check="%(objectname)" \
+		--batch-all-objects >expect.raw &&
+
+	sort actual.raw >actual &&
+	sort expect.raw >expect &&
+	test_cmp expect actual
 '

 # With `--ignore-missing-links` and `--boundary`, we can even print those boundary
 # commits.
 test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
-	git -C main rev-list --ignore-missing-links --boundary HEAD >actual &&
-	test_stdout_line_count = 6 cat actual &&
-	grep -$BOUNDARY_COMMIT actual
+	git -C alt rev-list --ignore-missing-links --boundary HEAD >got &&
+	grep "^-$(git rev-parse HEAD)" got
 '

-# The `--ignore-missing-links` option should ensure that git-rev-list(1) doesn't
-# fail when used alongside `--objects` when a tree is missing.
-test_expect_success 'rev-list --ignore-missing-links works with missing tree' '
-	echo "foo" >main/file &&
-	git -C main add file &&
-	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 11" &&
-	TREE_OID=$(git -C main rev-parse HEAD^{tree}) &&
-	mkdir alt/${TREE_OID:0:2} &&
-	mv main/.git/objects/${TREE_OID:0:2}/${TREE_OID:2} alt/${TREE_OID:0:2}/ &&
-	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
-	! grep $TREE_OID actual
+test_expect_success "setup for rev-list --ignore-missing-links with missing objects" '
+	show_alternates &&
+	test_commit -C alt 11
 '

-# Similar to above, it should also work when a blob is missing.
-test_expect_success 'rev-list --ignore-missing-links works with missing blob' '
-	echo "bar" >main/file &&
-	git -C main add file &&
-	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 12" &&
-	BLOB_OID=$(git -C main rev-parse HEAD:file) &&
-	mkdir alt/${BLOB_OID:0:2} &&
-	mv main/.git/objects/${BLOB_OID:0:2}/${BLOB_OID:2} alt/${BLOB_OID:0:2}/ &&
-	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
-	! grep $BLOB_OID actual
-'
+for obj in "HEAD^{tree}" "HEAD:11.t"
+do
+	# The `--ignore-missing-links` option should ensure that git-rev-list(1)
+	# doesn't fail when used alongside `--objects` when a tree/blob is
+	# missing.
+	test_expect_success "rev-list --ignore-missing-links with missing $obj" '
+		oid="$(git -C alt rev-parse $obj)" &&
+		path="alt/.git/objects/$(test_oid_to_path $oid)" &&
+
+		mv "$path" "$path.hidden" &&
+		test_when_finished "mv $path.hidden $path" &&
+
+		git -C alt rev-list --ignore-missing-links --objects HEAD \
+			>actual &&
+		! grep $oid actual
+	'
+done

 test_done
--- >8 ---

Thanks,
Taylor

* Re: [PATCH v2] revision: add `--ignore-missing-links` user option
  2023-09-12 17:07   ` Taylor Blau
@ 2023-09-13  9:32     ` Karthik Nayak
  2023-09-13 17:17       ` Taylor Blau
  0 siblings, 1 reply; 20+ messages in thread
From: Karthik Nayak @ 2023-09-13  9:32 UTC (permalink / raw
  To: Taylor Blau; +Cc: git, gitster

On Tue, Sep 12, 2023 at 7:07 PM Taylor Blau <me@ttaylorr.com> wrote:
> > +# We create 5 commits and move them to the alt directory and
> > +# create 5 more commits which will stay in the main odb.
> > +test_expect_success 'create repository and alternate directory' '
> > +     git init main &&
>
> We don't necessarily have to initialize a repository, as the test suite
> already does so for us. So we may want to write this instead as:
>
>     test_commit_bulk 5 &&
>     git clone --reference=. --shared . alt &&
>     test_commit_bulk -C alt --start=6 5
>

I was trying to use the env variable `GIT_ALTERNATE_OBJECT_DIRECTORIES`,
and hence ended up creating a new repository. But I really like the
convenience functions that you've suggested below. With that, this seems
like the way to go.
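
For context, the two ways of making an alternate visible can be sketched
roughly as below. This is only an illustration and not part of the patch:
the repository names `main` and `alt`, the temporary directory and the
commit identity are all made up for the example.

```shell
#!/bin/sh
set -e

# Scratch area; every name below is hypothetical.
tmp=$(mktemp -d)
cd "$tmp"

# A source repository holding a single (empty) commit.
git init -q main
git -C main -c user.name=Example -c user.email=example@example.com \
	commit -q --allow-empty -m base

# --shared records main's object directory in alt's info/alternates file
# instead of copying the objects.
git clone -q --shared main alt
alternates=alt/.git/objects/info/alternates
cat "$alternates"

# Hide the alternate: alt has no objects of its own, so the walk fails.
mv "$alternates" "$alternates.bak"
if git -C alt rev-list HEAD >/dev/null 2>&1
then
	without_alternate=ok
else
	without_alternate=failed
fi

# The environment variable makes the same objects visible again for this
# one command (an absolute path is used to avoid ambiguity).
GIT_ALTERNATE_OBJECT_DIRECTORIES="$tmp/main/.git/objects" \
	git -C alt rev-list HEAD >with_env

echo "without alternate: $without_alternate"
echo "commits seen via the environment variable: $(wc -l <with_env | tr -d ' ')"
```

The `info/alternates` file lives inside the repository, which is why it
can be hidden and restored with a rename, whereas the environment
variable only affects the commands it is set for.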

> > +# when the alternate odb is provided, all commits are listed along with the boundary
> > +# commit.
> > +test_expect_success 'rev-list passes with alternate object directory' '
> > +     GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
> > +     test_stdout_line_count = 10 cat actual &&
> > +     grep $BOUNDARY_COMMIT actual
> > +'
>
> Here, I think we'd want to make sure that we have not just 10 lines of
> output, but that they are the 10 that we expect, like so:
>
>     git -C alt rev-list --all --objects --no-object-names >actual.raw &&
>     {
>       git rev-list --all --objects --no-object-names &&
>       git -C alt rev-list --all --objects --no-object-names --not \
>         --alternate-refs
>     } >expect.raw &&
>     sort actual.raw >actual &&
>     sort expect.raw >expect &&
>     test_cmp expect actual
>
> When reviewing this and tweaking some of the tests locally, I found it
> useful to have some convenience functions like "hide_alternates" and
> "show_alternates" to control whether "alt" can see its alternate.
>
> From my review locally, the resulting changes (which can be applied
> directly on top of your patch here) look like:
>

This is much better. I didn't know about `test_oid_to_path` and
`test_when_finished`, and overall your patch looks much nicer and is
more thorough in the testing. I'll add it to the next version.
I'll wait a day or two for more feedback before I submit v3.
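
For anyone else who had not come across it, the path computation that
`test_oid_to_path` performs can be approximated by the sketch below.
`oid_to_path` is a hypothetical stand-in written for this illustration,
not the helper from the test library, and the object ID is made up.

```shell
#!/bin/sh

# Split an object ID into the "xx/yyyy..." form used for loose objects
# under .git/objects: the first two characters name the directory and
# the remainder names the file.
oid_to_path () {
	rest=${1#??}	# everything after the first two characters
	printf '%s/%s\n' "${1%"$rest"}" "$rest"
}

oid_to_path 0123456789abcdef
```

Combined with `mv "$path" "$path.hidden"`, this is what lets the test
hide exactly one loose object and put it back afterwards.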

Thanks again for your review and the patch :)

* Re: [PATCH v2] revision: add `--ignore-missing-links` user option
  2023-09-13  9:32     ` Karthik Nayak
@ 2023-09-13 17:17       ` Taylor Blau
  0 siblings, 0 replies; 20+ messages in thread
From: Taylor Blau @ 2023-09-13 17:17 UTC (permalink / raw
  To: Karthik Nayak; +Cc: git, gitster

On Wed, Sep 13, 2023 at 11:32:13AM +0200, Karthik Nayak wrote:
> > From my review locally, the resulting changes (which can be applied
> > directly on top of your patch here) look like:
>
> This is much better. I didn't know about `test_oid_to_path` and
> `test_when_finished`, and overall your patch looks much nicer and is
> more thorough in the testing. I'll add it to the next version.

Thanks for folding it in! I felt bad that I might have stepped on your
toes by saying "here's how I would do it", but by the time I had applied
your patch locally to review it, I had already generated the
aforementioned diff.

> Will wait a day or two for more feedback before I submit v3.

Sounds like a plan :-).

Thanks,
Taylor

* [PATCH v3] revision: add `--ignore-missing-links` user option
  2023-09-12 15:58 ` [PATCH v2] " Karthik Nayak
  2023-09-12 17:07   ` Taylor Blau
@ 2023-09-15  8:34   ` Karthik Nayak
  2023-09-15 18:54     ` Junio C Hamano
  2023-09-20 10:45     ` [PATCH v4] " Karthik Nayak
  1 sibling, 2 replies; 20+ messages in thread
From: Karthik Nayak @ 2023-09-15  8:34 UTC (permalink / raw
  To: karthik.188; +Cc: git, gitster, me

From: Karthik Nayak <karthik.188@gmail.com>

The revision backend is used by multiple porcelain commands such as
git-rev-list(1) and git-log(1). The backend currently supports ignoring
missing links by setting the `ignore_missing_links` bit. This allows the
revision walk to skip any objects links which are missing. Expose this
bit via an `--ignore-missing-links` user option.

A scenario where this option would be used is to find the boundary
objects between different object directories. Consider a repository with
a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
repository, enabling this option along with the `--boundary` option
while disabling the alternate object directory allows us to find the
boundary objects between the main and alternate object directory.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---

Changes since v2:
- Refactored the tests thanks to Taylor! 

Range diff against version 2:

 1:  e3f4d85732 ! 1:  a08f3637a0 revision: add `--ignore-missing-links` user option
    @@ Commit message
         while disabling the alternate object directory allows us to find the
         boundary objects between the main and alternate object directory.
     
    +    Helped-by: Taylor Blau <me@ttaylorr.com>
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
     
      ## Documentation/rev-list-options.txt ##
    @@ t/t6022-rev-list-alternates.sh (new)
     +# We create 5 commits and move them to the alt directory and
     +# create 5 more commits which will stay in the main odb.
     +test_expect_success 'create repository and alternate directory' '
    -+	git init main &&
    -+	test_commit_bulk -C main 5 &&
    -+	BOUNDARY_COMMIT=$(git -C main rev-parse HEAD) &&
    -+	mkdir alt &&
    -+	mv main/.git/objects/* alt &&
    -+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt test_commit_bulk --start=6 -C main 5
    ++	test_commit_bulk 5 &&
    ++	git clone --reference=. --shared . alt &&
    ++	test_commit_bulk --start=6 -C alt 5
     +'
     +
     +# when the alternate odb is provided, all commits are listed along with the boundary
     +# commit.
     +test_expect_success 'rev-list passes with alternate object directory' '
    -+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main rev-list HEAD >actual &&
    -+	test_stdout_line_count = 10 cat actual &&
    -+	grep $BOUNDARY_COMMIT actual
    ++	git -C alt rev-list --all --objects --no-object-names >actual.raw &&
    ++	{
    ++		git rev-list --all --objects --no-object-names &&
    ++		git -C alt rev-list --all --objects --no-object-names --not \
    ++			--alternate-refs
    ++	} >expect.raw &&
    ++	sort actual.raw >actual &&
    ++	sort expect.raw >expect &&
    ++	test_cmp expect actual
     +'
     +
    ++alt=alt/.git/objects/info/alternates
    ++
    ++hide_alternates () {
    ++	test -f "$alt.bak" || mv "$alt" "$alt.bak"
    ++}
    ++
    ++show_alternates () {
    ++	test -f "$alt" || mv "$alt.bak" "$alt"
    ++}
    ++
     +# When the alternate odb is not provided, rev-list fails since the 5th commit's
     +# parent is not present in the main odb.
     +test_expect_success 'rev-list fails without alternate object directory' '
    -+	test_must_fail git -C main rev-list HEAD
    ++	hide_alternates &&
    ++	test_must_fail git -C alt rev-list HEAD
     +'
     +
     +# With `--ignore-missing-links`, we stop the traversal when we encounter a
     +# missing link. The boundary commit is not listed as we haven't used the
     +# `--boundary` options.
     +test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
    -+	git -C main rev-list --ignore-missing-links HEAD >actual &&
    -+	test_stdout_line_count = 5 cat actual &&
    -+	! grep -$BOUNDARY_COMMIT actual
    ++	hide_alternates &&
    ++
    ++	git -C alt rev-list --objects --no-object-names \
    ++		--ignore-missing-links HEAD >actual.raw &&
    ++	git -C alt cat-file  --batch-check="%(objectname)" \
    ++		--batch-all-objects >expect.raw &&
    ++
    ++	sort actual.raw >actual &&
    ++	sort expect.raw >expect &&
    ++	test_must_fail git -C alt rev-list HEAD
     +'
     +
     +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
     +# commits.
     +test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
    -+	git -C main rev-list --ignore-missing-links --boundary HEAD >actual &&
    -+	test_stdout_line_count = 6 cat actual &&
    -+	grep -$BOUNDARY_COMMIT actual
    ++	git -C alt rev-list --ignore-missing-links --boundary HEAD >got &&
    ++	grep "^-$(git rev-parse HEAD)" got
     +'
     +
    -+# The `--ignore-missing-links` option should ensure that git-rev-list(1) doesn't
    -+# fail when used alongside `--objects` when a tree is missing.
    -+test_expect_success 'rev-list --ignore-missing-links works with missing tree' '
    -+	echo "foo" >main/file &&
    -+	git -C main add file &&
    -+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 11" &&
    -+	TREE_OID=$(git -C main rev-parse HEAD^{tree}) &&
    -+	mkdir alt/${TREE_OID:0:2} &&
    -+	mv main/.git/objects/${TREE_OID:0:2}/${TREE_OID:2} alt/${TREE_OID:0:2}/ &&
    -+	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
    -+	! grep $TREE_OID actual
    ++test_expect_success "setup for rev-list --ignore-missing-links with missing objects" '
    ++	show_alternates &&
    ++	test_commit -C alt 11
     +'
     +
    -+# Similar to above, it should also work when a blob is missing.
    -+test_expect_success 'rev-list --ignore-missing-links works with missing blob' '
    -+	echo "bar" >main/file &&
    -+	git -C main add file &&
    -+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$PWD/alt git -C main commit -m"commit 12" &&
    -+	BLOB_OID=$(git -C main rev-parse HEAD:file) &&
    -+	mkdir alt/${BLOB_OID:0:2} &&
    -+	mv main/.git/objects/${BLOB_OID:0:2}/${BLOB_OID:2} alt/${BLOB_OID:0:2}/ &&
    -+	git -C main rev-list --ignore-missing-links --objects HEAD >actual &&
    -+	! grep $BLOB_OID actual
    -+'
    ++for obj in "HEAD^{tree}" "HEAD:11.t"
    ++do
    ++	# The `--ignore-missing-links` option should ensure that git-rev-list(1)
    ++	# doesn't fail when used alongside `--objects` when a tree/blob is
    ++	# missing.
    ++	test_expect_success "rev-list --ignore-missing-links with missing $obj" '
    ++		oid="$(git -C alt rev-parse $obj)" &&
    ++		path="alt/.git/objects/$(test_oid_to_path $oid)" &&
    ++
    ++		mv "$path" "$path.hidden" &&
    ++		test_when_finished "mv $path.hidden $path" &&
    ++
    ++		git -C alt rev-list --ignore-missing-links --objects HEAD \
    ++			>actual &&
    ++		! grep $oid actual
    ++       '
    ++done
     +
     +test_done


 Documentation/rev-list-options.txt |  9 +++
 builtin/rev-list.c                 |  3 +-
 revision.c                         |  2 +
 t/t6022-rev-list-alternates.sh     | 93 ++++++++++++++++++++++++++++++
 4 files changed, 106 insertions(+), 1 deletion(-)
 create mode 100755 t/t6022-rev-list-alternates.sh

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index a4a0cb93b2..8ee713db3d 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -227,6 +227,15 @@ explicitly.
 	Upon seeing an invalid object name in the input, pretend as if
 	the bad input was not given.
 
+--ignore-missing-links::
+	During traversal, if an object that is referenced does not
+	exist, instead of dying of a repository corruption, pretend as
+	if the reference itself does not exist. Running the command
+	with the `--boundary` option makes these missing commits,
+	together with the commits on the edge of revision ranges
+	(i.e. true boundary objects), appear on the output, prefixed
+	with '-'.
+
 ifndef::git-rev-list[]
 --bisect::
 	Pretend as if the bad bisection ref `refs/bisect/bad`
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index ff715d6918..5239d83c76 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -266,7 +266,8 @@ static int finish_object(struct object *obj, const char *name UNUSED,
 {
 	struct rev_list_info *info = cb_data;
 	if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
-		finish_object__ma(obj);
+		if (!info->revs->ignore_missing_links)
+			finish_object__ma(obj);
 		return 1;
 	}
 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
diff --git a/revision.c b/revision.c
index 2f4c53ea20..cbfcbf6e28 100644
--- a/revision.c
+++ b/revision.c
@@ -2595,6 +2595,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->limited = 1;
 	} else if (!strcmp(arg, "--ignore-missing")) {
 		revs->ignore_missing = 1;
+	} else if (!strcmp(arg, "--ignore-missing-links")) {
+		revs->ignore_missing_links = 1;
 	} else if (opt && opt->allow_exclude_promisor_objects &&
 		   !strcmp(arg, "--exclude-promisor-objects")) {
 		if (fetch_if_missing)
diff --git a/t/t6022-rev-list-alternates.sh b/t/t6022-rev-list-alternates.sh
new file mode 100755
index 0000000000..567dd21876
--- /dev/null
+++ b/t/t6022-rev-list-alternates.sh
@@ -0,0 +1,93 @@
+#!/bin/sh
+
+test_description='handling of alternates in rev-list'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+# We create 5 commits and move them to the alt directory and
+# create 5 more commits which will stay in the main odb.
+test_expect_success 'create repository and alternate directory' '
+	test_commit_bulk 5 &&
+	git clone --reference=. --shared . alt &&
+	test_commit_bulk --start=6 -C alt 5
+'
+
+# when the alternate odb is provided, all commits are listed along with the boundary
+# commit.
+test_expect_success 'rev-list passes with alternate object directory' '
+	git -C alt rev-list --all --objects --no-object-names >actual.raw &&
+	{
+		git rev-list --all --objects --no-object-names &&
+		git -C alt rev-list --all --objects --no-object-names --not \
+			--alternate-refs
+	} >expect.raw &&
+	sort actual.raw >actual &&
+	sort expect.raw >expect &&
+	test_cmp expect actual
+'
+
+alt=alt/.git/objects/info/alternates
+
+hide_alternates () {
+	test -f "$alt.bak" || mv "$alt" "$alt.bak"
+}
+
+show_alternates () {
+	test -f "$alt" || mv "$alt.bak" "$alt"
+}
+
+# When the alternate odb is not provided, rev-list fails since the 5th commit's
+# parent is not present in the main odb.
+test_expect_success 'rev-list fails without alternate object directory' '
+	hide_alternates &&
+	test_must_fail git -C alt rev-list HEAD
+'
+
+# With `--ignore-missing-links`, we stop the traversal when we encounter a
+# missing link. The boundary commit is not listed as we haven't used the
+# `--boundary` options.
+test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
+	hide_alternates &&
+
+	git -C alt rev-list --objects --no-object-names \
+		--ignore-missing-links HEAD >actual.raw &&
+	git -C alt cat-file  --batch-check="%(objectname)" \
+		--batch-all-objects >expect.raw &&
+
+	sort actual.raw >actual &&
+	sort expect.raw >expect &&
+	test_must_fail git -C alt rev-list HEAD
+'
+
+# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
+# commits.
+test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
+	git -C alt rev-list --ignore-missing-links --boundary HEAD >got &&
+	grep "^-$(git rev-parse HEAD)" got
+'
+
+test_expect_success "setup for rev-list --ignore-missing-links with missing objects" '
+	show_alternates &&
+	test_commit -C alt 11
+'
+
+for obj in "HEAD^{tree}" "HEAD:11.t"
+do
+	# The `--ignore-missing-links` option should ensure that git-rev-list(1)
+	# doesn't fail when used alongside `--objects` when a tree/blob is
+	# missing.
+	test_expect_success "rev-list --ignore-missing-links with missing $obj" '
+		oid="$(git -C alt rev-parse $obj)" &&
+		path="alt/.git/objects/$(test_oid_to_path $oid)" &&
+
+		mv "$path" "$path.hidden" &&
+		test_when_finished "mv $path.hidden $path" &&
+
+		git -C alt rev-list --ignore-missing-links --objects HEAD \
+			>actual &&
+		! grep $oid actual
+       '
+done
+
+test_done
-- 
2.41.0


* Re: [PATCH v3] revision: add `--ignore-missing-links` user option
  2023-09-15  8:34   ` [PATCH v3] " Karthik Nayak
@ 2023-09-15 18:54     ` Junio C Hamano
  2023-09-18 10:12       ` Karthik Nayak
  2023-09-20 10:45     ` [PATCH v4] " Karthik Nayak
  1 sibling, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2023-09-15 18:54 UTC (permalink / raw
  To: Karthik Nayak; +Cc: git, me

Karthik Nayak <karthik.188@gmail.com> writes:

> From: Karthik Nayak <karthik.188@gmail.com>
>
> The revision backend is used by multiple porcelain commands such as
> git-rev-list(1) and git-log(1). The backend currently supports ignoring
> missing links by setting the `ignore_missing_links` bit. This allows the
> revision walk to skip any objects links which are missing. Expose this
> bit via an `--ignore-missing-links` user option.

Given the above "we merely surface a feature that already exists and
supported to be used by the end users from the command line" claim ...

> diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> index ff715d6918..5239d83c76 100644
> --- a/builtin/rev-list.c
> +++ b/builtin/rev-list.c
> @@ -266,7 +266,8 @@ static int finish_object(struct object *obj, const char *name UNUSED,
>  {
>  	struct rev_list_info *info = cb_data;
>  	if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
> -		finish_object__ma(obj);
> +		if (!info->revs->ignore_missing_links)
> +			finish_object__ma(obj);
>  		return 1;
>  	}

... this hunk is a bit unexpected.  As a low-level plumbing command,
shouldn't it be left to the user who gives --ignore-missing-links
from their command line to specify how the missing "obj" here should
be dealt with by giving the "--missing=<foo>" option?  While giving
"allow-promisor" may not make much sense, "--missing=allow-any" may
of course make sense (it is the same as hardcoding the decision not
to call finish_object__ma() at all), and so may "--missing=print".

Stepping back a bit, with "--missing=print", is this change still
needed?  The missing objects discovered will be shown at the end,
with the setting, no?

Thanks.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] revision: add `--ignore-missing-links` user option
  2023-09-15 18:54     ` Junio C Hamano
@ 2023-09-18 10:12       ` Karthik Nayak
  2023-09-18 15:56         ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Karthik Nayak @ 2023-09-18 10:12 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, me

On Fri, Sep 15, 2023 at 8:54 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Karthik Nayak <karthik.188@gmail.com> writes:
>
> > From: Karthik Nayak <karthik.188@gmail.com>
> >
> > The revision backend is used by multiple porcelain commands such as
> > git-rev-list(1) and git-log(1). The backend currently supports ignoring
> > missing links by setting the `ignore_missing_links` bit. This allows the
> > revision walk to skip any objects links which are missing. Expose this
> > bit via an `--ignore-missing-links` user option.
>
> Given the above "we merely surface a feature that already exists and
> supported to be used by the end users from the command line" claim ...
>
> > diff --git a/builtin/rev-list.c b/builtin/rev-list.c
> > index ff715d6918..5239d83c76 100644
> > --- a/builtin/rev-list.c
> > +++ b/builtin/rev-list.c
> > @@ -266,7 +266,8 @@ static int finish_object(struct object *obj, const char *name UNUSED,
> >  {
> >       struct rev_list_info *info = cb_data;
> >       if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
> > -             finish_object__ma(obj);
> > +             if (!info->revs->ignore_missing_links)
> > +                     finish_object__ma(obj);
> >               return 1;
> >       }
>
> ... this hunk is a bit unexpected.  As a low-level plumbing command,
> shouldn't it be left to the user who gives --ignore-missing-links
> from their command line to specify how the missing "obj" here should
> be dealt with by giving the "--missing=<foo>" option?  While giving
> "allow-promisor" may not make much sense, "--missing=allow-any" may
> of course make sense (it is the same as hardcoding the decision not
> to call finish_object__ma() at all), and so may "--missing=print".
>

This is to be expected, in my opinion. In terms of revision.c and
setting the `revs->ignore_missing_links` bit, the traversal will go
through all objects (commits and otherwise) and call `show_commit` or
`show_object` on them.

Here there is a difference between commits and non-commit objects.
1. Commit objects: commits are parsed in revision.c, and after that the
   `show_commit` function is called only when the object is available.
2. Non-commit objects: while trees are parsed in revision.c, blobs are
   never parsed, and hence `show_object` can be called on missing
   blobs. This is left to the user to handle. In our case, we error out
   in `rev-list.c`, which is not what we want when using the
   `--ignore-missing-links` option. Hence this addition.

There is an argument to be made around compatibility between the
`--missing` option and the `--ignore-missing-links` option, but since
the former only works with non-commit objects, I think the latter
should be independent; the latter is also about ignoring all missing
links. I also don't think the user should have to specify again what to
do with missing links by adding `--missing=allow-any`, as
`--ignore-missing-links` is a superset of it.

> Stepping back a bit, with "--missing=print", is this change still
> needed?  The missing objects discovered will be shown at the end,
> with the setting, no?
>

The main difference is that the `--missing` option works entirely with
non-commit objects (I'm assuming this was built with promisor notes in
mind). So if a commit is missing, git-rev-list(1) will still barf an
error, but this error handling is not in `builtin/rev-list.c`; rather,
it is in a layer above, in `revision.c`. So it wouldn't be trivial for
the `--missing` option to support missing commit links. That's why we
expose `--ignore-missing-links`, which ensures that any kind of object
(commits included), if missing, is ignored.

Thanks for the review!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] revision: add `--ignore-missing-links` user option
  2023-09-18 10:12       ` Karthik Nayak
@ 2023-09-18 15:56         ` Junio C Hamano
  2023-09-19  8:45           ` Karthik Nayak
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2023-09-18 15:56 UTC (permalink / raw
  To: Karthik Nayak; +Cc: git, me

Karthik Nayak <karthik.188@gmail.com> writes:

>> Given the above "we merely surface a feature that already exists and
>> supported to be used by the end users from the command line" claim ...
>>
>> > diff --git a/builtin/rev-list.c b/builtin/rev-list.c
>> > index ff715d6918..5239d83c76 100644
>> > --- a/builtin/rev-list.c
>> > +++ b/builtin/rev-list.c
>> > @@ -266,7 +266,8 @@ static int finish_object(struct object *obj, const char *name UNUSED,
>> >  {
>> >       struct rev_list_info *info = cb_data;
>> >       if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
>> > -             finish_object__ma(obj);
>> > +             if (!info->revs->ignore_missing_links)
>> > +                     finish_object__ma(obj);
>> >               return 1;
>> >       }
>>
>> ... this hunk is a bit unexpected.  As a low-level plumbing command,
>> shouldn't it be left to the user who gives --ignore-missing-links
>> from their command line to specify how the missing "obj" here should
>> be dealt with by giving the "--missing=<foo>" option?  While giving
>> "allow-promisor" may not make much sense, "--missing=allow-any" may
>> of course make sense (it is the same as hardcoding the decision not
>> to call finish_object__ma() at all), and so may "--missing=print".
>>
>
> This is to be expected, in my opinion. In terms of revision.c and
> setting the `revs->ignore_missing_links` bit, the traversal will
> go through all objects (commits and otherwise) and call
> `show_commit` or `show_object` on them.

Yes.  And the user can choose how to handle such an object here by
telling finish_object__ma() with the --missing=<how> option, so
letting them do so, instead of robbing the choice from them, would
be a more flexible design here, right?

> if a commit is
> missing, git-rev-list(1) will still barf an error, but this error

OK, yeah, I do see the need for setting the ignore-missing-links bit
for what you are doing, and --missing and --ignore-missing-links are
orthogonal options.  Getting rid of the hardcoded skipping of
finish_object__ma() would make sense from this angle, too.

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] revision: add `--ignore-missing-links` user option
  2023-09-18 15:56         ` Junio C Hamano
@ 2023-09-19  8:45           ` Karthik Nayak
  2023-09-19 15:13             ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Karthik Nayak @ 2023-09-19  8:45 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, me

On Mon, Sep 18, 2023 at 5:56 PM Junio C Hamano <gitster@pobox.com> wrote:
> Karthik Nayak <karthik.188@gmail.com> writes:
> > This is to be expected, in my opinion. In terms of revision.c and
> > setting the `revs->ignore_missing_links` bit, the traversal will
> > go through all objects (commits and otherwise) and call
> > `show_commit` or `show_object` on them.
>
> Yes.  And the user can choose how to handle such an object here by
> telling finish_object__ma() with the --missing=<how> option, so
> letting them do so, instead of robbing the choice from them, would
> be a more flexible design here, right?
>
> > if a commit is
> > missing, git-rev-list(1) will still barf an error, but this error
>
> OK, yeah, I do see the need for setting the ignore-missing-links bit
> for what you are doing, and --missing and --ignore-missing-links are
> orthogonal options.  Getting rid of the hardcoded skipping of
> finish_object__ma() would make sense from this angle, too.

Well, the only problem is that with the `ignore_missing_links` bit set,
`show_commit` is never called for missing commits (since commits are
pre-parsed in revision.c). So to keep that behavior consistent for
non-commit objects, I hardcoded the skipping of `finish_object__ma()` in
`show_object`.

If I remove the hardcoding, it would mean that `--ignore-missing-links`
would skip missing commits, but for non-commit objects the user would
have to pass `--missing=allow-any`, or else rev-list would still error
out with a missing object error.

Don't you think this would be confusing for the user?
I'm happy to send a revised version removing this hardcoding if you still think
otherwise :)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3] revision: add `--ignore-missing-links` user option
  2023-09-19  8:45           ` Karthik Nayak
@ 2023-09-19 15:13             ` Junio C Hamano
  0 siblings, 0 replies; 20+ messages in thread
From: Junio C Hamano @ 2023-09-19 15:13 UTC (permalink / raw
  To: Karthik Nayak; +Cc: git, me

Karthik Nayak <karthik.188@gmail.com> writes:

> If I remove the hardcoding, it would mean that
> `--ignore-missing-links` would skip missing commits but for
> non-commits objects, the user would have to pass
> `--missing=allow-any` else rev-list would still error out with a
> missing object error.
>
> Don't you think this would be confusing for the user?  I'm happy
> to send a revised version removing this hardcoding if you still
> think otherwise :)

Yes.  This is an example of flexibility and ergonomics at odds, and
for a low-level plumbing like rev-list, I would prefer not to limit
the flexibility unnecessarily.

I do not care about the ability to pass allow-any here.  But when
you traverse a range A..B with the --ignore-missing-links option,
the reporting mechanism based on the --boundary cannot tell which
ones are at the usual "traversal boundaries" and which ones are ones
beyond the broken links, can it?  If you allowed the users to pass
'print', then those reported with '?' prefix would be the missing
ones.  The ones that are reported with '-' prefix may still be a
mixture of the two kinds, but you can now subtract one set from the
other set to see which ones are true boundaries and which ones are
missing.  The hardcoded "we do not let the __ma() logic kick in"
makes it impossible, which is what I find disturbing.
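
A minimal sketch of that subtraction, assuming the output mixes
"-<oid>" lines (from `--boundary`) with "?<oid>" lines (from
`--missing=print`) as described above.  `true_boundaries` is a
hypothetical helper for illustration, not something git provides:

```shell
# Hypothetical helper: read rev-list output on stdin and print only the
# "-" entries that are not also reported as missing ("?" entries).
true_boundaries () {
	tmp=$(mktemp -d) || return 1
	cat >"$tmp/all"
	# Strip the prefixes and sort, since comm(1) needs sorted input.
	sed -n 's/^-//p' "$tmp/all" | sort >"$tmp/boundary"
	sed -n 's/^?//p' "$tmp/all" | sort >"$tmp/missing"
	# Lines only in the first file: boundary candidates minus missing ones.
	comm -23 "$tmp/boundary" "$tmp/missing"
	rm -rf "$tmp"
}

# Assumed usage:
#   git rev-list --ignore-missing-links --missing=print --boundary A..B |
#   true_boundaries
```

This only works if the missing objects really are reported with the
'?' prefix, which is exactly what the hardcoded skipping prevents.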

Thanks.





^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v4] revision: add `--ignore-missing-links` user option
  2023-09-15  8:34   ` [PATCH v3] " Karthik Nayak
  2023-09-15 18:54     ` Junio C Hamano
@ 2023-09-20 10:45     ` Karthik Nayak
  2023-09-20 15:32       ` Junio C Hamano
  1 sibling, 1 reply; 20+ messages in thread
From: Karthik Nayak @ 2023-09-20 10:45 UTC (permalink / raw
  To: karthik.188; +Cc: git, gitster, me

The revision backend is used by multiple porcelain commands such as
git-rev-list(1) and git-log(1). The backend currently supports ignoring
missing links by setting the `ignore_missing_links` bit. This allows the
revision walk to skip any objects links which are missing. Expose this
bit via an `--ignore-missing-links` user option.

A scenario where this option would be used is to find the boundary
objects between different object directories. Consider a repository with
a main object directory (GIT_OBJECT_DIRECTORY) and one or more alternate
object directories (GIT_ALTERNATE_OBJECT_DIRECTORIES). In such a
repository, enabling this option along with the `--boundary` option
while disabling the alternate object directory allows us to find the
boundary objects between the main and alternate object directory.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---

Changes from v3:
1. Remove hard-coded skipping of finish_object__ma(...). This means that
`--ignore-missing-links` needs to be used with `--missing=...` for missing
non-commit objects, but also now provides the flexibility to the user instead.
Fixes to the tests around this.
2. Fix an incorrect test.
3. Capitalize first character in test's comment.

Range diff from v3

1:  a08f3637a0 ! 1:  639a8cc385 revision: add `--ignore-missing-links` user option
    @@ Documentation/rev-list-options.txt: explicitly.
      --bisect::
      	Pretend as if the bad bisection ref `refs/bisect/bad`
     
    - ## builtin/rev-list.c ##
    -@@ builtin/rev-list.c: static int finish_object(struct object *obj, const char *name UNUSED,
    - {
    - 	struct rev_list_info *info = cb_data;
    - 	if (oid_object_info_extended(the_repository, &obj->oid, NULL, 0) < 0) {
    --		finish_object__ma(obj);
    -+		if (!info->revs->ignore_missing_links)
    -+			finish_object__ma(obj);
    - 		return 1;
    - 	}
    - 	if (info->revs->verify_objects && !obj->parsed && obj->type != OBJ_COMMIT)
    -
      ## revision.c ##
     @@ revision.c: static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
      		revs->limited = 1;
    @@ t/t6022-rev-list-alternates.sh (new)
     +	test_commit_bulk --start=6 -C alt 5
     +'
     +
    -+# when the alternate odb is provided, all commits are listed along with the boundary
    ++# When the alternate odb is provided, all commits are listed along with the boundary
     +# commit.
     +test_expect_success 'rev-list passes with alternate object directory' '
     +	git -C alt rev-list --all --objects --no-object-names >actual.raw &&
    @@ t/t6022-rev-list-alternates.sh (new)
     +	hide_alternates &&
     +
     +	git -C alt rev-list --objects --no-object-names \
    -+		--ignore-missing-links HEAD >actual.raw &&
    ++		--ignore-missing-links --missing=allow-any HEAD >actual.raw &&
     +	git -C alt cat-file  --batch-check="%(objectname)" \
     +		--batch-all-objects >expect.raw &&
     +
     +	sort actual.raw >actual &&
     +	sort expect.raw >expect &&
    -+	test_must_fail git -C alt rev-list HEAD
    ++	test_cmp expect actual
     +'
     +
     +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
    @@ t/t6022-rev-list-alternates.sh (new)
     +		mv "$path" "$path.hidden" &&
     +		test_when_finished "mv $path.hidden $path" &&
     +
    -+		git -C alt rev-list --ignore-missing-links --objects HEAD \
    ++		git -C alt rev-list --ignore-missing-links --missing=allow-any --objects HEAD \
     +			>actual &&
     +		! grep $oid actual
     +       '

 Documentation/rev-list-options.txt |  9 +++
 revision.c                         |  2 +
 t/t6022-rev-list-alternates.sh     | 93 ++++++++++++++++++++++++++++++
 3 files changed, 104 insertions(+)
 create mode 100755 t/t6022-rev-list-alternates.sh

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index a4a0cb93b2..8ee713db3d 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -227,6 +227,15 @@ explicitly.
 	Upon seeing an invalid object name in the input, pretend as if
 	the bad input was not given.
 
+--ignore-missing-links::
+	During traversal, if an object that is referenced does not
+	exist, instead of dying of a repository corruption, pretend as
+	if the reference itself does not exist. Running the command
+	with the `--boundary` option makes these missing commits,
+	together with the commits on the edge of revision ranges
+	(i.e. true boundary objects), appear on the output, prefixed
+	with '-'.
+
 ifndef::git-rev-list[]
 --bisect::
 	Pretend as if the bad bisection ref `refs/bisect/bad`
diff --git a/revision.c b/revision.c
index 2f4c53ea20..cbfcbf6e28 100644
--- a/revision.c
+++ b/revision.c
@@ -2595,6 +2595,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->limited = 1;
 	} else if (!strcmp(arg, "--ignore-missing")) {
 		revs->ignore_missing = 1;
+	} else if (!strcmp(arg, "--ignore-missing-links")) {
+		revs->ignore_missing_links = 1;
 	} else if (opt && opt->allow_exclude_promisor_objects &&
 		   !strcmp(arg, "--exclude-promisor-objects")) {
 		if (fetch_if_missing)
diff --git a/t/t6022-rev-list-alternates.sh b/t/t6022-rev-list-alternates.sh
new file mode 100755
index 0000000000..9ba739c830
--- /dev/null
+++ b/t/t6022-rev-list-alternates.sh
@@ -0,0 +1,93 @@
+#!/bin/sh
+
+test_description='handling of alternates in rev-list'
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+# We create 5 commits and move them to the alt directory and
+# create 5 more commits which will stay in the main odb.
+test_expect_success 'create repository and alternate directory' '
+	test_commit_bulk 5 &&
+	git clone --reference=. --shared . alt &&
+	test_commit_bulk --start=6 -C alt 5
+'
+
+# When the alternate odb is provided, all commits are listed along with the boundary
+# commit.
+test_expect_success 'rev-list passes with alternate object directory' '
+	git -C alt rev-list --all --objects --no-object-names >actual.raw &&
+	{
+		git rev-list --all --objects --no-object-names &&
+		git -C alt rev-list --all --objects --no-object-names --not \
+			--alternate-refs
+	} >expect.raw &&
+	sort actual.raw >actual &&
+	sort expect.raw >expect &&
+	test_cmp expect actual
+'
+
+alt=alt/.git/objects/info/alternates
+
+hide_alternates () {
+	test -f "$alt.bak" || mv "$alt" "$alt.bak"
+}
+
+show_alternates () {
+	test -f "$alt" || mv "$alt.bak" "$alt"
+}
+
+# When the alternate odb is not provided, rev-list fails since the 5th commit's
+# parent is not present in the main odb.
+test_expect_success 'rev-list fails without alternate object directory' '
+	hide_alternates &&
+	test_must_fail git -C alt rev-list HEAD
+'
+
+# With `--ignore-missing-links`, we stop the traversal when we encounter a
+# missing link. The boundary commit is not listed as we haven't used the
+# `--boundary` options.
+test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
+	hide_alternates &&
+
+	git -C alt rev-list --objects --no-object-names \
+		--ignore-missing-links --missing=allow-any HEAD >actual.raw &&
+	git -C alt cat-file  --batch-check="%(objectname)" \
+		--batch-all-objects >expect.raw &&
+
+	sort actual.raw >actual &&
+	sort expect.raw >expect &&
+	test_cmp expect actual
+'
+
+# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
+# commits.
+test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
+	git -C alt rev-list --ignore-missing-links --boundary HEAD >got &&
+	grep "^-$(git rev-parse HEAD)" got
+'
+
+test_expect_success "setup for rev-list --ignore-missing-links with missing objects" '
+	show_alternates &&
+	test_commit -C alt 11
+'
+
+for obj in "HEAD^{tree}" "HEAD:11.t"
+do
+	# The `--ignore-missing-links` option should ensure that git-rev-list(1)
+	# doesn't fail when used alongside `--objects` when a tree/blob is
+	# missing.
+	test_expect_success "rev-list --ignore-missing-links with missing $obj" '
+		oid="$(git -C alt rev-parse $obj)" &&
+		path="alt/.git/objects/$(test_oid_to_path $oid)" &&
+
+		mv "$path" "$path.hidden" &&
+		test_when_finished "mv $path.hidden $path" &&
+
+		git -C alt rev-list --ignore-missing-links --missing=allow-any --objects HEAD \
+			>actual &&
+		! grep $oid actual
+       '
+done
+
+test_done
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v4] revision: add `--ignore-missing-links` user option
  2023-09-20 10:45     ` [PATCH v4] " Karthik Nayak
@ 2023-09-20 15:32       ` Junio C Hamano
  2023-09-21 10:53         ` Karthik Nayak
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2023-09-20 15:32 UTC (permalink / raw
  To: Karthik Nayak; +Cc: git, me

> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index a4a0cb93b2..8ee713db3d 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -227,6 +227,15 @@ explicitly.
>  	Upon seeing an invalid object name in the input, pretend as if
>  	the bad input was not given.
>  
> +--ignore-missing-links::
> +	During traversal, if an object that is referenced does not
> +	exist, instead of dying of a repository corruption, pretend as
> +	if the reference itself does not exist. Running the command
> +	with the `--boundary` option makes these missing commits,
> +	together with the commits on the edge of revision ranges
> +	(i.e. true boundary objects), appear on the output, prefixed
> +	with '-'.

There needs to be an explanation of the interaction with the
--missing=<action> option here, no?  "--missing=allow-any" and
"--missing=print" are
sensible choices, I presume.  The former allows the traversal to
proceed, as you described in one of your responses.  Also with
"--missing=print", the user can more directly find out which are the
missing objects, even without using the "--boundary" that requires
them to sift between missing objects and the objects that are truly
on boundary.

Here is my attempt:

        --ignore-missing-links::
                During traversal, if an object that is referenced does not
                exist, instead of dying of a repository corruption, allow
                `--missing=<missing-action>` to decide what to do.
        +
        `--missing=print` will make the command print a list of missing
        objects, prefixed with a "?" character.
        +
        `--missing=allow-any` will make the command proceed without doing
        anything special.  Used with `--boundary`, output these missing
        objects mixed with the commits on the edge of revision ranges,
        prefixed with a "-" character.

It might make sense to add

        +
        Use of this option with other 'missing-action' may probably not
        give useful behaviour.

at the end, but it may not be useful to the readers to say "we allow
even more extra flexibility but haven't thought through what good
they would do".

> +# With `--ignore-missing-links`, we stop the traversal when we encounter a
> +# missing link. The boundary commit is not listed as we haven't used the
> +# `--boundary` options.
> +test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
> +	hide_alternates &&
> +
> +	git -C alt rev-list --objects --no-object-names \
> +		--ignore-missing-links --missing=allow-any HEAD >actual.raw &&
> +	git -C alt cat-file  --batch-check="%(objectname)" \
> +		--batch-all-objects >expect.raw &&
> +
> +	sort actual.raw >actual &&
> +	sort expect.raw >expect &&
> +	test_cmp expect actual
> +'

This gives a good baseline.  "--missing=print" without "--boundary"
may have more obvious use cases, but is there a practical use case
for the output from an invocation with "--missing=allow-any" without
"--boundary"?  Just being curious if I am missing something obvious.

Perhaps add another test that uses "--missing=print" instead, and
check that the "? missing" output matches what we expect to be
missing?  The same comment applies to the other test that uses
"--missing=allow-any" without "--boundary" we see later.

> +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
> +# commits.
> +test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
> +	git -C alt rev-list --ignore-missing-links --boundary HEAD >got &&
> +	grep "^-$(git rev-parse HEAD)" got
> +'

This makes sure what we expect to appear in 'got' actually is in
'got', but we should also make sure 'got' does not have anything
unexpected.  

> +test_expect_success "setup for rev-list --ignore-missing-links with missing objects" '
> +	show_alternates &&
> +	test_commit -C alt 11
> +'
> +
> +for obj in "HEAD^{tree}" "HEAD:11.t"
> +do
> +	# The `--ignore-missing-links` option should ensure that git-rev-list(1)
> +	# doesn't fail when used alongside `--objects` when a tree/blob is
> +	# missing.
> +	test_expect_success "rev-list --ignore-missing-links with missing $type" '
> +		oid="$(git -C alt rev-parse $obj)" &&
> +		path="alt/.git/objects/$(test_oid_to_path $oid)" &&
> +
> +		mv "$path" "$path.hidden" &&
> +		test_when_finished "mv $path.hidden $path" &&

In the first iteration, we check without the tree object and we only
ensure that removed tree does not appear in the output---but we know
the blob that is referenced by that removed tree will not appear in
the output, either, don't we?  Don't we want to check that, too?

In the second iteration, we have resurrected the tree but removed
the blob that is referenced by the tree, so we would not see that
blob in the output, which makes sense.
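
One way to check both objects in the first iteration, as a sketch
only: `assert_absent` is a hypothetical helper (it is not part of
test-lib.sh) that fails if any of the given object names shows up in a
file.

```shell
# Hypothetical helper: succeed only if none of the given object names
# appears in FILE.
assert_absent () {
	file=$1
	shift
	for oid
	do
		if grep -q -e "$oid" "$file"
		then
			echo "unexpected object in $file: $oid" >&2
			return 1
		fi
	done
}

# Assumed usage inside the loop body, with $blob_oid resolved before the
# tree is hidden:
#   assert_absent actual "$oid" "$blob_oid"
```

The blob name has to be looked up while the tree is still readable,
since "HEAD:11.t" cannot be resolved once the tree is hidden.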

> +		git -C alt rev-list --ignore-missing-links --missing=allow-any --objects HEAD \
> +			>actual &&
> +		! grep $oid actual
> +       '
> +done
> +
> +test_done

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4] revision: add `--ignore-missing-links` user option
  2023-09-20 15:32       ` Junio C Hamano
@ 2023-09-21 10:53         ` Karthik Nayak
  2023-09-21 19:16           ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Karthik Nayak @ 2023-09-21 10:53 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, me

On Wed, Sep 20, 2023 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> There needs an explanation of interaction with --missing=<action>
> option here, no?  "--missing=allow-any" and "--missing=print" are
> sensible choices, I presume.  The former allows the traversal to
> proceed, as you described in one of your responses.  Also with
> "--missing=print", the user can more directly find out which are the
> missing objects, even without using the "--boundary" that requires
> them to sift between missing objects and the objects that are truly
> on boundary.
>
> Here is my attempt:
>
>         --ignore-missing-links::
>                 During traversal, if an object that is referenced does not
>                 exist, instead of dying of a repository corruption, allow
>                 `--missing=<missing-action>` to decide what to do.
>         +
>         `--missing=print` will make the command print a list of missing
>         objects, prefixed with a "?" character.
>         +
>         `--missing=allow-any` will make the command proceed without doing
>         anything special.  Used with `--boundary`, output these missing
>         objects mixed with the commits on the edge of revision ranges,
>         prefixed with a "-" character.
>
> It might make sense to add
>
>         +
>         Use of this option with other 'missing-action' may probably not
>         give useful behaviour.
>
> at the end, but it may not be useful to the readers to say "we allow
> even more extra flexibility but haven't thought through what good
> they would do".
>

I was thinking about this, but mostly didn't do it, because the
interaction with `--missing` is only for non-commit objects. For
missing commits, `--ignore-missing-links` skips the commit and the
value of `--missing` doesn't make any difference.

It's only for non-commit objects that `--missing` comes into play. So
perhaps change the current explanation to:

--ignore-missing-links::
       During traversal, if a commit that is referenced does not
       exist, instead of dying of a repository corruption, pretend as
       if the commit itself does not exist. Running the command
       with the `--boundary` option makes these missing commits,
       together with the commits on the edge of revision ranges
       (i.e. true boundary objects), appear on the output, prefixed
       with '-'.

This way `--ignore-missing-links` is specific to commits; combining
it with `--missing=...` for non-commit objects is left to the user.
What do you think?

> > +# With `--ignore-missing-links`, we stop the traversal when we encounter a
> > +# missing link. The boundary commit is not listed as we haven't used the
> > +# `--boundary` options.
> > +test_expect_success 'rev-list only prints main odb commits with --ignore-missing-links' '
> > +     hide_alternates &&
> > +
> > +     git -C alt rev-list --objects --no-object-names \
> > +             --ignore-missing-links --missing=allow-any HEAD >actual.raw &&
> > +     git -C alt cat-file  --batch-check="%(objectname)" \
> > +             --batch-all-objects >expect.raw &&
> > +
> > +     sort actual.raw >actual &&
> > +     sort expect.raw >expect &&
> > +     test_cmp expect actual
> > +'
>
> This gives a good baseline.  "--missing=print" without "--boundary"
> may have more obvious use cases, but is there a practical use case
> for the output from an invocation with "--missing=allow-any" without
> "--boundary"?  Just being curious if I am missing something obvious.
>

Not really, but it makes it easier to build up the testing: here,
without `--boundary`, we can use cat-file to check all objects (commits
and others) that are output by rev-list.

Then we can build on top of this in the next test, where we can also
ensure that boundary commits are printed. This, however, is very
simplistic, as you've mentioned: there could be other objects in the
output and we don't really check.

> Perhaps add another test that uses "--missing=print" instead, and
> check that the "? missing" output matches what we expect to be
> missing?  The same comment applies to the other test that uses
> "--missing=allow-any" without "--boundary" we see later.
>

Sure, we can add that too!

> > +# With `--ignore-missing-links` and `--boundary`, we can even print those boundary
> > +# commits.
> > +test_expect_success 'rev-list prints boundary commit with --ignore-missing-links' '
> > +     git -C alt rev-list --ignore-missing-links --boundary HEAD >got &&
> > +     grep "^-$(git rev-parse HEAD)" got
> > +'
>
> This makes sure what we expect to appear in 'got' actually is in
> 'got', but we should also make sure 'got' does not have anything
> unexpected.
>

Yeah, I can add that in too.

> > +test_expect_success "setup for rev-list --ignore-missing-links with missing objects" '
> > +     show_alternates &&
> > +     test_commit -C alt 11
> > +'
> > +
> > +for obj in "HEAD^{tree}" "HEAD:11.t"
> > +do
> > +     # The `--ignore-missing-links` option should ensure that git-rev-list(1)
> > +     # doesn't fail when used alongside `--objects` when a tree/blob is
> > +     # missing.
> > +     test_expect_success "rev-list --ignore-missing-links with missing $type" '
> > +             oid="$(git -C alt rev-parse $obj)" &&
> > +             path="alt/.git/objects/$(test_oid_to_path $oid)" &&
> > +
> > +             mv "$path" "$path.hidden" &&
> > +             test_when_finished "mv $path.hidden $path" &&
>
> In the first iteration, we check without the tree object and we only
> ensure that removed tree does not appear in the output---but we know
> the blob that is referenced by that removed tree will not appear in
> the output, either, don't we?  Don't we want to check that, too?
>
> In the second iteration, we have resurrected the tree but removed
> the blob that is referenced by the tree, so we would not see that
> blob in the output, which makes sense.
>

While implementing this change, I realized that for missing trees,
show_object() is never called (that is, --missing=print has no effect).

That means we only call show_object() when there is a missing blob.

So this effectively means:
1. missing commits: --ignore-missing-links works, --missing=... has no effect
2. missing trees:   --ignore-missing-links works, --missing=... has no effect
3. missing blobs:   --ignore-missing-links works in conjunction with --missing=...

I now think it makes even more sense to hardcode the skipping of
`finish_object__ma`. This way we can state that `--ignore-missing-links` and
`--missing` are incompatible: `--ignore-missing-links` ignores any missing
object (irrespective of type), while `--missing` specifically handles missing
blobs and provides options for doing so.

This is similar to how `--boundary` and `--missing=print` are currently
specific to commits and blobs, respectively.
What do you think?


* Re: [PATCH v4] revision: add `--ignore-missing-links` user option
  2023-09-21 10:53         ` Karthik Nayak
@ 2023-09-21 19:16           ` Junio C Hamano
  2023-09-24 16:14             ` Karthik Nayak
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2023-09-21 19:16 UTC (permalink / raw
  To: Karthik Nayak; +Cc: git, me

Karthik Nayak <karthik.188@gmail.com> writes:

> I was thinking about this, but mostly didn't do this, because the
> interaction with `--missing` is only for non-commit
> objects. Because for missing commits, `--ignore-missing-links`
> skips the commit and the value of `--missing` doesn't make any
> difference.

Hmph, somehow that smells like an existing bug.  So does the "trees
are not shown by --missing=print, and show_object() is never called
for missing objects unless they are blobs" you mention.  When the
user asks "instead of dying, list them so that I can ask around and
fetch them to repair this repository", shouldn't we do just that?

I wonder if these bugs are something people may be taking advantage
of and cannot be fixed retroactively?  If we can fix these and nobody
complains, that would give us the ideal outcome, I would think.

Thanks.




* Re: [PATCH v4] revision: add `--ignore-missing-links` user option
  2023-09-21 19:16           ` Junio C Hamano
@ 2023-09-24 16:14             ` Karthik Nayak
  2023-09-25 16:57               ` Junio C Hamano
  0 siblings, 1 reply; 20+ messages in thread
From: Karthik Nayak @ 2023-09-24 16:14 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, me

On Thu, Sep 21, 2023 at 9:16 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Karthik Nayak <karthik.188@gmail.com> writes:
>
> > I was thinking about this, but mostly didn't do this, because the
> > interaction with `--missing` is only for non-commit
> > objects. Because for missing commits, `--ignore-missing-links`
> > skips the commit and the value of `--missing` doesn't make any
> > difference.
>
> Hmph, somehow that smells like an existing bug.  So does the "trees
> are not shown by --missing=print, and show_object() is never called
> for missing objects unless they are blobs" you mention.  When the
> user asks "instead of dying, list them so that I can ask around and
> fetch them to repair this repository", shouldn't we do just that?
>
> I wonder if these bugs are something people may be taking advantage
> of and cannot be fixed retroactively?  If we can fix these and nobody
> complains, that would give us the ideal outcome, I would think.
>

Let me preface this by saying that I was partly wrong. `--missing` does work
for trees; it is only ineffective when used along with the
`ignore_missing_links` bit.

For commits, however, `--missing` was never made to work. I had a quick look
at the code, and we can do something similar for commits too, i.e. add support
for the `--missing` option. We'd have to add a new flag (maybe MISSING) that
can be set within `repo_parse_commit_gently`, so that rev-list.c can recognize
the commit as a missing object and act accordingly.

It would invalidate this patch series in some sense, but I'm okay with that.
Does that sound good to you?


* Re: [PATCH v4] revision: add `--ignore-missing-links` user option
  2023-09-24 16:14             ` Karthik Nayak
@ 2023-09-25 16:57               ` Junio C Hamano
  2023-09-27 16:26                 ` Karthik Nayak
  0 siblings, 1 reply; 20+ messages in thread
From: Junio C Hamano @ 2023-09-25 16:57 UTC (permalink / raw
  To: Karthik Nayak; +Cc: git, me

Karthik Nayak <karthik.188@gmail.com> writes:

> Let me prefix with saying that I was partly wrong. `--missing` does work for
> trees, only that it's ineffective when used along with the
> `ignore_missing_links` bit.
>
> For commits, however, `--missing` was never made to work. I had a
> quick look at the code, and we can do something similar for
> commits too, i.e. add support for the `--missing` option. We'd
> have to add a new flag (maybe MISSING) that can be set within
> `repo_parse_commit_gently`, so that rev-list.c can recognize the
> commit as a missing object and act accordingly.

Do you mean that process_parents() would now throw such a commit to
the resulting list successfully instead of omitting when "--missing"
is requested?  That sounds like a right thing to do but at the same
time is a fix with major impact.  I do not offhand know what the
ramifications are, for example, when bitmap traversal is in use (I
assume such a missing commit would not be catalogued in the bitmap?).

Taylor, what do you think?


* Re: [PATCH v4] revision: add `--ignore-missing-links` user option
  2023-09-25 16:57               ` Junio C Hamano
@ 2023-09-27 16:26                 ` Karthik Nayak
  0 siblings, 0 replies; 20+ messages in thread
From: Karthik Nayak @ 2023-09-27 16:26 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git, me

On Mon, Sep 25, 2023 at 6:57 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Karthik Nayak <karthik.188@gmail.com> writes:
>
> > Let me prefix with saying that I was partly wrong. `--missing` does work for
> > trees, only that it's ineffective when used along with the
> > `ignore_missing_links` bit.
> >
> > For commits, however, `--missing` was never made to work. I had a
> > quick look at the code, and we can do something similar for
> > commits too, i.e. add support for the `--missing` option. We'd
> > have to add a new flag (maybe MISSING) that can be set within
> > `repo_parse_commit_gently`, so that rev-list.c can recognize the
> > commit as a missing object and act accordingly.
>
> Do you mean that process_parents() would now throw such a commit to
> the resulting list successfully instead of omitting when "--missing"
> is requested?  That sounds like a right thing to do but at the same
> time is a fix with major impact.

Yes, but with an appropriate flag set on it. That would be a new flag.

>  I do not offhand know what the
> ramifications are, for example, when bitmap traversal is in use (I
> assume such a missing commit would not be catalogued in the bitmap?).
>

If there is a missing commit or object, will there even be a bitmap?
I can think of two scenarios:
1. The object goes missing before bitmap creation: the bitmap doesn't get
   created, since an object is missing. This applies to any type of object.
2. The object goes missing after bitmap creation: the bitmap already exists,
   so rev-list won't even know that the commit is missing and will simply
   output the objects as if they all existed.

Overall, this makes sense, but I'm curious to hear what Taylor has to say.
I might also post a patch series in this direction to consolidate our thoughts
and get feedback from the list.



Thread overview: 20+ messages
2023-09-08 17:42 [PATCH] revision: add `--ignore-missing-links` user option Karthik Nayak
2023-09-08 19:19 ` Junio C Hamano
2023-09-12 14:42   ` Karthik Nayak
2023-09-12 15:58 ` [PATCH v2] " Karthik Nayak
2023-09-12 17:07   ` Taylor Blau
2023-09-13  9:32     ` Karthik Nayak
2023-09-13 17:17       ` Taylor Blau
2023-09-15  8:34   ` [PATCH v3] " Karthik Nayak
2023-09-15 18:54     ` Junio C Hamano
2023-09-18 10:12       ` Karthik Nayak
2023-09-18 15:56         ` Junio C Hamano
2023-09-19  8:45           ` Karthik Nayak
2023-09-19 15:13             ` Junio C Hamano
2023-09-20 10:45     ` [PATCH v4] " Karthik Nayak
2023-09-20 15:32       ` Junio C Hamano
2023-09-21 10:53         ` Karthik Nayak
2023-09-21 19:16           ` Junio C Hamano
2023-09-24 16:14             ` Karthik Nayak
2023-09-25 16:57               ` Junio C Hamano
2023-09-27 16:26                 ` Karthik Nayak
