git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC 0/1] Tolerate broken headers in `packed-refs` files
@ 2018-03-26 12:42 Michael Haggerty
  2018-03-26 12:42 ` [RFC 1/1] packed-backend: ignore broken headers Michael Haggerty
  2018-03-26 13:08 ` [RFC 0/1] Tolerate broken headers in `packed-refs` files Derrick Stolee
  0 siblings, 2 replies; 5+ messages in thread
From: Michael Haggerty @ 2018-03-26 12:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, Michael Haggerty

Prior to

    9308b7f3ca read_packed_refs(): die if `packed-refs` contains bogus data, 2017-07-01

we silently ignored pretty much any bogus data in a `packed-refs`
file. I think that was pretty clearly a bad policy. The above commit
made parsing quite a bit stricter, calling `die()` if it found any
lines that it didn't understand.

But there's one situation that is maybe not quite so clear-cut. The
first line of a `packed-refs` file can be a header that enumerates
some traits of the file containing it; for example,

    # pack-refs with: peeled fully-peeled 

The old code would have tolerated lots of breakage in that line. For
example, any of the following headers would have just been ignored:

    # arbitrary data that looks like a comment
    # pack-refs with peeled fully-peeled          ← note: missing colon
    # pack-refs

Now, any of the above lines are considered errors and cause git to
die.

In my opinion, that is a good thing and we *shouldn't* tolerate broken
header lines; i.e., the status quo is good and we *shouldn't* apply
this patch.

But there might be some tools out in the wild that have been writing
broken headers. In that case, users who upgrade Git might suddenly
find that they can't read repositories that they could read before. In
fact, a tool that we wrote and use internally at GitHub was doing
exactly that, which is how we discovered this "problem".

This patch shows what it would look like to relax the parsing again,
albeit *only* for the first line of the file, and *only* for lines
that start with '#'.

The problem with this patch is that it would make it harder for people
who implement broken tools in the future to discover their mistakes.
The only result of the error would be that it is slower to work with
the `packed-refs` files that they wrote. Such an error could go
undiscovered for a long time.

The tighter check was released quite a while ago, and AFAIK we haven't
had any bug reports from people tripped up by this consistency check.
So I'm inclined to believe that the existing tools are OK and this
patch would be counterproductive. But I wanted to share it with the
list anyway.

Michael

Michael Haggerty (1):
  packed-backend: ignore broken headers

 refs/packed-backend.c | 21 +++++++++------------
 t/t3210-pack-refs.sh  | 33 ++++++++++++++++++++++++++++++++-
 2 files changed, 41 insertions(+), 13 deletions(-)

-- 
2.14.2


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFC 1/1] packed-backend: ignore broken headers
  2018-03-26 12:42 [RFC 0/1] Tolerate broken headers in `packed-refs` files Michael Haggerty
@ 2018-03-26 12:42 ` Michael Haggerty
  2018-03-26 13:08 ` [RFC 0/1] Tolerate broken headers in `packed-refs` files Derrick Stolee
  1 sibling, 0 replies; 5+ messages in thread
From: Michael Haggerty @ 2018-03-26 12:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, Michael Haggerty

Prior to

    9308b7f3ca read_packed_refs(): die if `packed-refs` contains bogus data, 2017-07-01

we silently ignored any lines in a `packed-refs` file that we didn't
understand. That policy was clearly wrong.

But at the time, unrecognized header lines were processed by the same
code as reference and peeled lines. This means that they were also
subject to the same liberal treatment. For example, any of the
following "header" lines would have been ignored:

* "# arbitrary data that looks like a comment"
* "# pack-refs with peeled fully-peeled" ← note: missing colon
* "# pack-refs"

Loosen up the parser to ignore any first line that begins with `#` but
doesn't start with the exact character sequence "# pack-refs with:".

(In fact, the old liberal policy meant that "comment" lines would have
been ignored anywhere in the file. But the file format isn't actually
documented to allow comments, and comments make little sense in this
file, so we won't go that far.)

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
I *don't* think we actually want to merge this patch. See the cover
letter for my reasoning.

 refs/packed-backend.c | 21 +++++++++------------
 t/t3210-pack-refs.sh  | 33 ++++++++++++++++++++++++++++++++-
 2 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 65288c6472..f9d71bb60d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -635,21 +635,18 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs)
 
 		tmp = xmemdupz(snapshot->buf, eol - snapshot->buf);
 
-		if (!skip_prefix(tmp, "# pack-refs with:", (const char **)&p))
-			die_invalid_line(refs->path,
-					 snapshot->buf,
-					 snapshot->eof - snapshot->buf);
+		if (skip_prefix(tmp, "# pack-refs with:", (const char **)&p)) {
+			string_list_split_in_place(&traits, p, ' ', -1);
 
-		string_list_split_in_place(&traits, p, ' ', -1);
+			if (unsorted_string_list_has_string(&traits, "fully-peeled"))
+				snapshot->peeled = PEELED_FULLY;
+			else if (unsorted_string_list_has_string(&traits, "peeled"))
+				snapshot->peeled = PEELED_TAGS;
 
-		if (unsorted_string_list_has_string(&traits, "fully-peeled"))
-			snapshot->peeled = PEELED_FULLY;
-		else if (unsorted_string_list_has_string(&traits, "peeled"))
-			snapshot->peeled = PEELED_TAGS;
+			sorted = unsorted_string_list_has_string(&traits, "sorted");
 
-		sorted = unsorted_string_list_has_string(&traits, "sorted");
-
-		/* perhaps other traits later as well */
+			/* perhaps other traits later as well */
+		}
 
 		/* The "+ 1" is for the LF character. */
 		snapshot->start = eol + 1;
diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
index afa27ffe2d..353ef3e655 100755
--- a/t/t3210-pack-refs.sh
+++ b/t/t3210-pack-refs.sh
@@ -20,7 +20,8 @@ test_expect_success \
     'echo Hello > A &&
      git update-index --add A &&
      git commit -m "Initial commit." &&
-     HEAD=$(git rev-parse --verify HEAD)'
+     HEAD=$(git rev-parse --verify HEAD) &&
+     git tag -m "A tag" annotated-tag'
 
 SHA1=
 
@@ -221,6 +222,36 @@ test_expect_success 'reject packed-refs with a short SHA-1' '
 	test_cmp expected_err err
 '
 
+test_expect_success 'handle packed-refs with a bogus header' '
+	git show-ref -d >expected &&
+	mv .git/packed-refs .git/packed-refs.bak &&
+	test_when_finished "mv .git/packed-refs.bak .git/packed-refs" &&
+	sed -e "s/^#.*/# whacked-refs/" <.git/packed-refs.bak >.git/packed-refs &&
+	git show-ref -d >actual 2>err &&
+	test_cmp expected actual &&
+	test_must_be_empty err
+'
+
+test_expect_success 'handle packed-refs with a truncated header' '
+	git show-ref -d >expected &&
+	mv .git/packed-refs .git/packed-refs.bak &&
+	test_when_finished "mv .git/packed-refs.bak .git/packed-refs" &&
+	sed -e "s/^#.*/# pack-refs/" <.git/packed-refs.bak >.git/packed-refs &&
+	git show-ref -d >actual 2>err &&
+	test_cmp expected actual &&
+	test_must_be_empty err
+'
+
+test_expect_success 'handle packed-refs with no traits in header' '
+	git show-ref -d >expected &&
+	mv .git/packed-refs .git/packed-refs.bak &&
+	test_when_finished "mv .git/packed-refs.bak .git/packed-refs" &&
+	sed -e "s/^#.*/# pack-refs with:/" <.git/packed-refs.bak >.git/packed-refs &&
+	git show-ref -d >actual 2>err &&
+	test_cmp expected actual &&
+	test_must_be_empty err
+'
+
 test_expect_success 'timeout if packed-refs.lock exists' '
 	LOCK=.git/packed-refs.lock &&
 	>"$LOCK" &&
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC 0/1] Tolerate broken headers in `packed-refs` files
  2018-03-26 12:42 [RFC 0/1] Tolerate broken headers in `packed-refs` files Michael Haggerty
  2018-03-26 12:42 ` [RFC 1/1] packed-backend: ignore broken headers Michael Haggerty
@ 2018-03-26 13:08 ` Derrick Stolee
  2018-03-26 14:33   ` Jeff King
  2018-03-26 14:33   ` Edward Thomson
  1 sibling, 2 replies; 5+ messages in thread
From: Derrick Stolee @ 2018-03-26 13:08 UTC (permalink / raw)
  To: Michael Haggerty, Junio C Hamano; +Cc: Jeff King, git, ethomson

On 3/26/2018 8:42 AM, Michael Haggerty wrote:
> [...]
>
> But there might be some tools out in the wild that have been writing
> broken headers. In that case, users who upgrade Git might suddenly
> find that they can't read repositories that they could read before. In
> fact, a tool that we wrote and use internally at GitHub was doing
> exactly that, which is how we discovered this "problem".
>
> This patch shows what it would look like to relax the parsing again,
> albeit *only* for the first line of the file, and *only* for lines
> that start with '#'.
>
> The problem with this patch is that it would make it harder for people
> who implement broken tools in the future to discover their mistakes.
> The only result of the error would be that it is slower to work with
> the `packed-refs` files that they wrote. Such an error could go
> undiscovered for a long time.

My opinion is that we shouldn't maintain back-compat with formats that 
may have been written by another tool because Git wasn't strict about 
it. As long as Git never wrote files with these formats, then they 
shouldn't be supported.

You are absolutely right that staying strict will help discover the 
tools that are writing an incorrect format.

Since most heavily-used tools that didn't spawn Git processes use 
LibGit2 to interact with Git repos, I added Ed Thomson to CC to see if 
libgit2 could ever write these bad header comments.

Thanks for writing this RFC so we can have the discussion and more 
quickly identify this issue if/when users are broken.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC 0/1] Tolerate broken headers in `packed-refs` files
  2018-03-26 13:08 ` [RFC 0/1] Tolerate broken headers in `packed-refs` files Derrick Stolee
@ 2018-03-26 14:33   ` Jeff King
  2018-03-26 14:33   ` Edward Thomson
  1 sibling, 0 replies; 5+ messages in thread
From: Jeff King @ 2018-03-26 14:33 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Michael Haggerty, Junio C Hamano, git, ethomson

On Mon, Mar 26, 2018 at 09:08:04AM -0400, Derrick Stolee wrote:

> Since most heavily-used tools that didn't spawn Git processes use LibGit2 to
> interact with Git repos, I added Ed Thomson to CC to see if libgit2 could
> ever write these bad header comments.

Ed can probably answer more definitively, but I dug in the libgit2
history a little, and I think it has only ever generated correct
"pack-refs with:" lines.

Ditto for JGit, though there it blames down to 1a6964c82 (Initial JGit
contribution to eclipse.org, 2009-09-29). I didn't dig further into the
historical JGit repository, but I think that's probably far enough to
feel good about it.

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC 0/1] Tolerate broken headers in `packed-refs` files
  2018-03-26 13:08 ` [RFC 0/1] Tolerate broken headers in `packed-refs` files Derrick Stolee
  2018-03-26 14:33   ` Jeff King
@ 2018-03-26 14:33   ` Edward Thomson
  1 sibling, 0 replies; 5+ messages in thread
From: Edward Thomson @ 2018-03-26 14:33 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Michael Haggerty, Junio C Hamano, Jeff King, Git Mailing List,
	Carlos Martín Nieto

On Mon, Mar 26, 2018 at 2:08 PM, Derrick Stolee <stolee@gmail.com> wrote:
> Since most heavily-used tools that didn't spawn Git processes use
> LibGit2 to interact with Git repos, I added Ed Thomson to CC to see
> if libgit2 could ever write these bad header comments.

We added the `sorted` capability to our `packed-refs` header relatively
recently (approximately two months ago, v0.27.0 will be the first release
to include it as of today).  So, at the moment, libgit2 writes:

  "# pack-refs with: peeled fully-peeled sorted "

Prior to this change, libgit2's header was stable for the last five years
as:

  "# pack-refs with: peeled fully-peeled "

And prior to that, we advertised only `peeled`:

  "# pack-refs with: peeled "

Thanks for thinking of us!

-ed

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-03-26 14:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-26 12:42 [RFC 0/1] Tolerate broken headers in `packed-refs` files Michael Haggerty
2018-03-26 12:42 ` [RFC 1/1] packed-backend: ignore broken headers Michael Haggerty
2018-03-26 13:08 ` [RFC 0/1] Tolerate broken headers in `packed-refs` files Derrick Stolee
2018-03-26 14:33   ` Jeff King
2018-03-26 14:33   ` Edward Thomson

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).