git@vger.kernel.org list mirror (unofficial, one of many)
* RFC on packfile URIs and .gitmodules check
@ 2021-01-15 23:43 Jonathan Tan
  2021-01-16  0:30 ` Junio C Hamano
                   ` (3 more replies)
  0 siblings, 4 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-01-15 23:43 UTC (permalink / raw)
  To: git; +Cc: peff, Jonathan Tan

Someone at $DAYJOB noticed that if a .gitmodules-containing tree and the
.gitmodules blob itself are sent in 2 separate packfiles during a fetch
(which can happen when packfile URIs are used), transfer.fsckobjects
causes the fetch to fail. You can reproduce it as follows (as of the
time of writing):

  $ git -c fetch.uriprotocols=https -c transfer.fsckobjects=true clone https://chromium.googlesource.com/chromiumos/codesearch
  Cloning into 'codesearch'...
  remote: Total 2242 (delta 0), reused 2242 (delta 0)
  Receiving objects: 100% (2242/2242), 1.77 MiB | 4.62 MiB/s, done.
  error: object 1f155c20935ee1154a813a814f03ef2b3976680f: gitmodulesMissing: unable to read .gitmodules blob
  fatal: fsck error in pack objects
  fatal: index-pack failed

This happens because the fsck part is currently being done in
index-pack, which operates on one pack at a time. When index-pack sees
the tree, it runs fsck on it (like any other object), and the fsck
subsystem remembers the .gitmodules target (specifically, in
gitmodules_found in fsck.c). Later, index-pack runs fsck_finish() which
checks if the target exists, but it doesn't, so it reports the failure.
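
The failure mode can be illustrated with a toy model. This is a minimal Python sketch, not git's code: `PackChecker`, `check_objects`, `finish`, and `check_pack` are hypothetical stand-ins for index-pack's per-pack fsck state, objects are plain dict entries, and the model ignores any pre-existing object store (a fresh clone has none).

```python
class PackChecker:
    """Toy model of a per-pack fsck pass (hypothetical, not git's API)."""

    def __init__(self, pack):
        # pack maps object id -> ("tree", {name: oid}) or ("blob", bytes)
        self.pack = pack
        self.gitmodules_found = set()  # blob ids named by .gitmodules entries

    def check_objects(self):
        for kind, payload in self.pack.values():
            if kind == "tree" and ".gitmodules" in payload:
                # Remember the target; it is only looked up in finish().
                self.gitmodules_found.add(payload[".gitmodules"])

    def finish(self):
        # Report remembered targets that this pack alone cannot provide.
        return sorted(oid for oid in self.gitmodules_found
                      if oid not in self.pack)


def check_pack(pack):
    checker = PackChecker(pack)
    checker.check_objects()
    return checker.finish()


tree = ("tree", {".gitmodules": "blob1", "README": "blob2"})
blob = ("blob", b'[submodule "x"]\n')

# Tree and blob in one pack: nothing is reported.
same_pack = check_pack({"tree1": tree, "blob1": blob})

# Tree in the inline pack, blob in a separately downloaded pack:
# the per-pack check reports the blob as missing.
split_pack = check_pack({"tree1": tree})

print(same_pack, split_pack)
```

In this model the second call reports `blob1` even though the blob would arrive in another pack, which mirrors the gitmodulesMissing error above.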

One option is for fetch to do its own pass of checking all downloaded
objects once all packfiles have been downloaded, but that seems wasteful
as all trees would have to be re-inflated.

Another option is to do it within the connectivity check instead - so,
update rev-list and the object walking mechanism to be able to detect
.gitmodules in trees and fsck the target blob whenever such an entry
occurs. This has the advantage that there is no extra re-inflation,
although it might be strange to have object walking be able to fsck.

The simplest solution would be to just relax this - check the blob if it
exists, but if it doesn't, it's OK. Some things in favor of this
solution:

 - This is something we already do in the partial clone case (although
   it could be argued that in this case, we're already trusting the
   server for far more than .gitmodules, so just because it's OK in the
   partial clone case doesn't mean that it's OK in the regular case).

 - Also, the commit message for this feature (from ed8b10f631 ("fsck: check
   .gitmodules content", 2018-05-21)) gives a rationale of a newer
   server being able to protect older clients.
    - Servers using receive-pack (instead of fetch-pack) to obtain
      objects would still be protected, since receive-pack still only
      accepts one packfile at a time (and there are currently no plans
      to expand this).
    - Also, malicious .gitmodules files could still be crafted that pass
      fsck checking - for example, by containing a URL (of another
      server) that refers to a repo with a .gitmodules that would fail
      fsck.

So I would rather go with just relaxing the check, but if consensus is
that we should still do it, I'll investigate doing it in the
connectivity check.


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-15 23:43 RFC on packfile URIs and .gitmodules check Jonathan Tan
@ 2021-01-16  0:30 ` Junio C Hamano
  2021-01-16  3:22   ` Taylor Blau
  2021-01-20  8:07 ` Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 134+ messages in thread
From: Junio C Hamano @ 2021-01-16  0:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff

Jonathan Tan <jonathantanmy@google.com> writes:

> Someone at $DAYJOB noticed that if a .gitmodules-containing tree and the
> .gitmodules blob itself are sent in 2 separate packfiles during a fetch
> (which can happen when packfile URIs are used), transfer.fsckobjects
> causes the fetch to fail. You can reproduce it as follows (as of the
> time of writing):
>
>   $ git -c fetch.uriprotocols=https -c transfer.fsckobjects=true clone https://chromium.googlesource.com/chromiumos/codesearch
>   Cloning into 'codesearch'...
>   remote: Total 2242 (delta 0), reused 2242 (delta 0)
>   Receiving objects: 100% (2242/2242), 1.77 MiB | 4.62 MiB/s, done.
>   error: object 1f155c20935ee1154a813a814f03ef2b3976680f: gitmodulesMissing: unable to read .gitmodules blob
>   fatal: fsck error in pack objects
>   fatal: index-pack failed
>
> This happens because the fsck part is currently being done in
> index-pack, which operates on one pack at a time. When index-pack sees
> the tree, it runs fsck on it (like any other object), and the fsck
> subsystem remembers the .gitmodules target (specifically, in
> gitmodules_found in fsck.c). Later, index-pack runs fsck_finish() which
> checks if the target exists, but it doesn't, so it reports the failure.

Is this because the gitmodules blob is contained in the base image
served via the pack URI mechanism, and the "dynamic" packfile for
the latest part of the history refers to the gitmodules file that is
unchanged, hence the latter one lacks it?

> Another option is to do it within the connectivity check instead - so,
> update rev-list and the object walking mechanism to be able to detect
> .gitmodules in trees and fsck the target blob whenever such an entry
> occurs. This has the advantage that there is no extra re-inflation,
> although it might be strange to have object walking be able to fsck.
>
> The simplest solution would be to just relax this - check the blob if it
> exists, but if it doesn't, it's OK. Some things in favor of this
> solution:
>
>  - This is something we already do in the partial clone case (although
>    it could be argued that in this case, we're already trusting the
>    server for far more than .gitmodules, so just because it's OK in the
>    partial clone case doesn't mean that it's OK in the regular case).
>
>  - Also, the commit message for this feature (from ed8b10f631 ("fsck: check
>    .gitmodules content", 2018-05-21)) gives a rationale of a newer
>    server being able to protect older clients.
>     - Servers using receive-pack (instead of fetch-pack) to obtain
>       objects would still be protected, since receive-pack still only
>       accepts one packfile at a time (and there are currently no plans
>       to expand this).
>     - Also, malicious .gitmodules files could still be crafted that pass
>       fsck checking - for example, by containing a URL (of another
>       server) that refers to a repo with a .gitmodules that would fail
>       fsck.
>
> So I would rather go with just relaxing the check, but if consensus is
> that we should still do it, I'll investigate doing it in the
> connectivity check.

You've listed two possible solutions, i.e.

 (1) punt and declare that we assume a missing and uncheckable blob
     is OK,

 (2) defer the check after transfer completes.

Between the two, my gut feeling is that the latter is preferable.
If we assume a missing and uncheckable one is OK, then even if a
blob is available to be checked, there is not much point in
checking, no?

As long as the quarantine of incoming pack works correctly,
streaming the incoming packdata (and packfile downloaded out of line
via a separate mechanism like pack URI) to index-pack that does not
check to complete the transfer, with a separate step to check the
sanity of these packs as a whole, should not harm the repository
even if it is interrupted in the middle, after transfer is done but
before checking says it is OK.

As a potential third option, I wonder if it is easier for everybody
involved (including third-party implementation of their
index-pack/fsck equivalent) if we made it a rule that a pack that
has a tree that refers to .git<something> must include the blob for
it?

Thanks.


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-16  0:30 ` Junio C Hamano
@ 2021-01-16  3:22   ` Taylor Blau
  2021-01-19 12:56     ` Derrick Stolee
  2021-01-19 19:02     ` Jonathan Tan
  0 siblings, 2 replies; 134+ messages in thread
From: Taylor Blau @ 2021-01-16  3:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git, peff

On Fri, Jan 15, 2021 at 04:30:07PM -0800, Junio C Hamano wrote:
> Jonathan Tan <jonathantanmy@google.com> writes:
>
> > Someone at $DAYJOB noticed that if a .gitmodules-containing tree and the
> > .gitmodules blob itself are sent in 2 separate packfiles during a fetch
> > (which can happen when packfile URIs are used), transfer.fsckobjects
> > causes the fetch to fail. You can reproduce it as follows (as of the
> > time of writing):
> >
> >   $ git -c fetch.uriprotocols=https -c transfer.fsckobjects=true clone https://chromium.googlesource.com/chromiumos/codesearch
> >   Cloning into 'codesearch'...
> >   remote: Total 2242 (delta 0), reused 2242 (delta 0)
> >   Receiving objects: 100% (2242/2242), 1.77 MiB | 4.62 MiB/s, done.
> >   error: object 1f155c20935ee1154a813a814f03ef2b3976680f: gitmodulesMissing: unable to read .gitmodules blob
> >   fatal: fsck error in pack objects
> >   fatal: index-pack failed
> >
> > This happens because the fsck part is currently being done in
> > index-pack, which operates on one pack at a time. When index-pack sees
> > the tree, it runs fsck on it (like any other object), and the fsck
> > subsystem remembers the .gitmodules target (specifically, in
> > gitmodules_found in fsck.c). Later, index-pack runs fsck_finish() which
> > checks if the target exists, but it doesn't, so it reports the failure.
>
> Is this because the gitmodules blob is contained in the base image
> served via the pack URI mechanism, and the "dynamic" packfile for
> the latest part of the history refers to the gitmodules file that is
> unchanged, hence the latter one lacks it?

That seems like a likely explanation, although this seems ultimately up
to what the pack CDN serves.

> You've listed two possible solutions, i.e.
>
>  (1) punt and declare that we assume a missing and uncheckable blob
>      is OK,
>
>  (2) defer the check after transfer completes.
>
> Between the two, my gut feeling is that the latter is preferable.
> If we assume a missing and uncheckable one is OK, then even if a
> blob is available to be checked, there is not much point in
> checking, no?

I'm going to second this. If this were a more benign check, then I'd
perhaps feel differently, but .gitmodules fsck checks seem to get
hardened fairly often during security releases, and so it seems
important to keep performing them when the user asked for it.

> As long as the quarantine of incoming pack works correctly,
> streaming the incoming packdata (and packfile downloaded out of line
> via a separate mechanism like pack URI) to index-pack that does not
> check to complete the transfer, with a separate step to check the
> sanity of these packs as a whole, should not harm the repository
> even if it is interrupted in the middle, after transfer is done but
> before checking says it is OK.

Agreed. Bear in mind that I am pretty unfamiliar with this code, and so
I'm not sure if it's 'easy' or not to change it in this way. The obvious
downside, which Jonathan notes, is that you almost certainly have to
reinflate all of the trees again.

But, since the user is asking for transfer.fsckObjects explicitly, I
don't think that it's a problem.

> As a potential third option, I wonder if it is easier for everybody
> involved (including third-party implementation of their
> index-pack/fsck equivalent) if we made it a rule that a pack that
> has a tree that refers to .git<something> must include the blob for
> it?

Interesting, but I'm sure CDN administrators would prefer to have as few
restrictions in place as possible.

A potential fourth option that I can think of is that we can try to
eagerly perform the .gitmodules fsck checks as we receive objects, under
the assumption that the .gitmodules blob and the tree which contains it
appear in the same pack.

If they do, then we ought to be able to check them as we currently do
(and avoid leaving them to the slow post-processing step). Any blobs
that we _can't_ find get placed into an array, and then that array is
iterated over after we have received all packs, including from the CDN.
Any blob that couldn't be found in the pack transferred from the
remote, the CDN, or the local repository (and isn't explicitly excluded
via an object --filter) is declared missing.
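
The two-phase idea above can be sketched as follows. This is a hedged Python illustration under stated assumptions, not git's implementation: `check_pack_eagerly`, `fetch_check`, and `no_dash_urls` are hypothetical names, packs are dicts mapping object ids to `(kind, payload)` pairs, and the validity rule is a placeholder rather than git's actual .gitmodules checks.

```python
def check_pack_eagerly(pack, is_valid_gitmodules):
    """Check .gitmodules blobs found in the same pack; return unresolved ids."""
    unresolved = []
    for kind, payload in pack.values():
        if kind != "tree" or ".gitmodules" not in payload:
            continue
        target = payload[".gitmodules"]
        if target in pack:
            if not is_valid_gitmodules(pack[target][1]):
                raise ValueError(f"bad .gitmodules blob {target}")
        else:
            unresolved.append(target)  # defer until all packs have arrived
    return unresolved


def fetch_check(packs, local_store, filtered, is_valid_gitmodules):
    """Run the eager pass per pack, then resolve leftovers across all packs."""
    pending = []
    for pack in packs:
        pending.extend(check_pack_eagerly(pack, is_valid_gitmodules))

    everything = dict(local_store)
    for pack in packs:
        everything.update(pack)

    missing = []
    for oid in pending:
        if oid in filtered:
            continue  # excluded by an object filter, so absence is expected
        if oid not in everything:
            missing.append(oid)
        elif not is_valid_gitmodules(everything[oid][1]):
            raise ValueError(f"bad .gitmodules blob {oid}")
    return missing


def no_dash_urls(data):
    # Placeholder validity rule for this sketch only.
    return b"url = -" not in data


inline = {"tree1": ("tree", {".gitmodules": "blob1"})}
cdn = {"blob1": ("blob", b'[submodule "x"]\n\turl = https://example.com/x\n')}

result = fetch_check([inline, cdn], {}, set(), no_dash_urls)
result_without_cdn = fetch_check([inline], {}, set(), no_dash_urls)
result_filtered = fetch_check([inline], {}, {"blob1"}, no_dash_urls)
```

Only the ids left unresolved by the per-pack pass are revisited, so trees are not inflated a second time in this model.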

Thoughts?

> Thanks.

Thanks,
Taylor


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-16  3:22   ` Taylor Blau
@ 2021-01-19 12:56     ` Derrick Stolee
  2021-01-19 19:13       ` Jonathan Tan
  2021-01-19 19:02     ` Jonathan Tan
  1 sibling, 1 reply; 134+ messages in thread
From: Derrick Stolee @ 2021-01-19 12:56 UTC (permalink / raw)
  To: Taylor Blau, Junio C Hamano; +Cc: Jonathan Tan, git, peff

On 1/15/2021 10:22 PM, Taylor Blau wrote:
> On Fri, Jan 15, 2021 at 04:30:07PM -0800, Junio C Hamano wrote:
>> Jonathan Tan <jonathantanmy@google.com> writes:
>>
>>> Someone at $DAYJOB noticed that if a .gitmodules-containing tree and the
>>> .gitmodules blob itself are sent in 2 separate packfiles during a fetch
>>> (which can happen when packfile URIs are used), transfer.fsckobjects
>>> causes the fetch to fail. You can reproduce it as follows (as of the
>>> time of writing):
>>>
>>>   $ git -c fetch.uriprotocols=https -c transfer.fsckobjects=true clone https://chromium.googlesource.com/chromiumos/codesearch
>>>   Cloning into 'codesearch'...
>>>   remote: Total 2242 (delta 0), reused 2242 (delta 0)
>>>   Receiving objects: 100% (2242/2242), 1.77 MiB | 4.62 MiB/s, done.
>>>   error: object 1f155c20935ee1154a813a814f03ef2b3976680f: gitmodulesMissing: unable to read .gitmodules blob
>>>   fatal: fsck error in pack objects
>>>   fatal: index-pack failed

I'm contributing a quick suggestion for just this item:

>>> This happens because the fsck part is currently being done in
>>> index-pack, which operates on one pack at a time. When index-pack sees
>>> the tree, it runs fsck on it (like any other object), and the fsck
>>> subsystem remembers the .gitmodules target (specifically, in
>>> gitmodules_found in fsck.c). Later, index-pack runs fsck_finish() which
>>> checks if the target exists, but it doesn't, so it reports the failure.
>>
>> Is this because the gitmodules blob is contained in the base image
>> served via the pack URI mechanism, and the "dynamic" packfile for
>> the latest part of the history refers to the gitmodules file that is
>> unchanged, hence the latter one lacks it?
> 
> That seems like a likely explanation, although this seems ultimately up
> to what the pack CDN serves.
>
>> You've listed two possible solutions, i.e.
>>
>>  (1) punt and declare that we assume a missing and uncheckable blob
>>      is OK,
>>
>>  (2) defer the check after transfer completes.
>>
>> Between the two, my gut feeling is that the latter is preferable.
>> If we assume a missing and uncheckable one is OK, then even if a
>> blob is available to be checked, there is not much point in
>> checking, no?
> 
> I'm going to second this. If this were a more benign check, then I'd
> perhaps feel differently, but .gitmodules fsck checks seem to get
> hardened fairly often during security releases, and so it seems
> important to keep performing them when the user asked for it.

It might be nice to teach 'index-pack' a mode that says certain
errors should be reported as warnings by writing the problematic
OIDs to stdout/stderr. Then, the second check after all packs are
present can focus on those problematic objects instead of
re-scanning everything.

Thanks,
-Stolee


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-16  3:22   ` Taylor Blau
  2021-01-19 12:56     ` Derrick Stolee
@ 2021-01-19 19:02     ` Jonathan Tan
  1 sibling, 0 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-01-19 19:02 UTC (permalink / raw)
  To: me; +Cc: gitster, jonathantanmy, git, peff

> > Is this because the gitmodules blob is contained in the base image
> > served via the pack URI mechanism, and the "dynamic" packfile for
> > the latest part of the history refers to the gitmodules file that is
> > unchanged, hence the latter one lacks it?
> 
> That seems like a likely explanation, although this seems ultimately up
> to what the pack CDN serves.

In this case, yes, that is what is happening.

> > You've listed two possible solutions, i.e.
> >
> >  (1) punt and declare that we assume a missing and uncheckable blob
> >      is OK,
> >
> >  (2) defer the check after transfer completes.
> >
> > Between the two, my gut feeling is that the latter is preferable.
> > If we assume a missing and uncheckable one is OK, then even if a
> > blob is available to be checked, there is not much point in
> > checking, no?
> 
> I'm going to second this. If this were a more benign check, then I'd
> perhaps feel differently, but .gitmodules fsck checks seem to get
> hardened fairly often during security releases, and so it seems
> important to keep performing them when the user asked for it.

That makes sense.

> > As long as the quarantine of incoming pack works correctly,
> > streaming the incoming packdata (and packfile downloaded out of line
> > via a separate mechanism like pack URI) to index-pack that does not
> > check to complete the transfer, with a separate step to check the
> > sanity of these packs as a whole, should not harm the repository
> > even if it is interrupted in the middle, after transfer is done but
> > before checking says it is OK.
> 
> Agreed. Bear in mind that I am pretty unfamiliar with this code, and so
> I'm not sure if it's 'easy' or not to change it in this way. The obvious
> downside, which Jonathan notes, is that you almost certainly have to
> reinflate all of the trees again.
> 
> But, since the user is asking for transfer.fsckObjects explicitly, I
> don't think that it's a problem.

We might be able to avoid the reinflate if we do it as part of the
connectivity check or somehow teach index-pack a way to communicate the
dangling .gitmodules links (as you suggest below).

> > As a potential third option, I wonder if it is easier for everybody
> > involved (including third-party implementation of their
> > index-pack/fsck equivalent) if we made it a rule that a pack that
> > has a tree that refers to .git<something> must include the blob for
> > it?
> 
> Interesting, but I'm sure CDN administrators would prefer to have as few
> restrictions in place as possible.

That rule would help, but it also seems inelegant in that if we put
commits that have the same .gitmodules in 2 or more different packs,
there would be identical objects across those packs (besides the reason
Taylor mentioned).

> A potential fourth option that I can think of is that we can try to
> eagerly perform the .gitmodules fsck checks as we receive objects, under
> the assumption that the .gitmodules blob and the tree which contains it
> appear in the same pack.
> 
> If they do, then we ought to be able to check them as we currently do
> (and avoid leaving them to the slow post-processing step). Any blobs
> that we _can't_ find get placed into an array, and then that array is
> iterated over after we have received all packs, including from the CDN.
> Any blob that couldn't be found in the pack transferred from the
> remote, the CDN, or the local repository (and isn't explicitly excluded
> via an object --filter) is declared missing.
> 
> Thoughts?

The hard part is communicating this array to the parent fetch process.
Stolee has a suggestion [1] which I will reply to directly.

[1] https://lore.kernel.org/git/d2ca2fec-a353-787a-15a7-3831a665523e@gmail.com/


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-19 12:56     ` Derrick Stolee
@ 2021-01-19 19:13       ` Jonathan Tan
  2021-01-20  1:04         ` Junio C Hamano
  0 siblings, 1 reply; 134+ messages in thread
From: Jonathan Tan @ 2021-01-19 19:13 UTC (permalink / raw)
  To: stolee; +Cc: me, gitster, jonathantanmy, git, peff

> I'm contributing a quick suggestion for just this item:
> 
> >>> This happens because the fsck part is currently being done in
> >>> index-pack, which operates on one pack at a time. When index-pack sees
> >>> the tree, it runs fsck on it (like any other object), and the fsck
> >>> subsystem remembers the .gitmodules target (specifically, in
> >>> gitmodules_found in fsck.c). Later, index-pack runs fsck_finish() which
> >>> checks if the target exists, but it doesn't, so it reports the failure.
> >>
> >> Is this because the gitmodules blob is contained in the base image
> >> served via the pack URI mechanism, and the "dynamic" packfile for
> >> the latest part of the history refers to the gitmodules file that is
> >> unchanged, hence the latter one lacks it?
> > 
> > That seems like a likely explanation, although this seems ultimately up
> > to what the pack CDN serves.
> >
> >> You've listed two possible solutions, i.e.
> >>
> >>  (1) punt and declare that we assume a missing and uncheckable blob
> >>      is OK,
> >>
> >>  (2) defer the check after transfer completes.
> >>
> >> Between the two, my gut feeling is that the latter is preferable.
> >> If we assume a missing and uncheckable one is OK, then even if a
> >> blob is available to be checked, there is not much point in
> >> checking, no?
> > 
> > I'm going to second this. If this were a more benign check, then I'd
> > perhaps feel differently, but .gitmodules fsck checks seem to get
> > hardened fairly often during security releases, and so it seems
> > important to keep performing them when the user asked for it.
> 
> It might be nice to teach 'index-pack' a mode that says certain
> errors should be reported as warnings by writing the problematic
> OIDs to stdout/stderr. Then, the second check after all packs are
> present can focus on those problematic objects instead of
> re-scanning everything.

My initial reaction was that stdout is already used to report the hash
part of the generated name and that stderr is already used for whatever
warnings there are, but looking at the documentation, index-pack
--fsck-objects is "[for] internal use only", so it might be fine to
extend the output format in this case and report the problematic OIDs
after the hash. I'll take a look.
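
To make the idea concrete, here is a minimal Python sketch of how a parent process might consume such output. The format is an assumption made for illustration only: the first line is treated as the existing hash line, and every following non-empty line is taken to be one problematic OID. `parse_index_pack_output` and `recheck` are hypothetical helpers, and the OIDs are placeholders.

```python
def parse_index_pack_output(text):
    """Split assumed output into (first line, list of reported object ids)."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("expected at least the pack hash line")
    return lines[0], [line.strip() for line in lines[1:] if line.strip()]


def recheck(reported_by_pack, have_object):
    """Return reported ids that are still absent once all packs are indexed."""
    pending = set()
    for oids in reported_by_pack:
        pending.update(oids)
    return sorted(oid for oid in pending if not have_object(oid))


# Placeholder ids; the line layout below is assumed, not documented.
blob_oid = "c" * 40
out_inline = "pack\t" + "a" * 40 + "\n" + blob_oid + "\n"
out_cdn = "pack\t" + "b" * 40 + "\n"

_, dangling_inline = parse_index_pack_output(out_inline)
_, dangling_cdn = parse_index_pack_output(out_cdn)

# Pretend the CDN pack supplied the blob the inline pack was missing.
store = {blob_oid}
still_missing = recheck([dangling_inline, dangling_cdn], store.__contains__)
```

With this shape, the second check only has to look up the handful of reported OIDs after all packs are present.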


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-19 19:13       ` Jonathan Tan
@ 2021-01-20  1:04         ` Junio C Hamano
  0 siblings, 0 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-01-20  1:04 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: stolee, me, git, peff

Jonathan Tan <jonathantanmy@google.com> writes:

>> It might be nice to teach 'index-pack' a mode that says certain
>> errors should be reported as warnings by writing the problematic
>> OIDs to stdout/stderr. Then, the second check after all packs are
>> present can focus on those problematic objects instead of
>> re-scanning everything.
>
> My initial reaction was that stdout is already used to report the hash
> part of the generated name and that stderr is already used for whatever
> warnings there are, but looking at the documentation, index-pack
> --fsck-objects is "[for] internal use only", so it might be fine to
> extend the output format in this case and report the problematic OIDs
> after the hash. I'll take a look.

If I am not mistaken, Taylor also mentioned the possibility to give
"these objects need reinspecting" to a later process, and it is an
excellent suggestion.  And I think it is perfectly fine to adjust
the internal format used purely for internal use.

Thanks.


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-15 23:43 RFC on packfile URIs and .gitmodules check Jonathan Tan
  2021-01-16  0:30 ` Junio C Hamano
@ 2021-01-20  8:07 ` Ævar Arnfjörð Bjarmason
  2021-01-20 19:30   ` Jonathan Tan
  2021-01-20 19:36   ` [PATCH] Doc: clarify contents of packfile sent as URI Jonathan Tan
  2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
  2021-02-22 19:20 ` [PATCH v2 " Jonathan Tan
  3 siblings, 2 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-01-20  8:07 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, Derrick Stolee, Taylor Blau


On Sat, Jan 16 2021, Jonathan Tan wrote:

> Someone at $DAYJOB noticed that if a .gitmodules-containing tree and the
> .gitmodules blob itself are sent in 2 separate packfiles during a fetch
> (which can happen when packfile URIs are used), transfer.fsckobjects
> causes the fetch to fail. You can reproduce it as follows (as of the
> time of writing):
>
>   $ git -c fetch.uriprotocols=https -c transfer.fsckobjects=true clone https://chromium.googlesource.com/chromiumos/codesearch
>   Cloning into 'codesearch'...
>   remote: Total 2242 (delta 0), reused 2242 (delta 0)
>   Receiving objects: 100% (2242/2242), 1.77 MiB | 4.62 MiB/s, done.
>   error: object 1f155c20935ee1154a813a814f03ef2b3976680f: gitmodulesMissing: unable to read .gitmodules blob
>   fatal: fsck error in pack objects
>   fatal: index-pack failed
>
> This happens because the fsck part is currently being done in
> index-pack, which operates on one pack at a time. When index-pack sees
> the tree, it runs fsck on it (like any other object), and the fsck
> subsystem remembers the .gitmodules target (specifically, in
> gitmodules_found in fsck.c). Later, index-pack runs fsck_finish() which
> checks if the target exists, but it doesn't, so it reports the failure.
>
> One option is for fetch to do its own pass of checking all downloaded
> objects once all packfiles have been downloaded, but that seems wasteful
> as all trees would have to be re-inflated.
>
> Another option is to do it within the connectivity check instead - so,
> update rev-list and the object walking mechanism to be able to detect
> .gitmodules in trees and fsck the target blob whenever such an entry
> occurs. This has the advantage that there is no extra re-inflation,
> although it might be strange to have object walking be able to fsck.
>
> The simplest solution would be to just relax this - check the blob if it
> exists, but if it doesn't, it's OK. Some things in favor of this
> solution:
>
>  - This is something we already do in the partial clone case (although
>    it could be argued that in this case, we're already trusting the
>    server for far more than .gitmodules, so just because it's OK in the
>    partial clone case doesn't mean that it's OK in the regular case).
>
>  - Also, the commit message for this feature (from ed8b10f631 ("fsck: check
>    .gitmodules content", 2018-05-21)) gives a rationale of a newer
>    server being able to protect older clients.
>     - Servers using receive-pack (instead of fetch-pack) to obtain
>       objects would still be protected, since receive-pack still only
>       accepts one packfile at a time (and there are currently no plans
>       to expand this).
>     - Also, malicious .gitmodules files could still be crafted that pass
>       fsck checking - for example, by containing a URL (of another
>       server) that refers to a repo with a .gitmodules that would fail
>       fsck.
>
> So I would rather go with just relaxing the check, but if consensus is
> that we should still do it, I'll investigate doing it in the
> connectivity check.

Would this still happen if $DAYJOB's packfile-uri server support
behaved as documented in packfile-uri.txt, or does it happen only
because of its outside-spec behavior?

I.e. the spec[1] says this:

    This is the implementation: a feature, marked experimental, that
    allows the server to be configured by one or more
    `uploadpack.blobPackfileUri=<sha1> <uri>` entries. Whenever the list
    of objects to be sent is assembled, all such blobs are excluded,
    replaced with URIs. The client will download those URIs, expecting
    them to each point to packfiles containing single blobs.

I can't see that leaving an opening for packfile-uri to serve up
anything other than packfiles which each contain a single blob.

In that case it seems to me we'd be OK (but I haven't tested), because
fsck_finish() will call read_object_file() which'll try to read that
blob from the object store when it encounters the ".gitmodules" tree,
and because we'd have already downloaded the packfile with the blob
before moving onto the main dialog.

But as we discussed on-list before[2] this isn't the way packfile-uri
actually works in the wild. It's really just sending some arbitrary data
in a pack in that URI, with a server that knows what's in that pack and
will send the rest in such a way that everything ends up being
connected.

As far as I can tell the only reason this is called "packfile URI" and
behaves this way in git.git is because of the convenience of
instrumenting pack-objects.c with an "oidset excluded_by_config" to not
stream those blobs in a pack, but it isn't how the only (I'm pretty
sure) production server implementation in the wild behaves at all.

So *poke* about the reply I had in [3] late last year. I think the first
thing worth doing here is fixing the docs so they describe how this
works. You didn't get back on that (and I also forgot about it until
this thread), but it would be nice to know what you think about the
suggested prose there.

Re-reading it I'd add something like this to the spec:

 A. That the config is called "uploadpack.blobPackfileUri" in git.git
    has nothing to do with how this is expected to behave on the
    wire. It's just to serve the narrow support pack-objects.c has for
    crafting such a pack.

 B. It's then called "packfile-uris" on the wire, nothing to do with
    blobs. Just packs with a checksum that we'll validate. An older
    version of this spec said "packfiles containing single blobs"
    but it can be any combination of blob/tree/commit data.

 C. A client is then expected to deal with any combination of data
    ordered/sliced/split up etc. in any possible way from such a
    combination of "packfile-uris" and PACK dialog, as long as the end
    result is valid.

Except that the result of this discussion will perhaps be a more narrow
definition for "C".

1. https://github.com/git/git/blob/cd8402e0fd8cfc0ec9fb10e22ffb6aabd992eae1/Documentation/technical/packfile-uri.txt#L37-L41
2. https://lore.kernel.org/git/20201125190957.1113461-1-jonathantanmy@google.com/
3. https://lore.kernel.org/git/87tut5vghw.fsf@evledraar.gmail.com/


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-20  8:07 ` Ævar Arnfjörð Bjarmason
@ 2021-01-20 19:30   ` Jonathan Tan
  2021-01-21  3:06     ` Junio C Hamano
  2021-01-20 19:36   ` [PATCH] Doc: clarify contents of packfile sent as URI Jonathan Tan
  1 sibling, 1 reply; 134+ messages in thread
From: Jonathan Tan @ 2021-01-20 19:30 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git, peff, stolee, me

> Would this still happen if $DAYJOB's packfile-uri server support
> behaved as documented in packfile-uri.txt, or does it happen only
> because of its outside-spec behavior?
> 
> I.e. the spec[1] says this:
> 
>     This is the implementation: a feature, marked experimental, that
>     allows the server to be configured by one or more
>     `uploadpack.blobPackfileUri=<sha1> <uri>` entries. Whenever the list
>     of objects to be sent is assembled, all such blobs are excluded,
>     replaced with URIs. The client will download those URIs, expecting
>     them to each point to packfiles containing single blobs.
> 
> Which I can't see leaving an opening for more than packfile-uri being to
> serve up packfiles which each contain a single blob.

I meant to leave an opening by referring to this just as a Minimum
Viable Product and by explaining in Future Work that the protocol allows
evolution of (among other things) which objects the server sends through
a URI without any protocol changes.

But in any case, this will also happen even if we constrain ourselves to
excluding single blobs and sending them via other packfiles instead -
see below.

> In that case it seems to me we'd be OK (but I haven't tested), because
> fsck_finish() will call read_object_file() which'll try to read that
> blob from the object store when it encounters the ".gitmodules" tree,
> and because we'd have already downloaded the packfile with the blob
> before moving onto the main dialog.

We wouldn't be OK, actually. Suppose we have a separate packfile
containing only the ".gitmodules" blob - when we call fsck_finish(), we
would not have downloaded the other packfile yet. Git processes the
entire fetch response by piping the inline packfile (after demux) into
index-pack (which is the one that calls fsck_finish()) before it
downloads any of the other packfile(s).

> But as we discussed on-list before[2] this isn't the way packfile-uri
> actually works in the wild. It's really just sending some arbitrary data
> in a pack in that URI, with a server that knows what's in that pack and
> will send the rest in such a way that everything ends up being
> connected.
> 
> As far as I can tell the only reason this is called "packfile URI" and
> behaves this way in git.git is because of the convenience of
> instrumenting pack-objects.c with an "oidset excluded_by_config" to not
> stream those blobs in a pack, but it isn't how the only (I'm pretty
> sure) production server implementation in the wild behaves at all.

I don't know if this is the only production server implementation, but
yes, this particular one (googlesource.com) can put objects of multiple
types in the other packfile, not only a single blob. There is some JGit
code here [1] that can send a URI corresponding to a "CachedPack" (which
may contain all objects, not only blobs) if that pack is also available
through a URI.

[1] https://gerrit.googlesource.com/jgit/+/a004820858b54d18c6f72fc94dc33bce8b606d66

> So *poke* about the reply I had in [3] late last year. I think the first
> thing worth doing here is fixing the docs so they describe how this
> works. You didn't get back on that (and I also forgot about it until
> this thread), but it would be nice to know what you think about the
> suggested prose there.

Rereading that, the issue is that uploadpack.blobPackfileUri is indeed
how the current Git server handles it - it excludes a blob and sends a
URI instead. The client is not supposed to see how the server has
configured it, and should not be constrained by the fact that the server
that is being shipped with it only excludes single blobs.

> Re-reading it I'd add something like this to the spec:
> 
>  A. That the config is called "uploadpack.blobPackfileUri" in git.git
>     has nothing to do with how this is expected to behave on the
>     wire. It's just to serve the narrow support pack-objects.c has for
>     crafting such a pack.

Yes, that's true.

>  B. It's then called "packfile-uris" on the wire, nothing to do with
>     blobs. Just packs with a checksum that we'll validate. An older
>     version of this spec said "[a] packfiles containing single blobs"
>     but it can be any combination of blob/tree/commit data.

Yes, we can delete that line.

>  C. A client is then expected to deal with any combination of data
>     ordered/sliced/split up etc. in any possible way from such a
>     combination of "packfile-uris" and PACK dialog, as long as the end
>     result is valid.
> 
> Except that the result of this discussion will perhaps be a more narrow
> definition for "C".

Yes. I think all these can be done just by changing the last sentence in
"Server design" - I'll send a patch.

> 1. https://github.com/git/git/blob/cd8402e0fd8cfc0ec9fb10e22ffb6aabd992eae1/Documentation/technical/packfile-uri.txt#L37-L41
> 2. https://lore.kernel.org/git/20201125190957.1113461-1-jonathantanmy@google.com/
> 3. https://lore.kernel.org/git/87tut5vghw.fsf@evledraar.gmail.com/

* [PATCH] Doc: clarify contents of packfile sent as URI
  2021-01-20  8:07 ` Ævar Arnfjörð Bjarmason
  2021-01-20 19:30   ` Jonathan Tan
@ 2021-01-20 19:36   ` Jonathan Tan
  1 sibling, 0 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-01-20 19:36 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, avarab

Clarify that, when the packfile-uri feature is used, the client should
not assume that the extra packfiles downloaded would only contain a
single blob, but support packfiles containing multiple objects of all
types.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/technical/packfile-uri.txt | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/packfile-uri.txt b/Documentation/technical/packfile-uri.txt
index 318713abc3..f7eabc6c76 100644
--- a/Documentation/technical/packfile-uri.txt
+++ b/Documentation/technical/packfile-uri.txt
@@ -37,8 +37,11 @@ at least so that we can test the client.
 This is the implementation: a feature, marked experimental, that allows the
 server to be configured by one or more `uploadpack.blobPackfileUri=<sha1>
 <uri>` entries. Whenever the list of objects to be sent is assembled, all such
-blobs are excluded, replaced with URIs. The client will download those URIs,
-expecting them to each point to packfiles containing single blobs.
+blobs are excluded, replaced with URIs. As noted in "Future work" below, the
+server can evolve in the future to support excluding other objects (or other
+implementations of servers could be made that support excluding other objects)
+without needing a protocol change, so clients should not expect that packfiles
+downloaded in this way only contain single blobs.
 
 Client design
 -------------
-- 
2.30.0.284.gd98b1dd5eaa7-goog


* Re: RFC on packfile URIs and .gitmodules check
  2021-01-20 19:30   ` Jonathan Tan
@ 2021-01-21  3:06     ` Junio C Hamano
  2021-01-21 18:32       ` Jonathan Tan
  0 siblings, 1 reply; 134+ messages in thread
From: Junio C Hamano @ 2021-01-21  3:06 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: avarab, git, peff, stolee, me

Jonathan Tan <jonathantanmy@google.com> writes:

> We wouldn't be OK, actually. Suppose we have a separate packfile
> containing only the ".gitmodules" blob - when we call fsck_finish(), we
> would not have downloaded the other packfile yet. Git processes the
> entire fetch response by piping the inline packfile (after demux) into
> index-pack (which is the one that calls fsck_finish()) before it
> downloads any of the other packfile(s).

Is that order documented as a requirement for implementation?

Naïvely, I would expect that a CDN offload would be to relieve
servers from the burden of having to repack ancient part of the
history all the time for any new "clone" clients and that is what
the "here is a URI, go fetch it because I won't give you objects
that already appear there" feature is about.  Because we expect that
the offloaded contents would not be up-to-date, the traditional
packfile transfer would then be used to complete the history with
objects necessary for the parts of the history newer than the
offloaded contents.

And from that viewpoint, it sounds totally backwards to start
processing the up-to-the-minute fresh packfile that came via the
traditional packfile transfer before the CDN offloaded contents are
fetched and stored safely in our repository.

We probably want to finish interaction with the live server as
quickly as possible---it would go counter to that wish if we force
the live part of the history to hang in flight, unprocessed, while the
client downloads offloaded bulk from CDN and processes it, making
the server side stuck waiting for some write(2) to go through.

But I still wonder if it is an option to locally delay the
processing of the up-to-the-minute-fresh part.

Instead of feeding what comes from them directly to "index-pack
--fsck-objects", would it make sense to spool it to a temporary file, so
that we can release the server early, but then make sure to fetch
and process packfile URI material before coming back to process the
spooled packdata?  That would allow the newer part of the history to
have newer trees that still reference the same old .gitmodules that
is found in the frozen packfile that comes from CDN, no?

Or can there be a situation where some objects in CDN pack are
referred to by objects in the up-to-the-minute-fresh pack (e.g. a
".gitmodules" blob in CDN pack is still unchanged and used in an
updated tree in the latest revision) and some other objects in CDN
pack refer to an object in the live part of the history?  If there
is such a cyclic dependency, "index-pack --fsck" one pack at a time
would not work, but I doubt such a cycle can arise.

Thanks.

* Re: RFC on packfile URIs and .gitmodules check
  2021-01-21  3:06     ` Junio C Hamano
@ 2021-01-21 18:32       ` Jonathan Tan
  2021-01-21 18:39         ` Junio C Hamano
  0 siblings, 1 reply; 134+ messages in thread
From: Jonathan Tan @ 2021-01-21 18:32 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, avarab, git, peff, stolee, me

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > We wouldn't be OK, actually. Suppose we have a separate packfile
> > containing only the ".gitmodules" blob - when we call fsck_finish(), we
> > would not have downloaded the other packfile yet. Git processes the
> > entire fetch response by piping the inline packfile (after demux) into
> > index-pack (which is the one that calls fsck_finish()) before it
> > downloads any of the other packfile(s).
> 
> Is that order documented as a requirement for implementation?
> 
> Naïvely, I would expect that a CDN offload would be to relieve
> servers from the burden of having to repack ancient part of the
> history all the time for any new "clone" clients and that is what
> the "here is a URI, go fetch it because I won't give you objects
> that already appear there" feature is about.  Because we expect that
> the offloaded contents would not be up-to-date, the traditional
> packfile transfer would then be used to complete the history with
> objects necessary for the parts of the history newer than the
> offloaded contents.
> 
> And from that viewpoint, it sounds totally backwards to start
> processing the up-to-the-minute fresh packfile that came via the
> traditional packfile transfer before the CDN offloaded contents are
> fetched and stored safely in our repository.
> 
> We probably want to finish interaction with the live server as
> quickly as possible---it would go counter to that wish if we force
> the live part of the history to hang in flight, unprocessed, while the
> client downloads offloaded bulk from CDN and processes it, making
> the server side stuck waiting for some write(2) to go through.
> 
> But I still wonder if it is an option to locally delay the
> processing of the up-to-the-minute-fresh part.
> 
> Instead of feeding what comes from them directly to "index-pack
> --fsck-objects", would it make sense to spool it to a temporary file, so
> that we can release the server early, but then make sure to fetch
> and process packfile URI material before coming back to process the
> spooled packdata?  That would allow the newer part of the history to
> have newer trees that still reference the same old .gitmodules that
> is found in the frozen packfile that comes from CDN, no?
> 
> Or can there be a situation where some objects in CDN pack are
> referred to by objects in the up-to-the-minute-fresh pack (e.g. a
> ".gitmodules" blob in CDN pack is still unchanged and used in an
> updated tree in the latest revision) and some other objects in CDN
> pack refer to an object in the live part of the history?  If there
> is such a cyclic dependency, "index-pack --fsck" one pack at a time
> would not work, but I doubt such a cycle can arise.

My intention is that the order of the packfiles (and cyclic
dependencies) would not matter, so we wouldn't need to delay any
processing of the up-to-the-minute-fresh part. I'm currently working on
getting index-pack to output a list of the dangling .gitmodules files,
so that fetch-pack (its consumer) can do one final fsck on those files.

Another way, as you said, is to say that the order of the packfiles
matters (which potentially allows some simplification on the client
side) but I don't think that we need to lose this flexibility.
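
To illustrate the idea in the first paragraph above, here is a rough,
self-contained sketch. It is not the code being worked on for index-pack
or fetch-pack; struct toy_pack and the three functions are hypothetical
names invented for this example, and object IDs are plain strings. Phase 1
models index-pack reporting .gitmodules targets it cannot find in its own
pack, and phase 2 models the consumer resolving them once every pack has
been indexed (the real check would also read the blob and run fsck on its
contents, which this toy only approximates by testing for presence).

```c
#include <stddef.h>
#include <string.h>

#define MAX_IDS 8

/* Hypothetical stand-in for one indexed pack (not a Git data structure). */
struct toy_pack {
	const char *blobs[MAX_IDS];              /* blob IDs stored in this pack */
	size_t nr_blobs;
	const char *gitmodules_targets[MAX_IDS]; /* blobs that trees in this pack
						    reference as ".gitmodules" */
	size_t nr_targets;
};

int pack_has_blob(const struct toy_pack *p, const char *id)
{
	size_t i;

	for (i = 0; i < p->nr_blobs; i++)
		if (!strcmp(p->blobs[i], id))
			return 1;
	return 0;
}

/*
 * Phase 1 (per pack): instead of failing, collect the .gitmodules targets
 * that are not present in this pack. Returns how many were written to out.
 */
size_t collect_dangling(const struct toy_pack *p, const char **out)
{
	size_t i, n = 0;

	for (i = 0; i < p->nr_targets; i++)
		if (!pack_has_blob(p, p->gitmodules_targets[i]))
			out[n++] = p->gitmodules_targets[i];
	return n;
}

/*
 * Phase 2 (after all packs are indexed): count the dangling targets that
 * no pack contains. Zero means every boundary reference was resolved.
 */
size_t count_missing(const struct toy_pack *packs, size_t nr_packs,
		     const char **dangling, size_t nr_dangling)
{
	size_t i, j, missing = 0;

	for (i = 0; i < nr_dangling; i++) {
		int found = 0;

		for (j = 0; j < nr_packs && !found; j++)
			found = pack_has_blob(&packs[j], dangling[i]);
		if (!found)
			missing++;
	}
	return missing;
}
```

With a tree in the inline pack pointing at a blob that only exists in the
pack fetched by URI, collect_dangling() on the inline pack reports that
blob, and count_missing() over both packs returns 0 regardless of the
order in which the packs were indexed.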

* Re: RFC on packfile URIs and .gitmodules check
  2021-01-21 18:32       ` Jonathan Tan
@ 2021-01-21 18:39         ` Junio C Hamano
  0 siblings, 0 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-01-21 18:39 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: avarab, git, peff, stolee, me

Jonathan Tan <jonathantanmy@google.com> writes:

>> Jonathan Tan <jonathantanmy@google.com> writes:
>> 
>> Or can there be a situation where some objects in CDN pack are
>> referred to by objects in the up-to-the-minute-fresh pack (e.g. a
>> ".gitmodules" blob in CDN pack is still unchanged and used in an
>> updated tree in the latest revision) and some other objects in CDN
>> pack refer to an object in the live part of the history?  If there
>> is such a cyclic dependency, "index-pack --fsck" one pack at a time
>> would not work, but I doubt such a cycle can arise.
>
> My intention is that the order of the packfiles (and cyclic
> dependencies) would not matter...
> I'm currently working on
> getting index-pack to output a list of the dangling .gitmodules files,
> so that fetch-pack (its consumer) can do one final fsck on those files.

In other words, it essentially becomes "we check everything we
obtained as a single unit across multiple packs, but for performance
we'll let index-pack work as much as possible on each individual
pack while it has necessary data in its core, and then we conclude
by checking the objects on the 'boundaries' that cannot be validated
using info that is only in one pack".

That does sound like the right approach.  Thanks.

* [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-01-15 23:43 RFC on packfile URIs and .gitmodules check Jonathan Tan
  2021-01-16  0:30 ` Junio C Hamano
  2021-01-20  8:07 ` Ævar Arnfjörð Bjarmason
@ 2021-01-24  2:34 ` Jonathan Tan
  2021-01-24  2:34   ` [PATCH 1/4] http: allow custom index-pack args Jonathan Tan
                     ` (5 more replies)
  2021-02-22 19:20 ` [PATCH v2 " Jonathan Tan
  3 siblings, 6 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-01-24  2:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

This patch set resolves the .gitmodules-and-tree-in-separate-packfiles
issue I mentioned in [1] by having index-pack print out all dangling
.gitmodules (instead of returning with an error code) and then teaching
fetch-pack to read those and run its own fsck checks after all
index-pack invocations are complete.

As part of this, index-pack has to output (1) the hash that goes into
the name of the .pack/.idx file and (2) the hashes of all dangling
.gitmodules. I just had (2) come after (1). If anyone has a better idea,
I'm interested.

I also discovered a bug in that different index-pack arguments were used
when processing the inline packfile and when processing the ones
referenced by URIs. Patches 1-3 fix that bug by passing the arguments to
use as a space-separated URL-encoded list. (URL-encoded so that we can
have spaces in the arguments.) Again, if anyone has a better idea, I'm
interested. It is only in patch 4 that we have the dangling .gitmodules
fix.
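
As a rough illustration of the "space-separated URL-encoded list" format
mentioned above (a standalone sketch, not the code in this series, which
uses Git's own strbuf and strvec helpers), the following shows one way to
encode an argument vector and decode a single token. The names
encode_args() and decode_arg() are made up for this example, and the set
of characters left unescaped is assumed to be the RFC 3986 unreserved set.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* RFC 3986 "unreserved" characters are passed through unescaped. */
static int is_unreserved(unsigned char c)
{
	return isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~';
}

static int hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = tolower(c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Percent-encode every argument of a NULL-terminated vector and join the
 * results with single spaces. Returns a malloc'd string, or NULL on
 * allocation failure; the caller frees it.
 */
char *encode_args(const char **argv)
{
	size_t len = 1, i;
	const char *s;
	char *out, *p;

	/* Worst case: every byte becomes "%XX", plus one separator per arg. */
	for (i = 0; argv[i]; i++)
		len += 3 * strlen(argv[i]) + 1;
	out = p = malloc(len);
	if (!out)
		return NULL;
	for (i = 0; argv[i]; i++) {
		if (i)
			*p++ = ' ';
		for (s = argv[i]; *s; s++) {
			if (is_unreserved((unsigned char)*s))
				*p++ = *s;
			else
				p += sprintf(p, "%%%02X", (unsigned char)*s);
		}
	}
	*p = '\0';
	return out;
}

/* Decode one token in place; malformed escapes are kept literally. */
void decode_arg(char *s)
{
	char *w = s;

	while (*s) {
		if (s[0] == '%' && hexval((unsigned char)s[1]) >= 0 &&
		    hexval((unsigned char)s[2]) >= 0) {
			*w++ = (char)(hexval((unsigned char)s[1]) * 16 +
				      hexval((unsigned char)s[2]));
			s += 3;
		} else {
			*w++ = *s++;
		}
	}
	*w = '\0';
}
```

For example, the vector {"index-pack", "--stdin", "--keep=fetch-pack 123
on host"} would be encoded as "index-pack --stdin
--keep%3Dfetch-pack%20123%20on%20host" (on one line), so the receiver can
split on spaces first and decode each token afterwards.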

[1] https://lore.kernel.org/git/20210115234300.350442-1-jonathantanmy@google.com/

Jonathan Tan (4):
  http: allow custom index-pack args
  http-fetch: allow custom index-pack args
  fetch-pack: with packfile URIs, use index-pack arg
  fetch-pack: print and use dangling .gitmodules

 Documentation/git-http-fetch.txt |   9 ++-
 Documentation/git-index-pack.txt |   7 +-
 builtin/index-pack.c             |   9 ++-
 builtin/receive-pack.c           |   2 +-
 fetch-pack.c                     | 106 ++++++++++++++++++++++++++-----
 fsck.c                           |  16 +++--
 fsck.h                           |   8 +++
 http-fetch.c                     |  35 +++++++++-
 http.c                           |  15 +++--
 http.h                           |  10 +--
 pack-write.c                     |   8 ++-
 pack.h                           |   2 +-
 t/t5550-http-fetch-dumb.sh       |   3 +-
 t/t5702-protocol-v2.sh           |  47 ++++++++++++++
 14 files changed, 232 insertions(+), 45 deletions(-)

-- 
2.30.0.280.ga3ce27912f-goog


* [PATCH 1/4] http: allow custom index-pack args
  2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
@ 2021-01-24  2:34   ` Jonathan Tan
  2021-01-24  2:34   ` [PATCH 2/4] http-fetch: " Jonathan Tan
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-01-24  2:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Currently, when fetching, packfiles referenced by URIs are run through
index-pack without any arguments other than --stdin and --keep, no
matter what arguments are used for the packfile that is inline in the
fetch response. As a preparation for ensuring that all packs (whether
inline or not) use the same index-pack arguments, teach the http
subsystem to allow custom index-pack arguments.

http-fetch has been updated to use the new API. For now, it passes
--keep alone instead of --keep with a process ID, but this is only
temporary because http-fetch itself will be taught to accept index-pack
parameters (instead of using a hardcoded constant) in a subsequent
commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 http-fetch.c |  6 +++++-
 http.c       | 15 ++++++++-------
 http.h       | 10 +++++-----
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/http-fetch.c b/http-fetch.c
index c4ccc5fea9..2d1d9d054f 100644
--- a/http-fetch.c
+++ b/http-fetch.c
@@ -43,6 +43,9 @@ static int fetch_using_walker(const char *raw_url, int get_verbosely,
 	return rc;
 }
 
+static const char *index_pack_args[] =
+	{"index-pack", "--stdin", "--keep", NULL};
+
 static void fetch_single_packfile(struct object_id *packfile_hash,
 				  const char *url) {
 	struct http_pack_request *preq;
@@ -55,7 +58,8 @@ static void fetch_single_packfile(struct object_id *packfile_hash,
 	if (preq == NULL)
 		die("couldn't create http pack request");
 	preq->slot->results = &results;
-	preq->generate_keep = 1;
+	preq->index_pack_args = index_pack_args;
+	preq->preserve_index_pack_stdout = 1;
 
 	if (start_active_slot(preq->slot)) {
 		run_active_slot(preq->slot);
diff --git a/http.c b/http.c
index 8b23a546af..f8ea28bb2e 100644
--- a/http.c
+++ b/http.c
@@ -2259,6 +2259,9 @@ void release_http_pack_request(struct http_pack_request *preq)
 	free(preq);
 }
 
+static const char *default_index_pack_args[] =
+	{"index-pack", "--stdin", NULL};
+
 int finish_http_pack_request(struct http_pack_request *preq)
 {
 	struct child_process ip = CHILD_PROCESS_INIT;
@@ -2270,17 +2273,15 @@ int finish_http_pack_request(struct http_pack_request *preq)
 
 	tmpfile_fd = xopen(preq->tmpfile.buf, O_RDONLY);
 
-	strvec_push(&ip.args, "index-pack");
-	strvec_push(&ip.args, "--stdin");
 	ip.git_cmd = 1;
 	ip.in = tmpfile_fd;
-	if (preq->generate_keep) {
-		strvec_pushf(&ip.args, "--keep=git %"PRIuMAX,
-			     (uintmax_t)getpid());
+	ip.argv = preq->index_pack_args ? preq->index_pack_args
+					: default_index_pack_args;
+
+	if (preq->preserve_index_pack_stdout)
 		ip.out = 0;
-	} else {
+	else
 		ip.no_stdout = 1;
-	}
 
 	if (run_command(&ip)) {
 		ret = -1;
diff --git a/http.h b/http.h
index 5de792ef3f..bf3d1270ad 100644
--- a/http.h
+++ b/http.h
@@ -218,12 +218,12 @@ struct http_pack_request {
 	char *url;
 
 	/*
-	 * If this is true, finish_http_pack_request() will pass "--keep" to
-	 * index-pack, resulting in the creation of a keep file, and will not
-	 * suppress its stdout (that is, the "keep\t<hash>\n" line will be
-	 * printed to stdout).
+	 * index-pack command to run. Must be terminated by NULL.
+	 *
+	 * If NULL, defaults to	{"index-pack", "--stdin", NULL}.
 	 */
-	unsigned generate_keep : 1;
+	const char **index_pack_args;
+	unsigned preserve_index_pack_stdout : 1;
 
 	FILE *packfile;
 	struct strbuf tmpfile;
-- 
2.30.0.280.ga3ce27912f-goog


* [PATCH 2/4] http-fetch: allow custom index-pack args
  2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
  2021-01-24  2:34   ` [PATCH 1/4] http: allow custom index-pack args Jonathan Tan
@ 2021-01-24  2:34   ` Jonathan Tan
  2021-01-24 11:52     ` Ævar Arnfjörð Bjarmason
  2021-02-16 20:49     ` Josh Steadmon
  2021-01-24  2:34   ` [PATCH 3/4] fetch-pack: with packfile URIs, use index-pack arg Jonathan Tan
                     ` (3 subsequent siblings)
  5 siblings, 2 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-01-24  2:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

This is the next step in teaching fetch-pack to pass its index-pack
arguments when processing packfiles referenced by URIs.

The "--keep" in fetch-pack.c will be replaced with a full message in a
subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/git-http-fetch.txt |  9 ++++++--
 fetch-pack.c                     |  1 +
 http-fetch.c                     | 35 +++++++++++++++++++++++++++-----
 t/t5550-http-fetch-dumb.sh       |  3 ++-
 4 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-http-fetch.txt b/Documentation/git-http-fetch.txt
index 4deb4893f5..aa171088e8 100644
--- a/Documentation/git-http-fetch.txt
+++ b/Documentation/git-http-fetch.txt
@@ -41,11 +41,16 @@ commit-id::
 		<commit-id>['\t'<filename-as-in--w>]
 
 --packfile=<hash>::
-	Instead of a commit id on the command line (which is not expected in
+	For internal use only. Instead of a commit id on the command line (which is not expected in
 	this case), 'git http-fetch' fetches the packfile directly at the given
 	URL and uses index-pack to generate corresponding .idx and .keep files.
 	The hash is used to determine the name of the temporary file and is
-	arbitrary. The output of index-pack is printed to stdout.
+	arbitrary. The output of index-pack is printed to stdout. Requires
+	--index-pack-args.
+
+--index-pack-args=<args>::
+	For internal use only. The command to run on the contents of the
+	downloaded pack. Arguments are URL-encoded separated by spaces.
 
 --recover::
 	Verify that everything reachable from target is fetched.  Used after
diff --git a/fetch-pack.c b/fetch-pack.c
index 876f90c759..274ae602f7 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1645,6 +1645,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 		strvec_pushf(&cmd.args, "--packfile=%.*s",
 			     (int) the_hash_algo->hexsz,
 			     packfile_uris.items[i].string);
+		strvec_push(&cmd.args, "--index-pack-args=index-pack --stdin --keep");
 		strvec_push(&cmd.args, uri);
 		cmd.git_cmd = 1;
 		cmd.no_stdin = 1;
diff --git a/http-fetch.c b/http-fetch.c
index 2d1d9d054f..12feb84e71 100644
--- a/http-fetch.c
+++ b/http-fetch.c
@@ -3,6 +3,7 @@
 #include "exec-cmd.h"
 #include "http.h"
 #include "walker.h"
+#include "strvec.h"
 
 static const char http_fetch_usage[] = "git http-fetch "
 "[-c] [-t] [-a] [-v] [--recover] [-w ref] [--stdin | --packfile=hash | commit-id] url";
@@ -43,11 +44,9 @@ static int fetch_using_walker(const char *raw_url, int get_verbosely,
 	return rc;
 }
 
-static const char *index_pack_args[] =
-	{"index-pack", "--stdin", "--keep", NULL};
-
 static void fetch_single_packfile(struct object_id *packfile_hash,
-				  const char *url) {
+				  const char *url,
+				  const char **index_pack_args) {
 	struct http_pack_request *preq;
 	struct slot_results results;
 	int ret;
@@ -90,6 +89,7 @@ int cmd_main(int argc, const char **argv)
 	int packfile = 0;
 	int nongit;
 	struct object_id packfile_hash;
+	const char *index_pack_args = NULL;
 
 	setup_git_directory_gently(&nongit);
 
@@ -116,6 +116,8 @@ int cmd_main(int argc, const char **argv)
 			packfile = 1;
 			if (parse_oid_hex(p, &packfile_hash, &end) || *end)
 				die(_("argument to --packfile must be a valid hash (got '%s')"), p);
+		} else if (skip_prefix(argv[arg], "--index-pack-args=", &p)) {
+			index_pack_args = p;
 		}
 		arg++;
 	}
@@ -128,10 +130,33 @@ int cmd_main(int argc, const char **argv)
 	git_config(git_default_config, NULL);
 
 	if (packfile) {
-		fetch_single_packfile(&packfile_hash, argv[arg]);
+		struct strvec encoded = STRVEC_INIT;
+		char **raw;
+		int i;
+
+		if (!index_pack_args)
+			die(_("--packfile requires --index-pack-args"));
+
+		strvec_split(&encoded, index_pack_args);
+
+		CALLOC_ARRAY(raw, encoded.nr + 1);
+		for (i = 0; i < encoded.nr; i++)
+			raw[i] = url_percent_decode(encoded.v[i]);
+
+		fetch_single_packfile(&packfile_hash, argv[arg],
+				      (const char **) raw);
+
+		for (i = 0; i < encoded.nr; i++)
+			free(raw[i]);
+		free(raw);
+		strvec_clear(&encoded);
+
 		return 0;
 	}
 
+	if (index_pack_args)
+		die(_("--index-pack-args can only be used with --packfile"));
+
 	if (commits_on_stdin) {
 		commits = walker_targets_stdin(&commit_id, &write_ref);
 	} else {
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 483578b2d7..af90e7efed 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -224,7 +224,8 @@ test_expect_success 'http-fetch --packfile' '
 
 	git init packfileclient &&
 	p=$(cd "$HTTPD_DOCUMENT_ROOT_PATH"/repo_pack.git && ls objects/pack/pack-*.pack) &&
-	git -C packfileclient http-fetch --packfile=$ARBITRARY "$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
+	git -C packfileclient http-fetch --packfile=$ARBITRARY \
+		--index-pack-args="index-pack --stdin --keep" "$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
 
 	grep "^keep.[0-9a-f]\{16,\}$" out &&
 	cut -c6- out >packhash &&
-- 
2.30.0.280.ga3ce27912f-goog


* [PATCH 3/4] fetch-pack: with packfile URIs, use index-pack arg
  2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
  2021-01-24  2:34   ` [PATCH 1/4] http: allow custom index-pack args Jonathan Tan
  2021-01-24  2:34   ` [PATCH 2/4] http-fetch: " Jonathan Tan
@ 2021-01-24  2:34   ` Jonathan Tan
  2021-01-24  2:34   ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-01-24  2:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Unify the index-pack arguments used when processing the inline pack and
when downloading packfiles referenced by URIs. This is done by teaching
get_pack() to also store the index-pack arguments whenever at least one
packfile URI is given, and then when processing the packfile URI(s),
using the stored arguments.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 fetch-pack.c | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 274ae602f7..fe69635eb5 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -797,12 +797,13 @@ static void write_promisor_file(const char *keep_name,
 }
 
 /*
- * Pass 1 as "only_packfile" if the pack received is the only pack in this
- * fetch request (that is, if there were no packfile URIs provided).
+ * If packfile URIs were provided, pass a non-NULL pointer to index_pack_args.
+ * The string to pass as the --index-pack-args argument to http-fetch will be
+ * stored there. (It must be freed by the caller.)
  */
 static int get_pack(struct fetch_pack_args *args,
 		    int xd[2], struct string_list *pack_lockfiles,
-		    int only_packfile,
+		    char **index_pack_args,
 		    struct ref **sought, int nr_sought)
 {
 	struct async demux;
@@ -845,7 +846,7 @@ static int get_pack(struct fetch_pack_args *args,
 		strvec_push(&cmd.args, alternate_shallow_file);
 	}
 
-	if (do_keep || args->from_promisor) {
+	if (do_keep || args->from_promisor || index_pack_args) {
 		if (pack_lockfiles)
 			cmd.out = -1;
 		cmd_name = "index-pack";
@@ -863,7 +864,7 @@ static int get_pack(struct fetch_pack_args *args,
 				     "--keep=fetch-pack %"PRIuMAX " on %s",
 				     (uintmax_t)getpid(), hostname);
 		}
-		if (only_packfile && args->check_self_contained_and_connected)
+		if (!index_pack_args && args->check_self_contained_and_connected)
 			strvec_push(&cmd.args, "--check-self-contained-and-connected");
 		else
 			/*
@@ -901,7 +902,7 @@ static int get_pack(struct fetch_pack_args *args,
 	    : transfer_fsck_objects >= 0
 	    ? transfer_fsck_objects
 	    : 0) {
-		if (args->from_promisor || !only_packfile)
+		if (args->from_promisor || index_pack_args)
 			/*
 			 * We cannot use --strict in index-pack because it
 			 * checks both broken objects and links, but we only
@@ -913,6 +914,19 @@ static int get_pack(struct fetch_pack_args *args,
 				     fsck_msg_types.buf);
 	}
 
+	if (index_pack_args) {
+		struct strbuf joined = STRBUF_INIT;
+		int i;
+
+		for (i = 0; i < cmd.args.nr; i++) {
+			if (i)
+				strbuf_addch(&joined, ' ');
+			strbuf_addstr_urlencode(&joined, cmd.args.v[i],
+						is_rfc3986_unreserved);
+		}
+		*index_pack_args = strbuf_detach(&joined, NULL);
+	}
+
 	cmd.in = demux.out;
 	cmd.git_cmd = 1;
 	if (start_command(&cmd))
@@ -1084,7 +1098,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 		alternate_shallow_file = setup_temporary_shallow(si->shallow);
 	else
 		alternate_shallow_file = NULL;
-	if (get_pack(args, fd, pack_lockfiles, 1, sought, nr_sought))
+	if (get_pack(args, fd, pack_lockfiles, NULL, sought, nr_sought))
 		die(_("git fetch-pack: fetch failed."));
 
  all_done:
@@ -1535,6 +1549,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	int seen_ack = 0;
 	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
 	int i;
+	char *index_pack_args = NULL;
 
 	negotiator = &negotiator_alloc;
 	fetch_negotiator_init(r, negotiator);
@@ -1624,7 +1639,8 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 				receive_packfile_uris(&reader, &packfile_uris);
 			process_section_header(&reader, "packfile", 0);
 			if (get_pack(args, fd, pack_lockfiles,
-				     !packfile_uris.nr, sought, nr_sought))
+				     packfile_uris.nr ? &index_pack_args : NULL,
+				     sought, nr_sought))
 				die(_("git fetch-pack: fetch failed."));
 			do_check_stateless_delimiter(args, &reader);
 
@@ -1645,7 +1661,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 		strvec_pushf(&cmd.args, "--packfile=%.*s",
 			     (int) the_hash_algo->hexsz,
 			     packfile_uris.items[i].string);
-		strvec_push(&cmd.args, "--index-pack-args=index-pack --stdin --keep");
+		strvec_pushf(&cmd.args, "--index-pack-args=%s", index_pack_args);
 		strvec_push(&cmd.args, uri);
 		cmd.git_cmd = 1;
 		cmd.no_stdin = 1;
@@ -1681,6 +1697,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 						 packname));
 	}
 	string_list_clear(&packfile_uris, 0);
+	FREE_AND_NULL(index_pack_args);
 
 	if (negotiator)
 		negotiator->release(negotiator);
-- 
2.30.0.280.ga3ce27912f-goog



* [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
                     ` (2 preceding siblings ...)
  2021-01-24  2:34   ` [PATCH 3/4] fetch-pack: with packfile URIs, use index-pack arg Jonathan Tan
@ 2021-01-24  2:34   ` Jonathan Tan
  2021-01-24  7:56     ` Junio C Hamano
                       ` (3 more replies)
  2021-01-24  6:29   ` [PATCH 0/4] Check .gitmodules when using packfile URIs Junio C Hamano
  2021-02-18 23:34   ` Junio C Hamano
  5 siblings, 4 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-01-24  2:34 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

Teach index-pack to print dangling .gitmodules links after its "keep" or
"pack" line instead of declaring an error, and teach fetch-pack to check
such lines printed.

This allows the tree side of the .gitmodules link to be in one packfile
and the blob side to be in another without failing the fsck check,
because it is now fetch-pack which checks such objects after all
packfiles have been downloaded and indexed (and not index-pack on an
individual packfile, as it is before this commit).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 Documentation/git-index-pack.txt |  7 ++-
 builtin/index-pack.c             |  9 +++-
 builtin/receive-pack.c           |  2 +-
 fetch-pack.c                     | 78 +++++++++++++++++++++++++++-----
 fsck.c                           | 16 +++++--
 fsck.h                           |  8 ++++
 pack-write.c                     |  8 +++-
 pack.h                           |  2 +-
 t/t5702-protocol-v2.sh           | 47 +++++++++++++++++++
 9 files changed, 155 insertions(+), 22 deletions(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index af0c26232c..e74a4a1eda 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -78,7 +78,12 @@ OPTIONS
 	Die if the pack contains broken links. For internal use only.
 
 --fsck-objects::
-	Die if the pack contains broken objects. For internal use only.
+	For internal use only.
++
+Die if the pack contains broken objects. If the pack contains a tree
+pointing to a .gitmodules blob that does not exist, prints the hash of
+that blob (for the caller to check) after the hash that goes into the
+name of the pack/idx file (see "Notes").
 
 --threads=<n>::
 	Specifies the number of threads to spawn when resolving
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 557bd2f348..f995c15115 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1888,8 +1888,13 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	else
 		close(input_fd);
 
-	if (do_fsck_object && fsck_finish(&fsck_options))
-		die(_("fsck error in pack objects"));
+	if (do_fsck_object) {
+		struct fsck_options fo = FSCK_OPTIONS_STRICT;
+
+		fo.print_dangling_gitmodules = 1;
+		if (fsck_finish(&fo))
+			die(_("fsck error in pack objects"));
+	}
 
 	free(objects);
 	strbuf_release(&index_name_buf);
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index d49d050e6e..ed2c9b42e9 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -2275,7 +2275,7 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 		status = start_command(&child);
 		if (status)
 			return "index-pack fork failed";
-		pack_lockfile = index_pack_lockfile(child.out);
+		pack_lockfile = index_pack_lockfile(child.out, NULL);
 		close(child.out);
 		status = finish_command(&child);
 		if (status)
diff --git a/fetch-pack.c b/fetch-pack.c
index fe69635eb5..128362e0ba 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -796,6 +796,26 @@ static void write_promisor_file(const char *keep_name,
 	strbuf_release(&promisor_name);
 }
 
+static void parse_gitmodules_oids(int fd, struct oidset *gitmodules_oids)
+{
+	int len = the_hash_algo->hexsz + 1; /* hash + NL */
+
+	do {
+		char hex_hash[GIT_MAX_HEXSZ + 1];
+		int read_len = read_in_full(fd, hex_hash, len);
+		struct object_id oid;
+		const char *end;
+
+		if (!read_len)
+			return;
+		if (read_len != len)
+			die("invalid length read %d", read_len);
+		if (parse_oid_hex(hex_hash, &oid, &end) || *end != '\n')
+			die("invalid hash");
+		oidset_insert(gitmodules_oids, &oid);
+	} while (1);
+}
+
 /*
  * If packfile URIs were provided, pass a non-NULL pointer to index_pack_args.
  * The string to pass as the --index-pack-args argument to http-fetch will be
@@ -804,7 +824,8 @@ static void write_promisor_file(const char *keep_name,
 static int get_pack(struct fetch_pack_args *args,
 		    int xd[2], struct string_list *pack_lockfiles,
 		    char **index_pack_args,
-		    struct ref **sought, int nr_sought)
+		    struct ref **sought, int nr_sought,
+		    struct oidset *gitmodules_oids)
 {
 	struct async demux;
 	int do_keep = args->keep_pack;
@@ -812,6 +833,7 @@ static int get_pack(struct fetch_pack_args *args,
 	struct pack_header header;
 	int pass_header = 0;
 	struct child_process cmd = CHILD_PROCESS_INIT;
+	int fsck_objects = 0;
 	int ret;
 
 	memset(&demux, 0, sizeof(demux));
@@ -846,8 +868,15 @@ static int get_pack(struct fetch_pack_args *args,
 		strvec_push(&cmd.args, alternate_shallow_file);
 	}
 
-	if (do_keep || args->from_promisor || index_pack_args) {
-		if (pack_lockfiles)
+	if (fetch_fsck_objects >= 0
+	    ? fetch_fsck_objects
+	    : transfer_fsck_objects >= 0
+	    ? transfer_fsck_objects
+	    : 0)
+		fsck_objects = 1;
+
+	if (do_keep || args->from_promisor || index_pack_args || fsck_objects) {
+		if (pack_lockfiles || fsck_objects)
 			cmd.out = -1;
 		cmd_name = "index-pack";
 		strvec_push(&cmd.args, cmd_name);
@@ -897,11 +926,7 @@ static int get_pack(struct fetch_pack_args *args,
 		strvec_pushf(&cmd.args, "--pack_header=%"PRIu32",%"PRIu32,
 			     ntohl(header.hdr_version),
 				 ntohl(header.hdr_entries));
-	if (fetch_fsck_objects >= 0
-	    ? fetch_fsck_objects
-	    : transfer_fsck_objects >= 0
-	    ? transfer_fsck_objects
-	    : 0) {
+	if (fsck_objects) {
 		if (args->from_promisor || index_pack_args)
 			/*
 			 * We cannot use --strict in index-pack because it
@@ -931,10 +956,15 @@ static int get_pack(struct fetch_pack_args *args,
 	cmd.git_cmd = 1;
 	if (start_command(&cmd))
 		die(_("fetch-pack: unable to fork off %s"), cmd_name);
-	if (do_keep && pack_lockfiles) {
-		char *pack_lockfile = index_pack_lockfile(cmd.out);
+	if (do_keep && (pack_lockfiles || fsck_objects)) {
+		int is_well_formed;
+		char *pack_lockfile = index_pack_lockfile(cmd.out, &is_well_formed);
+
+		if (!is_well_formed)
+			die(_("fetch-pack: invalid index-pack output"));
 		if (pack_lockfile)
 			string_list_append_nodup(pack_lockfiles, pack_lockfile);
+		parse_gitmodules_oids(cmd.out, gitmodules_oids);
 		close(cmd.out);
 	}
 
@@ -969,6 +999,22 @@ static int cmp_ref_by_name(const void *a_, const void *b_)
 	return strcmp(a->name, b->name);
 }
 
+static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
+{
+	struct oidset_iter iter;
+	const struct object_id *oid;
+	struct fsck_options fo = FSCK_OPTIONS_STRICT;
+
+	if (!oidset_size(gitmodules_oids))
+		return;
+
+	oidset_iter_init(gitmodules_oids, &iter);
+	while ((oid = oidset_iter_next(&iter)))
+		register_found_gitmodules(oid);
+	if (fsck_finish(&fo))
+		die("fsck failed");
+}
+
 static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 				 int fd[2],
 				 const struct ref *orig_ref,
@@ -983,6 +1029,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	int agent_len;
 	struct fetch_negotiator negotiator_alloc;
 	struct fetch_negotiator *negotiator;
+	struct oidset gitmodules_oids = OIDSET_INIT;
 
 	negotiator = &negotiator_alloc;
 	fetch_negotiator_init(r, negotiator);
@@ -1098,8 +1145,10 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 		alternate_shallow_file = setup_temporary_shallow(si->shallow);
 	else
 		alternate_shallow_file = NULL;
-	if (get_pack(args, fd, pack_lockfiles, NULL, sought, nr_sought))
+	if (get_pack(args, fd, pack_lockfiles, NULL, sought, nr_sought,
+		     &gitmodules_oids))
 		die(_("git fetch-pack: fetch failed."));
+	fsck_gitmodules_oids(&gitmodules_oids);
 
  all_done:
 	if (negotiator)
@@ -1550,6 +1599,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
 	int i;
 	char *index_pack_args = NULL;
+	struct oidset gitmodules_oids = OIDSET_INIT;
 
 	negotiator = &negotiator_alloc;
 	fetch_negotiator_init(r, negotiator);
@@ -1640,7 +1690,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 			process_section_header(&reader, "packfile", 0);
 			if (get_pack(args, fd, pack_lockfiles,
 				     packfile_uris.nr ? &index_pack_args : NULL,
-				     sought, nr_sought))
+				     sought, nr_sought, &gitmodules_oids))
 				die(_("git fetch-pack: fetch failed."));
 			do_check_stateless_delimiter(args, &reader);
 
@@ -1680,6 +1730,8 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 
 		packname[the_hash_algo->hexsz] = '\0';
 
+		parse_gitmodules_oids(cmd.out, &gitmodules_oids);
+
 		close(cmd.out);
 
 		if (finish_command(&cmd))
@@ -1699,6 +1751,8 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	string_list_clear(&packfile_uris, 0);
 	FREE_AND_NULL(index_pack_args);
 
+	fsck_gitmodules_oids(&gitmodules_oids);
+
 	if (negotiator)
 		negotiator->release(negotiator);
 
diff --git a/fsck.c b/fsck.c
index f82e2fe9e3..04f3d342af 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1243,6 +1243,11 @@ int fsck_error_function(struct fsck_options *o,
 	return 1;
 }
 
+void register_found_gitmodules(const struct object_id *oid)
+{
+	oidset_insert(&gitmodules_found, oid);
+}
+
 int fsck_finish(struct fsck_options *options)
 {
 	int ret = 0;
@@ -1262,10 +1267,13 @@ int fsck_finish(struct fsck_options *options)
 		if (!buf) {
 			if (is_promisor_object(oid))
 				continue;
-			ret |= report(options,
-				      oid, OBJ_BLOB,
-				      FSCK_MSG_GITMODULES_MISSING,
-				      "unable to read .gitmodules blob");
+			if (options->print_dangling_gitmodules)
+				printf("%s\n", oid_to_hex(oid));
+			else
+				ret |= report(options,
+					      oid, OBJ_BLOB,
+					      FSCK_MSG_GITMODULES_MISSING,
+					      "unable to read .gitmodules blob");
 			continue;
 		}
 
diff --git a/fsck.h b/fsck.h
index 69cf715e79..4b8cf03445 100644
--- a/fsck.h
+++ b/fsck.h
@@ -41,6 +41,12 @@ struct fsck_options {
 	int *msg_type;
 	struct oidset skiplist;
 	kh_oid_map_t *object_names;
+
+	/*
+	 * If 1, print the hashes of missing .gitmodules blobs instead of
+	 * considering them to be errors.
+	 */
+	unsigned print_dangling_gitmodules:1;
 };
 
 #define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
@@ -62,6 +68,8 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
 
+void register_found_gitmodules(const struct object_id *oid);
+
 /*
  * Some fsck checks are context-dependent, and may end up queued; run this
  * after completing all fsck_object() calls in order to resolve any remaining
diff --git a/pack-write.c b/pack-write.c
index 3513665e1e..f66ea8e5a1 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -272,7 +272,7 @@ void fixup_pack_header_footer(int pack_fd,
 	fsync_or_die(pack_fd, pack_name);
 }
 
-char *index_pack_lockfile(int ip_out)
+char *index_pack_lockfile(int ip_out, int *is_well_formed)
 {
 	char packname[GIT_MAX_HEXSZ + 6];
 	const int len = the_hash_algo->hexsz + 6;
@@ -286,11 +286,17 @@ char *index_pack_lockfile(int ip_out)
 	 */
 	if (read_in_full(ip_out, packname, len) == len && packname[len-1] == '\n') {
 		const char *name;
+
+		if (is_well_formed)
+			*is_well_formed = 1;
 		packname[len-1] = 0;
 		if (skip_prefix(packname, "keep\t", &name))
 			return xstrfmt("%s/pack/pack-%s.keep",
 				       get_object_directory(), name);
+		return NULL;
 	}
+	if (is_well_formed)
+		*is_well_formed = 0;
 	return NULL;
 }
 
diff --git a/pack.h b/pack.h
index 9fc0945ac9..09cffec395 100644
--- a/pack.h
+++ b/pack.h
@@ -85,7 +85,7 @@ int verify_pack_index(struct packed_git *);
 int verify_pack(struct repository *, struct packed_git *, verify_fn fn, struct progress *, uint32_t);
 off_t write_pack_header(struct hashfile *f, uint32_t);
 void fixup_pack_header_footer(int, unsigned char *, const char *, uint32_t, unsigned char *, off_t);
-char *index_pack_lockfile(int fd);
+char *index_pack_lockfile(int fd, int *is_well_formed);
 
 /*
  * The "hdr" output buffer should be at least this big, which will handle sizes
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 7d5b17909b..8b8fb43dbc 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -936,6 +936,53 @@ test_expect_success 'packfile-uri with transfer.fsckobjects fails on bad object'
 	test_i18ngrep "invalid author/committer line - missing email" error
 '
 
+test_expect_success 'packfile-uri with transfer.fsckobjects succeeds when .gitmodules is separate from tree' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo "[submodule libfoo]" >"$P/.gitmodules" &&
+	echo "path = include/foo" >>"$P/.gitmodules" &&
+	echo "url = git://example.com/git/lib.git" >>"$P/.gitmodules" &&
+	git -C "$P" add .gitmodules &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" .gitmodules >h &&
+
+	sane_unset GIT_TEST_SIDEBAND_ALL &&
+	git -c protocol.version=2 -c transfer.fsckobjects=1 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child &&
+
+	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
+	ls http_child/.git/objects/pack/* >filelist &&
+	test_line_count = 4 filelist
+'
+
+test_expect_success 'packfile-uri with transfer.fsckobjects fails when .gitmodules separate from tree is invalid' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child err &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo "[submodule \"..\"]" >"$P/.gitmodules" &&
+	echo "path = include/foo" >>"$P/.gitmodules" &&
+	echo "url = git://example.com/git/lib.git" >>"$P/.gitmodules" &&
+	git -C "$P" add .gitmodules &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" .gitmodules >h &&
+
+	sane_unset GIT_TEST_SIDEBAND_ALL &&
+	test_must_fail git -c protocol.version=2 -c transfer.fsckobjects=1 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child 2>err &&
+	test_i18ngrep "disallowed submodule name" err
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.
 
-- 
2.30.0.280.ga3ce27912f-goog



* Re: [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
                     ` (3 preceding siblings ...)
  2021-01-24  2:34   ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
@ 2021-01-24  6:29   ` Junio C Hamano
  2021-01-28  0:35     ` Jonathan Tan
  2021-02-18 23:34   ` Junio C Hamano
  5 siblings, 1 reply; 134+ messages in thread
From: Junio C Hamano @ 2021-01-24  6:29 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> As part of this, index-pack has to output (1) the hash that goes into
> the name of the .pack/.idx file and (2) the hashes of all dangling
> .gitmodules. I just had (2) come after (1). If anyone has a better idea,
> I'm interested.

I have this feeling that the "blobs that need to be validated across
packs" will *not* be the last enhancement we'd need to make to the
output from index-pack to allow richer communication between it and
its invoker.  While there is no reason to change what the first line
of the output looks like, we'd probably want to make sure that the
future versions of Git can easily tell "list of blobs that require
further validation" from other additional information.

I am not comfortable recommending "ok, then let's add a delimiter
line '---\n' if/when we need to have something after the list of
blobs and append more stuff in future versions of Git", because we
may find need to emit new kinds of info before the list of blobs
that needs further validation, for example, in future versions of
Git.

Having said all that, the internal communication between
index-pack and its caller does not need as much care about
compatibility across versions as output visible to end-users, so
when a future version of Git needs to send different kinds of
information in different order from what you created here, we can do
so pretty much freely, I would guess.

Thanks.


* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-24  2:34   ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
@ 2021-01-24  7:56     ` Junio C Hamano
  2021-01-26  1:57       ` Junio C Hamano
  2021-01-24 12:18     ` Ævar Arnfjörð Bjarmason
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 134+ messages in thread
From: Junio C Hamano @ 2021-01-24  7:56 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
> index 7d5b17909b..8b8fb43dbc 100755
> ...
> +	sane_unset GIT_TEST_SIDEBAND_ALL &&
> +	git -c protocol.version=2 -c transfer.fsckobjects=1 \
> +		-c fetch.uriprotocols=http,https \
> +		clone "$HTTPD_URL/smart/http_parent" http_child &&
> +
> +	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).

Ehh, please don't.  We may add multi-pack-index there, or perhaps
reverse index files in the future.  If you care about having two
packs logically because you are exercising the out-of-band
prepackaged packfile plus the dynamic transfer, make sure you have
two packs (and probably the idx files that go with them).  Don't
assume there will be one .idx each for them *AND* nothing else
there.

> +	ls http_child/.git/objects/pack/* >filelist &&
> +	test_line_count = 4 filelist
> +'

IOW,

	d=http_child/.git/objects/pack/
	ls "$d"/*.pack "$d"/*.idx >filelist &&
	test_line_count = 4 filelist

or something like that.


* Re: [PATCH 2/4] http-fetch: allow custom index-pack args
  2021-01-24  2:34   ` [PATCH 2/4] http-fetch: " Jonathan Tan
@ 2021-01-24 11:52     ` Ævar Arnfjörð Bjarmason
  2021-01-28  0:32       ` Jonathan Tan
  2021-02-16 20:49     ` Josh Steadmon
  1 sibling, 1 reply; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-01-24 11:52 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Sun, Jan 24 2021, Jonathan Tan wrote:

>  --packfile=<hash>::
> -	Instead of a commit id on the command line (which is not expected in
> +	For internal use only. Instead of a commit id on the command line (which is not expected in

Leaves the rest at ~79 and this long line at ~100. Perhaps a follow-up
change to re-word-wrap would be in order?


* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-24  2:34   ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
  2021-01-24  7:56     ` Junio C Hamano
@ 2021-01-24 12:18     ` Ævar Arnfjörð Bjarmason
  2021-01-28  1:03       ` Jonathan Tan
  2021-01-24 12:30     ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:27     ` Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-01-24 12:18 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Sun, Jan 24 2021, Jonathan Tan wrote:

> +void register_found_gitmodules(const struct object_id *oid)
> +{
> +	oidset_insert(&gitmodules_found, oid);
> +}
> +

In fsck.c we only use this variable to insert into it, or in fsck_blob()
to do the actual check, but then we either abort early if we've found
it, or right after that:

        if (object_on_skiplist(options, oid))
                return 0;

So (along with comments I have below...) you could just use the existing
"skiplist" option instead, no?

>  int fsck_finish(struct fsck_options *options)
>  {
>  	int ret = 0;
> @@ -1262,10 +1267,13 @@ int fsck_finish(struct fsck_options *options)
>  		if (!buf) {
>  			if (is_promisor_object(oid))
>  				continue;
> -			ret |= report(options,
> -				      oid, OBJ_BLOB,
> -				      FSCK_MSG_GITMODULES_MISSING,
> -				      "unable to read .gitmodules blob");
> +			if (options->print_dangling_gitmodules)
> +				printf("%s\n", oid_to_hex(oid));
> +			else
> +				ret |= report(options,
> +					      oid, OBJ_BLOB,
> +					      FSCK_MSG_GITMODULES_MISSING,
> +					      "unable to read .gitmodules blob");
>  			continue;
>  		}
>  
> diff --git a/fsck.h b/fsck.h
> index 69cf715e79..4b8cf03445 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -41,6 +41,12 @@ struct fsck_options {
>  	int *msg_type;
>  	struct oidset skiplist;
>  	kh_oid_map_t *object_names;
> +
> +	/*
> +	 * If 1, print the hashes of missing .gitmodules blobs instead of
> +	 * considering them to be errors.
> +	 */
> +	unsigned print_dangling_gitmodules:1;
>  };
>  
>  #define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
> @@ -62,6 +68,8 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
>  int fsck_object(struct object *obj, void *data, unsigned long size,
>  	struct fsck_options *options);
>  
> +void register_found_gitmodules(const struct object_id *oid);
> +
>  /*
>   * Some fsck checks are context-dependent, and may end up queued; run this
>   * after completing all fsck_object() calls in order to resolve any remaining


This whole thing seems just like the bad path I took in earlier rounds
of my in-flight mktag series. You don't need this new custom API. You
just set up an error handler for your fsck which ignores / prints / logs
/ whatever the OIDs you want if you get a FSCK_MSG_GITMODULES_MISSING
error, which you then "return 0" on.

If you don't have FSCK_MSG_GITMODULES_MISSING, punt and call
fsck_error_function().
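
A minimal standalone sketch of the error-handler approach described
above. The types and names here (msg_id, report_fn, default_report,
collecting_report, struct pending) are hypothetical stand-ins and not
Git's actual fsck API; in particular, it assumes the message ID is
visible to the callback, and it assumes 40-character SHA-1 hex hashes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical message IDs; Git's real IDs are internal to fsck.c. */
enum msg_id { MSG_GITMODULES_MISSING, MSG_BAD_OBJECT };

#define MAX_PENDING 16
#define HEX_LEN 40 /* assumed: SHA-1 hex length */

/* Hashes of .gitmodules blobs that still need to be checked later. */
struct pending {
	char oids[MAX_PENDING][HEX_LEN + 1];
	size_t nr;
};

/* Hypothetical callback type: a nonzero return means "treat as error". */
typedef int (*report_fn)(void *data, enum msg_id id,
			 const char *oid_hex, const char *msg);

/* Stand-in for the default handler: print the problem and fail. */
static int default_report(void *data, enum msg_id id,
			  const char *oid_hex, const char *msg)
{
	(void)data;
	(void)id;
	fprintf(stderr, "error: object %s: %s\n", oid_hex, msg);
	return 1;
}

/* Record missing .gitmodules blobs; punt on everything else. */
static int collecting_report(void *data, enum msg_id id,
			     const char *oid_hex, const char *msg)
{
	struct pending *p = data;

	if (id != MSG_GITMODULES_MISSING)
		return default_report(data, id, oid_hex, msg);
	/* Fall back to an error if the hash cannot be stored. */
	if (p->nr == MAX_PENDING || strlen(oid_hex) != HEX_LEN)
		return default_report(data, id, oid_hex, msg);
	memcpy(p->oids[p->nr++], oid_hex, HEX_LEN + 1);
	return 0;
}
```

The caller would install collecting_report as its handler and walk
p->oids once all packs have been indexed.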


* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-24  2:34   ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
  2021-01-24  7:56     ` Junio C Hamano
  2021-01-24 12:18     ` Ævar Arnfjörð Bjarmason
@ 2021-01-24 12:30     ` Ævar Arnfjörð Bjarmason
  2021-01-28  1:15       ` Jonathan Tan
  2021-02-17 19:27     ` Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-01-24 12:30 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Sun, Jan 24 2021, Jonathan Tan wrote:
>  --fsck-objects::
> -	Die if the pack contains broken objects. For internal use only.
> +	For internal use only.
> ++
> +Die if the pack contains broken objects. If the pack contains a tree
> +pointing to a .gitmodules blob that does not exist, prints the hash of
> +that blob (for the caller to check) after the hash that goes into the
> +name of the pack/idx file (see "Notes").

[I should have waited a bit and sent one E-Mail]

Is this really generally usable as an IPC mechanism? What if we need
another set of OIDs we care about? Shouldn't it at least be hidden
behind some option so you don't get a deluge of output from index-pack
if you're not in this packfile-uri mode?

But, along with my other E-Mail...

> [...]
> +static void parse_gitmodules_oids(int fd, struct oidset *gitmodules_oids)
> +{
> +	int len = the_hash_algo->hexsz + 1; /* hash + NL */
> +
> +	do {
> +		char hex_hash[GIT_MAX_HEXSZ + 1];
> +		int read_len = read_in_full(fd, hex_hash, len);
> +		struct object_id oid;
> +		const char *end;
> +
> +		if (!read_len)
> +			return;
> +		if (read_len != len)
> +			die("invalid length read %d", read_len);
> +		if (parse_oid_hex(hex_hash, &oid, &end) || *end != '\n')
> +			die("invalid hash");
> +		oidset_insert(gitmodules_oids, &oid);
> +	} while (1);
> +}
> +

Doesn't this IPC mechanism already exist in the form of fsck.skipList?
See my 1f3299fda9 (fsck: make fsck_config() re-usable, 2021-01-05) on
"next". I.e. as noted in my just-sent-E-Mail you could probably just
re-use skiplist as-is.

Or if not it seems to me that this whole IPC mechanism would be better
done with a tempfile and passing it along like we already pass the
fsck.skipList between these processes.

I doubt it's going to be large enough to matter, we could just put it in
.git/ somewhere, like we put gc.log etc (but created with a mktemp()
name...).

Or if we want to keep the "print <list> | process" model we can refactor
the existing fsck IPC noted in 1f3299fda9 a bit, so e.g. you pass some
version of "lines prefixed with 'fsck-skiplist: ' go into list xyz" via a
command-line option. And then existing option(s) and your potential new
list (which as noted, I think is probably redundant to the skiplist) can
use it.
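
A rough sketch of what such prefix-based routing could look like. Only
the "fsck-skiplist: " prefix comes from the text above; the
"gitmodules: " prefix and the names after_prefix, line_kind and
classify_line are made up for illustration and do not exist in Git.

```c
#include <stddef.h>
#include <string.h>

/*
 * If line starts with prefix, return a pointer to the remainder;
 * otherwise return NULL. Git has a similar helper, skip_prefix();
 * this standalone version is only for illustration.
 */
static const char *after_prefix(const char *line, const char *prefix)
{
	size_t n = strlen(prefix);

	return strncmp(line, prefix, n) ? NULL : line + n;
}

enum line_kind { LINE_SKIPLIST, LINE_GITMODULES, LINE_UNKNOWN };

/*
 * Decide which list a line belongs to and point *value at the payload
 * that follows the prefix. *value is NULL for unrecognized lines.
 */
static enum line_kind classify_line(const char *line, const char **value)
{
	if ((*value = after_prefix(line, "fsck-skiplist: ")))
		return LINE_SKIPLIST;
	if ((*value = after_prefix(line, "gitmodules: "))) /* hypothetical */
		return LINE_GITMODULES;
	return LINE_UNKNOWN;
}
```

A reader built this way can route each line to its own list and can
choose to reject or ignore LINE_UNKNOWN.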





* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-24  7:56     ` Junio C Hamano
@ 2021-01-26  1:57       ` Junio C Hamano
  2021-01-28  1:04         ` Jonathan Tan
  0 siblings, 1 reply; 134+ messages in thread
From: Junio C Hamano @ 2021-01-26  1:57 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Jonathan Tan <jonathantanmy@google.com> writes:
>
>> diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
>> index 7d5b17909b..8b8fb43dbc 100755
>> ...
>> +	sane_unset GIT_TEST_SIDEBAND_ALL &&
>> +	git -c protocol.version=2 -c transfer.fsckobjects=1 \
>> +		-c fetch.uriprotocols=http,https \
>> +		clone "$HTTPD_URL/smart/http_parent" http_child &&
>> +
>> +	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
>
> Ehh, please don't.  We may add multi-pack-index there, or perhaps
> reverse index files in the future.  If you care about having two
> packs logically because you are exercising the out-of-band
> prepackaged packfile plus the dynamic transfer, make sure you have
> two packs (and probably the idx files that go with them).  Don't
> assume there will be one .idx each for them *AND* nothing else
> there.
>
>> +	ls http_child/.git/objects/pack/* >filelist &&
>> +	test_line_count = 4 filelist
>> +'
>
> IOW,
>
> 	d=http_child/.git/objects/pack/
> 	ls "$d"/*.pack "$d"/*.idx >filelist &&
> 	test_line_count = 4 filelist
>
> or something like that.

FYI, I have the following queued to make the tip of 'seen' pass the
tests.

---- >8 -------- >8 -------- >8 -------- >8 -------- >8 -------- >8 ----
From: Junio C Hamano <gitster@pobox.com>
Date: Mon, 25 Jan 2021 17:27:10 -0800
Subject: [PATCH] SQUASH??? test fix

---
 t/t5702-protocol-v2.sh | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 8b8fb43dbc..b1bc73a9a9 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -847,8 +847,9 @@ test_expect_success 'part of packfile response provided as URI' '
 	test -f hfound &&
 	test -f h2found &&
 
-	# Ensure that there are exactly 6 files (3 .pack and 3 .idx).
-	ls http_child/.git/objects/pack/* >filelist &&
+	# Ensure that there are exactly 3 packfiles with associated .idx
+	ls http_child/.git/objects/pack/*.pack \
+	    http_child/.git/objects/pack/*.idx >filelist &&
 	test_line_count = 6 filelist
 '
 
@@ -901,8 +902,9 @@ test_expect_success 'packfile-uri with transfer.fsckobjects' '
 		-c fetch.uriprotocols=http,https \
 		clone "$HTTPD_URL/smart/http_parent" http_child &&
 
-	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
-	ls http_child/.git/objects/pack/* >filelist &&
+	# Ensure that there are exactly 2 packfiles with associated .idx
+	ls http_child/.git/objects/pack/*.pack \
+	    http_child/.git/objects/pack/*.idx >filelist &&
 	test_line_count = 4 filelist
 '
 
@@ -956,8 +958,9 @@ test_expect_success 'packfile-uri with transfer.fsckobjects succeeds when .gitmo
 		-c fetch.uriprotocols=http,https \
 		clone "$HTTPD_URL/smart/http_parent" http_child &&
 
-	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
-	ls http_child/.git/objects/pack/* >filelist &&
+	# Ensure that there are exactly 2 packfiles with associated .idx
+	ls http_child/.git/objects/pack/*.pack \
+	    http_child/.git/objects/pack/*.idx >filelist &&
 	test_line_count = 4 filelist
 '
 
-- 
2.30.0-509-gbbf2750a06



* Re: [PATCH 2/4] http-fetch: allow custom index-pack args
  2021-01-24 11:52     ` Ævar Arnfjörð Bjarmason
@ 2021-01-28  0:32       ` Jonathan Tan
  0 siblings, 0 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-01-28  0:32 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git

> On Sun, Jan 24 2021, Jonathan Tan wrote:
> 
> >  --packfile=<hash>::
> > -	Instead of a commit id on the command line (which is not expected in
> > +	For internal use only. Instead of a commit id on the command line (which is not expected in
> 
> Leaves the rest at ~79 and this long line at ~100. Perhaps a follow-up
> change to re-word-wrap would be in order?

Hmm...I'll split that onto two lines then. I don't think it's worth the
extra commit in history to have it exactly wrapped right, so I'll forgo
the follow-up change for now.


* Re: [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-01-24  6:29   ` [PATCH 0/4] Check .gitmodules when using packfile URIs Junio C Hamano
@ 2021-01-28  0:35     ` Jonathan Tan
  2021-02-18 11:31       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 134+ messages in thread
From: Jonathan Tan @ 2021-01-28  0:35 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > As part of this, index-pack has to output (1) the hash that goes into
> > the name of the .pack/.idx file and (2) the hashes of all dangling
> > .gitmodules. I just had (2) come after (1). If anyone has a better idea,
> > I'm interested.
> 
> I have this feeling that the "blobs that need to be validated across
> packs" will *not* be the last enhancement we'd need to make to the
> output from index-pack to allow richer communication between it and
> its invoker.  While there is no reason to change what the first line
> of the output looks like, we'd probably want to make sure that the
> future versions of Git can easily tell "list of blobs that require
> further validation" from other additional information.
> 
> I am not comfortable recommending "ok, then let's add a delimiter
> line '---\n' if/when we need to have something after the list of
> blobs and append more stuff in future versions of Git", because we
> may find need to emit new kinds of info before the list of blobs
> that needs further validation, for example, in future versions of
> Git.
> 
> Having said all that, the internal communication between
> index-pack and its caller does not need as much care about
> compatibility across versions as output visible to end-users, so
> when a future version of Git needs to send different kinds of
> information in different order from what you created here, we can do
> so pretty much freely, I would guess.

Yeah, that's what I thought too - since this is an internal interface,
we can evolve both sides in lockstep. If we're really worried about the Git
binaries (on a user's system) getting out of sync, we could just make
sure that subsequent updates to this protocol are
non-backwards-compatible (e.g. have index-pack emit "foo <hash>", where
"foo" is a string that describes the new check, so that current
fetch-pack will reject "foo" since it is not a hash).
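
To make the rejection argument concrete, here is a minimal standalone
sketch (not the actual fetch-pack code; is_bare_hash_line() and the
fixed 40-character SHA-1 hex length are assumptions made for
illustration) of a reader that accepts only bare hash lines:

```c
#include <stddef.h>
#include <string.h>

#define HEXSZ 40 /* assumed: SHA-1 hex length */

/*
 * Return 1 if "line" is exactly HEXSZ lowercase hex digits followed by
 * a newline and nothing else, 0 otherwise.
 */
static int is_bare_hash_line(const char *line)
{
	size_t i;

	if (strlen(line) != HEXSZ + 1)
		return 0;
	for (i = 0; i < HEXSZ; i++) {
		char c = line[i];

		if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
			return 0;
	}
	return line[HEXSZ] == '\n';
}
```

With a check like this, a line such as "foo <hash>" fails either on its
length or on the non-hex characters, so an older reader would die
rather than silently misinterpret the new record.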

* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-24 12:18     ` Ævar Arnfjörð Bjarmason
@ 2021-01-28  1:03       ` Jonathan Tan
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 134+ messages in thread
From: Jonathan Tan @ 2021-01-28  1:03 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git

> On Sun, Jan 24 2021, Jonathan Tan wrote:
> 
> > +void register_found_gitmodules(const struct object_id *oid)
> > +{
> > +	oidset_insert(&gitmodules_found, oid);
> > +}
> > +
> 
> In fsck.c we only use this variable to insert into it, or in fsck_blob()
> to do the actual check, but then we either abort early if we've found
> it, or right after that:

By "this variable", do you mean gitmodules_found? fsck_finish() consumes
it.

>         if (object_on_skiplist(options, oid))
>                 return 0;
> 
> So (along with comments I have below...) you could just use the existing
> "skiplist" option instead, no?

I don't understand this part (in particular, the part you quoted). About
"skiplist", I'll reply to your other email [1] which has more details.

[1] https://lore.kernel.org/git/87czxu7c15.fsf@evledraar.gmail.com/

> This whole thing seems just like the bad path I took in earlier rounds
> of my in-flight mktag series. You don't need this new custom API. You
> just setup an error handler for your fsck which ignores / prints / logs
> / whatever the OIDs you want if you get a FSCK_MSG_GITMODULES_MISSING
> error, which you then "return 0" on.
> 
> If you don't have FSCK_MSG_GITMODULES_MISSING punt and call
> fsck_error_function().

I tried that first, and the issue is that IDs like
FSCK_MSG_GITMODULES_MISSING are internal to fsck.c. As for whether we
should start exposing the IDs publicly, I think we should wait until a
few new cases like this come up, so that we more fully understand the
requirements first.

* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-26  1:57       ` Junio C Hamano
@ 2021-01-28  1:04         ` Jonathan Tan
  0 siblings, 0 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-01-28  1:04 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> > Ehh, please don't.  We may add multi-pack-index there, or perhaps
> > reverse index files in the future.  If you care about having two
> > packs logically because you are exercising the out-of-band
> > prepackaged packfile plus the dynamic transfer, make sure you have
> > two packs (and probably the idx files that go with them).  Don't
> > assume there will be one .idx each for them *AND* nothing else
> > there.
> >
> >> +	ls http_child/.git/objects/pack/* >filelist &&
> >> +	test_line_count = 4 filelist
> >> +'
> >
> > IOW,
> >
> > 	d=http_child/.git/objects/pack/
> > 	ls "$d"/*.pack "$d"/*.idx >filelist &&
> > 	test_line_count = 4 filelist
> >
> > or something like that.
> 
> FYI, I have the following queued to make the tip of 'seen' pass the
> tests.

[snip]

OK - I'll include these changes in the next version.

* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-24 12:30     ` Ævar Arnfjörð Bjarmason
@ 2021-01-28  1:15       ` Jonathan Tan
  2021-02-17  2:10         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 134+ messages in thread
From: Jonathan Tan @ 2021-01-28  1:15 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git

> On Sun, Jan 24 2021, Jonathan Tan wrote:
> >  --fsck-objects::
> > -	Die if the pack contains broken objects. For internal use only.
> > +	For internal use only.
> > ++
> > +Die if the pack contains broken objects. If the pack contains a tree
> > +pointing to a .gitmodules blob that does not exist, prints the hash of
> > +that blob (for the caller to check) after the hash that goes into the
> > +name of the pack/idx file (see "Notes").
> 
> [I should have waited a bit and sent one E-Mail]
> 
> Is this really generally usable as an IPC mechanism, what if we need
> another set of OIDs we care about? Shouldn't it at least be hidden
> behind some option so you don't get a deluge of output from index-pack
> if you're not in this packfile-uri mode?

--fsck-objects is only for internal use, and it's only used by
fetch-pack.c. So its only consumer does want the output.

Junio also mentioned the possibility of another set of OIDs, and I
replied [1].

[1] https://lore.kernel.org/git/20210128003536.3874866-1-jonathantanmy@google.com/

> But, along with my other E-Mail...
> 
> > [...]
> > +static void parse_gitmodules_oids(int fd, struct oidset *gitmodules_oids)
> > +{
> > +	int len = the_hash_algo->hexsz + 1; /* hash + NL */
> > +
> > +	do {
> > +		char hex_hash[GIT_MAX_HEXSZ + 1];
> > +		int read_len = read_in_full(fd, hex_hash, len);
> > +		struct object_id oid;
> > +		const char *end;
> > +
> > +		if (!read_len)
> > +			return;
> > +		if (read_len != len)
> > +			die("invalid length read %d", read_len);
> > +		if (parse_oid_hex(hex_hash, &oid, &end) || *end != '\n')
> > +			die("invalid hash");
> > +		oidset_insert(gitmodules_oids, &oid);
> > +	} while (1);
> > +}
> > +
> 
> Doesn't this IPC mechanism already exist in the form of fsck.skipList?
> See my 1f3299fda9 (fsck: make fsck_config() re-usable, 2021-01-05) on
> "next". I.e. as noted in my just-sent-E-Mail you could probably just
> re-use skiplist as-is.

I'm not sure how fsck.skipList could be used here. Before running
fsck_finish() for the first time, we don't know which .gitmodules are
missing and which are not. And when running fsck_finish() for the second
time, we definitely do not want to skip any blobs.

> Or if not it seems to me that this whole IPC mechanism would be better
> done with a tempfile and passing it along like we already pass the
> fsck.skipList between these processes.
> 
> I doubt it's going to be large enough to matter, we could just put it in
> .git/ somewhere, like we put gc.log etc (but created with a mktemp()
> name...).
> 
> Or if we want to keep the "print <list> | process" model we can refactor
> the existing fsck IPC noted in 1f3299fda9 a bit, so e.g. you pass some
> version of "lines prefixed with "fsck-skiplist: " go into list xyz via a
> command-line option. And then existing option(s) and your potential new
> list (which as noted, I think is probably redundant to the skiplist) can
> use it.

I think using stdout is superior to using a tempfile - we don't have to
worry about interrupted invocations, for example.

What do you mean by "the existing fsck IPC noted in 1f3299fda9"? If you
mean the ability to pass a list of OIDs, for example using "-c
fsck.skipList=filename.txt", I'm not sure that it solves anything.
Firstly, I don't think that the skipList is useful here (as I said
earlier). And secondly, I don't think that OID input is the issue -
right now, the design is a process (index-pack, calling fsck_finish())
writing to its output which is then picked up by the calling process
(fetch-pack). We are not sending the dangling .gitmodules through stdin
anywhere.

* Re: [PATCH 2/4] http-fetch: allow custom index-pack args
  2021-01-24  2:34   ` [PATCH 2/4] http-fetch: " Jonathan Tan
  2021-01-24 11:52     ` Ævar Arnfjörð Bjarmason
@ 2021-02-16 20:49     ` Josh Steadmon
  2021-02-16 22:57       ` Junio C Hamano
  1 sibling, 1 reply; 134+ messages in thread
From: Josh Steadmon @ 2021-02-16 20:49 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On 2021.01.23 18:34, Jonathan Tan wrote:
> This is the next step in teaching fetch-pack to pass its index-pack
> arguments when processing packfiles referenced by URIs.
> 
> The "--keep" in fetch-pack.c will be replaced with a full message in a
> subsequent commit.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  Documentation/git-http-fetch.txt |  9 ++++++--
>  fetch-pack.c                     |  1 +
>  http-fetch.c                     | 35 +++++++++++++++++++++++++++-----
>  t/t5550-http-fetch-dumb.sh       |  3 ++-
>  4 files changed, 40 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/git-http-fetch.txt b/Documentation/git-http-fetch.txt
> index 4deb4893f5..aa171088e8 100644
> --- a/Documentation/git-http-fetch.txt
> +++ b/Documentation/git-http-fetch.txt
> @@ -41,11 +41,16 @@ commit-id::
>  		<commit-id>['\t'<filename-as-in--w>]
>  
>  --packfile=<hash>::
> -	Instead of a commit id on the command line (which is not expected in
> +	For internal use only. Instead of a commit id on the command line (which is not expected in
>  	this case), 'git http-fetch' fetches the packfile directly at the given
>  	URL and uses index-pack to generate corresponding .idx and .keep files.
>  	The hash is used to determine the name of the temporary file and is
> -	arbitrary. The output of index-pack is printed to stdout.
> +	arbitrary. The output of index-pack is printed to stdout. Requires
> +	--index-pack-args.
> +
> +--index-pack-args=<args>::
> +	For internal use only. The command to run on the contents of the
> +	downloaded pack. Arguments are URL-encoded separated by spaces.

I'm a bit skeptical of using URL encoding to work around embedded
spaces. I believe in Emily's config-based hooks series, she wrote an
argument parser to pull repeated arguments into a strvec, could you do
something like that here?

I'm sympathetic to the idea that since this is an internal-only flag, we
can be a bit weird with the argument format, though.
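
For reference, the encoding under discussion can be sketched as below.
This is a standalone illustration and not the code in the patch:
decode_arg() is a hypothetical helper that decodes one percent-encoded
argument, which is what would let an argument contain a space while
unencoded spaces still separate arguments.

```c
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if "c" is not a hex digit. */
static int hex_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Decode "%XX" escapes in "in" and return a newly allocated string
 * (caller frees), or NULL on allocation failure. A '%' that is not
 * followed by two hex digits is copied through unchanged.
 */
static char *decode_arg(const char *in)
{
	char *out = malloc(strlen(in) + 1);
	char *p = out;

	if (!out)
		return NULL;
	while (*in) {
		int hi, lo;

		if (in[0] == '%' &&
		    (hi = hex_value(in[1])) >= 0 &&
		    (lo = hex_value(in[2])) >= 0) {
			*p++ = (char)(hi * 16 + lo);
			in += 3;
		} else {
			*p++ = *in++;
		}
	}
	*p = '\0';
	return out;
}
```

A made-up argument such as "--keep=fetch-pack%201234" would decode to
"--keep=fetch-pack 1234" and stay a single argument after splitting on
spaces.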

>  --recover::
>  	Verify that everything reachable from target is fetched.  Used after
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 876f90c759..274ae602f7 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -1645,6 +1645,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>  		strvec_pushf(&cmd.args, "--packfile=%.*s",
>  			     (int) the_hash_algo->hexsz,
>  			     packfile_uris.items[i].string);
> +		strvec_push(&cmd.args, "--index-pack-args=index-pack --stdin --keep");
>  		strvec_push(&cmd.args, uri);
>  		cmd.git_cmd = 1;
>  		cmd.no_stdin = 1;
> diff --git a/http-fetch.c b/http-fetch.c
> index 2d1d9d054f..12feb84e71 100644
> --- a/http-fetch.c
> +++ b/http-fetch.c
> @@ -3,6 +3,7 @@
>  #include "exec-cmd.h"
>  #include "http.h"
>  #include "walker.h"
> +#include "strvec.h"
>  
>  static const char http_fetch_usage[] = "git http-fetch "
>  "[-c] [-t] [-a] [-v] [--recover] [-w ref] [--stdin | --packfile=hash | commit-id] url";
> @@ -43,11 +44,9 @@ static int fetch_using_walker(const char *raw_url, int get_verbosely,
>  	return rc;
>  }
>  
> -static const char *index_pack_args[] =
> -	{"index-pack", "--stdin", "--keep", NULL};
> -
>  static void fetch_single_packfile(struct object_id *packfile_hash,
> -				  const char *url) {
> +				  const char *url,
> +				  const char **index_pack_args) {
>  	struct http_pack_request *preq;
>  	struct slot_results results;
>  	int ret;
> @@ -90,6 +89,7 @@ int cmd_main(int argc, const char **argv)
>  	int packfile = 0;
>  	int nongit;
>  	struct object_id packfile_hash;
> +	const char *index_pack_args = NULL;
>  
>  	setup_git_directory_gently(&nongit);
>  
> @@ -116,6 +116,8 @@ int cmd_main(int argc, const char **argv)
>  			packfile = 1;
>  			if (parse_oid_hex(p, &packfile_hash, &end) || *end)
>  				die(_("argument to --packfile must be a valid hash (got '%s')"), p);
> +		} else if (skip_prefix(argv[arg], "--index-pack-args=", &p)) {
> +			index_pack_args = p;
>  		}
>  		arg++;
>  	}
> @@ -128,10 +130,33 @@ int cmd_main(int argc, const char **argv)
>  	git_config(git_default_config, NULL);
>  
>  	if (packfile) {
> -		fetch_single_packfile(&packfile_hash, argv[arg]);
> +		struct strvec encoded = STRVEC_INIT;
> +		char **raw;
> +		int i;
> +
> +		if (!index_pack_args)
> +			die(_("--packfile requires --index-pack-args"));
> +
> +		strvec_split(&encoded, index_pack_args);
> +
> +		CALLOC_ARRAY(raw, encoded.nr + 1);
> +		for (i = 0; i < encoded.nr; i++)
> +			raw[i] = url_percent_decode(encoded.v[i]);
> +
> +		fetch_single_packfile(&packfile_hash, argv[arg],
> +				      (const char **) raw);
> +
> +		for (i = 0; i < encoded.nr; i++)
> +			free(raw[i]);
> +		free(raw);
> +		strvec_clear(&encoded);
> +
>  		return 0;
>  	}
>  
> +	if (index_pack_args)
> +		die(_("--index-pack-args can only be used with --packfile"));
> +
>  	if (commits_on_stdin) {
>  		commits = walker_targets_stdin(&commit_id, &write_ref);
>  	} else {
> diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
> index 483578b2d7..af90e7efed 100755
> --- a/t/t5550-http-fetch-dumb.sh
> +++ b/t/t5550-http-fetch-dumb.sh
> @@ -224,7 +224,8 @@ test_expect_success 'http-fetch --packfile' '
>  
>  	git init packfileclient &&
>  	p=$(cd "$HTTPD_DOCUMENT_ROOT_PATH"/repo_pack.git && ls objects/pack/pack-*.pack) &&
> -	git -C packfileclient http-fetch --packfile=$ARBITRARY "$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
> +	git -C packfileclient http-fetch --packfile=$ARBITRARY \
> +		--index-pack-args="index-pack --stdin --keep" "$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
>  
>  	grep "^keep.[0-9a-f]\{16,\}$" out &&
>  	cut -c6- out >packhash &&
> -- 
> 2.30.0.280.ga3ce27912f-goog
> 

* Re: [PATCH 2/4] http-fetch: allow custom index-pack args
  2021-02-16 20:49     ` Josh Steadmon
@ 2021-02-16 22:57       ` Junio C Hamano
  2021-02-17 19:46         ` Jonathan Tan
  0 siblings, 1 reply; 134+ messages in thread
From: Junio C Hamano @ 2021-02-16 22:57 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: Jonathan Tan, git

Josh Steadmon <steadmon@google.com> writes:

>> +--index-pack-args=<args>::
>> +	For internal use only. The command to run on the contents of the
>> +	downloaded pack. Arguments are URL-encoded separated by spaces.
>
> I'm a bit skeptical of using URL encoding to work around embedded
> spaces. I believe in Emily's config-based hooks series, she wrote an
> argument parser to pull repeated arguments into a strvec, could you do
> something like that here?
>
> I'm sympathetic to the idea that since this is an internal-only flag, we
> can be a bit weird with the argument format, though.

We tend to prefer quote.c::sq_quote*() suite of quoting; does this
codepath have very different constraints that require different
encoding?
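
As a point of comparison, the single-quote style can be sketched like
this. It is a simplified standalone approximation: sq_quote_sketch() is
a hypothetical name, and the real functions in quote.c operate on
strbufs and may treat additional characters specially.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Wrap "src" in single quotes, rewriting each embedded single quote as
 * '\'' (close quote, escaped quote, reopen quote). Returns a newly
 * allocated string (caller frees), or NULL on allocation failure.
 */
static char *sq_quote_sketch(const char *src)
{
	/* Worst case: every byte becomes 4 bytes, plus 2 quotes and NUL. */
	char *out = malloc(strlen(src) * 4 + 3);
	char *p = out;

	if (!out)
		return NULL;
	*p++ = '\'';
	for (; *src; src++) {
		if (*src == '\'') {
			memcpy(p, "'\\''", 4);
			p += 4;
		} else {
			*p++ = *src;
		}
	}
	*p++ = '\'';
	*p = '\0';
	return out;
}
```

Embedded spaces need no special handling in this style, since they sit
inside the quotes.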

Thanks.

* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-28  1:03       ` Jonathan Tan
@ 2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                             ` (15 more replies)
  0 siblings, 16 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17  1:48 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Thu, Jan 28 2021, Jonathan Tan wrote:

Sorry I managed to miss this at the time. Hopefully a late reply is
better than never.

>> On Sun, Jan 24 2021, Jonathan Tan wrote:
>> 
>> > +void register_found_gitmodules(const struct object_id *oid)
>> > +{
>> > +	oidset_insert(&gitmodules_found, oid);
>> > +}
>> > +
>> 
>> In fsck.c we only use this variable to insert into it, or in fsck_blob()
>> to do the actual check, but then we either abort early if we've found
>> it, or right after that:
>
> By "this variable", do you mean gitmodules_found? fsck_finish() consumes
> it.

Yes, consumes it to emit errors with report(), no?

>>         if (object_on_skiplist(options, oid))
>>                 return 0;
>> 
>> So (along with comments I have below...) you could just use the existing
>> "skiplist" option instead, no?
>
> I don't understand this part (in particular, the part you quoted). About
> "skiplist", I'll reply to your other email [1] which has more details.
>
> [1] https://lore.kernel.org/git/87czxu7c15.fsf@evledraar.gmail.com/

*nod*

>> This whole thing seems just like the bad path I took in earlier rounds
>> of my in-flight mktag series. You don't need this new custom API. You
>> just setup an error handler for your fsck which ignores / prints / logs
>> / whatever the OIDs you want if you get a FSCK_MSG_GITMODULES_MISSING
>> error, which you then "return 0" on.
>> 
>> If you don't have FSCK_MSG_GITMODULES_MISSING punt and call
>> fsck_error_function().
>
> I tried that first, and the issue is that IDs like
> FSCK_MSG_GITMODULES_MISSING are internal to fsck.c. As for whether we
> should start exposing the IDs publicly, I think we should wait until a
> few new cases like this come up, so that we more fully understand the
> requirements first.

The requirement is that you want the object IDs we'd otherwise error
about in fsck_finish(). Yeah, we don't pass the "fsck_msg_id" down in the
"report()" function, but you can reliably strstr() it out of the
message. We document & hard rely on that already, since it's also a
config key.

But yeah, we could just change the report function to pass down the id
and move the relevant macros from fsck.c to fsck.h. I think that would
be a smaller change conceptually than a special-case flag in
fsck_options for something we could otherwise do with the error
reporting.
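
A rough standalone sketch of the first approach follows. It assumes the
message handed to the error callback starts with the camelCased ID
followed by ": ", as in the "gitmodulesMissing: unable to read
.gitmodules blob" error at the top of this thread. message_has_id() and
sketch_error_func() are hypothetical names, and the sketch uses a
prefix comparison rather than strstr() so that the ID cannot match
elsewhere in the text.

```c
#include <string.h>

/* Return 1 if "message" begins with "id" followed by ": ", else 0. */
static int message_has_id(const char *message, const char *id)
{
	size_t len = strlen(id);

	return !strncmp(message, id, len) &&
	       !strncmp(message + len, ": ", 2);
}

/*
 * Hypothetical handler: swallow the dangling .gitmodules error and let
 * everything else count as a failure.
 */
static int sketch_error_func(const char *message)
{
	if (message_has_id(message, "gitmodulesMissing"))
		return 0; /* a real caller would record the object ID here */
	return 1; /* a real caller would defer to the default handler */
}
```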


* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-28  1:15       ` Jonathan Tan
@ 2021-02-17  2:10         ` Ævar Arnfjörð Bjarmason
  2021-02-17 20:10           ` Jonathan Tan
  0 siblings, 1 reply; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17  2:10 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Thu, Jan 28 2021, Jonathan Tan wrote:

>> On Sun, Jan 24 2021, Jonathan Tan wrote:
>> >  --fsck-objects::
>> > -	Die if the pack contains broken objects. For internal use only.
>> > +	For internal use only.
>> > ++
>> > +Die if the pack contains broken objects. If the pack contains a tree
>> > +pointing to a .gitmodules blob that does not exist, prints the hash of
>> > +that blob (for the caller to check) after the hash that goes into the
>> > +name of the pack/idx file (see "Notes").
>> 
>> [I should have waited a bit and sent one E-Mail]
>> 
>> Is this really generally usable as an IPC mechanism, what if we need
>> another set of OIDs we care about? Shouldn't it at least be hidden
>> behind some option so you don't get a deluge of output from index-pack
>> if you're not in this packfile-uri mode?
>
> --fsck-objects is only for internal use, and it's only used by
> fetch-pack.c. So its only consumer does want the output.
>
> Junio also mentioned the possibility of another set of OIDs, and I
> replied [1].
>
> [1] https://lore.kernel.org/git/20210128003536.3874866-1-jonathantanmy@google.com/
>
>> But, along with my other E-Mail...
>> 
>> > [...]
>> > +static void parse_gitmodules_oids(int fd, struct oidset *gitmodules_oids)
>> > +{
>> > +	int len = the_hash_algo->hexsz + 1; /* hash + NL */
>> > +
>> > +	do {
>> > +		char hex_hash[GIT_MAX_HEXSZ + 1];
>> > +		int read_len = read_in_full(fd, hex_hash, len);
>> > +		struct object_id oid;
>> > +		const char *end;
>> > +
>> > +		if (!read_len)
>> > +			return;
>> > +		if (read_len != len)
>> > +			die("invalid length read %d", read_len);
>> > +		if (parse_oid_hex(hex_hash, &oid, &end) || *end != '\n')
>> > +			die("invalid hash");
>> > +		oidset_insert(gitmodules_oids, &oid);
>> > +	} while (1);
>> > +}
>> > +
>> 
>> Doesn't this IPC mechanism already exist in the form of fsck.skipList?
>> See my 1f3299fda9 (fsck: make fsck_config() re-usable, 2021-01-05) on
>> "next". I.e. as noted in my just-sent-E-Mail you could probably just
>> re-use skiplist as-is.
>
> I'm not sure how fsck.skipList could be used here. Before running
> fsck_finish() for the first time, we don't know which .gitmodules are
> missing and which are not. And when running fsck_finish() for the second
> time, we definitely do not want to skip any blobs.
>
>> Or if not it seems to me that this whole IPC mechanism would be better
>> done with a tempfile and passing it along like we already pass the
>> fsck.skipList between these processes.
>> 
>> I doubt it's going to be large enough to matter, we could just put it in
>> .git/ somewhere, like we put gc.log etc (but created with a mktemp()
>> name...).
>> 
>> Or if we want to keep the "print <list> | process" model we can refactor
>> the existing fsck IPC noted in 1f3299fda9 a bit, so e.g. you pass some
>> version of "lines prefixed with "fsck-skiplist: " go into list xyz via a
>> command-line option. And then existing option(s) and your potential new
>> list (which as noted, I think is probably redundant to the skiplist) can
>> use it.
>
> I think using stdout is superior to using a tempfile - we don't have to
> worry about interrupted invocations, for example.
>
> What do you mean by "the existing fsck IPC noted in 1f3299fda9"? If you
> mean the ability to pass a list of OIDs, for example using "-c
> fsck.skipList=filename.txt", I'm not sure that it solves anything.
> Firstly, I don't think that the skipList is useful here (as I said
> earlier). And secondly, I don't think that OID input is the issue -
> right now, the design is a process (index-pack, calling fsck_finish())
> writing to its output which is then picked up by the calling process
> (fetch-pack). We are not sending the dangling .gitmodules through stdin
> anywhere.

Sorry for being unclear here. I don't think (honestly I don't remember,
it's been almost a month) that I meant that you should use the skipList.

Looking at that code again, we use object_on_skiplist() to do an early
punt in report(), but also in fsck_blob(). Presumably you never want the
latter, and that early punting wouldn't be needed if your report()
function intercepted the .gitmodules blob ID to stash it away, report
it later, or whatever.

So yeah, I'm 99% sure now that's not what I meant :)

What I meant with:

    Or if we want to keep the "print <list> | process"[...]

Is that we have an existing ad-hoc IPC model for these commands in
passing along the skipList, which is made more complex because sometimes
the initial process reads the file, sometimes it passes it along as-is
to the child.

And then there's this patch that passes OIDs too, but through a
different mechanism.

I was suggesting that perhaps it made more sense to refactor both so
they could use the same mechanism, because we're potentially passing two
lists of OIDs between the two. Just one goes via line-at-a-time in the
output, the other via a config option on the command-line.
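
Something like the following standalone sketch is the shape I have in
mind for a shared mechanism. Only the "fsck-skiplist: " prefix comes
from my earlier mail; the "gitmodules: " tag and the function names are
hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum line_kind { LINE_SKIPLIST, LINE_GITMODULES, LINE_UNKNOWN };

/* If "line" starts with "prefix", point *rest past it and return 1. */
static int strip_prefix(const char *line, const char *prefix,
			const char **rest)
{
	size_t len = strlen(prefix);

	if (strncmp(line, prefix, len))
		return 0;
	*rest = line + len;
	return 1;
}

/* Route one output line to a list based on its prefix. */
static enum line_kind classify_line(const char *line, const char **payload)
{
	if (strip_prefix(line, "fsck-skiplist: ", payload))
		return LINE_SKIPLIST;
	if (strip_prefix(line, "gitmodules: ", payload)) /* hypothetical tag */
		return LINE_GITMODULES;
	*payload = line;
	return LINE_UNKNOWN;
}
```

The caller would then parse the payload as an OID and insert it into
whichever oidset the prefix selects.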

* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-01-24  2:34   ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
                       ` (2 preceding siblings ...)
  2021-01-24 12:30     ` Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:27     ` Ævar Arnfjörð Bjarmason
  2021-02-17 20:11       ` Jonathan Tan
  3 siblings, 1 reply; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:27 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Sun, Jan 24 2021, Jonathan Tan wrote:

> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 557bd2f348..f995c15115 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -1888,8 +1888,13 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
>  	else
>  		close(input_fd);
>  
> -	if (do_fsck_object && fsck_finish(&fsck_options))
> -		die(_("fsck error in pack objects"));
> +	if (do_fsck_object) {
> +		struct fsck_options fo = FSCK_OPTIONS_STRICT;
> +
> +		fo.print_dangling_gitmodules = 1;
> +		if (fsck_finish(&fo))
> +			die(_("fsck error in pack objects"));
> +	}
> [...]
> +static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
> +{
> +	struct oidset_iter iter;
> +	const struct object_id *oid;
> +	struct fsck_options fo = FSCK_OPTIONS_STRICT;
> +
> +	if (!oidset_size(gitmodules_oids))
> +		return;
> +
> +	oidset_iter_init(gitmodules_oids, &iter);
> +	while ((oid = oidset_iter_next(&iter)))
> +		register_found_gitmodules(oid);
> +	if (fsck_finish(&fo))
> +		die("fsck failed");
> +}
> +

What's the need for STRICT here & can't the former use the existing
fsck_options in index-pack.c? With this on top we pass all tests:

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 18531199242..5464edf4778 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1933,10 +1933,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 		close(input_fd);
 
 	if (do_fsck_object) {
-		struct fsck_options fo = FSCK_OPTIONS_STRICT;
-
-		fo.print_dangling_gitmodules = 1;
-		if (fsck_finish(&fo))
+		fsck_options.print_dangling_gitmodules = 1;
+		if (fsck_finish(&fsck_options))
 			die(_("fsck error in pack objects"));
 	}
 
diff --git a/fetch-pack.c b/fetch-pack.c
index 0a337a04f1f..a8754d97e3d 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -997,7 +997,7 @@ static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 {
 	struct oidset_iter iter;
 	const struct object_id *oid;
-	struct fsck_options fo = FSCK_OPTIONS_STRICT;
+	struct fsck_options fo = FSCK_OPTIONS_DEFAULT;
 
 	if (!oidset_size(gitmodules_oids))
 		return;

* [PATCH 00/14] fsck: API improvements
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 21:02             ` Junio C Hamano
                               ` (11 more replies)
  2021-02-17 19:42           ` [PATCH 01/14] fsck.h: indent arguments to of fsck_set_msg_type Ævar Arnfjörð Bjarmason
                             ` (14 subsequent siblings)
  15 siblings, 12 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

In [1], Jonathan Tan pointed out that the fsck error_func doesn't pass
you the ID of the fsck failure. This series improves the API so that it
does, and moves the gitmodules_{found,done} variables into the
fsck_options struct.

The result is that instead of the "print_dangling_gitmodules" member
in that series we can just implement that with the diff at the end of
this cover letter (goes on top of a merge of this series & "seen"),
and without any changes to fsck_finish().

This conflicts with other in-flight fsck changes but the conflict is
rather trivial. Jeff King has another concurrent series to add a
couple of new fsck checks, those need to be moved to fsck.h, and
there's another trivial conflict in 2 hunks due to the
gitmodules_{found,done} move.

1. https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/

Ævar Arnfjörð Bjarmason (14):
  fsck.h: indent arguments to of fsck_set_msg_type
  fsck.h: use use "enum object_type" instead of "int"
  fsck.c: rename variables in fsck_set_msg_type() for less confusion
  fsck.c: move definition of msg_id into append_msg_id()
  fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  fsck.c: undefine temporary STR macro after use
  fsck.c: give "FOREACH_MSG_ID" a more specific name
  fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
  fsck.c: pass along the fsck_msg_id in the fsck_error callback
  fsck.c: add an fsck_set_msg_type() API that takes enums
  fsck.h: update FSCK_OPTIONS_* for object_name
  fsck.c: move gitmodules_{found,done} into fsck_options

 builtin/fsck.c           |   7 +-
 builtin/index-pack.c     |   3 +-
 builtin/mktag.c          |   7 +-
 builtin/unpack-objects.c |   3 +-
 fsck.c                   | 160 ++++++++++++---------------------------
 fsck.h                   |  98 +++++++++++++++++++++---
 6 files changed, 152 insertions(+), 126 deletions(-)

-- 

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 82f381f854..22dfcfc5de 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1713,6 +1713,20 @@ static void show_pack_info(int stat_only)
 	}
 }
 
+static int index_pack_fsck_error_func(struct fsck_options *o,
+				      const struct object_id *oid,
+				      enum object_type object_type,
+				      enum fsck_msg_type msg_type,
+				      enum fsck_msg_id msg_id,
+				      const char *message)
+{
+	if (msg_id == FSCK_MSG_GITMODULES_MISSING) {
+		puts(oid_to_hex(oid));
+		return 0;
+	}
+	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
+}
+
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0, rev_index;
@@ -1934,10 +1948,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 		close(input_fd);
 
 	if (do_fsck_object) {
-		struct fsck_options fo = FSCK_OPTIONS_STRICT;
-
-		fo.print_dangling_gitmodules = 1;
-		if (fsck_finish(&fo))
+		fsck_options.error_func = index_pack_fsck_error_func;
+		if (fsck_finish(&fsck_options))
 			die(_("fsck error in pack objects"));
 	}
 
diff --git a/fetch-pack.c b/fetch-pack.c
index 0a337a04f1..9fc2ce86e4 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -40,6 +40,7 @@ static struct shallow_lock shallow_lock;
 static const char *alternate_shallow_file;
 static struct strbuf fsck_msg_types = STRBUF_INIT;
 static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
+static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
 
 /* Remember to update object flag allocation in object.h */
 #define COMPLETE	(1U << 0)
@@ -993,19 +994,34 @@ static int cmp_ref_by_name(const void *a_, const void *b_)
 	return strcmp(a->name, b->name);
 }
 
+static int fetch_pack_fsck_error_func(struct fsck_options *o,
+				      const struct object_id *oid,
+				      enum object_type object_type,
+				      enum fsck_msg_type msg_type,
+				      enum fsck_msg_id msg_id,
+				      const char *message)
+{
+	if (msg_id == FSCK_MSG_GITMODULES_MISSING) {
+		puts(oid_to_hex(oid));
+		return 0;
+	}
+	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
+}
+
 static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 {
 	struct oidset_iter iter;
 	const struct object_id *oid;
-	struct fsck_options fo = FSCK_OPTIONS_STRICT;
 
 	if (!oidset_size(gitmodules_oids))
 		return;
 
 	oidset_iter_init(gitmodules_oids, &iter);
 	while ((oid = oidset_iter_next(&iter)))
-		register_found_gitmodules(oid);
-	if (fsck_finish(&fo))
+		oidset_insert(&fsck_options.gitmodules_found, oid);
+
+	fsck_options.error_func = fetch_pack_fsck_error_func;
+	if (fsck_finish(&fsck_options))
 		die("fsck failed");
 }
 
2.30.0.284.gd98b1dd5eaa7


* [PATCH 01/14] fsck.h: indent arguments to of fsck_set_msg_type
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 02/14] fsck.h: use use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
                             ` (13 subsequent siblings)
  15 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fsck.h b/fsck.h
index 423c467feb7..df0b64a2163 100644
--- a/fsck.h
+++ b/fsck.h
@@ -11,7 +11,7 @@ struct fsck_options;
 struct object;
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type);
+		       const char *msg_id, const char *msg_type);
 void fsck_set_msg_types(struct fsck_options *options, const char *values);
 int is_valid_msg_type(const char *msg_id, const char *msg_type);
 
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 02/14] fsck.h: use "enum object_type" instead of "int"
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 01/14] fsck.h: indent arguments of fsck_set_msg_type Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 23:40             ` Junio C Hamano
  2021-02-17 19:42           ` [PATCH 03/14] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
                             ` (12 subsequent siblings)
  15 siblings, 1 reply; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the fsck_walk_func to use an "enum object_type" instead of an
"int" type. The types are compatible, and ever since this was added in
355885d5315 (add generic, type aware object chain walker, 2008-02-25)
we've used entries from object_type (OBJ_BLOB etc.).

So this doesn't really change anything as far as the generated code is
concerned; it just gives the compiler more information and makes this
easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c           | 3 ++-
 builtin/index-pack.c     | 3 ++-
 builtin/unpack-objects.c | 3 ++-
 fsck.h                   | 3 ++-
 4 files changed, 8 insertions(+), 4 deletions(-)
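
As an illustration only (not part of this patch), here is a minimal
standalone sketch of a walk callback typed with an enum rather than an
"int". All demo_* and DEMO_* names are hypothetical stand-ins and are
not git's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for git's object types; values are illustrative. */
enum demo_object_type {
	DEMO_OBJ_COMMIT = 1,
	DEMO_OBJ_TREE,
	DEMO_OBJ_BLOB
};

/* Callback typedef in the style of fsck_walk_func, with an enum parameter. */
typedef int (*demo_walk_func)(const void *obj, enum demo_object_type type,
			      void *data);

/* Counts blobs; the switch spells out which enumerators are expected. */
static int demo_count_blobs(const void *obj, enum demo_object_type type,
			    void *data)
{
	int *count = data;

	if (!obj)
		return 1;
	switch (type) {
	case DEMO_OBJ_BLOB:
		(*count)++;
		break;
	case DEMO_OBJ_COMMIT:
	case DEMO_OBJ_TREE:
		break;
	}
	return 0;
}

/* Invokes the callback once per entry and sums the return values. */
static int demo_walk(demo_walk_func fn, const enum demo_object_type *types,
		     size_t nr, void *data)
{
	int result = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		result += fn(&types[i], types[i], data);
	return result;
}
```

With the enum in the signature, a compiler that warns about unhandled
enumerators in a switch can flag a callback that forgets one, which an
"int" parameter would not allow.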

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 821e7798c70..68f0329e69e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -197,7 +197,8 @@ static int traverse_reachable(void)
 	return !!result;
 }
 
-static int mark_used(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_used(struct object *obj, enum object_type object_type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return 1;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 54f74c48741..2f291a14d4a 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -212,7 +212,8 @@ static void cleanup_thread(void)
 	free(thread_data);
 }
 
-static int mark_link(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_link(struct object *obj, enum object_type type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return -1;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index dd4a75e030d..ca54fd16688 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -187,7 +187,8 @@ static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf)
  * that have reachability requirements and calls this function.
  * Verify its reachability and validity recursively and write it out.
  */
-static int check_object(struct object *obj, int type, void *data, struct fsck_options *options)
+static int check_object(struct object *obj, enum object_type type,
+			void *data, struct fsck_options *options)
 {
 	struct obj_buffer *obj_buf;
 
diff --git a/fsck.h b/fsck.h
index df0b64a2163..0c75789d219 100644
--- a/fsck.h
+++ b/fsck.h
@@ -23,7 +23,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type);
  *     <0	error signaled and abort
  *     >0	error signaled and do not abort
  */
-typedef int (*fsck_walk_func)(struct object *obj, int type, void *data, struct fsck_options *options);
+typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
+			      void *data, struct fsck_options *options);
 
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 03/14] fsck.c: rename variables in fsck_set_msg_type() for less confusion
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (2 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 02/14] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 04/14] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
                             ` (11 subsequent siblings)
  15 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename variables in a function added in 0282f4dced0 (fsck: offer a
function to demote fsck errors to warnings, 2015-06-22).

It was needlessly confusing that it took a "msg_type" argument, but
then later declared another "msg_type" of a different type.

Let's rename that to "tmp", and rename "id" to "msg_id" and "msg_id"
to "msg_id_str" etc. This will make a follow-up change smaller.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)
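
For readers skimming the diff, the following is a rough standalone
sketch of the lazily allocated override table once the inner variable
is called "tmp" and no longer shadows a parameter. The demo_* names,
the integer-typed arguments and the default values are assumptions for
illustration; the real function takes strings and uses git's
ALLOC_ARRAY().

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MSG_MAX 3

struct demo_options {
	int *msg_type;	/* NULL until the first override is set */
};

/* Hypothetical built-in defaults: id 0 maps to 1, everything else to 2. */
static int demo_default_type(int msg_id)
{
	return msg_id == 0 ? 1 : 2;
}

/*
 * "msg_id" and "msg_type" name the parsed values throughout; the
 * freshly allocated array gets the distinct name "tmp".
 */
static void demo_set_msg_type(struct demo_options *options,
			      int msg_id, int msg_type)
{
	assert(msg_id >= 0 && msg_id < DEMO_MSG_MAX);

	if (!options->msg_type) {
		int i;
		int *tmp = malloc(sizeof(*tmp) * DEMO_MSG_MAX);

		if (!tmp)
			abort();
		for (i = 0; i < DEMO_MSG_MAX; i++)
			tmp[i] = demo_default_type(i);
		options->msg_type = tmp;
	}

	options->msg_type[msg_id] = msg_type;
}
```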

diff --git a/fsck.c b/fsck.c
index 4b7f0b73d73..acccad243ec 100644
--- a/fsck.c
+++ b/fsck.c
@@ -203,27 +203,27 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 }
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type)
+		const char *msg_id_str, const char *msg_type_str)
 {
-	int id = parse_msg_id(msg_id), type;
+	int msg_id = parse_msg_id(msg_id_str), msg_type;
 
-	if (id < 0)
-		die("Unhandled message id: %s", msg_id);
-	type = parse_msg_type(msg_type);
+	if (msg_id < 0)
+		die("Unhandled message id: %s", msg_id_str);
+	msg_type = parse_msg_type(msg_type_str);
 
-	if (type != FSCK_ERROR && msg_id_info[id].msg_type == FSCK_FATAL)
-		die("Cannot demote %s to %s", msg_id, msg_type);
+	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
+		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
 
 	if (!options->msg_type) {
 		int i;
-		int *msg_type;
-		ALLOC_ARRAY(msg_type, FSCK_MSG_MAX);
+		int *tmp;
+		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
-			msg_type[i] = fsck_msg_type(i, options);
-		options->msg_type = msg_type;
+			tmp[i] = fsck_msg_type(i, options);
+		options->msg_type = tmp;
 	}
 
-	options->msg_type[id] = type;
+	options->msg_type[msg_id] = msg_type;
 }
 
 void fsck_set_msg_types(struct fsck_options *options, const char *values)
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 04/14] fsck.c: move definition of msg_id into append_msg_id()
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (3 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 03/14] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 05/14] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
                             ` (10 subsequent siblings)
  15 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Refactor code added in 71ab8fa840f (fsck: report the ID of the
error/warning, 2015-06-22) to resolve the msg_id to a string in the
function that wants it, instead of doing it in report().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
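
To make the transformation concrete, here is a small standalone sketch
of the id-to-camelCase conversion, with the table lookup done inside
the function as this patch proposes. It writes into a plain char buffer
instead of a strbuf, and the demo_* names and the two-entry table are
hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table standing in for msg_id_info[]; only two entries. */
enum demo_msg_id {
	DEMO_MSG_BAD_DATE,
	DEMO_MSG_GITMODULES_MISSING,
	DEMO_MSG_MAX
};

static const char *demo_id_string[DEMO_MSG_MAX] = {
	"BAD_DATE",
	"GITMODULES_MISSING",
};

/*
 * Writes the camelCased form of the message id into "out" (at most
 * outsz - 1 characters plus a NUL). Letters are lowercased, except
 * that each '_' is dropped and the character after it copied as-is.
 */
static void demo_format_msg_id(char *out, size_t outsz,
			       enum demo_msg_id msg_id)
{
	const char *s = demo_id_string[msg_id];
	size_t n = 0;

	assert(outsz > 0);
	while (*s && n + 1 < outsz) {
		char c = *s++;

		if (c != '_') {
			out[n++] = (char)tolower((unsigned char)c);
		} else {
			assert(*s);
			out[n++] = *s++;
		}
	}
	out[n] = '\0';
}
```

Under these assumptions DEMO_MSG_GITMODULES_MISSING comes out as
"gitmodulesMissing", the same spelling seen in the error message quoted
at the start of this thread.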

diff --git a/fsck.c b/fsck.c
index acccad243ec..1070071ffec 100644
--- a/fsck.c
+++ b/fsck.c
@@ -264,8 +264,9 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 	free(to_free);
 }
 
-static void append_msg_id(struct strbuf *sb, const char *msg_id)
+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
 {
+	const char *msg_id = msg_id_info[id].id_string;
 	for (;;) {
 		char c = *(msg_id)++;
 
@@ -308,7 +309,7 @@ static int report(struct fsck_options *options,
 	else if (msg_type == FSCK_INFO)
 		msg_type = FSCK_WARN;
 
-	append_msg_id(&sb, msg_id_info[id].id_string);
+	append_msg_id(&sb, id);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 05/14] fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (4 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 04/14] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 06/14] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
                             ` (9 subsequent siblings)
  15 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename the remaining variables of type fsck_msg_id from "id" to
"msg_id". This change is relatively small, and is worth the churn for
a later change where we have different IDs in the "report" function.
---
 fsck.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fsck.c b/fsck.c
index 1070071ffec..dbb6f7c4ee2 100644
--- a/fsck.c
+++ b/fsck.c
@@ -264,19 +264,19 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 	free(to_free);
 }
 
-static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id msg_id)
 {
-	const char *msg_id = msg_id_info[id].id_string;
+	const char *msg_id_str = msg_id_info[msg_id].id_string;
 	for (;;) {
-		char c = *(msg_id)++;
+		char c = *(msg_id_str)++;
 
 		if (!c)
 			break;
 		if (c != '_')
 			strbuf_addch(sb, tolower(c));
 		else {
-			assert(*msg_id);
-			strbuf_addch(sb, *(msg_id)++);
+			assert(*msg_id_str);
+			strbuf_addch(sb, *(msg_id_str)++);
 		}
 	}
 
@@ -292,11 +292,11 @@ static int object_on_skiplist(struct fsck_options *opts,
 __attribute__((format (printf, 5, 6)))
 static int report(struct fsck_options *options,
 		  const struct object_id *oid, enum object_type object_type,
-		  enum fsck_msg_id id, const char *fmt, ...)
+		  enum fsck_msg_id msg_id, const char *fmt, ...)
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(id, options), result;
+	int msg_type = fsck_msg_type(msg_id, options), result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -309,7 +309,7 @@ static int report(struct fsck_options *options,
 	else if (msg_type == FSCK_INFO)
 		msg_type = FSCK_WARN;
 
-	append_msg_id(&sb, id);
+	append_msg_id(&sb, msg_id);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 06/14] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (5 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 05/14] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 07/14] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
                             ` (8 subsequent siblings)
  15 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new
fsck_msg_type enum.

These defines were originally introduced in:

 - ba002f3b28a (builtin-fsck: move common object checking code to
   fsck.c, 2008-02-25)
 - f50c4407305 (fsck: disallow demoting grave fsck errors to warnings,
   2015-06-22)
 - efaba7cc77f (fsck: optionally ignore specific fsck issues
   completely, 2015-06-22)
 - f27d05b1704 (fsck: allow upgrading fsck warnings to errors,
   2015-06-22)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  |  2 +-
 builtin/mktag.c |  3 ++-
 fsck.c          | 21 ++++++++++-----------
 fsck.h          | 17 ++++++++++-------
 4 files changed, 23 insertions(+), 20 deletions(-)
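
A standalone sketch of the resulting enum, for illustration. The
numeric values mirror the old defines; the helper showing FATAL being
reported as ERROR and INFO as WARN is my reading of what report() does
and should be treated as an assumption, as should every demo_* and
DEMO_* name.

```c
#include <assert.h>

/* Mirrors the enum this patch adds to fsck.h, under a DEMO_ prefix. */
enum demo_msg_type {
	DEMO_INFO = -2,
	DEMO_FATAL = -1,
	DEMO_ERROR = 1,
	DEMO_WARN,
	DEMO_IGNORE
};

/*
 * Collapses the internal-only types onto the ones a callback sees:
 * FATAL is reported as ERROR and INFO as WARN.
 */
static enum demo_msg_type demo_reported_type(enum demo_msg_type t)
{
	switch (t) {
	case DEMO_FATAL:
		return DEMO_ERROR;
	case DEMO_INFO:
		return DEMO_WARN;
	case DEMO_ERROR:
	case DEMO_WARN:
	case DEMO_IGNORE:
		return t;
	}
	return t;
}
```

Because WARN and IGNORE follow ERROR = 1 without explicit values, they
keep the values 2 and 3 that the old defines had.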

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 68f0329e69e..d6d745dc702 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -89,7 +89,7 @@ static int objerror(struct object *obj, const char *err)
 static int fsck_error_func(struct fsck_options *o,
 			   const struct object_id *oid,
 			   enum object_type object_type,
-			   int msg_type, const char *message)
+			   enum fsck_msg_type msg_type, const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 41a399a69e4..1834394a9b6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -22,7 +22,8 @@ static int mktag_config(const char *var, const char *value, void *cb)
 static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
-				 int msg_type, const char *message)
+				 enum fsck_msg_type msg_type,
+				 const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/fsck.c b/fsck.c
index dbb6f7c4ee2..00e0fef21ca 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,9 +22,6 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FSCK_FATAL -1
-#define FSCK_INFO -2
-
 #define FOREACH_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
@@ -97,7 +94,7 @@ static struct {
 	const char *id_string;
 	const char *downcased;
 	const char *camelcased;
-	int msg_type;
+	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
 	FOREACH_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
@@ -164,10 +161,10 @@ void list_config_fsck_msg_ids(struct string_list *list, const char *prefix)
 		list_config_item(list, prefix, msg_id_info[i].camelcased);
 }
 
-static int fsck_msg_type(enum fsck_msg_id msg_id,
+static enum fsck_msg_type fsck_msg_type(enum fsck_msg_id msg_id,
 	struct fsck_options *options)
 {
-	int msg_type;
+	enum fsck_msg_type msg_type;
 
 	assert(msg_id >= 0 && msg_id < FSCK_MSG_MAX);
 
@@ -182,7 +179,7 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 	return msg_type;
 }
 
-static int parse_msg_type(const char *str)
+static enum fsck_msg_type parse_msg_type(const char *str)
 {
 	if (!strcmp(str, "error"))
 		return FSCK_ERROR;
@@ -205,7 +202,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
-	int msg_id = parse_msg_id(msg_id_str), msg_type;
+	int msg_id = parse_msg_id(msg_id_str);
+	enum fsck_msg_type msg_type;
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
@@ -216,7 +214,7 @@ void fsck_set_msg_type(struct fsck_options *options,
 
 	if (!options->msg_type) {
 		int i;
-		int *tmp;
+		enum fsck_msg_type *tmp;
 		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
 			tmp[i] = fsck_msg_type(i, options);
@@ -296,7 +294,8 @@ static int report(struct fsck_options *options,
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(msg_id, options), result;
+	enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options);
+	int result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -1262,7 +1261,7 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type,
-			int msg_type, const char *message)
+			enum fsck_msg_type msg_type, const char *message)
 {
 	if (msg_type == FSCK_WARN) {
 		warning("object %s: %s", fsck_describe_object(o, oid), message);
diff --git a/fsck.h b/fsck.h
index 0c75789d219..c77e8ddf10b 100644
--- a/fsck.h
+++ b/fsck.h
@@ -3,10 +3,13 @@
 
 #include "oidset.h"
 
-#define FSCK_ERROR 1
-#define FSCK_WARN 2
-#define FSCK_IGNORE 3
-
+enum fsck_msg_type {
+	FSCK_INFO = -2,
+	FSCK_FATAL = -1,
+	FSCK_ERROR = 1,
+	FSCK_WARN,
+	FSCK_IGNORE
+};
 struct fsck_options;
 struct object;
 
@@ -29,17 +32,17 @@ typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
 			  const struct object_id *oid, enum object_type object_type,
-			  int msg_type, const char *message);
+			  enum fsck_msg_type msg_type, const char *message);
 
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
-			int msg_type, const char *message);
+			enum fsck_msg_type msg_type, const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
 	fsck_error error_func;
 	unsigned strict:1;
-	int *msg_type;
+	enum fsck_msg_type *msg_type;
 	struct oidset skiplist;
 	kh_oid_map_t *object_names;
 };
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 07/14] fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (6 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 06/14] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 08/14] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
                             ` (7 subsequent siblings)
  15 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

There's no reason to defer calling parse_msg_type() until after we've
checked whether "msg_id < 0". This is not a hot codepath, and
parse_msg_type() itself may die on invalid input.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index 00e0fef21ca..7c53080ad48 100644
--- a/fsck.c
+++ b/fsck.c
@@ -203,11 +203,10 @@ void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
 	int msg_id = parse_msg_id(msg_id_str);
-	enum fsck_msg_type msg_type;
+	enum fsck_msg_type msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
-	msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
 		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 08/14] fsck.c: undefine temporary STR macro after use
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (7 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 07/14] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 09/14] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
                             ` (6 subsequent siblings)
  15 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

In f417eed8cde (fsck: provide a function to parse fsck message IDs,
2015-06-22) the "STR" macro was introduced, but that short macro name
was not undefined after use, as was done earlier in the same series for
the MSG_ID macro in c99ba492f1c (fsck: introduce identifiers for fsck
messages, 2015-06-22).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 1 +
 1 file changed, 1 insertion(+)
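
The general idea, sketched standalone with hypothetical names
(DEMO_ENTRY and demo_names are not in git): a short helper macro such
as STR is defined right before the table that needs it and undefined
right after, so the name does not leak into the rest of the file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STR(x) #x
#define DEMO_ENTRY(id) { STR(id), sizeof(STR(id)) - 1 },
static const struct {
	const char *name;
	size_t len;
} demo_names[] = {
	DEMO_ENTRY(BAD_DATE)
	DEMO_ENTRY(GITMODULES_MISSING)
};
#undef DEMO_ENTRY
#undef STR

/* From here on, the short name is free for other uses. */
#ifdef STR
#error "STR leaked past its intended scope"
#endif
```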

diff --git a/fsck.c b/fsck.c
index 7c53080ad48..88884e91c89 100644
--- a/fsck.c
+++ b/fsck.c
@@ -100,6 +100,7 @@ static struct {
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
+#undef STR
 
 static void prepare_msg_ids(void)
 {
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 09/14] fsck.c: give "FOREACH_MSG_ID" a more specific name
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (8 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 08/14] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 10/14] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
                             ` (5 subsequent siblings)
  15 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename the FOREACH_MSG_ID macro to FOREACH_FSCK_MSG_ID in preparation
for moving it over to fsck.h. It's a good convention to name macros
in *.h files in such a way as to clearly not clash with any other
names in other files.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fsck.c b/fsck.c
index 88884e91c89..1730acd698d 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,7 +22,7 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FOREACH_MSG_ID(FUNC) \
+#define FOREACH_FSCK_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
 	FUNC(UNTERMINATED_HEADER, FATAL) \
@@ -83,7 +83,7 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 
 #define MSG_ID(id, msg_type) FSCK_MSG_##id,
 enum fsck_msg_id {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	FSCK_MSG_MAX
 };
 #undef MSG_ID
@@ -96,7 +96,7 @@ static struct {
 	const char *camelcased;
 	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 10/14] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (9 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 09/14] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 11/14] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
                             ` (4 subsequent siblings)
  15 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Move the FOREACH_FSCK_MSG_ID macro and the fsck_msg_id enum it helps
define from fsck.c to fsck.h. This is in preparation for having
non-static functions take the fsck_msg_id as an argument.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 66 ---------------------------------------------------------
 fsck.h | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+), 66 deletions(-)
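
For anyone unfamiliar with the pattern being moved, here is a
cut-down, standalone sketch of how one list macro yields both the enum
(the part that goes to the header) and the info table (the part that
stays in the .c file). The three-entry list and all demo_* and DEMO_*
names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Would live in the header: the list and the enum derived from it. */
#define FOREACH_DEMO_MSG_ID(FUNC) \
	FUNC(NUL_IN_HEADER, FATAL) \
	FUNC(BAD_DATE, ERROR) \
	FUNC(HAS_DOT, WARN)

enum demo_msg_type { DEMO_FATAL = -1, DEMO_ERROR = 1, DEMO_WARN };

#define MSG_ID(id, msg_type) DEMO_MSG_##id,
enum demo_msg_id {
	FOREACH_DEMO_MSG_ID(MSG_ID)
	DEMO_MSG_MAX
};
#undef MSG_ID

/* Would stay in the .c file: a table indexed by the enum. */
#define STR(x) #x
#define MSG_ID(id, msg_type) { STR(id), DEMO_##msg_type },
static const struct {
	const char *id_string;
	enum demo_msg_type msg_type;
} demo_msg_id_info[DEMO_MSG_MAX] = {
	FOREACH_DEMO_MSG_ID(MSG_ID)
};
#undef MSG_ID
#undef STR

/* With the enum visible to callers, functions can take it as a parameter. */
static const char *demo_msg_id_name(enum demo_msg_id msg_id)
{
	assert((int)msg_id >= 0 && (int)msg_id < DEMO_MSG_MAX);
	return demo_msg_id_info[msg_id].id_string;
}
```

Since both expansions come from the same list, the enum values and the
table rows cannot drift out of order.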

diff --git a/fsck.c b/fsck.c
index 1730acd698d..980ef2cb8fa 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,72 +22,6 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FOREACH_FSCK_MSG_ID(FUNC) \
-	/* fatal errors */ \
-	FUNC(NUL_IN_HEADER, FATAL) \
-	FUNC(UNTERMINATED_HEADER, FATAL) \
-	/* errors */ \
-	FUNC(BAD_DATE, ERROR) \
-	FUNC(BAD_DATE_OVERFLOW, ERROR) \
-	FUNC(BAD_EMAIL, ERROR) \
-	FUNC(BAD_NAME, ERROR) \
-	FUNC(BAD_OBJECT_SHA1, ERROR) \
-	FUNC(BAD_PARENT_SHA1, ERROR) \
-	FUNC(BAD_TAG_OBJECT, ERROR) \
-	FUNC(BAD_TIMEZONE, ERROR) \
-	FUNC(BAD_TREE, ERROR) \
-	FUNC(BAD_TREE_SHA1, ERROR) \
-	FUNC(BAD_TYPE, ERROR) \
-	FUNC(DUPLICATE_ENTRIES, ERROR) \
-	FUNC(MISSING_AUTHOR, ERROR) \
-	FUNC(MISSING_COMMITTER, ERROR) \
-	FUNC(MISSING_EMAIL, ERROR) \
-	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_OBJECT, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_TAG, ERROR) \
-	FUNC(MISSING_TAG_ENTRY, ERROR) \
-	FUNC(MISSING_TREE, ERROR) \
-	FUNC(MISSING_TREE_OBJECT, ERROR) \
-	FUNC(MISSING_TYPE, ERROR) \
-	FUNC(MISSING_TYPE_ENTRY, ERROR) \
-	FUNC(MULTIPLE_AUTHORS, ERROR) \
-	FUNC(TREE_NOT_SORTED, ERROR) \
-	FUNC(UNKNOWN_TYPE, ERROR) \
-	FUNC(ZERO_PADDED_DATE, ERROR) \
-	FUNC(GITMODULES_MISSING, ERROR) \
-	FUNC(GITMODULES_BLOB, ERROR) \
-	FUNC(GITMODULES_LARGE, ERROR) \
-	FUNC(GITMODULES_NAME, ERROR) \
-	FUNC(GITMODULES_SYMLINK, ERROR) \
-	FUNC(GITMODULES_URL, ERROR) \
-	FUNC(GITMODULES_PATH, ERROR) \
-	FUNC(GITMODULES_UPDATE, ERROR) \
-	/* warnings */ \
-	FUNC(BAD_FILEMODE, WARN) \
-	FUNC(EMPTY_NAME, WARN) \
-	FUNC(FULL_PATHNAME, WARN) \
-	FUNC(HAS_DOT, WARN) \
-	FUNC(HAS_DOTDOT, WARN) \
-	FUNC(HAS_DOTGIT, WARN) \
-	FUNC(NULL_SHA1, WARN) \
-	FUNC(ZERO_PADDED_FILEMODE, WARN) \
-	FUNC(NUL_IN_COMMIT, WARN) \
-	/* infos (reported as warnings, but ignored by default) */ \
-	FUNC(GITMODULES_PARSE, INFO) \
-	FUNC(BAD_TAG_NAME, INFO) \
-	FUNC(MISSING_TAGGER_ENTRY, INFO) \
-	/* ignored (elevated when requested) */ \
-	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
-
-#define MSG_ID(id, msg_type) FSCK_MSG_##id,
-enum fsck_msg_id {
-	FOREACH_FSCK_MSG_ID(MSG_ID)
-	FSCK_MSG_MAX
-};
-#undef MSG_ID
-
 #define STR(x) #x
 #define MSG_ID(id, msg_type) { STR(id), NULL, NULL, FSCK_##msg_type },
 static struct {
diff --git a/fsck.h b/fsck.h
index c77e8ddf10b..b4c53aaa08c 100644
--- a/fsck.h
+++ b/fsck.h
@@ -10,6 +10,73 @@ enum fsck_msg_type {
 	FSCK_WARN,
 	FSCK_IGNORE
 };
+
+#define FOREACH_FSCK_MSG_ID(FUNC) \
+	/* fatal errors */ \
+	FUNC(NUL_IN_HEADER, FATAL) \
+	FUNC(UNTERMINATED_HEADER, FATAL) \
+	/* errors */ \
+	FUNC(BAD_DATE, ERROR) \
+	FUNC(BAD_DATE_OVERFLOW, ERROR) \
+	FUNC(BAD_EMAIL, ERROR) \
+	FUNC(BAD_NAME, ERROR) \
+	FUNC(BAD_OBJECT_SHA1, ERROR) \
+	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_TAG_OBJECT, ERROR) \
+	FUNC(BAD_TIMEZONE, ERROR) \
+	FUNC(BAD_TREE, ERROR) \
+	FUNC(BAD_TREE_SHA1, ERROR) \
+	FUNC(BAD_TYPE, ERROR) \
+	FUNC(DUPLICATE_ENTRIES, ERROR) \
+	FUNC(MISSING_AUTHOR, ERROR) \
+	FUNC(MISSING_COMMITTER, ERROR) \
+	FUNC(MISSING_EMAIL, ERROR) \
+	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
+	FUNC(MISSING_OBJECT, ERROR) \
+	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
+	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
+	FUNC(MISSING_TAG, ERROR) \
+	FUNC(MISSING_TAG_ENTRY, ERROR) \
+	FUNC(MISSING_TREE, ERROR) \
+	FUNC(MISSING_TREE_OBJECT, ERROR) \
+	FUNC(MISSING_TYPE, ERROR) \
+	FUNC(MISSING_TYPE_ENTRY, ERROR) \
+	FUNC(MULTIPLE_AUTHORS, ERROR) \
+	FUNC(TREE_NOT_SORTED, ERROR) \
+	FUNC(UNKNOWN_TYPE, ERROR) \
+	FUNC(ZERO_PADDED_DATE, ERROR) \
+	FUNC(GITMODULES_MISSING, ERROR) \
+	FUNC(GITMODULES_BLOB, ERROR) \
+	FUNC(GITMODULES_LARGE, ERROR) \
+	FUNC(GITMODULES_NAME, ERROR) \
+	FUNC(GITMODULES_SYMLINK, ERROR) \
+	FUNC(GITMODULES_URL, ERROR) \
+	FUNC(GITMODULES_PATH, ERROR) \
+	FUNC(GITMODULES_UPDATE, ERROR) \
+	/* warnings */ \
+	FUNC(BAD_FILEMODE, WARN) \
+	FUNC(EMPTY_NAME, WARN) \
+	FUNC(FULL_PATHNAME, WARN) \
+	FUNC(HAS_DOT, WARN) \
+	FUNC(HAS_DOTDOT, WARN) \
+	FUNC(HAS_DOTGIT, WARN) \
+	FUNC(NULL_SHA1, WARN) \
+	FUNC(ZERO_PADDED_FILEMODE, WARN) \
+	FUNC(NUL_IN_COMMIT, WARN) \
+	/* infos (reported as warnings, but ignored by default) */ \
+	FUNC(GITMODULES_PARSE, INFO) \
+	FUNC(BAD_TAG_NAME, INFO) \
+	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	/* ignored (elevated when requested) */ \
+	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
+
+#define MSG_ID(id, msg_type) FSCK_MSG_##id,
+enum fsck_msg_id {
+	FOREACH_FSCK_MSG_ID(MSG_ID)
+	FSCK_MSG_MAX
+};
+#undef MSG_ID
+
 struct fsck_options;
 struct object;
 
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH 11/14] fsck.c: pass along the fsck_msg_id in the fsck_error callback
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (10 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 10/14] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 12/14] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
                             ` (3 subsequent siblings)
  15 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the fsck_error callback to also pass along the
fsck_msg_id. Before this change the only way to get the message id was
to parse it back out of the "message".

Let's pass it down explicitly for the benefit of callers that might
want to use it, as discussed in [1].

Passing the msg_type is now redundant, as you can always get it back
from the msg_id, but I'm not changing that convention. It's really
common to need the msg_type, and the report() function itself (which
calls "fsck_error") needs to call fsck_msg_type() to discover
it. Let's not needlessly re-do that work in the user callback.

1. https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  | 4 +++-
 builtin/mktag.c | 1 +
 fsck.c          | 6 ++++--
 fsck.h          | 6 ++++--
 4 files changed, 12 insertions(+), 5 deletions(-)
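
A standalone sketch of what a caller gains, loosely modelled on the
fetch-pack callback discussed earlier in this thread. The callback
shape is simplified, and every demo_* and DEMO_* name is hypothetical:
with the id passed in, a wrapper can single out one message and defer
to the default handler for the rest, instead of matching on the
message text.

```c
#include <assert.h>
#include <string.h>

enum demo_msg_type { DEMO_ERROR = 1, DEMO_WARN };
enum demo_msg_id { DEMO_MSG_BAD_DATE, DEMO_MSG_GITMODULES_MISSING };

/* Callback shape after this patch: msg_id arrives next to msg_type. */
typedef int (*demo_error_fn)(void *data, enum demo_msg_type msg_type,
			     enum demo_msg_id msg_id, const char *message);

/* Default policy: warnings are non-fatal, everything else is an error. */
static int demo_default_error(void *data, enum demo_msg_type msg_type,
			      enum demo_msg_id msg_id, const char *message)
{
	(void)data;
	(void)msg_id;
	(void)message;
	return msg_type == DEMO_WARN ? 0 : 1;
}

/* Wrapper: tolerate one specific id, defer to the default for the rest. */
static int demo_lenient_error(void *data, enum demo_msg_type msg_type,
			      enum demo_msg_id msg_id, const char *message)
{
	if (msg_id == DEMO_MSG_GITMODULES_MISSING) {
		int *deferred = data;

		(*deferred)++;
		return 0;
	}
	return demo_default_error(data, msg_type, msg_id, message);
}

/* The pre-patch alternative: recover the id by matching the message text. */
static int demo_is_gitmodules_missing(const char *message)
{
	const char *prefix = "gitmodulesMissing:";

	return !strncmp(message, prefix, strlen(prefix));
}
```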

diff --git a/builtin/fsck.c b/builtin/fsck.c
index d6d745dc702..b71fac4ceca 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -89,7 +89,9 @@ static int objerror(struct object *obj, const char *err)
 static int fsck_error_func(struct fsck_options *o,
 			   const struct object_id *oid,
 			   enum object_type object_type,
-			   enum fsck_msg_type msg_type, const char *message)
+			   enum fsck_msg_type msg_type,
+			   enum fsck_msg_id msg_id,
+			   const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 1834394a9b6..dc989c356f5 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -23,6 +23,7 @@ static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
 				 enum fsck_msg_type msg_type,
+				 enum fsck_msg_id msg_id,
 				 const char *message)
 {
 	switch (msg_type) {
diff --git a/fsck.c b/fsck.c
index 980ef2cb8fa..007f02b556a 100644
--- a/fsck.c
+++ b/fsck.c
@@ -247,7 +247,7 @@ static int report(struct fsck_options *options,
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
 	result = options->error_func(options, oid, object_type,
-				     msg_type, sb.buf);
+				     msg_type, msg_id, sb.buf);
 	strbuf_release(&sb);
 	va_end(ap);
 
@@ -1195,7 +1195,9 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type,
-			enum fsck_msg_type msg_type, const char *message)
+			enum fsck_msg_type msg_type,
+			enum fsck_msg_id msg_id,
+			const char *message)
 {
 	if (msg_type == FSCK_WARN) {
 		warning("object %s: %s", fsck_describe_object(o, oid), message);
diff --git a/fsck.h b/fsck.h
index b4c53aaa08c..56536d7f29e 100644
--- a/fsck.h
+++ b/fsck.h
@@ -99,11 +99,13 @@ typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
 			  const struct object_id *oid, enum object_type object_type,
-			  enum fsck_msg_type msg_type, const char *message);
+			  enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
+			  const char *message);
 
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
-			enum fsck_msg_type msg_type, const char *message);
+			enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
+			const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
-- 
2.30.0.284.gd98b1dd5eaa7
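
As a rough illustration of what the extra parameter buys a caller, here is
a minimal sketch of a callback that branches on the message ID rather than
on the message text. The demo_* enums and names are hypothetical stand-ins,
not Git's real definitions, and the signature is simplified.

```c
#include <stdio.h>

/* Hypothetical stand-ins for Git's enums; the real ones live in fsck.[ch]. */
enum demo_msg_type { DEMO_ERROR, DEMO_WARN };
enum demo_msg_id { DEMO_MSG_GITMODULES_MISSING, DEMO_MSG_EXTRA_HEADER_ENTRY };

/* Counts how often the callback saw a missing .gitmodules blob. */
static int demo_missing_seen;

/*
 * Callback in the (simplified) shape of the new fsck_error signature:
 * it receives both the severity and the ID of the message.
 */
static int demo_error_func(enum demo_msg_type msg_type,
			   enum demo_msg_id msg_id,
			   const char *message)
{
	if (msg_id == DEMO_MSG_GITMODULES_MISSING) {
		/* Stash it away for later instead of failing right now. */
		demo_missing_seen++;
		return 0;
	}
	if (msg_type == DEMO_WARN) {
		fprintf(stderr, "warning: %s\n", message);
		return 0;
	}
	fprintf(stderr, "error: %s\n", message);
	return 1;
}
```

The callback still gets msg_type directly, so the common "warn or error"
branch needs no extra lookup.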



* [PATCH 12/14] fsck.c: add an fsck_set_msg_type() API that takes enums
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (11 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 11/14] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 13/14] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
                             ` (2 subsequent siblings)
  15 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change code I added in acf9de4c94e (mktag: use fsck instead of custom
verify_tag(), 2021-01-05) to make use of a new API function that takes
the fsck_msg_{id,type} types, instead of arbitrary strings that
we'll (hopefully) parse into those types.

At the time that the fsck_set_msg_type() API was introduced in
0282f4dced0 (fsck: offer a function to demote fsck errors to warnings,
2015-06-22) it was only intended to be used to parse user-supplied
data.

For things that are purely internal to the C code it makes sense to
have the compiler check these arguments, and to skip the sanity
checking of the data in fsck_set_msg_type(), which would only duplicate
the checks we get from the compiler.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/mktag.c |  3 ++-
 fsck.c          | 27 +++++++++++++++++----------
 fsck.h          |  3 +++
 3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/builtin/mktag.c b/builtin/mktag.c
index dc989c356f5..de67a94f24e 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -93,7 +93,8 @@ int cmd_mktag(int argc, const char **argv, const char *prefix)
 		die_errno(_("could not read from stdin"));
 
 	fsck_options.error_func = mktag_fsck_error_func;
-	fsck_set_msg_type(&fsck_options, "extraheaderentry", "warn");
+	fsck_set_msg_type_from_ids(&fsck_options, FSCK_MSG_EXTRA_HEADER_ENTRY,
+				   FSCK_WARN);
 	/* config might set fsck.extraHeaderEntry=* again */
 	git_config(mktag_config, NULL);
 	if (fsck_tag_standalone(NULL, buf.buf, buf.len, &fsck_options,
diff --git a/fsck.c b/fsck.c
index 007f02b556a..54632404de5 100644
--- a/fsck.c
+++ b/fsck.c
@@ -134,6 +134,22 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 	return 1;
 }
 
+void fsck_set_msg_type_from_ids(struct fsck_options *options,
+				enum fsck_msg_id msg_id,
+				enum fsck_msg_type msg_type)
+{
+	if (!options->msg_type) {
+		int i;
+		enum fsck_msg_type *tmp;
+		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
+		for (i = 0; i < FSCK_MSG_MAX; i++)
+			tmp[i] = fsck_msg_type(i, options);
+		options->msg_type = tmp;
+	}
+
+	options->msg_type[msg_id] = msg_type;
+}
+
 void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
@@ -146,16 +162,7 @@ void fsck_set_msg_type(struct fsck_options *options,
 	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
 		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
 
-	if (!options->msg_type) {
-		int i;
-		enum fsck_msg_type *tmp;
-		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
-		for (i = 0; i < FSCK_MSG_MAX; i++)
-			tmp[i] = fsck_msg_type(i, options);
-		options->msg_type = tmp;
-	}
-
-	options->msg_type[msg_id] = msg_type;
+	fsck_set_msg_type_from_ids(options, msg_id, msg_type);
 }
 
 void fsck_set_msg_types(struct fsck_options *options, const char *values)
diff --git a/fsck.h b/fsck.h
index 56536d7f29e..af145bb4596 100644
--- a/fsck.h
+++ b/fsck.h
@@ -80,6 +80,9 @@ enum fsck_msg_id {
 struct fsck_options;
 struct object;
 
+void fsck_set_msg_type_from_ids(struct fsck_options *options,
+				enum fsck_msg_id msg_id,
+				enum fsck_msg_type msg_type);
 void fsck_set_msg_type(struct fsck_options *options,
 		       const char *msg_id, const char *msg_type);
 void fsck_set_msg_types(struct fsck_options *options, const char *values);
-- 
2.30.0.284.gd98b1dd5eaa7
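
The lazy-allocation pattern that the patch above factors out can be
sketched in isolation as follows. This is only an approximation under
assumed names: the demo_* types and the two-entry defaults table are
hypothetical, and plain malloc() stands in for Git's ALLOC_ARRAY().

```c
#include <stdlib.h>

enum demo_msg_type { DEMO_ERROR, DEMO_WARN, DEMO_IGNORE };
enum demo_msg_id { DEMO_MSG_A, DEMO_MSG_B, DEMO_MSG_MAX };

/* Built-in default severity for each message ID. */
static const enum demo_msg_type demo_defaults[DEMO_MSG_MAX] = {
	DEMO_ERROR, /* DEMO_MSG_A */
	DEMO_WARN,  /* DEMO_MSG_B */
};

struct demo_options {
	enum demo_msg_type *msg_type; /* NULL until the first override */
};

/* Look up the effective severity: the override table if present, else defaults. */
static enum demo_msg_type demo_get(const struct demo_options *o,
				   enum demo_msg_id id)
{
	return o->msg_type ? o->msg_type[id] : demo_defaults[id];
}

/* Enum-typed setter: no string parsing, so nothing to validate at run time. */
static void demo_set(struct demo_options *o, enum demo_msg_id id,
		     enum demo_msg_type type)
{
	if (!o->msg_type) {
		int i;
		enum demo_msg_type *tmp = malloc(sizeof(*tmp) * DEMO_MSG_MAX);
		if (!tmp)
			abort();
		/* Seed the per-options table with the defaults on first use. */
		for (i = 0; i < DEMO_MSG_MAX; i++)
			tmp[i] = demo_defaults[i];
		o->msg_type = tmp;
	}
	o->msg_type[id] = type;
}
```

A string-based setter can then parse and validate its input and delegate to
the enum-based one, which is the split the patch makes.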



* [PATCH 13/14] fsck.h: update FSCK_OPTIONS_* for object_name
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (12 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 12/14] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 19:42           ` [PATCH 14/14] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
  2021-02-17 20:05           ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
  15 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Add the object_names member to the initialization macros. This was
omitted in 7b35efd734e (fsck_walk(): optionally name objects on the
go, 2016-07-17) when the field was added.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fsck.h b/fsck.h
index af145bb4596..28137a77df0 100644
--- a/fsck.h
+++ b/fsck.h
@@ -119,8 +119,8 @@ struct fsck_options {
 	kh_oid_map_t *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
+#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
+#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.30.0.284.gd98b1dd5eaa7
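
For context on why the omission went unnoticed: in C, members left out of
a brace-enclosed initializer are initialized as if they had static storage
duration, so a trailing pointer member ends up NULL either way. A minimal
sketch with a hypothetical struct (not the real fsck_options layout):

```c
#include <stddef.h>

struct demo_options {
	int strict;
	const char *skiplist;
	void *object_names; /* added later, after the macros were written */
};

/* Old-style macro: stops before object_names. */
#define DEMO_OPTIONS_OLD { 0, NULL }
/* Updated macro: spells out every member. */
#define DEMO_OPTIONS_NEW { 0, NULL, NULL }

static struct demo_options demo_old = DEMO_OPTIONS_OLD;
static struct demo_options demo_new = DEMO_OPTIONS_NEW;
```

Both objects end up with object_names == NULL; the explicit form mainly
helps readers, and avoids warnings from compilers that flag missing field
initializers.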



* [PATCH 14/14] fsck.c: move gitmodules_{found,done} into fsck_options
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (13 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 13/14] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
@ 2021-02-17 19:42           ` Ævar Arnfjörð Bjarmason
  2021-02-17 20:05           ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
  15 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-17 19:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Move the gitmodules_{found,done} static variables added in
159e7b080bf (fsck: detect gitmodules files, 2018-05-02) into the
fsck_options struct. It makes sense to keep all the context in the
same place.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 19 ++++++++-----------
 fsck.h |  6 ++++--
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/fsck.c b/fsck.c
index 54632404de5..f344b6be3d3 100644
--- a/fsck.c
+++ b/fsck.c
@@ -19,9 +19,6 @@
 #include "credential.h"
 #include "help.h"
 
-static struct oidset gitmodules_found = OIDSET_INIT;
-static struct oidset gitmodules_done = OIDSET_INIT;
-
 #define STR(x) #x
 #define MSG_ID(id, msg_type) { STR(id), NULL, NULL, FSCK_##msg_type },
 static struct {
@@ -621,7 +618,7 @@ static int fsck_tree(const struct object_id *oid,
 
 		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
 			if (!S_ISLNK(mode))
-				oidset_insert(&gitmodules_found, oid);
+				oidset_insert(&options->gitmodules_found, oid);
 			else
 				retval += report(options,
 						 oid, OBJ_TREE,
@@ -635,7 +632,7 @@ static int fsck_tree(const struct object_id *oid,
 				has_dotgit |= is_ntfs_dotgit(backslash);
 				if (is_ntfs_dotgitmodules(backslash)) {
 					if (!S_ISLNK(mode))
-						oidset_insert(&gitmodules_found, oid);
+						oidset_insert(&options->gitmodules_found, oid);
 					else
 						retval += report(options, oid, OBJ_TREE,
 								 FSCK_MSG_GITMODULES_SYMLINK,
@@ -1147,9 +1144,9 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 	struct fsck_gitmodules_data data;
 	struct config_options config_opts = { 0 };
 
-	if (!oidset_contains(&gitmodules_found, oid))
+	if (!oidset_contains(&options->gitmodules_found, oid))
 		return 0;
-	oidset_insert(&gitmodules_done, oid);
+	oidset_insert(&options->gitmodules_done, oid);
 
 	if (object_on_skiplist(options, oid))
 		return 0;
@@ -1220,13 +1217,13 @@ int fsck_finish(struct fsck_options *options)
 	struct oidset_iter iter;
 	const struct object_id *oid;
 
-	oidset_iter_init(&gitmodules_found, &iter);
+	oidset_iter_init(&options->gitmodules_found, &iter);
 	while ((oid = oidset_iter_next(&iter))) {
 		enum object_type type;
 		unsigned long size;
 		char *buf;
 
-		if (oidset_contains(&gitmodules_done, oid))
+		if (oidset_contains(&options->gitmodules_done, oid))
 			continue;
 
 		buf = read_object_file(oid, &type, &size);
@@ -1251,8 +1248,8 @@ int fsck_finish(struct fsck_options *options)
 	}
 
 
-	oidset_clear(&gitmodules_found);
-	oidset_clear(&gitmodules_done);
+	oidset_clear(&options->gitmodules_found);
+	oidset_clear(&options->gitmodules_done);
 	return ret;
 }
 
diff --git a/fsck.h b/fsck.h
index 28137a77df0..99c77289688 100644
--- a/fsck.h
+++ b/fsck.h
@@ -116,11 +116,13 @@ struct fsck_options {
 	unsigned strict:1;
 	enum fsck_msg_type *msg_type;
 	struct oidset skiplist;
+	struct oidset gitmodules_found;
+	struct oidset gitmodules_done;
 	kh_oid_map_t *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
+#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, OIDSET_INIT, OIDSET_INIT, NULL }
+#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, OIDSET_INIT, OIDSET_INIT, NULL }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.30.0.284.gd98b1dd5eaa7
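
One practical effect of keeping this state in the options struct is that
two independent fsck contexts no longer share their "found" and "done"
sets. The sketch below shows the idea with a tiny fixed-size integer set
standing in for struct oidset; all demo_* names are hypothetical.

```c
#define DEMO_MAX 8

/* Tiny stand-in for struct oidset: a fixed-size set of small integers. */
struct demo_set {
	int items[DEMO_MAX];
	int nr;
};

/* Per-context state, mirroring the two members the patch adds. */
struct demo_options {
	struct demo_set gitmodules_found;
	struct demo_set gitmodules_done;
};

static int demo_set_contains(const struct demo_set *s, int id)
{
	int i;
	for (i = 0; i < s->nr; i++)
		if (s->items[i] == id)
			return 1;
	return 0;
}

static void demo_set_insert(struct demo_set *s, int id)
{
	if (!demo_set_contains(s, id) && s->nr < DEMO_MAX)
		s->items[s->nr++] = id;
}

/* Entries that were found but never checked: the work left for a "finish" step. */
static int demo_pending(const struct demo_options *o)
{
	int i, pending = 0;
	for (i = 0; i < o->gitmodules_found.nr; i++)
		if (!demo_set_contains(&o->gitmodules_done,
				       o->gitmodules_found.items[i]))
			pending++;
	return pending;
}
```

With file-level statics, marking an entry as done through one context would
also change what the other context considers pending.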



* Re: [PATCH 2/4] http-fetch: allow custom index-pack args
  2021-02-16 22:57       ` Junio C Hamano
@ 2021-02-17 19:46         ` Jonathan Tan
  0 siblings, 0 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-02-17 19:46 UTC (permalink / raw)
  To: gitster; +Cc: steadmon, jonathantanmy, git

> Josh Steadmon <steadmon@google.com> writes:
> 
> >> +--index-pack-args=<args>::
> >> +	For internal use only. The command to run on the contents of the
> >> +	downloaded pack. Arguments are URL-encoded separated by spaces.
> >
> > I'm a bit skeptical of using URL encoding to work around embedded
> > spaces. I believe in Emily's config-based hooks series, she wrote an
> > argument parser to pull repeated arguments into a strvec, could you do
> > something like that here?
> >
> > I'm sympathetic to the idea that since this is an internal-only flag, we
> > can be a bit weird with the argument format, though.
> 
> We tend to prefer quote.c::sq_quote*() suite of quoting; does this
> codepath have very different constraints that require different
> encoding?

My main issue was that I needed to join arbitrary strings and then split
them, which is why I URL-encoded them (so that they would no longer
contain spaces) and then used spaces as the "join" separator. With
Josh's suggestion, I wouldn't need any sort of encoding or quoting, so I
think I'll use that.


* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
                             ` (14 preceding siblings ...)
  2021-02-17 19:42           ` [PATCH 14/14] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
@ 2021-02-17 20:05           ` Jonathan Tan
  15 siblings, 0 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-02-17 20:05 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git

> > I tried that first, and the issue is that IDs like
> > FSCK_MSG_GITMODULES_MISSING are internal to fsck.c. As for whether we
> > should start exposing the IDs publicly, I think we should wait until a
> > few new cases like this come up, so that we more fully understand the
> > requirements first.
> 
> The requirement is that you want the objects ids we'd otherwise error
> about in fsck_finish(). Yeah we don't pass the "fsck_msg_id" down in the
> "report()" function, but you can reliably strstr() it out of the
> message.

We can't strstr() because of false positives (if, e.g. there is a
submodule name that contains the string we're looking for), but looking
at report() in fsck.c, the message ID is the very first thing appended,
so I think we can use starts_with().
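
To make the false-positive concern concrete, here is a minimal sketch
comparing the two matching strategies. demo_starts_with() is a local
re-implementation for illustration (Git has its own starts_with()), and
the second message in the usage note is a made-up example of a submodule
name that happens to contain the ID.

```c
#include <string.h>

/* Local helper for the sketch: true if "str" begins with "prefix". */
static int demo_starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Substring match: fires wherever the ID appears in the message. */
static int demo_match_strstr(const char *message)
{
	return strstr(message, "gitmodulesMissing") != NULL;
}

/* Prefix match: fires only when the message starts with the ID. */
static int demo_match_prefix(const char *message)
{
	return demo_starts_with(message, "gitmodulesMissing: ");
}
```

Given "gitmodulesMissing: unable to read .gitmodules blob", both functions
match. Given a message such as "gitmodulesName: disallowed submodule name:
gitmodulesMissing", only the strstr() variant matches, which is the false
positive described above.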

> We document & hard rely on that already, since it's also a
> config key.

Ah, good point.

> But yeah, we could just change the report function to pass down the id
> and move the relevant macros from fsck.c to fsck.h. I think that would
> be a smaller change conceptually than a special-case flag in
> fsck_options for something we could otherwise do with the error
> reporting.

I agree - I thought this wouldn't be possible, but like you said, we can
reliably make use of the string in report() (or pass the ID, like your
patch set [1] does) so we should do this.

What would be the best way to proceed, now that we have at least 2 patch
sets (mine and yours) in play? I was thinking that I should update my
one to use the string reported in report() (with starts_with()), so that
both our patch sets can be reviewed and merged in parallel, and after
that, update the fsck code to use the ID instead of the string.

[1] https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/


* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-02-17  2:10         ` Ævar Arnfjörð Bjarmason
@ 2021-02-17 20:10           ` Jonathan Tan
  2021-02-18 12:07             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 134+ messages in thread
From: Jonathan Tan @ 2021-02-17 20:10 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git

> Sorry for being unclear here. I don't think (honestly I don't remember,
> it's been almost a month) that I meant that you should use the skipList.
> 
> Looking at that code again we use object_on_skiplist() to do an early
> punt in report(), but also fsck_blob(), presumably you never want the
> latter, and that early punting wouldn't be needed if your report()
> function intercepted the modules blob id for stashing it away / later
> reporting / whatever.
> 
> So yeah, I'm 99% sure now that's not what I meant :)
> 
> What I meant with:
> 
>     Or if we want to keep the "print <list> | process"[...]
> 
> Is that we have an existing ad-hoc IPC model for these commands in
> passing along the skipList, which is made more complex because sometimes
> the initial process reads the file, sometimes it passes it along as-is
> to the child.
> 
> And then there's this patch that passes OIDs too, but through a
> different mechanism.
> 
> I was suggesting that perhaps it made more sense to refactor both so
> they could use the same mechanism, because we're potentially passing two
> lists of OIDs between the two. Just one goes via line-at-a-time in the
> output, the other via a config option on the command-line.

Thanks for your explanation. I still think that they are quite different
- skiplist is a user-written file containing a list of OIDs that will
likely never change, whereas my list of dangling .gitmodules is a list
of OIDs dynamically generated (and thus, always different) whenever a
fetch is done. So I think it's quite reasonable to pass skiplist as a
file name, and my list should be passed line-by-line.
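
For reference, the consuming side of a line-by-line OID list can be quite
small. The following is only a sketch under stated assumptions: it assumes
40-character lowercase hex (SHA-1) object IDs, one per line, and the
demo_* helpers are hypothetical rather than anything in Git.

```c
#include <stdio.h>
#include <string.h>

#define DEMO_HEXSZ 40 /* SHA-1 hex length; an assumption for this sketch */

/* True if "line" is exactly DEMO_HEXSZ lowercase hex digits. */
static int demo_is_hex_oid(const char *line)
{
	size_t i;
	if (strlen(line) != DEMO_HEXSZ)
		return 0;
	for (i = 0; i < DEMO_HEXSZ; i++) {
		char c = line[i];
		if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
			return 0;
	}
	return 1;
}

/* Reads one OID per line from "in"; returns the count, or -1 on bad input. */
static int demo_read_oids(FILE *in, char oids[][DEMO_HEXSZ + 1], int max)
{
	char buf[256];
	int nr = 0;

	while (fgets(buf, sizeof(buf), in)) {
		buf[strcspn(buf, "\n")] = '\0'; /* strip the trailing newline */
		if (!demo_is_hex_oid(buf) || nr == max)
			return -1;
		memcpy(oids[nr++], buf, DEMO_HEXSZ + 1);
	}
	return nr;
}
```

Because the list is regenerated on every fetch, reading it from the child's
output avoids having to create and clean up a temporary file.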


* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-02-17 19:27     ` Ævar Arnfjörð Bjarmason
@ 2021-02-17 20:11       ` Jonathan Tan
  0 siblings, 0 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-02-17 20:11 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git

> What's the need for STRICT here & can't the former use the existing
> fsck_options in index-pack.c? With this on top we pass all tests:

[snip code]

Good point - I'll do that.


* Re: [PATCH 00/14] fsck: API improvements
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
@ 2021-02-17 21:02             ` Junio C Hamano
  2021-02-18  0:00               ` Ævar Arnfjörð Bjarmason
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                               ` (10 subsequent siblings)
  11 siblings, 1 reply; 134+ messages in thread
From: Junio C Hamano @ 2021-02-17 21:02 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Jonathan Tan pointed out that the fsck error_func doesn't pass you the
> ID of the fsck failure in [1]. This series improves the API so it
> does, and moves the gitmodules_{found,done} variables into the
> fsck_options struct.
>
> The result is that instead of the "print_dangling_gitmodules" member
> in that series we can just implement that with the diff at the end of
> this cover letter (goes on top of a merge of this series & "seen"),
> and without any changes to fsck_finish().
>
> This conflicts with other in-flight fsck changes but the conflict is
> rather trivial. Jeff King has another concurrent series to add a
> couple of new fsck checks, those need to be moved to fsck.h, and
> there's another trivial conflict in 2 hunks due to the
> gitmodules_{found,done} move.
>
> 1. https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/

Let's get this reviewed now, but with the expectation that it will be
rebased after the dust settles.


* Re: [PATCH 02/14] fsck.h: use use "enum object_type" instead of "int"
  2021-02-17 19:42           ` [PATCH 02/14] fsck.h: use use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
@ 2021-02-17 23:40             ` Junio C Hamano
  0 siblings, 0 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-02-17 23:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Subject: Re: [PATCH 02/14] fsck.h: use use "enum object_type" instead of "int"

use use.

> Change the fsck_walk_func to use an "enum object_type" instead of an
> "int" type. The types are compatible, and ever since this was added in
> 355885d5315 (add generic, type aware object chain walker, 2008-02-25)
> we've used entries from object_type (OBJ_BLOB etc.).
>
> So this doesn't really change anything as far as the generated code is
> concerned, it just gives the compiler more information and makes this
> easier to read.

Yup, as long as we won't trick the compiler into complaining "ah,
but you are not covering OBJ_OFS_DELTA or OBJ_BAD values in your
switch statement", I think a change like this is a good thing.
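
The kind of complaint meant here is what GCC and Clang emit under -Wswitch
(enabled by -Wall) when a switch on an enum type has no default label and
misses an enumerator. A minimal sketch, using a hypothetical enum rather
than Git's real object_type definition:

```c
/* Hypothetical enum with values a walker callback never expects to see. */
enum demo_object_type {
	DEMO_OBJ_BAD = -1,
	DEMO_OBJ_COMMIT = 1,
	DEMO_OBJ_TREE,
	DEMO_OBJ_BLOB,
	DEMO_OBJ_TAG,
	DEMO_OBJ_OFS_DELTA
};

static const char *demo_type_name(enum demo_object_type type)
{
	switch (type) {
	case DEMO_OBJ_COMMIT:
		return "commit";
	case DEMO_OBJ_TREE:
		return "tree";
	case DEMO_OBJ_BLOB:
		return "blob";
	case DEMO_OBJ_TAG:
		return "tag";
	default:
		/*
		 * Catches DEMO_OBJ_BAD and DEMO_OBJ_OFS_DELTA; having a
		 * default label is also what keeps -Wswitch quiet.
		 */
		return "unknown";
	}
}
```

Without the default label, switching the parameter from "int" to the enum
type would start producing such warnings for the unhandled enumerators.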

> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/fsck.c           | 3 ++-
>  builtin/index-pack.c     | 3 ++-
>  builtin/unpack-objects.c | 3 ++-
>  fsck.h                   | 3 ++-
>  4 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 821e7798c70..68f0329e69e 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -197,7 +197,8 @@ static int traverse_reachable(void)
>  	return !!result;
>  }
>  
> -static int mark_used(struct object *obj, int type, void *data, struct fsck_options *options)
> +static int mark_used(struct object *obj, enum object_type object_type,
> +		     void *data, struct fsck_options *options)
>  {
>  	if (!obj)
>  		return 1;
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 54f74c48741..2f291a14d4a 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -212,7 +212,8 @@ static void cleanup_thread(void)
>  	free(thread_data);
>  }
>  
> -static int mark_link(struct object *obj, int type, void *data, struct fsck_options *options)
> +static int mark_link(struct object *obj, enum object_type type,
> +		     void *data, struct fsck_options *options)
>  {
>  	if (!obj)
>  		return -1;
> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
> index dd4a75e030d..ca54fd16688 100644
> --- a/builtin/unpack-objects.c
> +++ b/builtin/unpack-objects.c
> @@ -187,7 +187,8 @@ static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf)
>   * that have reachability requirements and calls this function.
>   * Verify its reachability and validity recursively and write it out.
>   */
> -static int check_object(struct object *obj, int type, void *data, struct fsck_options *options)
> +static int check_object(struct object *obj, enum object_type type,
> +			void *data, struct fsck_options *options)
>  {
>  	struct obj_buffer *obj_buf;
>  
> diff --git a/fsck.h b/fsck.h
> index df0b64a2163..0c75789d219 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -23,7 +23,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type);
>   *     <0	error signaled and abort
>   *     >0	error signaled and do not abort
>   */
> -typedef int (*fsck_walk_func)(struct object *obj, int type, void *data, struct fsck_options *options);
> +typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
> +			      void *data, struct fsck_options *options);
>  
>  /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
>  typedef int (*fsck_error)(struct fsck_options *o,


* Re: [PATCH 00/14] fsck: API improvements
  2021-02-17 21:02             ` Junio C Hamano
@ 2021-02-18  0:00               ` Ævar Arnfjörð Bjarmason
  2021-02-18 19:12                 ` Junio C Hamano
  0 siblings, 1 reply; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18  0:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan


On Wed, Feb 17 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Jonathan Tan pointed out that the fsck error_func doesn't pass you the
>> ID of the fsck failure in [1]. This series improves the API so it
>> does, and moves the gitmodules_{found,done} variables into the
>> fsck_options struct.
>>
>> The result is that instead of the "print_dangling_gitmodules" member
>> in that series we can just implement that with the diff at the end of
>> this cover letter (goes on top of a merge of this series & "seen"),
>> and without any changes to fsck_finish().
>>
>> This conflicts with other in-flight fsck changes but the conflict is
>> rather trivial. Jeff King has another concurrent series to add a
>> couple of new fsck checks, those need to be moved to fsck.h, and
>> there's another trivial conflict in 2 hunks due to the
>> gitmodules_{found,done} move.
>>
>> 1. https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/
>
> Let's get this reviewed now, but with the expectation that it will be
> rebased after the dust settles.

Makes sense. Pending a review of this, would you be interested in queuing
a v2 of this that doesn't conflict with in-flight topics?

Patches 01..09 & 13/14 can live conflict-free with what's in "seen" now
(I'd have made the 13th the 10th in v1 if I'd noticed). Then I could
re-roll the remainder of this once the other topics land.


* [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen')
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
  2021-02-17 21:02             ` Junio C Hamano
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 22:19               ` Junio C Hamano
                                 ` (23 more replies)
  2021-02-18 10:58             ` [PATCH v2 01/10] fsck.h: indent arguments to of fsck_set_msg_type Ævar Arnfjörð Bjarmason
                               ` (9 subsequent siblings)
  11 siblings, 24 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

As suggested in
https://lore.kernel.org/git/87zh028ctp.fsf@evledraar.gmail.com/, here is
a version of this series that doesn't conflict with other in-flight
topics. I can submit the rest later.

Ævar Arnfjörð Bjarmason (10):
  fsck.h: indent arguments to of fsck_set_msg_type
  fsck.h: use "enum object_type" instead of "int"
  fsck.c: rename variables in fsck_set_msg_type() for less confusion
  fsck.c: move definition of msg_id into append_msg_id()
  fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  fsck.c: undefine temporary STR macro after use
  fsck.c: give "FOREACH_MSG_ID" a more specific name
  fsck.h: update FSCK_OPTIONS_* for object_name

 builtin/fsck.c           |  5 ++--
 builtin/index-pack.c     |  3 +-
 builtin/mktag.c          |  3 +-
 builtin/unpack-objects.c |  3 +-
 fsck.c                   | 60 ++++++++++++++++++++--------------------
 fsck.h                   | 26 +++++++++--------
 6 files changed, 54 insertions(+), 46 deletions(-)

Range-diff:
 -:  ----------- >  1:  88b347b74ed fsck.h: indent arguments to of fsck_set_msg_type
 1:  1a60d65d2ca !  2:  868eac3d4d1 fsck.h: use use "enum object_type" instead of "int"
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    fsck.h: use use "enum object_type" instead of "int"
    +    fsck.h: use "enum object_type" instead of "int"
     
         Change the fsck_walk_func to use an "enum object_type" instead of an
         "int" type. The types are compatible, and ever since this was added in
 2:  24761f269b7 =  3:  f599dc6c8f3 fsck.c: rename variables in fsck_set_msg_type() for less confusion
 3:  fb4c66f9305 =  4:  33f3b1942c1 fsck.c: move definition of msg_id into append_msg_id()
 4:  a129dbd9964 =  5:  28c9245e418 fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
 5:  d9bee41072e =  6:  d25037c6f18 fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
 6:  423568026c3 =  7:  66d0f1047cc fsck.c: call parse_msg_type() early in fsck_set_msg_type()
 7:  cb43e832738 =  8:  7643a5bf211 fsck.c: undefine temporary STR macro after use
 8:  2cd14cb4e2a =  9:  7c64e2267ce fsck.c: give "FOREACH_MSG_ID" a more specific name
 9:  1ada154ef23 <  -:  ----------- fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
10:  c4179445f22 <  -:  ----------- fsck.c: pass along the fsck_msg_id in the fsck_error callback
11:  c1fc724f0e8 <  -:  ----------- fsck.c: add an fsck_set_msg_type() API that takes enums
12:  8de91fac068 = 10:  a98a3512629 fsck.h: update FSCK_OPTIONS_* for object_name
13:  29ff97856ff <  -:  ----------- fsck.c: move gitmodules_{found,done} into fsck_options
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH v2 01/10] fsck.h: indent arguments to of fsck_set_msg_type
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
  2021-02-17 21:02             ` Junio C Hamano
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 10:58             ` [PATCH v2 02/10] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
                               ` (8 subsequent siblings)
  11 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fsck.h b/fsck.h
index 423c467feb7..df0b64a2163 100644
--- a/fsck.h
+++ b/fsck.h
@@ -11,7 +11,7 @@ struct fsck_options;
 struct object;
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type);
+		       const char *msg_id, const char *msg_type);
 void fsck_set_msg_types(struct fsck_options *options, const char *values);
 int is_valid_msg_type(const char *msg_id, const char *msg_type);
 
-- 
2.30.0.284.gd98b1dd5eaa7



* [PATCH v2 02/10] fsck.h: use "enum object_type" instead of "int"
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (2 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 01/10] fsck.h: indent arguments to of fsck_set_msg_type Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 10:58             ` [PATCH v2 03/10] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
                               ` (7 subsequent siblings)
  11 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the fsck_walk_func to use an "enum object_type" instead of an
"int" type. The types are compatible, and ever since this was added in
355885d5315 (add generic, type aware object chain walker, 2008-02-25)
we've used entries from object_type (OBJ_BLOB etc.).

So this doesn't really change anything as far as the generated code is
concerned, it just gives the compiler more information and makes this
easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c           | 3 ++-
 builtin/index-pack.c     | 3 ++-
 builtin/unpack-objects.c | 3 ++-
 fsck.h                   | 3 ++-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 821e7798c70..68f0329e69e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -197,7 +197,8 @@ static int traverse_reachable(void)
 	return !!result;
 }
 
-static int mark_used(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_used(struct object *obj, enum object_type object_type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return 1;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 54f74c48741..2f291a14d4a 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -212,7 +212,8 @@ static void cleanup_thread(void)
 	free(thread_data);
 }
 
-static int mark_link(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_link(struct object *obj, enum object_type type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return -1;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index dd4a75e030d..ca54fd16688 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -187,7 +187,8 @@ static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf)
  * that have reachability requirements and calls this function.
  * Verify its reachability and validity recursively and write it out.
  */
-static int check_object(struct object *obj, int type, void *data, struct fsck_options *options)
+static int check_object(struct object *obj, enum object_type type,
+			void *data, struct fsck_options *options)
 {
 	struct obj_buffer *obj_buf;
 
diff --git a/fsck.h b/fsck.h
index df0b64a2163..0c75789d219 100644
--- a/fsck.h
+++ b/fsck.h
@@ -23,7 +23,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type);
  *     <0	error signaled and abort
  *     >0	error signaled and do not abort
  */
-typedef int (*fsck_walk_func)(struct object *obj, int type, void *data, struct fsck_options *options);
+typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
+			      void *data, struct fsck_options *options);
 
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v2 03/10] fsck.c: rename variables in fsck_set_msg_type() for less confusion
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (3 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 02/10] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 19:45               ` Jeff King
  2021-02-18 10:58             ` [PATCH v2 04/10] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
                               ` (6 subsequent siblings)
  11 siblings, 1 reply; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename variables in a function added in 0282f4dced0 (fsck: offer a
function to demote fsck errors to warnings, 2015-06-22).

It was needlessly confusing that it took a "msg_type" argument, but
then later declared another "msg_type" of a different type.

Let's rename that to "tmp", and rename "id" to "msg_id" and "msg_id"
to "msg_id_str" etc. This will make a follow-up change smaller.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fsck.c b/fsck.c
index 4b7f0b73d73..acccad243ec 100644
--- a/fsck.c
+++ b/fsck.c
@@ -203,27 +203,27 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 }
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type)
+		const char *msg_id_str, const char *msg_type_str)
 {
-	int id = parse_msg_id(msg_id), type;
+	int msg_id = parse_msg_id(msg_id_str), msg_type;
 
-	if (id < 0)
-		die("Unhandled message id: %s", msg_id);
-	type = parse_msg_type(msg_type);
+	if (msg_id < 0)
+		die("Unhandled message id: %s", msg_id_str);
+	msg_type = parse_msg_type(msg_type_str);
 
-	if (type != FSCK_ERROR && msg_id_info[id].msg_type == FSCK_FATAL)
-		die("Cannot demote %s to %s", msg_id, msg_type);
+	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
+		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
 
 	if (!options->msg_type) {
 		int i;
-		int *msg_type;
-		ALLOC_ARRAY(msg_type, FSCK_MSG_MAX);
+		int *tmp;
+		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
-			msg_type[i] = fsck_msg_type(i, options);
-		options->msg_type = msg_type;
+			tmp[i] = fsck_msg_type(i, options);
+		options->msg_type = tmp;
 	}
 
-	options->msg_type[id] = type;
+	options->msg_type[msg_id] = msg_type;
 }
 
 void fsck_set_msg_types(struct fsck_options *options, const char *values)
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v2 04/10] fsck.c: move definition of msg_id into append_msg_id()
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (4 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 03/10] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 10:58             ` [PATCH v2 05/10] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
                               ` (5 subsequent siblings)
  11 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Refactor code added in 71ab8fa840f (fsck: report the ID of the
error/warning, 2015-06-22) to resolve the msg_id to a string in the
function that wants it, instead of doing it in report().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index acccad243ec..1070071ffec 100644
--- a/fsck.c
+++ b/fsck.c
@@ -264,8 +264,9 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 	free(to_free);
 }
 
-static void append_msg_id(struct strbuf *sb, const char *msg_id)
+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
 {
+	const char *msg_id = msg_id_info[id].id_string;
 	for (;;) {
 		char c = *(msg_id)++;
 
@@ -308,7 +309,7 @@ static int report(struct fsck_options *options,
 	else if (msg_type == FSCK_INFO)
 		msg_type = FSCK_WARN;
 
-	append_msg_id(&sb, msg_id_info[id].id_string);
+	append_msg_id(&sb, id);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v2 05/10] fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (5 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 04/10] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 22:23               ` Junio C Hamano
  2021-02-18 10:58             ` [PATCH v2 06/10] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
                               ` (4 subsequent siblings)
  11 siblings, 1 reply; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename the remaining variables of type fsck_msg_id from "id" to
"msg_id". This change is relatively small, and is worth the churn for
a later change where we have different id's in the "report" function.
---
 fsck.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fsck.c b/fsck.c
index 1070071ffec..dbb6f7c4ee2 100644
--- a/fsck.c
+++ b/fsck.c
@@ -264,19 +264,19 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 	free(to_free);
 }
 
-static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id msg_id)
 {
-	const char *msg_id = msg_id_info[id].id_string;
+	const char *msg_id_str = msg_id_info[msg_id].id_string;
 	for (;;) {
-		char c = *(msg_id)++;
+		char c = *(msg_id_str)++;
 
 		if (!c)
 			break;
 		if (c != '_')
 			strbuf_addch(sb, tolower(c));
 		else {
-			assert(*msg_id);
-			strbuf_addch(sb, *(msg_id)++);
+			assert(*msg_id_str);
+			strbuf_addch(sb, *(msg_id_str)++);
 		}
 	}
 
@@ -292,11 +292,11 @@ static int object_on_skiplist(struct fsck_options *opts,
 __attribute__((format (printf, 5, 6)))
 static int report(struct fsck_options *options,
 		  const struct object_id *oid, enum object_type object_type,
-		  enum fsck_msg_id id, const char *fmt, ...)
+		  enum fsck_msg_id msg_id, const char *fmt, ...)
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(id, options), result;
+	int msg_type = fsck_msg_type(msg_id, options), result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -309,7 +309,7 @@ static int report(struct fsck_options *options,
 	else if (msg_type == FSCK_INFO)
 		msg_type = FSCK_WARN;
 
-	append_msg_id(&sb, id);
+	append_msg_id(&sb, msg_id);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.30.0.284.gd98b1dd5eaa7
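
The loop in append_msg_id() above turns an upper-case id string into the
camelCase form seen in fsck output (e.g. "gitmodulesMissing"). A minimal
standalone sketch of that conversion follows. It assumes a plain char
buffer instead of a strbuf, and the helper name msg_id_to_camel() is
hypothetical, not part of git:

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the loop in append_msg_id(): lowercase
 * every character, except that a '_' is dropped and the character after
 * it is copied as-is. "out" must hold at least strlen(id) + 1 bytes.
 *
 * Unlike the original, a trailing '_' is silently dropped here instead
 * of tripping an assert().
 */
static void msg_id_to_camel(const char *id, char *out)
{
	for (;;) {
		char c = *id++;

		if (!c)
			break;
		if (c != '_')
			*out++ = (char)tolower((unsigned char)c);
		else if (*id)
			*out++ = *id++;
	}
	*out = '\0';
}
```

With this sketch, "NUL_IN_HEADER" becomes "nulInHeader".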


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v2 06/10] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (6 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 05/10] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 19:52               ` Jeff King
  2021-02-18 10:58             ` [PATCH v2 07/10] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
                               ` (3 subsequent siblings)
  11 siblings, 1 reply; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new
fsck_msg_type enum.

These defines were originally introduced in:

 - ba002f3b28a (builtin-fsck: move common object checking code to
   fsck.c, 2008-02-25)
 - f50c4407305 (fsck: disallow demoting grave fsck errors to warnings,
   2015-06-22)
 - efaba7cc77f (fsck: optionally ignore specific fsck issues
   completely, 2015-06-22)
 - f27d05b1704 (fsck: allow upgrading fsck warnings to errors,
   2015-06-22)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  |  2 +-
 builtin/mktag.c |  3 ++-
 fsck.c          | 21 ++++++++++-----------
 fsck.h          | 17 ++++++++++-------
 4 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 68f0329e69e..d6d745dc702 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -89,7 +89,7 @@ static int objerror(struct object *obj, const char *err)
 static int fsck_error_func(struct fsck_options *o,
 			   const struct object_id *oid,
 			   enum object_type object_type,
-			   int msg_type, const char *message)
+			   enum fsck_msg_type msg_type, const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 41a399a69e4..1834394a9b6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -22,7 +22,8 @@ static int mktag_config(const char *var, const char *value, void *cb)
 static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
-				 int msg_type, const char *message)
+				 enum fsck_msg_type msg_type,
+				 const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/fsck.c b/fsck.c
index dbb6f7c4ee2..00e0fef21ca 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,9 +22,6 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FSCK_FATAL -1
-#define FSCK_INFO -2
-
 #define FOREACH_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
@@ -97,7 +94,7 @@ static struct {
 	const char *id_string;
 	const char *downcased;
 	const char *camelcased;
-	int msg_type;
+	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
 	FOREACH_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
@@ -164,10 +161,10 @@ void list_config_fsck_msg_ids(struct string_list *list, const char *prefix)
 		list_config_item(list, prefix, msg_id_info[i].camelcased);
 }
 
-static int fsck_msg_type(enum fsck_msg_id msg_id,
+static enum fsck_msg_type fsck_msg_type(enum fsck_msg_id msg_id,
 	struct fsck_options *options)
 {
-	int msg_type;
+	enum fsck_msg_type msg_type;
 
 	assert(msg_id >= 0 && msg_id < FSCK_MSG_MAX);
 
@@ -182,7 +179,7 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 	return msg_type;
 }
 
-static int parse_msg_type(const char *str)
+static enum fsck_msg_type parse_msg_type(const char *str)
 {
 	if (!strcmp(str, "error"))
 		return FSCK_ERROR;
@@ -205,7 +202,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
-	int msg_id = parse_msg_id(msg_id_str), msg_type;
+	int msg_id = parse_msg_id(msg_id_str);
+	enum fsck_msg_type msg_type;
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
@@ -216,7 +214,7 @@ void fsck_set_msg_type(struct fsck_options *options,
 
 	if (!options->msg_type) {
 		int i;
-		int *tmp;
+		enum fsck_msg_type *tmp;
 		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
 			tmp[i] = fsck_msg_type(i, options);
@@ -296,7 +294,8 @@ static int report(struct fsck_options *options,
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(msg_id, options), result;
+	enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options);
+	int result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -1262,7 +1261,7 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type,
-			int msg_type, const char *message)
+			enum fsck_msg_type msg_type, const char *message)
 {
 	if (msg_type == FSCK_WARN) {
 		warning("object %s: %s", fsck_describe_object(o, oid), message);
diff --git a/fsck.h b/fsck.h
index 0c75789d219..c77e8ddf10b 100644
--- a/fsck.h
+++ b/fsck.h
@@ -3,10 +3,13 @@
 
 #include "oidset.h"
 
-#define FSCK_ERROR 1
-#define FSCK_WARN 2
-#define FSCK_IGNORE 3
-
+enum fsck_msg_type {
+	FSCK_INFO = -2,
+	FSCK_FATAL = -1,
+	FSCK_ERROR = 1,
+	FSCK_WARN,
+	FSCK_IGNORE
+};
 struct fsck_options;
 struct object;
 
@@ -29,17 +32,17 @@ typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
 			  const struct object_id *oid, enum object_type object_type,
-			  int msg_type, const char *message);
+			  enum fsck_msg_type msg_type, const char *message);
 
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
-			int msg_type, const char *message);
+			enum fsck_msg_type msg_type, const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
 	fsck_error error_func;
 	unsigned strict:1;
-	int *msg_type;
+	enum fsck_msg_type *msg_type;
 	struct oidset skiplist;
 	kh_oid_map_t *object_names;
 };
-- 
2.30.0.284.gd98b1dd5eaa7
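
As a rough illustration of what the enum buys: the enumerators keep the
numeric values of the old #defines (FSCK_WARN and FSCK_IGNORE continue
from FSCK_ERROR = 1), and a switch over the enum with no default lets
compilers that support such a warning flag a forgotten case. The
severity_name() helper below is hypothetical and only exists for this
sketch:

```c
/* Values copied from the enum this patch adds to fsck.h. */
enum fsck_msg_type {
	FSCK_INFO = -2,
	FSCK_FATAL = -1,
	FSCK_ERROR = 1,
	FSCK_WARN,	/* 2, same as the old #define */
	FSCK_IGNORE	/* 3, same as the old #define */
};

/* Hypothetical helper, not part of git: name a severity for display. */
static const char *severity_name(enum fsck_msg_type t)
{
	switch (t) {
	case FSCK_INFO:
		return "info";
	case FSCK_FATAL:
		return "fatal";
	case FSCK_ERROR:
		return "error";
	case FSCK_WARN:
		return "warn";
	case FSCK_IGNORE:
		return "ignore";
	}
	return "unknown";
}
```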


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v2 07/10] fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (7 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 06/10] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 22:29               ` Junio C Hamano
  2021-02-18 10:58             ` [PATCH v2 08/10] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
                               ` (2 subsequent siblings)
  11 siblings, 1 reply; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

There's no reason to defer the calling of parse_msg_type() until after
we've checked if "msg_id < 0". This is not a hot codepath, and
parse_msg_type() itself may die on invalid input.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index 00e0fef21ca..7c53080ad48 100644
--- a/fsck.c
+++ b/fsck.c
@@ -203,11 +203,10 @@ void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
 	int msg_id = parse_msg_id(msg_id_str);
-	enum fsck_msg_type msg_type;
+	enum fsck_msg_type msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
-	msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
 		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v2 08/10] fsck.c: undefine temporary STR macro after use
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (8 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 07/10] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 22:30               ` Junio C Hamano
  2021-02-18 10:58             ` [PATCH v2 09/10] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
  2021-02-18 10:58             ` [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
  11 siblings, 1 reply; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

In f417eed8cde (fsck: provide a function to parse fsck message IDs,
2015-06-22) the "STR" macro was introduced, but that short macro name
was not undefined after use as was done earlier in the same series for
the MSG_ID macro in c99ba492f1c (fsck: introduce identifiers for fsck
messages, 2015-06-22).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fsck.c b/fsck.c
index 7c53080ad48..88884e91c89 100644
--- a/fsck.c
+++ b/fsck.c
@@ -100,6 +100,7 @@ static struct {
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
+#undef STR
 
 static void prepare_msg_ids(void)
 {
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v2 09/10] fsck.c: give "FOREACH_MSG_ID" a more specific name
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (9 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 08/10] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 19:56               ` Jeff King
  2021-02-18 10:58             ` [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
  11 siblings, 1 reply; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename the FOREACH_MSG_ID macro to FOREACH_FSCK_MSG_ID in preparation
for moving it over to fsck.h. It's good convention to name macros
in *.h files in such a way as to clearly not clash with any other
names in other files.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fsck.c b/fsck.c
index 88884e91c89..1730acd698d 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,7 +22,7 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FOREACH_MSG_ID(FUNC) \
+#define FOREACH_FSCK_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
 	FUNC(UNTERMINATED_HEADER, FATAL) \
@@ -83,7 +83,7 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 
 #define MSG_ID(id, msg_type) FSCK_MSG_##id,
 enum fsck_msg_id {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	FSCK_MSG_MAX
 };
 #undef MSG_ID
@@ -96,7 +96,7 @@ static struct {
 	const char *camelcased;
 	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
-- 
2.30.0.284.gd98b1dd5eaa7
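
For readers unfamiliar with the pattern being renamed here, this is a
cut-down sketch of how one FOREACH-style list is expanded twice, once
into an enum and once into a parallel table, with the temporary helper
macros undefined after use. The DEMO_* names and the three-entry list
are made up for the sketch; git's real list is much longer:

```c
#include <stddef.h>

/* Hypothetical three-entry list standing in for the real one. */
#define FOREACH_DEMO_MSG_ID(FUNC) \
	FUNC(NUL_IN_HEADER, FATAL) \
	FUNC(BAD_DATE, ERROR) \
	FUNC(GITMODULES_MISSING, ERROR)

/* First expansion: one enumerator per entry. */
#define MSG_ID(id, msg_type) DEMO_MSG_##id,
enum demo_msg_id {
	FOREACH_DEMO_MSG_ID(MSG_ID)
	DEMO_MSG_MAX
};
#undef MSG_ID

/* Second expansion: a table of strings indexed by the enum. */
#define STR(x) #x
#define MSG_ID(id, msg_type) { STR(id), STR(msg_type) },
static const struct {
	const char *id_string;
	const char *severity;
} demo_msg_info[DEMO_MSG_MAX + 1] = {
	FOREACH_DEMO_MSG_ID(MSG_ID)
	{ NULL, NULL }
};
#undef MSG_ID
#undef STR
```

Because both expansions come from the same list, the enum and the table
cannot drift out of step when an entry is added.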


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name
  2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
                               ` (10 preceding siblings ...)
  2021-02-18 10:58             ` [PATCH v2 09/10] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
@ 2021-02-18 10:58             ` Ævar Arnfjörð Bjarmason
  2021-02-18 19:56               ` Jeff King
  2021-02-18 22:32               ` Junio C Hamano
  11 siblings, 2 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Add the object_name member to the initialization macro. This was
omitted in 7b35efd734e (fsck_walk(): optionally name objects on the
go, 2016-07-17) when the field was added.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fsck.h b/fsck.h
index c77e8ddf10b..5d44ff1c8e3 100644
--- a/fsck.h
+++ b/fsck.h
@@ -47,8 +47,8 @@ struct fsck_options {
 	kh_oid_map_t *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
+#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
+#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.30.0.284.gd98b1dd5eaa7


^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-01-28  0:35     ` Jonathan Tan
@ 2021-02-18 11:31       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 11:31 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: gitster, git, Patrick Steinhardt


On Thu, Jan 28 2021, Jonathan Tan wrote:

>> Jonathan Tan <jonathantanmy@google.com> writes:
>> 
>> > As part of this, index-pack has to output (1) the hash that goes into
>> > the name of the .pack/.idx file and (2) the hashes of all dangling
>> > .gitmodules. I just had (2) come after (1). If anyone has a better idea,
>> > I'm interested.
>> 
>> I have this feeling that the "blobs that need to be validated across
>> packs" will *not* be the last enhancement we'd need to make to the
>> output from index-pack to allow richer communication between it and
>> its invoker.  While there is no reason to change how the first line
>> of the output looks like, we'd probably want to make sure that the
>> future versions of Git can easily tell "list of blobs that require
>> further validation" from other additional information.
>> 
>> I am not comfortable to recommend "ok, then let's add a delimiter
>> line '---\n' if/when we need to have something after the list of
>> blobs and append more stuff in future versions of Git", because we
>> may find need to emit new kinds of info before the list of blobs
>> that needs further validation, for example, in future versions of
>> Git.
>> 
>> Having said all that, the internal communication between the
>> index-pack and its caller do not need as much care about
>> compatibility across versions as output visible to end-users, so
>> when a future version of Git needs to send different kinds of
>> information in different order from what you created here, we can do
>> so pretty much freely, I would guess.
>
> Yeah, that's what I thought too - since this is an internal interface,
> we can evolve them in lockstep. If we're really worried about the Git
> binaries (on a user's system) getting out of sync, 

Reading "getting out of sync", I think you may be missing an aspect of
the issue here.

We're not talking about some abnormal error in some packaging system,
but how we'd expect all installations of git to behave if you update
them with *.rpm, *.deb etc, e.g. when your binaries are in
/usr/libexec/git-core. I suppose NixOS or something where there are
hash-based paths may be exempt from this.

On those systems if you've got a server serving concurrent traffic and
update the "git" package you could expect failure if any git process
invoked by another is incompatible during such an upgrade.

If you browse some of the recent GIT_CONFIG_PARAMETERS discussion this
was discussed there. I.e. even if GIT_CONFIG_PARAMETERS is internal-only
we bent over backwards not to change it in such a way as to have process
A invoking process B and the two not understanding each other because of
such an upgrade.

That's exactly because of this case, where receive-pack may be started
on version A, someone runs "apt install git" in the background
concurrently, and now a version A of that program is talking to a
version B index-pack.

> we could just make sure that subsequent updates to this protocol are
> non-backwards-compatible (e.g. have index-pack emit "foo <hash>",
> where "foo" is a string that describes the new check, so that current
> fetch-pack will reject "foo" since it is not a hash).

And then presumably index-pack would die and receive-pack would die on
the push or whatever, so the push fails for the end user.
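
To make the quoted "foo <hash>" idea concrete, here is a rough sketch of
how a reader might tell a bare hash line from a keyword-tagged one. It
is not code from index-pack or fetch-pack: the function names, the
line_kind enum and the 40-hex-digit (SHA-1) assumption are all made up
for illustration. A reader that only accepts LINE_BARE_HASH would reject
the tagged form, which is the failure mode described above:

```c
#include <ctype.h>
#include <string.h>

#define DEMO_HEXSZ 40 /* assumption: SHA-1 object names */

enum line_kind { LINE_BARE_HASH, LINE_TAGGED_HASH, LINE_UNKNOWN };

/* True if s is exactly DEMO_HEXSZ hex digits followed by NUL. */
static int is_hex_oid(const char *s)
{
	int i;

	for (i = 0; i < DEMO_HEXSZ; i++)
		if (!isxdigit((unsigned char)s[i]))
			return 0;
	return s[DEMO_HEXSZ] == '\0';
}

/*
 * Classify one line (without its trailing newline). For a tagged line,
 * *tag_len is set to the length of the leading keyword.
 */
static enum line_kind classify_line(const char *line, size_t *tag_len)
{
	const char *sp;

	if (is_hex_oid(line))
		return LINE_BARE_HASH;
	sp = strchr(line, ' ');
	if (sp && sp != line && is_hex_oid(sp + 1)) {
		*tag_len = (size_t)(sp - line);
		return LINE_TAGGED_HASH;
	}
	return LINE_UNKNOWN;
}
```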

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 4/4] fetch-pack: print and use dangling .gitmodules
  2021-02-17 20:10           ` Jonathan Tan
@ 2021-02-18 12:07             ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-18 12:07 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git


On Wed, Feb 17 2021, Jonathan Tan wrote:

>> Sorry for being unclear here. I don't think (honestly I don't remember,
>> it's been almost a month) that I meant that you should use the skipList.
>> 
>> Looking at that code again we use object_on_skiplist() to do an early
>> punt in report(), but also fsck_blob(), presumably you never want the
>> latter, and that early punting wouldn't be needed if your report()
>> function intercepted the modules blob id for stashing it away / later
>> reporting / whatever.
>> 
>> So yeah, I'm 99% sure now that's not what I meant :)
>> 
>> What I meant with:
>> 
>>     Or if we want to keep the "print <list> | process"[...]
>> 
>> Is that we have an existing ad-hoc IPC model for these commands in
>> passing along the skipList, which is made more complex because sometimes
>> the initial process reads the file, sometimes it passes it along as-is
>> to the child.
>> 
>> And then there's this patch that passes OIDs too, but through a
>> different mechanism.
>> 
>> I was suggesting that perhaps it made more sense to refactor both so
>> they could use the same mechanism, because we're potentially passing two
>> lists of OIDs between the two. Just one goes via line-at-a-time in the
>> output, the other via a config option on the command-line.
>
> Thanks for your explanation. I still think that they are quite different
> - skiplist is a user-written file containing a list of OIDs that will
> likely never change, whereas my list of dangling .gitmodules is a list
> of OIDs dynamically generated (and thus, always different) whenever a
> fetch is done. So I think it's quite reasonable to pass skiplist as a
> file name, and my list should be passed line-by-line.

Sure, but I'm not talking about passing it as a tempfile.

Yes, I suggested that in the third-to-last paragraph of [1] but then
went on to say that we could also move to some IPC mechanism where you
spew in the list of dangling .gitmodules, and we also spew in the
skipList and anything else we want to pass in.

I'm not saying this needs to be part of this series. But let me
rephrase:

We now have some combination of
{receive-pack,upload-pack,send-pack,fetch-pack,unpack-objects} that need
to communicate locally or pass data back & forth, passing data either
via a CLI option to read a file, packnames/refs on --stdin, or (now) a
single list of OIDs on stdout.

Let's say we don't just need to pass the .gitmodules OIDs, but also
e.g. .mailmap OIDs or whatever (due to some future vulnerability).

Would this IPC mechanism deal with that, or would we need to introduce a
breaking change (Re: my recently sent mail about concurrent updates of
libexec programs)? Can we use something like pkt-line to talk back &
forth in an extensible way?

Not needed now, just food for thought...

1. https://lore.kernel.org/git/87czxu7c15.fsf@evledraar.gmail.com/
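
As a sketch of what pkt-line style framing could look like for such a
channel: each record carries a four-hex-digit length prefix that counts
the payload plus the prefix itself, so a reader can skip records whose
keyword it does not know. The helper name demo_pkt_line() and the
"gitmodules" keyword are assumptions for this example, not an existing
git interface:

```c
#include <stdio.h>
#include <string.h>

/*
 * Write one pkt-line into buf: four hex digits giving the total length
 * (payload plus the four length bytes), then the payload itself.
 * Returns the number of bytes written (excluding the NUL), or -1 if the
 * record is too long for the format or does not fit in buf.
 */
static int demo_pkt_line(char *buf, size_t bufsz, const char *payload)
{
	size_t len = strlen(payload) + 4;

	if (len > 65520 || len + 1 > bufsz)
		return -1;
	snprintf(buf, bufsz, "%04x%s", (unsigned)len, payload);
	return (int)len;
}
```

For example, the payload "gitmodules abc\n" (15 bytes) is framed as
"0013gitmodules abc\n", since 15 + 4 = 19 = 0x13.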

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 00/14] fsck: API improvements
  2021-02-18  0:00               ` Ævar Arnfjörð Bjarmason
@ 2021-02-18 19:12                 ` Junio C Hamano
  2021-02-18 19:57                   ` Jeff King
  0 siblings, 1 reply; 134+ messages in thread
From: Junio C Hamano @ 2021-02-18 19:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> Let's get this reviewed now, but with expectation that it will be
>> rebased after the dust settles.
>
> Makes sense. Pending a review of this would you be interested in queuing
> a v2 of this that doesn't conflict with in-flight topics?

Not really.  I am not sure your recent patches are getting
sufficient review bandwidth they deserve.

> Patches 01..09 & 13/14 can live conflict-free with what's in "seen" now
> (I'd have made the 13th the 10th in v1 if I'd noticed). Then I could
> re-roll the remainder of this once the other topics land.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 03/10] fsck.c: rename variables in fsck_set_msg_type() for less confusion
  2021-02-18 10:58             ` [PATCH v2 03/10] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
@ 2021-02-18 19:45               ` Jeff King
  0 siblings, 0 replies; 134+ messages in thread
From: Jeff King @ 2021-02-18 19:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Jonathan Tan

On Thu, Feb 18, 2021 at 11:58:33AM +0100, Ævar Arnfjörð Bjarmason wrote:

> Rename variables in a function added in 0282f4dced0 (fsck: offer a
> function to demote fsck errors to warnings, 2015-06-22).
> 
> It was needlessly confusing that it took a "msg_type" argument, but
> then later declared another "msg_type" of a different type.
> 
> Let's rename that to "tmp", and rename "id" to "msg_id" and "msg_id"
> to "msg_id_str" etc. This will make a follow-up change smaller.

I think this is an improvement, though maybe "severity" would be a
less-generic term than "type".

>  void fsck_set_msg_type(struct fsck_options *options,
> -		const char *msg_id, const char *msg_type)
> +		const char *msg_id_str, const char *msg_type_str)
>  {
> -	int id = parse_msg_id(msg_id), type;
> +	int msg_id = parse_msg_id(msg_id_str), msg_type;

I always get nervous when a refactoring renames something away from
"foo", and then renames another thing _to_ "foo". Any untouched bits of
code are vulnerable to confusing them.

But I think the types are sufficiently different that we can mostly rely
on the compiler (though things like numeric or bool comparisons can work
with either pointers or ints), and the function is small enough that we
can see the entire thing in the context here.

So I think it is OK.

-Peff

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 06/10] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  2021-02-18 10:58             ` [PATCH v2 06/10] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
@ 2021-02-18 19:52               ` Jeff King
  2021-02-18 22:27                 ` Junio C Hamano
  0 siblings, 1 reply; 134+ messages in thread
From: Jeff King @ 2021-02-18 19:52 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Jonathan Tan

On Thu, Feb 18, 2021 at 11:58:36AM +0100, Ævar Arnfjörð Bjarmason wrote:

> Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new
> fsck_msg_type enum.

Makes sense. As with my previous comment, I wonder if "severity" is a
more descriptive term.

> diff --git a/fsck.h b/fsck.h
> index 0c75789d219..c77e8ddf10b 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -3,10 +3,13 @@
>  
>  #include "oidset.h"
>  
> -#define FSCK_ERROR 1
> -#define FSCK_WARN 2
> -#define FSCK_IGNORE 3
> -
> +enum fsck_msg_type {
> +	FSCK_INFO = -2,
> +	FSCK_FATAL = -1,
> +	FSCK_ERROR = 1,
> +	FSCK_WARN,
> +	FSCK_IGNORE
> +};

You kept the values the same as they were before, which is good in a
refactoring step, but...wow, the ordering is weird and confusing.

In FATAL/ERROR/WARN/IGNORE the number increases as severity decreases.
Maybe reversed from how I'd do it, but at least the order makes sense.
But somehow INFO is on the far side of FATAL?

Again, not something to address in this patch, but I hope something we
could maybe deal with in the longer term (perhaps along with fixing the
weird "INFO is a warning from the user's perspective, but WARNING is
generally an error" behavior).

I also know that this is assigning WARN and IGNORE based on
counting-by-one from ERROR, so it's correct. But I think it would be
more obvious if you simply filled in the values manually, so a reader
does not have to wonder why some are assigned and some are not.
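
As a rough sketch of that suggestion, with every value written out: the
numbers are the ones from the quoted hunk, while fsck_msg_type_name() is
a hypothetical helper that does not exist in fsck.c and is only here to
exercise the enum.

```c
#include <stddef.h>

/* Same values as in the quoted patch, but WARN and IGNORE are explicit. */
enum fsck_msg_type {
	FSCK_INFO = -2,
	FSCK_FATAL = -1,
	FSCK_ERROR = 1,
	FSCK_WARN = 2,
	FSCK_IGNORE = 3
};

/* Hypothetical helper (not in git): name a value, or NULL if unknown. */
static const char *fsck_msg_type_name(enum fsck_msg_type type)
{
	switch (type) {
	case FSCK_INFO:
		return "info";
	case FSCK_FATAL:
		return "fatal";
	case FSCK_ERROR:
		return "error";
	case FSCK_WARN:
		return "warn";
	case FSCK_IGNORE:
		return "ignore";
	}
	return NULL;
}
```

The behavior is identical to the implicit counting; the only gain is
that nobody has to count from FSCK_ERROR to learn what FSCK_IGNORE is.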

-Peff

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name
  2021-02-18 10:58             ` [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
@ 2021-02-18 19:56               ` Jeff King
  2021-02-18 22:33                 ` Junio C Hamano
  2021-02-18 22:32               ` Junio C Hamano
  1 sibling, 1 reply; 134+ messages in thread
From: Jeff King @ 2021-02-18 19:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Jonathan Tan

On Thu, Feb 18, 2021 at 11:58:40AM +0100, Ævar Arnfjörð Bjarmason wrote:

> Add the object_name member to the initialization macro. This was
> omitted in 7b35efd734e (fsck_walk(): optionally name objects on the
> go, 2016-07-17) when the field was added.

We're correct either way here, because trailing fields that are not
initialized will get the usual zero-initialization. But I don't mind
trying to be more complete.

That said, we have embraced designated initializers these days, in which
case we usually omit the NULL ones. So perhaps:

  #define FSCK_OPTIONS_DEFAULT { \
	.error_func = fsck_error_function, \
	.skiplist = OIDSET_INIT, \
  }
  #define FSCK_OPTIONS_STRICT { \
	.error_func = fsck_error_function, \
	.skiplist = OIDSET_INIT, \
	.strict = 1, \
  }

would be more readable still?
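
To illustrate the zero-initialization point, here is a small standalone
sketch. struct demo_options, demo_error() and the DEMO_* macros are
made-up stand-ins rather than git's real definitions; they only mimic
the shape of a struct with trailing pointer members.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct fsck_options; not git's real layout. */
struct demo_options {
	int (*walk)(void *obj);
	int (*error_func)(const char *msg);
	unsigned strict;
	int *msg_type;
	const char *object_names;
};

/* Hypothetical error callback, only so there is something to point at. */
static int demo_error(const char *msg)
{
	return msg != NULL;
}

/* Positional form: members after "strict" are implicitly zeroed. */
#define DEMO_OPTIONS_POSITIONAL { NULL, demo_error, 1 }

/* Designated form: every member not named is implicitly zeroed. */
#define DEMO_OPTIONS_DESIGNATED { \
	.error_func = demo_error, \
	.strict = 1, \
}
```

Both macros yield the same struct contents; the designated form just
says which members matter and leaves the NULL ones unsaid.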

-Peff

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 09/10] fsck.c: give "FOREACH_MSG_ID" a more specific name
  2021-02-18 10:58             ` [PATCH v2 09/10] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
@ 2021-02-18 19:56               ` Jeff King
  0 siblings, 0 replies; 134+ messages in thread
From: Jeff King @ 2021-02-18 19:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Johannes Schindelin, Jonathan Tan

On Thu, Feb 18, 2021 at 11:58:39AM +0100, Ævar Arnfjörð Bjarmason wrote:

> Rename the FOREACH_MSG_ID macro to FOREACH_FSCK_MSG_ID in preparation
> for moving it over to fsck.h. It's good convention to name macros
> in *.h files in such a way as to clearly not clash with any other
> names in other files.

The patch to move it is not in this v2 of the series, so arguably this
is less interesting. However, I think the resulting code is equally or
more readable, so I don't mind it standing on its own.

-Peff

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 00/14] fsck: API improvements
  2021-02-18 19:12                 ` Junio C Hamano
@ 2021-02-18 19:57                   ` Jeff King
  2021-02-18 20:27                     ` Junio C Hamano
  2021-02-18 22:36                     ` Junio C Hamano
  0 siblings, 2 replies; 134+ messages in thread
From: Jeff King @ 2021-02-18 19:57 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, Johannes Schindelin,
	Jonathan Tan

On Thu, Feb 18, 2021 at 11:12:26AM -0800, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> 
> >> Let's get this reviewed now, but with expectation that it will be
> >> rebased after the dust settles.
> >
> > Makes sense. Pending a review of this would you be interested in queuing
> > a v2 of this that doesn't conflict with in-flight topics?
> 
> Not really.  I am not sure your recent patches are getting
> sufficient review bandwidth they deserve.

FWIW, I just read through v2 (without having looked at all at v1 yet!),
and they all seemed like quite reasonable cleanups. I left a few small
comments that might be worth a quick re-roll, but I would also be OK
with the patches being picked up as-is.

-Peff

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 00/14] fsck: API improvements
  2021-02-18 19:57                   ` Jeff King
@ 2021-02-18 20:27                     ` Junio C Hamano
  2021-02-19  0:54                       ` Ævar Arnfjörð Bjarmason
  2021-02-18 22:36                     ` Junio C Hamano
  1 sibling, 1 reply; 134+ messages in thread
From: Junio C Hamano @ 2021-02-18 20:27 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Johannes Schindelin,
	Jonathan Tan

Jeff King <peff@peff.net> writes:

> On Thu, Feb 18, 2021 at 11:12:26AM -0800, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>> 
>> >> Let's get this reviewed now, but with expectation that it will be
>> >> rebased after the dust settles.
>> >
>> > Makes sense. Pending a review of this would you be interested in queuing
>> > a v2 of this that doesn't conflict with in-flight topics?
>> 
>> Not really.  I am not sure your recent patches are getting
>> sufficient review bandwidth they deserve.
>
> FWIW, I just read through v2 (without having looked at all at v1 yet!),
> and they all seemed like quite reasonable cleanups. I left a few small
> comments that might be worth a quick re-roll, but I would also be OK
> with the patches being picked up as-is.

That's good to hear.  I shouldn't even have bothered to answer the
question, if the v2 was going to be sent to the list without waiting
for my reply ;-)

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen')
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
@ 2021-02-18 22:19               ` Junio C Hamano
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
                                 ` (22 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-02-18 22:19 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> As suggested in
> https://lore.kernel.org/git/87zh028ctp.fsf@evledraar.gmail.com/ a
> version of this that doesn't conflict with other in-flight topics. I
> can submit the rest later.

And a bystander does not have a clue what this thing is about,
beyond that it tweaks the fsck API: how urgent is it, and what
benefit does it bring to us?

That kind of thing is expected to be described here.

The cover letter of v1 does not do a much better job, either, but is
it fair to understand that this is primarily about allowing the
callback functions (which handle the various problems the fsck
machinery finds) to learn which error was encountered, so that
things like the "enumerate missing .gitmodules blobs" that 384c9d1c
(fetch-pack: print and use dangling .gitmodules, 2021-01-23) wants
to do need not be written by inserting very narrow custom code into
the general error reporting codepath, but can be done by customizing
the error reporting function?
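
A minimal sketch of that reading, assuming the callback is handed a
message ID: every demo_* name below is hypothetical and none of this is
git's actual fsck API. The idea is that the generic reporting path
stays untouched and the "collect missing .gitmodules" behavior lives
entirely in a caller-supplied callback.

```c
#include <stddef.h>

/* Hypothetical message IDs; git keeps its real list in fsck.c. */
enum demo_msg_id { DEMO_BAD_TREE, DEMO_GITMODULES_MISSING };

struct demo_options;
typedef int (*demo_error_fn)(struct demo_options *o, const char *oid,
			     enum demo_msg_id id, const char *message);

struct demo_options {
	demo_error_fn error_func;
	const char *missing[8];	/* blobs to re-check once all packs are in */
	size_t nr_missing;
	size_t nr_errors;
};

/* Default behavior: count the problem and report failure. */
static int demo_default_error(struct demo_options *o, const char *oid,
			      enum demo_msg_id id, const char *message)
{
	(void)oid; (void)id; (void)message;
	o->nr_errors++;
	return 1;
}

/* Custom callback: remember missing .gitmodules blobs, defer the rest. */
static int demo_collect_gitmodules(struct demo_options *o, const char *oid,
				   enum demo_msg_id id, const char *message)
{
	if (id == DEMO_GITMODULES_MISSING &&
	    o->nr_missing < sizeof(o->missing) / sizeof(o->missing[0])) {
		o->missing[o->nr_missing++] = oid;
		return 0;
	}
	return demo_default_error(o, oid, id, message);
}

/* Generic reporting path: no knowledge of .gitmodules at all. */
static int demo_report(struct demo_options *o, const char *oid,
		       enum demo_msg_id id, const char *message)
{
	return o->error_func(o, oid, id, message);
}
```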

If so, can we at least say something a bit more specific and
focused, than the overly broad "API improvements"?

Thanks.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 05/10] fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  2021-02-18 10:58             ` [PATCH v2 05/10] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
@ 2021-02-18 22:23               ` Junio C Hamano
  0 siblings, 0 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-02-18 22:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Rename the remaining variables of type fsck_msg_id from "id" to
> "msg_id". This change is relatively small, and is worth the churn for
> a later change where we have different id's in the "report" function.
> ---
>  fsck.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)

Up to this point I have no objections to the patches themselves, but
this one is not signed off.

> diff --git a/fsck.c b/fsck.c
> index 1070071ffec..dbb6f7c4ee2 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -264,19 +264,19 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
>  	free(to_free);
>  }
>  
> -static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
> +static void append_msg_id(struct strbuf *sb, enum fsck_msg_id msg_id)
>  {
> -	const char *msg_id = msg_id_info[id].id_string;
> +	const char *msg_id_str = msg_id_info[msg_id].id_string;
>  	for (;;) {
> -		char c = *(msg_id)++;
> +		char c = *(msg_id_str)++;
>  
>  		if (!c)
>  			break;
>  		if (c != '_')
>  			strbuf_addch(sb, tolower(c));
>  		else {
> -			assert(*msg_id);
> -			strbuf_addch(sb, *(msg_id)++);
> +			assert(*msg_id_str);
> +			strbuf_addch(sb, *(msg_id_str)++);
>  		}
>  	}
>  
> @@ -292,11 +292,11 @@ static int object_on_skiplist(struct fsck_options *opts,
>  __attribute__((format (printf, 5, 6)))
>  static int report(struct fsck_options *options,
>  		  const struct object_id *oid, enum object_type object_type,
> -		  enum fsck_msg_id id, const char *fmt, ...)
> +		  enum fsck_msg_id msg_id, const char *fmt, ...)
>  {
>  	va_list ap;
>  	struct strbuf sb = STRBUF_INIT;
> -	int msg_type = fsck_msg_type(id, options), result;
> +	int msg_type = fsck_msg_type(msg_id, options), result;
>  
>  	if (msg_type == FSCK_IGNORE)
>  		return 0;
> @@ -309,7 +309,7 @@ static int report(struct fsck_options *options,
>  	else if (msg_type == FSCK_INFO)
>  		msg_type = FSCK_WARN;
>  
> -	append_msg_id(&sb, id);
> +	append_msg_id(&sb, msg_id);
>  
>  	va_start(ap, fmt);
>  	strbuf_vaddf(&sb, fmt, ap);
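
For readers unfamiliar with append_msg_id(): the loop in the hunk above
turns an upper-case identifier such as GITMODULES_MISSING into the
camelCase gitmodulesMissing seen in fsck output. Below is a standalone
sketch of the same transformation; demo_msg_id_to_camel() is a made-up
name, and it writes into a plain char buffer instead of a strbuf so
that it compiles outside git.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical re-implementation of the loop in append_msg_id():
 * lower-case every character, except that the character following
 * an underscore is copied as-is and the underscore itself is dropped.
 * Unlike the original, a trailing underscore is silently skipped
 * rather than caught by an assert().
 */
static void demo_msg_id_to_camel(const char *id, char *out, size_t outlen)
{
	size_t n = 0;

	if (!outlen)
		return;
	while (*id && n + 1 < outlen) {
		char c = *id++;

		if (c != '_')
			out[n++] = (char)tolower((unsigned char)c);
		else if (*id)
			out[n++] = *id++;
	}
	out[n] = '\0';
}
```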

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 06/10] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  2021-02-18 19:52               ` Jeff King
@ 2021-02-18 22:27                 ` Junio C Hamano
  0 siblings, 0 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-02-18 22:27 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Johannes Schindelin,
	Jonathan Tan

Jeff King <peff@peff.net> writes:

> On Thu, Feb 18, 2021 at 11:58:36AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new
>> fsck_msg_type enum.
>
> Makes sense. As with my previous comment, I wonder if "severity" is a
> more descriptive term.
>
>> diff --git a/fsck.h b/fsck.h
>> index 0c75789d219..c77e8ddf10b 100644
>> --- a/fsck.h
>> +++ b/fsck.h
>> @@ -3,10 +3,13 @@
>>  
>>  #include "oidset.h"
>>  
>> -#define FSCK_ERROR 1
>> -#define FSCK_WARN 2
>> -#define FSCK_IGNORE 3
>> -
>> +enum fsck_msg_type {
>> +	FSCK_INFO = -2,
>> +	FSCK_FATAL = -1,
>> +	FSCK_ERROR = 1,
>> +	FSCK_WARN,
>> +	FSCK_IGNORE
>> +};
>
> You kept the values the same as they were before, which is good in a
> refactoring step, but...wow, the ordering is weird and confusing.
>
> In FATAL/ERROR/WARN/IGNORE the number increases as severity decreases.
> Maybe reversed from how I'd do it, but at least the order makes sense.
> But somehow INFO is on the far side of FATAL?
>
> Again, not something to address in this patch, but I hope something we
> could maybe deal with in the longer term (perhaps along with fixing the
> weird "INFO is a warning from the user's perspective, but WARNING is
> generally an error" behavior).
>
> I also know that this is assigning WARN and IGNORE based on
> counting-by-one from ERROR, so it's correct. But I think it would be
> more obvious if you simply filled in the values manually, so a reader
> does not have to wonder why some are assigned and some are not.

I had the same reaction, plus "Wow, we had FSCK_* constants in two
different places and without colliding?  Have we been lucky?
Declaring it in one place, whether we use enum or not (as enum is
not very useful in C as a type checking vehicle), makes a lot of
sense but why does this come this late in the series, instead of
being at the front as a trivial low-hanging fruit?"
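
The remark about enums and type checking can be seen in a few lines of
standalone C. The enum mirrors the one in the quoted hunk;
is_known_msg_type() is a hypothetical helper, not something in git.

```c
/* Mirrors the enum from the quoted patch. */
enum fsck_msg_type {
	FSCK_INFO = -2,
	FSCK_FATAL = -1,
	FSCK_ERROR = 1,
	FSCK_WARN,
	FSCK_IGNORE
};

/*
 * Hypothetical helper: the parameter is declared as the enum type, yet
 * a C caller may pass any int and the language converts it silently,
 * so the range has to be checked by hand.
 */
static int is_known_msg_type(enum fsck_msg_type type)
{
	switch ((int)type) {
	case FSCK_INFO:
	case FSCK_FATAL:
	case FSCK_ERROR:
	case FSCK_WARN:
	case FSCK_IGNORE:
		return 1;
	default:
		return 0;
	}
}
```

Calling is_known_msg_type(42) is valid C even though 42 names no
enumerator, which is why the enum mostly helps human readers (and
debuggers) rather than the type checker.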

Thanks.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 07/10] fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  2021-02-18 10:58             ` [PATCH v2 07/10] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
@ 2021-02-18 22:29               ` Junio C Hamano
  0 siblings, 0 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-02-18 22:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> There's no reason to defer the calling of parse_msg_type() until after
> we've checked if the "id < 0". This is not a hot codepath, and
> parse_msg_type() itself may die on invalid input.

That explains why this change can be done, but does not justify why
it is a good change.  Unlike all the previous steps, I would rather
say this is borderline needless churn.

Let's keep reading as the picture may change as we touch more code
around this area.

Thanks.


>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  fsck.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fsck.c b/fsck.c
> index 00e0fef21ca..7c53080ad48 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -203,11 +203,10 @@ void fsck_set_msg_type(struct fsck_options *options,
>  		const char *msg_id_str, const char *msg_type_str)
>  {
>  	int msg_id = parse_msg_id(msg_id_str);
> -	enum fsck_msg_type msg_type;
> +	enum fsck_msg_type msg_type = parse_msg_type(msg_type_str);
>  
>  	if (msg_id < 0)
>  		die("Unhandled message id: %s", msg_id_str);
> -	msg_type = parse_msg_type(msg_type_str);
>  
>  	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
>  		die("Cannot demote %s to %s", msg_id_str, msg_type_str);

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 08/10] fsck.c: undefine temporary STR macro after use
  2021-02-18 10:58             ` [PATCH v2 08/10] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
@ 2021-02-18 22:30               ` Junio C Hamano
  0 siblings, 0 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-02-18 22:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> In f417eed8cde (fsck: provide a function to parse fsck message IDs,
> 2015-06-22) the "STR" macro was introduced, but that short macro name
> was not undefined after use as was done earlier in the same series for
> the MSG_ID macro in c99ba492f1c (fsck: introduce identifiers for fsck
> messages, 2015-06-22).
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  fsck.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fsck.c b/fsck.c
> index 7c53080ad48..88884e91c89 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -100,6 +100,7 @@ static struct {
>  	{ NULL, NULL, NULL, -1 }
>  };
>  #undef MSG_ID
> +#undef STR

Good clean-up.

>  
>  static void prepare_msg_ids(void)
>  {

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name
  2021-02-18 10:58             ` [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
  2021-02-18 19:56               ` Jeff King
@ 2021-02-18 22:32               ` Junio C Hamano
  1 sibling, 0 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-02-18 22:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Add the object_name member to the initialization macro. This was
> omitted in 7b35efd734e (fsck_walk(): optionally name objects on the
> go, 2016-07-17) when the field was added.

This is more of a Meh to me.  If this were to change us to
designated initializers and omit NULL and 0 initialization, it would
be more interesting.

Thanks.

>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  fsck.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fsck.h b/fsck.h
> index c77e8ddf10b..5d44ff1c8e3 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -47,8 +47,8 @@ struct fsck_options {
>  	kh_oid_map_t *object_names;
>  };
>  
> -#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
> -#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
> +#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
> +#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
>  
>  /* descend in all linked child objects
>   * the return value is:

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name
  2021-02-18 19:56               ` Jeff King
@ 2021-02-18 22:33                 ` Junio C Hamano
  0 siblings, 0 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-02-18 22:33 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, git, Johannes Schindelin,
	Jonathan Tan

Jeff King <peff@peff.net> writes:

> On Thu, Feb 18, 2021 at 11:58:40AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> Add the object_name member to the initialization macro. This was
>> omitted in 7b35efd734e (fsck_walk(): optionally name objects on the
>> go, 2016-07-17) when the field was added.
>
> We're correct either way here, because trailing fields that are not
> initialized will get the usual zero-initialization. But I don't mind
> trying to be more complete.
>
> That said, we have embraced designated initializers these days, in which
> case we usually omit the NULL ones. So perhaps:
>
>   #define FSCK_OPTIONS_DEFAULT { \
> 	.error_func = fsck_error_function, \
> 	.skiplist = OIDSET_INIT, \
>   }
>   #define FSCK_OPTIONS_STRICT { \
> 	.error_func = fsck_error_function, \
> 	.skiplist = OIDSET_INIT, \
> 	.strict = 1, \
>   }
>
> would be more readable still?

Ahh, I should probably have read your reviews first before reading
patches myself ;-)

Thanks.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 00/14] fsck: API improvements
  2021-02-18 19:57                   ` Jeff King
  2021-02-18 20:27                     ` Junio C Hamano
@ 2021-02-18 22:36                     ` Junio C Hamano
  1 sibling, 0 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-02-18 22:36 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff King
  Cc: git, Johannes Schindelin, Jonathan Tan

Jeff King <peff@peff.net> writes:

> On Thu, Feb 18, 2021 at 11:12:26AM -0800, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>> 
>> >> Let's get this reviewed now, but with expectation that it will be
>> >> rebased after the dust settles.
>> >
>> > Makes sense. Pending a review of this would you be interested in queuing
>> > a v2 of this that doesn't conflict with in-flight topics?
>> 
>> Not really.  I am not sure your recent patches are getting
>> sufficient review bandwidth they deserve.
>
> FWIW, I just read through v2 (without having looked at all at v1 yet!),
> and they all seemed like quite reasonable cleanups. I left a few small
> comments that might be worth a quick re-roll, but I would also be OK
> with the patches being picked up as-is.

Yeah, all except for a handful of minor nits looked good.

Thanks for writing and reviewing.  Perhaps a final reroll to tie up
the loose ends, or is it just a matter of signing off one of them and
dropping a couple of other ones (which other ones)?




^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
                     ` (4 preceding siblings ...)
  2021-01-24  6:29   ` [PATCH 0/4] Check .gitmodules when using packfile URIs Junio C Hamano
@ 2021-02-18 23:34   ` Junio C Hamano
  2021-02-19  0:46     ` Jonathan Tan
  2021-02-19  1:08     ` Ævar Arnfjörð Bjarmason
  5 siblings, 2 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-02-18 23:34 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

Jonathan Tan <jonathantanmy@google.com> writes:

> This patch set resolves the .gitmodules-and-tree-in-separate-packfiles
> issue I mentioned in [1] by having index-pack print out all dangling
> .gitmodules (instead of returning with an error code) and then teaching
> fetch-pack to read those and run its own fsck checks after all
> index-pack invocations are complete.
>
> As part of this, index-pack has to output (1) the hash that goes into
> the name of the .pack/.idx file and (2) the hashes of all dangling
> .gitmodules. I just had (2) come after (1). If anyone has a better idea,
> I'm interested.
>
> I also discovered a bug in that different index-pack arguments were used
> when processing the inline packfile and when processing the ones
> referenced by URIs. Patch 1-3 fixes that bug by passing the arguments to
> use as a space-separated URL-encoded list. (URL-encoded so that we can
> have spaces in the arguments.) Again, if anyone has a better idea, I'm
> interested. It is only in patch 4 that we have the dangling .gitmodules
> fix.

This seems to have been stalled but I think it would be a better
approach to use a custom callback for error reporting, suggested by
Ævar, which would be where his fsck API clean-up topic would lead
to.

If it is not ultra-urgent, perhaps you can retract the ones that are
queued right now, work with Ævar to finish the error-callback work
and rebuild this topic on top of it?  Thanks.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-02-18 23:34   ` Junio C Hamano
@ 2021-02-19  0:46     ` Jonathan Tan
  2021-02-20  3:31       ` Junio C Hamano
  2021-02-19  1:08     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 134+ messages in thread
From: Jonathan Tan @ 2021-02-19  0:46 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git

> This seems to have been stalled but I think it would be a better
> approach to use a custom callback for error reporting, suggested by
> Ævar, which would be where his fsck API clean-up topic would lead
> to.
> 
> If it is not ultra-urgent, perhaps you can retract the ones that are
> queued right now, work with Ævar to finish the error-callback work
> and rebuild this topic on top of it?  Thanks.

OK - that works. My original idea was to rewrite it using an
error-callback but using starts_with() instead of the ID that Ævar's
work will provide, but seeing that at least one other contributor (Peff)
seems OK with the patches, rebasing mine on top of his works too. I'll
also take a look at his patches.
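
To make the two options concrete, here is a rough sketch. All names are
hypothetical: demo_starts_with() is a local stand-in for git's
starts_with(), and the enum is invented for the example. The message
text matches the "gitmodulesMissing: ..." line in the error output
quoted at the top of this thread.

```c
#include <string.h>

/* Hypothetical message IDs, standing in for what the fsck API topic adds. */
enum demo_msg_id { DEMO_GITMODULES_MISSING, DEMO_GITMODULES_PARSE };

/* Local stand-in for git's starts_with(). */
static int demo_starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Option 1: recognize the error by the text of the formatted message. */
static int is_missing_by_text(const char *message)
{
	return demo_starts_with(message, "gitmodulesMissing:");
}

/* Option 2: recognize the error by an ID passed to the callback. */
static int is_missing_by_id(enum demo_msg_id id)
{
	return id == DEMO_GITMODULES_MISSING;
}
```

The text-based check works without any API change but depends on the
wording of the message; the ID-based check needs the callback signature
to carry the ID.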

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 00/14] fsck: API improvements
  2021-02-18 20:27                     ` Junio C Hamano
@ 2021-02-19  0:54                       ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-19  0:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, Johannes Schindelin, Jonathan Tan


On Thu, Feb 18 2021, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
>> On Thu, Feb 18, 2021 at 11:12:26AM -0800, Junio C Hamano wrote:
>>
>>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>> 
>>> >> Let's get this reviewed now, but with expectation that it will be
>>> >> rebased after the dust settles.
>>> >
>>> > Makes sense. Pending a review of this would you be interested in queuing
>>> > a v2 of this that doesn't conflict with in-flight topics?
>>> 
>>> Not really.  I am not sure your recent patches are getting
>>> sufficient review bandwidth they deserve.
>>
>> FWIW, I just read through v2 (without having looked at all at v1 yet!),
>> and they all seemed like quite reasonable cleanups. I left a few small
>> comments that might be worth a quick re-roll, but I would also be OK
>> with the patches being picked up as-is.
>
> That's good to hear.  I shouldn't even have bothered to answer the
> question, if the v2 were to have sent to the list without waiting
> for my reply ;-)

FWIW it's not that I didn't care about the reply, but I'm somewhat
intermittently available time/network wise in the coming days. And
there's the TZ difference between us.

I sent v1 thinking you might be willing to pick it up & resolve the
conflict, but since you expressed an interest in deferring it until
conflicting work landed, I figured I'd ask (and then just sent the
patches) if you'd be interested in a conflict-free version to queue
alongside those changes.

If it was still "nah" fair enough, I'd just wait. But if not those
patches would be there to pickup.

Thanks a lot to you & Jeff for the review on v2. I won't have time to
address all that today, and in any case I got the message that maybe I
should stop firehosing the list with patch series for a bit :)

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-02-18 23:34   ` Junio C Hamano
  2021-02-19  0:46     ` Jonathan Tan
@ 2021-02-19  1:08     ` Ævar Arnfjörð Bjarmason
  2021-02-20  3:29       ` Junio C Hamano
  1 sibling, 1 reply; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-19  1:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git


On Fri, Feb 19 2021, Junio C Hamano wrote:

> Jonathan Tan <jonathantanmy@google.com> writes:
>
>> This patch set resolves the .gitmodules-and-tree-in-separate-packfiles
>> issue I mentioned in [1] by having index-pack print out all dangling
>> .gitmodules (instead of returning with an error code) and then teaching
>> fetch-pack to read those and run its own fsck checks after all
>> index-pack invocations are complete.
>>
>> As part of this, index-pack has to output (1) the hash that goes into
>> the name of the .pack/.idx file and (2) the hashes of all dangling
>> .gitmodules. I just had (2) come after (1). If anyone has a better idea,
>> I'm interested.
>>
>> I also discovered a bug in that different index-pack arguments were used
>> when processing the inline packfile and when processing the ones
>> referenced by URIs. Patch 1-3 fixes that bug by passing the arguments to
>> use as a space-separated URL-encoded list. (URL-encoded so that we can
>> have spaces in the arguments.) Again, if anyone has a better idea, I'm
>> interested. It is only in patch 4 that we have the dangling .gitmodules
>> fix.
>
> This seems to have been stalled but I think it would be a better
> approach to use a custom callback for error reporting, suggested by
> Ævar, which would be where his fsck API clean-up topic would lead
> to.
>
> If it is not ultra-urgent, perhaps you can retract the ones that are
> queued right now, work with Ævar to finish the error-callback work
> and rebuild this topic on top of it?  Thanks.

If my vote counts for something I think it makes sense to have
Jonathan's series go first and just ignore my fsck API improvement
patches (well, the part of my v1[1] which conflicts with his work).

I'm also happy to help him queue his on top of a v1 version of my
series.

But the end result of doing so (shown after the "--" in [1]) is just a
small re-arrangement of code to get a cleaner fsck API use, it doesn't
actually matter to anyone using git.

Whereas his patches actually do, we have in-the-wild server/repo/clone
setups that are getting on-clone errors, and the window for 2.31 is
getting closer.

We can always do the small API use refactoring later. My interest in
barking up that tree was just that I've been poking at that part of the
fsck API and have some follow-up work that hasn't made it onto the list
yet that makes other use of the fsck API.

So in the longer term I wanted us to think about not needing N special
cases like "print_dangling_gitmodules" if we could help it, but in the
shorter term having it is a non-issue.

1. https://lore.kernel.org/git/20210217194246.25342-1-avarab@gmail.com/

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-02-19  1:08     ` Ævar Arnfjörð Bjarmason
@ 2021-02-20  3:29       ` Junio C Hamano
  0 siblings, 0 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-02-20  3:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Jonathan Tan, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Fri, Feb 19 2021, Junio C Hamano wrote:
>
>> This seems to have been stalled but I think it would be a better
>> approach to use a custom callback for error reporting, suggested by
>> Ævar, which would be where his fsck API clean-up topic would lead
>> to.
>>
>> If it is not ultra-urgent, perhaps you can retract the ones that are
>> queued right now, work with Ævar to finish the error-callback work
>> and rebuild this topic on top of it?  Thanks.
>
> If my vote counts for something I think it makes sense to have
> Jonathan's series go first and just ignore my fsck API improvement
> patches (well, the part of my v1[1] which conflicts with his work).
>
> I'm also happy to help him queue his on top of a v1 version of my
> series.

Either would work for us, I would think.

Thanks.

* Re: [PATCH 0/4] Check .gitmodules when using packfile URIs
  2021-02-19  0:46     ` Jonathan Tan
@ 2021-02-20  3:31       ` Junio C Hamano
  0 siblings, 0 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-02-20  3:31 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Ævar Arnfjörð Bjarmason

Jonathan Tan <jonathantanmy@google.com> writes:

>> This seems to have been stalled but I think it would be a better
>> approach to use a custom callback for error reporting, suggested by
>> Ævar, which would be where his fsck API clean-up topic would lead
>> to.
>> 
>> If it is not ultra-urgent, perhaps you can retract the ones that are
>> queued right now, work with Ævar to finish the error-callback work
>> and rebuild this topic on top of it?  Thanks.
>
> OK - that works. My original idea was to rewrite it using an
> error-callback but using starts_with() instead of the ID that Ævar's
> work will provide, but seeing that at least one other contributor (Peff)
> seems OK with the patches, rebasing mine on top of his works too. I'll
> also take a look at his patches.

Thanks, either way would work for me, but if the suggested route
forces you to review Ævar's code and work together, that would be a
good bonus point ;-)


* [PATCH v2 0/4] Check .gitmodules when using packfile URIs
  2021-01-15 23:43 RFC on packfile URIs and .gitmodules check Jonathan Tan
                   ` (2 preceding siblings ...)
  2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
@ 2021-02-22 19:20 ` Jonathan Tan
  2021-02-22 19:20   ` [PATCH v2 1/4] http: allow custom index-pack args Jonathan Tan
                     ` (4 more replies)
  3 siblings, 5 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-02-22 19:20 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, avarab, gitster

Here's v2. I think I've addressed all the review comments, including
passing the index-pack args as separate arguments (which avoids having
to encode them somehow in order to get rid of spaces) and using
a custom error function instead of a specific option in fsck.
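
To illustrate why separate arguments need no encoding, here is a minimal
sketch in plain C. The helper names wrap_arg() and unwrap_arg() are
hypothetical and exist only for this example; the series itself uses
strvec_pushf() on the sending side and skip_prefix() on the receiving
side. An argument containing spaces survives the round trip untouched
because each one travels in its own argv slot.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARG_PREFIX "--index-pack-arg="

/* Hypothetical helper: wrap one index-pack argument for http-fetch. */
static char *wrap_arg(const char *arg)
{
	size_t len;
	char *out;

	assert(arg);
	len = strlen(ARG_PREFIX) + strlen(arg) + 1;
	out = malloc(len);
	if (!out)
		abort();
	snprintf(out, len, "%s%s", ARG_PREFIX, arg);
	return out;
}

/*
 * Hypothetical helper: return the wrapped argument, or NULL if "opt"
 * is not an --index-pack-arg option. The result points into "opt".
 */
static const char *unwrap_arg(const char *opt)
{
	size_t n = strlen(ARG_PREFIX);

	assert(opt);
	return strncmp(opt, ARG_PREFIX, n) ? NULL : opt + n;
}
```

For example, wrapping "--keep=fetch-pack 123 on host" and unwrapping the
result gives back the same string, spaces included.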

This applies on master. I mentioned earlier [1] that I was planning to
implement this on top of Ævar's fsck API improvements, but after looking
at the latest v2 of that series, I see that it omits patch 11 from v1
(which is the one I need), so I've used a string check in the meantime.
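
As a rough standalone sketch of that string check (simplified: the real
callback in patch 4 takes a struct fsck_options, an object_id and more,
and falls back to fsck_error_function()), the idea is to intercept only
messages whose text starts with "gitmodulesMissing" and pass everything
else to the default handler. default_error() and starts_with_str() below
are stand-ins written for this example, not Git functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the default handler: report and signal failure. */
static int default_error(const char *oid_hex, const char *message)
{
	fprintf(stderr, "error: object %s: %s\n", oid_hex, message);
	return 1;
}

/* Stand-in for Git's starts_with(). */
static int starts_with_str(const char *s, const char *prefix)
{
	return !strncmp(s, prefix, strlen(prefix));
}

/*
 * Print the ID of a missing .gitmodules blob instead of failing;
 * every other message is still treated as an error.
 */
static int dangling_gitmodules_error(const char *oid_hex, const char *message)
{
	assert(oid_hex && message);
	if (starts_with_str(message, "gitmodulesMissing")) {
		printf("%s\n", oid_hex);
		return 0;
	}
	return default_error(oid_hex, message);
}
```

Matching on the message text is fragile (it breaks if the message ID is
ever renamed), which is why the patch marks it NEEDSWORK until the
message ID itself is passed to the callback.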

[1] https://lore.kernel.org/git/20210219004612.1181920-1-jonathantanmy@google.com/

Jonathan Tan (4):
  http: allow custom index-pack args
  http-fetch: allow custom index-pack args
  fetch-pack: with packfile URIs, use index-pack arg
  fetch-pack: print and use dangling .gitmodules

 Documentation/git-http-fetch.txt |  10 ++-
 Documentation/git-index-pack.txt |   7 ++-
 builtin/index-pack.c             |  25 +++++++-
 builtin/receive-pack.c           |   2 +-
 fetch-pack.c                     | 103 ++++++++++++++++++++++++++-----
 fsck.c                           |   5 ++
 fsck.h                           |   2 +
 http-fetch.c                     |  20 +++++-
 http.c                           |  15 ++---
 http.h                           |  10 +--
 pack-write.c                     |   8 ++-
 pack.h                           |   2 +-
 t/t5550-http-fetch-dumb.sh       |   5 +-
 t/t5702-protocol-v2.sh           |  58 +++++++++++++++--
 14 files changed, 227 insertions(+), 45 deletions(-)

Range-diff against v1:
-:  ---------- > 1:  b7e376be16 http: allow custom index-pack args
1:  9fba6c9bcc ! 2:  57220ceb84 http-fetch: allow custom index-pack args
    @@ Documentation/git-http-fetch.txt: commit-id::
      
      --packfile=<hash>::
     -	Instead of a commit id on the command line (which is not expected in
    -+	For internal use only. Instead of a commit id on the command line (which is not expected in
    ++	For internal use only. Instead of a commit id on the command
    ++	line (which is not expected in
      	this case), 'git http-fetch' fetches the packfile directly at the given
      	URL and uses index-pack to generate corresponding .idx and .keep files.
      	The hash is used to determine the name of the temporary file and is
    @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
      		strvec_pushf(&cmd.args, "--packfile=%.*s",
      			     (int) the_hash_algo->hexsz,
      			     packfile_uris.items[i].string);
    -+		strvec_push(&cmd.args, "--index-pack-args=index-pack --stdin --keep");
    ++		strvec_push(&cmd.args, "--index-pack-arg=index-pack");
    ++		strvec_push(&cmd.args, "--index-pack-arg=--stdin");
    ++		strvec_push(&cmd.args, "--index-pack-arg=--keep");
      		strvec_push(&cmd.args, uri);
      		cmd.git_cmd = 1;
      		cmd.no_stdin = 1;
    @@ http-fetch.c: int cmd_main(int argc, const char **argv)
      	int packfile = 0;
      	int nongit;
      	struct object_id packfile_hash;
    -+	const char *index_pack_args = NULL;
    ++	struct strvec index_pack_args = STRVEC_INIT;
      
      	setup_git_directory_gently(&nongit);
      
    @@ http-fetch.c: int cmd_main(int argc, const char **argv)
      			packfile = 1;
      			if (parse_oid_hex(p, &packfile_hash, &end) || *end)
      				die(_("argument to --packfile must be a valid hash (got '%s')"), p);
    -+		} else if (skip_prefix(argv[arg], "--index-pack-args=", &p)) {
    -+			index_pack_args = p;
    ++		} else if (skip_prefix(argv[arg], "--index-pack-arg=", &p)) {
    ++			strvec_push(&index_pack_args, p);
      		}
      		arg++;
      	}
    @@ http-fetch.c: int cmd_main(int argc, const char **argv)
      
      	if (packfile) {
     -		fetch_single_packfile(&packfile_hash, argv[arg]);
    -+		struct strvec encoded = STRVEC_INIT;
    -+		char **raw;
    -+		int i;
    -+
    -+		if (!index_pack_args)
    ++		if (!index_pack_args.nr)
     +			die(_("--packfile requires --index-pack-args"));
     +
    -+		strvec_split(&encoded, index_pack_args);
    -+
    -+		CALLOC_ARRAY(raw, encoded.nr + 1);
    -+		for (i = 0; i < encoded.nr; i++)
    -+			raw[i] = url_percent_decode(encoded.v[i]);
    -+
     +		fetch_single_packfile(&packfile_hash, argv[arg],
    -+				      (const char **) raw);
    -+
    -+		for (i = 0; i < encoded.nr; i++)
    -+			free(raw[i]);
    -+		free(raw);
    -+		strvec_clear(&encoded);
    ++				      index_pack_args.v);
     +
      		return 0;
      	}
      
    -+	if (index_pack_args)
    ++	if (index_pack_args.nr)
     +		die(_("--index-pack-args can only be used with --packfile"));
     +
      	if (commits_on_stdin) {
    @@ t/t5550-http-fetch-dumb.sh: test_expect_success 'http-fetch --packfile' '
      	p=$(cd "$HTTPD_DOCUMENT_ROOT_PATH"/repo_pack.git && ls objects/pack/pack-*.pack) &&
     -	git -C packfileclient http-fetch --packfile=$ARBITRARY "$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
     +	git -C packfileclient http-fetch --packfile=$ARBITRARY \
    -+		--index-pack-args="index-pack --stdin --keep" "$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
    ++		--index-pack-arg=index-pack --index-pack-arg=--stdin \
    ++		--index-pack-arg=--keep \
    ++		"$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
      
      	grep "^keep.[0-9a-f]\{16,\}$" out &&
      	cut -c6- out >packhash &&
2:  7c3244e79f ! 3:  aa87335464 fetch-pack: with packfile URIs, use index-pack arg
    @@ fetch-pack.c: static void write_promisor_file(const char *keep_name,
     - * Pass 1 as "only_packfile" if the pack received is the only pack in this
     - * fetch request (that is, if there were no packfile URIs provided).
     + * If packfile URIs were provided, pass a non-NULL pointer to index_pack_args.
    -+ * The string to pass as the --index-pack-args argument to http-fetch will be
    ++ * The strings to pass as the --index-pack-arg arguments to http-fetch will be
     + * stored there. (It must be freed by the caller.)
       */
      static int get_pack(struct fetch_pack_args *args,
      		    int xd[2], struct string_list *pack_lockfiles,
     -		    int only_packfile,
    -+		    char **index_pack_args,
    ++		    struct strvec *index_pack_args,
      		    struct ref **sought, int nr_sought)
      {
      	struct async demux;
    @@ fetch-pack.c: static int get_pack(struct fetch_pack_args *args,
      	}
      
     +	if (index_pack_args) {
    -+		struct strbuf joined = STRBUF_INIT;
     +		int i;
     +
    -+		for (i = 0; i < cmd.args.nr; i++) {
    -+			if (i)
    -+				strbuf_addch(&joined, ' ');
    -+			strbuf_addstr_urlencode(&joined, cmd.args.v[i],
    -+						is_rfc3986_unreserved);
    -+		}
    -+		*index_pack_args = strbuf_detach(&joined, NULL);
    ++		for (i = 0; i < cmd.args.nr; i++)
    ++			strvec_push(index_pack_args, cmd.args.v[i]);
     +	}
     +
      	cmd.in = demux.out;
    @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
      	int seen_ack = 0;
      	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
      	int i;
    -+	char *index_pack_args = NULL;
    ++	struct strvec index_pack_args = STRVEC_INIT;
      
      	negotiator = &negotiator_alloc;
      	fetch_negotiator_init(r, negotiator);
    @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
      				die(_("git fetch-pack: fetch failed."));
      			do_check_stateless_delimiter(args, &reader);
      
    +@@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
    + 	}
    + 
    + 	for (i = 0; i < packfile_uris.nr; i++) {
    ++		int j;
    + 		struct child_process cmd = CHILD_PROCESS_INIT;
    + 		char packname[GIT_MAX_HEXSZ + 1];
    + 		const char *uri = packfile_uris.items[i].string +
     @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
      		strvec_pushf(&cmd.args, "--packfile=%.*s",
      			     (int) the_hash_algo->hexsz,
      			     packfile_uris.items[i].string);
    --		strvec_push(&cmd.args, "--index-pack-args=index-pack --stdin --keep");
    -+		strvec_pushf(&cmd.args, "--index-pack-args=%s", index_pack_args);
    +-		strvec_push(&cmd.args, "--index-pack-arg=index-pack");
    +-		strvec_push(&cmd.args, "--index-pack-arg=--stdin");
    +-		strvec_push(&cmd.args, "--index-pack-arg=--keep");
    ++		for (j = 0; j < index_pack_args.nr; j++)
    ++			strvec_pushf(&cmd.args, "--index-pack-arg=%s",
    ++				     index_pack_args.v[j]);
      		strvec_push(&cmd.args, uri);
      		cmd.git_cmd = 1;
      		cmd.no_stdin = 1;
    @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
      						 packname));
      	}
      	string_list_clear(&packfile_uris, 0);
    -+	FREE_AND_NULL(index_pack_args);
    ++	strvec_clear(&index_pack_args);
      
      	if (negotiator)
      		negotiator->release(negotiator);
3:  384c9d1c73 ! 4:  e8b18d02e6 fetch-pack: print and use dangling .gitmodules
    @@ Documentation/git-index-pack.txt: OPTIONS
      	Specifies the number of threads to spawn when resolving
     
      ## builtin/index-pack.c ##
    +@@ builtin/index-pack.c: static void show_pack_info(int stat_only)
    + 	}
    + }
    + 
    ++static int print_dangling_gitmodules(struct fsck_options *o,
    ++				     const struct object_id *oid,
    ++				     enum object_type object_type,
    ++				     int msg_type, const char *message)
    ++{
    ++	/*
    ++	 * NEEDSWORK: Plumb the MSG_ID (from fsck.c) here and use it
    ++	 * instead of relying on this string check.
    ++	 */
    ++	if (starts_with(message, "gitmodulesMissing")) {
    ++		printf("%s\n", oid_to_hex(oid));
    ++		return 0;
    ++	}
    ++	return fsck_error_function(o, oid, object_type, msg_type, message);
    ++}
    ++
    + int cmd_index_pack(int argc, const char **argv, const char *prefix)
    + {
    + 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0;
     @@ builtin/index-pack.c: int cmd_index_pack(int argc, const char **argv, const char *prefix)
      	else
      		close(input_fd);
    @@ builtin/index-pack.c: int cmd_index_pack(int argc, const char **argv, const char
     -	if (do_fsck_object && fsck_finish(&fsck_options))
     -		die(_("fsck error in pack objects"));
     +	if (do_fsck_object) {
    -+		struct fsck_options fo = FSCK_OPTIONS_STRICT;
    ++		struct fsck_options fo = fsck_options;
     +
    -+		fo.print_dangling_gitmodules = 1;
    ++		fo.error_func = print_dangling_gitmodules;
     +		if (fsck_finish(&fo))
     +			die(_("fsck error in pack objects"));
     +	}
    @@ fetch-pack.c: static void write_promisor_file(const char *keep_name,
     +
      /*
       * If packfile URIs were provided, pass a non-NULL pointer to index_pack_args.
    -  * The string to pass as the --index-pack-args argument to http-fetch will be
    +  * The strings to pass as the --index-pack-arg arguments to http-fetch will be
     @@ fetch-pack.c: static void write_promisor_file(const char *keep_name,
      static int get_pack(struct fetch_pack_args *args,
      		    int xd[2], struct string_list *pack_lockfiles,
    - 		    char **index_pack_args,
    + 		    struct strvec *index_pack_args,
     -		    struct ref **sought, int nr_sought)
     +		    struct ref **sought, int nr_sought,
     +		    struct oidset *gitmodules_oids)
    @@ fetch-pack.c: static struct ref *do_fetch_pack(struct fetch_pack_args *args,
     @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
      	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
      	int i;
    - 	char *index_pack_args = NULL;
    + 	struct strvec index_pack_args = STRVEC_INIT;
     +	struct oidset gitmodules_oids = OIDSET_INIT;
      
      	negotiator = &negotiator_alloc;
    @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
      		if (finish_command(&cmd))
     @@ fetch-pack.c: static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
      	string_list_clear(&packfile_uris, 0);
    - 	FREE_AND_NULL(index_pack_args);
    + 	strvec_clear(&index_pack_args);
      
     +	fsck_gitmodules_oids(&gitmodules_oids);
     +
    @@ fsck.c: int fsck_error_function(struct fsck_options *o,
      int fsck_finish(struct fsck_options *options)
      {
      	int ret = 0;
    -@@ fsck.c: int fsck_finish(struct fsck_options *options)
    - 		if (!buf) {
    - 			if (is_promisor_object(oid))
    - 				continue;
    --			ret |= report(options,
    --				      oid, OBJ_BLOB,
    --				      FSCK_MSG_GITMODULES_MISSING,
    --				      "unable to read .gitmodules blob");
    -+			if (options->print_dangling_gitmodules)
    -+				printf("%s\n", oid_to_hex(oid));
    -+			else
    -+				ret |= report(options,
    -+					      oid, OBJ_BLOB,
    -+					      FSCK_MSG_GITMODULES_MISSING,
    -+					      "unable to read .gitmodules blob");
    - 			continue;
    - 		}
    - 
     
      ## fsck.h ##
    -@@ fsck.h: struct fsck_options {
    - 	int *msg_type;
    - 	struct oidset skiplist;
    - 	kh_oid_map_t *object_names;
    -+
    -+	/*
    -+	 * If 1, print the hashes of missing .gitmodules blobs instead of
    -+	 * considering them to be errors.
    -+	 */
    -+	unsigned print_dangling_gitmodules:1;
    - };
    - 
    - #define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
     @@ fsck.h: int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
      int fsck_object(struct object *obj, void *data, unsigned long size,
      	struct fsck_options *options);
    @@ pack.h: int verify_pack_index(struct packed_git *);
       * The "hdr" output buffer should be at least this big, which will handle sizes
     
      ## t/t5702-protocol-v2.sh ##
    +@@ t/t5702-protocol-v2.sh: test_expect_success 'part of packfile response provided as URI' '
    + 	test -f hfound &&
    + 	test -f h2found &&
    + 
    +-	# Ensure that there are exactly 6 files (3 .pack and 3 .idx).
    +-	ls http_child/.git/objects/pack/* >filelist &&
    ++	# Ensure that there are exactly 3 packfiles with associated .idx
    ++	ls http_child/.git/objects/pack/*.pack \
    ++	    http_child/.git/objects/pack/*.idx >filelist &&
    + 	test_line_count = 6 filelist
    + '
    + 
    +@@ t/t5702-protocol-v2.sh: test_expect_success 'packfile-uri with transfer.fsckobjects' '
    + 		-c fetch.uriprotocols=http,https \
    + 		clone "$HTTPD_URL/smart/http_parent" http_child &&
    + 
    +-	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
    +-	ls http_child/.git/objects/pack/* >filelist &&
    ++	# Ensure that there are exactly 2 packfiles with associated .idx
    ++	ls http_child/.git/objects/pack/*.pack \
    ++	    http_child/.git/objects/pack/*.idx >filelist &&
    + 	test_line_count = 4 filelist
    + '
    + 
     @@ t/t5702-protocol-v2.sh: test_expect_success 'packfile-uri with transfer.fsckobjects fails on bad object'
      	test_i18ngrep "invalid author/committer line - missing email" error
      '
    @@ t/t5702-protocol-v2.sh: test_expect_success 'packfile-uri with transfer.fsckobje
     +		-c fetch.uriprotocols=http,https \
     +		clone "$HTTPD_URL/smart/http_parent" http_child &&
     +
    -+	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
    -+	ls http_child/.git/objects/pack/* >filelist &&
    ++	# Ensure that there are exactly 2 packfiles with associated .idx
    ++	ls http_child/.git/objects/pack/*.pack \
    ++	    http_child/.git/objects/pack/*.idx >filelist &&
     +	test_line_count = 4 filelist
     +'
     +
4:  da0d7b38ae < -:  ---------- SQUASH??? test fix
-- 
2.30.0.617.g56c4b15f3c-goog


* [PATCH v2 1/4] http: allow custom index-pack args
  2021-02-22 19:20 ` [PATCH v2 " Jonathan Tan
@ 2021-02-22 19:20   ` Jonathan Tan
  2021-02-22 19:20   ` [PATCH v2 2/4] http-fetch: " Jonathan Tan
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-02-22 19:20 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, avarab, gitster

Currently, when fetching, packfiles referenced by URIs are run through
index-pack without any arguments other than --stdin and --keep, no
matter what arguments are used for the packfile that is inline in the
fetch response. In preparation for ensuring that all packs (whether
inline or not) use the same index-pack arguments, teach the http
subsystem to allow custom index-pack arguments.

http-fetch has been updated to use the new API. For now, it passes
--keep alone instead of --keep with a process ID, but this is only
temporary because http-fetch itself will be taught to accept index-pack
parameters (instead of using a hardcoded constant) in a subsequent
commit.
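
A minimal sketch of the selection logic this introduces, using a
cut-down stand-in for struct http_pack_request (only the two new fields)
and a hypothetical effective_args() helper that does not exist in
http.c: a caller-supplied, NULL-terminated argument array wins, and the
old hardcoded command is the fallback.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for struct http_pack_request. */
struct pack_request {
	const char **index_pack_args;	/* NULL-terminated, or NULL */
	unsigned preserve_index_pack_stdout : 1;
};

static const char *default_index_pack_args[] =
	{"index-pack", "--stdin", NULL};

/* Hypothetical helper: pick the command line to run for this request. */
static const char **effective_args(const struct pack_request *preq)
{
	assert(preq);
	return preq->index_pack_args ? preq->index_pack_args
				     : default_index_pack_args;
}
```

Separating "which arguments" from "keep stdout or not" is what lets a
later caller pass --keep with its own message while still reading
index-pack's output.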

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 http-fetch.c |  6 +++++-
 http.c       | 15 ++++++++-------
 http.h       | 10 +++++-----
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/http-fetch.c b/http-fetch.c
index c4ccc5fea9..2d1d9d054f 100644
--- a/http-fetch.c
+++ b/http-fetch.c
@@ -43,6 +43,9 @@ static int fetch_using_walker(const char *raw_url, int get_verbosely,
 	return rc;
 }
 
+static const char *index_pack_args[] =
+	{"index-pack", "--stdin", "--keep", NULL};
+
 static void fetch_single_packfile(struct object_id *packfile_hash,
 				  const char *url) {
 	struct http_pack_request *preq;
@@ -55,7 +58,8 @@ static void fetch_single_packfile(struct object_id *packfile_hash,
 	if (preq == NULL)
 		die("couldn't create http pack request");
 	preq->slot->results = &results;
-	preq->generate_keep = 1;
+	preq->index_pack_args = index_pack_args;
+	preq->preserve_index_pack_stdout = 1;
 
 	if (start_active_slot(preq->slot)) {
 		run_active_slot(preq->slot);
diff --git a/http.c b/http.c
index 8b23a546af..f8ea28bb2e 100644
--- a/http.c
+++ b/http.c
@@ -2259,6 +2259,9 @@ void release_http_pack_request(struct http_pack_request *preq)
 	free(preq);
 }
 
+static const char *default_index_pack_args[] =
+	{"index-pack", "--stdin", NULL};
+
 int finish_http_pack_request(struct http_pack_request *preq)
 {
 	struct child_process ip = CHILD_PROCESS_INIT;
@@ -2270,17 +2273,15 @@ int finish_http_pack_request(struct http_pack_request *preq)
 
 	tmpfile_fd = xopen(preq->tmpfile.buf, O_RDONLY);
 
-	strvec_push(&ip.args, "index-pack");
-	strvec_push(&ip.args, "--stdin");
 	ip.git_cmd = 1;
 	ip.in = tmpfile_fd;
-	if (preq->generate_keep) {
-		strvec_pushf(&ip.args, "--keep=git %"PRIuMAX,
-			     (uintmax_t)getpid());
+	ip.argv = preq->index_pack_args ? preq->index_pack_args
+					: default_index_pack_args;
+
+	if (preq->preserve_index_pack_stdout)
 		ip.out = 0;
-	} else {
+	else
 		ip.no_stdout = 1;
-	}
 
 	if (run_command(&ip)) {
 		ret = -1;
diff --git a/http.h b/http.h
index 5de792ef3f..bf3d1270ad 100644
--- a/http.h
+++ b/http.h
@@ -218,12 +218,12 @@ struct http_pack_request {
 	char *url;
 
 	/*
-	 * If this is true, finish_http_pack_request() will pass "--keep" to
-	 * index-pack, resulting in the creation of a keep file, and will not
-	 * suppress its stdout (that is, the "keep\t<hash>\n" line will be
-	 * printed to stdout).
+	 * index-pack command to run. Must be terminated by NULL.
+	 *
+	 * If NULL, defaults to	{"index-pack", "--stdin", NULL}.
 	 */
-	unsigned generate_keep : 1;
+	const char **index_pack_args;
+	unsigned preserve_index_pack_stdout : 1;
 
 	FILE *packfile;
 	struct strbuf tmpfile;
-- 
2.30.0.617.g56c4b15f3c-goog


* [PATCH v2 2/4] http-fetch: allow custom index-pack args
  2021-02-22 19:20 ` [PATCH v2 " Jonathan Tan
  2021-02-22 19:20   ` [PATCH v2 1/4] http: allow custom index-pack args Jonathan Tan
@ 2021-02-22 19:20   ` Jonathan Tan
  2021-02-23 13:17     ` Ævar Arnfjörð Bjarmason
  2021-03-05  0:19     ` Jonathan Nieder
  2021-02-22 19:20   ` [PATCH v2 3/4] fetch-pack: with packfile URIs, use index-pack arg Jonathan Tan
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-02-22 19:20 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, avarab, gitster

This is the next step in teaching fetch-pack to pass its index-pack
arguments when processing packfiles referenced by URIs.

The "--keep" in fetch-pack.c will be replaced with a full message in a
subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-http-fetch.txt | 10 ++++++++--
 fetch-pack.c                     |  3 +++
 http-fetch.c                     | 20 +++++++++++++++-----
 t/t5550-http-fetch-dumb.sh       |  5 ++++-
 4 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-http-fetch.txt b/Documentation/git-http-fetch.txt
index 4deb4893f5..9fa17b60e4 100644
--- a/Documentation/git-http-fetch.txt
+++ b/Documentation/git-http-fetch.txt
@@ -41,11 +41,17 @@ commit-id::
 		<commit-id>['\t'<filename-as-in--w>]
 
 --packfile=<hash>::
-	Instead of a commit id on the command line (which is not expected in
+	For internal use only. Instead of a commit id on the command
+	line (which is not expected in
 	this case), 'git http-fetch' fetches the packfile directly at the given
 	URL and uses index-pack to generate corresponding .idx and .keep files.
 	The hash is used to determine the name of the temporary file and is
-	arbitrary. The output of index-pack is printed to stdout.
+	arbitrary. The output of index-pack is printed to stdout. Requires
+	--index-pack-arg.
+
+--index-pack-arg=<arg>::
+	For internal use only. One argument of the command to run on the
+	contents of the downloaded pack. May be given more than once.
 
 --recover::
 	Verify that everything reachable from target is fetched.  Used after
diff --git a/fetch-pack.c b/fetch-pack.c
index 876f90c759..aeac010b0b 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1645,6 +1645,9 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 		strvec_pushf(&cmd.args, "--packfile=%.*s",
 			     (int) the_hash_algo->hexsz,
 			     packfile_uris.items[i].string);
+		strvec_push(&cmd.args, "--index-pack-arg=index-pack");
+		strvec_push(&cmd.args, "--index-pack-arg=--stdin");
+		strvec_push(&cmd.args, "--index-pack-arg=--keep");
 		strvec_push(&cmd.args, uri);
 		cmd.git_cmd = 1;
 		cmd.no_stdin = 1;
diff --git a/http-fetch.c b/http-fetch.c
index 2d1d9d054f..fa642462a9 100644
--- a/http-fetch.c
+++ b/http-fetch.c
@@ -3,6 +3,7 @@
 #include "exec-cmd.h"
 #include "http.h"
 #include "walker.h"
+#include "strvec.h"
 
 static const char http_fetch_usage[] = "git http-fetch "
 "[-c] [-t] [-a] [-v] [--recover] [-w ref] [--stdin | --packfile=hash | commit-id] url";
@@ -43,11 +44,9 @@ static int fetch_using_walker(const char *raw_url, int get_verbosely,
 	return rc;
 }
 
-static const char *index_pack_args[] =
-	{"index-pack", "--stdin", "--keep", NULL};
-
 static void fetch_single_packfile(struct object_id *packfile_hash,
-				  const char *url) {
+				  const char *url,
+				  const char **index_pack_args) {
 	struct http_pack_request *preq;
 	struct slot_results results;
 	int ret;
@@ -90,6 +89,7 @@ int cmd_main(int argc, const char **argv)
 	int packfile = 0;
 	int nongit;
 	struct object_id packfile_hash;
+	struct strvec index_pack_args = STRVEC_INIT;
 
 	setup_git_directory_gently(&nongit);
 
@@ -116,6 +116,8 @@ int cmd_main(int argc, const char **argv)
 			packfile = 1;
 			if (parse_oid_hex(p, &packfile_hash, &end) || *end)
 				die(_("argument to --packfile must be a valid hash (got '%s')"), p);
+		} else if (skip_prefix(argv[arg], "--index-pack-arg=", &p)) {
+			strvec_push(&index_pack_args, p);
 		}
 		arg++;
 	}
@@ -128,10 +130,18 @@ int cmd_main(int argc, const char **argv)
 	git_config(git_default_config, NULL);
 
 	if (packfile) {
-		fetch_single_packfile(&packfile_hash, argv[arg]);
+		if (!index_pack_args.nr)
+			die(_("--packfile requires --index-pack-args"));
+
+		fetch_single_packfile(&packfile_hash, argv[arg],
+				      index_pack_args.v);
+
 		return 0;
 	}
 
+	if (index_pack_args.nr)
+		die(_("--index-pack-args can only be used with --packfile"));
+
 	if (commits_on_stdin) {
 		commits = walker_targets_stdin(&commit_id, &write_ref);
 	} else {
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 483578b2d7..358b322e05 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -224,7 +224,10 @@ test_expect_success 'http-fetch --packfile' '
 
 	git init packfileclient &&
 	p=$(cd "$HTTPD_DOCUMENT_ROOT_PATH"/repo_pack.git && ls objects/pack/pack-*.pack) &&
-	git -C packfileclient http-fetch --packfile=$ARBITRARY "$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
+	git -C packfileclient http-fetch --packfile=$ARBITRARY \
+		--index-pack-arg=index-pack --index-pack-arg=--stdin \
+		--index-pack-arg=--keep \
+		"$HTTPD_URL"/dumb/repo_pack.git/$p >out &&
 
 	grep "^keep.[0-9a-f]\{16,\}$" out &&
 	cut -c6- out >packhash &&
-- 
2.30.0.617.g56c4b15f3c-goog


* [PATCH v2 3/4] fetch-pack: with packfile URIs, use index-pack arg
  2021-02-22 19:20 ` [PATCH v2 " Jonathan Tan
  2021-02-22 19:20   ` [PATCH v2 1/4] http: allow custom index-pack args Jonathan Tan
  2021-02-22 19:20   ` [PATCH v2 2/4] http-fetch: " Jonathan Tan
@ 2021-02-22 19:20   ` Jonathan Tan
  2021-02-22 19:20   ` [PATCH v2 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
  2021-02-22 20:12   ` [PATCH v2 0/4] Check .gitmodules when using packfile URIs Junio C Hamano
  4 siblings, 0 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-02-22 19:20 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, avarab, gitster

Unify the index-pack arguments used when processing the inline pack and
when downloading packfiles referenced by URIs. This is done by teaching
get_pack() to also store the index-pack arguments whenever at least one
packfile URI is given, and then when processing the packfile URI(s),
using the stored arguments.
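
The capture-then-replay flow can be sketched in standalone C as follows.
struct strlist is a tiny stand-in for Git's strvec (not the real API),
and capture_args()/forward_args() are hypothetical names for the two
steps that the patch performs inline in get_pack() and
do_fetch_pack_v2().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for Git's strvec; not the real API. */
struct strlist {
	char **v;
	size_t nr;
};

static void strlist_push(struct strlist *l, const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);
	char **v = realloc(l->v, (l->nr + 1) * sizeof(*v));

	if (!copy || !v)
		abort();
	memcpy(copy, s, len);
	l->v = v;
	l->v[l->nr++] = copy;
}

static void strlist_clear(struct strlist *l)
{
	size_t i;

	for (i = 0; i < l->nr; i++)
		free(l->v[i]);
	free(l->v);
	l->v = NULL;
	l->nr = 0;
}

/* Step 1: remember the inline pack's index-pack command line. */
static void capture_args(struct strlist *saved, const struct strlist *cmd)
{
	size_t i;

	assert(cmd);
	if (!saved)	/* no packfile URIs: nothing to remember */
		return;
	for (i = 0; i < cmd->nr; i++)
		strlist_push(saved, cmd->v[i]);
}

/* Step 2: replay it for one packfile URI, one option per argument. */
static void forward_args(struct strlist *http_fetch,
			 const struct strlist *saved)
{
	size_t i;

	for (i = 0; i < saved->nr; i++) {
		size_t len = strlen("--index-pack-arg=") +
			     strlen(saved->v[i]) + 1;
		char *opt = malloc(len);

		if (!opt)
			abort();
		snprintf(opt, len, "--index-pack-arg=%s", saved->v[i]);
		strlist_push(http_fetch, opt);
		free(opt);
	}
}
```

Passing NULL as the destination doubles as the "this is the only pack"
signal, which is how the patch replaces the old only_packfile flag.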

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 fetch-pack.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index aeac010b0b..dd0a6c4b34 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -797,12 +797,13 @@ static void write_promisor_file(const char *keep_name,
 }
 
 /*
- * Pass 1 as "only_packfile" if the pack received is the only pack in this
- * fetch request (that is, if there were no packfile URIs provided).
+ * If packfile URIs were provided, pass a non-NULL pointer to index_pack_args.
+ * The strings to pass as the --index-pack-arg arguments to http-fetch will be
+ * stored there. (It must be freed by the caller.)
  */
 static int get_pack(struct fetch_pack_args *args,
 		    int xd[2], struct string_list *pack_lockfiles,
-		    int only_packfile,
+		    struct strvec *index_pack_args,
 		    struct ref **sought, int nr_sought)
 {
 	struct async demux;
@@ -845,7 +846,7 @@ static int get_pack(struct fetch_pack_args *args,
 		strvec_push(&cmd.args, alternate_shallow_file);
 	}
 
-	if (do_keep || args->from_promisor) {
+	if (do_keep || args->from_promisor || index_pack_args) {
 		if (pack_lockfiles)
 			cmd.out = -1;
 		cmd_name = "index-pack";
@@ -863,7 +864,7 @@ static int get_pack(struct fetch_pack_args *args,
 				     "--keep=fetch-pack %"PRIuMAX " on %s",
 				     (uintmax_t)getpid(), hostname);
 		}
-		if (only_packfile && args->check_self_contained_and_connected)
+		if (!index_pack_args && args->check_self_contained_and_connected)
 			strvec_push(&cmd.args, "--check-self-contained-and-connected");
 		else
 			/*
@@ -901,7 +902,7 @@ static int get_pack(struct fetch_pack_args *args,
 	    : transfer_fsck_objects >= 0
 	    ? transfer_fsck_objects
 	    : 0) {
-		if (args->from_promisor || !only_packfile)
+		if (args->from_promisor || index_pack_args)
 			/*
 			 * We cannot use --strict in index-pack because it
 			 * checks both broken objects and links, but we only
@@ -913,6 +914,13 @@ static int get_pack(struct fetch_pack_args *args,
 				     fsck_msg_types.buf);
 	}
 
+	if (index_pack_args) {
+		int i;
+
+		for (i = 0; i < cmd.args.nr; i++)
+			strvec_push(index_pack_args, cmd.args.v[i]);
+	}
+
 	cmd.in = demux.out;
 	cmd.git_cmd = 1;
 	if (start_command(&cmd))
@@ -1084,7 +1092,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 		alternate_shallow_file = setup_temporary_shallow(si->shallow);
 	else
 		alternate_shallow_file = NULL;
-	if (get_pack(args, fd, pack_lockfiles, 1, sought, nr_sought))
+	if (get_pack(args, fd, pack_lockfiles, NULL, sought, nr_sought))
 		die(_("git fetch-pack: fetch failed."));
 
  all_done:
@@ -1535,6 +1543,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	int seen_ack = 0;
 	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
 	int i;
+	struct strvec index_pack_args = STRVEC_INIT;
 
 	negotiator = &negotiator_alloc;
 	fetch_negotiator_init(r, negotiator);
@@ -1624,7 +1633,8 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 				receive_packfile_uris(&reader, &packfile_uris);
 			process_section_header(&reader, "packfile", 0);
 			if (get_pack(args, fd, pack_lockfiles,
-				     !packfile_uris.nr, sought, nr_sought))
+				     packfile_uris.nr ? &index_pack_args : NULL,
+				     sought, nr_sought))
 				die(_("git fetch-pack: fetch failed."));
 			do_check_stateless_delimiter(args, &reader);
 
@@ -1636,6 +1646,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	}
 
 	for (i = 0; i < packfile_uris.nr; i++) {
+		int j;
 		struct child_process cmd = CHILD_PROCESS_INIT;
 		char packname[GIT_MAX_HEXSZ + 1];
 		const char *uri = packfile_uris.items[i].string +
@@ -1645,9 +1656,9 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 		strvec_pushf(&cmd.args, "--packfile=%.*s",
 			     (int) the_hash_algo->hexsz,
 			     packfile_uris.items[i].string);
-		strvec_push(&cmd.args, "--index-pack-arg=index-pack");
-		strvec_push(&cmd.args, "--index-pack-arg=--stdin");
-		strvec_push(&cmd.args, "--index-pack-arg=--keep");
+		for (j = 0; j < index_pack_args.nr; j++)
+			strvec_pushf(&cmd.args, "--index-pack-arg=%s",
+				     index_pack_args.v[j]);
 		strvec_push(&cmd.args, uri);
 		cmd.git_cmd = 1;
 		cmd.no_stdin = 1;
@@ -1683,6 +1694,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 						 packname));
 	}
 	string_list_clear(&packfile_uris, 0);
+	strvec_clear(&index_pack_args);
 
 	if (negotiator)
 		negotiator->release(negotiator);
-- 
2.30.0.617.g56c4b15f3c-goog



* [PATCH v2 4/4] fetch-pack: print and use dangling .gitmodules
  2021-02-22 19:20 ` [PATCH v2 " Jonathan Tan
                     ` (2 preceding siblings ...)
  2021-02-22 19:20   ` [PATCH v2 3/4] fetch-pack: with packfile URIs, use index-pack arg Jonathan Tan
@ 2021-02-22 19:20   ` Jonathan Tan
  2021-02-22 20:12   ` [PATCH v2 0/4] Check .gitmodules when using packfile URIs Junio C Hamano
  4 siblings, 0 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-02-22 19:20 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, avarab, gitster

Teach index-pack to print dangling .gitmodules links after its "keep" or
"pack" line instead of declaring an error, and teach fetch-pack to check
such lines printed.

This allows the tree side of the .gitmodules link to be in one packfile
and the blob side to be in another without failing the fsck check,
because it is now fetch-pack which checks such objects after all
packfiles have been downloaded and indexed (and not index-pack on an
individual packfile, as it is before this commit).
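
As a rough illustration of the output format described above, the
following standalone C sketch parses a buffer holding a "keep" or "pack"
line followed by zero or more hash lines. It is not the code from this
patch: parse_index_pack_output(), struct index_pack_output, HEXSZ and
MAX_DANGLING are hypothetical names made up for the example, and it
assumes 40-character SHA-1 hex hashes.

```c
#include <stddef.h>
#include <string.h>

#define HEXSZ 40        /* assumption: SHA-1 hex length */
#define MAX_DANGLING 16 /* assumption: small fixed capacity for the sketch */

/* Hypothetical container for what the caller learns from index-pack. */
struct index_pack_output {
	char pack_hash[HEXSZ + 1];
	char dangling[MAX_DANGLING][HEXSZ + 1];
	size_t nr_dangling;
};

/* True if p points at HEXSZ lowercase hex digits followed by a newline. */
static int is_hex_line(const char *p)
{
	size_t i;

	for (i = 0; i < HEXSZ; i++) {
		char c = p[i];
		if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
			return 0;
	}
	return p[HEXSZ] == '\n';
}

/*
 * Parse "keep\t<hash>\n" or "pack\t<hash>\n", then any number of
 * "<hash>\n" lines naming .gitmodules blobs that still need checking.
 * Returns 0 on success, -1 if the buffer is malformed.
 */
static int parse_index_pack_output(const char *buf, size_t len,
				   struct index_pack_output *out)
{
	const size_t first_len = 5 + HEXSZ + 1; /* prefix + hash + NL */
	size_t pos;

	memset(out, 0, sizeof(*out));
	if (len < first_len)
		return -1;
	if (memcmp(buf, "keep\t", 5) && memcmp(buf, "pack\t", 5))
		return -1;
	if (!is_hex_line(buf + 5))
		return -1;
	memcpy(out->pack_hash, buf + 5, HEXSZ);

	for (pos = first_len; pos < len; pos += HEXSZ + 1) {
		if (len - pos < HEXSZ + 1 || !is_hex_line(buf + pos))
			return -1;
		if (out->nr_dangling == MAX_DANGLING)
			return -1;
		memcpy(out->dangling[out->nr_dangling++], buf + pos, HEXSZ);
	}
	return 0;
}
```

A caller in this sketch would collect the dangling hashes from every
pack and only check them once all packs have been indexed, which is the
ordering that lets the tree and the blob live in different packfiles.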

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-index-pack.txt |  7 ++-
 builtin/index-pack.c             | 25 +++++++++-
 builtin/receive-pack.c           |  2 +-
 fetch-pack.c                     | 78 +++++++++++++++++++++++++++-----
 fsck.c                           |  5 ++
 fsck.h                           |  2 +
 pack-write.c                     |  8 +++-
 pack.h                           |  2 +-
 t/t5702-protocol-v2.sh           | 58 ++++++++++++++++++++++--
 9 files changed, 165 insertions(+), 22 deletions(-)

diff --git a/Documentation/git-index-pack.txt b/Documentation/git-index-pack.txt
index af0c26232c..e74a4a1eda 100644
--- a/Documentation/git-index-pack.txt
+++ b/Documentation/git-index-pack.txt
@@ -78,7 +78,12 @@ OPTIONS
 	Die if the pack contains broken links. For internal use only.
 
 --fsck-objects::
-	Die if the pack contains broken objects. For internal use only.
+	For internal use only.
++
+Die if the pack contains broken objects. If the pack contains a tree
+pointing to a .gitmodules blob that does not exist, prints the hash of
+that blob (for the caller to check) after the hash that goes into the
+name of the pack/idx file (see "Notes").
 
 --threads=<n>::
 	Specifies the number of threads to spawn when resolving
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 557bd2f348..0444febeee 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1693,6 +1693,22 @@ static void show_pack_info(int stat_only)
 	}
 }
 
+static int print_dangling_gitmodules(struct fsck_options *o,
+				     const struct object_id *oid,
+				     enum object_type object_type,
+				     int msg_type, const char *message)
+{
+	/*
+	 * NEEDSWORK: Plumb the MSG_ID (from fsck.c) here and use it
+	 * instead of relying on this string check.
+	 */
+	if (starts_with(message, "gitmodulesMissing")) {
+		printf("%s\n", oid_to_hex(oid));
+		return 0;
+	}
+	return fsck_error_function(o, oid, object_type, msg_type, message);
+}
+
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0;
@@ -1888,8 +1904,13 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	else
 		close(input_fd);
 
-	if (do_fsck_object && fsck_finish(&fsck_options))
-		die(_("fsck error in pack objects"));
+	if (do_fsck_object) {
+		struct fsck_options fo = fsck_options;
+
+		fo.error_func = print_dangling_gitmodules;
+		if (fsck_finish(&fo))
+			die(_("fsck error in pack objects"));
+	}
 
 	free(objects);
 	strbuf_release(&index_name_buf);
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index d49d050e6e..ed2c9b42e9 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -2275,7 +2275,7 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 		status = start_command(&child);
 		if (status)
 			return "index-pack fork failed";
-		pack_lockfile = index_pack_lockfile(child.out);
+		pack_lockfile = index_pack_lockfile(child.out, NULL);
 		close(child.out);
 		status = finish_command(&child);
 		if (status)
diff --git a/fetch-pack.c b/fetch-pack.c
index dd0a6c4b34..f9def5ac74 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -796,6 +796,26 @@ static void write_promisor_file(const char *keep_name,
 	strbuf_release(&promisor_name);
 }
 
+static void parse_gitmodules_oids(int fd, struct oidset *gitmodules_oids)
+{
+	int len = the_hash_algo->hexsz + 1; /* hash + NL */
+
+	do {
+		char hex_hash[GIT_MAX_HEXSZ + 1];
+		int read_len = read_in_full(fd, hex_hash, len);
+		struct object_id oid;
+		const char *end;
+
+		if (!read_len)
+			return;
+		if (read_len != len)
+			die("invalid length read %d", read_len);
+		if (parse_oid_hex(hex_hash, &oid, &end) || *end != '\n')
+			die("invalid hash");
+		oidset_insert(gitmodules_oids, &oid);
+	} while (1);
+}
+
 /*
  * If packfile URIs were provided, pass a non-NULL pointer to index_pack_args.
  * The strings to pass as the --index-pack-arg arguments to http-fetch will be
@@ -804,7 +824,8 @@ static void write_promisor_file(const char *keep_name,
 static int get_pack(struct fetch_pack_args *args,
 		    int xd[2], struct string_list *pack_lockfiles,
 		    struct strvec *index_pack_args,
-		    struct ref **sought, int nr_sought)
+		    struct ref **sought, int nr_sought,
+		    struct oidset *gitmodules_oids)
 {
 	struct async demux;
 	int do_keep = args->keep_pack;
@@ -812,6 +833,7 @@ static int get_pack(struct fetch_pack_args *args,
 	struct pack_header header;
 	int pass_header = 0;
 	struct child_process cmd = CHILD_PROCESS_INIT;
+	int fsck_objects = 0;
 	int ret;
 
 	memset(&demux, 0, sizeof(demux));
@@ -846,8 +868,15 @@ static int get_pack(struct fetch_pack_args *args,
 		strvec_push(&cmd.args, alternate_shallow_file);
 	}
 
-	if (do_keep || args->from_promisor || index_pack_args) {
-		if (pack_lockfiles)
+	if (fetch_fsck_objects >= 0
+	    ? fetch_fsck_objects
+	    : transfer_fsck_objects >= 0
+	    ? transfer_fsck_objects
+	    : 0)
+		fsck_objects = 1;
+
+	if (do_keep || args->from_promisor || index_pack_args || fsck_objects) {
+		if (pack_lockfiles || fsck_objects)
 			cmd.out = -1;
 		cmd_name = "index-pack";
 		strvec_push(&cmd.args, cmd_name);
@@ -897,11 +926,7 @@ static int get_pack(struct fetch_pack_args *args,
 		strvec_pushf(&cmd.args, "--pack_header=%"PRIu32",%"PRIu32,
 			     ntohl(header.hdr_version),
 				 ntohl(header.hdr_entries));
-	if (fetch_fsck_objects >= 0
-	    ? fetch_fsck_objects
-	    : transfer_fsck_objects >= 0
-	    ? transfer_fsck_objects
-	    : 0) {
+	if (fsck_objects) {
 		if (args->from_promisor || index_pack_args)
 			/*
 			 * We cannot use --strict in index-pack because it
@@ -925,10 +950,15 @@ static int get_pack(struct fetch_pack_args *args,
 	cmd.git_cmd = 1;
 	if (start_command(&cmd))
 		die(_("fetch-pack: unable to fork off %s"), cmd_name);
-	if (do_keep && pack_lockfiles) {
-		char *pack_lockfile = index_pack_lockfile(cmd.out);
+	if (do_keep && (pack_lockfiles || fsck_objects)) {
+		int is_well_formed;
+		char *pack_lockfile = index_pack_lockfile(cmd.out, &is_well_formed);
+
+		if (!is_well_formed)
+			die(_("fetch-pack: invalid index-pack output"));
 		if (pack_lockfile)
 			string_list_append_nodup(pack_lockfiles, pack_lockfile);
+		parse_gitmodules_oids(cmd.out, gitmodules_oids);
 		close(cmd.out);
 	}
 
@@ -963,6 +993,22 @@ static int cmp_ref_by_name(const void *a_, const void *b_)
 	return strcmp(a->name, b->name);
 }
 
+static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
+{
+	struct oidset_iter iter;
+	const struct object_id *oid;
+	struct fsck_options fo = FSCK_OPTIONS_STRICT;
+
+	if (!oidset_size(gitmodules_oids))
+		return;
+
+	oidset_iter_init(gitmodules_oids, &iter);
+	while ((oid = oidset_iter_next(&iter)))
+		register_found_gitmodules(oid);
+	if (fsck_finish(&fo))
+		die("fsck failed");
+}
+
 static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 				 int fd[2],
 				 const struct ref *orig_ref,
@@ -977,6 +1023,7 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 	int agent_len;
 	struct fetch_negotiator negotiator_alloc;
 	struct fetch_negotiator *negotiator;
+	struct oidset gitmodules_oids = OIDSET_INIT;
 
 	negotiator = &negotiator_alloc;
 	fetch_negotiator_init(r, negotiator);
@@ -1092,8 +1139,10 @@ static struct ref *do_fetch_pack(struct fetch_pack_args *args,
 		alternate_shallow_file = setup_temporary_shallow(si->shallow);
 	else
 		alternate_shallow_file = NULL;
-	if (get_pack(args, fd, pack_lockfiles, NULL, sought, nr_sought))
+	if (get_pack(args, fd, pack_lockfiles, NULL, sought, nr_sought,
+		     &gitmodules_oids))
 		die(_("git fetch-pack: fetch failed."));
+	fsck_gitmodules_oids(&gitmodules_oids);
 
  all_done:
 	if (negotiator)
@@ -1544,6 +1593,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	struct string_list packfile_uris = STRING_LIST_INIT_DUP;
 	int i;
 	struct strvec index_pack_args = STRVEC_INIT;
+	struct oidset gitmodules_oids = OIDSET_INIT;
 
 	negotiator = &negotiator_alloc;
 	fetch_negotiator_init(r, negotiator);
@@ -1634,7 +1684,7 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 			process_section_header(&reader, "packfile", 0);
 			if (get_pack(args, fd, pack_lockfiles,
 				     packfile_uris.nr ? &index_pack_args : NULL,
-				     sought, nr_sought))
+				     sought, nr_sought, &gitmodules_oids))
 				die(_("git fetch-pack: fetch failed."));
 			do_check_stateless_delimiter(args, &reader);
 
@@ -1677,6 +1727,8 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 
 		packname[the_hash_algo->hexsz] = '\0';
 
+		parse_gitmodules_oids(cmd.out, &gitmodules_oids);
+
 		close(cmd.out);
 
 		if (finish_command(&cmd))
@@ -1696,6 +1748,8 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
 	string_list_clear(&packfile_uris, 0);
 	strvec_clear(&index_pack_args);
 
+	fsck_gitmodules_oids(&gitmodules_oids);
+
 	if (negotiator)
 		negotiator->release(negotiator);
 
diff --git a/fsck.c b/fsck.c
index f82e2fe9e3..49ef6569e8 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1243,6 +1243,11 @@ int fsck_error_function(struct fsck_options *o,
 	return 1;
 }
 
+void register_found_gitmodules(const struct object_id *oid)
+{
+	oidset_insert(&gitmodules_found, oid);
+}
+
 int fsck_finish(struct fsck_options *options)
 {
 	int ret = 0;
diff --git a/fsck.h b/fsck.h
index 69cf715e79..d75b723bd5 100644
--- a/fsck.h
+++ b/fsck.h
@@ -62,6 +62,8 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
 
+void register_found_gitmodules(const struct object_id *oid);
+
 /*
  * Some fsck checks are context-dependent, and may end up queued; run this
  * after completing all fsck_object() calls in order to resolve any remaining
diff --git a/pack-write.c b/pack-write.c
index 3513665e1e..f66ea8e5a1 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -272,7 +272,7 @@ void fixup_pack_header_footer(int pack_fd,
 	fsync_or_die(pack_fd, pack_name);
 }
 
-char *index_pack_lockfile(int ip_out)
+char *index_pack_lockfile(int ip_out, int *is_well_formed)
 {
 	char packname[GIT_MAX_HEXSZ + 6];
 	const int len = the_hash_algo->hexsz + 6;
@@ -286,11 +286,17 @@ char *index_pack_lockfile(int ip_out)
 	 */
 	if (read_in_full(ip_out, packname, len) == len && packname[len-1] == '\n') {
 		const char *name;
+
+		if (is_well_formed)
+			*is_well_formed = 1;
 		packname[len-1] = 0;
 		if (skip_prefix(packname, "keep\t", &name))
 			return xstrfmt("%s/pack/pack-%s.keep",
 				       get_object_directory(), name);
+		return NULL;
 	}
+	if (is_well_formed)
+		*is_well_formed = 0;
 	return NULL;
 }
 
diff --git a/pack.h b/pack.h
index 9fc0945ac9..09cffec395 100644
--- a/pack.h
+++ b/pack.h
@@ -85,7 +85,7 @@ int verify_pack_index(struct packed_git *);
 int verify_pack(struct repository *, struct packed_git *, verify_fn fn, struct progress *, uint32_t);
 off_t write_pack_header(struct hashfile *f, uint32_t);
 void fixup_pack_header_footer(int, unsigned char *, const char *, uint32_t, unsigned char *, off_t);
-char *index_pack_lockfile(int fd);
+char *index_pack_lockfile(int fd, int *is_well_formed);
 
 /*
  * The "hdr" output buffer should be at least this big, which will handle sizes
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 7d5b17909b..b1bc73a9a9 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -847,8 +847,9 @@ test_expect_success 'part of packfile response provided as URI' '
 	test -f hfound &&
 	test -f h2found &&
 
-	# Ensure that there are exactly 6 files (3 .pack and 3 .idx).
-	ls http_child/.git/objects/pack/* >filelist &&
+	# Ensure that there are exactly 3 packfiles with associated .idx
+	ls http_child/.git/objects/pack/*.pack \
+	    http_child/.git/objects/pack/*.idx >filelist &&
 	test_line_count = 6 filelist
 '
 
@@ -901,8 +902,9 @@ test_expect_success 'packfile-uri with transfer.fsckobjects' '
 		-c fetch.uriprotocols=http,https \
 		clone "$HTTPD_URL/smart/http_parent" http_child &&
 
-	# Ensure that there are exactly 4 files (2 .pack and 2 .idx).
-	ls http_child/.git/objects/pack/* >filelist &&
+	# Ensure that there are exactly 2 packfiles with associated .idx
+	ls http_child/.git/objects/pack/*.pack \
+	    http_child/.git/objects/pack/*.idx >filelist &&
 	test_line_count = 4 filelist
 '
 
@@ -936,6 +938,54 @@ test_expect_success 'packfile-uri with transfer.fsckobjects fails on bad object'
 	test_i18ngrep "invalid author/committer line - missing email" error
 '
 
+test_expect_success 'packfile-uri with transfer.fsckobjects succeeds when .gitmodules is separate from tree' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo "[submodule libfoo]" >"$P/.gitmodules" &&
+	echo "path = include/foo" >>"$P/.gitmodules" &&
+	echo "url = git://example.com/git/lib.git" >>"$P/.gitmodules" &&
+	git -C "$P" add .gitmodules &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" .gitmodules >h &&
+
+	sane_unset GIT_TEST_SIDEBAND_ALL &&
+	git -c protocol.version=2 -c transfer.fsckobjects=1 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child &&
+
+	# Ensure that there are exactly 2 packfiles with associated .idx
+	ls http_child/.git/objects/pack/*.pack \
+	    http_child/.git/objects/pack/*.idx >filelist &&
+	test_line_count = 4 filelist
+'
+
+test_expect_success 'packfile-uri with transfer.fsckobjects fails when .gitmodules separate from tree is invalid' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child err &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo "[submodule \"..\"]" >"$P/.gitmodules" &&
+	echo "path = include/foo" >>"$P/.gitmodules" &&
+	echo "url = git://example.com/git/lib.git" >>"$P/.gitmodules" &&
+	git -C "$P" add .gitmodules &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" .gitmodules >h &&
+
+	sane_unset GIT_TEST_SIDEBAND_ALL &&
+	test_must_fail git -c protocol.version=2 -c transfer.fsckobjects=1 \
+		-c fetch.uriprotocols=http,https \
+		clone "$HTTPD_URL/smart/http_parent" http_child 2>err &&
+	test_i18ngrep "disallowed submodule name" err
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.
 
-- 
2.30.0.617.g56c4b15f3c-goog



* Re: [PATCH v2 0/4] Check .gitmodules when using packfile URIs
  2021-02-22 19:20 ` [PATCH v2 " Jonathan Tan
                     ` (3 preceding siblings ...)
  2021-02-22 19:20   ` [PATCH v2 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
@ 2021-02-22 20:12   ` Junio C Hamano
  4 siblings, 0 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-02-22 20:12 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, avarab

Jonathan Tan <jonathantanmy@google.com> writes:

> Here's v2. I think I've addressed all the review comments, including
> passing the index-pack args as separate arguments (to avoid the
> necessity to somehow encode in order to get rid of spaces), and by using
> a custom error function instead of a specific option in fsck.
>
> This applies on master. I mentioned earlier [1] that I was planning to
> implement this on Ævar's fsck API improvements, but after looking at the
> latest v2, I see that it omits patch 11 from v1 (which is the one I
> need), so what I've done is to use a string check in the meantime.
>
> [1] https://lore.kernel.org/git/20210219004612.1181920-1-jonathantanmy@google.com/

I only looked at the difference between this round and what is in
'seen', but everything looked reasonable to me (including the code
that is near NEEDSWORK comment, and what the comment said).

Will queue.  Thanks.


* Re: [PATCH v2 2/4] http-fetch: allow custom index-pack args
  2021-02-22 19:20   ` [PATCH v2 2/4] http-fetch: " Jonathan Tan
@ 2021-02-23 13:17     ` Ævar Arnfjörð Bjarmason
  2021-02-23 16:51       ` Jonathan Tan
  2021-03-05  0:19     ` Jonathan Nieder
  1 sibling, 1 reply; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-02-23 13:17 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster


On Mon, Feb 22 2021, Jonathan Tan wrote:

> diff --git a/Documentation/git-http-fetch.txt b/Documentation/git-http-fetch.txt
> index 4deb4893f5..9fa17b60e4 100644
> --- a/Documentation/git-http-fetch.txt
> +++ b/Documentation/git-http-fetch.txt
> @@ -41,11 +41,17 @@ commit-id::
>  		<commit-id>['\t'<filename-as-in--w>]
>  
>  --packfile=<hash>::
> -	Instead of a commit id on the command line (which is not expected in
> +	For internal use only. Instead of a commit id on the command
> +	line (which is not expected in
>  	this case), 'git http-fetch' fetches the packfile directly at the given
>  	URL and uses index-pack to generate corresponding .idx and .keep files.
>  	The hash is used to determine the name of the temporary file and is
> -	arbitrary. The output of index-pack is printed to stdout.
> +	arbitrary. The output of index-pack is printed to stdout. Requires
> +	--index-pack-args.
> +
> +--index-pack-args=<args>::
> +	For internal use only. The command to run on the contents of the
> +	downloaded pack. Arguments are URL-encoded separated by spaces.
>  
>  --recover::
>  	Verify that everything reachable from target is fetched.  Used after
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 876f90c759..aeac010b0b 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -1645,6 +1645,9 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>  		strvec_pushf(&cmd.args, "--packfile=%.*s",
>  			     (int) the_hash_algo->hexsz,
>  			     packfile_uris.items[i].string);
> +		strvec_push(&cmd.args, "--index-pack-arg=index-pack");
> +		strvec_push(&cmd.args, "--index-pack-arg=--stdin");
> +		strvec_push(&cmd.args, "--index-pack-arg=--keep");

The docs say --*-args, but the code checks --*arg, that seems like a
mistake that should be fixed to make the code/tests use the plural form,
no?


* Re: [PATCH v2 2/4] http-fetch: allow custom index-pack args
  2021-02-23 13:17     ` Ævar Arnfjörð Bjarmason
@ 2021-02-23 16:51       ` Jonathan Tan
  0 siblings, 0 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-02-23 16:51 UTC (permalink / raw)
  To: avarab; +Cc: jonathantanmy, git, gitster

> > diff --git a/Documentation/git-http-fetch.txt b/Documentation/git-http-fetch.txt
> > index 4deb4893f5..9fa17b60e4 100644
> > --- a/Documentation/git-http-fetch.txt
> > +++ b/Documentation/git-http-fetch.txt
> > @@ -41,11 +41,17 @@ commit-id::
> >  		<commit-id>['\t'<filename-as-in--w>]
> >  
> >  --packfile=<hash>::
> > -	Instead of a commit id on the command line (which is not expected in
> > +	For internal use only. Instead of a commit id on the command
> > +	line (which is not expected in
> >  	this case), 'git http-fetch' fetches the packfile directly at the given
> >  	URL and uses index-pack to generate corresponding .idx and .keep files.
> >  	The hash is used to determine the name of the temporary file and is
> > -	arbitrary. The output of index-pack is printed to stdout.
> > +	arbitrary. The output of index-pack is printed to stdout. Requires
> > +	--index-pack-args.
> > +
> > +--index-pack-args=<args>::
> > +	For internal use only. The command to run on the contents of the
> > +	downloaded pack. Arguments are URL-encoded separated by spaces.
> >  
> >  --recover::
> >  	Verify that everything reachable from target is fetched.  Used after
> > diff --git a/fetch-pack.c b/fetch-pack.c
> > index 876f90c759..aeac010b0b 100644
> > --- a/fetch-pack.c
> > +++ b/fetch-pack.c
> > @@ -1645,6 +1645,9 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
> >  		strvec_pushf(&cmd.args, "--packfile=%.*s",
> >  			     (int) the_hash_algo->hexsz,
> >  			     packfile_uris.items[i].string);
> > +		strvec_push(&cmd.args, "--index-pack-arg=index-pack");
> > +		strvec_push(&cmd.args, "--index-pack-arg=--stdin");
> > +		strvec_push(&cmd.args, "--index-pack-arg=--keep");
> 
> The docs say --*-args, but the code checks --*arg, that seems like a
> mistake that should be fixed to make the code/tests use the plural form,
> no?

Thanks for catching that. Originally it was plural since this single
argument would give multiple arguments to index-pack, but now each
argument gives only a single argument, so "arg" is correct. I'll update
it in the next version.
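
To make the singular form concrete, here is a small standalone C sketch
of the one-option-per-argument scheme: each index-pack argument becomes
its own "--index-pack-arg=<value>" string, so a value containing spaces
needs no encoding. prefix_index_pack_args() and free_prefixed_args() are
hypothetical helpers written for this illustration; the actual patch
uses strvec_pushf() as shown in the diff.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: free the first n strings of v, then v itself. */
static void free_prefixed_args(char **v, size_t n)
{
	size_t i;

	if (!v)
		return;
	for (i = 0; i < n; i++)
		free(v[i]);
	free(v);
}

/*
 * Hypothetical helper: return a newly allocated array of n strings, each
 * being "--index-pack-arg=" followed by the corresponding input argument.
 * Returns NULL on allocation failure.
 */
static char **prefix_index_pack_args(const char *const *args, size_t n)
{
	static const char prefix[] = "--index-pack-arg=";
	char **out = calloc(n ? n : 1, sizeof(*out));
	size_t i;

	if (!out)
		return NULL;
	for (i = 0; i < n; i++) {
		size_t len = strlen(prefix) + strlen(args[i]) + 1;

		out[i] = malloc(len);
		if (!out[i]) {
			free_prefixed_args(out, i);
			return NULL;
		}
		/* One output string per input argument; spaces survive as-is. */
		snprintf(out[i], len, "%s%s", prefix, args[i]);
	}
	return out;
}
```

Because every value travels in its own argv slot, the receiving side can
strip the prefix and forward the remainder verbatim.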


* Re: [PATCH v2 2/4] http-fetch: allow custom index-pack args
  2021-02-22 19:20   ` [PATCH v2 2/4] http-fetch: " Jonathan Tan
  2021-02-23 13:17     ` Ævar Arnfjörð Bjarmason
@ 2021-03-05  0:19     ` Jonathan Nieder
  2021-03-05  1:16       ` [PATCH] fetch-pack: do not mix --pack_header and packfile uri Jonathan Tan
  1 sibling, 1 reply; 134+ messages in thread
From: Jonathan Nieder @ 2021-03-05  0:19 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, avarab, gitster, Nathan Mulcahey

Hi Jonathan,

Jonathan Tan wrote:

> This is the next step in teaching fetch-pack to pass its index-pack
> arguments when processing packfiles referenced by URIs.
>
> The "--keep" in fetch-pack.c will be replaced with a full message in a
> subsequent commit.
>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/git-http-fetch.txt | 10 ++++++++--
>  fetch-pack.c                     |  3 +++
>  http-fetch.c                     | 20 +++++++++++++++-----
>  t/t5550-http-fetch-dumb.sh       |  5 ++++-
>  4 files changed, 30 insertions(+), 8 deletions(-)

This is producing an interesting symptom for me:

 git init repro
 cd repro
 git config fetch.uriprotocols https
 git config remote.origin.url https://fuchsia.googlesource.com/fuchsia
 git config remote.origin.fetch +refs/heads/*:refs/remotes/origin/*
 git fetch -p origin

Expected result: fetches

Actual result:

 fatal: pack has bad object at offset 12: unknown object type 5
 fatal: finish_http_pack_request gave result -1
 fatal: fetch-pack: expected keep then TAB at start of http-fetch output

Thanks to Nathan Mulcahey (cc-ed) for a clear report.

Bisects to b664e9ffa153189dae9b88f32d1c5fedcf85056a, which is part of
"next" and 2.31.0-rc1.  Another report of the same is at
https://crbug.com/1184814.

Known problem?

Thanks,
Jonathan


* [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05  0:19     ` Jonathan Nieder
@ 2021-03-05  1:16       ` Jonathan Tan
  2021-03-05  1:52         ` Junio C Hamano
  2021-03-05 18:50         ` Junio C Hamano
  0 siblings, 2 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-03-05  1:16 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, jrnieder, nmulcahey

When fetching (as opposed to cloning) from a repository with packfile
URIs enabled, an error like this may occur:

 fatal: pack has bad object at offset 12: unknown object type 5
 fatal: finish_http_pack_request gave result -1
 fatal: fetch-pack: expected keep then TAB at start of http-fetch output

This bug was introduced in b664e9ffa1 ("fetch-pack: with packfile URIs,
use index-pack arg", 2021-02-22), when the index-pack args used when
processing the inline packfile of a fetch response and when processing
packfile URIs were unified.

This bug happens because fetch, by default, partially reads (and
consumes) the header of the inline packfile to determine if it should
store the downloaded objects as a packfile or loose objects, and thus
passes --pack_header=<...> to index-pack to inform it that some bytes
are missing. However, when it subsequently fetches the additional
packfiles linked by URIs, it reuses the same index-pack arguments, thus
wrongly passing --index-pack-arg=--pack_header=<...> when no bytes are
missing.

This does not happen when cloning because "git clone" always passes
do_keep, which instructs the fetch mechanism to always retain the
packfile, eliminating the need to read the header.

There are a few ways to fix this, including filtering out pack_header
arguments when downloading the additional packfiles, but I decided to
stick to always using index-pack throughout when packfile URIs are
present - thus, Git no longer needs to read the bytes, and no longer
needs --pack_header here.
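
For readers unfamiliar with the pack header, the standalone C sketch
below shows the 12-byte layout (the "PACK" signature, then a big-endian
version and object count) and how the two numbers could be rendered as
a --pack_header=<version>,<entries> argument. All function and struct
names here are hypothetical and are not taken from fetch-pack.c. The
last helper reflects one plausible reading of the error message, not
something stated in this thread: if a reader is told the header was
already consumed but the stream still starts with it, the byte at offset
12 is 'P', whose object-type bits decode to 5.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical struct holding the two numeric pack header fields. */
struct pack_header_fields {
	uint32_t version;
	uint32_t entries;
};

/* Read a 32-bit big-endian (network order) integer. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if buf does not start with a pack header. */
static int sketch_parse_pack_header(const unsigned char *buf, size_t len,
				    struct pack_header_fields *out)
{
	if (len < 12 || memcmp(buf, "PACK", 4))
		return -1;
	out->version = read_be32(buf + 4);
	out->entries = read_be32(buf + 8);
	return 0;
}

/* Render the argument that tells the child the header was consumed. */
static int format_pack_header_arg(char *dst, size_t dstlen,
				  const struct pack_header_fields *h)
{
	return snprintf(dst, dstlen, "--pack_header=%" PRIu32 ",%" PRIu32,
			h->version, h->entries);
}

/* The object type occupies bits 6..4 of an entry's first byte. */
static unsigned object_type_bits(unsigned char first_byte)
{
	return (first_byte >> 4) & 7;
}
```

Under that reading, the "offset 12" and "object type 5" in the error
both follow from a complete packfile being fed to an index-pack that was
given a --pack_header argument meant for the inline pack.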

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
Here's a fix for this issue.

This is on jt/transfer-fsck-across-packs.

One simplification that we could do is to eliminate the unpack-objects
codepath. As far as I understand, the main advantage of writing loose
objects is that we have automatic SHA-1 collision detection, but we have
such mitigations when writing packs too, so that might not be as large a
benefit as we think. This simplification would have enabled us to avoid
this bug, I think.
---
 fetch-pack.c           |  4 ++--
 t/t5702-protocol-v2.sh | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index f9def5ac74..e990607742 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -852,7 +852,7 @@ static int get_pack(struct fetch_pack_args *args,
 	else
 		demux.out = xd[0];
 
-	if (!args->keep_pack && unpack_limit) {
+	if (!args->keep_pack && unpack_limit && !index_pack_args) {
 
 		if (read_pack_header(demux.out, &header))
 			die(_("protocol error: bad pack header"));
@@ -885,7 +885,7 @@ static int get_pack(struct fetch_pack_args *args,
 			strvec_push(&cmd.args, "-v");
 		if (args->use_thin_pack)
 			strvec_push(&cmd.args, "--fix-thin");
-		if (do_keep && (args->lock_pack || unpack_limit)) {
+		if ((do_keep || index_pack_args) && (args->lock_pack || unpack_limit)) {
 			char hostname[HOST_NAME_MAX + 1];
 			if (xgethostname(hostname, sizeof(hostname)))
 				xsnprintf(hostname, sizeof(hostname), "localhost");
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index b1bc73a9a9..9df1ec82ca 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -853,6 +853,27 @@ test_expect_success 'part of packfile response provided as URI' '
 	test_line_count = 6 filelist
 '
 
+test_expect_success 'packfile URIs with fetch instead of clone' '
+	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+	rm -rf "$P" http_child log &&
+
+	git init "$P" &&
+	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
+
+	echo my-blob >"$P/my-blob" &&
+	git -C "$P" add my-blob &&
+	git -C "$P" commit -m x &&
+
+	configure_exclusion "$P" my-blob >h &&
+
+	git init http_child &&
+
+	GIT_TEST_SIDEBAND_ALL=1 \
+	git -C http_child -c protocol.version=2 \
+		-c fetch.uriprotocols=http,https \
+		fetch "$HTTPD_URL/smart/http_parent"
+'
+
 test_expect_success 'fetching with valid packfile URI but invalid hash fails' '
 	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
 	rm -rf "$P" http_child log &&
-- 
2.30.1.766.gb4fecdf3b7-goog



* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05  1:16       ` [PATCH] fetch-pack: do not mix --pack_header and packfile uri Jonathan Tan
@ 2021-03-05  1:52         ` Junio C Hamano
  2021-03-05 18:50         ` Junio C Hamano
  1 sibling, 0 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-03-05  1:52 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Jonathan Tan <jonathantanmy@google.com> writes:

> One simplification that we could do is to eliminate the unpack-objects
> codepath. As far as I understand, the main advantage of writing loose
> objects is that we have automatic SHA-1 collision detection, but we have
> such mitigations when writing packs too, so that might not be as large a
> benefit as we think. This simplification would have enabled us to avoid
> this bug, I think.

My understanding is that the primary advantage of loose objects
codepath is to help us avoid having too many little packs (instead,
we can accumulate enough objects in the loose form and let GC pack
them, at least the ones among them that are still reachable, into a
single pack).  Historically, the only mode of operation "repack"
offers that reduces the number of remaining packs has been "do full
reachability of the entire history, and pack everything into one",
so avoiding creation of little packs and leaving things loose until
we accumulate enough used to matter.

With the geometric rolling repacking, it may not matter as much, and
keeping everything packed, even in a small pack, might start to be
an overall win.  So I am not opposed to such a simplification; we may
not be ready for it right now, but I think it would be a sensible
future direction.
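
The trade-off discussed here can be sketched as a small decision
function in standalone C. This is a simplified illustration, not the
logic in get_pack(): choose_pack_sink() and enum pack_sink are
hypothetical names, and the real code takes more inputs into account.
The unpack_limit parameter stands for the variable of the same name in
the diff, and have_packfile_uris models the behavior of the proposed
fix, which always keeps a pack when packfile URIs are present.

```c
/* Hypothetical names: where the fetched objects end up. */
enum pack_sink {
	SINK_LOOSE_OBJECTS, /* explode into loose objects */
	SINK_KEEP_PACK      /* index and keep the packfile */
};

/*
 * Small transfers become loose objects so that many tiny packs do not
 * accumulate; everything else is kept as a pack. A zero unpack_limit
 * disables the loose-object path in this sketch.
 */
static enum pack_sink choose_pack_sink(int keep_pack,
				       unsigned long unpack_limit,
				       unsigned long nr_objects,
				       int have_packfile_uris)
{
	if (keep_pack || !unpack_limit || have_packfile_uris)
		return SINK_KEEP_PACK;
	return nr_objects < unpack_limit ? SINK_LOOSE_OBJECTS
					 : SINK_KEEP_PACK;
}
```

Dropping the loose-object path, as suggested above, would amount to this
function always returning SINK_KEEP_PACK, at which point the pack header
would never need to be read ahead of time.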






* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05  1:16       ` [PATCH] fetch-pack: do not mix --pack_header and packfile uri Jonathan Tan
  2021-03-05  1:52         ` Junio C Hamano
@ 2021-03-05 18:50         ` Junio C Hamano
  2021-03-05 19:46           ` Junio C Hamano
  2021-03-05 22:59           ` Jonathan Tan
  1 sibling, 2 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-03-05 18:50 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Jonathan Tan <jonathantanmy@google.com> writes:

> When fetching (as opposed to cloning) from a repository with packfile
> URIs enabled, an error like this may occur:
>
>  fatal: pack has bad object at offset 12: unknown object type 5
>  fatal: finish_http_pack_request gave result -1
>  fatal: fetch-pack: expected keep then TAB at start of http-fetch output
>
> This bug was introduced in b664e9ffa1 ("fetch-pack: with packfile URIs,
> use index-pack arg", 2021-02-22), when the index-pack args used when
> processing the inline packfile of a fetch response and when processing
> packfile URIs were unified.

> This bug happens because fetch, by default, partially reads (and
> consumes) the header of the inline packfile to determine if it should
> store the downloaded objects as a packfile or loose objects, and thus
> passes --pack_header=<...> to index-pack to inform it that some bytes
> are missing. 

... and what the values in them are.

> However, when it subsequently fetches the additional
> packfiles linked by URIs, it reuses the same index-pack arguments, thus
> wrongly passing --index-pack-arg=--pack_header=<...> when no bytes are
> missing.
>
> This does not happen when cloning because "git clone" always passes
> do_keep, which instructs the fetch mechanism to always retain the
> packfile, eliminating the need to read the header.
>
> There are a few ways to fix this, including filtering out pack_header
> arguments when downloading the additional packfiles, but ...

Avoiding the condition that exhibits the breakage is possible, and I
think it is what is done here, but I actually think that the only
right fix is to pass the correct arguments to the commands we invoke in the
first place.  Why are we reusing the same argument array to begin
with?

    ... goes back and reads the offending commit ...

commit b664e9ffa153189dae9b88f32d1c5fedcf85056a
Author: Jonathan Tan <jonathantanmy@google.com>
Date:   Mon Feb 22 11:20:08 2021 -0800

    fetch-pack: with packfile URIs, use index-pack arg
    
    Unify the index-pack arguments used when processing the inline pack and
    when downloading packfiles referenced by URIs. This is done by teaching
    get_pack() to also store the index-pack arguments whenever at least one
    packfile URI is given, and then when processing the packfile URI(s),
    using the stored arguments.

This makes it sound like the entire idea of this offending commit
was wrong, and before it, the codepath that processed the packfile
fetched from the packfile URI was using index-pack correctly
by using index-pack arguments that are independent from the one that
is used to process the packfile given in-stream.  Why isn't the fix
just a straight revert of the commit???

> This is on jt/transfer-fsck-across-packs.

Ouch.  This is definitely -rc material.


> -	if (!args->keep_pack && unpack_limit) {
> +	if (!args->keep_pack && unpack_limit && !index_pack_args) {

This one makes sense as an "avoid conditions that reveal how badly
the code is broken" band-aid.  When we have index-pack related
arguments, we cannot use the unpack-objects codepath even if we are
being fed a tiny pack, so there is no point peeking at the beginning
of the pack stream to find out how many objects it has.  OK.
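
For readers following along, the header peek discussed above can be
sketched roughly as follows.  This is an illustration only, not code
from fetch-pack.c: the helper names (parse_pack_header,
format_pack_header_arg, should_unpack_loose) are hypothetical, and it
assumes the usual 12-byte pack header ("PACK", then a 4-byte version
and a 4-byte object count, both in network byte order).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Values carried by the 12-byte pack header. */
struct pack_header_info {
	uint32_t version;
	uint32_t nr_objects;
};

/* Read a 32-bit big-endian (network byte order) integer. */
uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if buf does not start with a pack header. */
int parse_pack_header(const unsigned char *buf, size_t len,
		      struct pack_header_info *out)
{
	if (len < 12 || memcmp(buf, "PACK", 4))
		return -1;
	out->version = read_be32(buf + 4);
	out->nr_objects = read_be32(buf + 8);
	return 0;
}

/*
 * Hypothetical helper: once the header bytes have been consumed, the
 * child command has to be told both that they are missing and what
 * they contained.
 */
int format_pack_header_arg(char *dst, size_t dstlen,
			   const struct pack_header_info *hdr)
{
	return snprintf(dst, dstlen, "--pack_header=%lu,%lu",
			(unsigned long)hdr->version,
			(unsigned long)hdr->nr_objects);
}

/* Packs with fewer objects than the limit are exploded into loose objects. */
int should_unpack_loose(const struct pack_header_info *hdr,
			uint32_t unpack_limit)
{
	return hdr->nr_objects < unpack_limit;
}
```

The point of the sketch is that the peek is only useful when the
unpack-objects path is still an option; a pack that was downloaded
whole has had none of these bytes consumed.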

> @@ -885,7 +885,7 @@ static int get_pack(struct fetch_pack_args *args,
>  			strvec_push(&cmd.args, "-v");
>  		if (args->use_thin_pack)
>  			strvec_push(&cmd.args, "--fix-thin");
> -		if (do_keep && (args->lock_pack || unpack_limit)) {
> +		if ((do_keep || index_pack_args) && (args->lock_pack || unpack_limit)) {
>  			char hostname[HOST_NAME_MAX + 1];
>  			if (xgethostname(hostname, sizeof(hostname)))
>  				xsnprintf(hostname, sizeof(hostname), "localhost");

I do not quite get what this hunk is doing.  Care to explain?

Thanks.

> diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
> index b1bc73a9a9..9df1ec82ca 100755
> --- a/t/t5702-protocol-v2.sh
> +++ b/t/t5702-protocol-v2.sh
> @@ -853,6 +853,27 @@ test_expect_success 'part of packfile response provided as URI' '
>  	test_line_count = 6 filelist
>  '
>  
> +test_expect_success 'packfile URIs with fetch instead of clone' '
> +	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
> +	rm -rf "$P" http_child log &&
> +
> +	git init "$P" &&
> +	git -C "$P" config "uploadpack.allowsidebandall" "true" &&
> +
> +	echo my-blob >"$P/my-blob" &&
> +	git -C "$P" add my-blob &&
> +	git -C "$P" commit -m x &&
> +
> +	configure_exclusion "$P" my-blob >h &&
> +
> +	git init http_child &&
> +
> +	GIT_TEST_SIDEBAND_ALL=1 \
> +	git -C http_child -c protocol.version=2 \
> +		-c fetch.uriprotocols=http,https \
> +		fetch "$HTTPD_URL/smart/http_parent"
> +'
> +
>  test_expect_success 'fetching with valid packfile URI but invalid hash fails' '
>  	P="$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
>  	rm -rf "$P" http_child log &&


* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05 18:50         ` Junio C Hamano
@ 2021-03-05 19:46           ` Junio C Hamano
  2021-03-05 23:11             ` Jonathan Tan
  2021-03-05 23:20             ` Junio C Hamano
  2021-03-05 22:59           ` Jonathan Tan
  1 sibling, 2 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-03-05 19:46 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Junio C Hamano <gitster@pobox.com> writes:

> Avoiding the condition that exhibits the breakage is possible, and I
> think it is what is done here, but I actually think that the only
> right fix is to pass the correct arguments to the commands we invoke in the
> first place.  Why are we reusing the same argument array to begin
> with?
>
>     ... goes back and reads the offending commit ...
>
> commit b664e9ffa153189dae9b88f32d1c5fedcf85056a
> Author: Jonathan Tan <jonathantanmy@google.com>
> Date:   Mon Feb 22 11:20:08 2021 -0800
>
>     fetch-pack: with packfile URIs, use index-pack arg
>     
>     Unify the index-pack arguments used when processing the inline pack and
>     when downloading packfiles referenced by URIs. This is done by teaching
>     get_pack() to also store the index-pack arguments whenever at least one
>     packfile URI is given, and then when processing the packfile URI(s),
>     using the stored arguments.
>
> This makes it sound like the entire idea of this offending commit
> was wrong, and before it, the codepath that processed the packfile
> fetched from the packfile URI was using index-pack correctly
> by using index-pack arguments that are independent from the one that
> is used to process the packfile given in-stream.  Why isn't the fix
> just a straight revert of the commit???

By the way, the band-aid in this patch may be OK for the upcoming
release (purely because it is easy to see that it is sufficient for
today's codebase), but I said the above because I worry about the
health of the codebase in the longer term.  The "pass_header" may
not remain the only difference between the URI packfile and
in-stream packfile in the way they make index-pack invocations.

>> This is on jt/transfer-fsck-across-packs.
>
> Ouch.  This is definitely -rc material.

Thanks.


* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05 18:50         ` Junio C Hamano
  2021-03-05 19:46           ` Junio C Hamano
@ 2021-03-05 22:59           ` Jonathan Tan
  2021-03-05 23:18             ` Junio C Hamano
  1 sibling, 1 reply; 134+ messages in thread
From: Jonathan Tan @ 2021-03-05 22:59 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, jrnieder, nmulcahey

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > When fetching (as opposed to cloning) from a repository with packfile
> > URIs enabled, an error like this may occur:
> >
> >  fatal: pack has bad object at offset 12: unknown object type 5
> >  fatal: finish_http_pack_request gave result -1
> >  fatal: fetch-pack: expected keep then TAB at start of http-fetch output
> >
> > This bug was introduced in b664e9ffa1 ("fetch-pack: with packfile URIs,
> > use index-pack arg", 2021-02-22), when the index-pack args used when
> > processing the inline packfile of a fetch response and when processing
> > packfile URIs were unified.
> 
> > This bug happens because fetch, by default, partially reads (and
> > consumes) the header of the inline packfile to determine if it should
> > store the downloaded objects as a packfile or loose objects, and thus
> > passes --pack_header=<...> to index-pack to inform it that some bytes
> > are missing. 
> 
> ... and what the values in them are.

Ah, that's true.

> > However, when it subsequently fetches the additional
> > packfiles linked by URIs, it reuses the same index-pack arguments, thus
> > wrongly passing --index-pack-arg=--pack_header=<...> when no bytes are
> > missing.
> >
> > This does not happen when cloning because "git clone" always passes
> > do_keep, which instructs the fetch mechanism to always retain the
> > packfile, eliminating the need to read the header.
> >
> > There are a few ways to fix this, including filtering out pack_header
> > arguments when downloading the additional packfiles, but ...
> 
> Avoiding the condition that exhibits the breakage is possible, and I
> think it is what is done here, but I actually think that the only
> > right fix is to pass the correct arguments to the commands we invoke in the
> first place.  Why are we reusing the same argument array to begin
> with?
> 
>     ... goes back and reads the offending commit ...
> 
> commit b664e9ffa153189dae9b88f32d1c5fedcf85056a
> Author: Jonathan Tan <jonathantanmy@google.com>
> Date:   Mon Feb 22 11:20:08 2021 -0800
> 
>     fetch-pack: with packfile URIs, use index-pack arg
>     
>     Unify the index-pack arguments used when processing the inline pack and
>     when downloading packfiles referenced by URIs. This is done by teaching
>     get_pack() to also store the index-pack arguments whenever at least one
>     packfile URI is given, and then when processing the packfile URI(s),
>     using the stored arguments.
> 
> > This makes it sound like the entire idea of this offending commit
> > was wrong, and before it, the codepath that processed the packfile
> > fetched from the packfile URI was using index-pack correctly
> by using index-pack arguments that are independent from the one that
> is used to process the packfile given in-stream.  Why isn't the fix
> just a straight revert of the commit???

I should probably have written more in the commit message to justify the
unification, but it is also part of a bug fix (in particular,
--fsck-objects wasn't being passed to the index-pack that indexed the
packfiles linked by URI) and for code health purposes (to prevent future
bugs by eliminating the divergence). So reverting that commit would
reintroduce another bug.

> > @@ -885,7 +885,7 @@ static int get_pack(struct fetch_pack_args *args,
> >  			strvec_push(&cmd.args, "-v");
> >  		if (args->use_thin_pack)
> >  			strvec_push(&cmd.args, "--fix-thin");
> > -		if (do_keep && (args->lock_pack || unpack_limit)) {
> > +		if ((do_keep || index_pack_args) && (args->lock_pack || unpack_limit)) {
> >  			char hostname[HOST_NAME_MAX + 1];
> >  			if (xgethostname(hostname, sizeof(hostname)))
> >  				xsnprintf(hostname, sizeof(hostname), "localhost");
> 
> I do not quite get what this hunk is doing.  Care to explain?

The "do_keep" part was unnecessarily restrictive and I used a band-aid
solution to loosen it. I think this started from 88e2f9ed8e ("introduce
fetch-object: fetch one promisor object", 2017-12-05) where I might have
misunderstood what do_keep was meant to do, and taught fetch-pack to use
"index-pack" if do_keep is true or args->from_promisor is true. What I
should have done is to set do_keep to true if args->from_promisor is
true. Future commits continued to do that with fsck_objects and
index_pack_args.

Maybe what I can do is to refactor get_pack() so that do_keep retains
its original meaning of whether to use "index-pack" or "unpack-objects",
and then we wouldn't need this line. What do you think (code-wise and
whether this fits in with the release schedule, if we want to get this
in before release)?
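
As a rough illustration of that refactoring idea (a sketch under
stated assumptions, not the actual get_pack() code), the choice
between the two commands could be computed in one place.  The struct,
its field names, and choose_pack_command() are hypothetical stand-ins
for the conditions mentioned above, and the sketch assumes the object
count from the pack header is already known.

```c
#include <stdint.h>

/* Hypothetical subset of the inputs that influence the decision. */
struct pack_decision_input {
	int keep_pack;            /* caller asked for the pack to be kept */
	int from_promisor;        /* pack comes from a promisor remote */
	int fsck_objects;         /* incoming objects must be checked */
	int have_index_pack_args; /* caller wants the index-pack args recorded */
	uint32_t unpack_limit;    /* packs below this size are unpacked */
	uint32_t nr_objects;      /* object count from the pack header */
};

enum pack_command { USE_UNPACK_OBJECTS, USE_INDEX_PACK };

/*
 * Decide once whether the pack is indexed and kept ("index-pack") or
 * exploded into loose objects ("unpack-objects"), so later code can
 * test a single value instead of repeating the individual conditions.
 */
enum pack_command choose_pack_command(const struct pack_decision_input *in)
{
	if (in->keep_pack || in->from_promisor || in->fsck_objects ||
	    in->have_index_pack_args)
		return USE_INDEX_PACK;
	if (in->nr_objects >= in->unpack_limit)
		return USE_INDEX_PACK;
	return USE_UNPACK_OBJECTS;
}
```

With a single decision point like this, the hostname/--keep hunk
would only need to test the result rather than "do_keep ||
index_pack_args".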


* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05 19:46           ` Junio C Hamano
@ 2021-03-05 23:11             ` Jonathan Tan
  2021-03-05 23:20             ` Junio C Hamano
  1 sibling, 0 replies; 134+ messages in thread
From: Jonathan Tan @ 2021-03-05 23:11 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, jrnieder, nmulcahey

> By the way, the band-aid in this patch may be OK for the upcoming
> release (purely because it is easy to see that it is sufficient for
> today's codebase), but I said the above because I worry about the
> health of the codebase in the longer term.  The "pass_header" may
> not remain the only difference between the URI packfile and
> in-stream packfile in the way they make index-pack invocations.

That is true, but at the same time, I think it's better to have the
arguments be the same because there are options (e.g. --promisor and
--fsck-objects) that have to be duplicated, and I think that for the
most part, the URI packfiles and the inline packfile will be processed
identically.


* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05 22:59           ` Jonathan Tan
@ 2021-03-05 23:18             ` Junio C Hamano
  2021-03-08 19:14               ` Jonathan Tan
  0 siblings, 1 reply; 134+ messages in thread
From: Junio C Hamano @ 2021-03-05 23:18 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Jonathan Tan <jonathantanmy@google.com> writes:

> I should probably have written more in the commit message to justify the
> unification, but it is also part of a bug fix (in particular,
> --fsck-objects wasn't being passed to the index-pack that indexed the
> packfiles linked by URI) and for code health purposes (to prevent future
> bugs by eliminating the divergence). So reverting that commit would
> reintroduce another bug.

Not necessarily.  Unifying two that do not inherently have to be
identical makes it impossible to pass two different things, and that
is what we are seeing in the bug this patch is trying to fix (by
forcing the two to be identical by eliminating the unpack-objects
codepath in certain cases).  

The right "fix" for the original bug would have been to keep them
separate while making it easy to pass args that must be used in
both of them, no?
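
One way to picture that suggestion, as a minimal sketch rather than a
proposed patch: build two argument lists from a shared helper, and add
the per-invocation extras separately.  The arg_list type and the
function names below are hypothetical (this is not git's strvec API);
--fsck-objects and --promisor are the shared options mentioned in this
thread.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 16
#define MAX_ARG_LEN 64

/* Tiny stand-in for an argument vector; not git's strvec. */
struct arg_list {
	char v[MAX_ARGS][MAX_ARG_LEN];
	int nr;
};

void arg_push(struct arg_list *a, const char *s)
{
	if (a->nr < MAX_ARGS) {
		snprintf(a->v[a->nr], MAX_ARG_LEN, "%s", s);
		a->nr++;
	}
}

/* Options that both kinds of packfile must be processed with. */
void push_common_args(struct arg_list *a, int fsck_objects, int from_promisor)
{
	arg_push(a, "index-pack");
	if (fsck_objects)
		arg_push(a, "--fsck-objects");
	if (from_promisor)
		arg_push(a, "--promisor");
}

/* Inline pack: gets --pack_header=... only if the header was consumed. */
void build_inline_args(struct arg_list *a, int fsck_objects,
		       int from_promisor, const char *pack_header_arg)
{
	a->nr = 0;
	push_common_args(a, fsck_objects, from_promisor);
	if (pack_header_arg)
		arg_push(a, pack_header_arg);
}

/* URI pack: downloaded whole, so it never gets --pack_header. */
void build_uri_args(struct arg_list *a, int fsck_objects, int from_promisor)
{
	a->nr = 0;
	push_common_args(a, fsck_objects, from_promisor);
}
```

The shared helper keeps the common options from diverging, while each
builder remains free to add options that apply to only one kind of
pack.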

>> > -		if (do_keep && (args->lock_pack || unpack_limit)) {
>> > +		if ((do_keep || index_pack_args) && (args->lock_pack || unpack_limit)) {
>> >  			char hostname[HOST_NAME_MAX + 1];
>> >  			if (xgethostname(hostname, sizeof(hostname)))
>> >  				xsnprintf(hostname, sizeof(hostname), "localhost");
>> 
>> I do not quite get what this hunk is doing.  Care to explain?
>
> The "do_keep" part was unnecessarily restrictive and I used a band-aid
> solution to loosen it. I think this started from 88e2f9ed8e ("introduce
> fetch-object: fetch one promisor object", 2017-12-05) where I might have
> misunderstood what do_keep was meant to do, and taught fetch-pack to use
> "index-pack" if do_keep is true or args->from_promisor is true. What I
> should have done is to set do_keep to true if args->from_promisor is
> true. Future commits continued to do that with fsck_objects and
> index_pack_args.

> Maybe what I can do is to refactor get_pack() so that do_keep retains
> its original meaning of whether to use "index-pack" or "unpack-objects",
> and then we wouldn't need this line. What do you think (code-wise and
> whether this fits in with the release schedule, if we want to get this
> in before release)?

How bad is the breakage this one is trying to fix?  I know it would
only affect folks who have to interact with a server that uses the
packfile URI feature, but do they have a workaround, perhaps with a
configuration knob or command line option to ignore the packfile
URI, and how large is the affected population?

I cannot shake the feeling that we are seeing band-aid on top of
band-aid forced by having chosen to go in a wrong direction in the
beginning X-<, and I would prefer not to see the code drift even
further in the same direction; hence my earlier suggestion to go
back to the root cause by first reverting the wrong fix that
introduced this bug and fixing the original bug in a different way.

I dunno how involved the necessary surgery would be, though.  If
this is easy to work around, perhaps it might be a better option for
the overall project to ship the upcoming release with this listed as
a known breakage.

Thanks.





* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05 19:46           ` Junio C Hamano
  2021-03-05 23:11             ` Jonathan Tan
@ 2021-03-05 23:20             ` Junio C Hamano
  1 sibling, 0 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-03-05 23:20 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Junio C Hamano <gitster@pobox.com> writes:

>> This makes it sound like the entire idea of this offending commit
>> was wrong, and before it, the codepath that processed the packfile
>> fetched from the packfile URI was using index-pack correctly
>> by using index-pack arguments that are independent from the one that
>> is used to process the packfile given in-stream.  Why isn't the fix
>> just a straight revert of the commit???
>
> By the way, the band-aid in this patch may be OK for the upcoming
> release (purely because it is easy to see that it is sufficient for
> today's codebase), but I said the above because I worry about the
> health of the codebase in the longer term.  The "pass_header" may
> not remain the only difference between the URI packfile and
> in-stream packfile in the way they make index-pack invocations.

For example, the URI one is presumably a CDN-hosted long-term pack,
which may be a good candidate for --keep, while the in-stream one,
especially when the packfile URI feature is used, can be expected to
hold recent, small leftover bits that we likely do not want to keep
(in fact, if they are small enough, we'd prefer to keep them
loose).


* [PATCH v3 00/22] fsck: API improvements
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
  2021-02-18 22:19               ` Junio C Hamano
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-07 23:04                 ` Junio C Hamano
  2021-03-06 11:04               ` [PATCH v3 01/22] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
                                 ` (21 subsequent siblings)
  23 siblings, 1 reply; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Now that jt/transfer-fsck-across-packs has been merged to master
here's a re-roll of v1[1]+v2[2] of this series. v2 was slimmed-down +
had a trivial typo fix, so I've done the range-diff against v1.

This makes the recent fetch-pack work use the fsck_msg_id API to
distinguish messages, and has other various cleanups and improvements
to make the fsck API easier to use in the future.
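
To give a feel for what "use the fsck_msg_id API to distinguish
messages" means, here is a minimal, self-contained sketch.  It is not
the code from this series: the enums are cut-down stand-ins, and
dangling_list and dangling_gitmodules_cb() are hypothetical names.
The idea is that an error callback which receives the message ID can
single out one kind of report without inspecting the message text.

```c
#include <stdio.h>
#include <string.h>

/* Cut-down stand-ins for the real enums; only what the sketch needs. */
enum fsck_msg_type { FSCK_ERROR = 1, FSCK_WARN };
enum fsck_msg_id { FSCK_MSG_GITMODULES_MISSING, FSCK_MSG_OTHER };

#define MAX_DANGLING 8

/* Object IDs reported as missing .gitmodules blobs in the current pack. */
struct dangling_list {
	const char *oids[MAX_DANGLING];
	int nr;
};

/*
 * Hypothetical error callback: remember "gitmodulesMissing" reports so
 * they can be re-checked once every pack has arrived, and report
 * everything else as usual.  Returns 0 if handled, 1 on error.
 */
int dangling_gitmodules_cb(struct dangling_list *out, const char *oid_hex,
			   enum fsck_msg_type msg_type,
			   enum fsck_msg_id msg_id, const char *message)
{
	if (msg_id == FSCK_MSG_GITMODULES_MISSING) {
		if (out->nr < MAX_DANGLING)
			out->oids[out->nr++] = oid_hex;
		return 0;
	}
	fprintf(stderr, "%s in %s: %s\n",
		msg_type == FSCK_WARN ? "warning" : "error",
		oid_hex, message);
	return msg_type == FSCK_WARN ? 0 : 1;
}
```

Dispatching on the enum rather than on the message string keeps the
callback working even if the wording of the message changes.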

There's an easy merge conflict here with other in-flight changes to
fsck. I figured it was better to send this now than wait for those to
land.

1. https://lore.kernel.org/git/20210217194246.25342-1-avarab@gmail.com/
2. https://lore.kernel.org/git/20210218105840.11989-1-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (22):
  fsck.h: update FSCK_OPTIONS_* for object_name
  fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
  fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT}
  fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro
  fsck.h: indent arguments to of fsck_set_msg_type
  fsck.h: use "enum object_type" instead of "int"
  fsck.c: rename variables in fsck_set_msg_type() for less confusion
  fsck.c: move definition of msg_id into append_msg_id()
  fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
  fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  fsck.h: re-order and re-assign "enum fsck_msg_type"
  fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  fsck.c: undefine temporary STR macro after use
  fsck.c: give "FOREACH_MSG_ID" a more specific name
  fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
  fsck.c: pass along the fsck_msg_id in the fsck_error callback
  fsck.c: add an fsck_set_msg_type() API that takes enums
  fsck.c: move gitmodules_{found,done} into fsck_options
  fetch-pack: don't needlessly copy fsck_options
  fetch-pack: use file-scope static struct for fsck_options
  fetch-pack: use new fsck API to printing dangling submodules

 Makefile                 |   1 +
 builtin/fsck.c           |   7 +-
 builtin/index-pack.c     |  30 ++-----
 builtin/mktag.c          |   7 +-
 builtin/unpack-objects.c |   3 +-
 fetch-pack.c             |   6 +-
 fsck-cb.c                |  16 ++++
 fsck.c                   | 175 ++++++++++++---------------------------
 fsck.h                   | 132 ++++++++++++++++++++++++++---
 9 files changed, 211 insertions(+), 166 deletions(-)
 create mode 100644 fsck-cb.c

Range-diff:
13:  8de91fac068 =  1:  9d809466bd1 fsck.h: update FSCK_OPTIONS_* for object_name
 -:  ----------- >  2:  33e8b6d6545 fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
 -:  ----------- >  3:  c23f7ce9e4a fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT}
 -:  ----------- >  4:  5dde68df6c3 fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro
 1:  88b347b74ed =  5:  7ae35a6e9d2 fsck.h: indent arguments to of fsck_set_msg_type
 2:  1a60d65d2ca !  6:  dfb5f754b37 fsck.h: use use "enum object_type" instead of "int"
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    fsck.h: use use "enum object_type" instead of "int"
    +    fsck.h: use "enum object_type" instead of "int"
     
         Change the fsck_walk_func to use an "enum object_type" instead of an
         "int" type. The types are compatible, and ever since this was added in
 3:  24761f269b7 !  7:  fd58ec73c6b fsck.c: rename variables in fsck_set_msg_type() for less confusion
    @@ Commit message
         It was needlessly confusing that it took a "msg_type" argument, but
         then later declared another "msg_type" of a different type.
     
    -    Let's rename that to "tmp", and rename "id" to "msg_id" and "msg_id"
    -    to "msg_id_str" etc. This will make a follow-up change smaller.
    +    Let's rename that to "severity", and rename "id" to "msg_id" and
    +    "msg_id" to "msg_id_str" etc. This will make a follow-up change
    +    smaller.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ fsck.c: int is_valid_msg_type(const char *msg_id, const char *msg_type)
      		int i;
     -		int *msg_type;
     -		ALLOC_ARRAY(msg_type, FSCK_MSG_MAX);
    -+		int *tmp;
    -+		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
    ++		int *severity;
    ++		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
      		for (i = 0; i < FSCK_MSG_MAX; i++)
     -			msg_type[i] = fsck_msg_type(i, options);
     -		options->msg_type = msg_type;
    -+			tmp[i] = fsck_msg_type(i, options);
    -+		options->msg_type = tmp;
    ++			severity[i] = fsck_msg_type(i, options);
    ++		options->msg_type = severity;
      	}
      
     -	options->msg_type[id] = type;
 4:  fb4c66f9305 =  8:  48cb4d3bb70 fsck.c: move definition of msg_id into append_msg_id()
 5:  a129dbd9964 !  9:  2c80ad32038 fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
    @@ Commit message
         "msg_id". This change is relatively small, and is worth the churn for
         a later change where we have different id's in the "report" function.
     
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
    +
      ## fsck.c ##
     @@ fsck.c: void fsck_set_msg_types(struct fsck_options *options, const char *values)
      	free(to_free);
 -:  ----------- > 10:  92dfbdfb624 fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
 6:  d9bee41072e ! 11:  c1c476af69b fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
    @@ Commit message
          - f27d05b1704 (fsck: allow upgrading fsck warnings to errors,
            2015-06-22)
     
    +    The reason these were defined in two different places is because we
    +    use FSCK_{IGNORE,INFO,FATAL} only in fsck.c, but FSCK_{ERROR,WARN} are
    +    used by external callbacks.
    +
    +    Untangling that would take some more work, since we expose the new
    +    "enum fsck_msg_type" to both. Similar to "enum object_type" it's not
    +    worth structuring the API in such a way that only those who need
    +    FSCK_{ERROR,WARN} pass around a different type.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/fsck.c ##
    @@ builtin/fsck.c: static int objerror(struct object *obj, const char *err)
      	switch (msg_type) {
      	case FSCK_WARN:
     
    + ## builtin/index-pack.c ##
    +@@ builtin/index-pack.c: static void show_pack_info(int stat_only)
    + static int print_dangling_gitmodules(struct fsck_options *o,
    + 				     const struct object_id *oid,
    + 				     enum object_type object_type,
    +-				     int msg_type, const char *message)
    ++				     enum fsck_msg_type msg_type,
    ++				     const char *message)
    + {
    + 	/*
    + 	 * NEEDSWORK: Plumb the MSG_ID (from fsck.c) here and use it
    +
      ## builtin/mktag.c ##
     @@ builtin/mktag.c: static int mktag_config(const char *var, const char *value, void *cb)
      static int mktag_fsck_error_func(struct fsck_options *o,
    @@ fsck.c: void list_config_fsck_msg_ids(struct string_list *list, const char *pref
     +static enum fsck_msg_type fsck_msg_type(enum fsck_msg_id msg_id,
      	struct fsck_options *options)
      {
    --	int msg_type;
    -+	enum fsck_msg_type msg_type;
    - 
      	assert(msg_id >= 0 && msg_id < FSCK_MSG_MAX);
      
    + 	if (!options->msg_type) {
    +-		int msg_type = msg_id_info[msg_id].msg_type;
    ++		enum fsck_msg_type msg_type = msg_id_info[msg_id].msg_type;
    + 
    + 		if (options->strict && msg_type == FSCK_WARN)
    + 			msg_type = FSCK_ERROR;
     @@ fsck.c: static int fsck_msg_type(enum fsck_msg_id msg_id,
    - 	return msg_type;
    + 	return options->msg_type[msg_id];
      }
      
     -static int parse_msg_type(const char *str)
    @@ fsck.c: void fsck_set_msg_type(struct fsck_options *options,
      
      	if (!options->msg_type) {
      		int i;
    --		int *tmp;
    -+		enum fsck_msg_type *tmp;
    - 		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
    +-		int *severity;
    ++		enum fsck_msg_type *severity;
    + 		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
      		for (i = 0; i < FSCK_MSG_MAX; i++)
    - 			tmp[i] = fsck_msg_type(i, options);
    + 			severity[i] = fsck_msg_type(i, options);
     @@ fsck.c: static int report(struct fsck_options *options,
      {
      	va_list ap;
    @@ fsck.h
     -#define FSCK_ERROR 1
     -#define FSCK_WARN 2
     -#define FSCK_IGNORE 3
    --
     +enum fsck_msg_type {
    -+	FSCK_INFO = -2,
    ++	FSCK_INFO  = -2,
     +	FSCK_FATAL = -1,
     +	FSCK_ERROR = 1,
     +	FSCK_WARN,
     +	FSCK_IGNORE
     +};
    + 
      struct fsck_options;
      struct object;
    - 
     @@ fsck.h: typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
      /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
      typedef int (*fsck_error)(struct fsck_options *o,
 -:  ----------- > 12:  d55587719a5 fsck.h: re-order and re-assign "enum fsck_msg_type"
 7:  423568026c3 = 13:  32828d1c78c fsck.c: call parse_msg_type() early in fsck_set_msg_type()
 8:  cb43e832738 = 14:  5c62066235c fsck.c: undefine temporary STR macro after use
 9:  2cd14cb4e2a = 15:  f8e50fbf7d3 fsck.c: give "FOREACH_MSG_ID" a more specific name
10:  1ada154ef23 ! 16:  cd74dee8769 fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
    @@ fsck.c
      ## fsck.h ##
     @@ fsck.h: enum fsck_msg_type {
      	FSCK_WARN,
    - 	FSCK_IGNORE
      };
    -+
    + 
     +#define FOREACH_FSCK_MSG_ID(FUNC) \
     +	/* fatal errors */ \
     +	FUNC(NUL_IN_HEADER, FATAL) \
11:  c4179445f22 ! 17:  234e287d081 fsck.c: pass along the fsck_msg_id in the fsck_error callback
    @@ builtin/fsck.c: static int objerror(struct object *obj, const char *err)
      	switch (msg_type) {
      	case FSCK_WARN:
     
    + ## builtin/index-pack.c ##
    +@@ builtin/index-pack.c: static int print_dangling_gitmodules(struct fsck_options *o,
    + 				     const struct object_id *oid,
    + 				     enum object_type object_type,
    + 				     enum fsck_msg_type msg_type,
    ++				     enum fsck_msg_id msg_id,
    + 				     const char *message)
    + {
    + 	/*
    +@@ builtin/index-pack.c: static int print_dangling_gitmodules(struct fsck_options *o,
    + 		printf("%s\n", oid_to_hex(oid));
    + 		return 0;
    + 	}
    +-	return fsck_error_function(o, oid, object_type, msg_type, message);
    ++	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
    + }
    + 
    + int cmd_index_pack(int argc, const char **argv, const char *prefix)
    +
      ## builtin/mktag.c ##
     @@ builtin/mktag.c: static int mktag_fsck_error_func(struct fsck_options *o,
      				 const struct object_id *oid,
12:  c1fc724f0e8 ! 18:  8049dc07391 fsck.c: add an fsck_set_msg_type() API that takes enums
    @@ fsck.c: int is_valid_msg_type(const char *msg_id, const char *msg_type)
     +{
     +	if (!options->msg_type) {
     +		int i;
    -+		enum fsck_msg_type *tmp;
    -+		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
    ++		enum fsck_msg_type *severity;
    ++		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
     +		for (i = 0; i < FSCK_MSG_MAX; i++)
    -+			tmp[i] = fsck_msg_type(i, options);
    -+		options->msg_type = tmp;
    ++			severity[i] = fsck_msg_type(i, options);
    ++		options->msg_type = severity;
     +	}
     +
     +	options->msg_type[msg_id] = msg_type;
    @@ fsck.c: void fsck_set_msg_type(struct fsck_options *options,
      
     -	if (!options->msg_type) {
     -		int i;
    --		enum fsck_msg_type *tmp;
    --		ALLOC_ARRAY(tmp, FSCK_MSG_MAX);
    +-		enum fsck_msg_type *severity;
    +-		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
     -		for (i = 0; i < FSCK_MSG_MAX; i++)
    --			tmp[i] = fsck_msg_type(i, options);
    --		options->msg_type = tmp;
    +-			severity[i] = fsck_msg_type(i, options);
    +-		options->msg_type = severity;
     -	}
     -
     -	options->msg_type[msg_id] = msg_type;
14:  29ff97856ff ! 19:  4224a29d15c fsck.c: move gitmodules_{found,done} into fsck_options
    @@ Commit message
         fsck_options struct. It makes sense to keep all the context in the
         same place.
     
    +    This requires changing the recently added register_found_gitmodules()
    +    function added in 5476e1efde (fetch-pack: print and use dangling
    +    .gitmodules, 2021-02-22) to take fsck_options. That function will be
    +    removed in a subsequent commit, but as it'll require the new
    +    gitmodules_found attribute of "fsck_options" we need this intermediate
    +    step first.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    + ## fetch-pack.c ##
    +@@ fetch-pack.c: static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
    + 
    + 	oidset_iter_init(gitmodules_oids, &iter);
    + 	while ((oid = oidset_iter_next(&iter)))
    +-		register_found_gitmodules(oid);
    ++		register_found_gitmodules(&fo, oid);
    + 	if (fsck_finish(&fo))
    + 		die("fsck failed");
    + }
    +
      ## fsck.c ##
     @@
      #include "credential.h"
    @@ fsck.c: static int fsck_blob(const struct object_id *oid, const char *buf,
      
      	if (object_on_skiplist(options, oid))
      		return 0;
    +@@ fsck.c: int fsck_error_function(struct fsck_options *o,
    + 	return 1;
    + }
    + 
    +-void register_found_gitmodules(const struct object_id *oid)
    ++void register_found_gitmodules(struct fsck_options *options, const struct object_id *oid)
    + {
    +-	oidset_insert(&gitmodules_found, oid);
    ++	oidset_insert(&options->gitmodules_found, oid);
    + }
    + 
    + int fsck_finish(struct fsck_options *options)
     @@ fsck.c: int fsck_finish(struct fsck_options *options)
      	struct oidset_iter iter;
      	const struct object_id *oid;
    @@ fsck.h: struct fsck_options {
      	kh_oid_map_t *object_names;
      };
      
    --#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
    --#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
    -+#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, OIDSET_INIT, OIDSET_INIT, NULL }
    -+#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, OIDSET_INIT, OIDSET_INIT, NULL }
    +@@ fsck.h: struct fsck_options {
    + 	.walk = NULL, \
    + 	.msg_type = NULL, \
    + 	.skiplist = OIDSET_INIT, \
    ++	.gitmodules_found = OIDSET_INIT, \
    ++	.gitmodules_done = OIDSET_INIT, \
    + 	.object_names = NULL,
    + #define FSCK_OPTIONS_COMMON_ERROR_FUNC \
    + 	FSCK_OPTIONS_COMMON \
    +@@ fsck.h: int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
    + int fsck_object(struct object *obj, void *data, unsigned long size,
    + 	struct fsck_options *options);
    + 
    +-void register_found_gitmodules(const struct object_id *oid);
    ++void register_found_gitmodules(struct fsck_options *options,
    ++			       const struct object_id *oid);
      
    - /* descend in all linked child objects
    -  * the return value is:
    + /*
    +  * fsck a tag, and pass info about it back to the caller. This is
 -:  ----------- > 20:  40b13468129 fetch-pack: don't needlessly copy fsck_options
 -:  ----------- > 21:  8e418abfbd7 fetch-pack: use file-scope static struct for fsck_options
 -:  ----------- > 22:  113de190f7d fetch-pack: use new fsck API to print dangling submodules
-- 
2.31.0.rc0.126.g04f22c5b82


* [PATCH v3 01/22] fsck.h: update FSCK_OPTIONS_* for object_name
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
  2021-02-18 22:19               ` Junio C Hamano
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 02/22] fsck.h: use designated initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
                                 ` (20 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Add the object_names member to the initialization macros. This was
omitted in 7b35efd734e (fsck_walk(): optionally name objects on the
go, 2016-07-17) when the field was added.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fsck.h b/fsck.h
index 733378f1260..2274843ba0c 100644
--- a/fsck.h
+++ b/fsck.h
@@ -43,8 +43,8 @@ struct fsck_options {
 	kh_oid_map_t *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT }
+#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
+#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.31.0.rc0.126.g04f22c5b82


* [PATCH v3 02/22] fsck.h: use designated initializers for FSCK_OPTIONS_{DEFAULT,STRICT}
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (2 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 01/22] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 03/22] fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
                                 ` (19 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/fsck.h b/fsck.h
index 2274843ba0c..40f3cb3f645 100644
--- a/fsck.h
+++ b/fsck.h
@@ -43,8 +43,22 @@ struct fsck_options {
 	kh_oid_map_t *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL, OIDSET_INIT, NULL }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL, OIDSET_INIT, NULL }
+#define FSCK_OPTIONS_DEFAULT { \
+	.walk = NULL, \
+	.error_func = fsck_error_function, \
+	.strict = 0, \
+	.msg_type = NULL, \
+	.skiplist = OIDSET_INIT, \
+	.object_names = NULL, \
+}
+#define FSCK_OPTIONS_STRICT { \
+	.walk = NULL, \
+	.error_func = fsck_error_function, \
+	.strict = 1, \
+	.msg_type = NULL, \
+	.skiplist = OIDSET_INIT, \
+	.object_names = NULL, \
+}
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.31.0.rc0.126.g04f22c5b82
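
As an illustration of what the switch to designated initializers buys,
here is a minimal sketch. It uses a cut-down stand-in struct:
"demo_options", "demo_error" and the DEMO_* macros are hypothetical
names for this example, not git's definitions. Members that are not
named in a designated initializer are zero-initialized, so adding a
field later does not require touching every initializer, which is the
kind of omission patch 01/22 had to fix by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fsck_options; not git's definition. */
struct demo_options {
	int (*error_func)(const char *message);
	unsigned strict:1;
	int *msg_type;
	void *object_names;
};

static int demo_error(const char *message)
{
	return message ? 1 : 0;
}

/* Positional form: every member must be listed, in declaration order. */
#define DEMO_OPTIONS_POSITIONAL { demo_error, 1, NULL, NULL }

/* Designated form: order is free and omitted members are zeroed. */
#define DEMO_OPTIONS_DESIGNATED { \
	.strict = 1, \
	.error_func = demo_error, \
}

/* Returns 1 if both initializer styles produce the same values. */
static int demo_same_defaults(void)
{
	struct demo_options a = DEMO_OPTIONS_POSITIONAL;
	struct demo_options b = DEMO_OPTIONS_DESIGNATED;

	/* msg_type and object_names were omitted above, so they are NULL. */
	assert(b.msg_type == NULL && b.object_names == NULL);

	return a.error_func == b.error_func &&
	       a.strict == b.strict &&
	       a.msg_type == b.msg_type &&
	       a.object_names == b.object_names;
}
```

Calling demo_same_defaults() returns 1 for the struct above.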


* [PATCH v3 03/22] fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT}
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (3 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 02/22] fsck.h: use designated initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 04/22] fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro Ævar Arnfjörð Bjarmason
                                 ` (18 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Use a temporary macro to define what FSCK_OPTIONS_{DEFAULT,STRICT}
have in common, and define the two in terms of that macro.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/fsck.h b/fsck.h
index 40f3cb3f645..ea3a907ec3b 100644
--- a/fsck.h
+++ b/fsck.h
@@ -43,22 +43,14 @@ struct fsck_options {
 	kh_oid_map_t *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { \
+#define FSCK_OPTIONS_COMMON \
 	.walk = NULL, \
 	.error_func = fsck_error_function, \
-	.strict = 0, \
 	.msg_type = NULL, \
 	.skiplist = OIDSET_INIT, \
-	.object_names = NULL, \
-}
-#define FSCK_OPTIONS_STRICT { \
-	.walk = NULL, \
-	.error_func = fsck_error_function, \
-	.strict = 1, \
-	.msg_type = NULL, \
-	.skiplist = OIDSET_INIT, \
-	.object_names = NULL, \
-}
+	.object_names = NULL,
+#define FSCK_OPTIONS_DEFAULT	{ .strict = 0, FSCK_OPTIONS_COMMON }
+#define FSCK_OPTIONS_STRICT	{ .strict = 1, FSCK_OPTIONS_COMMON }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.31.0.rc0.126.g04f22c5b82


* [PATCH v3 04/22] fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (4 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 03/22] fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 05/22] fsck.h: indent arguments of fsck_set_msg_type Ævar Arnfjörð Bjarmason
                                 ` (17 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro for those that would like
to use FSCK_OPTIONS_COMMON in their own initialization, but supply
their own error functions.

Nothing is being changed to use this yet, but in some subsequent
commits we'll make use of this macro.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fsck.h b/fsck.h
index ea3a907ec3b..dc35924cbf5 100644
--- a/fsck.h
+++ b/fsck.h
@@ -45,12 +45,15 @@ struct fsck_options {
 
 #define FSCK_OPTIONS_COMMON \
 	.walk = NULL, \
-	.error_func = fsck_error_function, \
 	.msg_type = NULL, \
 	.skiplist = OIDSET_INIT, \
 	.object_names = NULL,
-#define FSCK_OPTIONS_DEFAULT	{ .strict = 0, FSCK_OPTIONS_COMMON }
-#define FSCK_OPTIONS_STRICT	{ .strict = 1, FSCK_OPTIONS_COMMON }
+#define FSCK_OPTIONS_COMMON_ERROR_FUNC \
+	FSCK_OPTIONS_COMMON \
+	.error_func = fsck_error_function
+
+#define FSCK_OPTIONS_DEFAULT	{ .strict = 0, FSCK_OPTIONS_COMMON_ERROR_FUNC }
+#define FSCK_OPTIONS_STRICT	{ .strict = 1, FSCK_OPTIONS_COMMON_ERROR_FUNC }
 
 /* descend in all linked child objects
  * the return value is:
-- 
2.31.0.rc0.126.g04f22c5b82
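
The macro layering above can be sketched in isolation as follows. This
is only an illustration under stated assumptions: the struct, the
DEMO_* macros, and both error functions are hypothetical stand-ins,
and DEMO_OPTIONS_QUIET shows one way a caller might combine the
"common" macro with its own error function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fsck_options. */
struct demo_options {
	int (*walk)(void);
	int (*error_func)(const char *message);
	unsigned strict:1;
	int *msg_type;
};

static int demo_default_error(const char *message)
{
	return message ? 1 : 0;
}

static int demo_quiet_error(const char *message)
{
	(void)message;
	return 0;
}

/* Shared members; the trailing comma lets further members follow. */
#define DEMO_OPTIONS_COMMON \
	.walk = NULL, \
	.msg_type = NULL,
#define DEMO_OPTIONS_COMMON_ERROR_FUNC \
	DEMO_OPTIONS_COMMON \
	.error_func = demo_default_error

#define DEMO_OPTIONS_DEFAULT { .strict = 0, DEMO_OPTIONS_COMMON_ERROR_FUNC }
#define DEMO_OPTIONS_STRICT  { .strict = 1, DEMO_OPTIONS_COMMON_ERROR_FUNC }

/* A caller supplying its own error function reuses only the common part. */
#define DEMO_OPTIONS_QUIET { \
	.strict = 1, \
	DEMO_OPTIONS_COMMON \
	.error_func = demo_quiet_error \
}

static int demo_report(const struct demo_options *o, const char *message)
{
	assert(o->error_func);
	return o->error_func(message);
}
```

With these definitions, an object initialized from DEMO_OPTIONS_DEFAULT
reports through demo_default_error(), while one initialized from
DEMO_OPTIONS_QUIET reports through demo_quiet_error().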


* [PATCH v3 05/22] fsck.h: indent arguments of fsck_set_msg_type
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (5 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 04/22] fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 06/22] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
                                 ` (16 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fsck.h b/fsck.h
index dc35924cbf5..5e488cef6b3 100644
--- a/fsck.h
+++ b/fsck.h
@@ -11,7 +11,7 @@ struct fsck_options;
 struct object;
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type);
+		       const char *msg_id, const char *msg_type);
 void fsck_set_msg_types(struct fsck_options *options, const char *values);
 int is_valid_msg_type(const char *msg_id, const char *msg_type);
 
-- 
2.31.0.rc0.126.g04f22c5b82


* [PATCH v3 06/22] fsck.h: use "enum object_type" instead of "int"
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (6 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 05/22] fsck.h: indent arguments of fsck_set_msg_type Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 07/22] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
                                 ` (15 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the fsck_walk_func to use an "enum object_type" instead of an
"int" type. The types are compatible, and ever since this was added in
355885d5315 (add generic, type aware object chain walker, 2008-02-25)
we've used entries from object_type (OBJ_BLOB etc.).

So this doesn't really change anything as far as the generated code is
concerned; it just gives the compiler more information and makes this
easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c           | 3 ++-
 builtin/index-pack.c     | 3 ++-
 builtin/unpack-objects.c | 3 ++-
 fsck.h                   | 3 ++-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 821e7798c70..68f0329e69e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -197,7 +197,8 @@ static int traverse_reachable(void)
 	return !!result;
 }
 
-static int mark_used(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_used(struct object *obj, enum object_type object_type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return 1;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index bad57488079..69f24fe9f76 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -212,7 +212,8 @@ static void cleanup_thread(void)
 	free(thread_data);
 }
 
-static int mark_link(struct object *obj, int type, void *data, struct fsck_options *options)
+static int mark_link(struct object *obj, enum object_type type,
+		     void *data, struct fsck_options *options)
 {
 	if (!obj)
 		return -1;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index dd4a75e030d..ca54fd16688 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -187,7 +187,8 @@ static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf)
  * that have reachability requirements and calls this function.
  * Verify its reachability and validity recursively and write it out.
  */
-static int check_object(struct object *obj, int type, void *data, struct fsck_options *options)
+static int check_object(struct object *obj, enum object_type type,
+			void *data, struct fsck_options *options)
 {
 	struct obj_buffer *obj_buf;
 
diff --git a/fsck.h b/fsck.h
index 5e488cef6b3..f67edd8f1f9 100644
--- a/fsck.h
+++ b/fsck.h
@@ -23,7 +23,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type);
  *     <0	error signaled and abort
  *     >0	error signaled and do not abort
  */
-typedef int (*fsck_walk_func)(struct object *obj, int type, void *data, struct fsck_options *options);
+typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
+			      void *data, struct fsck_options *options);
 
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
-- 
2.31.0.rc0.126.g04f22c5b82
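
For readers unfamiliar with the callback shape being retyped here, the
following is a rough sketch of a walker that takes an enum-typed
callback and follows the return-value convention documented in fsck.h
(negative aborts, positive signals an error but continues). All
"demo_" names, the object struct, and the walker itself are
assumptions made for this example; they are not git's object walker.

```c
#include <assert.h>
#include <stddef.h>

enum demo_object_type { DEMO_OBJ_COMMIT = 1, DEMO_OBJ_TREE, DEMO_OBJ_BLOB };

struct demo_object {
	enum demo_object_type type;
	int broken;
};

/* The callback receives an enum, not a bare int, for the object type. */
typedef int (*demo_walk_func)(struct demo_object *obj,
			      enum demo_object_type type, void *data);

static int demo_walk(struct demo_object *objs, size_t nr,
		     demo_walk_func walk, void *data)
{
	int result = 0;
	size_t i;

	assert(walk);
	for (i = 0; i < nr; i++) {
		int ret = walk(&objs[i], objs[i].type, data);

		if (ret < 0)
			return ret;	/* error signaled: abort */
		if (ret > 0)
			result = ret;	/* error signaled: keep going */
	}
	return result;
}

/* Example callback: counts blobs and flags broken objects. */
static int demo_count_blobs(struct demo_object *obj,
			    enum demo_object_type type, void *data)
{
	if (!obj)
		return -1;
	if (type == DEMO_OBJ_BLOB)
		(*(int *)data)++;
	return obj->broken ? 1 : 0;
}
```

Passing demo_count_blobs to demo_walk() with a pointer to an int
counter visits every object, counts the blobs, and returns nonzero if
any object was flagged as broken.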


* [PATCH v3 07/22] fsck.c: rename variables in fsck_set_msg_type() for less confusion
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (7 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 06/22] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 08/22] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
                                 ` (14 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename variables in a function added in 0282f4dced0 (fsck: offer a
function to demote fsck errors to warnings, 2015-06-22).

It was needlessly confusing that it took a "msg_type" argument, but
then later declared another "msg_type" of a different type.

Let's rename that to "severity", and rename "id" to "msg_id" and
"msg_id" to "msg_id_str" etc. This will make a follow-up change
smaller.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/fsck.c b/fsck.c
index e3030f3b358..0a9ac9ca070 100644
--- a/fsck.c
+++ b/fsck.c
@@ -203,27 +203,27 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 }
 
 void fsck_set_msg_type(struct fsck_options *options,
-		const char *msg_id, const char *msg_type)
+		const char *msg_id_str, const char *msg_type_str)
 {
-	int id = parse_msg_id(msg_id), type;
+	int msg_id = parse_msg_id(msg_id_str), msg_type;
 
-	if (id < 0)
-		die("Unhandled message id: %s", msg_id);
-	type = parse_msg_type(msg_type);
+	if (msg_id < 0)
+		die("Unhandled message id: %s", msg_id_str);
+	msg_type = parse_msg_type(msg_type_str);
 
-	if (type != FSCK_ERROR && msg_id_info[id].msg_type == FSCK_FATAL)
-		die("Cannot demote %s to %s", msg_id, msg_type);
+	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
+		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
 
 	if (!options->msg_type) {
 		int i;
-		int *msg_type;
-		ALLOC_ARRAY(msg_type, FSCK_MSG_MAX);
+		int *severity;
+		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
-			msg_type[i] = fsck_msg_type(i, options);
-		options->msg_type = msg_type;
+			severity[i] = fsck_msg_type(i, options);
+		options->msg_type = severity;
 	}
 
-	options->msg_type[id] = type;
+	options->msg_type[msg_id] = msg_type;
 }
 
 void fsck_set_msg_types(struct fsck_options *options, const char *values)
-- 
2.31.0.rc0.126.g04f22c5b82
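
The function being touched here allocates the per-message severity
table lazily, on the first override, and seeds it from the current
defaults. A simplified sketch of that pattern is below. The "demo_"
names, the three-entry default table, and the use of plain malloc()
instead of ALLOC_ARRAY() are assumptions for the sake of a
self-contained example.

```c
#include <assert.h>
#include <stdlib.h>

enum { DEMO_ERROR = 1, DEMO_WARN = 2, DEMO_IGNORE = 3 };
enum { DEMO_MSG_MAX = 3 };

/* Hypothetical built-in severities, indexed by message id. */
static const int demo_defaults[DEMO_MSG_MAX] = {
	DEMO_ERROR, DEMO_WARN, DEMO_WARN
};

struct demo_options {
	int *msg_type;	/* NULL until the first override */
};

static int demo_msg_type(int msg_id, const struct demo_options *options)
{
	assert(msg_id >= 0 && msg_id < DEMO_MSG_MAX);
	return options->msg_type ? options->msg_type[msg_id]
				 : demo_defaults[msg_id];
}

static void demo_set_msg_type(struct demo_options *options,
			      int msg_id, int msg_type)
{
	assert(msg_id >= 0 && msg_id < DEMO_MSG_MAX);

	if (!options->msg_type) {
		int i;
		int *severity = malloc(sizeof(*severity) * DEMO_MSG_MAX);

		if (!severity)
			abort();
		/* Seed the new table with the severities in effect so far. */
		for (i = 0; i < DEMO_MSG_MAX; i++)
			severity[i] = demo_msg_type(i, options);
		options->msg_type = severity;
	}

	options->msg_type[msg_id] = msg_type;
}
```

After the first demo_set_msg_type() call, lookups read from the
allocated table; entries that were never overridden keep their default
severity.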


* [PATCH v3 08/22] fsck.c: move definition of msg_id into append_msg_id()
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (8 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 07/22] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 09/22] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
                                 ` (13 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Refactor code added in 71ab8fa840f (fsck: report the ID of the
error/warning, 2015-06-22) to resolve the msg_id to a string in the
function that wants it, instead of doing it in report().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index 0a9ac9ca070..b977493f57a 100644
--- a/fsck.c
+++ b/fsck.c
@@ -264,8 +264,9 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 	free(to_free);
 }
 
-static void append_msg_id(struct strbuf *sb, const char *msg_id)
+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
 {
+	const char *msg_id = msg_id_info[id].id_string;
 	for (;;) {
 		char c = *(msg_id)++;
 
@@ -308,7 +309,7 @@ static int report(struct fsck_options *options,
 	else if (msg_type == FSCK_INFO)
 		msg_type = FSCK_WARN;
 
-	append_msg_id(&sb, msg_id_info[id].id_string);
+	append_msg_id(&sb, id);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.31.0.rc0.126.g04f22c5b82


* [PATCH v3 09/22] fsck.c: rename remaining fsck_msg_id "id" to "msg_id"
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (9 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 08/22] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 10/22] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" Ævar Arnfjörð Bjarmason
                                 ` (12 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename the remaining variables of type fsck_msg_id from "id" to
"msg_id". This change is relatively small, and is worth the churn for
a later change where we have different IDs in the "report" function.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fsck.c b/fsck.c
index b977493f57a..6b72ddaa51d 100644
--- a/fsck.c
+++ b/fsck.c
@@ -264,19 +264,19 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values)
 	free(to_free);
 }
 
-static void append_msg_id(struct strbuf *sb, enum fsck_msg_id id)
+static void append_msg_id(struct strbuf *sb, enum fsck_msg_id msg_id)
 {
-	const char *msg_id = msg_id_info[id].id_string;
+	const char *msg_id_str = msg_id_info[msg_id].id_string;
 	for (;;) {
-		char c = *(msg_id)++;
+		char c = *(msg_id_str)++;
 
 		if (!c)
 			break;
 		if (c != '_')
 			strbuf_addch(sb, tolower(c));
 		else {
-			assert(*msg_id);
-			strbuf_addch(sb, *(msg_id)++);
+			assert(*msg_id_str);
+			strbuf_addch(sb, *(msg_id_str)++);
 		}
 	}
 
@@ -292,11 +292,11 @@ static int object_on_skiplist(struct fsck_options *opts,
 __attribute__((format (printf, 5, 6)))
 static int report(struct fsck_options *options,
 		  const struct object_id *oid, enum object_type object_type,
-		  enum fsck_msg_id id, const char *fmt, ...)
+		  enum fsck_msg_id msg_id, const char *fmt, ...)
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(id, options), result;
+	int msg_type = fsck_msg_type(msg_id, options), result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -309,7 +309,7 @@ static int report(struct fsck_options *options,
 	else if (msg_type == FSCK_INFO)
 		msg_type = FSCK_WARN;
 
-	append_msg_id(&sb, id);
+	append_msg_id(&sb, msg_id);
 
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
-- 
2.31.0.rc0.126.g04f22c5b82
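
The loop in append_msg_id() turns an upper-case, underscore-separated
id string into the camelCase form seen in fsck output (for example
"gitmodulesMissing"). A standalone sketch of the same transformation
follows. It writes into a plain char buffer rather than a strbuf, and
"demo_camelcase_msg_id" is a hypothetical name. Unlike the original,
which asserts that a character follows each '_', this version simply
ignores a trailing underscore.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Writes the camelCased form of msg_id_str into out (of size outlen). */
static void demo_camelcase_msg_id(const char *msg_id_str,
				  char *out, size_t outlen)
{
	size_t n = 0;

	assert(outlen > 0);
	while (*msg_id_str && n + 1 < outlen) {
		char c = *msg_id_str++;

		if (c != '_') {
			out[n++] = (char)tolower((unsigned char)c);
		} else if (*msg_id_str) {
			/* Keep the character after '_' as-is (upper case). */
			out[n++] = *msg_id_str++;
		}
	}
	out[n] = '\0';
}
```

Given "NUL_IN_HEADER" this produces "nulInHeader", and given
"GITMODULES_MISSING" it produces "gitmodulesMissing".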


* [PATCH v3 10/22] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type"
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (10 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 09/22] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 11/22] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
                                 ` (11 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Refactor "if options->msg_type" and other code added in
0282f4dced0 (fsck: offer a function to demote fsck errors to warnings,
2015-06-22) to reduce the scope of the "int msg_type" variable.

This is in preparation for changing its type in a subsequent commit;
only using it in the "!options->msg_type" scope makes that change
smaller.

This also brings the code in line with the fsck_set_msg_type()
function (also added in 0282f4dced0), which does a similar check for
"!options->msg_type". Another minor benefit is getting rid of the
style violation of not having braces for the body of the "if".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/fsck.c b/fsck.c
index 6b72ddaa51d..0988ab65792 100644
--- a/fsck.c
+++ b/fsck.c
@@ -167,19 +167,17 @@ void list_config_fsck_msg_ids(struct string_list *list, const char *prefix)
 static int fsck_msg_type(enum fsck_msg_id msg_id,
 	struct fsck_options *options)
 {
-	int msg_type;
-
 	assert(msg_id >= 0 && msg_id < FSCK_MSG_MAX);
 
-	if (options->msg_type)
-		msg_type = options->msg_type[msg_id];
-	else {
-		msg_type = msg_id_info[msg_id].msg_type;
+	if (!options->msg_type) {
+		int msg_type = msg_id_info[msg_id].msg_type;
+
 		if (options->strict && msg_type == FSCK_WARN)
 			msg_type = FSCK_ERROR;
+		return msg_type;
 	}
 
-	return msg_type;
+	return options->msg_type[msg_id];
 }
 
 static int parse_msg_type(const char *str)
-- 
2.31.0.rc0.126.g04f22c5b82
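
The control flow after this refactoring can be tried out in isolation
with the sketch below. It mirrors the early-return structure of the
new fsck_msg_type(): with no override table, the built-in severity is
used and warnings are promoted to errors in strict mode; with a table,
the table wins. The "demo_" names and the two-entry default table are
assumptions for this example.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_ERROR = 1, DEMO_WARN = 2, DEMO_IGNORE = 3 };
enum { DEMO_MSG_MAX = 2 };

/* Hypothetical built-in severities, indexed by message id. */
static const int demo_default_type[DEMO_MSG_MAX] = { DEMO_ERROR, DEMO_WARN };

struct demo_options {
	unsigned strict:1;
	int *msg_type;
};

static int demo_msg_type(int msg_id, const struct demo_options *options)
{
	assert(msg_id >= 0 && msg_id < DEMO_MSG_MAX);

	if (!options->msg_type) {
		/* The local variable is only needed in this scope. */
		int msg_type = demo_default_type[msg_id];

		if (options->strict && msg_type == DEMO_WARN)
			msg_type = DEMO_ERROR;
		return msg_type;
	}

	return options->msg_type[msg_id];
}
```

Note that the strict promotion only applies on the default path; once
an override table exists, its entries are returned unchanged.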


* [PATCH v3 11/22] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (11 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 10/22] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 12/22] fsck.h: re-order and re-assign "enum fsck_msg_type" Ævar Arnfjörð Bjarmason
                                 ` (10 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new
fsck_msg_type enum.

These defines were originally introduced in:

 - ba002f3b28a (builtin-fsck: move common object checking code to
   fsck.c, 2008-02-25)
 - f50c4407305 (fsck: disallow demoting grave fsck errors to warnings,
   2015-06-22)
 - efaba7cc77f (fsck: optionally ignore specific fsck issues
   completely, 2015-06-22)
 - f27d05b1704 (fsck: allow upgrading fsck warnings to errors,
   2015-06-22)

The reason these were defined in two different places is that we
use FSCK_{IGNORE,INFO,FATAL} only in fsck.c, but FSCK_{ERROR,WARN} are
used by external callbacks.

Untangling that would take some more work, since we expose the new
"enum fsck_msg_type" to both. Similar to "enum object_type" it's not
worth structuring the API in such a way that only those who need
FSCK_{ERROR,WARN} pass around a different type.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c       |  2 +-
 builtin/index-pack.c |  3 ++-
 builtin/mktag.c      |  3 ++-
 fsck.c               | 21 ++++++++++-----------
 fsck.h               | 16 ++++++++++------
 5 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 68f0329e69e..d6d745dc702 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -89,7 +89,7 @@ static int objerror(struct object *obj, const char *err)
 static int fsck_error_func(struct fsck_options *o,
 			   const struct object_id *oid,
 			   enum object_type object_type,
-			   int msg_type, const char *message)
+			   enum fsck_msg_type msg_type, const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 69f24fe9f76..56b8efaa89b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1716,7 +1716,8 @@ static void show_pack_info(int stat_only)
 static int print_dangling_gitmodules(struct fsck_options *o,
 				     const struct object_id *oid,
 				     enum object_type object_type,
-				     int msg_type, const char *message)
+				     enum fsck_msg_type msg_type,
+				     const char *message)
 {
 	/*
 	 * NEEDSWORK: Plumb the MSG_ID (from fsck.c) here and use it
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 41a399a69e4..1834394a9b6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -22,7 +22,8 @@ static int mktag_config(const char *var, const char *value, void *cb)
 static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
-				 int msg_type, const char *message)
+				 enum fsck_msg_type msg_type,
+				 const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/fsck.c b/fsck.c
index 0988ab65792..fb7d071bbf9 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,9 +22,6 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FSCK_FATAL -1
-#define FSCK_INFO -2
-
 #define FOREACH_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
@@ -97,7 +94,7 @@ static struct {
 	const char *id_string;
 	const char *downcased;
 	const char *camelcased;
-	int msg_type;
+	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
 	FOREACH_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
@@ -164,13 +161,13 @@ void list_config_fsck_msg_ids(struct string_list *list, const char *prefix)
 		list_config_item(list, prefix, msg_id_info[i].camelcased);
 }
 
-static int fsck_msg_type(enum fsck_msg_id msg_id,
+static enum fsck_msg_type fsck_msg_type(enum fsck_msg_id msg_id,
 	struct fsck_options *options)
 {
 	assert(msg_id >= 0 && msg_id < FSCK_MSG_MAX);
 
 	if (!options->msg_type) {
-		int msg_type = msg_id_info[msg_id].msg_type;
+		enum fsck_msg_type msg_type = msg_id_info[msg_id].msg_type;
 
 		if (options->strict && msg_type == FSCK_WARN)
 			msg_type = FSCK_ERROR;
@@ -180,7 +177,7 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 	return options->msg_type[msg_id];
 }
 
-static int parse_msg_type(const char *str)
+static enum fsck_msg_type parse_msg_type(const char *str)
 {
 	if (!strcmp(str, "error"))
 		return FSCK_ERROR;
@@ -203,7 +200,8 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
-	int msg_id = parse_msg_id(msg_id_str), msg_type;
+	int msg_id = parse_msg_id(msg_id_str);
+	enum fsck_msg_type msg_type;
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
@@ -214,7 +212,7 @@ void fsck_set_msg_type(struct fsck_options *options,
 
 	if (!options->msg_type) {
 		int i;
-		int *severity;
+		enum fsck_msg_type *severity;
 		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
 		for (i = 0; i < FSCK_MSG_MAX; i++)
 			severity[i] = fsck_msg_type(i, options);
@@ -294,7 +292,8 @@ static int report(struct fsck_options *options,
 {
 	va_list ap;
 	struct strbuf sb = STRBUF_INIT;
-	int msg_type = fsck_msg_type(msg_id, options), result;
+	enum fsck_msg_type msg_type = fsck_msg_type(msg_id, options);
+	int result;
 
 	if (msg_type == FSCK_IGNORE)
 		return 0;
@@ -1265,7 +1264,7 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type,
-			int msg_type, const char *message)
+			enum fsck_msg_type msg_type, const char *message)
 {
 	if (msg_type == FSCK_WARN) {
 		warning("object %s: %s", fsck_describe_object(o, oid), message);
diff --git a/fsck.h b/fsck.h
index f67edd8f1f9..2ecc15eee77 100644
--- a/fsck.h
+++ b/fsck.h
@@ -3,9 +3,13 @@
 
 #include "oidset.h"
 
-#define FSCK_ERROR 1
-#define FSCK_WARN 2
-#define FSCK_IGNORE 3
+enum fsck_msg_type {
+	FSCK_INFO  = -2,
+	FSCK_FATAL = -1,
+	FSCK_ERROR = 1,
+	FSCK_WARN,
+	FSCK_IGNORE
+};
 
 struct fsck_options;
 struct object;
@@ -29,17 +33,17 @@ typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
 			  const struct object_id *oid, enum object_type object_type,
-			  int msg_type, const char *message);
+			  enum fsck_msg_type msg_type, const char *message);
 
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
-			int msg_type, const char *message);
+			enum fsck_msg_type msg_type, const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
 	fsck_error error_func;
 	unsigned strict:1;
-	int *msg_type;
+	enum fsck_msg_type *msg_type;
 	struct oidset skiplist;
 	kh_oid_map_t *object_names;
 };
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 12/22] fsck.h: re-order and re-assign "enum fsck_msg_type"
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (12 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 11/22] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 13/22] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
                                 ` (9 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the values in the "enum fsck_msg_type" from being manually
assigned to using default C enum values.

This means we end up with a FSCK_IGNORE=0, which was previously
defined as "3".

I'm confident that nothing relies on these values; we always compare
them explicitly. Let's not omit "0" so it won't be assumed that we're
using these as a boolean somewhere.

This also allows us to re-structure the fields to mark which are
"private" vs. "public". See the preceding commit for a rationale for
not simply splitting these into two enums, namely that this is used
for both the private and public fsck API.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.h | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
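
For readers following along (not part of the patch): a minimal sketch of
the values a C compiler assigns once the explicit initializers are gone.
The enum is copied from the post-image of the hunk below;
check_default_values() and is_public_msg_type() are hypothetical helpers
that exist only for this illustration and are not in fsck.c.

```c
#include <assert.h>

/* Copy of the enum as it looks after this patch. */
enum fsck_msg_type {
	/* for internal use only */
	FSCK_IGNORE,
	FSCK_INFO,
	FSCK_FATAL,
	/* "public", fed to e.g. error_func callbacks */
	FSCK_ERROR,
	FSCK_WARN,
};

/*
 * Hypothetical helper: with no initializers, enumerators count up
 * from zero in declaration order.
 */
void check_default_values(void)
{
	assert(FSCK_IGNORE == 0);
	assert(FSCK_INFO == 1);
	assert(FSCK_FATAL == 2);
	assert(FSCK_ERROR == 3);
	assert(FSCK_WARN == 4);
}

/*
 * Hypothetical helper: compares against the enumerators by name, so
 * it keeps working no matter which numeric values they carry.
 */
int is_public_msg_type(enum fsck_msg_type t)
{
	return t == FSCK_ERROR || t == FSCK_WARN;
}
```

Code written like is_public_msg_type() is unaffected by the renumbering,
which is the property the commit message relies on.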

diff --git a/fsck.h b/fsck.h
index 2ecc15eee77..fce9981a0cb 100644
--- a/fsck.h
+++ b/fsck.h
@@ -4,11 +4,13 @@
 #include "oidset.h"
 
 enum fsck_msg_type {
-	FSCK_INFO  = -2,
-	FSCK_FATAL = -1,
-	FSCK_ERROR = 1,
+	/* for internal use only */
+	FSCK_IGNORE,
+	FSCK_INFO,
+	FSCK_FATAL,
+	/* "public", fed to e.g. error_func callbacks */
+	FSCK_ERROR,
 	FSCK_WARN,
-	FSCK_IGNORE
 };
 
 struct fsck_options;
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 13/22] fsck.c: call parse_msg_type() early in fsck_set_msg_type()
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (13 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 12/22] fsck.h: re-order and re-assign "enum fsck_msg_type" Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 14/22] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
                                 ` (8 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

There's no reason to defer the call to parse_msg_type() until after
we've checked whether "msg_id < 0". This is not a hot codepath, and
parse_msg_type() itself may die on invalid input.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index fb7d071bbf9..2ccf1a2f0fd 100644
--- a/fsck.c
+++ b/fsck.c
@@ -201,11 +201,10 @@ void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
 	int msg_id = parse_msg_id(msg_id_str);
-	enum fsck_msg_type msg_type;
+	enum fsck_msg_type msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_id < 0)
 		die("Unhandled message id: %s", msg_id_str);
-	msg_type = parse_msg_type(msg_type_str);
 
 	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
 		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 14/22] fsck.c: undefine temporary STR macro after use
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (14 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 13/22] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 15/22] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
                                 ` (7 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

In f417eed8cde (fsck: provide a function to parse fsck message IDs,
2015-06-22) the "STR" macro was introduced, but that short macro name
was not undefined after use as was done earlier in the same series for
the MSG_ID macro in c99ba492f1c (fsck: introduce identifiers for fsck
messages, 2015-06-22).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fsck.c b/fsck.c
index 2ccf1a2f0fd..f4c924ed044 100644
--- a/fsck.c
+++ b/fsck.c
@@ -100,6 +100,7 @@ static struct {
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
+#undef STR
 
 static void prepare_msg_ids(void)
 {
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 15/22] fsck.c: give "FOREACH_MSG_ID" a more specific name
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (15 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 14/22] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 16/22] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
                                 ` (6 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Rename the FOREACH_MSG_ID macro to FOREACH_FSCK_MSG_ID in preparation
for moving it over to fsck.h. It's a good convention to name macros
in *.h files in such a way as to clearly not clash with any other
names in other files.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fsck.c b/fsck.c
index f4c924ed044..6fbc56e9faa 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,7 +22,7 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FOREACH_MSG_ID(FUNC) \
+#define FOREACH_FSCK_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
 	FUNC(UNTERMINATED_HEADER, FATAL) \
@@ -83,7 +83,7 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 
 #define MSG_ID(id, msg_type) FSCK_MSG_##id,
 enum fsck_msg_id {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	FSCK_MSG_MAX
 };
 #undef MSG_ID
@@ -96,7 +96,7 @@ static struct {
 	const char *camelcased;
 	enum fsck_msg_type msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
-	FOREACH_MSG_ID(MSG_ID)
+	FOREACH_FSCK_MSG_ID(MSG_ID)
 	{ NULL, NULL, NULL, -1 }
 };
 #undef MSG_ID
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 16/22] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (16 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 15/22] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 17/22] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
                                 ` (5 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Move the FOREACH_FSCK_MSG_ID macro and the fsck_msg_id enum it helps
define from fsck.c to fsck.h. This is in preparation for having
non-static functions take the fsck_msg_id as an argument.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fsck.c | 66 ----------------------------------------------------------
 fsck.h | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+), 66 deletions(-)
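
An aside for readers unfamiliar with the pattern being moved here (not
part of the patch): FOREACH_FSCK_MSG_ID is what is often called an
"X-macro", a single list that is expanded several times with different
definitions of the per-entry macro. The sketch below is a standalone
approximation with a shortened list. All DEMO_* and demo_* names are
made up for the illustration, and the two-field table is simpler than
the real msg_id_info.

```c
#include <assert.h>
#include <string.h>

/* Shortened list; the real one has far more entries. */
#define FOREACH_DEMO_MSG_ID(FUNC) \
	FUNC(NUL_IN_HEADER, FATAL) \
	FUNC(BAD_DATE, ERROR) \
	FUNC(HAS_DOTGIT, WARN)

enum demo_msg_type { DEMO_FATAL, DEMO_ERROR, DEMO_WARN };

/* First expansion: one enumerator per list entry. */
#define MSG_ID(id, msg_type) DEMO_MSG_##id,
enum demo_msg_id {
	FOREACH_DEMO_MSG_ID(MSG_ID)
	DEMO_MSG_MAX
};
#undef MSG_ID

/* Second expansion: one table row per entry, in the same order. */
#define STR(x) #x
#define MSG_ID(id, msg_type) { STR(id), DEMO_##msg_type },
static const struct {
	const char *id_string;
	enum demo_msg_type msg_type;
} demo_msg_id_info[DEMO_MSG_MAX] = {
	FOREACH_DEMO_MSG_ID(MSG_ID)
};
/* Drop the short helper names so they cannot leak into later code. */
#undef MSG_ID
#undef STR

/* Look up the name generated for an id. */
const char *demo_msg_id_name(enum demo_msg_id id)
{
	assert(id < DEMO_MSG_MAX);
	return demo_msg_id_info[id].id_string;
}
```

Because the enum and the table come from the same list, they cannot
drift out of sync. Putting the list and the enum in a header, as this
patch does for the real ones, lets other files name the ids while the
table stays private to one .c file.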

diff --git a/fsck.c b/fsck.c
index 6fbc56e9faa..8a66168e516 100644
--- a/fsck.c
+++ b/fsck.c
@@ -22,72 +22,6 @@
 static struct oidset gitmodules_found = OIDSET_INIT;
 static struct oidset gitmodules_done = OIDSET_INIT;
 
-#define FOREACH_FSCK_MSG_ID(FUNC) \
-	/* fatal errors */ \
-	FUNC(NUL_IN_HEADER, FATAL) \
-	FUNC(UNTERMINATED_HEADER, FATAL) \
-	/* errors */ \
-	FUNC(BAD_DATE, ERROR) \
-	FUNC(BAD_DATE_OVERFLOW, ERROR) \
-	FUNC(BAD_EMAIL, ERROR) \
-	FUNC(BAD_NAME, ERROR) \
-	FUNC(BAD_OBJECT_SHA1, ERROR) \
-	FUNC(BAD_PARENT_SHA1, ERROR) \
-	FUNC(BAD_TAG_OBJECT, ERROR) \
-	FUNC(BAD_TIMEZONE, ERROR) \
-	FUNC(BAD_TREE, ERROR) \
-	FUNC(BAD_TREE_SHA1, ERROR) \
-	FUNC(BAD_TYPE, ERROR) \
-	FUNC(DUPLICATE_ENTRIES, ERROR) \
-	FUNC(MISSING_AUTHOR, ERROR) \
-	FUNC(MISSING_COMMITTER, ERROR) \
-	FUNC(MISSING_EMAIL, ERROR) \
-	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_OBJECT, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
-	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
-	FUNC(MISSING_TAG, ERROR) \
-	FUNC(MISSING_TAG_ENTRY, ERROR) \
-	FUNC(MISSING_TREE, ERROR) \
-	FUNC(MISSING_TREE_OBJECT, ERROR) \
-	FUNC(MISSING_TYPE, ERROR) \
-	FUNC(MISSING_TYPE_ENTRY, ERROR) \
-	FUNC(MULTIPLE_AUTHORS, ERROR) \
-	FUNC(TREE_NOT_SORTED, ERROR) \
-	FUNC(UNKNOWN_TYPE, ERROR) \
-	FUNC(ZERO_PADDED_DATE, ERROR) \
-	FUNC(GITMODULES_MISSING, ERROR) \
-	FUNC(GITMODULES_BLOB, ERROR) \
-	FUNC(GITMODULES_LARGE, ERROR) \
-	FUNC(GITMODULES_NAME, ERROR) \
-	FUNC(GITMODULES_SYMLINK, ERROR) \
-	FUNC(GITMODULES_URL, ERROR) \
-	FUNC(GITMODULES_PATH, ERROR) \
-	FUNC(GITMODULES_UPDATE, ERROR) \
-	/* warnings */ \
-	FUNC(BAD_FILEMODE, WARN) \
-	FUNC(EMPTY_NAME, WARN) \
-	FUNC(FULL_PATHNAME, WARN) \
-	FUNC(HAS_DOT, WARN) \
-	FUNC(HAS_DOTDOT, WARN) \
-	FUNC(HAS_DOTGIT, WARN) \
-	FUNC(NULL_SHA1, WARN) \
-	FUNC(ZERO_PADDED_FILEMODE, WARN) \
-	FUNC(NUL_IN_COMMIT, WARN) \
-	/* infos (reported as warnings, but ignored by default) */ \
-	FUNC(GITMODULES_PARSE, INFO) \
-	FUNC(BAD_TAG_NAME, INFO) \
-	FUNC(MISSING_TAGGER_ENTRY, INFO) \
-	/* ignored (elevated when requested) */ \
-	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
-
-#define MSG_ID(id, msg_type) FSCK_MSG_##id,
-enum fsck_msg_id {
-	FOREACH_FSCK_MSG_ID(MSG_ID)
-	FSCK_MSG_MAX
-};
-#undef MSG_ID
-
 #define STR(x) #x
 #define MSG_ID(id, msg_type) { STR(id), NULL, NULL, FSCK_##msg_type },
 static struct {
diff --git a/fsck.h b/fsck.h
index fce9981a0cb..c3d3b47b88b 100644
--- a/fsck.h
+++ b/fsck.h
@@ -13,6 +13,72 @@ enum fsck_msg_type {
 	FSCK_WARN,
 };
 
+#define FOREACH_FSCK_MSG_ID(FUNC) \
+	/* fatal errors */ \
+	FUNC(NUL_IN_HEADER, FATAL) \
+	FUNC(UNTERMINATED_HEADER, FATAL) \
+	/* errors */ \
+	FUNC(BAD_DATE, ERROR) \
+	FUNC(BAD_DATE_OVERFLOW, ERROR) \
+	FUNC(BAD_EMAIL, ERROR) \
+	FUNC(BAD_NAME, ERROR) \
+	FUNC(BAD_OBJECT_SHA1, ERROR) \
+	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_TAG_OBJECT, ERROR) \
+	FUNC(BAD_TIMEZONE, ERROR) \
+	FUNC(BAD_TREE, ERROR) \
+	FUNC(BAD_TREE_SHA1, ERROR) \
+	FUNC(BAD_TYPE, ERROR) \
+	FUNC(DUPLICATE_ENTRIES, ERROR) \
+	FUNC(MISSING_AUTHOR, ERROR) \
+	FUNC(MISSING_COMMITTER, ERROR) \
+	FUNC(MISSING_EMAIL, ERROR) \
+	FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
+	FUNC(MISSING_OBJECT, ERROR) \
+	FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
+	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
+	FUNC(MISSING_TAG, ERROR) \
+	FUNC(MISSING_TAG_ENTRY, ERROR) \
+	FUNC(MISSING_TREE, ERROR) \
+	FUNC(MISSING_TREE_OBJECT, ERROR) \
+	FUNC(MISSING_TYPE, ERROR) \
+	FUNC(MISSING_TYPE_ENTRY, ERROR) \
+	FUNC(MULTIPLE_AUTHORS, ERROR) \
+	FUNC(TREE_NOT_SORTED, ERROR) \
+	FUNC(UNKNOWN_TYPE, ERROR) \
+	FUNC(ZERO_PADDED_DATE, ERROR) \
+	FUNC(GITMODULES_MISSING, ERROR) \
+	FUNC(GITMODULES_BLOB, ERROR) \
+	FUNC(GITMODULES_LARGE, ERROR) \
+	FUNC(GITMODULES_NAME, ERROR) \
+	FUNC(GITMODULES_SYMLINK, ERROR) \
+	FUNC(GITMODULES_URL, ERROR) \
+	FUNC(GITMODULES_PATH, ERROR) \
+	FUNC(GITMODULES_UPDATE, ERROR) \
+	/* warnings */ \
+	FUNC(BAD_FILEMODE, WARN) \
+	FUNC(EMPTY_NAME, WARN) \
+	FUNC(FULL_PATHNAME, WARN) \
+	FUNC(HAS_DOT, WARN) \
+	FUNC(HAS_DOTDOT, WARN) \
+	FUNC(HAS_DOTGIT, WARN) \
+	FUNC(NULL_SHA1, WARN) \
+	FUNC(ZERO_PADDED_FILEMODE, WARN) \
+	FUNC(NUL_IN_COMMIT, WARN) \
+	/* infos (reported as warnings, but ignored by default) */ \
+	FUNC(GITMODULES_PARSE, INFO) \
+	FUNC(BAD_TAG_NAME, INFO) \
+	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	/* ignored (elevated when requested) */ \
+	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
+
+#define MSG_ID(id, msg_type) FSCK_MSG_##id,
+enum fsck_msg_id {
+	FOREACH_FSCK_MSG_ID(MSG_ID)
+	FSCK_MSG_MAX
+};
+#undef MSG_ID
+
 struct fsck_options;
 struct object;
 
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 17/22] fsck.c: pass along the fsck_msg_id in the fsck_error callback
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (17 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 16/22] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 18/22] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
                                 ` (4 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the fsck_error callback to also pass along the
fsck_msg_id. Before this change the only way to get the message id was
to parse it back out of the "message".

Let's pass it down explicitly for the benefit of callers that might
want to use it, as discussed in [1].

Passing the msg_type is now redundant, as you can always get it back
from the msg_id, but I'm not changing that convention. It's really
common to need the msg_type, and the report() function itself (which
calls "fsck_error") needs to call fsck_msg_type() to discover
it. Let's not needlessly re-do that work in the user callback.

1. https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c       | 4 +++-
 builtin/index-pack.c | 3 ++-
 builtin/mktag.c      | 1 +
 fsck.c               | 6 ++++--
 fsck.h               | 6 ++++--
 5 files changed, 14 insertions(+), 6 deletions(-)
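
A rough sketch of what the extra argument makes possible (an
illustration, not code from this series): a callback can branch on the
id directly instead of picking it out of the formatted message. The two
enums are cut-down stand-ins for the ones in fsck.h, the callback type
is simplified (the real fsck_error also takes the options, the oid and
the object type), and demo_error_func() and demo_report() are
hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins so the sketch compiles alone; the real enums live in fsck.h. */
enum fsck_msg_type { FSCK_IGNORE, FSCK_INFO, FSCK_FATAL, FSCK_ERROR, FSCK_WARN };
enum fsck_msg_id { FSCK_MSG_BAD_DATE, FSCK_MSG_GITMODULES_MISSING, FSCK_MSG_MAX };

/* Simplified callback type, for illustration only. */
typedef int (*demo_error_fn)(enum fsck_msg_type msg_type,
			     enum fsck_msg_id msg_id,
			     const char *message);

/* Hypothetical callback: silences one message id, reports the rest. */
int demo_error_func(enum fsck_msg_type msg_type,
		    enum fsck_msg_id msg_id,
		    const char *message)
{
	/* Decide on the id itself; no need to parse "message". */
	if (msg_id == FSCK_MSG_GITMODULES_MISSING)
		return 0;

	if (msg_type == FSCK_WARN) {
		fprintf(stderr, "warning: %s\n", message);
		return 0;
	}
	fprintf(stderr, "error: %s\n", message);
	return 1;
}

/* Hypothetical caller, standing in for report() in fsck.c. */
int demo_report(demo_error_fn fn, enum fsck_msg_type msg_type,
		enum fsck_msg_id msg_id, const char *message)
{
	assert(fn);
	/* The caller already knows both values, so it hands both over. */
	return fn(msg_type, msg_id, message);
}
```

The caller passes msg_type as well as msg_id, matching the reasoning in
the commit message: the type has already been computed by the time the
callback runs, so the callback need not look it up again.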

diff --git a/builtin/fsck.c b/builtin/fsck.c
index d6d745dc702..b71fac4ceca 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -89,7 +89,9 @@ static int objerror(struct object *obj, const char *err)
 static int fsck_error_func(struct fsck_options *o,
 			   const struct object_id *oid,
 			   enum object_type object_type,
-			   enum fsck_msg_type msg_type, const char *message)
+			   enum fsck_msg_type msg_type,
+			   enum fsck_msg_id msg_id,
+			   const char *message)
 {
 	switch (msg_type) {
 	case FSCK_WARN:
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 56b8efaa89b..2b2266a4b7d 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1717,6 +1717,7 @@ static int print_dangling_gitmodules(struct fsck_options *o,
 				     const struct object_id *oid,
 				     enum object_type object_type,
 				     enum fsck_msg_type msg_type,
+				     enum fsck_msg_id msg_id,
 				     const char *message)
 {
 	/*
@@ -1727,7 +1728,7 @@ static int print_dangling_gitmodules(struct fsck_options *o,
 		printf("%s\n", oid_to_hex(oid));
 		return 0;
 	}
-	return fsck_error_function(o, oid, object_type, msg_type, message);
+	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
 }
 
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 1834394a9b6..dc989c356f5 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -23,6 +23,7 @@ static int mktag_fsck_error_func(struct fsck_options *o,
 				 const struct object_id *oid,
 				 enum object_type object_type,
 				 enum fsck_msg_type msg_type,
+				 enum fsck_msg_id msg_id,
 				 const char *message)
 {
 	switch (msg_type) {
diff --git a/fsck.c b/fsck.c
index 8a66168e516..5a040eb4fd5 100644
--- a/fsck.c
+++ b/fsck.c
@@ -245,7 +245,7 @@ static int report(struct fsck_options *options,
 	va_start(ap, fmt);
 	strbuf_vaddf(&sb, fmt, ap);
 	result = options->error_func(options, oid, object_type,
-				     msg_type, sb.buf);
+				     msg_type, msg_id, sb.buf);
 	strbuf_release(&sb);
 	va_end(ap);
 
@@ -1198,7 +1198,9 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid,
 			enum object_type object_type,
-			enum fsck_msg_type msg_type, const char *message)
+			enum fsck_msg_type msg_type,
+			enum fsck_msg_id msg_id,
+			const char *message)
 {
 	if (msg_type == FSCK_WARN) {
 		warning("object %s: %s", fsck_describe_object(o, oid), message);
diff --git a/fsck.h b/fsck.h
index c3d3b47b88b..33ecf3f3f16 100644
--- a/fsck.h
+++ b/fsck.h
@@ -101,11 +101,13 @@ typedef int (*fsck_walk_func)(struct object *obj, enum object_type object_type,
 /* callback for fsck_object, type is FSCK_ERROR or FSCK_WARN */
 typedef int (*fsck_error)(struct fsck_options *o,
 			  const struct object_id *oid, enum object_type object_type,
-			  enum fsck_msg_type msg_type, const char *message);
+			  enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
+			  const char *message);
 
 int fsck_error_function(struct fsck_options *o,
 			const struct object_id *oid, enum object_type object_type,
-			enum fsck_msg_type msg_type, const char *message);
+			enum fsck_msg_type msg_type, enum fsck_msg_id msg_id,
+			const char *message);
 
 struct fsck_options {
 	fsck_walk_func walk;
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 18/22] fsck.c: add an fsck_set_msg_type() API that takes enums
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (18 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 17/22] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 19/22] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
                                 ` (3 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change code I added in acf9de4c94e (mktag: use fsck instead of custom
verify_tag(), 2021-01-05) to make use of a new API function that takes
the fsck_msg_{id,type} types, instead of arbitrary strings that
we'll (hopefully) parse into those types.

At the time that the fsck_set_msg_type() API was introduced in
0282f4dced0 (fsck: offer a function to demote fsck errors to warnings,
2015-06-22) it was only intended to be used to parse user-supplied
data.

For things that are purely internal to the C code it makes sense to
have the compiler check these arguments, and to skip the sanity
checking of the data in fsck_set_msg_type() which is redundant to
checks we get from the compiler.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/mktag.c |  3 ++-
 fsck.c          | 27 +++++++++++++++++----------
 fsck.h          |  3 +++
 3 files changed, 22 insertions(+), 11 deletions(-)
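
To make the shape of the new function easier to see in isolation, here
is a standalone approximation (not the code from this patch). It keeps
the lazily allocated override array but replaces git's ALLOC_ARRAY and
the real default lookup with plain malloc() and a small table. Every
demo_* and DEMO_* name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

enum demo_msg_type { DEMO_IGNORE, DEMO_ERROR, DEMO_WARN };
enum demo_msg_id { DEMO_MSG_BAD_DATE, DEMO_MSG_EXTRA_HEADER_ENTRY, DEMO_MSG_MAX };

/* Built-in severities, indexed by message id. */
static const enum demo_msg_type demo_defaults[DEMO_MSG_MAX] = {
	DEMO_ERROR,	/* BAD_DATE */
	DEMO_IGNORE,	/* EXTRA_HEADER_ENTRY */
};

struct demo_options {
	enum demo_msg_type *msg_type;	/* NULL until the first override */
};

/* Effective severity: the override if one exists, else the default. */
enum demo_msg_type demo_msg_type(const struct demo_options *o,
				 enum demo_msg_id id)
{
	assert(id < DEMO_MSG_MAX);
	return o->msg_type ? o->msg_type[id] : demo_defaults[id];
}

/* Typed setter: the compiler checks both arguments, nothing is parsed. */
void demo_set_msg_type_from_ids(struct demo_options *o,
				enum demo_msg_id id,
				enum demo_msg_type type)
{
	assert(id < DEMO_MSG_MAX);
	if (!o->msg_type) {
		int i;
		enum demo_msg_type *severity;

		severity = malloc(DEMO_MSG_MAX * sizeof(*severity));
		if (!severity)
			abort();
		/* Seed the override array with the current defaults. */
		for (i = 0; i < DEMO_MSG_MAX; i++)
			severity[i] = demo_defaults[i];
		o->msg_type = severity;
	}
	o->msg_type[id] = type;
}
```

A string-based setter for user-supplied configuration can then be a
thin wrapper: parse and validate the two strings, and hand the
resulting enum values to the typed function.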

diff --git a/builtin/mktag.c b/builtin/mktag.c
index dc989c356f5..de67a94f24e 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -93,7 +93,8 @@ int cmd_mktag(int argc, const char **argv, const char *prefix)
 		die_errno(_("could not read from stdin"));
 
 	fsck_options.error_func = mktag_fsck_error_func;
-	fsck_set_msg_type(&fsck_options, "extraheaderentry", "warn");
+	fsck_set_msg_type_from_ids(&fsck_options, FSCK_MSG_EXTRA_HEADER_ENTRY,
+				   FSCK_WARN);
 	/* config might set fsck.extraHeaderEntry=* again */
 	git_config(mktag_config, NULL);
 	if (fsck_tag_standalone(NULL, buf.buf, buf.len, &fsck_options,
diff --git a/fsck.c b/fsck.c
index 5a040eb4fd5..f26f47b2a10 100644
--- a/fsck.c
+++ b/fsck.c
@@ -132,6 +132,22 @@ int is_valid_msg_type(const char *msg_id, const char *msg_type)
 	return 1;
 }
 
+void fsck_set_msg_type_from_ids(struct fsck_options *options,
+				enum fsck_msg_id msg_id,
+				enum fsck_msg_type msg_type)
+{
+	if (!options->msg_type) {
+		int i;
+		enum fsck_msg_type *severity;
+		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
+		for (i = 0; i < FSCK_MSG_MAX; i++)
+			severity[i] = fsck_msg_type(i, options);
+		options->msg_type = severity;
+	}
+
+	options->msg_type[msg_id] = msg_type;
+}
+
 void fsck_set_msg_type(struct fsck_options *options,
 		const char *msg_id_str, const char *msg_type_str)
 {
@@ -144,16 +160,7 @@ void fsck_set_msg_type(struct fsck_options *options,
 	if (msg_type != FSCK_ERROR && msg_id_info[msg_id].msg_type == FSCK_FATAL)
 		die("Cannot demote %s to %s", msg_id_str, msg_type_str);
 
-	if (!options->msg_type) {
-		int i;
-		enum fsck_msg_type *severity;
-		ALLOC_ARRAY(severity, FSCK_MSG_MAX);
-		for (i = 0; i < FSCK_MSG_MAX; i++)
-			severity[i] = fsck_msg_type(i, options);
-		options->msg_type = severity;
-	}
-
-	options->msg_type[msg_id] = msg_type;
+	fsck_set_msg_type_from_ids(options, msg_id, msg_type);
 }
 
 void fsck_set_msg_types(struct fsck_options *options, const char *values)
diff --git a/fsck.h b/fsck.h
index 33ecf3f3f16..6c2fd9c5cc0 100644
--- a/fsck.h
+++ b/fsck.h
@@ -82,6 +82,9 @@ enum fsck_msg_id {
 struct fsck_options;
 struct object;
 
+void fsck_set_msg_type_from_ids(struct fsck_options *options,
+				enum fsck_msg_id msg_id,
+				enum fsck_msg_type msg_type);
 void fsck_set_msg_type(struct fsck_options *options,
 		       const char *msg_id, const char *msg_type);
 void fsck_set_msg_types(struct fsck_options *options, const char *values);
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 19/22] fsck.c: move gitmodules_{found,done} into fsck_options
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (19 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 18/22] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 20/22] fetch-pack: don't needlessly copy fsck_options Ævar Arnfjörð Bjarmason
                                 ` (2 subsequent siblings)
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Move the gitmodules_{found,done} static variables added in
159e7b080bf (fsck: detect gitmodules files, 2018-05-02) into the
fsck_options struct. It makes sense to keep all the context in the
same place.

This requires changing the register_found_gitmodules() function,
recently added in 5476e1efde (fetch-pack: print and use dangling
.gitmodules, 2021-02-22), to take fsck_options. That function will be
removed in a subsequent commit, but as it'll require the new
gitmodules_found attribute of "fsck_options" we need this intermediate
step first.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fetch-pack.c |  2 +-
 fsck.c       | 23 ++++++++++-------------
 fsck.h       |  7 ++++++-
 3 files changed, 17 insertions(+), 15 deletions(-)
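
As a small illustration of why per-context state can be preferable to
file-scope statics (a sketch under simplifying assumptions, not code
from this patch): with the two sets stored in the options struct, two
independent fsck contexts no longer share bookkeeping. The demo_set
type below is a tiny hypothetical stand-in for git's "struct oidset"
and stores strings rather than object ids.

```c
#include <assert.h>
#include <string.h>

#define DEMO_SET_MAX 8

/* Tiny stand-in for "struct oidset"; holds string ids. */
struct demo_set {
	const char *items[DEMO_SET_MAX];
	int nr;
};

int demo_set_contains(const struct demo_set *s, const char *id)
{
	int i;

	for (i = 0; i < s->nr; i++)
		if (!strcmp(s->items[i], id))
			return 1;
	return 0;
}

void demo_set_insert(struct demo_set *s, const char *id)
{
	if (demo_set_contains(s, id))
		return;
	assert(s->nr < DEMO_SET_MAX);
	s->items[s->nr++] = id;
}

/* The state travels with the options instead of living in statics. */
struct demo_options {
	struct demo_set gitmodules_found;
	struct demo_set gitmodules_done;
};

/* Count blobs that were seen in a tree but not yet checked. */
int demo_pending(const struct demo_options *o)
{
	int i, n = 0;

	for (i = 0; i < o->gitmodules_found.nr; i++)
		if (!demo_set_contains(&o->gitmodules_done,
				       o->gitmodules_found.items[i]))
			n++;
	return n;
}
```

Recording an id in one demo_options instance leaves any other instance
untouched, which is the behavior a caller gets from passing its own
fsck_options around.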

diff --git a/fetch-pack.c b/fetch-pack.c
index 0cb59acc486..53d7ef00856 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -998,7 +998,7 @@ static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 
 	oidset_iter_init(gitmodules_oids, &iter);
 	while ((oid = oidset_iter_next(&iter)))
-		register_found_gitmodules(oid);
+		register_found_gitmodules(&fo, oid);
 	if (fsck_finish(&fo))
 		die("fsck failed");
 }
diff --git a/fsck.c b/fsck.c
index f26f47b2a10..565274a946c 100644
--- a/fsck.c
+++ b/fsck.c
@@ -19,9 +19,6 @@
 #include "credential.h"
 #include "help.h"
 
-static struct oidset gitmodules_found = OIDSET_INIT;
-static struct oidset gitmodules_done = OIDSET_INIT;
-
 #define STR(x) #x
 #define MSG_ID(id, msg_type) { STR(id), NULL, NULL, FSCK_##msg_type },
 static struct {
@@ -624,7 +621,7 @@ static int fsck_tree(const struct object_id *oid,
 
 		if (is_hfs_dotgitmodules(name) || is_ntfs_dotgitmodules(name)) {
 			if (!S_ISLNK(mode))
-				oidset_insert(&gitmodules_found, oid);
+				oidset_insert(&options->gitmodules_found, oid);
 			else
 				retval += report(options,
 						 oid, OBJ_TREE,
@@ -638,7 +635,7 @@ static int fsck_tree(const struct object_id *oid,
 				has_dotgit |= is_ntfs_dotgit(backslash);
 				if (is_ntfs_dotgitmodules(backslash)) {
 					if (!S_ISLNK(mode))
-						oidset_insert(&gitmodules_found, oid);
+						oidset_insert(&options->gitmodules_found, oid);
 					else
 						retval += report(options, oid, OBJ_TREE,
 								 FSCK_MSG_GITMODULES_SYMLINK,
@@ -1150,9 +1147,9 @@ static int fsck_blob(const struct object_id *oid, const char *buf,
 	struct fsck_gitmodules_data data;
 	struct config_options config_opts = { 0 };
 
-	if (!oidset_contains(&gitmodules_found, oid))
+	if (!oidset_contains(&options->gitmodules_found, oid))
 		return 0;
-	oidset_insert(&gitmodules_done, oid);
+	oidset_insert(&options->gitmodules_done, oid);
 
 	if (object_on_skiplist(options, oid))
 		return 0;
@@ -1217,9 +1214,9 @@ int fsck_error_function(struct fsck_options *o,
 	return 1;
 }
 
-void register_found_gitmodules(const struct object_id *oid)
+void register_found_gitmodules(struct fsck_options *options, const struct object_id *oid)
 {
-	oidset_insert(&gitmodules_found, oid);
+	oidset_insert(&options->gitmodules_found, oid);
 }
 
 int fsck_finish(struct fsck_options *options)
@@ -1228,13 +1225,13 @@ int fsck_finish(struct fsck_options *options)
 	struct oidset_iter iter;
 	const struct object_id *oid;
 
-	oidset_iter_init(&gitmodules_found, &iter);
+	oidset_iter_init(&options->gitmodules_found, &iter);
 	while ((oid = oidset_iter_next(&iter))) {
 		enum object_type type;
 		unsigned long size;
 		char *buf;
 
-		if (oidset_contains(&gitmodules_done, oid))
+		if (oidset_contains(&options->gitmodules_done, oid))
 			continue;
 
 		buf = read_object_file(oid, &type, &size);
@@ -1259,8 +1256,8 @@ int fsck_finish(struct fsck_options *options)
 	}
 
 
-	oidset_clear(&gitmodules_found);
-	oidset_clear(&gitmodules_done);
+	oidset_clear(&options->gitmodules_found);
+	oidset_clear(&options->gitmodules_done);
 	return ret;
 }
 
diff --git a/fsck.h b/fsck.h
index 6c2fd9c5cc0..bb59ef05b68 100644
--- a/fsck.h
+++ b/fsck.h
@@ -118,6 +118,8 @@ struct fsck_options {
 	unsigned strict:1;
 	enum fsck_msg_type *msg_type;
 	struct oidset skiplist;
+	struct oidset gitmodules_found;
+	struct oidset gitmodules_done;
 	kh_oid_map_t *object_names;
 };
 
@@ -125,6 +127,8 @@ struct fsck_options {
 	.walk = NULL, \
 	.msg_type = NULL, \
 	.skiplist = OIDSET_INIT, \
+	.gitmodules_found = OIDSET_INIT, \
+	.gitmodules_done = OIDSET_INIT, \
 	.object_names = NULL,
 #define FSCK_OPTIONS_COMMON_ERROR_FUNC \
 	FSCK_OPTIONS_COMMON \
@@ -149,7 +153,8 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
 
-void register_found_gitmodules(const struct object_id *oid);
+void register_found_gitmodules(struct fsck_options *options,
+			       const struct object_id *oid);
 
 /*
  * fsck a tag, and pass info about it back to the caller. This is
-- 
2.31.0.rc0.126.g04f22c5b82



* [PATCH v3 20/22] fetch-pack: don't needlessly copy fsck_options
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (20 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 19/22] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 21/22] fetch-pack: use file-scope static struct for fsck_options Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 22/22] fetch-pack: use new fsck API to printing dangling submodules Ævar Arnfjörð Bjarmason
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change the behavior of the .gitmodules validation added in
5476e1efde (fetch-pack: print and use dangling .gitmodules,
2021-02-22) so we're using one "fsck_options".

I found that code confusing to read. One might think that not setting
up the error_func earlier means that we're relying on the "error_func"
not being set in some code in between the two hunks being modified
here.

But we're not; all we're doing in the rest of "cmd_index_pack()" is
further setup by calling fsck_set_msg_types(), and assigning to
do_fsck_object.

So there was no reason in 5476e1efde to make a shallow copy of the
fsck_options struct before setting error_func. Let's just do this
setup at the top of the function, along with the "walk" assignment.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/index-pack.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 2b2266a4b7d..5ad80b85b47 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1761,6 +1761,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 	fsck_options.walk = mark_link;
+	fsck_options.error_func = print_dangling_gitmodules;
 
 	reset_pack_idx_option(&opts);
 	git_config(git_index_pack_config, &opts);
@@ -1951,13 +1952,8 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 	else
 		close(input_fd);
 
-	if (do_fsck_object) {
-		struct fsck_options fo = fsck_options;
-
-		fo.error_func = print_dangling_gitmodules;
-		if (fsck_finish(&fo))
-			die(_("fsck error in pack objects"));
-	}
+	if (do_fsck_object && fsck_finish(&fsck_options))
+		die(_("fsck error in pack objects"));
 
 	free(objects);
 	strbuf_release(&index_name_buf);
-- 
2.31.0.rc0.126.g04f22c5b82


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v3 21/22] fetch-pack: use file-scope static struct for fsck_options
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (21 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 20/22] fetch-pack: don't needlessly copy fsck_options Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  2021-03-06 11:04               ` [PATCH v3 22/22] fetch-pack: use new fsck API to printing dangling submodules Ævar Arnfjörð Bjarmason
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Change code added in 5476e1efde (fetch-pack: print and use dangling
.gitmodules, 2021-02-22) so that we use a file-scoped "static struct
fsck_options" instead of defining one in the "fsck_gitmodules_oids()"
function.

We use this pattern in all of
builtin/{fsck,index-pack,mktag,unpack-objects}.c. It's odd to see
fetch-pack be the odd one out. One might think that we're using other
fsck_options structs in fetch-pack, or running fsck twice there, but
we're not.
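
A minimal sketch of the file-scope pattern, under stated assumptions:
"struct opts", OPTS_STRICT, file_opts and check_ids() are hypothetical
names invented for this example, not git's API. It shows what a
file-scoped static struct implies in plain C: the state it accumulates
persists across calls, which is harmless when the function runs once
per process.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FOUND 8

/* Hypothetical options struct that also accumulates state. */
struct opts {
	int strict;
	size_t nr_found;
	int found[MAX_FOUND];
};

/* Hypothetical initializer macro, in the spirit of FSCK_OPTIONS_STRICT. */
#define OPTS_STRICT { .strict = 1 }

/* File-scope static: one instance shared by every function in this file. */
static struct opts file_opts = OPTS_STRICT;

/* Record each id in the shared struct and return how many are recorded. */
size_t check_ids(const int *ids, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		assert(file_opts.nr_found < MAX_FOUND);
		file_opts.found[file_opts.nr_found++] = ids[i];
	}
	return file_opts.nr_found;
}
```

Calling check_ids() a second time keeps the entries from the first
call, so a caller that needs a clean slate would have to reset the
struct itself.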

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 fetch-pack.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 53d7ef00856..f961c3067cd 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -38,6 +38,7 @@ static int server_supports_filtering;
 static int advertise_sid;
 static struct shallow_lock shallow_lock;
 static const char *alternate_shallow_file;
+static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
 static struct strbuf fsck_msg_types = STRBUF_INIT;
 static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
 
@@ -991,15 +992,14 @@ static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 {
 	struct oidset_iter iter;
 	const struct object_id *oid;
-	struct fsck_options fo = FSCK_OPTIONS_STRICT;
 
 	if (!oidset_size(gitmodules_oids))
 		return;
 
 	oidset_iter_init(gitmodules_oids, &iter);
 	while ((oid = oidset_iter_next(&iter)))
-		register_found_gitmodules(&fo, oid);
-	if (fsck_finish(&fo))
+		register_found_gitmodules(&fsck_options, oid);
+	if (fsck_finish(&fsck_options))
 		die("fsck failed");
 }
 
-- 
2.31.0.rc0.126.g04f22c5b82


^ permalink raw reply	[flat|nested] 134+ messages in thread

* [PATCH v3 22/22] fetch-pack: use new fsck API to printing dangling submodules
  2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
                                 ` (22 preceding siblings ...)
  2021-03-06 11:04               ` [PATCH v3 21/22] fetch-pack: use file-scope static struct for fsck_options Ævar Arnfjörð Bjarmason
@ 2021-03-06 11:04               ` Ævar Arnfjörð Bjarmason
  23 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-06 11:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Schindelin, Jonathan Tan,
	Ævar Arnfjörð Bjarmason

Refactor the check added in 5476e1efde (fetch-pack: print and use
dangling .gitmodules, 2021-02-22) to make use of the "msg_id" that we
now pass to the user-defined "error_func". We can now compare against
FSCK_MSG_GITMODULES_MISSING instead of parsing the generated
message.

Let's also replace register_found_gitmodules() with directly
manipulating the "gitmodules_found" member. A recent commit moved it
into "fsck_options" so we could do this here.

Add a fsck-cb.c file similar to parse-options-cb.c. The alternative
would be to either define this directly in fsck.c as a public API, or
to create some library shared by fetch-pack.c and builtin/index-pack.c.

I expect that there won't be many of these fsck utility functions in
the future, so just having a single fsck-cb.c makes sense.
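
To make the difference between the two checks concrete, here is a
small standalone sketch. All names in it (enum msg_id, struct opts,
default_error(), missing_gitmodules_cb(), OPTS_MISSING_GITMODULES) are
hypothetical and only mirror the shape of the real code; the real
callback prints the object ID, whereas this one just counts. It
contrasts matching on the message text with matching on an enum ID,
and shows a designated-initializer macro that installs the callback.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical message IDs; git's real enum is much longer. */
enum msg_id { MSG_BAD_TREE, MSG_GITMODULES_MISSING };

struct opts;
typedef int (*error_fn)(struct opts *o, enum msg_id id, const char *message);

/* Hypothetical stand-in for "struct fsck_options". */
struct opts {
	int strict;
	error_fn error_func;
	int nr_missing;
};

/* Old shape: the check depends on how the message happens to be worded. */
int is_missing_by_text(const char *message)
{
	const char *prefix = "gitmodulesMissing";

	return strncmp(message, prefix, strlen(prefix)) == 0;
}

/* Default behaviour: every message counts as an error. */
int default_error(struct opts *o, enum msg_id id, const char *message)
{
	(void)o;
	(void)id;
	(void)message;
	return 1;
}

/* New shape: compare the ID and fall back to the default otherwise. */
int missing_gitmodules_cb(struct opts *o, enum msg_id id, const char *message)
{
	if (id == MSG_GITMODULES_MISSING) {
		o->nr_missing++;
		return 0;
	}
	return default_error(o, id, message);
}

/* Initializer macro that wires in the callback at definition time. */
#define OPTS_MISSING_GITMODULES { \
	.strict = 1, \
	.error_func = missing_gitmodules_cb, \
}
```

With the ID-based check, rewording the message no longer changes which
errors the callback intercepts.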

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile             |  1 +
 builtin/index-pack.c | 21 +--------------------
 fetch-pack.c         |  4 ++--
 fsck-cb.c            | 16 ++++++++++++++++
 fsck.c               |  5 -----
 fsck.h               | 22 +++++++++++++++++++---
 6 files changed, 39 insertions(+), 30 deletions(-)
 create mode 100644 fsck-cb.c

diff --git a/Makefile b/Makefile
index dd08b4ced01..5bf128c5d2c 100644
--- a/Makefile
+++ b/Makefile
@@ -879,6 +879,7 @@ LIB_OBJS += fetch-negotiator.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fmt-merge-msg.o
 LIB_OBJS += fsck.o
+LIB_OBJS += fsck-cb.o
 LIB_OBJS += fsmonitor.o
 LIB_OBJS += gettext.o
 LIB_OBJS += gpg-interface.o
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 5ad80b85b47..11f0fafd33b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -120,7 +120,7 @@ static int nr_threads;
 static int from_stdin;
 static int strict;
 static int do_fsck_object;
-static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
+static struct fsck_options fsck_options = FSCK_OPTIONS_MISSING_GITMODULES;
 static int verbose;
 static int show_resolving_progress;
 static int show_stat;
@@ -1713,24 +1713,6 @@ static void show_pack_info(int stat_only)
 	}
 }
 
-static int print_dangling_gitmodules(struct fsck_options *o,
-				     const struct object_id *oid,
-				     enum object_type object_type,
-				     enum fsck_msg_type msg_type,
-				     enum fsck_msg_id msg_id,
-				     const char *message)
-{
-	/*
-	 * NEEDSWORK: Plumb the MSG_ID (from fsck.c) here and use it
-	 * instead of relying on this string check.
-	 */
-	if (starts_with(message, "gitmodulesMissing")) {
-		printf("%s\n", oid_to_hex(oid));
-		return 0;
-	}
-	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
-}
-
 int cmd_index_pack(int argc, const char **argv, const char *prefix)
 {
 	int i, fix_thin_pack = 0, verify = 0, stat_only = 0, rev_index;
@@ -1761,7 +1743,6 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
 
 	read_replace_refs = 0;
 	fsck_options.walk = mark_link;
-	fsck_options.error_func = print_dangling_gitmodules;
 
 	reset_pack_idx_option(&opts);
 	git_config(git_index_pack_config, &opts);
diff --git a/fetch-pack.c b/fetch-pack.c
index f961c3067cd..7fc305b65c4 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -38,7 +38,7 @@ static int server_supports_filtering;
 static int advertise_sid;
 static struct shallow_lock shallow_lock;
 static const char *alternate_shallow_file;
-static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
+static struct fsck_options fsck_options = FSCK_OPTIONS_MISSING_GITMODULES;
 static struct strbuf fsck_msg_types = STRBUF_INIT;
 static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
 
@@ -998,7 +998,7 @@ static void fsck_gitmodules_oids(struct oidset *gitmodules_oids)
 
 	oidset_iter_init(gitmodules_oids, &iter);
 	while ((oid = oidset_iter_next(&iter)))
-		register_found_gitmodules(&fsck_options, oid);
+		oidset_insert(&fsck_options.gitmodules_found, oid);
 	if (fsck_finish(&fsck_options))
 		die("fsck failed");
 }
diff --git a/fsck-cb.c b/fsck-cb.c
new file mode 100644
index 00000000000..465a49235ac
--- /dev/null
+++ b/fsck-cb.c
@@ -0,0 +1,16 @@
+#include "git-compat-util.h"
+#include "fsck.h"
+
+int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
+					   const struct object_id *oid,
+					   enum object_type object_type,
+					   enum fsck_msg_type msg_type,
+					   enum fsck_msg_id msg_id,
+					   const char *message)
+{
+	if (msg_id == FSCK_MSG_GITMODULES_MISSING) {
+		puts(oid_to_hex(oid));
+		return 0;
+	}
+	return fsck_error_function(o, oid, object_type, msg_type, msg_id, message);
+}
diff --git a/fsck.c b/fsck.c
index 565274a946c..b0089844db9 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1214,11 +1214,6 @@ int fsck_error_function(struct fsck_options *o,
 	return 1;
 }
 
-void register_found_gitmodules(struct fsck_options *options, const struct object_id *oid)
-{
-	oidset_insert(&options->gitmodules_found, oid);
-}
-
 int fsck_finish(struct fsck_options *options)
 {
 	int ret = 0;
diff --git a/fsck.h b/fsck.h
index bb59ef05b68..ae3107638ab 100644
--- a/fsck.h
+++ b/fsck.h
@@ -153,9 +153,6 @@ int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
 
-void register_found_gitmodules(struct fsck_options *options,
-			       const struct object_id *oid);
-
 /*
  * fsck a tag, and pass info about it back to the caller. This is
  * exposed fsck_object() internals for git-mktag(1).
@@ -204,4 +201,23 @@ const char *fsck_describe_object(struct fsck_options *options,
 int fsck_config_internal(const char *var, const char *value, void *cb,
 			 struct fsck_options *options);
 
+/*
+ * Initializations for callbacks in fsck-cb.c
+ */
+#define FSCK_OPTIONS_MISSING_GITMODULES { \
+	.strict = 1, \
+	.error_func = fsck_error_cb_print_missing_gitmodules, \
+	FSCK_OPTIONS_COMMON \
+}
+
+/*
+ * Error callbacks in fsck-cb.c
+ */
+int fsck_error_cb_print_missing_gitmodules(struct fsck_options *o,
+					   const struct object_id *oid,
+					   enum object_type object_type,
+					   enum fsck_msg_type msg_type,
+					   enum fsck_msg_id msg_id,
+					   const char *message);
+
 #endif
-- 
2.31.0.rc0.126.g04f22c5b82


^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 00/22] fsck: API improvements
  2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
@ 2021-03-07 23:04                 ` Junio C Hamano
  2021-03-08  9:16                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 134+ messages in thread
From: Junio C Hamano @ 2021-03-07 23:04 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Now that jt/transfer-fsck-across-packs has been merged to master
> here's a re-roll of v1[1]+v2[2] of this series.

Unfortunately, it is not a good time to review or help with any work
on this series, as the base topic introduced an unpleasant regression
and probably needs to either gain a band-aid or (in the worst case) be
reverted; of course, help resolving the issues on that topic would be
appreciated ;-)

Thanks.

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH v3 00/22] fsck: API improvements
  2021-03-07 23:04                 ` Junio C Hamano
@ 2021-03-08  9:16                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 134+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-08  9:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Johannes Schindelin, Jonathan Tan


On Mon, Mar 08 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Now that jt/transfer-fsck-across-packs has been merged to master
>> here's a re-roll of v1[1]+v2[2] of this series.
>
> Unfortunately, it is not a good time to review or help with any work
> on this series, as the base topic introduced an unpleasant regression
> and probably needs to either gain a band-aid or (in the worst case) be
> reverted; of course, help resolving the issues on that topic would be
> appreciated ;-)

I should have mentioned: I saw the bug & proposed fix thread for that.
I see that 2aec3bc4b64 (fetch-pack: do not mix --pack_header and
packfile uri, 2021-03-04) is now merged down to next.

My reading of that thread is that the reported bug is solved, but
perhaps we're not 100% happy with the solution?

In any case, that patch does not conflict with this series, and all
tests pass with/without the two merged together.

I don't foresee an issue with the two stepping on each other's toes,
since I'm just modifying the rather low-level fsck interface of spewing
out .gitmodules entries, not touching the logic of what's then done with
that information...



^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-05 23:18             ` Junio C Hamano
@ 2021-03-08 19:14               ` Jonathan Tan
  2021-03-08 19:34                 ` Junio C Hamano
  0 siblings, 1 reply; 134+ messages in thread
From: Jonathan Tan @ 2021-03-08 19:14 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, jrnieder, nmulcahey

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > I should probably have written more in the commit message to justify the
> > unification, but it is also part of a bug fix (in particular,
> > --fsck-objects wasn't being passed to the index-pack that indexed the
> > packfiles linked by URI) and for code health purposes (to prevent future
> > bugs by eliminating the divergence). So reverting that commit would
> > reintroduce another bug.
> 
> Not necessarily.  Unifying two that do not inherently have to be
> identical makes it impossible to pass two different things, and that
> is what we are seeing in the bug this patch is trying to fix (by
> forcing the two to be identical by eliminating the unpack-objects
> codepath in certain cases).  
> 
> The right "fix" for the original bug would have been to keep them
> still separate yet making it easy to pass args that must be used in
> both of them, no?

OK - I'll do this.

> >> > -		if (do_keep && (args->lock_pack || unpack_limit)) {
> >> > +		if ((do_keep || index_pack_args) && (args->lock_pack || unpack_limit)) {
> >> >  			char hostname[HOST_NAME_MAX + 1];
> >> >  			if (xgethostname(hostname, sizeof(hostname)))
> >> >  				xsnprintf(hostname, sizeof(hostname), "localhost");
> >> 
> >> I do not quite get what this hunk is doing.  Care to explain?
> >
> > The "do_keep" part was unnecessarily restrictive and I used a band-aid
> > solution to loosen it. I think this started from 88e2f9ed8e ("introduce
> > fetch-object: fetch one promisor object", 2017-12-05) where I might have
> > misunderstood what do_keep was meant to do, and taught fetch-pack to use
> > "index-pack" if do_keep is true or args->from_promisor is true. What I
> > should have done is to set do_keep to true if args->from_promisor is
> > true. Future commits continued to do that with fsck_objects and
> > index_pack_args.
> 
> > Maybe what I can do is to refactor get_pack() so that do_keep retains
> > its original meaning of whether to use "index-pack" or "unpack-objects",
> > and then we wouldn't need this line. What do you think (code-wise and
> > whether this fits in with the release schedule, if we want to get this
> > in before release)?
> 
> How bad is the breakage this one is trying to fix?  I know it would
> only affect folks who have to interact with the server that uses
> packfile URI feature, but do they have a workaround, perhaps with a
> configuration knob or command line option to ignore the packfile
> URI,

Yes, there's a workaround (to disable packfile URIs from the client side
using a config variable).

> and how large is the affected population?

The only issues I've seen are within $DAYJOB, and there, we can carry
our own patch to fix this issue. So the affected population (right now)
is probably small (if it even exists).

> I cannot shake the feeling that we are seeing band-aid on top of
> band-aid forced by having chosen to go in a wrong direction in the
> beginning X-<, and would prefer not to see the code drift even further
> in the same direction; hence my earlier suggestion to go back to the
> root cause by first reverting the wrong fix that introduced this bug
> and fixing the original bug in a different way.
> 
> I dunno how involved the necessary surgery would be, though.  If
> this is easy to work around, perhaps it might be a better option for
> the overall project to ship the upcoming release with this listed as
> a known breakage.

I don't think it's too difficult - I think we'll only need to filter out
the --pack_header when we figure out the arguments to pass for the
packfiles given by URI. I'll take a look.
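
A rough sketch of the filtering idea described above, not the actual
fetch-pack code: copy_args_without_prefix() is a hypothetical helper,
it works on plain string arrays rather than git's own argument-vector
type, and the argument values in the usage note are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy every argument from src to dst except those starting with
 * prefix. dst must have room for nr entries. Returns the number of
 * arguments copied. The strings themselves are shared, not duplicated.
 */
size_t copy_args_without_prefix(const char **src, size_t nr,
				const char *prefix, const char **dst)
{
	size_t i, out = 0;
	size_t len;

	assert(src != NULL && dst != NULL && prefix != NULL);
	len = strlen(prefix);

	for (i = 0; i < nr; i++) {
		if (strncmp(src[i], prefix, len) != 0)
			dst[out++] = src[i];
	}
	return out;
}
```

Given an array such as { "index-pack", "--stdin",
"--pack_header=2,2242", "--fsck-objects" } and the prefix
"--pack_header", the helper leaves the other three arguments in their
original order, so options like --fsck-objects still reach the
index-pack that handles the packfiles given by URI.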

^ permalink raw reply	[flat|nested] 134+ messages in thread

* Re: [PATCH] fetch-pack: do not mix --pack_header and packfile uri
  2021-03-08 19:14               ` Jonathan Tan
@ 2021-03-08 19:34                 ` Junio C Hamano
  0 siblings, 0 replies; 134+ messages in thread
From: Junio C Hamano @ 2021-03-08 19:34 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, jrnieder, nmulcahey

Jonathan Tan <jonathantanmy@google.com> writes:

>> I dunno how involved the necessary surgery would be, though.  If
>> this is easy to work around, perhaps it might be a better option for
>> the overall project to ship the upcoming release with this listed as
>> a known breakage.
>
> I don't think it's too difficult - I think we'll only need to filter out
> the --pack_header when we figure out the arguments to pass for the
> packfiles given by URI. I'll take a look.

What you sent earlier is a much better band-aid than the "keep the
single args array but filter an element out in only one codepath"
band-aid, I would think.

Any change that is more involved than a single-liner trivial bugfix
would be too late for this cycle, as we'd be cutting -rc2 by the end
of tomorrow.

Thanks.

^ permalink raw reply	[flat|nested] 134+ messages in thread

end of thread, other threads:[~2021-03-08 19:35 UTC | newest]

Thread overview: 134+ messages
2021-01-15 23:43 RFC on packfile URIs and .gitmodules check Jonathan Tan
2021-01-16  0:30 ` Junio C Hamano
2021-01-16  3:22   ` Taylor Blau
2021-01-19 12:56     ` Derrick Stolee
2021-01-19 19:13       ` Jonathan Tan
2021-01-20  1:04         ` Junio C Hamano
2021-01-19 19:02     ` Jonathan Tan
2021-01-20  8:07 ` Ævar Arnfjörð Bjarmason
2021-01-20 19:30   ` Jonathan Tan
2021-01-21  3:06     ` Junio C Hamano
2021-01-21 18:32       ` Jonathan Tan
2021-01-21 18:39         ` Junio C Hamano
2021-01-20 19:36   ` [PATCH] Doc: clarify contents of packfile sent as URI Jonathan Tan
2021-01-24  2:34 ` [PATCH 0/4] Check .gitmodules when using packfile URIs Jonathan Tan
2021-01-24  2:34   ` [PATCH 1/4] http: allow custom index-pack args Jonathan Tan
2021-01-24  2:34   ` [PATCH 2/4] http-fetch: " Jonathan Tan
2021-01-24 11:52     ` Ævar Arnfjörð Bjarmason
2021-01-28  0:32       ` Jonathan Tan
2021-02-16 20:49     ` Josh Steadmon
2021-02-16 22:57       ` Junio C Hamano
2021-02-17 19:46         ` Jonathan Tan
2021-01-24  2:34   ` [PATCH 3/4] fetch-pack: with packfile URIs, use index-pack arg Jonathan Tan
2021-01-24  2:34   ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
2021-01-24  7:56     ` Junio C Hamano
2021-01-26  1:57       ` Junio C Hamano
2021-01-28  1:04         ` Jonathan Tan
2021-01-24 12:18     ` Ævar Arnfjörð Bjarmason
2021-01-28  1:03       ` Jonathan Tan
2021-02-17  1:48         ` Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 00/14] fsck: API improvements Ævar Arnfjörð Bjarmason
2021-02-17 21:02             ` Junio C Hamano
2021-02-18  0:00               ` Ævar Arnfjörð Bjarmason
2021-02-18 19:12                 ` Junio C Hamano
2021-02-18 19:57                   ` Jeff King
2021-02-18 20:27                     ` Junio C Hamano
2021-02-19  0:54                       ` Ævar Arnfjörð Bjarmason
2021-02-18 22:36                     ` Junio C Hamano
2021-02-18 10:58             ` [PATCH v2 00/10] fsck: API improvements (no conflicts with 'seen') Ævar Arnfjörð Bjarmason
2021-02-18 22:19               ` Junio C Hamano
2021-03-06 11:04               ` [PATCH v3 00/22] fsck: API improvements Ævar Arnfjörð Bjarmason
2021-03-07 23:04                 ` Junio C Hamano
2021-03-08  9:16                   ` Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 01/22] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 02/22] fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 03/22] fsck.h: reduce duplication between FSCK_OPTIONS_{DEFAULT,STRICT} Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 04/22] fsck.h: add a FSCK_OPTIONS_COMMON_ERROR_FUNC macro Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 05/22] fsck.h: indent arguments to of fsck_set_msg_type Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 06/22] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 07/22] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 08/22] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 09/22] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 10/22] fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 11/22] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 12/22] fsck.h: re-order and re-assign "enum fsck_msg_type" Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 13/22] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 14/22] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 15/22] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 16/22] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 17/22] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 18/22] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 19/22] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 20/22] fetch-pack: don't needlessly copy fsck_options Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 21/22] fetch-pack: use file-scope static struct for fsck_options Ævar Arnfjörð Bjarmason
2021-03-06 11:04               ` [PATCH v3 22/22] fetch-pack: use new fsck API to printing dangling submodules Ævar Arnfjörð Bjarmason
2021-02-18 10:58             ` [PATCH v2 01/10] fsck.h: indent arguments to of fsck_set_msg_type Ævar Arnfjörð Bjarmason
2021-02-18 10:58             ` [PATCH v2 02/10] fsck.h: use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
2021-02-18 10:58             ` [PATCH v2 03/10] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
2021-02-18 19:45               ` Jeff King
2021-02-18 10:58             ` [PATCH v2 04/10] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
2021-02-18 10:58             ` [PATCH v2 05/10] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
2021-02-18 22:23               ` Junio C Hamano
2021-02-18 10:58             ` [PATCH v2 06/10] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
2021-02-18 19:52               ` Jeff King
2021-02-18 22:27                 ` Junio C Hamano
2021-02-18 10:58             ` [PATCH v2 07/10] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
2021-02-18 22:29               ` Junio C Hamano
2021-02-18 10:58             ` [PATCH v2 08/10] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
2021-02-18 22:30               ` Junio C Hamano
2021-02-18 10:58             ` [PATCH v2 09/10] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
2021-02-18 19:56               ` Jeff King
2021-02-18 10:58             ` [PATCH v2 10/10] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
2021-02-18 19:56               ` Jeff King
2021-02-18 22:33                 ` Junio C Hamano
2021-02-18 22:32               ` Junio C Hamano
2021-02-17 19:42           ` [PATCH 01/14] fsck.h: indent arguments to of fsck_set_msg_type Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 02/14] fsck.h: use use "enum object_type" instead of "int" Ævar Arnfjörð Bjarmason
2021-02-17 23:40             ` Junio C Hamano
2021-02-17 19:42           ` [PATCH 03/14] fsck.c: rename variables in fsck_set_msg_type() for less confusion Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 04/14] fsck.c: move definition of msg_id into append_msg_id() Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 05/14] fsck.c: rename remaining fsck_msg_id "id" to "msg_id" Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 06/14] fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 07/14] fsck.c: call parse_msg_type() early in fsck_set_msg_type() Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 08/14] fsck.c: undefine temporary STR macro after use Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 09/14] fsck.c: give "FOREACH_MSG_ID" a more specific name Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 10/14] fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from *.c to *.h Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 11/14] fsck.c: pass along the fsck_msg_id in the fsck_error callback Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 12/14] fsck.c: add an fsck_set_msg_type() API that takes enums Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 13/14] fsck.h: update FSCK_OPTIONS_* for object_name Ævar Arnfjörð Bjarmason
2021-02-17 19:42           ` [PATCH 14/14] fsck.c: move gitmodules_{found,done} into fsck_options Ævar Arnfjörð Bjarmason
2021-02-17 20:05           ` [PATCH 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
2021-01-24 12:30     ` Ævar Arnfjörð Bjarmason
2021-01-28  1:15       ` Jonathan Tan
2021-02-17  2:10         ` Ævar Arnfjörð Bjarmason
2021-02-17 20:10           ` Jonathan Tan
2021-02-18 12:07             ` Ævar Arnfjörð Bjarmason
2021-02-17 19:27     ` Ævar Arnfjörð Bjarmason
2021-02-17 20:11       ` Jonathan Tan
2021-01-24  6:29   ` [PATCH 0/4] Check .gitmodules when using packfile URIs Junio C Hamano
2021-01-28  0:35     ` Jonathan Tan
2021-02-18 11:31       ` Ævar Arnfjörð Bjarmason
2021-02-18 23:34   ` Junio C Hamano
2021-02-19  0:46     ` Jonathan Tan
2021-02-20  3:31       ` Junio C Hamano
2021-02-19  1:08     ` Ævar Arnfjörð Bjarmason
2021-02-20  3:29       ` Junio C Hamano
2021-02-22 19:20 ` [PATCH v2 " Jonathan Tan
2021-02-22 19:20   ` [PATCH v2 1/4] http: allow custom index-pack args Jonathan Tan
2021-02-22 19:20   ` [PATCH v2 2/4] http-fetch: " Jonathan Tan
2021-02-23 13:17     ` Ævar Arnfjörð Bjarmason
2021-02-23 16:51       ` Jonathan Tan
2021-03-05  0:19     ` Jonathan Nieder
2021-03-05  1:16       ` [PATCH] fetch-pack: do not mix --pack_header and packfile uri Jonathan Tan
2021-03-05  1:52         ` Junio C Hamano
2021-03-05 18:50         ` Junio C Hamano
2021-03-05 19:46           ` Junio C Hamano
2021-03-05 23:11             ` Jonathan Tan
2021-03-05 23:20             ` Junio C Hamano
2021-03-05 22:59           ` Jonathan Tan
2021-03-05 23:18             ` Junio C Hamano
2021-03-08 19:14               ` Jonathan Tan
2021-03-08 19:34                 ` Junio C Hamano
2021-02-22 19:20   ` [PATCH v2 3/4] fetch-pack: with packfile URIs, use index-pack arg Jonathan Tan
2021-02-22 19:20   ` [PATCH v2 4/4] fetch-pack: print and use dangling .gitmodules Jonathan Tan
2021-02-22 20:12   ` [PATCH v2 0/4] Check .gitmodules when using packfile URIs Junio C Hamano
