git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] Document fsck msg ids
@ 2022-10-24 15:00 John Cai via GitGitGadget
  2022-10-24 15:00 ` [PATCH 1/2] fsck: remove the unused BAD_TAG_OBJECT John Cai via GitGitGadget
                   ` (3 more replies)
  0 siblings, 4 replies; 35+ messages in thread
From: John Cai via GitGitGadget @ 2022-10-24 15:00 UTC (permalink / raw)
  To: git; +Cc: John Cai

fsck does a number of checks, and prints warning or error messages based on
the type of check. The only place the config values that control whether to
ignore these checks are documented is through git-help(1). However, often
times peoples' first instinct to look for a list of valid config values is
in the documentation page for git-config(1). These fsck. configuration
values can be hard to find.

Document these so that both git-config and git-fsck documentation has a list
of valid s.

Patch [1/2] removes an unused msg-id BAD_TAG_OBJECT Patch [2/2] adds a
fsck-msgids.txt that lists msg-ids that fsck checks for

John Cai (2):
  fsck: remove the unused BAD_TAG_OBJECT
  fsck: document msg-id

 Documentation/config/fsck.txt |   5 ++
 Documentation/fsck-msgids.txt | 133 ++++++++++++++++++++++++++++++++++
 fsck.h                        |   1 -
 3 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/fsck-msgids.txt


base-commit: 45c9f05c44b1cb6bd2d6cb95a22cf5e3d21d5b63
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1369%2Fjohn-cai%2Fjc%2Fdocument-fsck-msg-id-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1369/john-cai/jc/document-fsck-msg-id-v1
Pull-Request: https://github.com/git/git/pull/1369
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/2] fsck: remove the unused BAD_TAG_OBJECT
  2022-10-24 15:00 [PATCH 0/2] Document fsck msg ids John Cai via GitGitGadget
@ 2022-10-24 15:00 ` John Cai via GitGitGadget
  2022-10-24 16:57   ` Junio C Hamano
  2022-10-24 15:00 ` [PATCH 2/2] fsck: document msg-id John Cai via GitGitGadget
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 35+ messages in thread
From: John Cai via GitGitGadget @ 2022-10-24 15:00 UTC (permalink / raw)
  To: git; +Cc: John Cai, John Cai

From: John Cai <johncai86@gmail.com>

The BAD_TAG_OBJECT msg-id is not being used anymore, so we can remove
it.

Signed-off-by: John Cai <johncai86@gmail.com>
---
 fsck.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fsck.h b/fsck.h
index 6f801e53b1d..7d99f6ea337 100644
--- a/fsck.h
+++ b/fsck.h
@@ -24,7 +24,6 @@ enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
-	FUNC(BAD_TAG_OBJECT, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/2] fsck: document msg-id
  2022-10-24 15:00 [PATCH 0/2] Document fsck msg ids John Cai via GitGitGadget
  2022-10-24 15:00 ` [PATCH 1/2] fsck: remove the unused BAD_TAG_OBJECT John Cai via GitGitGadget
@ 2022-10-24 15:00 ` John Cai via GitGitGadget
  2022-10-24 17:33   ` Junio C Hamano
  2022-10-24 18:51 ` [PATCH 0/2] Document fsck msg ids Junio C Hamano
  2022-10-25  3:17 ` [PATCH v2 " John Cai via GitGitGadget
  3 siblings, 1 reply; 35+ messages in thread
From: John Cai via GitGitGadget @ 2022-10-24 15:00 UTC (permalink / raw)
  To: git; +Cc: John Cai, John Cai

From: John Cai <johncai86@gmail.com>

The git-config documentation lacks mention of specific <msg-id> that
are supported. While git-help --config will display a list of these options,
often developers' first instinct is to consult the git docs to find valid
config values.

Add a section under the docs for fsck.<msg-id> with the msg-ids that
git-fsck recognizes.

Signed-off-by: John Cai <johncai86@gmail.com>
---
 Documentation/config/fsck.txt |   5 ++
 Documentation/fsck-msgids.txt | 133 ++++++++++++++++++++++++++++++++++
 2 files changed, 138 insertions(+)
 create mode 100644 Documentation/fsck-msgids.txt

diff --git a/Documentation/config/fsck.txt b/Documentation/config/fsck.txt
index 450e8c38e34..b0632075f22 100644
--- a/Documentation/config/fsck.txt
+++ b/Documentation/config/fsck.txt
@@ -35,6 +35,11 @@ allow new instances of the same breakages go unnoticed.
 Setting an unknown `fsck.<msg-id>` value will cause fsck to die, but
 doing the same for `receive.fsck.<msg-id>` and `fetch.fsck.<msg-id>`
 will only cause git to warn.
++
+The following `<msg-id>` are supported:
++
+
+include::../fsck-msgids.txt[]
 
 fsck.skipList::
 	The path to a list of object names (i.e. one unabbreviated SHA-1 per
diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
new file mode 100644
index 00000000000..888fa3308b7
--- /dev/null
+++ b/Documentation/fsck-msgids.txt
@@ -0,0 +1,133 @@
+--
+`badDate`;;
+	Invalid date format in an author/committer line.
+
+`badEmail`;;
+	Invalid email format in an author/committer line.
+
+`badFilemode`;;
+	A tree contains a bad filemode entry.
+
+`badName`;;
+	An author/committer name is empty.
+
+`badObjectSha1`;;
+	An object has a bad sha1.
+
+`badParentSha1`;;
+	A commit object has a bad parent sha1.
+
+`badTagName`;;
+	A tag has an invalid format.
+
+`badTimezone`;;
+	Found an invalid time zone in an author/committer line.
+
+`badTree`;;
+	A tree cannot be parsed.
+
+`badTreeSha1`;;
+	A tree has an invalid format.
+
+`badType`;;
+	Found an invalid object type.
+
+`duplicateEntries`;;
+	A tree contains duplicate file entries.
+
+`emptyName`;;
+	A path contains an empty name.
+
+`fullPathName`;;
+	A path contains the full path starting with "/".
+
+`gitAttributesSymlink`;;
+	`.gitattributes` is a symlink.
+
+`gitignoreSymlink`;;
+	`.gitignore` is a symlink.
+
+`gitmodulesBlob`;;
+	A non-blob found at `.gitmodules`.
+
+`gitmodulesMissing`;;
+	Unable to read `.gitmodules` blob.
+
+`gitmodulesName`;;
+	A submodule name is invalid.
+
+`gitmodulesParse`;;
+	Could not parse `.gitmodules` blob.
+
+`gitmodulesLarge`;
+	`.gitmodules` blob is too large to parse.
+
+`gitmodulesPath`;;
+	`.gitmodules` path is invalid.
+
+`gitmodulesSymlink`;;
+	`.gitmodules` is a symlink.
+
+`gitmodulesUpdate`;;
+	Found an invalid submodule update setting.
+
+`gitmodulesUrl`;;
+	Found an invalid submodule url.
+
+`hasDot`;;
+	A tree contains an entry named `.`.
+
+`hasDotdot`;;
+	A tree contains an entry named `..`.
+
+`hasDotgit`;;
+	A tree contains an entry named `.git`.
+
+`mailmapSymlink`;;
+	`.mailmap` is a symlink.
+
+`missingAuthor`;;
+	Author is missing.
+
+`missingCommitter`;;
+	Committer is missing.
+
+`missingEmail`;;
+	Email is missing in an author/committer line.
+
+`missingNameBeforeEmail`;;
+	Missing space before an email in an author/committer line.
+
+`missingObject`;;
+	Missing `object` line in tag object.
+
+`missingSpaceBeforeDate`;;
+	Missing space before date in an author/committer line.
+
+`missingSpaceBeforeEmail`;;
+	Missing space before the email in author/committer line.
+
+`missingTag`;;
+	Unexpected end after `type` line in a tag object.
+
+`missingTypeEntry`;;
+	Missing `type` line in a tag object.
+
+`multipleAuthors`;;`
+	Multiple author lines found in a commit.
+
+`nulInCommit`;;
+	Found a NUL byte in the commit object body.
+
+`treeNotSorted`;;
+	A tree is not properly sorted.
+
+`unknownType`;;
+	Found an unknown object type.
+
+`zeroPaddingDate`;;
+	Found a zero padded date in an author/commiter line.
+
+`zeroPaddedFilemode`;;
+	Found a zero padded filemode in a tree.
+--
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] fsck: remove the unused BAD_TAG_OBJECT
  2022-10-24 15:00 ` [PATCH 1/2] fsck: remove the unused BAD_TAG_OBJECT John Cai via GitGitGadget
@ 2022-10-24 16:57   ` Junio C Hamano
  2022-10-24 18:16     ` Junio C Hamano
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2022-10-24 16:57 UTC (permalink / raw)
  To: John Cai via GitGitGadget; +Cc: git, John Cai

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: John Cai <johncai86@gmail.com>
>
> The BAD_TAG_OBJECT msg-id is not being used anymore, so we can remove
> it.

Do you have a ready reference to the change that made it no longer
needed (or stopped detecting the error the message was meant to be
issued against)?


> Signed-off-by: John Cai <johncai86@gmail.com>
> ---
>  fsck.h | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/fsck.h b/fsck.h
> index 6f801e53b1d..7d99f6ea337 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -24,7 +24,6 @@ enum fsck_msg_type {
>  	FUNC(BAD_NAME, ERROR) \
>  	FUNC(BAD_OBJECT_SHA1, ERROR) \
>  	FUNC(BAD_PARENT_SHA1, ERROR) \
> -	FUNC(BAD_TAG_OBJECT, ERROR) \
>  	FUNC(BAD_TIMEZONE, ERROR) \
>  	FUNC(BAD_TREE, ERROR) \
>  	FUNC(BAD_TREE_SHA1, ERROR) \

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fsck: document msg-id
  2022-10-24 15:00 ` [PATCH 2/2] fsck: document msg-id John Cai via GitGitGadget
@ 2022-10-24 17:33   ` Junio C Hamano
  2022-10-25  9:41     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2022-10-24 17:33 UTC (permalink / raw)
  To: John Cai via GitGitGadget; +Cc: git, John Cai

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: John Cai <johncai86@gmail.com>
>
> The git-config documentation lacks mention of specific <msg-id> that
> are supported. While git-help --config will display a list of these options,
> often developers' first instinct is to consult the git docs to find valid
> config values.

Good observation.  "help --config" only gives names and no
description, so maintaining the description like this new file does
make sense.

Can you also add a referencing comment to fsck.h to tell that folks
who add a new error type that they need to update the -msgids.txt
file as well?

Does this list in the new file cover all existing messages, by the
way?

I also wonder if the order of the entries in this file should be
alphabetical, unlike the entries in fsck.h where more severe ones
come earlier than the less severe ones.  In a sense, matching the
order of two lists makes it easier to maintain, and grouping by
severity in the doc might or might not find it easier to scan and
find what they are, but one of the biggest reason why users will
come to this list is when they see a particular error message and
want to understand what that means, no?  At that point it would be
easier to look things up if 

 (1) the list contains EVERYTHING, even the ones that you are not
     supposed to configure away.

 (2) the list is sorted alphabetically, regardless of the severity.

The first point suggests that it may be a mistake to have the main
list appear in the "what configuration knobs are available for
squelching fsck error messages".  It is OK to refer from there to
the main list, but the main list should list everything, with
comments like "(this error cannot be configured away)", no?

In other words, I think it is better to have a master list of
everything, with their message ID and textual description, which
essentially is your fsck-msgids.txt but additionally mention which
ones are by default FATAL and cannot be configured away (even
better, we can mention what severity level each of them is by
default, by mirroring that is in fsck.h).

And that master list should NOT be made a part of fsck.<msg-type>
configuration item, like this patch did.  It should be a separate
section in "git fsck --help" output whose section title is "Fsck
errors" or something.  

Then the existing description of fsck.<msg-type> configuration can
refer to that "Fsck errors" section for what msg types exist.

Another thing that I noticed while thinking about the patch, but is
better left outside the scope of this patch, is that an attempt to
configure fatal ones away is prevented by fsck.c::fsck_set_msg_type() 
but "git help config | grep '^fsck\.'" lists them.  I think the
"help config" should be fixed.

I almost suggested to extend the FOREACH_FSCK_MSG_ID() definition in
fsck.h so that fsck-msgids.txt gets auto-generated (what is missing
in fsck.h that prevents us from doing so is the textual explanation
you added in the new file in your patch---they could come from
comments on the same line, for example, and can be extracted via a
Perl or sed script at build time).  I do not know if it is a good
idea, especially if we forsee a future in which we may be
translating the documentation, so decided against making such a
suggestion.

But at least, we could _lint_ the manually curated fsck-msgids.txt
using what is in fsck.h to see if a new MSG_ID added to fsck.h is
missing from the doc, or a MSG_ID whose default severity is updated
in fsck.h is stale in the doc, etc.  That can be left for the future
updates, but we should at least instruct developers to keep them in
sync in fsck.h by adding a comment.

Thanks.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] fsck: remove the unused BAD_TAG_OBJECT
  2022-10-24 16:57   ` Junio C Hamano
@ 2022-10-24 18:16     ` Junio C Hamano
  2022-10-24 18:33       ` John Cai
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2022-10-24 18:16 UTC (permalink / raw)
  To: John Cai via GitGitGadget; +Cc: git, John Cai

Junio C Hamano <gitster@pobox.com> writes:

> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: John Cai <johncai86@gmail.com>
>>
>> The BAD_TAG_OBJECT msg-id is not being used anymore, so we can remove
>> it.
>
> Do you have a ready reference to the change that made it no longer
> needed (or stopped detecting the error the message was meant to be
> issued against)?

2175a0c6 (fsck: stop checking tag->tagged, 2019-10-18) stopped
checking the tagged object referred to by a tag object, which the
error message BAD_TAG_OBJECT was about.  Since then the
BAD_TAG_OBJECT message is no longer used.




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] fsck: remove the unused BAD_TAG_OBJECT
  2022-10-24 18:16     ` Junio C Hamano
@ 2022-10-24 18:33       ` John Cai
  2022-10-24 23:39         ` Jeff King
  0 siblings, 1 reply; 35+ messages in thread
From: John Cai @ 2022-10-24 18:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: John Cai via GitGitGadget, git

Hi Junio,

On 24 Oct 2022, at 14:16, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> From: John Cai <johncai86@gmail.com>
>>>
>>> The BAD_TAG_OBJECT msg-id is not being used anymore, so we can remove
>>> it.
>>
>> Do you have a ready reference to the change that made it no longer
>> needed (or stopped detecting the error the message was meant to be
>> issued against)?
>
> 2175a0c6 (fsck: stop checking tag->tagged, 2019-10-18) stopped
> checking the tagged object referred to by a tag object, which the
> error message BAD_TAG_OBJECT was about.  Since then the
> BAD_TAG_OBJECT message is no longer used.

Thanks for doing my work for me :)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/2] Document fsck msg ids
  2022-10-24 15:00 [PATCH 0/2] Document fsck msg ids John Cai via GitGitGadget
  2022-10-24 15:00 ` [PATCH 1/2] fsck: remove the unused BAD_TAG_OBJECT John Cai via GitGitGadget
  2022-10-24 15:00 ` [PATCH 2/2] fsck: document msg-id John Cai via GitGitGadget
@ 2022-10-24 18:51 ` Junio C Hamano
  2022-10-25  3:17 ` [PATCH v2 " John Cai via GitGitGadget
  3 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2022-10-24 18:51 UTC (permalink / raw)
  To: John Cai via GitGitGadget; +Cc: git, John Cai

Here is to illustrate what I meant in my review comments on [2/2],
on top of these two patches.

Two structural changes worth noting are:

- I removed the inclusin from config/fsck.txt because otherwise the
  entry for fsck.<msg-id> becomes way too long and the next entry,
  fsck.skipList, may become harder to find.  But I am OK if you
  really want to make the list of msgids a sublist here, as long as
  it makes it clear that those with default severity FATAL cannot be
  configured away.  That has to be done outside of fsck-msgids.txt
  file (in other words, in config/fsck.txt where it includes the
  file) as it would be a good idea to keep the file a "pure" list,
  to keep it easy to auto-generate in the future.

- The description of each item begins with (SEVERITY) marking, as
  the ones that are (FATAL) cannot be configured.

I think I added all error types that your patch missed.  Also I
added (FATAL)/(ERROR)/(WARN)/(INFO) markings to show what their
default severity levels are, but I may have made mistakes.

If you want to take any part of these changes, feel free to do so,
but please do so after double checking.  This was primarily done to
illustrate how the end result should look like if I were doing this
topic, and not to show how each item should be described exactly.

Thanks.

 Documentation/config/fsck.txt |   5 +-
 Documentation/fsck-msgids.txt | 188 ++++++++++++++++++++++--------------------
 Documentation/git-fsck.txt    |  12 +++
 fsck.h                        |   6 ++
 4 files changed, 119 insertions(+), 92 deletions(-)

diff --git c/Documentation/config/fsck.txt w/Documentation/config/fsck.txt
index b0632075f2..a3c865df56 100644
--- c/Documentation/config/fsck.txt
+++ w/Documentation/config/fsck.txt
@@ -36,10 +36,9 @@ Setting an unknown `fsck.<msg-id>` value will cause fsck to die, but
 doing the same for `receive.fsck.<msg-id>` and `fetch.fsck.<msg-id>`
 will only cause git to warn.
 +
-The following `<msg-id>` are supported:
-+
+See `Fsck Messages` section of linkgit:git-fsck[1] for supported
+values of `<msg-id>`.
 
-include::../fsck-msgids.txt[]
 
 fsck.skipList::
 	The path to a list of object names (i.e. one unabbreviated SHA-1 per
diff --git c/Documentation/fsck-msgids.txt w/Documentation/fsck-msgids.txt
index 888fa3308b..6831982bdb 100644
--- c/Documentation/fsck-msgids.txt
+++ w/Documentation/fsck-msgids.txt
@@ -1,133 +1,143 @@
---
-`badDate`;;
-	Invalid date format in an author/committer line.
+`badDate`::
+	(ERROR) Invalid date format in an author/committer line.
 
-`badEmail`;;
-	Invalid email format in an author/committer line.
+`badDateOverflow`::
+	(ERROR) Invalid date value in an author/committer line.
 
-`badFilemode`;;
-	A tree contains a bad filemode entry.
+`badEmail`::
+	(ERROR) Invalid email format in an author/committer line.
 
-`badName`;;
-	An author/committer name is empty.
+`badFilemode`::
+	(INFO) A tree contains a bad filemode entry.
 
-`badObjectSha1`;;
-	An object has a bad sha1.
+`badName`::
+	(ERROR) An author/committer name is empty.
 
-`badParentSha1`;;
-	A commit object has a bad parent sha1.
+`badObjectSha1`::
+	(ERROR) An object has a bad sha1.
 
-`badTagName`;;
-	A tag has an invalid format.
+`badParentSha1`::
+	(ERROR) A commit object has a bad parent sha1.
 
-`badTimezone`;;
-	Found an invalid time zone in an author/committer line.
+`badTagName`::
+	(INFO) A tag has an invalid format.
 
-`badTree`;;
-	A tree cannot be parsed.
+`badTimezone`::
+	(ERROR) Found an invalid time zone in an author/committer line.
 
-`badTreeSha1`;;
-	A tree has an invalid format.
+`badTree`::
+	(ERROR) A tree cannot be parsed.
 
-`badType`;;
-	Found an invalid object type.
+`badTreeSha1`::
+	(ERROR) A tree has an invalid format.
 
-`duplicateEntries`;;
-	A tree contains duplicate file entries.
+`badType`::
+	(ERROR) Found an invalid object type.
 
-`emptyName`;;
-	A path contains an empty name.
+`duplicateEntries`::
+	(ERROR) A tree contains duplicate file entries.
 
-`fullPathName`;;
-	A path contains the full path starting with "/".
+`emptyName`::
+	(WARN) A path contains an empty name.
 
-`gitAttributesSymlink`;;
-	`.gitattributes` is a symlink.
+`fullPathName`::
+	(WARN) A path contains the full path starting with "/".
 
-`gitignoreSymlink`;;
-	`.gitignore` is a symlink.
+`gitAttributesSymlink`::
+	(INFO) `.gitattributes` is a symlink.
 
-`gitmodulesBlob`;;
-	A non-blob found at `.gitmodules`.
+`gitignoreSymlink`::
+	(INFO) `.gitignore` is a symlink.
 
-`gitmodulesMissing`;;
-	Unable to read `.gitmodules` blob.
+`gitmodulesBlob`::
+	(ERROR) A non-blob found at `.gitmodules`.
 
-`gitmodulesName`;;
-	A submodule name is invalid.
+`gitmodulesMissing`::
+	(ERROR) Unable to read `.gitmodules` blob.
 
-`gitmodulesParse`;;
-	Could not parse `.gitmodules` blob.
+`gitmodulesName`::
+	(ERROR) A submodule name is invalid.
+
+`gitmodulesParse`::
+	(INFO) Could not parse `.gitmodules` blob.
 
 `gitmodulesLarge`;
-	`.gitmodules` blob is too large to parse.
+	(ERROR) `.gitmodules` blob is too large to parse.
+
+`gitmodulesPath`::
+	(ERROR) `.gitmodules` path is invalid.
+
+`gitmodulesSymlink`::
+	(ERROR) `.gitmodules` is a symlink.
+
+`gitmodulesUpdate`::
+	(ERROR) Found an invalid submodule update setting.
 
-`gitmodulesPath`;;
-	`.gitmodules` path is invalid.
+`gitmodulesUrl`::
+	(ERROR) Found an invalid submodule url.
 
-`gitmodulesSymlink`;;
-	`.gitmodules` is a symlink.
+`hasDot`::
+	(WARN) A tree contains an entry named `.`.
 
-`gitmodulesUpdate`;;
-	Found an invalid submodule update setting.
+`hasDotdot`::
+	(WARN) A tree contains an entry named `..`.
 
-`gitmodulesUrl`;;
-	Found an invalid submodule url.
+`hasDotgit`::
+	(WARN) A tree contains an entry named `.git`.
 
-`hasDot`;;
-	A tree contains an entry named `.`.
+`mailmapSymlink`::
+	(WARN) `.mailmap` is a symlink.
 
-`hasDotdot`;;
-	A tree contains an entry named `..`.
+`missingAuthor`::
+	(ERROR) Author is missing.
 
-`hasDotgit`;;
-	A tree contains an entry named `.git`.
+`missingCommitter`::
+	(ERROR) Committer is missing.
 
-`mailmapSymlink`;;
-	`.mailmap` is a symlink.
+`missingEmail`::
+	(ERROR) Email is missing in an author/committer line.
 
-`missingAuthor`;;
-	Author is missing.
+`missingNameBeforeEmail`::
+	(ERROR) Missing name before an email in an author/committer line.
 
-`missingCommitter`;;
-	Committer is missing.
+`missingObject`::
+	(ERROR) Missing `object` line in tag object.
 
-`missingEmail`;;
-	Email is missing in an author/committer line.
+`missingSpaceBeforeDate`::
+	(ERROR) Missing space before date in an author/committer line.
 
-`missingNameBeforeEmail`;;
-	Missing space before an email in an author/committer line.
+`missingSpaceBeforeEmail`::
+	(ERROR) Missing space before the email in author/committer line.
 
-`missingObject`;;
-	Missing `object` line in tag object.
+`missingTag`::
+	(ERROR) Unexpected end after `type` line in a tag object.
 
-`missingSpaceBeforeDate`;;
-	Missing space before date in an author/committer line.
+`missingTagEntry`::
+	(ERROR) Missing `tag` line in a tag object.
 
-`missingSpaceBeforeEmail`;;
-	Missing space before the email in author/committer line.
+`missingTypeEntry`::
+	(ERROR) Missing `type` line in a tag object.
 
-`missingTag`;;
-	Unexpected end after `type` line in a tag object.
+`multipleAuthors`::`
+	(ERROR) Multiple author lines found in a commit.
 
-`missingTypeEntry`;;
-	Missing `type` line in a tag object.
+`nulInCommit`::
+	(WARN) Found a NUL byte in the commit object body.
 
-`multipleAuthors`;;`
-	Multiple author lines found in a commit.
+`nulInHeader`::
+	(FATAL) NUL byte exists in the object header.
 
-`nulInCommit`;;
-	Found a NUL byte in the commit object body.
+`treeNotSorted`::
+	(ERROR) A tree is not properly sorted.
 
-`treeNotSorted`;;
-	A tree is not properly sorted.
+`unknownType`::
+	(ERROR) Found an unknown object type.
 
-`unknownType`;;
-	Found an unknown object type.
+`unterminatedHeader`::
+	(FATAL) Missing end-of-line in the object header.
 
-`zeroPaddingDate`;;
-	Found a zero padded date in an author/commiter line.
+`zeroPaddingDate`::
+	(ERROR) Found a zero padded date in an author/commiter line.
 
-`zeroPaddedFilemode`;;
-	Found a zero padded filemode in a tree.
---
+`zeroPaddedFilemode`::
+	(WARN) Found a zero padded filemode in a tree.
diff --git c/Documentation/git-fsck.txt w/Documentation/git-fsck.txt
index 29318ea957..8124ae2e67 100644
--- c/Documentation/git-fsck.txt
+++ w/Documentation/git-fsck.txt
@@ -152,6 +152,18 @@ hash mismatch <object>::
 	object database value.
 	This indicates a serious data integrity problem.
 
+
+FSCK MESSAGES
+-------------
+
+The following lists the types of errors `git fsck` detects and what
+each error means, with their default severity.  The severity of the
+errors, other than those that are marked as "(FATAL)", can be tweaked
+by setting corresponding `fsck.<msg-id>` configuration variable.
+
+include::fsck-msgids.txt[]
+
+
 Environment Variables
 ---------------------
 
diff --git c/fsck.h w/fsck.h
index 7d99f6ea33..1fb26c4868 100644
--- c/fsck.h
+++ w/fsck.h
@@ -13,6 +13,12 @@ enum fsck_msg_type {
 	FSCK_WARN,
 };
 
+/*
+ * Documentation/fsck-msgids.txt is meant to document these; when
+ * modifying this list in any way, make sure to make corresponding
+ * changes there, too, to keep them in sync.
+ */
+
 #define FOREACH_FSCK_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] fsck: remove the unused BAD_TAG_OBJECT
  2022-10-24 18:33       ` John Cai
@ 2022-10-24 23:39         ` Jeff King
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2022-10-24 23:39 UTC (permalink / raw)
  To: John Cai; +Cc: Junio C Hamano, John Cai via GitGitGadget, git

On Mon, Oct 24, 2022 at 02:33:39PM -0400, John Cai wrote:

> >>> From: John Cai <johncai86@gmail.com>
> >>>
> >>> The BAD_TAG_OBJECT msg-id is not being used anymore, so we can remove
> >>> it.
> >>
> >> Do you have a ready reference to the change that made it no longer
> >> needed (or stopped detecting the error the message was meant to be
> >> issued against)?
> >
> > 2175a0c6 (fsck: stop checking tag->tagged, 2019-10-18) stopped
> > checking the tagged object referred to by a tag object, which the
> > error message BAD_TAG_OBJECT was about.  Since then the
> > BAD_TAG_OBJECT message is no longer used.
> 
> Thanks for doing my work for me :)

Thanks for finding this. That commit is mine, and I should have removed
it then. :)

For future reference, I did the same search via:

  git log -SBAD_TAG_OBJECT fsck.c

I find "-S" is great for these kind of "when did this stop getting used"
queries.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2 0/2] Document fsck msg ids
  2022-10-24 15:00 [PATCH 0/2] Document fsck msg ids John Cai via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-10-24 18:51 ` [PATCH 0/2] Document fsck msg ids Junio C Hamano
@ 2022-10-25  3:17 ` John Cai via GitGitGadget
  2022-10-25  3:17   ` [PATCH v2 1/2] fsck: remove the unused BAD_TAG_OBJECT John Cai via GitGitGadget
                     ` (3 more replies)
  3 siblings, 4 replies; 35+ messages in thread
From: John Cai via GitGitGadget @ 2022-10-25  3:17 UTC (permalink / raw)
  To: git; +Cc: John Cai

fsck does a number of checks, and prints warning or error messages based on
the type of check. The only place the config values that control whether to
ignore these checks are documented is through git-help(1). However, often
times peoples' first instinct to look for a list of valid config values is
in the documentation page for git-config(1). These fsck. configuration
values can be hard to find.

Document these so that both git-config and git-fsck documentation has a list
of valid s.

Patch [1/2] removes an unused msg-id BAD_TAG_OBJECT Patch [2/2] adds a
fsck-msgids.txt that lists msg-ids that fsck checks for

Since v1:

 * provided a full list of error messages along with corresponding default
   severity
 * no longer including the whole list in git-config
 * add a comment in fsck.h to remind developers to keep the documentation
   accurate when making changes to the list of error messages

John Cai (2):
  fsck: remove the unused BAD_TAG_OBJECT
  fsck: document msg-id

 Documentation/config/fsck.txt |   4 +
 Documentation/fsck-msgids.txt | 149 ++++++++++++++++++++++++++++++++++
 Documentation/git-fsck.txt    |  12 +++
 fsck.h                        |   7 +-
 4 files changed, 171 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/fsck-msgids.txt


base-commit: 45c9f05c44b1cb6bd2d6cb95a22cf5e3d21d5b63
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1369%2Fjohn-cai%2Fjc%2Fdocument-fsck-msg-id-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1369/john-cai/jc/document-fsck-msg-id-v2
Pull-Request: https://github.com/git/git/pull/1369

Range-diff vs v1:

 1:  f32ff5dc4ee ! 1:  797c6251671 fsck: remove the unused BAD_TAG_OBJECT
     @@ Metadata
       ## Commit message ##
          fsck: remove the unused BAD_TAG_OBJECT
      
     -    The BAD_TAG_OBJECT msg-id is not being used anymore, so we can remove
     -    it.
     +    2175a0c6 (fsck: stop checking tag->tagged, 2019-10-18) stopped
     +    checking the tagged object referred to by a tag object, which is what the
     +    error message BAD_TAG_OBJECT was for. Since then the BAD_TAG_OBJECT
     +    message is no longer used anywhere.
     +
     +    Remove the BAD_TAG_OBJECT msg-id.
      
          Signed-off-by: John Cai <johncai86@gmail.com>
     +    Reviewed-by: Junio C Hamano <gitster@pobox.com>
      
       ## fsck.h ##
      @@ fsck.h: enum fsck_msg_type {
 2:  3aec3d2c9ca ! 2:  ad43adab435 fsck: document msg-id
     @@ Metadata
       ## Commit message ##
          fsck: document msg-id
      
     -    The git-config documentation lacks mention of specific <msg-id> that
     -    are supported. While git-help --config will display a list of these options,
     -    often developers' first instinct is to consult the git docs to find valid
     +    The documentation lacks mention of specific <msg-id> that are supported.
     +    While git-help --config will display a list of these options, often
     +    developers' first instinct is to consult the git docs to find valid
          config values.
      
     -    Add a section under the docs for fsck.<msg-id> with the msg-ids that
     -    git-fsck recognizes.
     +    Add a list of fsck error messages, and link to it from the git-fsck
     +    documentation.
      
          Signed-off-by: John Cai <johncai86@gmail.com>
     +    Helped-by: Junio C Hamano <gitster@pobox.com>
      
       ## Documentation/config/fsck.txt ##
      @@ Documentation/config/fsck.txt: allow new instances of the same breakages go unnoticed.
     @@ Documentation/config/fsck.txt: allow new instances of the same breakages go unno
       doing the same for `receive.fsck.<msg-id>` and `fetch.fsck.<msg-id>`
       will only cause git to warn.
      ++
     -+The following `<msg-id>` are supported:
     -++
     ++See `Fsck Messages` section of linkgit:git-fsck[1] for supported
     ++values of `<msg-id>`.
      +
     -+include::../fsck-msgids.txt[]
       
       fsck.skipList::
       	The path to a list of object names (i.e. one unabbreviated SHA-1 per
      
       ## Documentation/fsck-msgids.txt (new) ##
      @@
     -+--
     -+`badDate`;;
     -+	Invalid date format in an author/committer line.
     ++`badDate`::
     ++	(ERROR) Invalid date format in an author/committer line.
     ++
     ++`badDateOverflow`::
     ++	(ERROR) Invalid date value in an author/committer line.
     ++
     ++`badEmail`::
     ++	(ERROR) Invalid email format in an author/committer line.
      +
     -+`badEmail`;;
     -+	Invalid email format in an author/committer line.
     ++`badFilemode`::
     ++	(INFO) A tree contains a bad filemode entry.
      +
     -+`badFilemode`;;
     -+	A tree contains a bad filemode entry.
     ++`badName`::
     ++	(ERROR) An author/committer name is empty.
      +
     -+`badName`;;
     -+	An author/committer name is empty.
     ++`badObjectSha1`::
     ++	(ERROR) An object has a bad sha1.
      +
     -+`badObjectSha1`;;
     -+	An object has a bad sha1.
     ++`badParentSha1`::
     ++	(ERROR) A commit object has a bad parent sha1.
      +
     -+`badParentSha1`;;
     -+	A commit object has a bad parent sha1.
     ++`badTagName`::
     ++	(INFO) A tag has an invalid format.
      +
     -+`badTagName`;;
     -+	A tag has an invalid format.
     ++`badTimezone`::
     ++	(ERROR) Found an invalid time zone in an author/committer line.
      +
     -+`badTimezone`;;
     -+	Found an invalid time zone in an author/committer line.
     ++`badTree`::
     ++	(ERROR) A tree cannot be parsed.
      +
     -+`badTree`;;
     -+	A tree cannot be parsed.
     ++`badTreeSha1`::
     ++	(ERROR) A tree has an invalid format.
      +
     -+`badTreeSha1`;;
     -+	A tree has an invalid format.
     ++`badType`::
     ++	(ERROR) Found an invalid object type.
      +
     -+`badType`;;
     -+	Found an invalid object type.
     ++`duplicateEntries`::
     ++	(ERROR) A tree contains duplicate file entries.
      +
     -+`duplicateEntries`;;
     -+	A tree contains duplicate file entries.
     ++`emptyName`::
     ++	(WARN) A path contains an empty name.
      +
     -+`emptyName`;;
     -+	A path contains an empty name.
     ++`extraHeaderEntry`::
     ++	(IGNORE) Extra headers found after `tagger`.
      +
     -+`fullPathName`;;
     -+	A path contains the full path starting with "/".
     ++`fullPathName`::
     ++	(WARN) A path contains the full path starting with "/".
      +
     -+`gitAttributesSymlink`;;
     -+	`.gitattributes` is a symlink.
     ++`gitAttributesSymlink`::
     ++	(INFO) `.gitattributes` is a symlink.
      +
     -+`gitignoreSymlink`;;
     -+	`.gitignore` is a symlink.
     ++`gitignoreSymlink`::
     ++	(INFO) `.gitignore` is a symlink.
      +
     -+`gitmodulesBlob`;;
     -+	A non-blob found at `.gitmodules`.
     ++`gitmodulesBlob`::
     ++	(ERROR) A non-blob found at `.gitmodules`.
      +
     -+`gitmodulesMissing`;;
     -+	Unable to read `.gitmodules` blob.
     ++`gitmodulesMissing`::
     ++	(ERROR) Unable to read `.gitmodules` blob.
      +
     -+`gitmodulesName`;;
     -+	A submodule name is invalid.
     ++`gitmodulesName`::
     ++	(ERROR) A submodule name is invalid.
      +
     -+`gitmodulesParse`;;
     -+	Could not parse `.gitmodules` blob.
     ++`gitmodulesParse`::
     ++	(INFO) Could not parse `.gitmodules` blob.
      +
      +`gitmodulesLarge`;
     -+	`.gitmodules` blob is too large to parse.
     ++	(ERROR) `.gitmodules` blob is too large to parse.
     ++
     ++`gitmodulesPath`::
     ++	(ERROR) `.gitmodules` path is invalid.
     ++
     ++`gitmodulesSymlink`::
     ++	(ERROR) `.gitmodules` is a symlink.
      +
     -+`gitmodulesPath`;;
     -+	`.gitmodules` path is invalid.
     ++`gitmodulesUpdate`::
     ++	(ERROR) Found an invalid submodule update setting.
      +
     -+`gitmodulesSymlink`;;
     -+	`.gitmodules` is a symlink.
     ++`gitmodulesUrl`::
     ++	(ERROR) Found an invalid submodule url.
      +
     -+`gitmodulesUpdate`;;
     -+	Found an invalid submodule update setting.
     ++`hasDot`::
     ++	(WARN) A tree contains an entry named `.`.
      +
     -+`gitmodulesUrl`;;
     -+	Found an invalid submodule url.
     ++`hasDotdot`::
     ++	(WARN) A tree contains an entry named `..`.
      +
     -+`hasDot`;;
     -+	A tree contains an entry named `.`.
     ++`hasDotgit`::
     ++	(WARN) A tree contains an entry named `.git`.
      +
     -+`hasDotdot`;;
     -+	A tree contains an entry named `..`.
     ++`mailmapSymlink`::
     ++	(INFO) `.mailmap` is a symlink.
      +
     -+`hasDotgit`;;
     -+	A tree contains an entry named `.git`.
     ++`missingAuthor`::
     ++	(ERROR) Author is missing.
      +
     -+`mailmapSymlink`;;
     -+	`.mailmap` is a symlink.
     ++`missingCommitter`::
     ++	(ERROR) Committer is missing.
      +
     -+`missingAuthor`;;
     -+	Author is missing.
     ++`missingEmail`::
     ++	(ERROR) Email is missing in an author/committer line.
      +
     -+`missingCommitter`;;
     -+	Committer is missing.
     ++`missingNameBeforeEmail`::
     ++	(ERROR) Missing name before an email in an author/committer line.
      +
     -+`missingEmail`;;
     -+	Email is missing in an author/committer line.
     ++`missingObject`::
     ++	(ERROR) Missing `object` line in tag object.
      +
     -+`missingNameBeforeEmail`;;
     -+	Missing space before an email in an author/committer line.
     ++`missingSpaceBeforeDate`::
     ++	(ERROR) Missing space before date in an author/committer line.
      +
     -+`missingObject`;;
     -+	Missing `object` line in tag object.
     ++`missingSpaceBeforeEmail`::
     ++	(ERROR) Missing space before the email in author/committer line.
      +
     -+`missingSpaceBeforeDate`;;
     -+	Missing space before date in an author/committer line.
     ++`missingTag`::
     ++	(ERROR) Unexpected end after `type` line in a tag object.
      +
     -+`missingSpaceBeforeEmail`;;
     -+	Missing space before the email in author/committer line.
     ++`missingTagEntry`::
     ++	(ERROR) Missing `tag` line in a tag object.
      +
     -+`missingTag`;;
     -+	Unexpected end after `type` line in a tag object.
     ++`missingTypeEntry`::
     ++	(ERROR) Missing `type` line in a tag object.
      +
     -+`missingTypeEntry`;;
     -+	Missing `type` line in a tag object.
     ++`multipleAuthors`::
     ++	(ERROR) Multiple author lines found in a commit.
      +
     -+`multipleAuthors`;;`
     -+	Multiple author lines found in a commit.
     ++`nulInCommit`::
     ++	(WARN) Found a NUL byte in the commit object body.
      +
     -+`nulInCommit`;;
     -+	Found a NUL byte in the commit object body.
     ++`nulInHeader`::
     ++	(FATAL) NUL byte exists in the object header.
      +
     -+`treeNotSorted`;;
     -+	A tree is not properly sorted.
     ++`nulInSha1`::
     ++	(WARN) Tree contains entries pointing to a null sha1.
      +
     -+`unknownType`;;
     -+	Found an unknown object type.
     ++`treeNotSorted`::
     ++	(ERROR) A tree is not properly sorted.
      +
     -+`zeroPaddingDate`;;
     -+	Found a zero padded date in an author/commiter line.
     ++`unknownType`::
     ++	(ERROR) Found an unknown object type.
      +
     -+`zeroPaddedFilemode`;;
     -+	Found a zero padded filemode in a tree.
     -+--
     ++`unterminatedHeader`::
     ++	(FATAL) Missing end-of-line in the object header.
     ++
     ++`zeroPaddingDate`::
     ++	(ERROR) Found a zero padded date in an author/commiter line.
     ++
     ++`zeroPaddedFilemode`::
     ++	(WARN) Found a zero padded filemode in a tree.
     +
     + ## Documentation/git-fsck.txt ##
     +@@ Documentation/git-fsck.txt: hash mismatch <object>::
     + 	object database value.
     + 	This indicates a serious data integrity problem.
     + 
     ++
     ++FSCK MESSAGES
     ++-------------
     ++
     ++The following lists the types of errors `git fsck` detects and what
     ++each error means, with their default severity.  The severity of the
     ++error, other than those that are marked as "(FATAL)", can be tweaked
     ++by setting the corresponding `fsck.<msg-id>` configuration variable.
     ++
     ++include::fsck-msgids.txt[]
     ++
     ++
     + Environment Variables
     + ---------------------
     + 
     +
     + ## fsck.h ##
     +@@ fsck.h: enum fsck_msg_type {
     + 	FSCK_WARN,
     + };
     + 
     ++/*
     ++ * Documentation/fsck-msgids.txt documents these; when
     ++ * modifying this list in any way, make sure to keep the
     ++ * two in sync.
     ++ */
     ++
     + #define FOREACH_FSCK_MSG_ID(FUNC) \
     + 	/* fatal errors */ \
     + 	FUNC(NUL_IN_HEADER, FATAL) \

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v2 1/2] fsck: remove the unused BAD_TAG_OBJECT
  2022-10-25  3:17 ` [PATCH v2 " John Cai via GitGitGadget
@ 2022-10-25  3:17   ` John Cai via GitGitGadget
  2022-10-25  3:17   ` [PATCH v2 2/2] fsck: document msg-id John Cai via GitGitGadget
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 35+ messages in thread
From: John Cai via GitGitGadget @ 2022-10-25  3:17 UTC (permalink / raw)
  To: git; +Cc: John Cai, John Cai

From: John Cai <johncai86@gmail.com>

2175a0c6 (fsck: stop checking tag->tagged, 2019-10-18) stopped
checking the tagged object referred to by a tag object, which is what the
error message BAD_TAG_OBJECT was for. Since then the BAD_TAG_OBJECT
message is no longer used anywhere.

Remove the BAD_TAG_OBJECT msg-id.

Signed-off-by: John Cai <johncai86@gmail.com>
Reviewed-by: Junio C Hamano <gitster@pobox.com>
---
 fsck.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fsck.h b/fsck.h
index 6f801e53b1d..7d99f6ea337 100644
--- a/fsck.h
+++ b/fsck.h
@@ -24,7 +24,6 @@ enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
-	FUNC(BAD_TAG_OBJECT, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v2 2/2] fsck: document msg-id
  2022-10-25  3:17 ` [PATCH v2 " John Cai via GitGitGadget
  2022-10-25  3:17   ` [PATCH v2 1/2] fsck: remove the unused BAD_TAG_OBJECT John Cai via GitGitGadget
@ 2022-10-25  3:17   ` John Cai via GitGitGadget
  2022-10-25  4:30   ` [PATCH v2 0/2] Document fsck msg ids Junio C Hamano
  2022-10-25 22:42   ` [PATCH v3 0/4] document fsck error message ids Junio C Hamano
  3 siblings, 0 replies; 35+ messages in thread
From: John Cai via GitGitGadget @ 2022-10-25  3:17 UTC (permalink / raw)
  To: git; +Cc: John Cai, John Cai

From: John Cai <johncai86@gmail.com>

The documentation lacks mention of specific <msg-id> that are supported.
While git-help --config will display a list of these options, often
developers' first instinct is to consult the git docs to find valid
config values.

Add a list of fsck error messages, and link to it from the git-fsck
documentation.

Signed-off-by: John Cai <johncai86@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config/fsck.txt |   4 +
 Documentation/fsck-msgids.txt | 149 ++++++++++++++++++++++++++++++++++
 Documentation/git-fsck.txt    |  12 +++
 fsck.h                        |   6 ++
 4 files changed, 171 insertions(+)
 create mode 100644 Documentation/fsck-msgids.txt

diff --git a/Documentation/config/fsck.txt b/Documentation/config/fsck.txt
index 450e8c38e34..a3c865df567 100644
--- a/Documentation/config/fsck.txt
+++ b/Documentation/config/fsck.txt
@@ -35,6 +35,10 @@ allow new instances of the same breakages go unnoticed.
 Setting an unknown `fsck.<msg-id>` value will cause fsck to die, but
 doing the same for `receive.fsck.<msg-id>` and `fetch.fsck.<msg-id>`
 will only cause git to warn.
++
+See `Fsck Messages` section of linkgit:git-fsck[1] for supported
+values of `<msg-id>`.
+
 
 fsck.skipList::
 	The path to a list of object names (i.e. one unabbreviated SHA-1 per
diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
new file mode 100644
index 00000000000..08e19bac8ad
--- /dev/null
+++ b/Documentation/fsck-msgids.txt
@@ -0,0 +1,149 @@
+`badDate`::
+	(ERROR) Invalid date format in an author/committer line.
+
+`badDateOverflow`::
+	(ERROR) Invalid date value in an author/committer line.
+
+`badEmail`::
+	(ERROR) Invalid email format in an author/committer line.
+
+`badFilemode`::
+	(INFO) A tree contains a bad filemode entry.
+
+`badName`::
+	(ERROR) An author/committer name is empty.
+
+`badObjectSha1`::
+	(ERROR) An object has a bad sha1.
+
+`badParentSha1`::
+	(ERROR) A commit object has a bad parent sha1.
+
+`badTagName`::
+	(INFO) A tag has an invalid format.
+
+`badTimezone`::
+	(ERROR) Found an invalid time zone in an author/committer line.
+
+`badTree`::
+	(ERROR) A tree cannot be parsed.
+
+`badTreeSha1`::
+	(ERROR) A tree has an invalid format.
+
+`badType`::
+	(ERROR) Found an invalid object type.
+
+`duplicateEntries`::
+	(ERROR) A tree contains duplicate file entries.
+
+`emptyName`::
+	(WARN) A path contains an empty name.
+
+`extraHeaderEntry`::
+	(IGNORE) Extra headers found after `tagger`.
+
+`fullPathName`::
+	(WARN) A path contains the full path starting with "/".
+
+`gitAttributesSymlink`::
+	(INFO) `.gitattributes` is a symlink.
+
+`gitignoreSymlink`::
+	(INFO) `.gitignore` is a symlink.
+
+`gitmodulesBlob`::
+	(ERROR) A non-blob found at `.gitmodules`.
+
+`gitmodulesMissing`::
+	(ERROR) Unable to read `.gitmodules` blob.
+
+`gitmodulesName`::
+	(ERROR) A submodule name is invalid.
+
+`gitmodulesParse`::
+	(INFO) Could not parse `.gitmodules` blob.
+
+`gitmodulesLarge`;
+	(ERROR) `.gitmodules` blob is too large to parse.
+
+`gitmodulesPath`::
+	(ERROR) `.gitmodules` path is invalid.
+
+`gitmodulesSymlink`::
+	(ERROR) `.gitmodules` is a symlink.
+
+`gitmodulesUpdate`::
+	(ERROR) Found an invalid submodule update setting.
+
+`gitmodulesUrl`::
+	(ERROR) Found an invalid submodule url.
+
+`hasDot`::
+	(WARN) A tree contains an entry named `.`.
+
+`hasDotdot`::
+	(WARN) A tree contains an entry named `..`.
+
+`hasDotgit`::
+	(WARN) A tree contains an entry named `.git`.
+
+`mailmapSymlink`::
+	(INFO) `.mailmap` is a symlink.
+
+`missingAuthor`::
+	(ERROR) Author is missing.
+
+`missingCommitter`::
+	(ERROR) Committer is missing.
+
+`missingEmail`::
+	(ERROR) Email is missing in an author/committer line.
+
+`missingNameBeforeEmail`::
+	(ERROR) Missing name before an email in an author/committer line.
+
+`missingObject`::
+	(ERROR) Missing `object` line in tag object.
+
+`missingSpaceBeforeDate`::
+	(ERROR) Missing space before date in an author/committer line.
+
+`missingSpaceBeforeEmail`::
+	(ERROR) Missing space before the email in author/committer line.
+
+`missingTag`::
+	(ERROR) Unexpected end after `type` line in a tag object.
+
+`missingTagEntry`::
+	(ERROR) Missing `tag` line in a tag object.
+
+`missingTypeEntry`::
+	(ERROR) Missing `type` line in a tag object.
+
+`multipleAuthors`::
+	(ERROR) Multiple author lines found in a commit.
+
+`nulInCommit`::
+	(WARN) Found a NUL byte in the commit object body.
+
+`nulInHeader`::
+	(FATAL) NUL byte exists in the object header.
+
+`nulInSha1`::
+	(WARN) Tree contains entries pointing to a null sha1.
+
+`treeNotSorted`::
+	(ERROR) A tree is not properly sorted.
+
+`unknownType`::
+	(ERROR) Found an unknown object type.
+
+`unterminatedHeader`::
+	(FATAL) Missing end-of-line in the object header.
+
+`zeroPaddingDate`::
+	(ERROR) Found a zero padded date in an author/commiter line.
+
+`zeroPaddedFilemode`::
+	(WARN) Found a zero padded filemode in a tree.
diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index 29318ea957e..b6a0f8a085c 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -152,6 +152,18 @@ hash mismatch <object>::
 	object database value.
 	This indicates a serious data integrity problem.
 
+
+FSCK MESSAGES
+-------------
+
+The following lists the types of errors `git fsck` detects and what
+each error means, with their default severity.  The severity of the
+error, other than those that are marked as "(FATAL)", can be tweaked
+by setting the corresponding `fsck.<msg-id>` configuration variable.
+
+include::fsck-msgids.txt[]
+
+
 Environment Variables
 ---------------------
 
diff --git a/fsck.h b/fsck.h
index 7d99f6ea337..8355fd4c8f5 100644
--- a/fsck.h
+++ b/fsck.h
@@ -13,6 +13,12 @@ enum fsck_msg_type {
 	FSCK_WARN,
 };
 
+/*
+ * Documentation/fsck-msgids.txt documents these; when
+ * modifying this list in any way, make sure to keep the
+ * two in sync.
+ */
+
 #define FOREACH_FSCK_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/2] Document fsck msg ids
  2022-10-25  3:17 ` [PATCH v2 " John Cai via GitGitGadget
  2022-10-25  3:17   ` [PATCH v2 1/2] fsck: remove the unused BAD_TAG_OBJECT John Cai via GitGitGadget
  2022-10-25  3:17   ` [PATCH v2 2/2] fsck: document msg-id John Cai via GitGitGadget
@ 2022-10-25  4:30   ` Junio C Hamano
  2022-10-25  4:40     ` Junio C Hamano
  2022-10-25  5:12     ` [PATCH] Documentation: add lint-fsck-msgids Junio C Hamano
  2022-10-25 22:42   ` [PATCH v3 0/4] document fsck error message ids Junio C Hamano
  3 siblings, 2 replies; 35+ messages in thread
From: Junio C Hamano @ 2022-10-25  4:30 UTC (permalink / raw)
  To: John Cai via GitGitGadget; +Cc: git, John Cai

"John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Patch [1/2] removes an unused msg-id BAD_TAG_OBJECT Patch [2/2] adds a
> fsck-msgids.txt that lists msg-ids that fsck checks for
>
> Since v1:
>
>  * provided a full list of error messages along with corresponding default
>    severity
>  * no longer including the whole list in git-config
>  * add a comment in fsck.h to remind developers to keep the documentation
>    accurate when making changes to the list of error messages

Thanks.  I did a bit of sanity checking and it made my earlier
suspicion stronger.  We MUST have at least an automated checker to
check the doc against the fsck.h header, if not an automated
generator of the doc from the fsck.h header.

 * Just like your [1/2] pointed out an error type that is no longer
   in use, MISSING_TREE_OBJECT is not used.  It seems that this was
   never used since it was introduced at 159e7b08 (fsck: detect
   gitmodules files, 2018-05-02)

 * A few items were misspelled.

 * A handful items were missing.

The latter two are shown in the attached, which I am tempted to just
squash into your 2/2.  I think the MISSING_TREE_OBJECT one deserves
a separate commit between 1/2 and 2/2.

 Documentation/fsck-msgids.txt | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git c/Documentation/fsck-msgids.txt w/Documentation/fsck-msgids.txt
index 08e19bac8a..7af76ff99a 100644
--- c/Documentation/fsck-msgids.txt
+++ w/Documentation/fsck-msgids.txt
@@ -43,10 +43,10 @@
 `extraHeaderEntry`::
 	(IGNORE) Extra headers found after `tagger`.
 
-`fullPathName`::
+`fullPathname`::
 	(WARN) A path contains the full path starting with "/".
 
-`gitAttributesSymlink`::
+`gitattributesSymlink`::
 	(INFO) `.gitattributes` is a symlink.
 
 `gitignoreSymlink`::
@@ -55,6 +55,9 @@
 `gitmodulesBlob`::
 	(ERROR) A non-blob found at `.gitmodules`.
 
+`gitmodulesLarge`::
+	(ERROR) The `.gitmodules` file is too large to parse.
+
 `gitmodulesMissing`::
 	(ERROR) Unable to read `.gitmodules` blob.
 
@@ -118,6 +121,15 @@
 `missingTagEntry`::
 	(ERROR) Missing `tag` line in a tag object.
 
+`missingTaggerEntry`::
+	(INFO) Missing `tagger` line in a tag object.
+
+`missingTree`::
+	(ERROR) Missing `tree` line in a commit object.
+
+`missingType`::
+	(ERROR) Invalid type value on the `type` line in a tag object.
+
 `missingTypeEntry`::
 	(ERROR) Missing `type` line in a tag object.
 
@@ -130,7 +142,7 @@
 `nulInHeader`::
 	(FATAL) NUL byte exists in the object header.
 
-`nulInSha1`::
+`nullSha1`::
 	(WARN) Tree contains entries pointing to a null sha1.
 
 `treeNotSorted`::
@@ -142,7 +154,7 @@
 `unterminatedHeader`::
 	(FATAL) Missing end-of-line in the object header.
 
-`zeroPaddingDate`::
+`zeroPaddedDate`::
 	(ERROR) Found a zero padded date in an author/commiter line.
 
 `zeroPaddedFilemode`::

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v2 0/2] Document fsck msg ids
  2022-10-25  4:30   ` [PATCH v2 0/2] Document fsck msg ids Junio C Hamano
@ 2022-10-25  4:40     ` Junio C Hamano
  2022-10-25  5:12     ` [PATCH] Documentation: add lint-fsck-msgids Junio C Hamano
  1 sibling, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2022-10-25  4:40 UTC (permalink / raw)
  To: John Cai via GitGitGadget; +Cc: git, John Cai

Junio C Hamano <gitster@pobox.com> writes:

> Thanks.  I did a bit of sanity checking and it made my earlier
> suspicion stronger.  We MUST have at least an automated checker to
> check the doc against the fsck.h header, if not an automated
> generator of the doc from the fsck.h header.

FYI, here are a pair of quick-and-dirty Perl scripts that I used for
the sanity checking.  The first one "parses" the fsck-msgs.txt and
formats lines like

    badTagName	INFO

i.e. camelCased error message name, a TAB, and the severity.

The second one reads the FOREACH_FSCK_MSG_ID() definition in fsck.h
that look like "FUNC(BAD_TAG_NAME, INFO)", camelcases the name and
shows what can be compared with the output of the first one.

There are two sanity checks that must pass when a developer updates
the documentation.

 - The output from m.perl on the documentation must already be sorted.

 - The output from n.perl on fsck.h, when sorted, must match the
   output from m.perl on the documentation.

$ cat >m.perl <<\EOF
#!/usr/bin/perl

my ($previous, $current);

while (<>) {
	if (!defined $current) {
		if (/^\`([a-zA-Z0-9]*)\`::/) {
			$current = $1;
			if ((defined $previous) &&
			    ($current le $previous)) {
				print STDERR "$previous >= $current???\n";
			}
		}
	} elsif (/^\s+\(([A-Z]+)\) /) {
		print "$current	$1\n";
		$previous = $current;
		undef $current;
	}
}
EOF
$ cat >n.perl <<\EOF
#!/usr/bin/perl

while (<>) {
	if (/^\s+FUNC\(([0-9A-Z_]+), ([A-Z]+)\)/) {
		my ($name, $severity) = ($1, $2);
		my ($first) = 1;
		$name = join('',
			     map {
				     y/A-Z/a-z/;
				     if (!$first) {
					     s/^(.)/uc($1)/e;
				     } else {
					     $first = 0;
				     }
				     $_;
			     }
			     split(/_/, $name));
		print "$name	$severity\n";
	}
}
EOF
$ perl m.perl Documentation/fsck-msgids.txt >/var/tmp/1
$ sort /var/tmp/1 >/var/tmp/2
$ diff -u /var/tmp/1 /var/tmp/2
#### no output should appear in the above comparison
$ perl n.perl fsck.h | sort >/var/tmp/2
$ diff -u /var/tmp/1 /var/tmp/2
#### no output should appear in the above comparison


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH] Documentation: add lint-fsck-msgids
  2022-10-25  4:30   ` [PATCH v2 0/2] Document fsck msg ids Junio C Hamano
  2022-10-25  4:40     ` Junio C Hamano
@ 2022-10-25  5:12     ` Junio C Hamano
  1 sibling, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2022-10-25  5:12 UTC (permalink / raw)
  To: John Cai via GitGitGadget; +Cc: git, John Cai

During the initial development of the fsck-msgids.txt feature, it
has become apparent that it is very much error prone to make sure
the description in the documentation file are sorted and correctly
match what is in the fsck.h header file.

Add a quick-and-dirty Perl script and doc-lint target to sanity
check that the fsck-msgids.txt is consistent with the error type
list in the fsck.h header file.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * So the quick-and-dirty one gets repackaged with minimum polish to
   become such a lint-doc checker.  With the MISSING_TREE_OBJECT
   removed from fsck.h header, and the fixes to the msgids.txt
   documentation applied, the result passes the check.

 Documentation/Makefile              | 11 +++++
 Documentation/lint-fsck-msgids.perl | 70 +++++++++++++++++++++++++++++
 2 files changed, 81 insertions(+)
 create mode 100755 Documentation/lint-fsck-msgids.perl

diff --git a/Documentation/Makefile b/Documentation/Makefile
index d47acb2e25..5e1a7f655c 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -476,8 +476,19 @@ $(LINT_DOCS_MAN_SECTION_ORDER): .build/lint-docs/man-section-order/%.ok: %.txt
 .PHONY: lint-docs-man-section-order
 lint-docs-man-section-order: $(LINT_DOCS_MAN_SECTION_ORDER)
 
+.PHONY: lint-docs-fsck-msgids
+LINT_DOCS_FSCK_MSGIDS = .build/lint-docs/fsck-msgids.ok
+$(LINT_DOCS_FSCK_MSGIDS): lint-fsck-msgids.perl
+$(LINT_DOCS_FSCK_MSGIDS): ../fsck.h fsck-msgids.txt
+	$(call mkdir_p_parent_template)
+	$(QUIET_GEN)$(PERL_PATH) lint-fsck-msgids.perl \
+		../fsck.h fsck-msgids.txt $@
+
+lint-docs-fsck-msgids: $(LINT_DOCS_FSCK_MSGIDS)
+
 ## Lint: list of targets above
 .PHONY: lint-docs
+lint-docs: lint-docs-fsck-msgids
 lint-docs: lint-docs-gitlink
 lint-docs: lint-docs-man-end-blurb
 lint-docs: lint-docs-man-section-order
diff --git a/Documentation/lint-fsck-msgids.perl b/Documentation/lint-fsck-msgids.perl
new file mode 100755
index 0000000000..1233ffe815
--- /dev/null
+++ b/Documentation/lint-fsck-msgids.perl
@@ -0,0 +1,70 @@
+#!/usr/bin/perl
+
+my ($fsck_h, $fsck_msgids_txt, $okfile) = @ARGV;
+
+my (%in_fsck_h, $fh, $bad);
+
+open($fh, "<", "$fsck_h") or die;
+while (<$fh>) {
+	if (/^\s+FUNC\(([0-9A-Z_]+), ([A-Z]+)\)/) {
+		my ($name, $severity) = ($1, $2);
+		my ($first) = 1;
+		$name = join('',
+			     map {
+				     y/A-Z/a-z/;
+				     if (!$first) {
+					     s/^(.)/uc($1)/e;
+				     } else {
+					     $first = 0;
+				     }
+				     $_;
+			     }
+			     split(/_/, $name));
+		$in_fsck_h{$name} = $severity;
+	}
+}
+close($fh);
+
+open($fh, "<", "$fsck_msgids_txt") or die;
+my ($previous, $current);
+while (<$fh>) {
+	if (!defined $current) {
+		if (/^\`([a-zA-Z0-9]*)\`::/) {
+			$current = $1;
+			if ((defined $previous) &&
+			    ($current le $previous)) {
+				print STDERR "$previous >= $current in doc\n";
+				$bad = 1;
+			}
+		}
+	} elsif (/^\s+\(([A-Z]+)\) /) {
+		my ($level) = $1;
+		if (!exists $in_fsck_h{$current}) {
+			print STDERR "$current does not exist in fsck.h\n";
+			$bad = 1;
+		} elsif ($in_fsck_h{$current} eq "") {
+			print STDERR "$current defined twice\n";
+			$bad = 1;
+		} elsif ($in_fsck_h{$current} ne $level) {
+			print STDERR "$current severity $level != $in_fsck_h{$current}\n";
+			$bad = 1;
+		}
+		$previous = $current;
+		$in_fsck_h{$current} = ""; # mark as seen.
+		undef $current;
+	}
+}
+close($fh);
+
+for my $key (keys %in_fsck_h) {
+	if ($in_fsck_h{$key} ne "") {
+		print STDERR "$key not explained in doc.\n";
+		$bad = 1;
+	}
+}
+
+die if ($bad);
+
+open($fh, ">", "$okfile");
+print $fh "good\n";
+close($fh);
-- 
2.38.1-359-g84c4c6d5a5


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fsck: document msg-id
  2022-10-24 17:33   ` Junio C Hamano
@ 2022-10-25  9:41     ` Ævar Arnfjörð Bjarmason
  2022-10-25 16:07       ` Junio C Hamano
  0 siblings, 1 reply; 35+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-25  9:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: John Cai via GitGitGadget, git, John Cai


On Mon, Oct 24 2022, Junio C Hamano wrote:

> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> I almost suggested to extend the FOREACH_FSCK_MSG_ID() definition in
> fsck.h so that fsck-msgids.txt gets auto-generated (what is missing
> in fsck.h that prevents us from doing so is the textual explanation
> you added in the new file in your patch---they could come from
> comments on the same line, for example, and can be extracted via a
> Perl or sed script at build time).

I think this is the best eventual approach, whether we want it now is
another matter...

> I do not know if it is a good
> idea, especially if we forsee a future in which we may be
> translating the documentation, so decided against making such a
> suggestion.

...I just wanted to point out that difficulty of translating such a
thing should not be a reason to hold that back, because it's not hard to
translate such an arrangement.

We'd just point the po-doc extraction at the generated *.txt, we'd need
to re-arrange the Makefile dependencies a bit, but it shouldn't be a
problem.

The *.pot-file generation is a step that only the translation
coordinator *needs* to run, so even if it's a manual procedure, or
requires first building the sources...

> But at least, we could _lint_ the manually curated fsck-msgids.txt
> using what is in fsck.h to see if a new MSG_ID added to fsck.h is
> missing from the doc, or a MSG_ID whose default severity is updated
> in fsck.h is stale in the doc, etc.  That can be left for the future
> updates, but we should at least instruct developers to keep them in
> sync in fsck.h by adding a comment.

Yeah, that's a good step.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fsck: document msg-id
  2022-10-25  9:41     ` Ævar Arnfjörð Bjarmason
@ 2022-10-25 16:07       ` Junio C Hamano
  0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2022-10-25 16:07 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: John Cai via GitGitGadget, git, John Cai

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> We'd just point the po-doc extraction at the generated *.txt, we'd need
> to re-arrange the Makefile dependencies a bit, but it shouldn't be a
> problem.
>
> The *.pot-file generation is a step that only the translation
> coordinator *needs* to run, so even if it's a manual procedure, or
> requires first building the sources...

I thought you made it more distributed to remove the coordinator's
involvement in .pot generation recently, which resulted in us not
tracking the .pot at all, no?  Now all the l10n folks need to do the
extraction?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 0/4] document fsck error message ids
  2022-10-25  3:17 ` [PATCH v2 " John Cai via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-10-25  4:30   ` [PATCH v2 0/2] Document fsck msg ids Junio C Hamano
@ 2022-10-25 22:42   ` Junio C Hamano
  2022-10-25 22:42     ` [PATCH v3 1/4] fsck: remove the unused BAD_TAG_OBJECT Junio C Hamano
                       ` (3 more replies)
  3 siblings, 4 replies; 35+ messages in thread
From: Junio C Hamano @ 2022-10-25 22:42 UTC (permalink / raw)
  To: git; +Cc: John Cai

"git fsck" reports various anomalies it finds in the form of message
tokens, like `badDate`, in its error message, and allows most of
them to be tweaked for their severity levels via configuration
varialbes, like `fsck.badDate`.  We however do not have them
centrally documented anywhere, other than the header file itself
(and the header file does not have explanation on what they mean).

This is John's work, with a bit of help from me.

 * Patches 1 and 2 remove fsck error message IDs from fsck.h that
   are not used in today's code.

 * In patch 3, "git fsck --help" gains a section that lists all the
   fsck error message ids with explanation.  "git config --help"
   gains a reference to the section.

 * While we worked on patch 3, we made many small mistakes (like
   spelling the tokens with incorrect camelCasing, or failing to
   list a few messages), which revealed the need of automated tool
   to catch them.  Patch 4 adds to "make check-docs" a mechanism to
   verify the documentation covers what is defined in fsck.h
   correctly.

Possible future directions that are left outside the scope of this
topic are:

 * "git config --help" may want to include the same list inline
   instead of having a reference to "git fsck --help".

 * Instead of the lint-docs support, we may want to auto-generate
   the documentation from fsck.h.


John Cai (2):
  fsck: remove the unused BAD_TAG_OBJECT
  fsck: document msg-id

Junio C Hamano (2):
  fsck: remove the unused MISSING_TREE_OBJECT
  Documentation: add lint-fsck-msgids

 Documentation/Makefile              |  11 ++
 Documentation/config/fsck.txt       |   4 +
 Documentation/fsck-msgids.txt       | 161 ++++++++++++++++++++++++++++
 Documentation/git-fsck.txt          |  12 +++
 Documentation/lint-fsck-msgids.perl |  70 ++++++++++++
 fsck.h                              |   8 +-
 6 files changed, 264 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/fsck-msgids.txt
 create mode 100755 Documentation/lint-fsck-msgids.perl

-- 
2.38.1-359-g84c4c6d5a5


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v3 1/4] fsck: remove the unused BAD_TAG_OBJECT
  2022-10-25 22:42   ` [PATCH v3 0/4] document fsck error message ids Junio C Hamano
@ 2022-10-25 22:42     ` Junio C Hamano
  2022-10-25 22:42     ` [PATCH v3 2/4] fsck: remove the unused MISSING_TREE_OBJECT Junio C Hamano
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2022-10-25 22:42 UTC (permalink / raw)
  To: git; +Cc: John Cai

From: John Cai <johncai86@gmail.com>

2175a0c6 (fsck: stop checking tag->tagged, 2019-10-18) stopped
checking the tagged object referred to by a tag object, which is what the
error message BAD_TAG_OBJECT was for. Since then the BAD_TAG_OBJECT
message is no longer used anywhere.

Remove the BAD_TAG_OBJECT msg-id.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 fsck.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fsck.h b/fsck.h
index 6f801e53b1..7d99f6ea33 100644
--- a/fsck.h
+++ b/fsck.h
@@ -24,7 +24,6 @@ enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
-	FUNC(BAD_TAG_OBJECT, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
-- 
2.38.1-359-g84c4c6d5a5


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 2/4] fsck: remove the unused MISSING_TREE_OBJECT
  2022-10-25 22:42   ` [PATCH v3 0/4] document fsck error message ids Junio C Hamano
  2022-10-25 22:42     ` [PATCH v3 1/4] fsck: remove the unused BAD_TAG_OBJECT Junio C Hamano
@ 2022-10-25 22:42     ` Junio C Hamano
  2022-10-25 22:42     ` [PATCH v3 3/4] fsck: document msg-id Junio C Hamano
  2022-10-25 22:42     ` [PATCH v3 4/4] Documentation: add lint-fsck-msgids Junio C Hamano
  3 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2022-10-25 22:42 UTC (permalink / raw)
  To: git; +Cc: John Cai

This error type has never been used since it was introduced at
159e7b08 (fsck: detect gitmodules files, 2018-05-02).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 fsck.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fsck.h b/fsck.h
index 7d99f6ea33..1d7c38f268 100644
--- a/fsck.h
+++ b/fsck.h
@@ -39,7 +39,6 @@ enum fsck_msg_type {
 	FUNC(MISSING_TAG, ERROR) \
 	FUNC(MISSING_TAG_ENTRY, ERROR) \
 	FUNC(MISSING_TREE, ERROR) \
-	FUNC(MISSING_TREE_OBJECT, ERROR) \
 	FUNC(MISSING_TYPE, ERROR) \
 	FUNC(MISSING_TYPE_ENTRY, ERROR) \
 	FUNC(MULTIPLE_AUTHORS, ERROR) \
-- 
2.38.1-359-g84c4c6d5a5


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 3/4] fsck: document msg-id
  2022-10-25 22:42   ` [PATCH v3 0/4] document fsck error message ids Junio C Hamano
  2022-10-25 22:42     ` [PATCH v3 1/4] fsck: remove the unused BAD_TAG_OBJECT Junio C Hamano
  2022-10-25 22:42     ` [PATCH v3 2/4] fsck: remove the unused MISSING_TREE_OBJECT Junio C Hamano
@ 2022-10-25 22:42     ` Junio C Hamano
  2022-10-25 22:42     ` [PATCH v3 4/4] Documentation: add lint-fsck-msgids Junio C Hamano
  3 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2022-10-25 22:42 UTC (permalink / raw)
  To: git; +Cc: John Cai

From: John Cai <johncai86@gmail.com>

The documentation lacks mention of specific <msg-id> that are supported.
While git-help --config will display a list of these options, often
developers' first instinct is to consult the git docs to find valid
config values.

Add a list of fsck error messages, and link to it from the git-fsck
documentation.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/config/fsck.txt |   4 +
 Documentation/fsck-msgids.txt | 161 ++++++++++++++++++++++++++++++++++
 Documentation/git-fsck.txt    |  12 +++
 fsck.h                        |   6 ++
 4 files changed, 183 insertions(+)
 create mode 100644 Documentation/fsck-msgids.txt

diff --git a/Documentation/config/fsck.txt b/Documentation/config/fsck.txt
index 450e8c38e3..a3c865df56 100644
--- a/Documentation/config/fsck.txt
+++ b/Documentation/config/fsck.txt
@@ -35,6 +35,10 @@ allow new instances of the same breakages go unnoticed.
 Setting an unknown `fsck.<msg-id>` value will cause fsck to die, but
 doing the same for `receive.fsck.<msg-id>` and `fetch.fsck.<msg-id>`
 will only cause git to warn.
++
+See `Fsck Messages` section of linkgit:git-fsck[1] for supported
+values of `<msg-id>`.
+
 
 fsck.skipList::
 	The path to a list of object names (i.e. one unabbreviated SHA-1 per
diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
new file mode 100644
index 0000000000..7af76ff99a
--- /dev/null
+++ b/Documentation/fsck-msgids.txt
@@ -0,0 +1,161 @@
+`badDate`::
+	(ERROR) Invalid date format in an author/committer line.
+
+`badDateOverflow`::
+	(ERROR) Invalid date value in an author/committer line.
+
+`badEmail`::
+	(ERROR) Invalid email format in an author/committer line.
+
+`badFilemode`::
+	(INFO) A tree contains a bad filemode entry.
+
+`badName`::
+	(ERROR) An author/committer name is empty.
+
+`badObjectSha1`::
+	(ERROR) An object has a bad sha1.
+
+`badParentSha1`::
+	(ERROR) A commit object has a bad parent sha1.
+
+`badTagName`::
+	(INFO) A tag has an invalid format.
+
+`badTimezone`::
+	(ERROR) Found an invalid time zone in an author/committer line.
+
+`badTree`::
+	(ERROR) A tree cannot be parsed.
+
+`badTreeSha1`::
+	(ERROR) A tree has an invalid format.
+
+`badType`::
+	(ERROR) Found an invalid object type.
+
+`duplicateEntries`::
+	(ERROR) A tree contains duplicate file entries.
+
+`emptyName`::
+	(WARN) A path contains an empty name.
+
+`extraHeaderEntry`::
+	(IGNORE) Extra headers found after `tagger`.
+
+`fullPathname`::
+	(WARN) A path contains the full path starting with "/".
+
+`gitattributesSymlink`::
+	(INFO) `.gitattributes` is a symlink.
+
+`gitignoreSymlink`::
+	(INFO) `.gitignore` is a symlink.
+
+`gitmodulesBlob`::
+	(ERROR) A non-blob found at `.gitmodules`.
+
+`gitmodulesLarge`::
+	(ERROR) The `.gitmodules` file is too large to parse.
+
+`gitmodulesMissing`::
+	(ERROR) Unable to read `.gitmodules` blob.
+
+`gitmodulesName`::
+	(ERROR) A submodule name is invalid.
+
+`gitmodulesParse`::
+	(INFO) Could not parse `.gitmodules` blob.
+
+`gitmodulesLarge`;
+	(ERROR) `.gitmodules` blob is too large to parse.
+
+`gitmodulesPath`::
+	(ERROR) `.gitmodules` path is invalid.
+
+`gitmodulesSymlink`::
+	(ERROR) `.gitmodules` is a symlink.
+
+`gitmodulesUpdate`::
+	(ERROR) Found an invalid submodule update setting.
+
+`gitmodulesUrl`::
+	(ERROR) Found an invalid submodule url.
+
+`hasDot`::
+	(WARN) A tree contains an entry named `.`.
+
+`hasDotdot`::
+	(WARN) A tree contains an entry named `..`.
+
+`hasDotgit`::
+	(WARN) A tree contains an entry named `.git`.
+
+`mailmapSymlink`::
+	(INFO) `.mailmap` is a symlink.
+
+`missingAuthor`::
+	(ERROR) Author is missing.
+
+`missingCommitter`::
+	(ERROR) Committer is missing.
+
+`missingEmail`::
+	(ERROR) Email is missing in an author/committer line.
+
+`missingNameBeforeEmail`::
+	(ERROR) Missing name before an email in an author/committer line.
+
+`missingObject`::
+	(ERROR) Missing `object` line in tag object.
+
+`missingSpaceBeforeDate`::
+	(ERROR) Missing space before date in an author/committer line.
+
+`missingSpaceBeforeEmail`::
+	(ERROR) Missing space before the email in author/committer line.
+
+`missingTag`::
+	(ERROR) Unexpected end after `type` line in a tag object.
+
+`missingTagEntry`::
+	(ERROR) Missing `tag` line in a tag object.
+
+`missingTaggerEntry`::
+	(INFO) Missing `tagger` line in a tag object.
+
+`missingTree`::
+	(ERROR) Missing `tree` line in a commit object.
+
+`missingType`::
+	(ERROR) Invalid type value on the `type` line in a tag object.
+
+`missingTypeEntry`::
+	(ERROR) Missing `type` line in a tag object.
+
+`multipleAuthors`::
+	(ERROR) Multiple author lines found in a commit.
+
+`nulInCommit`::
+	(WARN) Found a NUL byte in the commit object body.
+
+`nulInHeader`::
+	(FATAL) NUL byte exists in the object header.
+
+`nullSha1`::
+	(WARN) Tree contains entries pointing to a null sha1.
+
+`treeNotSorted`::
+	(ERROR) A tree is not properly sorted.
+
+`unknownType`::
+	(ERROR) Found an unknown object type.
+
+`unterminatedHeader`::
+	(FATAL) Missing end-of-line in the object header.
+
+`zeroPaddedDate`::
+	(ERROR) Found a zero padded date in an author/commiter line.
+
+`zeroPaddedFilemode`::
+	(WARN) Found a zero padded filemode in a tree.
diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index 29318ea957..b6a0f8a085 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -152,6 +152,18 @@ hash mismatch <object>::
 	object database value.
 	This indicates a serious data integrity problem.
 
+
+FSCK MESSAGES
+-------------
+
+The following lists the types of errors `git fsck` detects and what
+each error means, with their default severity.  The severity of the
+error, other than those that are marked as "(FATAL)", can be tweaked
+by setting the corresponding `fsck.<msg-id>` configuration variable.
+
+include::fsck-msgids.txt[]
+
+
 Environment Variables
 ---------------------
 
diff --git a/fsck.h b/fsck.h
index 1d7c38f268..6fbce68ad6 100644
--- a/fsck.h
+++ b/fsck.h
@@ -13,6 +13,12 @@ enum fsck_msg_type {
 	FSCK_WARN,
 };
 
+/*
+ * Documentation/fsck-msgids.txt documents these; when
+ * modifying this list in any way, make sure to keep the
+ * two in sync.
+ */
+
 #define FOREACH_FSCK_MSG_ID(FUNC) \
 	/* fatal errors */ \
 	FUNC(NUL_IN_HEADER, FATAL) \
-- 
2.38.1-359-g84c4c6d5a5


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v3 4/4] Documentation: add lint-fsck-msgids
  2022-10-25 22:42   ` [PATCH v3 0/4] document fsck error message ids Junio C Hamano
                       ` (2 preceding siblings ...)
  2022-10-25 22:42     ` [PATCH v3 3/4] fsck: document msg-id Junio C Hamano
@ 2022-10-25 22:42     ` Junio C Hamano
  2022-10-26  2:43       ` Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2022-10-25 22:42 UTC (permalink / raw)
  To: git; +Cc: John Cai

During the initial development of the fsck-msgids.txt feature, it
has become apparent that it is very much error prone to make sure
the description in the documentation file are sorted and correctly
match what is in the fsck.h header file.

Add a quick-and-dirty Perl script and doc-lint target to sanity
check that the fsck-msgids.txt is consistent with the error type
list in the fsck.h header file.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/Makefile              | 11 +++++
 Documentation/lint-fsck-msgids.perl | 70 +++++++++++++++++++++++++++++
 2 files changed, 81 insertions(+)
 create mode 100755 Documentation/lint-fsck-msgids.perl

diff --git a/Documentation/Makefile b/Documentation/Makefile
index d47acb2e25..5e1a7f655c 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -476,8 +476,19 @@ $(LINT_DOCS_MAN_SECTION_ORDER): .build/lint-docs/man-section-order/%.ok: %.txt
 .PHONY: lint-docs-man-section-order
 lint-docs-man-section-order: $(LINT_DOCS_MAN_SECTION_ORDER)
 
+.PHONY: lint-docs-fsck-msgids
+LINT_DOCS_FSCK_MSGIDS = .build/lint-docs/fsck-msgids.ok
+$(LINT_DOCS_FSCK_MSGIDS): lint-fsck-msgids.perl
+$(LINT_DOCS_FSCK_MSGIDS): ../fsck.h fsck-msgids.txt
+	$(call mkdir_p_parent_template)
+	$(QUIET_GEN)$(PERL_PATH) lint-fsck-msgids.perl \
+		../fsck.h fsck-msgids.txt $@
+
+lint-docs-fsck-msgids: $(LINT_DOCS_FSCK_MSGIDS)
+
 ## Lint: list of targets above
 .PHONY: lint-docs
+lint-docs: lint-docs-fsck-msgids
 lint-docs: lint-docs-gitlink
 lint-docs: lint-docs-man-end-blurb
 lint-docs: lint-docs-man-section-order
diff --git a/Documentation/lint-fsck-msgids.perl b/Documentation/lint-fsck-msgids.perl
new file mode 100755
index 0000000000..1233ffe815
--- /dev/null
+++ b/Documentation/lint-fsck-msgids.perl
@@ -0,0 +1,70 @@
+#!/usr/bin/perl
+
+my ($fsck_h, $fsck_msgids_txt, $okfile) = @ARGV;
+
+my (%in_fsck_h, $fh, $bad);
+
+open($fh, "<", "$fsck_h") or die;
+while (<$fh>) {
+	if (/^\s+FUNC\(([0-9A-Z_]+), ([A-Z]+)\)/) {
+		my ($name, $severity) = ($1, $2);
+		my ($first) = 1;
+		$name = join('',
+			     map {
+				     y/A-Z/a-z/;
+				     if (!$first) {
+					     s/^(.)/uc($1)/e;
+				     } else {
+					     $first = 0;
+				     }
+				     $_;
+			     }
+			     split(/_/, $name));
+		$in_fsck_h{$name} = $severity;
+	}
+}
+close($fh);
+
+open($fh, "<", "$fsck_msgids_txt") or die;
+my ($previous, $current);
+while (<$fh>) {
+	if (!defined $current) {
+		if (/^\`([a-zA-Z0-9]*)\`::/) {
+			$current = $1;
+			if ((defined $previous) &&
+			    ($current le $previous)) {
+				print STDERR "$previous >= $current in doc\n";
+				$bad = 1;
+			}
+		}
+	} elsif (/^\s+\(([A-Z]+)\) /) {
+		my ($level) = $1;
+		if (!exists $in_fsck_h{$current}) {
+			print STDERR "$current does not exist in fsck.h\n";
+			$bad = 1;
+		} elsif ($in_fsck_h{$current} eq "") {
+			print STDERR "$current defined twice\n";
+			$bad = 1;
+		} elsif ($in_fsck_h{$current} ne $level) {
+			print STDERR "$current severity $level != $in_fsck_h{$current}\n";
+			$bad = 1;
+		}
+		$previous = $current;
+		$in_fsck_h{$current} = ""; # mark as seen.
+		undef $current;
+	}
+}
+close($fh);
+
+for my $key (keys %in_fsck_h) {
+	if ($in_fsck_h{$key} ne "") {
+		print STDERR "$key not explained in doc.\n";
+		$bad = 1;
+	}
+}
+
+die if ($bad);
+
+open($fh, ">", "$okfile");
+print $fh "good\n";
+close($fh);
-- 
2.38.1-359-g84c4c6d5a5


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 4/4] Documentation: add lint-fsck-msgids
  2022-10-25 22:42     ` [PATCH v3 4/4] Documentation: add lint-fsck-msgids Junio C Hamano
@ 2022-10-26  2:43       ` Ævar Arnfjörð Bjarmason
  2022-10-26  5:34         ` Jeff King
  0 siblings, 1 reply; 35+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-26  2:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, John Cai


On Tue, Oct 25 2022, Junio C Hamano wrote:

> During the initial development of the fsck-msgids.txt feature, it
> has become apparent that it is very much error prone to make sure
> the description in the documentation file are sorted and correctly
> match what is in the fsck.h header file.

I have local fixes for the same issues in the list of advice in our
docs, some of it's missing, wrong, out of date etc.

I tried to quickly adapt the generation script I had for that, which
works nicely, and by line count much shorter than the lint :)

Having to exhaustively list every *.c file that uses fsck.h is a bit of
a bother, but we have the same with the other generated *.h's, so it
shouldn't be too bad.

And this way, if we get it wrong we get a compile error:
	
	$ git -P diff; make
	diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
	index 7af76ff99a5..f0b4308a6e7 100644
	--- a/Documentation/fsck-msgids.txt
	+++ b/Documentation/fsck-msgids.txt
	@@ -1,6 +1,3 @@
	-`badDate`::
	-       (ERROR) Invalid date format in an author/committer line.
	-
	 `badDateOverflow`::
	        (ERROR) Invalid date value in an author/committer line.
	 
	    CC fsck.o
	fsck.c:31:9: error: excess elements in array initializer [-Werror]
	   31 |         { NULL, NULL, NULL, -1 }
	      |         ^
	fsck.c:31:9: note: (near initialization for ‘msg_id_info’)
	fsck.c: In function ‘fsck_ident’:
	fsck.c:810:51: error: ‘FSCK_MSG_BAD_DATE’ undeclared (first use in this function); did you mean ‘FSCK_MSG_BAD_NAME’?
	  810 |                 return report(options, oid, type, FSCK_MSG_BAD_DATE, "invalid author/committer line - bad date");
	      |                                                   ^~~~~~~~~~~~~~~~~
	      |                                                   FSCK_MSG_BAD_NAME
	fsck.c:810:51: note: each undeclared identifier is reported only once for each function it appears in
	cc1: all warnings being treated as errors
	make: *** [Makefile:2617: fsck.o] Error 1

So, if you're interested:

 Makefile               | 12 ++++++++++++
 fsck.h                 |  8 +-------
 generate-fsckmsgids.sh | 42 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index 85f03c6aed1..21bbc3dfb9f 100644
--- a/Makefile
+++ b/Makefile
@@ -860,6 +860,7 @@ REFTABLE_TEST_LIB = reftable/libreftable_test.a
 GENERATED_H += command-list.h
 GENERATED_H += config-list.h
 GENERATED_H += hook-list.h
+GENERATED_H += fsck-msgids.h
 
 .PHONY: generated-hdrs
 generated-hdrs: $(GENERATED_H)
@@ -2289,6 +2290,14 @@ git$X: git.o GIT-LDFLAGS $(BUILTIN_OBJS) $(GITLIBS)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) \
 		$(filter %.o,$^) $(LIBS)
 
+# Unfortunately our dependency management of generated headers used
+# from within other headers suck, so we'll need to list every user of
+# fsck.h here, but not too bad...
+FSCK_MSGIDS_H_BUILTINS = fsck index-pack mktag receive-pack unpack-objects
+$(foreach f,$(FSCK_MSGIDS_H_BUILTINS:%=builtin/%),$f.sp $f.s $f.o): fsck-msgids.h
+FSCK_MSGIDS_H_LIBS = fetch-pack fsck
+$(foreach f,$(FSCK_MSGIDS_H_LIBS),$f.sp $f.s $f.o): fsck-msgids.h
+
 help.sp help.s help.o: command-list.h
 builtin/bugreport.sp builtin/bugreport.s builtin/bugreport.o: hook-list.h
 
@@ -2333,6 +2342,9 @@ command-list.h: $(wildcard Documentation/git*.txt)
 hook-list.h: generate-hooklist.sh Documentation/githooks.txt
 	$(QUIET_GEN)$(SHELL_PATH) ./generate-hooklist.sh >$@
 
+fsck-msgids.h: generate-fsckmsgids.sh Documentation/fsck-msgids.txt
+	$(QUIET_GEN)$(SHELL_PATH) ./generate-fsckmsgids.sh >$@
+
 SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):\
 	$(localedir_SQ):$(USE_GETTEXT_SCHEME):$(SANE_TOOL_PATH_SQ):\
 	$(gitwebdir_SQ):$(PERL_PATH_SQ):$(PAGER_ENV):\
diff --git a/fsck.h b/fsck.h
index 6fbce68ad67..1a9d68482ea 100644
--- a/fsck.h
+++ b/fsck.h
@@ -2,6 +2,7 @@
 #define GIT_FSCK_H
 
 #include "oidset.h"
+#include "fsck-msgids.h"
 
 enum fsck_msg_type {
 	/* for internal use only */
@@ -79,13 +80,6 @@ enum fsck_msg_type {
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
 
-#define MSG_ID(id, msg_type) FSCK_MSG_##id,
-enum fsck_msg_id {
-	FOREACH_FSCK_MSG_ID(MSG_ID)
-	FSCK_MSG_MAX
-};
-#undef MSG_ID
-
 struct fsck_options;
 struct object;
 
diff --git a/generate-fsckmsgids.sh b/generate-fsckmsgids.sh
new file mode 100755
index 00000000000..bbf236159aa
--- /dev/null
+++ b/generate-fsckmsgids.sh
@@ -0,0 +1,42 @@
+#!/bin/sh
+
+HT='	'
+
+fsck_list () {
+	sed -n \
+		-e "/::$/ {
+			s/::\$//;
+			s/^\`//;
+			s/\`$//;
+			p;
+		}" \
+	    <Documentation/fsck-msgids.txt
+}
+
+txt2label () {
+	sed \
+		-e 's/\([^_]\)\([[:upper:]]\)/\1_\2/g' \
+		-e 's/^/FSCK_MSG_/' |
+		tr '[:lower:]' '[:upper:]'
+}
+
+fsck_labels () {
+	fsck_list |
+	txt2label
+}
+
+listify () {
+	sed \
+		-e "2,\$s/^/$HT/" \
+		-e 's/$/,/'
+}
+
+cat <<EOF
+/* Automatically generated by generate-fsck-msgids.sh */
+
+enum fsck_msg_id {
+	/* Auto-generated from Documentation/fsck-msgids.txt */
+	$(fsck_labels | listify)
+	FSCK_MSG_MAX
+};
+EOF

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 4/4] Documentation: add lint-fsck-msgids
  2022-10-26  2:43       ` Ævar Arnfjörð Bjarmason
@ 2022-10-26  5:34         ` Jeff King
  2022-10-26  6:46           ` Junio C Hamano
  2022-10-26 11:35           ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 35+ messages in thread
From: Jeff King @ 2022-10-26  5:34 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, git, John Cai

On Wed, Oct 26, 2022 at 04:43:32AM +0200, Ævar Arnfjörð Bjarmason wrote:

> 
> On Tue, Oct 25 2022, Junio C Hamano wrote:
> 
> > During the initial development of the fsck-msgids.txt feature, it
> > has become apparent that it is very much error prone to make sure
> > the description in the documentation file are sorted and correctly
> > match what is in the fsck.h header file.
> 
> I have local fixes for the same issues in the list of advice in our
> docs, some of it's missing, wrong, out of date etc.
> 
> I tried to quickly adapt the generation script I had for that, which
> works nicely, and by line count much shorter than the lint :)

Yeah, my instinct here was to generate rather than lint. If you make a
mistake and the linter hits you over the head, that is better than
quietly letting your mistake go. But better still is making it
impossible to make in the first place.

The downside is added complexity to the build, but it doesn't seem too
bad in this case.

(I had a similar thought after getting hit on the head by the recent
t0450-txt-doc-vs-help.sh).

> Having to exhaustively list every *.c file that uses fsck.h is a bit of
> a bother, but we have the same with the other generated *.h's, so it
> shouldn't be too bad.

It feels like this could be made much shorter by having a separate
fsck-msgs.h and not including it from fsck.h. Only fsck.c and mktag.c
need the actual list. It would probably have to stop being an "enum",
though.

Another alternative is to generate the doc from the code, rather than
the other way around.

> +# Unfortunately our dependency management of generated headers used
> +# from within other headers suck, so we'll need to list every user of
> +# fsck.h here, but not too bad...
> +FSCK_MSGIDS_H_BUILTINS = fsck index-pack mktag receive-pack unpack-objects
> +$(foreach f,$(FSCK_MSGIDS_H_BUILTINS:%=builtin/%),$f.sp $f.s $f.o): fsck-msgids.h
> +FSCK_MSGIDS_H_LIBS = fetch-pack fsck
> +$(foreach f,$(FSCK_MSGIDS_H_LIBS),$f.sp $f.s $f.o): fsck-msgids.h

I don't understand the "used within other headers" part here. Computed
dependencies will get this right. It's only the initial build (before we
have any computed dependencies) that needs to know that the C files in
question depend on the generated file. But that is true whether they do
so directly or indirectly.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 4/4] Documentation: add lint-fsck-msgids
  2022-10-26  5:34         ` Jeff King
@ 2022-10-26  6:46           ` Junio C Hamano
  2022-10-26 11:35           ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2022-10-26  6:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Ævar Arnfjörð Bjarmason, git, John Cai

Jeff King <peff@peff.net> writes:

> Another alternative is to generate the doc from the code, rather than
> the other way around.

Yup, that indeed was what I envisioned in the "Possible future
directions" comment in the cover letter.  It might not even be bad
to do

    #define FOREACH_FSCK_MSG_ID(FUNC) \
	/* fatal errors */ \
	FUNC(NUL_IN_HEADER, FATAL, N_("NUL byte exists in the object header.")) \
	FUNC(UNTERMINATED_HEADER, FATAL, N_("Missing end-of-line in the object header.")) \
	/* errors */ \
	FUNC(BAD_DATE, ERROR, N_("Invalid date format in an author/committer line.")) \
	...

as some of the users of the macro may have use of the translatable
messages themselves.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 4/4] Documentation: add lint-fsck-msgids
  2022-10-26  5:34         ` Jeff King
  2022-10-26  6:46           ` Junio C Hamano
@ 2022-10-26 11:35           ` Ævar Arnfjörð Bjarmason
  2022-10-28  1:23             ` Jeff King
  2022-10-28  3:02             ` John Cai
  1 sibling, 2 replies; 35+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-26 11:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, John Cai


On Wed, Oct 26 2022, Jeff King wrote:

> On Wed, Oct 26, 2022 at 04:43:32AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> 
>> On Tue, Oct 25 2022, Junio C Hamano wrote:
>> 
>> > During the initial development of the fsck-msgids.txt feature, it
>> > has become apparent that it is very much error prone to make sure
>> > the description in the documentation file are sorted and correctly
>> > match what is in the fsck.h header file.
>> 
>> I have local fixes for the same issues in the list of advice in our
>> docs, some of it's missing, wrong, out of date etc.
>> 
>> I tried to quickly adapt the generation script I had for that, which
>> works nicely, and by line count much shorter than the lint :)
>
> Yeah, my instinct here was to generate rather than lint. If you make a
> mistake and the linter hits you over the head, that is better than
> quietly letting your mistake go. But better still is making it
> impossible to make in the first place.
>
> The downside is added complexity to the build, but it doesn't seem too
> bad in this case.

Yeah, it's not, I have local patches to generate advice-{type,config}.h,
and builtin.h. This is a quick POC to do it for fsck-msgids.h.

I see I forgot the .gitignore entry, so it's a rough POC :)

> (I had a similar thought after getting hit on the head by the recent
> t0450-txt-doc-vs-help.sh).

Sorry about that. FWIW I've wanted to assert that for a while, and to do
it by e.g. having the doc *.txt blurbs generated from running "$buildin
-h" during the build.

But when I suggested that I gathered That Junio wasn't up for that
approach, it does have its downsides thorugh. E.g. to build the docs
you'd now have to compile C code, and e.g. that git-scm.com publisher
written in Ruby would have to compile the code to generate its docs.

Or we could do it the other way around, and scape the *.txt for the *.c
generation, but then we need to run a new script for building
builtin/*.o. Also possible, and I think eventually both are better than
what t0450-txt-doc-vs-help.sh's doing now, but that was a less invasive
change than both...

>> Having to exhaustively list every *.c file that uses fsck.h is a bit of
>> a bother, but we have the same with the other generated *.h's, so it
>> shouldn't be too bad.
>
> It feels like this could be made much shorter by having a separate
> fsck-msgs.h and not including it from fsck.h. Only fsck.c and mktag.c
> need the actual list. It would probably have to stop being an "enum",
> though.

Yes, that would make it shorter, but C doesn't have forward decls of
enums, so we'd need to make it "int", ....

> Another alternative is to generate the doc from the code, rather than
> the other way around.

*nod* :)

>> +# Unfortunately our dependency management of generated headers used
>> +# from within other headers suck, so we'll need to list every user of
>> +# fsck.h here, but not too bad...
>> +FSCK_MSGIDS_H_BUILTINS = fsck index-pack mktag receive-pack unpack-objects
>> +$(foreach f,$(FSCK_MSGIDS_H_BUILTINS:%=builtin/%),$f.sp $f.s $f.o): fsck-msgids.h
>> +FSCK_MSGIDS_H_LIBS = fetch-pack fsck
>> +$(foreach f,$(FSCK_MSGIDS_H_LIBS),$f.sp $f.s $f.o): fsck-msgids.h
>
> I don't understand the "used within other headers" part here. Computed
> dependencies will get this right. It's only the initial build (before we
> have any computed dependencies) that needs to know that the C files in
> question depend on the generated file. But that is true whether they do
> so directly or indirectly.

I forgot the full story, but luckily I wrote it down in the WIP commits
:) FWIW if you want to scour that it's mainly:

	https://github.com/avar/git/commit/a00f1cb9ea5 # add advice-type.h
	https://github.com/avar/git/commit/9e080923a11 # generate 'struct advice

Also, before generating builtin.h I've got e.g. this:

	https://github.com/avar/git/commit/5a5360d0134 # just check with 'make' that any file is sorted

To actualy generate it, very WIP:

        http://github.com/avar/git/commit/cf1d02fa6b2

Anyway, in partial summary, why (and sorry this got so long):

 * Yes, once we build e.g. fsck.o we have the full dependency tree,
   yay....
 * ...but only for gcc & clang, but we support other compilers.
 * ...for other compilers (or gcc & clang without the dep detection
   enabled) what should this do:

	make file-that-does-not-use-generated-header.o

   It sucks a bit to have e.g. every *.o depend on command-list.h, when
   we only use it in 1-2 places, ditto the other headers.

   The approach I took to this was to manually list headers that had
   "small uses" (1-10), but say that all of *.h dependend on running the
   *.sh to generate some "widely used" headers.

   E.g. if we generate builtin.h we'll either need that, or add
   builtin/%.o as a dependency, with a manual listing of the handful of
   uses outside of that subdirectory.

 * .... or just not care about any of that, i.e. to have all of our *.sh
   run on the "first build" no matter what, which certainly simplifies
   things, but once you have e.g. 5-10 generated headers doing e.g.:

	make grep.o

   and having it build command-list.h or whatever is a bit noisy, but
   it's a trade-off of manually maintaining deps v.s. not.

 * Our "COMPUTED_HEADER_DEPENDENCIES" only happens when you build the
   *.o, but we have *.sp and *.s targets too. I have some local changes
   *to generalize that, so we e.g. get proper dependencies for building
   **.s files.

 * We also have e.g. "make hdr-check", which works just fine now, but
   it's becausue we have no *.h file that uses another generated *.h
   file.

   Actually, I suspect the posted WIP change is broken in that regard
   (but doing this for real on my topic branch works), i.e. we need to
   start caring about deps for those auxiliary targets.

 * I may be wrong (I'm just looking at some old "here be dragons" note
   of mine), but I think there's also an edge case where the dependency
   graph of .depend.d/* is correct *once you've always had it*, but if a
   file now e.g. depends on foo.h, and I make foo.h include a new
   foo-generated.h things go boom.

   That issue (it if even exists, sorry, I'm blanking on the details)
   was solvable by just doing a "make clean", and building again,
   i.e. once we had re-built once we were OK, it was only a problem with
   an older checkout with an existing build pulling new sources.

   Still, I wanted to not make that case annoying either if I added a
   generated header...

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 4/4] Documentation: add lint-fsck-msgids
  2022-10-26 11:35           ` Ævar Arnfjörð Bjarmason
@ 2022-10-28  1:23             ` Jeff King
  2022-10-28  2:04               ` Ævar Arnfjörð Bjarmason
  2022-10-28  3:02             ` John Cai
  1 sibling, 1 reply; 35+ messages in thread
From: Jeff King @ 2022-10-28  1:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, git, John Cai

On Wed, Oct 26, 2022 at 01:35:43PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > (I had a similar thought after getting hit on the head by the recent
> > t0450-txt-doc-vs-help.sh).
> 
> Sorry about that. FWIW I've wanted to assert that for a while, and to do
> it by e.g. having the doc *.txt blurbs generated from running "$buildin
> -h" during the build.
> 
> But when I suggested that I gathered That Junio wasn't up for that
> approach, it does have its downsides thorugh. E.g. to build the docs
> you'd now have to compile C code, and e.g. that git-scm.com publisher
> written in Ruby would have to compile the code to generate its docs.

Yes, it would definitely break the git-scm.com importer. It might be
possible to make it work by actually running "make man" (or maybe even
some partial targets) in a local repo. The nightly update job pulls all
of the data using GitHub's API, but it does run in a heroku dyno that
has git and gcc. Doing a shallow clone of the tag to build is probably
more expensive, but the cost of an actual build isn't that important.
99% of the scheduled job runs are noops because there's no new version
of Git to build manpages for; as long as those remain cheap-ish, we're
OK.

I also think in the long run that the site should move to a system that
builds off a local repo, which we can trigger manually or via GitHub
Actions. But that doesn't exist yet, and I don't think anyone's actively
working on it.

I also think it would be nice if the git-scm.com system relied more on
"make man USE_ASCIIDOCTOR=1", possibly by hooking into
asciidoctor-extensions.rb or similar, rather than munging files in an
ad-hoc manner. But that's also a big change that nobody is actively
working on.

All of which is to say that yes, having docs depend on C code will cause
work on the site, but it may be work that takes us in the right
direction.

> Or we could do it the other way around, and scape the *.txt for the *.c
> generation, but then we need to run a new script for building
> builtin/*.o. Also possible, and I think eventually both are better than
> what t0450-txt-doc-vs-help.sh's doing now, but that was a less invasive
> change than both...

I think this particular case is tricky in that direction, because it's a
big set of dependencies that aren't necessarily one-to-one. E.g.,
builtin/log.c needs to depend on git-log.txt, but also on git-show.txt
and git-format-patch.txt.

> >> +# Unfortunately our dependency management of generated headers used
> >> +# from within other headers suck, so we'll need to list every user of
> >> +# fsck.h here, but not too bad...
> >> +FSCK_MSGIDS_H_BUILTINS = fsck index-pack mktag receive-pack unpack-objects
> >> +$(foreach f,$(FSCK_MSGIDS_H_BUILTINS:%=builtin/%),$f.sp $f.s $f.o): fsck-msgids.h
> >> +FSCK_MSGIDS_H_LIBS = fetch-pack fsck
> >> +$(foreach f,$(FSCK_MSGIDS_H_LIBS),$f.sp $f.s $f.o): fsck-msgids.h
> >
> > I don't understand the "used within other headers" part here. Computed
> > dependencies will get this right. It's only the initial build (before we
> > have any computed dependencies) that needs to know that the C files in
> > question depend on the generated file. But that is true whether they do
> > so directly or indirectly.
> 
> I forgot the full story, but luckily I wrote it down in the WIP commits
> :) FWIW if you want to scour that it's mainly:
> 
> 	https://github.com/avar/git/commit/a00f1cb9ea5 # add advice-type.h
> 	https://github.com/avar/git/commit/9e080923a11 # generate 'struct advice

That seems like the same problem to me. It's just that if something is
in cache.h, then it's needed by basically everything, and so the
dependency list is big.

> Anyway, in partial summary, why (and sorry this got so long):
> 
>  * Yes, once we build e.g. fsck.o we have the full dependency tree,
>    yay....
>  * ...but only for gcc & clang, but we support other compilers.
>  * ...for other compilers (or gcc & clang without the dep detection
>    enabled) what should this do:
> 
> 	make file-that-does-not-use-generated-header.o
> 
>    It sucks a bit to have e.g. every *.o depend on command-list.h, when
>    we only use it in 1-2 places, ditto the other headers.

But that is already true of non-generated headers. If your system
doesn't support computed deps, then all objects depend on all headers.
Yes, that sucks for you. But almost nobody actively develops on such a
system, and people building Git to use rarely notice (because they are
doing an initial build, or jumping versions, both of which generally
require recompilation anyway).

That all comes from b8ba629264 (Makefile: fold MISC_H into LIB_H,
2012-06-20).

>  * We also have e.g. "make hdr-check", which works just fine now, but
>    it's becausue we have no *.h file that uses another generated *.h
>    file.

I'm not that surprised. Probably it should depend on $(GENERATED_H), and
possibly even be checking those files too.

>  * I may be wrong (I'm just looking at some old "here be dragons" note
>    of mine), but I think there's also an edge case where the dependency
>    graph of .depend.d/* is correct *once you've always had it*, but if a
>    file now e.g. depends on foo.h, and I make foo.h include a new
>    foo-generated.h things go boom.

That should work because the timestamp on foo.h will have been updated,
causing anything that includes it to rebuild (and thus updating its
computed dependencies to include foo-generated.h).

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 4/4] Documentation: add lint-fsck-msgids
  2022-10-28  1:23             ` Jeff King
@ 2022-10-28  2:04               ` Ævar Arnfjörð Bjarmason
  2022-10-28  5:32                 ` Jeff King
  0 siblings, 1 reply; 35+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-28  2:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, John Cai


On Thu, Oct 27 2022, Jeff King wrote:

> On Wed, Oct 26, 2022 at 01:35:43PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > (I had a similar thought after getting hit on the head by the recent
>> > t0450-txt-doc-vs-help.sh).
>> 
>> Sorry about that. FWIW I've wanted to assert that for a while, and to do
>> it by e.g. having the doc *.txt blurbs generated from running "$buildin
>> -h" during the build.
>> 
>> But when I suggested that I gathered That Junio wasn't up for that
>> approach, it does have its downsides thorugh. E.g. to build the docs
>> you'd now have to compile C code, and e.g. that git-scm.com publisher
>> written in Ruby would have to compile the code to generate its docs.
>
> Yes, it would definitely break the git-scm.com importer. It might be
> possible to make it work by actually running "make man" (or maybe even
> some partial targets) in a local repo. The nightly update job pulls all
> of the data using GitHub's API, but it does run in a heroku dyno that
> has git and gcc. Doing a shallow clone of the tag to build is probably
> more expensive, but the cost of an actual build isn't that important.
> 99% of the scheduled job runs are noops because there's no new version
> of Git to build manpages for; as long as those remain cheap-ish, we're
> OK.
>
> I also think in the long run that the site should move to a system that
> builds off a local repo, which we can trigger manually or via GitHub
> Actions. But that doesn't exist yet, and I don't think anyone's actively
> working on it.
>
> I also think it would be nice if the git-scm.com system relied more on
> "make man USE_ASCIIDOCTOR=1", possibly by hooking into
> asciidoctor-extensions.rb or similar, rather than munging files in an
> ad-hoc manner. But that's also a big change that nobody is actively
> working on.
>
> All of which is to say that yes, having docs depend on C code will cause
> work on the site, but it may be work that takes us in the right
> direction.

Good to know.

>> Or we could do it the other way around, and scape the *.txt for the *.c
>> generation, but then we need to run a new script for building
>> builtin/*.o. Also possible, and I think eventually both are better than
>> what t0450-txt-doc-vs-help.sh's doing now, but that was a less invasive
>> change than both...
>
> I think this particular case is tricky in that direction, because it's a
> big set of dependencies that aren't necessarily one-to-one. E.g.,
> builtin/log.c needs to depend on git-log.txt, but also on git-show.txt
> and git-format-patch.txt.

I was thinking of just a generated usage-strings.c or whatever. I.e. one
file with every usage string in it. Then you don't need to hunt down
which thing goes in which file. We'll just have a variable named
"builtin_format_patch_usage". Include it in "builtin.h" and it doesn't
matter if that's in builtin/log.c or builtin/format-patch.c.

It does mean you need to rebuild the C code for other built-ins every
time one of the SYNOPSIS sections changes, but doesn't happen too often.

>> >> +# Unfortunately our dependency management of generated headers used
>> >> +# from within other headers suck, so we'll need to list every user of
>> >> +# fsck.h here, but not too bad...
>> >> +FSCK_MSGIDS_H_BUILTINS = fsck index-pack mktag receive-pack unpack-objects
>> >> +$(foreach f,$(FSCK_MSGIDS_H_BUILTINS:%=builtin/%),$f.sp $f.s $f.o): fsck-msgids.h
>> >> +FSCK_MSGIDS_H_LIBS = fetch-pack fsck
>> >> +$(foreach f,$(FSCK_MSGIDS_H_LIBS),$f.sp $f.s $f.o): fsck-msgids.h
>> >
>> > I don't understand the "used within other headers" part here. Computed
>> > dependencies will get this right. It's only the initial build (before we
>> > have any computed dependencies) that needs to know that the C files in
>> > question depend on the generated file. But that is true whether they do
>> > so directly or indirectly.
>> 
>> I forgot the full story, but luckily I wrote it down in the WIP commits
>> :) FWIW if you want to scour that it's mainly:
>> 
>> 	https://github.com/avar/git/commit/a00f1cb9ea5 # add advice-type.h
>> 	https://github.com/avar/git/commit/9e080923a11 # generate 'struct advice
>
> That seems like the same problem to me. It's just that if something is
> in cache.h, then it's needed by basically everything, and so the
> dependency list is big.

I think I either did or was planning to take it out of cache.h as part
of that, we put way too much in cache.h.

Even advice.c depends on cache.h for its advice.h *sigh*.

Trying it just now putting advice.h in builtin.h instead leaves 10
top-level files not-compiling (and they just need the header).

I think it's fine to include common utilties in our "included by
everything" headers, but if we've got ~500 *.c files and something is
only needed by ~20 of them (as in this case) we should probably just
include it in those 20.

>> Anyway, in partial summary, why (and sorry this got so long):
>> 
>>  * Yes, once we build e.g. fsck.o we have the full dependency tree,
>>    yay....
>>  * ...but only for gcc & clang, but we support other compilers.
>>  * ...for other compilers (or gcc & clang without the dep detection
>>    enabled) what should this do:

I didn't summarize this well enough, I should have said that whether you
have computed deps or not that.....

>> 
>> 	make file-that-does-not-use-generated-header.o
>> 
>>    It sucks a bit to have e.g. every *.o depend on command-list.h, when
>>    we only use it in 1-2 places, ditto the other headers.
> But that is already true of non-generated headers. If your system
> doesn't support computed deps, then all objects depend on all headers.

... this does not build e.g. command-list.h:

	make clean
	make grep.o

But this does:

	make clean
	make help.o

Because we've manually declared that.

[Re-arranging a bit]

> That all comes from b8ba629264 (Makefile: fold MISC_H into LIB_H,
> 2012-06-20).

Yes, but you get that for free whether you have computed deps or
not. I.e. your compiler is about to build the code anyway.

But it wasn't about to run a bunch of shellscripts to generate headers
it doesn't actually need.

> Yes, that sucks for you. But almost nobody actively develops on such a
> system, and people building Git to use rarely notice (because they are
> doing an initial build, or jumping versions, both of which generally
> require recompilation anyway).

I guess I'm "almost nobody" :) Not because I don't use computed deps,
but because I tend to e.g. do large interactive rebases with:

	git rebase -i 'make grep.o'

Or some other subset of our built assets/objects.

On that front no one thing is the end of the world, i.e. sure, if we run
some shellscript to make a needed-header.h that takes 10ms we can live
with it.

But it adds up, which is one reason I've been slowly trickling in
patches to optimize various common parts of the Makefile & those
scripts.

I think before I started that the fixed cost even to have just make
decide it had nothing to do was often 500ms-1s.

I think that's down to 1/4 or so of that on "master", and I've got some
local patches to get it consistently down to <50ms.

It adds up, especially when combined with ccache & the like. I.e. if i'm
testing 10 commits I happen to have cached a "rebase -i" goes from
noticeably slow (~10s) to not being so slow as to be distracting.

Anyway, all of which is to say that that's one thing I was aiming for:
If I was going to submit patches for new generated headers to find some
good trade-off between complexity and over-declaring dependencies.

>>  * We also have e.g. "make hdr-check", which works just fine now, but
>>    it's becausue we have no *.h file that uses another generated *.h
>>    file.
>
> I'm not that surprised. Probably it should depend on $(GENERATED_H), and
> possibly even be checking those files too.
>
>>  * I may be wrong (I'm just looking at some old "here be dragons" note
>>    of mine), but I think there's also an edge case where the dependency
>>    graph of .depend.d/* is correct *once you've always had it*, but if a
>>    file now e.g. depends on foo.h, and I make foo.h include a new
>>    foo-generated.h things go boom.
>
> That should work because the timestamp on foo.h will have been updated,
> causing anything that includes it to rebuild (and thus updating its
> computed dependencies to include foo-generated.h).

Yes, in general it should work, maybe there's some Heisenbug there. I
can't recall or reproduce it, sorry.

But that's a good thing, maybe there's no issue :)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 4/4] Documentation: add lint-fsck-msgids
  2022-10-26 11:35           ` Ævar Arnfjörð Bjarmason
  2022-10-28  1:23             ` Jeff King
@ 2022-10-28  3:02             ` John Cai
  2022-10-28  3:11               ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 35+ messages in thread
From: John Cai @ 2022-10-28  3:02 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Jeff King, Junio C Hamano, git

Hi Ævar,

On 26 Oct 2022, at 7:35, Ævar Arnfjörð Bjarmason wrote:

> On Wed, Oct 26 2022, Jeff King wrote:
>
>> On Wed, Oct 26, 2022 at 04:43:32AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>>>
>>> On Tue, Oct 25 2022, Junio C Hamano wrote:
>>>
>>>> During the initial development of the fsck-msgids.txt feature, it
>>>> has become apparent that it is very much error prone to make sure
>>>> the description in the documentation file are sorted and correctly
>>>> match what is in the fsck.h header file.
>>>
>>> I have local fixes for the same issues in the list of advice in our
>>> docs, some of it's missing, wrong, out of date etc.
>>>
>>> I tried to quickly adapt the generation script I had for that, which
>>> works nicely, and by line count much shorter than the lint :)
>>
>> Yeah, my instinct here was to generate rather than lint. If you make a
>> mistake and the linter hits you over the head, that is better than
>> quietly letting your mistake go. But better still is making it
>> impossible to make in the first place.
>>
>> The downside is added complexity to the build, but it doesn't seem too
>> bad in this case.
>
> Yeah, it's not, I have local patches to generate advice-{type,config}.h,
> and builtin.h. This is a quick POC to do it for fsck-msgids.h.
>
> I see I forgot the .gitignore entry, so it's a rough POC :)
>
>> (I had a similar thought after getting hit on the head by the recent
>> t0450-txt-doc-vs-help.sh).
>
> Sorry about that. FWIW I've wanted to assert that for a while, and to do
> it by e.g. having the doc *.txt blurbs generated from running "$buildin
> -h" during the build.

If we wanted to go this route of generating the docs from the code (which sounds
like a better long term solution), how would this work? Would we print out the
list of message ids in builtin/fsck.c and write it to
Documentation/fsck-msgids.txt ?

>
> But when I suggested that I gathered That Junio wasn't up for that
> approach, it does have its downsides thorugh. E.g. to build the docs
> you'd now have to compile C code, and e.g. that git-scm.com publisher
> written in Ruby would have to compile the code to generate its docs.
>
> Or we could do it the other way around, and scape the *.txt for the *.c
> generation, but then we need to run a new script for building
> builtin/*.o. Also possible, and I think eventually both are better than
> what t0450-txt-doc-vs-help.sh's doing now, but that was a less invasive
> change than both...
>
>>> Having to exhaustively list every *.c file that uses fsck.h is a bit of
>>> a bother, but we have the same with the other generated *.h's, so it
>>> shouldn't be too bad.
>>
>> It feels like this could be made much shorter by having a separate
>> fsck-msgs.h and not including it from fsck.h. Only fsck.c and mktag.c
>> need the actual list. It would probably have to stop being an "enum",
>> though.
>
> Yes, that would make it shorter, but C doesn't have forward decls of
> enums, so we'd need to make it "int", ....
>
>> Another alternative is to generate the doc from the code, rather than
>> the other way around.
>
> *nod* :)
>
>>> +# Unfortunately our dependency management of generated headers used
>>> +# from within other headers suck, so we'll need to list every user of
>>> +# fsck.h here, but not too bad...
>>> +FSCK_MSGIDS_H_BUILTINS = fsck index-pack mktag receive-pack unpack-objects
>>> +$(foreach f,$(FSCK_MSGIDS_H_BUILTINS:%=builtin/%),$f.sp $f.s $f.o): fsck-msgids.h
>>> +FSCK_MSGIDS_H_LIBS = fetch-pack fsck
>>> +$(foreach f,$(FSCK_MSGIDS_H_LIBS),$f.sp $f.s $f.o): fsck-msgids.h
>>
>> I don't understand the "used within other headers" part here. Computed
>> dependencies will get this right. It's only the initial build (before we
>> have any computed dependencies) that needs to know that the C files in
>> question depend on the generated file. But that is true whether they do
>> so directly or indirectly.
>
> I forgot the full story, but luckily I wrote it down in the WIP commits
> :) FWIW if you want to scour that it's mainly:
>
> 	https://github.com/avar/git/commit/a00f1cb9ea5 # add advice-type.h
> 	https://github.com/avar/git/commit/9e080923a11 # generate 'struct advice
>
> Also, before generating builtin.h I've got e.g. this:
>
> 	https://github.com/avar/git/commit/5a5360d0134 # just check with 'make' that any file is sorted
>
> To actualy generate it, very WIP:
>
>         http://github.com/avar/git/commit/cf1d02fa6b2
>
> Anyway, in partial summary, why (and sorry this got so long):
>
>  * Yes, once we build e.g. fsck.o we have the full dependency tree,
>    yay....
>  * ...but only for gcc & clang, but we support other compilers.
>  * ...for other compilers (or gcc & clang without the dep detection
>    enabled) what should this do:
>
> 	make file-that-does-not-use-generated-header.o
>
>    It sucks a bit to have e.g. every *.o depend on command-list.h, when
>    we only use it in 1-2 places, ditto the other headers.
>
>    The approach I took to this was to manually list headers that had
>    "small uses" (1-10), but say that all of *.h dependend on running the
>    *.sh to generate some "widely used" headers.
>
>    E.g. if we generate builtin.h we'll either need that, or add
>    builtin/%.o as a dependency, with a manual listing of the handful of
>    uses outside of that subdirectory.
>
>  * .... or just not care about any of that, i.e. to have all of our *.sh
>    run on the "first build" no matter what, which certainly simplifies
>    things, but once you have e.g. 5-10 generated headers doing e.g.:
>
> 	make grep.o
>
>    and having it build command-list.h or whatever is a bit noisy, but
>    it's a trade-off of manually maintaining deps v.s. not.
>
>  * Our "COMPUTED_HEADER_DEPENDENCIES" only happens when you build the
>    *.o, but we have *.sp and *.s targets too. I have some local changes
>    *to generalize that, so we e.g. get proper dependencies for building
>    **.s files.
>
>  * We also have e.g. "make hdr-check", which works just fine now, but
>    it's becausue we have no *.h file that uses another generated *.h
>    file.
>
>    Actually, I suspect the posted WIP change is broken in that regard
>    (but doing this for real on my topic branch works), i.e. we need to
>    start caring about deps for those auxiliary targets.
>
>  * I may be wrong (I'm just looking at some old "here be dragons" note
>    of mine), but I think there's also an edge case where the dependency
>    graph of .depend.d/* is correct *once you've always had it*, but if a
>    file now e.g. depends on foo.h, and I make foo.h include a new
>    foo-generated.h things go boom.
>
>    That issue (it if even exists, sorry, I'm blanking on the details)
>    was solvable by just doing a "make clean", and building again,
>    i.e. once we had re-built once we were OK, it was only a problem with
>    an older checkout with an existing build pulling new sources.
>
>    Still, I wanted to not make that case annoying either if I added a
>    generated header...

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 4/4] Documentation: add lint-fsck-msgids
  2022-10-28  3:02             ` John Cai
@ 2022-10-28  3:11               ` Ævar Arnfjörð Bjarmason
  2022-10-28  5:32                 ` Junio C Hamano
  2022-10-28  5:35                 ` Jeff King
  0 siblings, 2 replies; 35+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-28  3:11 UTC (permalink / raw)
  To: John Cai; +Cc: Jeff King, Junio C Hamano, git


On Thu, Oct 27 2022, John Cai wrote:

> Hi Ævar,
>
> On 26 Oct 2022, at 7:35, Ævar Arnfjörð Bjarmason wrote:
>
>> On Wed, Oct 26 2022, Jeff King wrote:
>>
>>> On Wed, Oct 26, 2022 at 04:43:32AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>>
>>>>
>>>> On Tue, Oct 25 2022, Junio C Hamano wrote:
>>>>
>>>>> During the initial development of the fsck-msgids.txt feature, it
>>>>> has become apparent that it is very much error prone to make sure
>>>>> the description in the documentation file are sorted and correctly
>>>>> match what is in the fsck.h header file.
>>>>
>>>> I have local fixes for the same issues in the list of advice in our
>>>> docs, some of it's missing, wrong, out of date etc.
>>>>
>>>> I tried to quickly adapt the generation script I had for that, which
>>>> works nicely, and by line count much shorter than the lint :)
>>>
>>> Yeah, my instinct here was to generate rather than lint. If you make a
>>> mistake and the linter hits you over the head, that is better than
>>> quietly letting your mistake go. But better still is making it
>>> impossible to make in the first place.
>>>
>>> The downside is added complexity to the build, but it doesn't seem too
>>> bad in this case.
>>
>> Yeah, it's not, I have local patches to generate advice-{type,config}.h,
>> and builtin.h. This is a quick POC to do it for fsck-msgids.h.
>>
>> I see I forgot the .gitignore entry, so it's a rough POC :)
>>
>>> (I had a similar thought after getting hit on the head by the recent
>>> t0450-txt-doc-vs-help.sh).
>>
>> Sorry about that. FWIW I've wanted to assert that for a while, and to do
>> it by e.g. having the doc *.txt blurbs generated from running "$buildin
>> -h" during the build.
>
> If we wanted to go this route of generating the docs from the code (which sounds
> like a better long term solution), how would this work? Would we print out the
> list of message ids in builtin/fsck.c and write it to
> Documentation/fsck-msgids.txt ?

First, for the purposes of this thread I think Jeff and I are far off
into the weeds here :)

I think nothing needs to change in how this topic's doing things, we're
just takling about the longer term.

But if we go for that: I think in this case & most I can think of
generating the code from the docs is better (as that rough POC I had
showed), because:

 - You just need a shellscript to scrape the docs to make a *.c or *.h,
   whereas you'd need a C compiler to make the docs if it's the other
   way around. But more importantly:

 - The docs are way easier to scrape with some sed/awk/grep/whatever
   few-liner than to scrape C code for generating docs. E.g. see
   config-list.h.

 - Scraping the C code sucks so much that we'd probably make some
   dedicated interface for it, e.g. what we have for "git <cmd>
   --git-completion-helper".

   In that case it's worth it, but for other things we'n need to make
   the interface & maintain it (even if it's some test helper just for
   the build).

But mainly it helps to have a use-case, replacing the linter script with
e.g. the *.sh I demo'd might be a marginal improvement. But e.g. "git
help -c" uses one of those generated files (config-list.h), and actually
does something useful ...

Is there a good use-case for the fsck data like that? I'd think that
we'd want to make sure the docs are in sync with the code, as in we're
not adding new warnings/errors etc. without documenting them. But beyond
that maybe not much, and people would just run "git help fsck" to get
the list of variables..

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 4/4] Documentation: add lint-fsck-msgids
  2022-10-28  2:04               ` Ævar Arnfjörð Bjarmason
@ 2022-10-28  5:32                 ` Jeff King
  2022-10-28 10:41                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2022-10-28  5:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, git, John Cai

On Fri, Oct 28, 2022 at 04:04:12AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I think this particular case is tricky in that direction, because it's a
> > big set of dependencies that aren't necessarily one-to-one. E.g.,
> > builtin/log.c needs to depend on git-log.txt, but also on git-show.txt
> > and git-format-patch.txt.
> 
> I was thinking of just a generated usage-strings.c or whatever. I.e. one
> file with every usage string in it. Then you don't need to hunt down
> which thing goes in which file. We'll just have a variable named
> "builtin_format_patch_usage". Include it in "builtin.h" and it doesn't
> matter if that's in builtin/log.c or builtin/format-patch.c.

Yes, though you have the opposite problem, then: what are the source
files that can produce that usage-strings.c? If your answer is
"Documentation/git-*.txt", then that is a recipe for flaky dependencies.

We already have one such for config-list.h. Try this:

  # introduce a new file
  printf 'foo.bar::\n\ta fake config var\n' \
    >Documentation/config/foo.txt

  # it shows up in the output, as expected
  make
  ./git help --config-for-completion | grep foo

  # now drop it
  rm Documentation/config/foo.txt

  # oops; make won't rebuild anything, and it's still there
  make
  ./git help --config-for-completion | grep foo

The bug is that config-list.h depends on a glob. So we notice when a
file changes, but not when one goes away. And this isn't just
hypothetical. Files come and go as you "git checkout" around history (or
bisect). I don't remember the details, but I'm pretty sure I've gotten
false positive test failures out of this before (maybe a topic branch
with a bogus entry that broke t0012 or t9902, and then moving back to
master doesn't fix it?).

So I'd prefer to avoid introducing more flakiness if we can. You might
be able to piggy back on command-list.txt in this case (that's what
makes command-list.h not flaky).

> It does mean you need to rebuild the C code for other built-ins every
> time one of the SYNOPSIS sections changes, but doesn't happen too often.

If it's a c/h combo that only makes the variable names public (not the
contents of the strings they point to), then it would only trigger a
rebuild when a command is added or removed.

> I think I either did or was planning to take it out of cache.h as part
> of that, we put way too much in cache.h.
> 
> Even advice.c depends on cache.h for its advice.h *sigh*.
> 
> Trying it just now putting advice.h in builtin.h instead leaves 10
> top-level files not-compiling (and they just need the header).
> 
> I think it's fine to include common utilties in our "included by
> everything" headers, but if we've got ~500 *.c files and something is
> only needed by ~20 of them (as in this case) we should probably just
> include it in those 20.

Oh, definitely, we should be shrinking cache.h, and not adding more to
it. Especially not generated stuff.

> >> 	make file-that-does-not-use-generated-header.o
> >> 
> >>    It sucks a bit to have e.g. every *.o depend on command-list.h, when
> >>    we only use it in 1-2 places, ditto the other headers.
> > But that is already true of non-generated headers. If your system
> > doesn't support computed deps, then all objects depend on all headers.
> 
> ... this does not build e.g. command-list.h:
> 
> 	make clean
> 	make grep.o
> 
> But this does:
> 
> 	make clean
> 	make help.o
> 
> Because we've manually declared that.

Right, but...does grep.o actually need command-list.h? If it doesn't
(and that seems to be the case), then all is working as intended. If
grep included some other header that included command-list.h, then yeah,
that would be a bug. But that is true whether grep.c includes it
directly or not. Any Makefile dependency needs to take into account
recursive includes.

> > Yes, that sucks for you. But almost nobody actively develops on such a
> > system, and people building Git to use rarely notice (because they are
> > doing an initial build, or jumping versions, both of which generally
> > require recompilation anyway).
> 
> I guess I'm "almost nobody" :) Not because I don't use computed deps,
> but because I tend to e.g. do large interactive rebases with:
> 
> 	git rebase -i 'make grep.o'
> 
> Or some other subset of our built assets/objects.

I do that all the time, too. But with computed deps, it works (by which
I mean it rebuilds only stuff needed by grep.o).

I'm not even sure what we're talking about anymore. If you are saying
that no, we don't want to just say "everything depends on foo.h that is
generated", I agree. That is wrong to do, and we should specify the
minimal dependencies where appropriate (and take care to keep that set
as small as is feasible using small C interfaces).

If you're saying "for people without computed dependencies, everything
will want to rebuild shell scripts", then I don't care. We decided long
ago that maintaining a manual list of header dependencies was not worth
doing, and people with sub-par compilers will have to suffer.

In the email you're replying to, I was trying to express the second one.
But it sounds like you thought I was trying to argue against the first
one.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 4/4] Documentation: add lint-fsck-msgids
  2022-10-28  3:11               ` Ævar Arnfjörð Bjarmason
@ 2022-10-28  5:32                 ` Junio C Hamano
  2022-10-28  5:37                   ` Jeff King
  2022-10-28  5:35                 ` Jeff King
  1 sibling, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2022-10-28  5:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: John Cai, Jeff King, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> First, for the purposes of this thread I think Jeff and I are far off
> into the weeds here :)

It is good to clearly separate where we would want to draw the line
for this round, to get the already-worked-on-and-immediately-available
improvement going, while we envision a future direction for the longer
term.

> But if we go for that: I think in this case & most I can think of
> generating the code from the docs is better (as that rough POC I had
> showed), because:
>
>  - You just need a shellscript to scrape the docs to make a *.c or *.h,
>    whereas you'd need a C compiler to make the docs if it's the other
>    way around. But more importantly:
>
>  - The docs are way easier to scrape with some sed/awk/grep/whatever
>    few-liner than to scrape C code for generating docs. E.g. see
>    config-list.h.

Scraping docs is easier because we do not have to choose from
multiple choices that are all reasonable ;-).  Either way, the
source material needs some discipline to keep it scrapable (e.g. to
keep the doc scrapable, you'd probably keep each entry a single
line, or a fixed format like "<token>::" followed by "^I(<token>) "
followed by description.  Nothing forbids us from giving developers
a similar rule to keep each entry in FOR_EACH_MSG_ID() macro easier
to scrape, so it is about the same difficulty going either way.

But if you choose to make the code the source of truth, you'd have
to see if it makes more sense to "compile and run" instead of
scraping.  That's another thing to consider and choose from, which
makes it harder ;-)

> But mainly it helps to have a use-case, replacing the linter script with
> e.g. the *.sh I demo'd might be a marginal improvement. But e.g. "git
> help -c" uses one of those generated files (config-list.h), and actually
> does something useful ...

Yes, I've shown how N_("explanation of the error") may fit into the
existing scheme in a separate message upthread.  If we go from code
to doc, it would be a reasonable starting point.

Whichever way the aout-generation goes, we'd complicate the build
dependency graph, which is a downside.  Another is that third-party
consumers of docs now need to generate some docs from the source,
which may be additional burden for them.

> Is there a good use-case for the fsck data like that? I'd think that
> we'd want to make sure the docs are in sync with the code, as in we're
> not adding new warnings/errors etc. without documenting them. But beyond
> that maybe not much, and people would just run "git help fsck" to get
> the list of variables..

"git help fsck-error-codes" that does not have a pre-generated
documentation (instead we'd just dump the N_("explanation") to the
output) is certainly tempting.  I am not sure if it would fly well.
When was the last time you saw a manpage that says "run this command
and view its output for the most up-to-date list" ;-)?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 4/4] Documentation: add lint-fsck-msgids
  2022-10-28  3:11               ` Ævar Arnfjörð Bjarmason
  2022-10-28  5:32                 ` Junio C Hamano
@ 2022-10-28  5:35                 ` Jeff King
  1 sibling, 0 replies; 35+ messages in thread
From: Jeff King @ 2022-10-28  5:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: John Cai, Junio C Hamano, git

On Fri, Oct 28, 2022 at 05:11:07AM +0200, Ævar Arnfjörð Bjarmason wrote:

>  - The docs are way easier to scrape with some sed/awk/grep/whatever
>    few-liner than to scrape C code for generating docs. E.g. see
>    config-list.h.
> 
>  - Scraping the C code sucks so much that we'd probably make some
>    dedicated interface for it, e.g. what we have for "git <cmd>
>    --git-completion-helper".

In the general case, scraping C code is awful. But if you are looking
for one particular thing, it is not unreasonable to say something like
"when we write a usage string, we write it like:

  const char *git_cmdname_usage[] = "...";

> But mainly it helps to have a use-case, replacing the linter script with
> e.g. the *.sh I demo'd might be a marginal improvement. But e.g. "git
> help -c" uses one of those generated files (config-list.h), and actually
> does something useful ...

Yeah. In cases like config-list.h, it really is helpful for it to be
something we can query at run-time (the other option would be stuffing
them into git-completion.bash at build-time, but there are some
downsides there).

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 4/4] Documentation: add lint-fsck-msgids
  2022-10-28  5:32                 ` Junio C Hamano
@ 2022-10-28  5:37                   ` Jeff King
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2022-10-28  5:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, John Cai, git

On Thu, Oct 27, 2022 at 10:32:30PM -0700, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> 
> > First, for the purposes of this thread I think Jeff and I are far off
> > into the weeds here :)
> 
> It is good to clearly separate where we would want to draw the line
> for this round, to get the already-worked-on-and-immediately-available
> improvement going, while we envision a future direction for the longer
> term.

Yeah, to be clear, I'm OK with your linting script in the near term. I
didn't look at it _too_ carefully, but if the linter passes, then I
think the proof is in the pudding, and we can consider the topic done
for now.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v3 4/4] Documentation: add lint-fsck-msgids
  2022-10-28  5:32                 ` Jeff King
@ 2022-10-28 10:41                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 35+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-28 10:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, John Cai


On Fri, Oct 28 2022, Jeff King wrote:

> On Fri, Oct 28, 2022 at 04:04:12AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > I think this particular case is tricky in that direction, because it's a
>> > big set of dependencies that aren't necessarily one-to-one. E.g.,
>> > builtin/log.c needs to depend on git-log.txt, but also on git-show.txt
>> > and git-format-patch.txt.
>> 
>> I was thinking of just a generated usage-strings.c or whatever. I.e. one
>> file with every usage string in it. Then you don't need to hunt down
>> which thing goes in which file. We'll just have a variable named
>> "builtin_format_patch_usage". Include it in "builtin.h" and it doesn't
>> matter if that's in builtin/log.c or builtin/format-patch.c.
>
> Yes, though you have the opposite problem, then: what are the source
> files that can produce that usage-strings.c? If your answer is
> "Documentation/git-*.txt", then that is a recipe for flaky dependencies.

First, you're not telling me anything new here :) I've looked at
literally all of these special-cases recently for t0450*.

> We already have one such for config-list.h. Try this:
>
>   # introduce a new file
>   printf 'foo.bar::\n\ta fake config var\n' \
>     >Documentation/config/foo.txt
>
>   # it shows up in the output, as expected
>   make
>   ./git help --config-for-completion | grep foo
>
>   # now drop it
>   rm Documentation/config/foo.txt
>
>   # oops; make won't rebuild anything, and it's still there
>   make
>   ./git help --config-for-completion | grep foo

Yes, much of the dependency graph in Documentation/Makefile is simply
broken, and that's not the worst offender.

Try adding a new category to command-list.txt, then remove it, and watch
your entire build fail until you 'git clean -dxf' (because
Documentation/Makefile will only remove the generated cmds-*.txt it
knows about, but another part includes things based on a wildcard.

> The bug is that config-list.h depends on a glob. So we notice when a
> file changes, but not when one goes away. And this isn't just
> hypothetical. Files come and go as you "git checkout" around history (or
> bisect). I don't remember the details, but I'm pretty sure I've gotten
> false positive test failures out of this before (maybe a topic branch
> with a bogus entry that broke t0012 or t9902, and then moving back to
> master doesn't fix it?).
>
> So I'd prefer to avoid introducing more flakiness if we can. You might
> be able to piggy back on command-list.txt in this case (that's what
> makes command-list.h not flaky).

I'm in violent agreement with you that this problem sucks, but it's also
trivial to solve: Just don't create such crappy dependency graphs in our
Makefiles.

So, in this case (and keep in mind this is off the cuff, and I'm not
advocating for doing this now) is:

 1. We already have an exhaustive list of built-ins in the top-level
    Makefile
 2. Turn those into corresponding Documentation/git-*.txt dependencies,
    e.g. builtin/rebase.o to Documentation/git-rebase.txt.
 3. Some won't exist on the other side, we have no
    builtin/cherry-pick.o, but a Documentation/git-cherry-pick.txt etc. I
    know, keep reading... :)
 4. Slurp up the *.txt, parse it (this code's mostly in t0450 already),
    spew out a generated *.c or *.h.

Critically this is based on an exhaustive known list at this point, if
we add/remove built-ins we *will* re-generate it. It's not using a
wildcard dependency.

At this point we have e.g. variables like:

 - git_builtin_bundle_usage
 - git_builtin_log_usage

But because we're not smart enough and/or it doesn't correspond to our
*.txt structure we don't have e.g.:

 - git_builtin_bundle_create_usage (a sub-command)
 - git_builtin_fsck_objects_usage

But that's fine.

We can just handle remaining special-cases by not removing the cases we
didn't handle from the C sources, we'll still need to maintain those on
both sides, and have t0450 hopefully catch them drifting.

I.e. a subsequent change to the C code would be something like:
	
	diff --git a/builtin/bundle.c b/builtin/bundle.c
	index e80efce3a42..557e00be5a2 100644
	--- a/builtin/bundle.c
	+++ b/builtin/bundle.c
	@@ -11,14 +11,7 @@
	  * bundle supporting "fetch", "pull", and "ls-remote".
	  */
	 
	-static const char * const builtin_bundle_usage[] = {
	-  N_("git bundle create [<options>] <file> <git-rev-list args>"),
	-  N_("git bundle verify [<options>] <file>"),
	-  N_("git bundle list-heads <file> [<refname>...]"),
	-  N_("git bundle unbundle <file> [<refname>...]"),
	-  NULL
	-};
	-
	+/* too dumb for these still... */
	 static const char * const builtin_bundle_create_usage[] = {
	   N_("git bundle create [<options>] <file> <git-rev-list args>"),
	   NULL

We didn't handle everything, but that's OK. Perfect shouldn't be the
enemy of the good.

We have on the order of 150 built-ins, by far the majority of them are
relativtely easy to correlate to their docs & scrape them out.

The goal here would be to mostly get rid of the maintenance burden of
maintaining these in two places, if we still need to do that in 10-30
cases instead of 150-200 we've solved most of the problem.

>> It does mean you need to rebuild the C code for other built-ins every
>> time one of the SYNOPSIS sections changes, but doesn't happen too often.
>
> If it's a c/h combo that only makes the variable names public (not the
> contents of the strings they point to), then it would only trigger a
> rebuild when a command is added or removed.

Yeah, and if we didn't like that we can make it a file per variable
(which I think sucks), but a full re-build every time we add a built-in
isn't too bad. And we *way* worse now, every time some utility function
is added to a few common headers we basically do a re-build.

>> I think I either did or was planning to take it out of cache.h as part
>> of that, we put way too much in cache.h.
>> 
>> Even advice.c depends on cache.h for its advice.h *sigh*.
>> 
>> Trying it just now putting advice.h in builtin.h instead leaves 10
>> top-level files not-compiling (and they just need the header).
>> 
>> I think it's fine to include common utilties in our "included by
>> everything" headers, but if we've got ~500 *.c files and something is
>> only needed by ~20 of them (as in this case) we should probably just
>> include it in those 20.
>
> Oh, definitely, we should be shrinking cache.h, and not adding more to
> it. Especially not generated stuff.

*more violent agreement noises* :)

FWIW I have some patches to make headway on that stuff, which I've kept
unsubmitted because I thought nobody was interested...

I.e. a *very large* part of cache.h and friends is stuff that maybe 3-5
files here and there need, not "nearly everything", as one might assume.

We should still have e.g. gettext.h and the like in there,
probably.

Although for those we should probably split them off into builtin.h and
e.g. a new libs.h, so that we don't get translations added in
t/helper/*.c and the like, so maybe ...

>> >> 	make file-that-does-not-use-generated-header.o
>> >> 
>> >>    It sucks a bit to have e.g. every *.o depend on command-list.h, when
>> >>    we only use it in 1-2 places, ditto the other headers.
>> > But that is already true of non-generated headers. If your system
>> > doesn't support computed deps, then all objects depend on all headers.
>> 
>> ... this does not build e.g. command-list.h:
>> 
>> 	make clean
>> 	make grep.o
>> 
>> But this does:
>> 
>> 	make clean
>> 	make help.o
>> 
>> Because we've manually declared that.
>
> Right, but...does grep.o actually need command-list.h? If it doesn't
> (and that seems to be the case), then all is working as intended. If
> grep included some other header that included command-list.h, then yeah,
> that would be a bug. But that is true whether grep.c includes it
> directly or not. Any Makefile dependency needs to take into account
> recursive includes.

Yes, it doesn't need it, and it shouldn't build it. I'm saying that this
is the reason we've needed manually maintained dependencies of these
generated headers.

I.e. if you take the easy way out and say that we have no generated deps
yet, let's build all of it you would build that stuff with grep.o for no
reason.

And yes, I'm aware that that's what we do right now. I'm saying that
it's probably OK because we've had 3-4 of these headers, but people
might mind if it's more, or a lot more...

>> > Yes, that sucks for you. But almost nobody actively develops on such a
>> > system, and people building Git to use rarely notice (because they are
>> > doing an initial build, or jumping versions, both of which generally
>> > require recompilation anyway).
>> 
>> I guess I'm "almost nobody" :) Not because I don't use computed deps,
>> but because I tend to e.g. do large interactive rebases with:
>> 
>> 	git rebase -i 'make grep.o'
>> 
>> Or some other subset of our built assets/objects.
>
> I do that all the time, too. But with computed deps, it works (by which
> I mean it rebuilds only stuff needed by grep.o).
>
> I'm not even sure what we're talking about anymore. If you are saying
> that no, we don't want to just say "everything depends on foo.h that is
> generated", I agree. That is wrong to do, and we should specify the
> minimal dependencies where appropriate (and take care to keep that set
> as small as is feasible using small C interfaces).
>
> If you're saying "for people without computed dependencies, everything
> will want to rebuild shell scripts", then I don't care. We decided long
> ago that maintaining a manual list of header dependencies was not worth
> doing, and people with sub-par compilers will have to suffer.
>
> In the email you're replying to, I was trying to express the second one.
> But it sounds like you thought I was trying to argue against the first
> one.

I'm saying there's a large chasm between using generated dependencies,
and using them all the time.

Maybe my dev setup is just different, but I do most of my development in
a primary development tree where I've generally built things already, so
all the incremental compilation we support works out nicely.

But I also do things like spin up different workrtees for different
revisions, build some subset, and destroy them afterwards. Those don't
benefit from computed dependencies, so it's nice that we're not
over-building things.

Anyway:

All I was trying to get across with the upthread was that I had further
patches in this area that might be useful, but that part of the reason I
hadn't submitted them was because I didn't want to introduce suckage
with them.

We don't make a lot of use of generated code, and the uses we have now
are really isolated (one header used by one *.c file, mostly).

If we make that a one<->many we should be careful, and there's some
dragons lurking e.g. for generated *.h's used by other *.h's (all our
use is by *.c's at this point).

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2022-10-28 11:15 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-24 15:00 [PATCH 0/2] Document fsck msg ids John Cai via GitGitGadget
2022-10-24 15:00 ` [PATCH 1/2] fsck: remove the unused BAD_TAG_OBJECT John Cai via GitGitGadget
2022-10-24 16:57   ` Junio C Hamano
2022-10-24 18:16     ` Junio C Hamano
2022-10-24 18:33       ` John Cai
2022-10-24 23:39         ` Jeff King
2022-10-24 15:00 ` [PATCH 2/2] fsck: document msg-id John Cai via GitGitGadget
2022-10-24 17:33   ` Junio C Hamano
2022-10-25  9:41     ` Ævar Arnfjörð Bjarmason
2022-10-25 16:07       ` Junio C Hamano
2022-10-24 18:51 ` [PATCH 0/2] Document fsck msg ids Junio C Hamano
2022-10-25  3:17 ` [PATCH v2 " John Cai via GitGitGadget
2022-10-25  3:17   ` [PATCH v2 1/2] fsck: remove the unused BAD_TAG_OBJECT John Cai via GitGitGadget
2022-10-25  3:17   ` [PATCH v2 2/2] fsck: document msg-id John Cai via GitGitGadget
2022-10-25  4:30   ` [PATCH v2 0/2] Document fsck msg ids Junio C Hamano
2022-10-25  4:40     ` Junio C Hamano
2022-10-25  5:12     ` [PATCH] Documentation: add lint-fsck-msgids Junio C Hamano
2022-10-25 22:42   ` [PATCH v3 0/4] document fsck error message ids Junio C Hamano
2022-10-25 22:42     ` [PATCH v3 1/4] fsck: remove the unused BAD_TAG_OBJECT Junio C Hamano
2022-10-25 22:42     ` [PATCH v3 2/4] fsck: remove the unused MISSING_TREE_OBJECT Junio C Hamano
2022-10-25 22:42     ` [PATCH v3 3/4] fsck: document msg-id Junio C Hamano
2022-10-25 22:42     ` [PATCH v3 4/4] Documentation: add lint-fsck-msgids Junio C Hamano
2022-10-26  2:43       ` Ævar Arnfjörð Bjarmason
2022-10-26  5:34         ` Jeff King
2022-10-26  6:46           ` Junio C Hamano
2022-10-26 11:35           ` Ævar Arnfjörð Bjarmason
2022-10-28  1:23             ` Jeff King
2022-10-28  2:04               ` Ævar Arnfjörð Bjarmason
2022-10-28  5:32                 ` Jeff King
2022-10-28 10:41                   ` Ævar Arnfjörð Bjarmason
2022-10-28  3:02             ` John Cai
2022-10-28  3:11               ` Ævar Arnfjörð Bjarmason
2022-10-28  5:32                 ` Junio C Hamano
2022-10-28  5:37                   ` Jeff King
2022-10-28  5:35                 ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).