git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Linus Arver <linusa@google.com>
To: Calvin Wan <calvinwan@google.com>, git@vger.kernel.org
Cc: Calvin Wan <calvinwan@google.com>,
	nasamuffin@google.com, chooglen@google.com,
	johnathantanmy@google.com
Subject: Re: [RFC PATCH 0/8] Introduce Git Standard Library
Date: Fri, 30 Jun 2023 00:01:16 -0700	[thread overview]
Message-ID: <owly4jmp8mdf.fsf@fine.c.googlers.com> (raw)
In-Reply-To: <20230627195251.1973421-1-calvinwan@google.com>

Hello Calvin,

Calvin Wan <calvinwan@google.com> writes:
> With our current method of building Git, we can imagine the dependency
> graph as such:
>
>         Git
>          /\
>         /  \
>        /    \
>   libgit.a   ext deps
>
> In libifying parts of Git, we want to shrink the dependency graph to
> only the minimal set of dependencies, so libraries should not use
> libgit.a. Instead, it would look like:
>
>                 Git
>                 /\
>                /  \
>               /    \
>           libgit.a  ext deps
>              /\
>             /  \
>            /    \
> object-store.a  (other lib)
>       |        /
>       |       /
>       |      /
>  config.a   / 
>       |    /
>       |   /
>       |  /
> git-std-lib.a
>
> Instead of containing all of the objects in Git, libgit.a would contain
> objects that are not built by libraries it links against. Consequently,
> if someone wanted their own custom build of Git with their own custom
> implementation of the object store, they would only have to swap out
> object-store.a rather than do a hard fork of Git.

What about the case where someone wants to build program Foo which just
pulls in some bits of Git? For example, I am thinking of trailer.[ch]
which could be refactored to expose a public API. Then the Foo program
could pull this public trailer manipulation API in as a library
dependency (so that Foo can parse trailers in commit messages without
re-implementing that logic in Foo's own codebase). With the proposed Git
Standard Library (GSL) model above, would my Foo program also have to
pull in GSL? If so, isn't this onerous because of the additional bloat?
The Foo developers just want the banana, not the gorilla holding the
banana in the jungle, so to speak.

> Rationale behind Git Standard Library
> ================
>
> The rationale behind Git Standard Library essentially is the result of
> two observations within the Git codebase: every file includes
> git-compat-util.h which defines functions in a couple of different
> files, and wrapper.c + usage.c have difficult-to-separate circular
> dependencies with each other and other files.
>
> Ubiquity of git-compat-util.h and circular dependencies
> ========
>
> Every file in the Git codebase includes git-compat-util.h. It serves as
> "a compatibility aid that isolates the knowledge of platform specific
> inclusion order and what feature macros to define before including which
> system header" (Junio[5]). Since every file includes git-compat-util.h, and
> git-compat-util.h includes wrapper.h and usage.h, it would make sense
> for wrapper.c and usage.c to be a part of the root library. They have
> difficult to separate circular dependencies with each other so they

s/difficult to separate/difficult-to-separate

> can't be independent libraries. Wrapper.c has dependencies on parse.c,
> abspath.c, strbuf.c, which in turn also have dependencies on usage.c and
> wrapper.c -- more circular dependencies. 
>
> Tradeoff between swappability and refactoring
> ========
>
> From the above dependency graph, we can see that git-std-lib.a could be
> many smaller libraries rather than a singular library. So why choose a
> singular library when multiple libraries can be individually easier to
> swap and are more modular? A singular library requires less work to
> separate out circular dependencies within itself so it becomes a
> tradeoff question between work and reward. While there may be a point in
> the future where a file like usage.c would want its own library so that
> someone can have custom die() or error(), the work required to refactor
> out the circular dependencies in some files would be enormous due to
> their ubiquity so therefore I believe it is not worth the tradeoff
> currently. Additionally, we can in the future choose to do this refactor
> and change the API for the library if there becomes enough of a reason
> to do so (remember we are avoiding promising stability of the interfaces
> of those libraries).

Would getting us down the currently proposed path make it even more
difficult to do this refactor? If so, I think it's worth mentioning.

> Reuse of compatibility functions in git-compat-util.h
> ========
>
> Most functions defined in git-compat-util.h are implemented in compat/
> and have dependencies limited to strbuf.h and wrapper.h so they can be
> easily included in git-std-lib.a, which as a root dependency means that
> higher level libraries do not have to worry about compatibility files in
> compat/. The rest of the functions defined in git-compat-util.h are
> implemented in top level files and, in this patch set, are hidden behind
> an #ifdef if their implementation is not in git-std-lib.a.
>
> Rationale summary
> ========
>
> The Git Standard Library allows us to get the libification ball rolling
> with other libraries in Git (such as Glen's removal of global state from
> config iteration[6] prepares a config library). By not spending many
> more months attempting to refactor difficult circular dependencies and
> instead spending that time getting to a state where we can test out
> swapping a library out such as config or object store, we can prove the
> viability of Git libification on a much faster time scale. Additionally
> the code cleanups that have happened so far have been minor and
> beneficial for the codebase. It is probable that making large movements
> would negatively affect code clarity.

It sounds like the circular dependencies are so difficult to untangle that they
are the primary motivation behind grouping these tightly-coupled libraries
together into the Git Standard Library (GSL) banner. Still, I think it would
help reviewers if you explain what tradeoffs we are making by accepting the
circular dependencies as they are instead of untangling them. Conversely, if we
assume that there are no circular dependencies, what kind of benefits do we get
when designing the GSL from this (improved) position? Would there be little to
no additional benefits? If so, then I think it would be easier to support the
current approach (as removing the circularities would not give us significant
advantages for libification).

> Git Standard Library boundary
> ================
>
> While I have described above some useful heuristics for identifying
> potential candidates for git-std-lib.a, a standard library should not
> have a shaky definition for what belongs in it.
>
>  - Low-level files (aka operates only on other primitive types) that are
>    used everywhere within the codebase (wrapper.c, usage.c, strbuf.c)
>    - Dependencies that are low-level and widely used
>      (abspath.c, date.c, hex-ll.c, parse.c, utf8.c)
>  - low-level git/* files with functions defined in git-compat-util.h
>    (ctype.c)
>  - compat/*

I'm confused. Is the list above an example of a shaky definition, or the
opposite? IOW, do you mean that the list above should be the initial set
of content to include in the GSL? Or _not_ to include?

> Series structure
> ================
>
> While my strbuf and git-compat-util series can stand alone, they also
> function as preparatory patches for this series. There are more cleanup
> patches in this series, but since most of them have marginal benefits
> probably not worth the churn on its own, I decided not to split them
> into a separate series like with strbuf and git-compat-util. As an RFC,
> I am looking for comments on whether the rationale behind git-std-lib
> makes sense as well as whether there are better ways to build and enable
> git-std-lib in patch 7, specifically regarding Makefile rules and the
> usage of ifdef's to stub out certain functions and headers. 

If the cleanups are independent I think it would be simpler to put them
in a separate series.

In general, I think the doc would make a stronger case if it expanded
the discussions around alternative approaches to the one proposed, with
the reasons why they were rejected.

Minor nits:
- Documentation/technical/git-std-lib.txt: (style) prefer "we" over "I" ("we
  believe" instead of "I believe").
- There are some "\ No newline at end of file" warnings in this series.

Thanks,
Linus

  parent reply	other threads:[~2023-06-30  7:01 UTC|newest]

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-27 19:52 [RFC PATCH 0/8] Introduce Git Standard Library Calvin Wan
2023-06-27 19:52 ` [RFC PATCH 1/8] trace2: log fsync stats in trace2 rather than wrapper Calvin Wan
2023-06-28  2:05   ` Victoria Dye
2023-07-05 17:57     ` Calvin Wan
2023-07-05 18:22       ` Victoria Dye
2023-07-11 20:07   ` Jeff Hostetler
2023-06-27 19:52 ` [RFC PATCH 2/8] hex-ll: split out functionality from hex Calvin Wan
2023-06-28 13:15   ` Phillip Wood
2023-06-28 16:55     ` Calvin Wan
2023-06-27 19:52 ` [RFC PATCH 3/8] object: move function to object.c Calvin Wan
2023-06-27 19:52 ` [RFC PATCH 4/8] config: correct bad boolean env value error message Calvin Wan
2023-06-27 19:52 ` [RFC PATCH 5/8] parse: create new library for parsing strings and env values Calvin Wan
2023-06-27 22:58   ` Junio C Hamano
2023-06-27 19:52 ` [RFC PATCH 6/8] pager: remove pager_in_use() Calvin Wan
2023-06-27 23:00   ` Junio C Hamano
2023-06-27 23:18     ` Calvin Wan
2023-06-28  0:30     ` Glen Choo
2023-06-28 16:37       ` Glen Choo
2023-06-28 16:44         ` Calvin Wan
2023-06-28 17:30           ` Junio C Hamano
2023-06-28 20:58       ` Junio C Hamano
2023-06-27 19:52 ` [RFC PATCH 7/8] git-std-lib: introduce git standard library Calvin Wan
2023-06-28 13:27   ` Phillip Wood
2023-06-28 21:15     ` Calvin Wan
2023-06-30 10:00       ` Phillip Wood
2023-06-27 19:52 ` [RFC PATCH 8/8] git-std-lib: add test file to call git-std-lib.a functions Calvin Wan
2023-06-28  0:14 ` [RFC PATCH 0/8] Introduce Git Standard Library Glen Choo
2023-06-28 16:30   ` Calvin Wan
2023-06-30  7:01 ` Linus Arver [this message]
2023-08-10 16:33 ` [RFC PATCH v2 0/7] " Calvin Wan
2023-08-10 16:36   ` [RFC PATCH v2 1/7] hex-ll: split out functionality from hex Calvin Wan
2023-08-10 16:36   ` [RFC PATCH v2 2/7] object: move function to object.c Calvin Wan
2023-08-10 20:32     ` Junio C Hamano
2023-08-10 22:36     ` Glen Choo
2023-08-10 22:43       ` Junio C Hamano
2023-08-10 16:36   ` [RFC PATCH v2 3/7] config: correct bad boolean env value error message Calvin Wan
2023-08-10 20:36     ` Junio C Hamano
2023-08-10 16:36   ` [RFC PATCH v2 4/7] parse: create new library for parsing strings and env values Calvin Wan
2023-08-10 23:21     ` Glen Choo
2023-08-10 23:43       ` Junio C Hamano
2023-08-14 22:15       ` Jonathan Tan
2023-08-14 22:09     ` Jonathan Tan
2023-08-14 22:19       ` Junio C Hamano
2023-08-10 16:36   ` [RFC PATCH v2 5/7] date: push pager.h dependency up Calvin Wan
2023-08-10 23:41     ` Glen Choo
2023-08-14 22:17     ` Jonathan Tan
2023-08-10 16:36   ` [RFC PATCH v2 6/7] git-std-lib: introduce git standard library Calvin Wan
2023-08-14 22:26     ` Jonathan Tan
2023-08-10 16:36   ` [RFC PATCH v2 7/7] git-std-lib: add test file to call git-std-lib.a functions Calvin Wan
2023-08-14 22:28     ` Jonathan Tan
2023-08-10 22:05   ` [RFC PATCH v2 0/7] Introduce Git Standard Library Glen Choo
2023-08-15  9:20     ` Phillip Wood
2023-08-16 17:17       ` Calvin Wan
2023-08-16 21:19         ` Junio C Hamano
2023-08-15  9:41   ` Phillip Wood
2023-09-08 17:41     ` [PATCH v3 0/6] " Calvin Wan
2023-09-08 17:44       ` [PATCH v3 1/6] hex-ll: split out functionality from hex Calvin Wan
2023-09-08 17:44       ` [PATCH v3 2/6] wrapper: remove dependency to Git-specific internal file Calvin Wan
2023-09-15 17:54         ` Jonathan Tan
2023-09-08 17:44       ` [PATCH v3 3/6] config: correct bad boolean env value error message Calvin Wan
2023-09-08 17:44       ` [PATCH v3 4/6] parse: create new library for parsing strings and env values Calvin Wan
2023-09-08 17:44       ` [PATCH v3 5/6] git-std-lib: introduce git standard library Calvin Wan
2023-09-11 13:22         ` Phillip Wood
2023-09-27 14:14           ` Phillip Wood
2023-09-15 18:39         ` Jonathan Tan
2023-09-26 14:23         ` phillip.wood123
2023-09-08 17:44       ` [PATCH v3 6/6] git-std-lib: add test file to call git-std-lib.a functions Calvin Wan
2023-09-09  5:26         ` Junio C Hamano
2023-09-15 18:43         ` Jonathan Tan
2023-09-15 20:22           ` Junio C Hamano
2023-09-08 20:36       ` [PATCH v3 0/6] Introduce Git Standard Library Junio C Hamano
2023-09-08 21:30         ` Junio C Hamano
2023-09-29 21:20 ` [PATCH v4 0/4] Preliminary patches before git-std-lib Jonathan Tan
2023-09-29 21:20   ` [PATCH v4 1/4] hex-ll: separate out non-hash-algo functions Jonathan Tan
2023-10-21  4:14     ` Linus Arver
2023-09-29 21:20   ` [PATCH v4 2/4] wrapper: reduce scope of remove_or_warn() Jonathan Tan
2023-10-10  9:59     ` phillip.wood123
2023-10-10 16:13       ` Junio C Hamano
2023-10-10 17:38         ` Jonathan Tan
2023-09-29 21:20   ` [PATCH v4 3/4] config: correct bad boolean env value error message Jonathan Tan
2023-09-29 23:03     ` Junio C Hamano
2023-09-29 21:20   ` [PATCH v4 4/4] parse: separate out parsing functions from config.h Jonathan Tan
2023-10-10 10:00     ` phillip.wood123
2023-10-10 17:43       ` Jonathan Tan
2023-10-10 17:58         ` Phillip Wood
2023-10-10 20:57           ` Junio C Hamano
2023-10-10 10:05   ` [PATCH v4 0/4] Preliminary patches before git-std-lib phillip.wood123
2023-10-10 16:21     ` Jonathan Tan
2024-02-22 17:50   ` [PATCH v5 0/3] Introduce Git Standard Library Calvin Wan
2024-02-22 17:50   ` [PATCH v5 1/3] pager: include stdint.h because uintmax_t is used Calvin Wan
2024-02-22 21:43     ` Junio C Hamano
2024-02-26 18:59       ` Kyle Lippincott
2024-02-27  0:20         ` Junio C Hamano
2024-02-27  0:56           ` Kyle Lippincott
2024-02-27  2:45             ` Junio C Hamano
2024-02-27 22:29               ` Kyle Lippincott
2024-02-27 23:25                 ` Junio C Hamano
2024-02-27  8:45             ` Jeff King
2024-02-27  9:05               ` Jeff King
2024-02-27 20:10               ` Kyle Lippincott
2024-02-24  1:33     ` Kyle Lippincott
2024-02-24  7:58       ` Junio C Hamano
2024-02-22 17:50   ` [PATCH v5 2/3] git-std-lib: introduce Git Standard Library Calvin Wan
2024-02-29 11:16     ` Phillip Wood
2024-02-29 17:23       ` Junio C Hamano
2024-02-29 18:27         ` Linus Arver
2024-02-29 18:54           ` Junio C Hamano
2024-02-29 20:03             ` Linus Arver
2024-02-22 17:50   ` [PATCH v5 3/3] test-stdlib: show that git-std-lib is independent Calvin Wan
2024-02-22 22:24     ` Junio C Hamano
2024-03-07 21:13     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=owly4jmp8mdf.fsf@fine.c.googlers.com \
    --to=linusa@google.com \
    --cc=calvinwan@google.com \
    --cc=chooglen@google.com \
    --cc=git@vger.kernel.org \
    --cc=johnathantanmy@google.com \
    --cc=nasamuffin@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).