git@vger.kernel.org mailing list mirror (one of many)
From: Calvin Wan <calvinwan@google.com>
To: git@vger.kernel.org
Cc: Calvin Wan <calvinwan@google.com>,
	nasamuffin@google.com, chooglen@google.com,
	johnathantanmy@google.com
Subject: [RFC PATCH 0/8] Introduce Git Standard Library
Date: Tue, 27 Jun 2023 19:52:43 +0000
Message-ID: <20230627195251.1973421-1-calvinwan@google.com>

Introduction / Pre-reading
================

The Git Standard Library intends to serve as the foundational library
and root dependency that other libraries in Git will be built off of.
That is to say, suppose we have libraries X and Y; a user that wants to
use X and Y would need to include X, Y, and this Git Standard Library.
This cover letter will explain the rationale behind having a root
dependency that encompasses many files in the form of a standard library
rather than many root dependencies/libraries of those files. This does
not mean that the Git Standard Library will be the only possible root
dependency in the future, but rather the most significant and widely
used one. I will also explain why each file was chosen to be a part of
Git Standard Library v1. I will not explain entirely why we would like
to libify parts of Git -- see here[1] for that context.

Before looking at this series, it probably makes sense to look at the
other series that this is built on top of since that is the state I will
be referring to in this cover letter:

  - Elijah's final cache.h cleanup series[2]
  - my strbuf cleanup series[3]
  - my git-compat-util cleanup series[4]

Most importantly, in the git-compat-util series, the declarations for
functions implemented in wrapper.c and usage.c have been moved to their
respective header files, wrapper.h and usage.h, from git-compat-util.h.
Also config.[ch] had its general parsing code moved to parse.[ch].

Dependency graph in libified Git
================

If you look in the Git Makefile, all of the objects defined in the Git
library are compiled and archived into a singular file, libgit.a, which
is linked against by common-main.o with other external dependencies and
turned into the Git executable. In other words, the Git executable has
dependencies on libgit.a and a couple of external libraries. While our
efforts to libify Git will not affect this current build flow, it will
provide an alternate method for building Git.

With our current method of building Git, we can imagine the dependency
graph as such:

        Git
         /\
        /  \
       /    \
  libgit.a   ext deps

In libifying parts of Git, we want to shrink the dependency graph to
only the minimal set of dependencies, so libraries should not use
libgit.a. Instead, it would look like:

                Git
                /\
               /  \
              /    \
          libgit.a  ext deps
             /\
            /  \
           /    \
object-store.a  (other lib)
      |        /
      |       /
      |      /
 config.a   / 
      |    /
      |   /
      |  /
git-std-lib.a

Instead of containing all of the objects in Git, libgit.a would contain
objects that are not built by libraries it links against. Consequently,
if someone wanted their own custom build of Git with their own custom
implementation of the object store, they would only have to swap out
object-store.a rather than do a hard fork of Git.

Rationale behind Git Standard Library
================

The rationale behind the Git Standard Library is essentially the result
of two observations about the Git codebase: every file includes
git-compat-util.h, which declares functions implemented in a couple of
different files, and wrapper.c + usage.c have difficult-to-separate
circular dependencies with each other and with other files.

Ubiquity of git-compat-util.h and circular dependencies
========

Every file in the Git codebase includes git-compat-util.h. It serves as
"a compatibility aid that isolates the knowledge of platform specific
inclusion order and what feature macros to define before including which
system header" (Junio[5]). Since every file includes git-compat-util.h, and
git-compat-util.h includes wrapper.h and usage.h, it would make sense
for wrapper.c and usage.c to be a part of the root library. They have
difficult-to-separate circular dependencies with each other, so they
can't be independent libraries. wrapper.c has dependencies on parse.c,
abspath.c, and strbuf.c, which in turn have dependencies on usage.c and
wrapper.c -- more circular dependencies.
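
As a rough illustration of the kind of cycle described above (not code
from this series), the sketch below collapses two hypothetical
translation units into one file. Every sketch_* name is invented for the
example; the point is only that the "wrapper" side needs the "usage"
side to report errors, while the "usage" side needs the "wrapper" side
to emit its message, so neither could be archived into a library
without the other.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* ---- "usage" side (hypothetical stand-in for usage.c) ---- */
void sketch_die(const char *msg);

/* ---- "wrapper" side (hypothetical stand-in for wrapper.c) ---- */

/* Write all of buf to the stream, retrying after short writes. */
int sketch_write_all(FILE *out, const char *buf, size_t len)
{
	while (len) {
		size_t n = fwrite(buf, 1, len, out);
		if (!n)
			return -1;
		buf += n;
		len -= n;
	}
	return 0;
}

/* Copy src into dst, dying instead of truncating: needs the usage side. */
size_t sketch_checked_copy(char *dst, size_t size, const char *src)
{
	size_t len;

	assert(dst && src);
	len = strlen(src);
	if (len >= size)
		sketch_die("sketch_checked_copy: buffer too small");
	memcpy(dst, src, len + 1);
	return len;
}

/* ---- "usage" side again: needs the wrapper side to emit the message ---- */
void sketch_die(const char *msg)
{
	sketch_write_all(stderr, "fatal: ", 7);
	sketch_write_all(stderr, msg, strlen(msg));
	sketch_write_all(stderr, "\n", 1);
	exit(128);
}
```

Splitting these into two archives would leave each one with an
unresolved symbol from the other, which is the situation a single root
library sidesteps.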

Tradeoff between swappability and refactoring
========

From the above dependency graph, we can see that git-std-lib.a could be
many smaller libraries rather than a singular library. So why choose a
singular library when multiple libraries can be individually easier to
swap and are more modular? A singular library requires less work to
separate out circular dependencies within itself, so it becomes a
question of trading off work against reward. While there may be a point
in the future where a file like usage.c would want its own library so
that someone can have a custom die() or error(), the work required to
refactor out the circular dependencies in some files would be enormous
due to their ubiquity, so I believe it is not worth the tradeoff
currently. Additionally, we can choose to do this refactor in the future
and change the API for the library if there is enough of a reason to do
so (remember we are avoiding promising stability of the interfaces of
those libraries).

Reuse of compatibility functions in git-compat-util.h
========

Most functions declared in git-compat-util.h are implemented in compat/
and have dependencies limited to strbuf.h and wrapper.h, so they can be
easily included in git-std-lib.a. As a root dependency, this means that
higher level libraries do not have to worry about compatibility files in
compat/. The rest of the functions declared in git-compat-util.h are
implemented in top level files and, in this patch set, are hidden behind
an #ifdef if their implementation is not in git-std-lib.a.

Rationale summary
========

The Git Standard Library allows us to get the libification ball rolling
with other libraries in Git (for example, Glen's removal of global state
from config iteration[6] prepares for a config library). By not spending
many more months attempting to refactor difficult circular dependencies
and instead spending that time getting to a state where we can test out
swapping a library out such as config or object store, we can prove the
viability of Git libification on a much faster time scale. Additionally,
the code cleanups that have happened so far have been minor and
beneficial for the codebase. It is probable that making large movements
would negatively affect code clarity.

Git Standard Library boundary
================

While I have described above some useful heuristics for identifying
potential candidates for git-std-lib.a, a standard library should not
have a shaky definition for what belongs in it. The criteria are:

 - Low-level files (i.e. files that operate only on primitive types) that
   are used everywhere within the codebase (wrapper.c, usage.c, strbuf.c)
   - Dependencies that are low-level and widely used
     (abspath.c, date.c, hex-ll.c, parse.c, utf8.c)
 - Low-level git/* files with functions declared in git-compat-util.h
   (ctype.c)
 - compat/*

There are other files that might fit this definition, but that does not
mean they should belong in git-std-lib.a. Those files should start as
their own separate libraries, since any file added to git-std-lib.a
loses the flexibility of being easily swappable.

Files inside of Git Standard Library
================

The initial set of files in git-std-lib.a are:
abspath.c
ctype.c
date.c
hex-ll.c
parse.c
strbuf.c
usage.c
utf8.c
wrapper.c
relevant compat/ files

Pitfalls
================

In patch 7, I use #ifdef GIT_STD_LIB to both stub out code and hide
certain function headers. As other parts of Git are libified, if we
have to use more ifdefs for each different library, then the codebase
will become uglier and harder to understand. 
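
To make the pattern concrete, here is a minimal sketch of both uses of
the macro: hiding a declaration and stubbing out a call. Only the macro
name GIT_STD_LIB comes from this series; the sketch_* functions are
hypothetical, and the macro is defined inline only so that the example
compiles on its own (a real build would presumably pass it on the
compiler command line).

```c
#include <assert.h>

/* Assumption: defined here for the sketch; normally set by the build. */
#ifndef GIT_STD_LIB
#define GIT_STD_LIB 1
#endif

#ifndef GIT_STD_LIB
/*
 * Hypothetical declaration whose implementation lives outside the
 * library. It is hidden in library builds so that nothing inside the
 * library can call it by accident.
 */
int sketch_read_flag_from_repo(void);
#endif

/* Return a flag value; library builds fall back to the caller's default. */
int sketch_lookup_flag(int fallback)
{
	assert(fallback >= 0);
#ifdef GIT_STD_LIB
	/* Stubbed out: the repository lookup is not part of the library. */
	return fallback;
#else
	return sketch_read_flag_from_repo();
#endif
}
```

Each additional library that needs its own macro would multiply the
number of such branches, which is the readability cost described above.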

There are a small number of files under compat/* that have dependencies
not inside of git-std-lib.a. While those functions are not called on
Linux, other OSes might call those problematic functions. I don't see
this as a major problem, but rather as an observation that libification
in general may also require some minor compatibility work in the future.

Testing
================

Patch 8 introduces a temporary test file which will be replaced with
unit tests once a unit testing framework is decided upon[7]. It simply
proves that none of the functions in git-std-lib.a have missing
dependencies and that the library can stand on its own.

I have not yet tested building Git with git-std-lib.a (basically
removing the objects in git-std-lib.a from LIB_OBJS and linking against
git-std-lib.a instead), but I intend to test this in a future version
of this patch. As an RFC, I want to showcase git-std-lib.a as an
experimental dependency that other executables can include in order to
use Git binaries. Internally we have tested building and calling
functions in git-std-lib.a from other programs.

Unit tests should catch any breakages caused by changes to files in
git-std-lib.a (e.g. the introduction of an out-of-scope dependency), and
new functions introduced to git-std-lib.a will require unit tests
written for them.

Series structure
================

While my strbuf and git-compat-util series can stand alone, they also
function as preparatory patches for this series. There are more cleanup
patches in this series, but since most of them have marginal benefits
that are probably not worth the churn on their own, I decided not to
split them into a separate series like with strbuf and git-compat-util.
As an RFC, I am looking for comments on whether the rationale behind
git-std-lib makes sense as well as whether there are better ways to
build and enable git-std-lib in patch 7, specifically regarding Makefile
rules and the usage of ifdefs to stub out certain functions and headers.

The patch series is structured as follows:

Patches 1-6 are cleanup patches to remove the last few extraneous
dependencies from git-std-lib.a. Here's a short summary of the
dependencies that are specifically removed from git-std-lib.a since some
of the commit messages and diffs showcase dependency cleanups for other
files not directly related to git-std-lib.a:
 - Patch 1 removes trace2.h and repository.h dependencies from wrapper.c
 - Patch 2 removes the repository.h dependency from strbuf.c inherited from
   hex.c by separating it into hex-ll.c and hex.c
 - Patch 3 removes the object.h dependency from wrapper.c
 - Patch 4 is a bug fix that sets up the next patch. This importantly
   removes the git_config_bool() call from git_env_bool() so that env
   parsing can go in a separate file
 - Patch 5 removes the config.h dependency from wrapper.c and swaps it
   with a dependency to parse.h, which doesn't have extraneous
   dependencies to files outside of git-std-lib.a
 - Patch 6 removes the pager.h dependency from date.c
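
To illustrate the idea behind patches 4 and 5, here is a small sketch of
boolean environment parsing that needs nothing from the config
machinery. This is not the code from the series: the sketch_* names, the
exact set of accepted spellings, and the error text are assumptions made
for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Compare two strings for equality, ignoring ASCII case. */
static int equals_ignoring_case(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;
}

/* Return 1 for true, 0 for false, -1 if value is not a known boolean. */
int sketch_parse_maybe_bool(const char *value)
{
	static const char *const truthy[] = { "true", "yes", "on", "1" };
	static const char *const falsy[] = { "false", "no", "off", "0", "" };
	size_t i;

	for (i = 0; i < sizeof(truthy) / sizeof(truthy[0]); i++)
		if (equals_ignoring_case(value, truthy[i]))
			return 1;
	for (i = 0; i < sizeof(falsy) / sizeof(falsy[0]); i++)
		if (equals_ignoring_case(value, falsy[i]))
			return 0;
	return -1;
}

/* Read a boolean from the environment; exit on an unparsable value. */
int sketch_env_bool(const char *name, int default_value)
{
	const char *value;
	int parsed;

	assert(name);
	value = getenv(name);
	if (!value)
		return default_value;
	parsed = sketch_parse_maybe_bool(value);
	if (parsed < 0) {
		fprintf(stderr, "bad boolean value '%s' for '%s'\n", value, name);
		exit(128);
	}
	return parsed;
}
```

Because the parser only touches strings and the environment, a file
shaped like this can sit in a low-level library without pulling in
config.h.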

Patch 7 introduces the Git Standard Library.

Patch 8 introduces a temporary test file for Git standard library. The
test file directly or indirectly calls all functions in git-std-lib.a to
showcase that the functions don't reference missing objects and that
git-std-lib.a can stand on its own.

[1] https://lore.kernel.org/git/CAJoAoZ=Cig_kLocxKGax31sU7Xe4==BGzC__Bg2_pr7krNq6MA@mail.gmail.com/
[2] https://lore.kernel.org/git/pull.1525.v3.git.1684218848.gitgitgadget@gmail.com/
[3] https://lore.kernel.org/git/20230606194720.2053551-1-calvinwan@google.com/
[4] https://lore.kernel.org/git/20230606170711.912972-1-calvinwan@google.com/
[5] https://lore.kernel.org/git/xmqqwn17sydw.fsf@gitster.g/
[6] https://lore.kernel.org/git/pull.1497.v3.git.git.1687290231.gitgitgadget@gmail.com/
[7] https://lore.kernel.org/git/8afdb215d7e10ca16a2ce8226b4127b3d8a2d971.1686352386.git.steadmon@google.com/

Calvin Wan (8):
  trace2: log fsync stats in trace2 rather than wrapper
  hex-ll: split out functionality from hex
  object: move function to object.c
  config: correct bad boolean env value error message
  parse: create new library for parsing strings and env values
  pager: remove pager_in_use()
  git-std-lib: introduce git standard library
  git-std-lib: add test file to call git-std-lib.a functions

 Documentation/technical/git-std-lib.txt | 182 ++++++++++++++++++
 Makefile                                |  30 ++-
 attr.c                                  |   2 +-
 builtin/log.c                           |   2 +-
 color.c                                 |   4 +-
 column.c                                |   2 +-
 config.c                                | 173 +----------------
 config.h                                |  14 +-
 date.c                                  |   4 +-
 git-compat-util.h                       |   7 +-
 git.c                                   |   2 +-
 hex-ll.c                                |  49 +++++
 hex-ll.h                                |  27 +++
 hex.c                                   |  47 -----
 hex.h                                   |  24 +--
 mailinfo.c                              |   2 +-
 object.c                                |   5 +
 object.h                                |   6 +
 pack-objects.c                          |   2 +-
 pack-revindex.c                         |   2 +-
 pager.c                                 |   5 -
 pager.h                                 |   1 -
 parse-options.c                         |   3 +-
 parse.c                                 | 182 ++++++++++++++++++
 parse.h                                 |  20 ++
 pathspec.c                              |   2 +-
 preload-index.c                         |   2 +-
 progress.c                              |   2 +-
 prompt.c                                |   2 +-
 rebase.c                                |   2 +-
 strbuf.c                                |   2 +-
 symlinks.c                              |   2 +
 t/Makefile                              |   4 +
 t/helper/test-env-helper.c              |   2 +-
 t/stdlib-test.c                         | 239 ++++++++++++++++++++++++
 trace2.c                                |  13 ++
 trace2.h                                |   5 +
 unpack-trees.c                          |   2 +-
 url.c                                   |   2 +-
 urlmatch.c                              |   2 +-
 usage.c                                 |   8 +
 wrapper.c                               |  25 +--
 wrapper.h                               |   9 +-
 write-or-die.c                          |   2 +-
 44 files changed, 813 insertions(+), 311 deletions(-)
 create mode 100644 Documentation/technical/git-std-lib.txt
 create mode 100644 hex-ll.c
 create mode 100644 hex-ll.h
 create mode 100644 parse.c
 create mode 100644 parse.h
 create mode 100644 t/stdlib-test.c

-- 
2.41.0.162.gfafddb0af9-goog


Thread overview: 111+ messages
2023-06-27 19:52 Calvin Wan [this message]
2023-06-27 19:52 ` [RFC PATCH 1/8] trace2: log fsync stats in trace2 rather than wrapper Calvin Wan
2023-06-28  2:05   ` Victoria Dye
2023-07-05 17:57     ` Calvin Wan
2023-07-05 18:22       ` Victoria Dye
2023-07-11 20:07   ` Jeff Hostetler
2023-06-27 19:52 ` [RFC PATCH 2/8] hex-ll: split out functionality from hex Calvin Wan
2023-06-28 13:15   ` Phillip Wood
2023-06-28 16:55     ` Calvin Wan
2023-06-27 19:52 ` [RFC PATCH 3/8] object: move function to object.c Calvin Wan
2023-06-27 19:52 ` [RFC PATCH 4/8] config: correct bad boolean env value error message Calvin Wan
2023-06-27 19:52 ` [RFC PATCH 5/8] parse: create new library for parsing strings and env values Calvin Wan
2023-06-27 22:58   ` Junio C Hamano
2023-06-27 19:52 ` [RFC PATCH 6/8] pager: remove pager_in_use() Calvin Wan
2023-06-27 23:00   ` Junio C Hamano
2023-06-27 23:18     ` Calvin Wan
2023-06-28  0:30     ` Glen Choo
2023-06-28 16:37       ` Glen Choo
2023-06-28 16:44         ` Calvin Wan
2023-06-28 17:30           ` Junio C Hamano
2023-06-28 20:58       ` Junio C Hamano
2023-06-27 19:52 ` [RFC PATCH 7/8] git-std-lib: introduce git standard library Calvin Wan
2023-06-28 13:27   ` Phillip Wood
2023-06-28 21:15     ` Calvin Wan
2023-06-30 10:00       ` Phillip Wood
2023-06-27 19:52 ` [RFC PATCH 8/8] git-std-lib: add test file to call git-std-lib.a functions Calvin Wan
2023-06-28  0:14 ` [RFC PATCH 0/8] Introduce Git Standard Library Glen Choo
2023-06-28 16:30   ` Calvin Wan
2023-06-30  7:01 ` Linus Arver
2023-08-10 16:33 ` [RFC PATCH v2 0/7] " Calvin Wan
2023-08-10 16:36   ` [RFC PATCH v2 1/7] hex-ll: split out functionality from hex Calvin Wan
2023-08-10 16:36   ` [RFC PATCH v2 2/7] object: move function to object.c Calvin Wan
2023-08-10 20:32     ` Junio C Hamano
2023-08-10 22:36     ` Glen Choo
2023-08-10 22:43       ` Junio C Hamano
2023-08-10 16:36   ` [RFC PATCH v2 3/7] config: correct bad boolean env value error message Calvin Wan
2023-08-10 20:36     ` Junio C Hamano
2023-08-10 16:36   ` [RFC PATCH v2 4/7] parse: create new library for parsing strings and env values Calvin Wan
2023-08-10 23:21     ` Glen Choo
2023-08-10 23:43       ` Junio C Hamano
2023-08-14 22:15       ` Jonathan Tan
2023-08-14 22:09     ` Jonathan Tan
2023-08-14 22:19       ` Junio C Hamano
2023-08-10 16:36   ` [RFC PATCH v2 5/7] date: push pager.h dependency up Calvin Wan
2023-08-10 23:41     ` Glen Choo
2023-08-14 22:17     ` Jonathan Tan
2023-08-10 16:36   ` [RFC PATCH v2 6/7] git-std-lib: introduce git standard library Calvin Wan
2023-08-14 22:26     ` Jonathan Tan
2023-08-10 16:36   ` [RFC PATCH v2 7/7] git-std-lib: add test file to call git-std-lib.a functions Calvin Wan
2023-08-14 22:28     ` Jonathan Tan
2023-08-10 22:05   ` [RFC PATCH v2 0/7] Introduce Git Standard Library Glen Choo
2023-08-15  9:20     ` Phillip Wood
2023-08-16 17:17       ` Calvin Wan
2023-08-16 21:19         ` Junio C Hamano
2023-08-15  9:41   ` Phillip Wood
2023-09-08 17:41     ` [PATCH v3 0/6] " Calvin Wan
2023-09-08 17:44       ` [PATCH v3 1/6] hex-ll: split out functionality from hex Calvin Wan
2023-09-08 17:44       ` [PATCH v3 2/6] wrapper: remove dependency to Git-specific internal file Calvin Wan
2023-09-15 17:54         ` Jonathan Tan
2023-09-08 17:44       ` [PATCH v3 3/6] config: correct bad boolean env value error message Calvin Wan
2023-09-08 17:44       ` [PATCH v3 4/6] parse: create new library for parsing strings and env values Calvin Wan
2023-09-08 17:44       ` [PATCH v3 5/6] git-std-lib: introduce git standard library Calvin Wan
2023-09-11 13:22         ` Phillip Wood
2023-09-27 14:14           ` Phillip Wood
2023-09-15 18:39         ` Jonathan Tan
2023-09-26 14:23         ` phillip.wood123
2023-09-08 17:44       ` [PATCH v3 6/6] git-std-lib: add test file to call git-std-lib.a functions Calvin Wan
2023-09-09  5:26         ` Junio C Hamano
2023-09-15 18:43         ` Jonathan Tan
2023-09-15 20:22           ` Junio C Hamano
2023-09-08 20:36       ` [PATCH v3 0/6] Introduce Git Standard Library Junio C Hamano
2023-09-08 21:30         ` Junio C Hamano
2023-09-29 21:20 ` [PATCH v4 0/4] Preliminary patches before git-std-lib Jonathan Tan
2023-09-29 21:20   ` [PATCH v4 1/4] hex-ll: separate out non-hash-algo functions Jonathan Tan
2023-10-21  4:14     ` Linus Arver
2023-09-29 21:20   ` [PATCH v4 2/4] wrapper: reduce scope of remove_or_warn() Jonathan Tan
2023-10-10  9:59     ` phillip.wood123
2023-10-10 16:13       ` Junio C Hamano
2023-10-10 17:38         ` Jonathan Tan
2023-09-29 21:20   ` [PATCH v4 3/4] config: correct bad boolean env value error message Jonathan Tan
2023-09-29 23:03     ` Junio C Hamano
2023-09-29 21:20   ` [PATCH v4 4/4] parse: separate out parsing functions from config.h Jonathan Tan
2023-10-10 10:00     ` phillip.wood123
2023-10-10 17:43       ` Jonathan Tan
2023-10-10 17:58         ` Phillip Wood
2023-10-10 20:57           ` Junio C Hamano
2023-10-10 10:05   ` [PATCH v4 0/4] Preliminary patches before git-std-lib phillip.wood123
2023-10-10 16:21     ` Jonathan Tan
2024-02-22 17:50   ` [PATCH v5 0/3] Introduce Git Standard Library Calvin Wan
2024-02-22 17:50   ` [PATCH v5 1/3] pager: include stdint.h because uintmax_t is used Calvin Wan
2024-02-22 21:43     ` Junio C Hamano
2024-02-26 18:59       ` Kyle Lippincott
2024-02-27  0:20         ` Junio C Hamano
2024-02-27  0:56           ` Kyle Lippincott
2024-02-27  2:45             ` Junio C Hamano
2024-02-27 22:29               ` Kyle Lippincott
2024-02-27 23:25                 ` Junio C Hamano
2024-02-27  8:45             ` Jeff King
2024-02-27  9:05               ` Jeff King
2024-02-27 20:10               ` Kyle Lippincott
2024-02-24  1:33     ` Kyle Lippincott
2024-02-24  7:58       ` Junio C Hamano
2024-02-22 17:50   ` [PATCH v5 2/3] git-std-lib: introduce Git Standard Library Calvin Wan
2024-02-29 11:16     ` Phillip Wood
2024-02-29 17:23       ` Junio C Hamano
2024-02-29 18:27         ` Linus Arver
2024-02-29 18:54           ` Junio C Hamano
2024-02-29 20:03             ` Linus Arver
2024-02-22 17:50   ` [PATCH v5 3/3] test-stdlib: show that git-std-lib is independent Calvin Wan
2024-02-22 22:24     ` Junio C Hamano
2024-03-07 21:13     ` Junio C Hamano
