git@vger.kernel.org mailing list mirror (one of many)
* Reducing git size by building libgit.so
@ 2019-06-11 19:52 Elmar Pruesse
  2019-06-11 23:48 ` brian m. carlson
  2019-06-12  9:41 ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 12+ messages in thread
From: Elmar Pruesse @ 2019-06-11 19:52 UTC
  To: git@vger.kernel.org

Hi!

The total compiled size of libexec/git-core is currently somewhere 
around 30 MB. This is largely due to a number of binaries linking 
statically against libgit.a. For some folks, every byte counts. I 
meddled with the Makefile briefly to make it build and use a libgit.so 
instead, which dropped package size down to 5MB.

Are there, beyond the ~20 ms in extra startup time and the slightly 
bigger hassle with DSO locations, reasons for the choice to link statically?

best,

Elmar


* Re: Reducing git size by building libgit.so
  2019-06-11 19:52 Reducing git size by building libgit.so Elmar Pruesse
@ 2019-06-11 23:48 ` brian m. carlson
  2019-06-12  9:29   ` Duy Nguyen
  2019-06-12 13:57   ` Paul Smith
  2019-06-12  9:41 ` Ævar Arnfjörð Bjarmason
  1 sibling, 2 replies; 12+ messages in thread
From: brian m. carlson @ 2019-06-11 23:48 UTC
  To: Elmar Pruesse; +Cc: git@vger.kernel.org

On 2019-06-11 at 19:52:18, Elmar Pruesse wrote:
> Hi!
> 
> The total compiled size of libexec/git-core is currently somewhere
> around 30 MB. This is largely due to a number of binaries linking
> statically against libgit.a. For some folks, every byte counts. I
> meddled with the Makefile briefly to make it build and use a libgit.so
> instead, which dropped package size down to 5MB.
> 
> Are there, beyond the ~20 ms in extra startup time and the slightly
> bigger hassle with DSO locations, reasons for the choice to link statically?

I think the reason is that libgit is not API stable and we definitely
don't want people linking against it. Before libgit2 existed, projects
like cgit built their own libgit and it required pinning to a specific
version of Git.

Also, some people install Git into their home directories, and a shared
library means that they'll have to use LD_LIBRARY_PATH (or equivalent)
to run Git.

Finally, we have support for a runtime relocatable Git which can be run
out of any path and still automatically find its dependent binaries.
That won't work with a shared library.
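
A minimal sketch of how a relocatable program can find files relative to
its own location at run time. It assumes Linux (/proc/self/exe), and
sibling_path() is a hypothetical helper, not Git's actual RUNTIME_PREFIX
code. The lookup runs inside the program, after main() has started; a
shared libgit.so would have to be found by the dynamic loader before that
point, which is why the same trick cannot locate the library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/*
 * Hypothetical helper (Linux-only): write "<dir of this executable>/<rel>"
 * into `out`. Returns 0 on success, -1 on failure or truncation.
 */
static int sibling_path(const char *rel, char *out, size_t outsz)
{
    char exe[PATH_MAX];
    char *slash;
    ssize_t n;
    int written;

    assert(rel != NULL && out != NULL);
    n = readlink("/proc/self/exe", exe, sizeof(exe) - 1);
    if (n < 0)
        return -1;
    exe[n] = '\0';                 /* readlink() does not NUL-terminate */

    slash = strrchr(exe, '/');
    if (slash == NULL)
        return -1;
    *slash = '\0';                 /* drop the file name, keep the directory */

    written = snprintf(out, outsz, "%s/%s", exe, rel);
    if (written < 0 || (size_t)written >= outsz)
        return -1;                 /* encoding error or output truncated */
    return 0;
}
```

For example, sibling_path("../libexec/git-core", buf, sizeof(buf)) would
point at the helper directory wherever the tree was unpacked.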

So if we did allow for building a shared library, it would have to be an
option that defaulted to off, I think.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

* Re: Reducing git size by building libgit.so
  2019-06-11 23:48 ` brian m. carlson
@ 2019-06-12  9:29   ` Duy Nguyen
  2019-06-12 13:57   ` Paul Smith
  1 sibling, 0 replies; 12+ messages in thread
From: Duy Nguyen @ 2019-06-12  9:29 UTC
  To: brian m. carlson, Elmar Pruesse, git@vger.kernel.org

On Wed, Jun 12, 2019 at 2:11 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2019-06-11 at 19:52:18, Elmar Pruesse wrote:
> > Hi!
> >
> > The total compiled size of libexec/git-core is currently somewhere
> > around 30 MB. This is largely due to a number of binaries linking
> > statically against libgit.a. For some folks, every byte counts. I
> > meddled with the Makefile briefly to make it build and use a libgit.so
> > instead, which dropped package size down to 5MB.
> >
> > Are there, beyond the ~20 ms in extra startup time and the slightly
> > bigger hassle with DSO locations, reasons for the choice to link statically?
>
> I think the reason is that libgit is not API stable and we definitely
> don't want people linking against it.

Having .so files does not mean it's a stable API, though. If we never
install header files, there's no way for outside people to use it
(people who dlopen() it anyway deserve whatever they get). I do agree
about some of the hassles with .so files, though.

If installation size is a problem, I think we can still shrink it down a
bit. Some non-builtin commands (fast-import, sh-i18n--subst...) could
be merged back into the "git" binary. Some others for the remote side (or
background daemons) could also be bundled together, unless there are
security concerns.

We could also have a look at how functions are distributed across the
object files in libgit.a. I'm surprised git-credential-store is 5.6 MB
on my machine. We probably pull in more than needed somewhere, due to
dependencies between .o files.

> Before libgit2 existed, projects
> like cgit built their own libgit and it required pinning to a specific
> version of Git.
>
> Also, some people install Git into their home directories, and a shared
> library means that they'll have to use LD_LIBRARY_PATH (or equivalent)
> to run Git.
>
> Finally, we have support for a runtime relocatable Git which can be run
> out of any path and still automatically find its dependent binaries.
> That won't work with a shared library.
>
> So if we did allow for building a shared library, it would have to be an
> option that defaulted to off, I think.

-- 
Duy

* Re: Reducing git size by building libgit.so
  2019-06-11 19:52 Reducing git size by building libgit.so Elmar Pruesse
  2019-06-11 23:48 ` brian m. carlson
@ 2019-06-12  9:41 ` Ævar Arnfjörð Bjarmason
  2019-06-12  9:46   ` Duy Nguyen
  2019-06-12 10:25   ` SZEDER Gábor
  1 sibling, 2 replies; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-06-12  9:41 UTC
  To: Elmar Pruesse; +Cc: git@vger.kernel.org


On Tue, Jun 11 2019, Elmar Pruesse wrote:

> Hi!
>
> The total compiled size of libexec/git-core is currently somewhere
> around 30 MB. This is largely due to a number of binaries linking
> statically against libgit.a. For some folks, every byte counts. I
> meddled with the Makefile briefly to make it build and use a libgit.so
> instead, which dropped package size down to 5MB.
>
> Are there, beyond the ~20 ms in extra startup time and the slightly
> bigger hassle with DSO locations, reasons for the choice to link statically?

brian mentioned API stability. I'd be fine with having a *.so shipped
with git. We'd document that the API is not stable, and of course it's
GPL, so you can only link other GPL programs to it; but if people are
still fine using it and very closely following git development as we
break their API/ABI, why not?

Have you looked at INSTALL_SYMLINKS & friends? I.e. maybe you're
measuring size without accounting for most of the binaries being
hardlinks to the same thing.
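
To check the hardlink effect, sizes have to be counted per inode rather
than per file name. A rough sketch, assuming a POSIX system: unique_bytes()
is a hypothetical name, it scans only one directory level, and it sums
apparent sizes (st_size), not the allocated blocks that du(1) reports.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

struct file_id { dev_t dev; ino_t ino; };

/*
 * Hypothetical helper: sum st_size over the regular files directly inside
 * `dir`, counting each (st_dev, st_ino) pair once so that hard links to the
 * same file are not double-counted. Symlinks are not followed (lstat), and
 * the scan is not recursive. Returns -1 on error.
 */
static long long unique_bytes(const char *dir)
{
    DIR *d;
    struct dirent *e;
    struct file_id *seen = NULL;
    size_t nseen = 0, cap = 0;
    long long total = 0;

    assert(dir != NULL);
    d = opendir(dir);
    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        char path[4096];
        struct stat st;
        size_t i;
        int n = snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);

        if (n < 0 || (size_t)n >= sizeof(path))
            continue;                      /* path too long: skip it */
        if (lstat(path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;                      /* not a regular file */
        for (i = 0; i < nseen; i++)
            if (seen[i].dev == st.st_dev && seen[i].ino == st.st_ino)
                break;
        if (i < nseen)
            continue;                      /* another name for a counted file */
        if (nseen == cap) {
            size_t ncap = cap ? 2 * cap : 64;
            struct file_id *p = realloc(seen, ncap * sizeof(*p));

            if (p == NULL) {
                total = -1;
                break;
            }
            seen = p;
            cap = ncap;
        }
        seen[nseen].dev = st.st_dev;
        seen[nseen].ino = st.st_ino;
        nseen++;
        total += (long long)st.st_size;
    }
    closedir(d);
    free(seen);
    return total;
}
```

Symlinks (as installed with INSTALL_SYMLINKS) are skipped because lstat()
reports them as links, which matches their negligible size on disk.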

We still have some stand-alone binaries, but IIRC there's under 5 of
those with INSTALL_SYMLINKS. We could probably also just make those
built-ins to get the rest of the size benefits.

I.e. we'd just have one git binary, everything else symlinking to that,
and we'd route to the right program by inspecting argv, which we mostly
do already.
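
Routing by argv is the classic multi-call binary approach (as used by
BusyBox). A toy sketch: the command table and the cmd_* functions are
hypothetical placeholders, and this is not the actual dispatch code in
Git's git.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*cmd_fn)(int argc, const char **argv);

/* Hypothetical stand-ins for built-ins; distinct return codes make the routing visible. */
static int cmd_hello(int argc, const char **argv)
{
    (void)argc;
    (void)argv;
    return 10;
}

static int cmd_bye(int argc, const char **argv)
{
    (void)argc;
    (void)argv;
    return 20;
}

static const struct {
    const char *name;
    cmd_fn fn;
} commands[] = {
    { "hello", cmd_hello },
    { "bye", cmd_bye },
};

/* "/usr/libexec/git-core/git-hello" -> "hello"; plain "git" stays "git". */
static const char *command_name(const char *argv0)
{
    const char *base = strrchr(argv0, '/');

    base = base ? base + 1 : argv0;
    if (strncmp(base, "git-", 4) == 0)
        base += 4;
    return base;
}

/* Returns the command's exit code, or -1 if no known command was named. */
static int dispatch(int argc, const char **argv)
{
    const char *name;
    size_t i;

    assert(argc > 0 && argv != NULL);
    name = command_name(argv[0]);
    if (strcmp(name, "git") == 0) {    /* invoked as "git <cmd> ..." */
        if (argc < 2)
            return -1;
        argc--;
        argv++;
        name = argv[0];
    }
    for (i = 0; i < sizeof(commands) / sizeof(commands[0]); i++)
        if (strcmp(commands[i].name, name) == 0)
            return commands[i].fn(argc, argv);
    return -1;
}
```

Installing git-hello as a symlink (or hard link) to the single binary is
then enough for it to behave as that command.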

* Re: Reducing git size by building libgit.so
  2019-06-12  9:41 ` Ævar Arnfjörð Bjarmason
@ 2019-06-12  9:46   ` Duy Nguyen
  2019-06-12 10:25   ` SZEDER Gábor
  1 sibling, 0 replies; 12+ messages in thread
From: Duy Nguyen @ 2019-06-12  9:46 UTC
  To: Ævar Arnfjörð Bjarmason; +Cc: Elmar Pruesse, git@vger.kernel.org

On Wed, Jun 12, 2019 at 4:42 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> I.e. we'd just have one git binary, everything else symlinking to that,
> and we'd route to the right program by inspecting argv, which we mostly
> do already.

If I remember correctly, libcurl.so startup time was the reason the
HTTP code was split out of the "git" binary, so we can't just merge
everything into one (*). But yeah, merging some back is not a bad idea.

(*) But maybe the "git" binary has gotten much slower overall, or
libcurl.so so much faster, that it no longer matters. That problem was
about 10 years ago.
-- 
Duy

* Re: Reducing git size by building libgit.so
  2019-06-12  9:41 ` Ævar Arnfjörð Bjarmason
  2019-06-12  9:46   ` Duy Nguyen
@ 2019-06-12 10:25   ` SZEDER Gábor
  1 sibling, 0 replies; 12+ messages in thread
From: SZEDER Gábor @ 2019-06-12 10:25 UTC
  To: Ævar Arnfjörð Bjarmason; +Cc: Elmar Pruesse, git@vger.kernel.org

On Wed, Jun 12, 2019 at 11:41:10AM +0200, Ævar Arnfjörð Bjarmason wrote:
> On Tue, Jun 11 2019, Elmar Pruesse wrote:
> > The total compiled size of libexec/git-core is currently somewhere
> > around 30 MB. This is largely due to a number of binaries linking
> > statically against libgit.a. For some folks, every byte counts.

I wonder whether those folks actually need such non-builtin git
binaries like 'git-shell' or 'git-daemon' in the first place.


> We still have some stand-alone binaries, but IIRC there's under 5 of
> those with INSTALL_SYMLINKS. We could probably also just make those
> built-ins to get the rest of the size benefits.
> 
> I.e. we'd just have one git binary, everything else symlinking to that,
> and we'd route to the right program by inspecting argv, which we mostly
> do already.

Let's not forget that commands like 'git-daemon' and 'git-shell' are
better left as stand-alone programs.


* Re: Reducing git size by building libgit.so
  2019-06-11 23:48 ` brian m. carlson
  2019-06-12  9:29   ` Duy Nguyen
@ 2019-06-12 13:57   ` Paul Smith
  2019-06-12 23:31     ` brian m. carlson
  2019-06-13  7:51     ` Johannes Schindelin
  1 sibling, 2 replies; 12+ messages in thread
From: Paul Smith @ 2019-06-12 13:57 UTC
  To: brian m. carlson, Elmar Pruesse; +Cc: git@vger.kernel.org

On Tue, 2019-06-11 at 23:48 +0000, brian m. carlson wrote:
> Also, some people install Git into their home directories, and a
> shared library means that they'll have to use LD_LIBRARY_PATH (or
> equivalent) to run Git.

I don't have strong feelings about .so's, although obviously less disk
space used is always a good thing, everything else being equal.

However, the above concern isn't actually an issue.  You can install
the .so in a known location relative to the binaries, then link the
binaries with an RPATH setting using $ORIGIN (or the equivalent on
macOS, which does exist, but I forget its name).  On Windows, DLLs are
installed in the same directory as the binary, typically.

Allowing relocatable binaries with .so dependencies without requiring
LD_LIBRARY_PATH settings is a solved problem, to the best of my
understanding.
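
To make the $ORIGIN mechanism concrete: with GNU-style toolchains one
would link with something like -Wl,-rpath,'$ORIGIN/../lib' (quoted so the
shell leaves $ORIGIN alone), and at load time the dynamic loader replaces
$ORIGIN with the directory containing the program or shared object being
loaded. The macOS counterparts are @executable_path, @loader_path and
@rpath. The toy expand_origin() below (a hypothetical name) only mimics
that substitution; it ignores $LIB/$PLATFORM and the extra restrictions
loaders apply to setuid programs.

```c
#include <assert.h>
#include <string.h>

/*
 * Toy version of the loader's substitution: copy `rpath` into `out`,
 * replacing "$ORIGIN" and "${ORIGIN}" with `origin` (the directory that
 * contains the object being loaded). Returns 0 on success, -1 if `out`
 * is too small.
 */
static int expand_origin(const char *rpath, const char *origin,
                         char *out, size_t outsz)
{
    size_t len = 0;
    const size_t olen = strlen(origin);

    assert(rpath != NULL && origin != NULL && out != NULL && outsz > 0);
    while (*rpath != '\0') {
        const char *add = rpath;   /* default: copy one literal character */
        size_t addlen = 1, skip = 1;

        if (strncmp(rpath, "${ORIGIN}", 9) == 0) {
            add = origin;
            addlen = olen;
            skip = 9;
        } else if (strncmp(rpath, "$ORIGIN", 7) == 0 &&
                   (rpath[7] == '\0' || rpath[7] == '/' || rpath[7] == ':')) {
            /* unbraced form: only at a '/', ':' or end boundary (simplified) */
            add = origin;
            addlen = olen;
            skip = 7;
        }
        if (len + addlen >= outsz)
            return -1;             /* no room for this piece plus the NUL */
        memcpy(out + len, add, addlen);
        len += addlen;
        rpath += skip;
    }
    out[len] = '\0';
    return 0;
}
```

With bin/ and lib/ side by side, "$ORIGIN/../lib" keeps resolving to the
right place wherever the whole tree is moved.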


One thing to think about is that runtime loading a .so can take some
time if it has lots of public symbols.  If someone really wanted to do
this, the ideal thing would be to make all symbols hidden except those
needed by the binary front-ends, and to make those front-ends very small
shells with only a very limited number of entry points into the .so.

Maybe for git this doesn't matter but for some projects I've worked on
the time to dlopen() a library was a blocking issue that the above
procedure solved nicely.
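
A sketch of that approach with GCC or Clang, assuming the library is
compiled with -fvisibility=hidden so that every symbol defaults to hidden
and only those explicitly marked "default" are exported. LIBGIT_PUBLIC,
libgit_main() and run_builtin() are hypothetical names, not anything Git
provides.

```c
#include <assert.h>
#include <string.h>

/* Export only what is explicitly marked; build the .so with -fvisibility=hidden. */
#if defined(__GNUC__)
#define LIBGIT_PUBLIC __attribute__((visibility("default")))
#else
#define LIBGIT_PUBLIC
#endif

/* Internal: not marked, so it stays out of the .so's dynamic symbol table. */
int run_builtin(const char *name, int argc, const char **argv)
{
    (void)argc;
    (void)argv;
    return strcmp(name, "version") == 0 ? 0 : 1;   /* placeholder logic */
}

/* The single exported entry point that the thin front-end binaries call. */
LIBGIT_PUBLIC int libgit_main(int argc, const char **argv)
{
    assert(argc > 1 && argv != NULL);
    return run_builtin(argv[1], argc - 1, argv + 1);
}
```

Each front-end would then be little more than a main() that forwards its
arguments to libgit_main(), and `nm -D` on the resulting .so should list
only the marked symbols.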


* Re: Reducing git size by building libgit.so
  2019-06-12 13:57   ` Paul Smith
@ 2019-06-12 23:31     ` brian m. carlson
  2019-06-13 19:19       ` Johannes Sixt
  2019-06-13  7:51     ` Johannes Schindelin
  1 sibling, 1 reply; 12+ messages in thread
From: brian m. carlson @ 2019-06-12 23:31 UTC
  To: Paul Smith; +Cc: Elmar Pruesse, git@vger.kernel.org

On 2019-06-12 at 13:57:43, Paul Smith wrote:
> On Tue, 2019-06-11 at 23:48 +0000, brian m. carlson wrote:
> > Also, some people install Git into their home directories, and a
> > shared library means that they'll have to use LD_LIBRARY_PATH (or
> > equivalent) to run Git.
> 
> I don't have strong feelings about .so's, although obviously less disk
> space used is always a good thing, everything else being equal.
> 
> However, the above concern isn't actually an issue.  You can install
> the .so in a known location relative to the binaries, then link the
> binaries with an RPATH setting using $ORIGIN (or the equivalent on
> macOS, which does exist, but I forget its name).  On Windows, DLLs are
> installed in the same directory as the binary, typically.
> 
> Allowing relocatable binaries with .so dependencies without requiring
> LD_LIBRARY_PATH settings is a solved problem, to the best of my
> understanding.

This is possible to do, but it's not especially portable.  People use
various C toolchains to compile our code, which may or may not have easy
access to linker flags.  The proper syntax also varies depending on
whether you're using ELF, Mach-O, PE[0], or another object format.  And
Debian tries hard to avoid RPATH settings[1], so we'd need to be sure to
have an option not to set it.

None of these are intractable problems, but there's not simply an easy
solution that we can magically set that will work everywhere.  If we
were using autoconf and friends exclusively, this would be easier, but
we're not.  So someone is welcome to attack these problems with a set of
patches, but I expect it to be fairly involved to get all the corner
cases right if we want to make it the default.

[0] AFAIUI, Windows doesn't have RPATH-like functionality, and from what
I've read, the same-directory behavior may be going away due to security
concerns.  I don't use Windows, so any solution there is fine as long as
Dscho is happy.
[1] https://wiki.debian.org/RpathIssue
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

* Re: Reducing git size by building libgit.so
  2019-06-12 13:57   ` Paul Smith
  2019-06-12 23:31     ` brian m. carlson
@ 2019-06-13  7:51     ` Johannes Schindelin
  2019-06-13 17:28       ` Paul Smith
  1 sibling, 1 reply; 12+ messages in thread
From: Johannes Schindelin @ 2019-06-13  7:51 UTC
  To: Paul Smith; +Cc: brian m. carlson, Elmar Pruesse, git@vger.kernel.org

Hi Paul,

On Wed, 12 Jun 2019, Paul Smith wrote:

> On Tue, 2019-06-11 at 23:48 +0000, brian m. carlson wrote:
> > Also, some people install Git into their home directories, and a
> > shared library means that they'll have to use LD_LIBRARY_PATH (or
> > equivalent) to run Git.
>
> I don't have strong feelings about .so's, although obviously less disk
> space used is always a good thing, everything else being equal.
>
> However, the above concern isn't actually an issue.  You can install
> the .so in a known location relative to the binaries, then link the
> binaries with an RPATH setting using $ORIGIN (or the equivalent on
> macOS, which does exist, but I forget its name).

Hassles aside, you mentioned Linux and macOS. What about literally *all*
the other platforms we support? Like AIX, NonStop, HP/UX, etc?

Sure, you can hunt down all of them, and maybe even come up with a
workaround for platforms that do not have a $ORIGIN equivalent. You can
pile workaround on workaround all you want.

In the end, it seems to be a clear indicator that this is a complicator's
glove, and the only reasonably simple way forward would be to either leave
things as-are, or have an *opt-in* to build a shared libgit.

But.

And this is a really big but.

You can document _all you want_ that libgit.so is not supposed to be
used as a library, and that its API is not an API, or at least not a
stable one; if you have _some_ experience with software development, you
will know that it won't matter one bit. It _will_ be used, people _will_
complain, and it will turn out to simply not have been a good idea in the
first place.

> On Windows, DLLs are installed in the same directory as the binary,
> typically.
>
> Allowing relocatable binaries with .so dependencies without requiring
> LD_LIBRARY_PATH settings is a solved problem, to the best of my
> understanding.

You're probably right, as long as you restrict your view to mainstream
Operating Systems.

To put things into perspective, you might be interested in reading up on
https://github.com/git/git/commit/0f50c8e32c87 (Makefile: remove the
NO_R_TO_GCC_LINKER flag, 2019-05-17) and related commit history.

Sure, you could still argue that it is a "solved" problem, where "solved"
means something different from "desirable".

> One thing to think about is that runtime loading a .so can take some
> time if it has lots of public symbols.  If someone really wanted to do
> this, the ideal thing would be to make all symbols hidden except those
> needed by the binary front-ends, and to make those front-ends very small
> shells with only a very limited number of entry points into the .so.

That would fall squarely into the "pile workaround on workaround"
category I mentioned above.

> Maybe for git this doesn't matter but for some projects I've worked on
> the time to dlopen() a library was a blocking issue that the above
> procedure solved nicely.

Sure, sometimes you cannot control whether it is an ill-designed `.so` you
need to consume.

As far as Git is concerned, this is not the case. At least when you look
at libgit.

When you look at libcurl, it is a different matter. But then, we do not
need to play RPATH games there: we expect it to be in the system's
preferred location.

BTW Duy hinted at problems with libcurl that made us split
`git-remote-https` out of the main `git` executable. The full story is here:

1. Linus complained about some "crazy" shared library loading behavior
   five months before Christmas 2009:

   https://public-inbox.org/git/alpine.LFD.2.01.0907241349390.3960@localhost.localdomain/

2. Daniel Barkalow was working on some "foreign VCS" support and thought
   that HTTPS/HTTP support could be handled via the same route, to avoid
   having to load libcurl for every Git operation no matter what:

   https://public-inbox.org/git/alpine.LNX.2.00.0907242242310.2147@iabervon.org/

3. Daniel then sent a patch series about two weeks later:

   https://public-inbox.org/git/alpine.LNX.2.00.0908050052390.2147@iabervon.org/

4. Those patches were accepted via cd03eebbfdae (Merge branch
   'db/vcs-helper', 2009-09-13)
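
Roughly, the outcome is that HTTP(S) support lives in separate
git-remote-http/git-remote-https helper programs, so only commands that
actually talk HTTP pay for loading libcurl. A simplified sketch of picking
a helper from a URL: remote_helper_for() is hypothetical, and Git's real
transport selection handles more cases than this.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: decide whether `url` needs an external
 * "git-remote-<transport>" program. Returns 0 and writes the helper name
 * into `out` if one is needed, 1 if the transport is handled inside the
 * main binary, and -1 if `out` is too small.
 */
static int remote_helper_for(const char *url, char *out, size_t outsz)
{
    static const char *const native[] = { "ssh", "git", "file" };
    const char *colon;
    size_t tlen, i;
    int n;

    assert(url != NULL && out != NULL);
    colon = strchr(url, ':');
    if (colon == NULL || colon == url)
        return 1;                           /* plain local path */
    tlen = (size_t)(colon - url);
    if (colon[1] != ':') {                  /* not "<transport>::<address>" */
        if (colon[1] != '/' || colon[2] != '/')
            return 1;                       /* scp-like "host:path" */
        for (i = 0; i < sizeof(native) / sizeof(native[0]); i++)
            if (strlen(native[i]) == tlen && strncmp(url, native[i], tlen) == 0)
                return 1;                   /* transport built into git */
    }
    n = snprintf(out, outsz, "git-remote-%.*s", (int)tlen, url);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```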

So yes, I think that a patch or patch series to turn libgit.a into
libgit.so would need to be crafted *very* carefully, and _at the very least_
offer a sound performance analysis in the commit messages.

It would obviously need to be proven beyond doubt that the startup time
does not deteriorate noticeably, otherwise the patch (series) would likely
be rejected.
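
As a starting point for such numbers, a crude harness that spawns a
command repeatedly and reports the mean wall-clock time per run, which
includes process creation and dynamic loading. mean_startup_ms() is a
hypothetical helper; a real analysis would rather use Git's own t/perf
suite or a dedicated benchmarking tool, comparing a static and a shared
build on the same machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

extern char **environ;

/*
 * Hypothetical helper: run argv (argv[0] is looked up in PATH) `runs` times
 * and return the mean wall-clock time per run in milliseconds, or -1.0 if a
 * run could not be started or did not exit with status 0.
 */
static double mean_startup_ms(char *const argv[], int runs)
{
    double total_ms = 0.0;
    int i;

    assert(argv != NULL && argv[0] != NULL && runs > 0);
    for (i = 0; i < runs; i++) {
        struct timespec t0, t1;
        pid_t pid;
        int status;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
            return -1.0;
        if (waitpid(pid, &status, 0) < 0)
            return -1.0;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            return -1.0;
        total_ms += (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
                    (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
    }
    return total_ms / runs;
}
```

Timing something like { "git", "version", NULL } against each build would
give a first, rough comparison.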

Ciao,
Johannes

* Re: Reducing git size by building libgit.so
  2019-06-13  7:51     ` Johannes Schindelin
@ 2019-06-13 17:28       ` Paul Smith
  2019-06-13 18:23         ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Paul Smith @ 2019-06-13 17:28 UTC
  To: git@vger.kernel.org

On Thu, 2019-06-13 at 09:51 +0200, Johannes Schindelin wrote:
> Hassles aside, you mentioned Linux and macOS. What about literally
> *all* the other platforms we support? Like AIX, NonStop, HP/UX, etc?

I assumed that we were discussing providing an _option_ of building
with shared libraries, rather than removing support for static
libraries and only supporting shared libraries.  The former is the
typical model in portable projects.

So, the answer to most of the (important) issues you and Brian raise
is, "if it doesn't work, can't be made to work, is too slow, or is
annoying for ANY other reason, then don't do it".

Regarding things like publish-ability of the API, I don't know what
else to say.  It's FOSS, after all: anyone can do whatever they want
(with respect to building and using the code) regardless of the desires
of the development team.  All you can do is make clear that the intent
is that the API is not stable, and if they don't listen and their stuff
breaks, well, as the saying goes, they get to keep both halves.  Not
adding any header files to the installation rules and packages is also
helpful :).

There's a certain amount of cold, hard reality that every FOSS project,
regardless of how friendly and welcoming they aspire to be, simply
can't avoid while still making progress (and staying sane).


I certainly don't want to minimize the amount of work involved here,
nor do I want to in any way volunteer myself to undertake any of it: as
I said, I don't have strong feelings about it.

I'm just saying, there's no technical reason it can't be done while
maintaining the same features (such as relocatability) as the static
library installs, at least on the major platforms.

Cheers!


* Re: Reducing git size by building libgit.so
  2019-06-13 17:28       ` Paul Smith
@ 2019-06-13 18:23         ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2019-06-13 18:23 UTC
  To: Paul Smith; +Cc: git@vger.kernel.org

Paul Smith <paul@mad-scientist.net> writes:

> I assumed that we were discussing providing an _option_ of building
> with shared libraries, rather than removing support for static
> libraries and only supporting shared libraries.  The former is the
> typical model in portable projects.
> ...
> So, the answer to most of the (important) issues you and Brian raise
> is, "if it doesn't work, can't be made to work, is too slow, or is
> annoying for ANY other reason, then don't do it".
>
> Regarding things like publish-ability of the API, I don't know what
> else to say.  It's FOSS, after all: anyone can do whatever they want
> (with respect to building and using the code) regardless of the desires
> of the development team.  All you can do is make clear that the intent
> is that the API is not stable, and if they don't listen and their stuff
> breaks, well, as the saying goes, they get to keep both halves.  Not
> adding any header files to the installation rules and packages is also
> helpful :).
>
> There's a certain amount of cold, hard reality that every FOSS project,
> regardless of how friendly and welcoming they aspire to be, simply
> can't avoid while still making progress (and staying sane).
>
>
> I certainly don't want to minimize the amount of work involved here,
> nor do I want to in any way volunteer myself to undertake any of it: as
> I said, I don't have strong feelings about it.
>
> I'm just saying, there's no technical reason it can't be done while
> maintaining the same features (such as relocatability) as the static
> library installs, at least on the major platforms.
>
> Cheers!

* Re: Reducing git size by building libgit.so
  2019-06-12 23:31     ` brian m. carlson
@ 2019-06-13 19:19       ` Johannes Sixt
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Sixt @ 2019-06-13 19:19 UTC
  To: brian m. carlson; +Cc: Paul Smith, Elmar Pruesse, git@vger.kernel.org

On 13.06.19 at 01:31, brian m. carlson wrote:
> [0] AFAIUI, Windows doesn't have RPATH-like functionality, and from what
> I've read, the same-directory behavior may be going away due to security
> concerns.  I don't use Windows, so any solution there is fine as long as
> Dscho is happy.

The solution is NOT to use DLLs on Windows. They are touchy and slow.

-- Hannes

end of thread

Thread overview: 12+ messages
2019-06-11 19:52 Reducing git size by building libgit.so Elmar Pruesse
2019-06-11 23:48 ` brian m. carlson
2019-06-12  9:29   ` Duy Nguyen
2019-06-12 13:57   ` Paul Smith
2019-06-12 23:31     ` brian m. carlson
2019-06-13 19:19       ` Johannes Sixt
2019-06-13  7:51     ` Johannes Schindelin
2019-06-13 17:28       ` Paul Smith
2019-06-13 18:23         ` Junio C Hamano
2019-06-12  9:41 ` Ævar Arnfjörð Bjarmason
2019-06-12  9:46   ` Duy Nguyen
2019-06-12 10:25   ` SZEDER Gábor
