git@vger.kernel.org mailing list mirror (one of many)
* QGit: Shrink used memory with custom git log format
@ 2007-11-24  8:14 Marco Costalba
  2007-11-27  1:52 ` Shawn O. Pearce
  0 siblings, 1 reply; 7+ messages in thread
From: Marco Costalba @ 2007-11-24  8:14 UTC
  To: Git Mailing List

Hi all,

   I have pushed a patch series to

git://git.kernel.org/pub/scm/qgit/qgit4.git

that changes the format of git log used to read data from a git repository.

Now, instead of --pretty=raw, a custom --pretty=format is given.
This shrinks the loaded data by 30% (17MB less on the Linux tree) and gives a
good speed-up when you are low on memory (especially on big repos).

The next step _would_ be to load the log message body on demand (another 50%
reduction), but this has two drawbacks:

(1) Text search/filter on log message would be broken

(2) Browsing through revisions would be slower, because for each revision an
additional git-rev-list/git-log command would have to be executed to read
the body

The second point is made worse by the fact that it is not possible to
keep a command running and "open" (as, for example, git-diff-tree
--stdin allows) and feed it additional revision sha's when needed. Avoiding
the burden of starting a new process each time a log message has to be
read for a given sha would make the response much quicker, especially
on lesser OS's.

Indeed there is a git-rev-list --stdin option, but it behaves differently
from git-diff-tree --stdin and is not suitable for this.

Marco

* Re: QGit: Shrink used memory with custom git log format
  2007-11-24  8:14 QGit: Shrink used memory with custom git log format Marco Costalba
@ 2007-11-27  1:52 ` Shawn O. Pearce
  2007-11-27 10:48   ` Johannes Schindelin
  0 siblings, 1 reply; 7+ messages in thread
From: Shawn O. Pearce @ 2007-11-27  1:52 UTC
  To: Marco Costalba; +Cc: Git Mailing List

Marco Costalba <mcostalba@gmail.com> wrote:
> Now instead of --pretty=raw a custom made --pretty=format is given,
> this shrinks loaded data of 30% (17MB less on Linux tree) and gives a
> good speed up when you are low on memory (especially on big repos)
> 
> Next step _would_ be to load log message body on demand (another 50%
> reduction) but this has two drawbacks:
> 
> (1) Text search/filter on log message would be broken
> 
> (2) Slower to browse through revisions because for each revision an
> additional git-rev-list /git-log command should be executed to read
> the body
> 
> The second point is worsted by the fact that it is not possible to
> keep a command running and "open" like as example git-diff-tree
> --stdin and feed with additional revision's sha when needed. Avoiding
> the burden to startup a new process each time to read a new log
> message given an sha would let the answer much more quick especially
> on lesser OS's
> 
> Indeed there is a git-rev-list --stdin option but with different
> behaviour from git-diff-tree --stdin and not suitable for this.

There was a proposed patch for git-cat-file that would let you run
it in a --stdin mode; the git-svn folks wanted this to speed up
fetching raw objects from the repository.  That may help as you
could get commit bodies (in raw format - not reencoded format!)
quite efficiently.

Otherwise I think what you really want here is a libgit that you can
link into your process and that can efficiently inflate an object
on demand for you.  Like the work Luiz was working on this past
summer for GSOC.  Lots of downsides to that current tree though...
like die() kills the GUI...

-- 
Shawn.

* Re: QGit: Shrink used memory with custom git log format
  2007-11-27  1:52 ` Shawn O. Pearce
@ 2007-11-27 10:48   ` Johannes Schindelin
  2007-11-27 12:36     ` Marco Costalba
  2007-11-27 19:19     ` Jan Hudec
  0 siblings, 2 replies; 7+ messages in thread
From: Johannes Schindelin @ 2007-11-27 10:48 UTC
  To: Shawn O. Pearce; +Cc: Marco Costalba, Git Mailing List

Hi,

On Mon, 26 Nov 2007, Shawn O. Pearce wrote:

> Marco Costalba <mcostalba@gmail.com> wrote:
> > Now instead of --pretty=raw a custom made --pretty=format is given,
> > this shrinks loaded data of 30% (17MB less on Linux tree) and gives a
> > good speed up when you are low on memory (especially on big repos)
> > 
> > Next step _would_ be to load log message body on demand (another 50%
> > reduction) but this has two drawbacks:
> > 
> > (1) Text search/filter on log message would be broken
> > 
> > (2) Slower to browse through revisions because for each revision an
> > additional git-rev-list /git-log command should be executed to read
> > the body
> > 
> > The second point is worsted by the fact that it is not possible to
> > keep a command running and "open" like as example git-diff-tree
> > --stdin and feed with additional revision's sha when needed. Avoiding
> > the burden to startup a new process each time to read a new log
> > message given an sha would let the answer much more quick especially
> > on lesser OS's
> > 
> > Indeed there is a git-rev-list --stdin option but with different
> > behaviour from git-diff-tree --stdin and not suitable for this.
> 
> There was a proposed patch for git-cat-file that would let you run
> it in a --stdin mode; the git-svn folks wanted this to speed up
> fetching raw objects from the repository.  That may help as you
> could get commit bodies (in raw format - not reencoded format!)
> quite efficiently.
> 
> Otherwise I think what you really want here is a libgit that you can
> link into your process and that can efficiently inflate an object
> on demand for you.  Like the work Luiz was working on this past
> summer for GSOC.  Lots of downsides to that current tree though...
> like die() kills the GUI...

But then, die() calls die_routine, which you can override.  And C++ has 
this funny exception mechanism which just begs to be used here.  The only 
thing you need to add is a way to flush all singletons like the object 
array.

Ciao,
Dscho

* Re: QGit: Shrink used memory with custom git log format
  2007-11-27 10:48   ` Johannes Schindelin
@ 2007-11-27 12:36     ` Marco Costalba
  2007-11-27 19:19     ` Jan Hudec
  1 sibling, 0 replies; 7+ messages in thread
From: Marco Costalba @ 2007-11-27 12:36 UTC
  To: Johannes Schindelin; +Cc: Shawn O. Pearce, Git Mailing List

On Nov 27, 2007 11:48 AM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> > > Indeed there is a git-rev-list --stdin option but with different
> > > behaviour from git-diff-tree --stdin and not suitable for this.
> >
> > There was a proposed patch for git-cat-file that would let you run
> > it in a --stdin mode; the git-svn folks wanted this to speed up
> > fetching raw objects from the repository.  That may help as you
> > could get commit bodies (in raw format - not reencoded format!)
> > quite efficiently.
>

That would be nice.

> > Otherwise I think what you really want here is a libgit that you can
> > link into your process and that can efficiently inflate an object
> > on demand for you.

I would think libgit is overkill for this.

You probably would not use libgit just to add a single feature, but to
completely change the interface with git, because the required work is
heavy on both the git side and the qgit side. For example, you would
probably want to run the libgit-linked part in a separate thread to avoid
GUI soft locks during slow processing; now, because the executed git
command is a different process from qgit, the OS scheduler takes care of
this 'for free'.

Marco

* Re: QGit: Shrink used memory with custom git log format
  2007-11-27 10:48   ` Johannes Schindelin
  2007-11-27 12:36     ` Marco Costalba
@ 2007-11-27 19:19     ` Jan Hudec
  2007-11-28 12:01       ` Johannes Schindelin
  1 sibling, 1 reply; 7+ messages in thread
From: Jan Hudec @ 2007-11-27 19:19 UTC
  To: Johannes Schindelin; +Cc: Shawn O. Pearce, Marco Costalba, Git Mailing List

On Tue, Nov 27, 2007 at 10:48:00 +0000, Johannes Schindelin wrote:
> On Mon, 26 Nov 2007, Shawn O. Pearce wrote:
> > [...]
> > Otherwise I think what you really want here is a libgit that you can
> > link into your process and that can efficiently inflate an object
> > on demand for you.  Like the work Luiz was working on this past
> > summer for GSOC.  Lots of downsides to that current tree though...
> > like die() kills the GUI...
> 
> But then, die() calls die_routine, which you can override.  And C++ has 
> this funny exception mechanism which just begs to be used here.  The only 
> thing you need to add is a way to flush all singletons like the object 
> array.

Unfortunately, exceptions won't really work. Why? Because to use exceptions,
you need exception-safe code. That is, the code needs to free any
allocated resources when it's aborted by an exception. And git code is not
exception-safe. Given the lack of destructors in C, that means registering every
resource allocation in some kind of pool, so they can be freed en masse in
case of failure. Then you can also use longjmp for die (for C they really
behave the same).

-- 
						 Jan 'Bulb' Hudec <bulb@ucw.cz>


* Re: QGit: Shrink used memory with custom git log format
  2007-11-27 19:19     ` Jan Hudec
@ 2007-11-28 12:01       ` Johannes Schindelin
  2007-11-28 15:53         ` jhud7196
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Schindelin @ 2007-11-28 12:01 UTC
  To: Jan Hudec; +Cc: Shawn O. Pearce, Marco Costalba, Git Mailing List

Hi,

On Tue, 27 Nov 2007, Jan Hudec wrote:

> On Tue, Nov 27, 2007 at 10:48:00 +0000, Johannes Schindelin wrote:
> > On Mon, 26 Nov 2007, Shawn O. Pearce wrote:
> > > [...]
> > > Otherwise I think what you really want here is a libgit that you can
> > > link into your process and that can efficiently inflate an object
> > > on demand for you.  Like the work Luiz was working on this past
> > > summer for GSOC.  Lots of downsides to that current tree though...
> > > like die() kills the GUI...
> > 
> > But then, die() calls die_routine, which you can override.  And C++ has 
> > this funny exception mechanism which just begs to be used here.  The only 
> > thing you need to add is a way to flush all singletons like the object 
> > array.
> 
> Unfortunately, exceptions won't really work. Why? Because to use 
> exceptions, you need to have an exception-safe code. That is the code 
> needs to free any allocated resources when it's aborted by exception. 
> And git code is not exceptions safe. Given the lack of destructors in C, 
> it means registering all resource allocation in some kind of pool, so 
> they can be freed en masse in case of failure. Than you can also use 
> longjmp for die (for C they really behave the same).

Sorry, I just assumed that you can read my mind (or alternatively remember 
what I suggested a few months ago, namely to "override" xmalloc(), 
xcalloc(), xrealloc() and xfree() (probably you need to create the 
latter)).

Ciao,
Dscho

* Re: QGit: Shrink used memory with custom git log format
  2007-11-28 12:01       ` Johannes Schindelin
@ 2007-11-28 15:53         ` jhud7196
  0 siblings, 0 replies; 7+ messages in thread
From: jhud7196 @ 2007-11-28 15:53 UTC
  To: Johannes Schindelin; +Cc: Shawn O. Pearce, Marco Costalba, Git Mailing List

> Hi,
>
> On Tue, 27 Nov 2007, Jan Hudec wrote:
>
>> On Tue, Nov 27, 2007 at 10:48:00 +0000, Johannes Schindelin wrote:
>> > On Mon, 26 Nov 2007, Shawn O. Pearce wrote:
>> > > [...]
>> > > Otherwise I think what you really want here is a libgit that you can
>> > > link into your process and that can efficiently inflate an object
>> > > on demand for you.  Like the work Luiz was working on this past
>> > > summer for GSOC.  Lots of downsides to that current tree though...
>> > > like die() kills the GUI...
>> >
>> > But then, die() calls die_routine, which you can override.  And C++ has
>> > this funny exception mechanism which just begs to be used here.  The only
>> > thing you need to add is a way to flush all singletons like the object
>> > array.
>>
>> Unfortunately, exceptions won't really work. Why? Because to use
>> exceptions, you need to have an exception-safe code. That is the code
>> needs to free any allocated resources when it's aborted by exception.
>> And git code is not exceptions safe. Given the lack of destructors in C,
>> it means registering all resource allocation in some kind of pool, so
>> they can be freed en masse in case of failure. Than you can also use
>> longjmp for die (for C they really behave the same).
>
> Sorry, I just assumed that you can read my mind (or alternatively remember
> what I suggested a few months ago, namely to "override" xmalloc(),
> xcalloc(), xrealloc() and xfree() (probably you need to create the
> latter)).

That sounds like the easiest (but not necessarily easy) direction towards
the goal. The tracking could be thread-local or global (I don't think git
is currently reentrant anyway). File handles would also have to be taken
care of, and everything checked for direct use of malloc, calloc, strdup
and other libc functions.

Then die could longjmp out to a specified buffer and could be safely
overridden to throw an exception for C++ apps.

--
                                         - Jan Hudec <bulb@ucw.cz>
