git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Josef Weidendorfer <Josef.Weidendorfer@gmx.de>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Martin Waitz <tali@admingilde.org>,
	sf <sf-gmane@stephan-feder.de>,
	git@vger.kernel.org
Subject: Thoughts about memory requirements in traversals [Was: Re: [RFC] Submodules in GIT]
Date: Sun, 3 Dec 2006 03:07:26 +0100	[thread overview]
Message-ID: <200612030307.26429.Josef.Weidendorfer@gmx.de> (raw)
In-Reply-To: <Pine.LNX.4.64.0612021252380.3476@woody.osdl.org>

On Saturday 02 December 2006 22:22, Linus Torvalds wrote:
> So operations like "git-rev-list --objects" (or, these days, more commonly 
> anything that just does the equivalent of that internally using the 
> library interfaces - ie "git pack-objects" and friends) VERY FUNDAMENTALLY 
> have to hold on to the object flags for the whole lifetime of the whole 
> operation.
>
> [...]
> 
> So this really isn't a memory management issue. You could somewhat work 
> around it by adding a "caching layer" on top of git, and allow that 
> caching layer to modify their cache of old objects (so that they can 
> contain back-pointers), but for 99% of all users that would actually make 
> performance MUCH WORSE, and it would also be a serious problem for 
> coherency issues (one of the things that immutable objects cause is that 
> there are basically never any race conditions, while a "caching layer" 
> like this would have some serious issues about serialization).

Thinking about this...
You have to make very sure to always update the caching layer containing
the backlinks on every addition of a further object. You can do this
because you always reached this new object by some other object, which
exactly is the backpointer.

Now let us suppose we are able to do this.
What does this give us?

Take a look at object traversal:
We have to store the flag "already visited" for objects we could reach
again in the traversal. But with the backlinks, we can see that most
of the objects can only be reached via one path, and therefore, there
is no need to store the flag, as it never will be queried in the
further traversal.
(Similar for objects with two paths: When you have visited the object
two times, you can throw away the flag, as it is not queried any more).

Regarding the caching layer and object traversal, it would have been
enough to only store "is this object reachable via more than 1 path?".
For this, the "cache" could be the set of objects reachable with
more than one path.
And such a set stored in a file should be quite managable, and be
quite small, relative to the size of the object database.

In fact, this "cache" can be created with a usual object traversal
(which has the original memory requirement), but as long as we do
not add objects to the database, further traversals would only need
a fraction of memory.

When only adding a small number of objects, it should be easy to
update the cache; while with big actions like fetching/pulling,
we simply should remove the file with the backlink information.

> problem. In fact, O(n) is pretty damn good, especially since the constant 
> is pretty small (basically 28 bytes per object - and 20 of those bytes 
> are the SHA1 that you simply cannot avoid).

Again only some thoughts...
Pack files are fully self-contained object stores, yes?
So in the scope of a single pack file, the offset of this object is enough
as object identification.
If we could make sure that in any given algorithm touching objects, like
commit traversal, we always have the offset available when we need to do
an object lookup, then, it should be enough to store object flags only
indexed by the offset of this object in the pack.
The translation SHA1 -> offset can be done with the pack index.
As you usually have multiple packs, a (pack number / offset) tuple should
be enough as object ID.

Thinking even one step further:
Would it make sense to define an encoding format for the content of
commit and tree objects inside of packs, where the SHA1 is replaced by the
offset of the object in this pack?
As exactly the SHA1 is the least compressable thing, this could promise
quite a benefit.
AFAIK, we currently only use these offsets for referencing objects in
delta chains.

More about the original topic of this thread (and off-topic to the
new subject):

> But it does mean that supermodules really should NOT be so seamless that 
> doing a "git clone" on a supermodule does one _large_ clone. Because it's 
> simply going to be better to:
> 
>  - when you clone the supermodule, track the commits you need on all 
>    submodules (this _may_ be a reason in itself for the "link" object, 
>    just so that you can traverse the supermodule object dependencies and 
>    know what subobject you are looking at even _without_ having to look at 
>    the path you got there from)
> 
>  - clone submodules one-by-one, using the list of objects you gathered.

Without submodule identities, we would have to clone path-by-path, as
we can not distinguish different submodules apart from there location.


  reply	other threads:[~2006-12-03  2:07 UTC|newest]

Thread overview: 252+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-11-20 21:51 [RFC] Submodules in GIT Martin Waitz
2006-11-20 22:16 ` Jakub Narebski
2006-11-20 22:28   ` Martin Waitz
2006-11-20 22:43   ` Junio C Hamano
2006-11-20 23:02     ` Jakub Narebski
2006-11-20 23:52       ` Martin Waitz
2006-11-21  1:31       ` Sam Vilain
2006-11-20 23:05     ` Linus Torvalds
2006-11-20 23:25       ` J. Bruce Fields
2006-11-20 23:33         ` Martin Waitz
2006-11-21 18:01           ` J. Bruce Fields
2006-11-21 19:32             ` Martin Waitz
2006-11-20 23:29       ` Martin Waitz
2006-11-21  0:10       ` Junio C Hamano
2006-11-21  0:42         ` Jakub Narebski
2006-11-21  6:21           ` Martin Waitz
2006-11-21 10:04             ` Jakub Narebski
2006-11-21 11:49               ` Martin Waitz
2006-11-21  6:27         ` Martin Waitz
2006-11-21  7:36           ` Junio C Hamano
2006-11-21  7:55             ` Martin Waitz
2006-11-21 22:31       ` Yann Dirson
2006-11-21 22:51         ` Linus Torvalds
2006-11-21 22:59           ` Linus Torvalds
2006-11-21 23:54           ` Yann Dirson
2006-11-22  3:40             ` Shawn Pearce
2006-11-23 23:23               ` Yann Dirson
2006-11-25  6:53                 ` Shawn Pearce
2006-11-25 11:12                   ` Yann Dirson
2006-11-25 18:57                     ` Linus Torvalds
2006-11-25 19:19                       ` Steven Grimm
2006-11-25 19:30                         ` Linus Torvalds
2006-11-25 23:49                           ` Yann Dirson
2006-11-26  1:14                             ` Sven Verdoolaege
2006-11-26  1:32                               ` Yann Dirson
2006-11-26  3:39                             ` Linus Torvalds
2006-11-26  8:05                               ` Daniel Barkalow
2006-11-28  9:36                                 ` Andreas Ericsson
2006-11-28 10:29                                   ` Andy Parkins
2006-11-28 10:50                                     ` Jakub Narebski
2006-11-28 13:35                                       ` Andy Parkins
2006-11-28 15:44                                         ` Shawn Pearce
2006-11-28 16:29                                           ` Andy Parkins
2006-11-28 16:36                                             ` Shawn Pearce
2006-11-28 17:38                                             ` Jon Loeliger
2006-11-29 16:15                                             ` Martin Waitz
2006-11-30 11:57                                             ` sf
     [not found]                                               ` <200611301255.41733.andyparkins@gmail.com>
2006-11-30 14:00                                                 ` Stephan Feder
2006-11-30 14:49                                                   ` Andy Parkins
2006-11-30 15:20                                                     ` Sven Verdoolaege
2006-11-30 15:30                                                       ` Andy Parkins
2006-11-30 15:50                                                         ` Andreas Ericsson
2006-11-30 16:08                                                           ` Andy Parkins
2006-11-30 16:33                                                         ` Sven Verdoolaege
2006-12-01  0:01                                                           ` Andy Parkins
2006-12-01  0:11                                                             ` Jakub Narebski
2006-12-01  9:32                                                             ` Sven Verdoolaege
2006-12-01 10:19                                                               ` Andy Parkins
2006-11-30 17:19                                                         ` Martin Waitz
2006-11-30 16:05                                                     ` sf
2006-11-30 16:12                                                       ` sf
2006-12-01  9:19                                                       ` Andy Parkins
2006-12-01  9:57                                                         ` Martin Waitz
2006-12-01 10:29                                                           ` Andy Parkins
2006-12-01 10:42                                                             ` Sven Verdoolaege
2006-12-01 11:02                                                               ` Andy Parkins
2006-12-01 11:10                                                                 ` Sven Verdoolaege
2006-12-01 11:45                                                                   ` sf
2006-12-01 12:12                                                                   ` Andy Parkins
2006-12-01 12:28                                                                     ` Martin Waitz
2006-12-01 14:11                                                                       ` Andy Parkins
2006-12-01 15:12                                                                         ` Martin Waitz
2006-12-01 11:46                                                                 ` Martin Waitz
2006-12-01 12:16                                                                   ` Andy Parkins
2006-12-01 12:34                                                                     ` Martin Waitz
2006-12-01 13:59                                                                       ` Andy Parkins
2006-12-01 14:07                                                                         ` Martin Waitz
2006-12-01 11:31                                                             ` Martin Waitz
2006-12-01 12:20                                                               ` Andy Parkins
2006-12-01 12:37                                                                 ` Martin Waitz
2006-12-02 15:16                                                                   ` Jakub Narebski
2006-11-28 19:58                                         ` Steven Grimm
2006-11-28 21:02                                           ` Shawn Pearce
2006-11-29 16:03                                         ` Martin Waitz
2006-11-29 20:00                                           ` Andy Parkins
2006-11-30 12:16                                             ` Andreas Ericsson
2006-11-30 12:40                                               ` Andy Parkins
2006-11-30 17:06                                             ` Martin Waitz
2006-11-30 18:57                                               ` Andreas Ericsson
2006-12-01  8:49                                                 ` Andy Parkins
2006-12-01  9:33                                                   ` Andreas Ericsson
2006-12-01 10:38                                                     ` Andy Parkins
2006-12-01 12:03                                                 ` sf
2006-12-01 12:11                                                   ` Martin Waitz
2006-12-01 13:21                                                     ` sf
2006-12-01 13:43                                                       ` Martin Waitz
2006-12-01 14:23                                                         ` Stephan Feder
2006-12-01 15:07                                                           ` Martin Waitz
2006-12-01 16:04                                                             ` Stephan Feder
2006-12-01 16:15                                                               ` Martin Waitz
2006-12-05  9:01                                                 ` Uwe Kleine-Koenig
2006-12-05 10:33                                                   ` Andreas Ericsson
2006-12-05 11:11                                                     ` Jakub Narebski
2006-12-05 15:02                                                     ` Uwe Kleine-Koenig
2006-12-05 15:30                                                       ` Andreas Ericsson
2006-12-05 16:00                                                   ` Sven Verdoolaege
2006-12-01  9:02                                               ` Andy Parkins
2006-12-01 11:00                                                 ` Martin Waitz
2006-12-01 12:09                                                   ` sf
2006-12-01 12:12                                                     ` Martin Waitz
2006-12-01 13:05                                                       ` sf
2006-12-01 13:35                                                         ` Martin Waitz
2006-12-01 13:43                                                           ` Andreas Ericsson
2006-12-01 13:46                                                             ` Martin Waitz
2006-12-01 14:52                                                               ` Andreas Ericsson
2006-12-01 15:00                                                                 ` Martin Waitz
2006-12-01 16:38                                                                   ` Andreas Ericsson
2006-12-01 16:49                                                                     ` Linus Torvalds
2006-12-01 17:08                                                                       ` sf
2006-12-01 18:06                                                                         ` Andreas Ericsson
2006-12-01 20:13                                                                         ` Linus Torvalds
2006-12-01 20:30                                                                           ` Martin Waitz
2006-12-01 23:23                                                                             ` Alan Chandler
2006-12-01 22:06                                                                           ` Josef Weidendorfer
2006-12-01 22:12                                                                             ` Martin Waitz
2006-12-01 22:26                                                                               ` Josef Weidendorfer
2006-12-01 22:40                                                                                 ` Martin Waitz
2006-12-01 23:17                                                                               ` Josef Weidendorfer
2006-12-02 20:24                                                                                 ` Martin Waitz
2006-12-03  0:55                                                                                   ` Josef Weidendorfer
2006-12-03  6:29                                                                                     ` Martin Waitz
2006-12-01 22:26                                                                             ` Linus Torvalds
2006-12-01 22:41                                                                               ` sf
2006-12-01 23:03                                                                                 ` Josef Weidendorfer
2006-12-01 23:09                                                                                 ` Linus Torvalds
2006-12-01 23:36                                                                                   ` Josef Weidendorfer
2006-12-02  0:12                                                                                     ` Linus Torvalds
2006-12-02  9:22                                                                                       ` Andy Parkins
     [not found]                                                                                         ` <200612021255.59972.Josef.Weidendorfer@gmx.de>
2006-12-03  9:42                                                                                           ` Andy Parkins
2006-12-02 11:32                                                                                       ` Josef Weidendorfer
2006-12-02 19:52                                                                                         ` Linus Torvalds
2006-12-02 20:21                                                                                           ` Martin Waitz
2006-12-02 20:46                                                                                             ` Linus Torvalds
2006-12-02 20:58                                                                                               ` Martin Waitz
2006-12-03  1:11                                                                                                 ` Josef Weidendorfer
2006-12-02 20:18                                                                                       ` Martin Waitz
2006-12-02 20:44                                                                                         ` Linus Torvalds
2006-12-02 21:06                                                                                           ` Martin Waitz
2006-12-02 21:29                                                                                             ` Linus Torvalds
2006-12-02 21:22                                                                                           ` Linus Torvalds
2006-12-03  2:07                                                                                             ` Josef Weidendorfer [this message]
2006-12-03  2:25                                                                                               ` Thoughts about memory requirements in traversals [Was: Re: [RFC] Submodules in GIT] Linus Torvalds
2006-12-03  2:46                                                                                               ` Shawn Pearce
2006-12-03  3:21                                                                                                 ` Josef Weidendorfer
2006-12-03 11:10                                                                                                   ` Jakub Narebski
2006-12-03 11:47                                                                                                     ` Josef Weidendorfer
2006-12-03 20:46                                                                                           ` [RFC] Submodules in GIT Martin Waitz
2006-12-03 22:16                                                                                       ` Sven Verdoolaege
2006-12-03 22:32                                                                                         ` Linus Torvalds
2006-12-03 22:49                                                                                           ` Jakub Narebski
2006-12-04 11:12                                                                                           ` Josef Weidendorfer
2006-12-01 23:49                                                                                   ` sf
2006-12-02 18:57                                                                                     ` Torgil Svensson
2006-12-02 19:41                                                                                       ` Linus Torvalds
2006-12-03  9:19                                                                                         ` Torgil Svensson
2006-12-03 17:54                                                                                           ` Linus Torvalds
2006-12-04 20:26                                                                                             ` Torgil Svensson
2006-12-04 20:41                                                                                               ` Linus Torvalds
2006-12-04 21:36                                                                                                 ` Torgil Svensson
2006-12-05 10:42                                                                                                   ` Andreas Ericsson
2006-12-05 11:09                                                                                                     ` Jakub Narebski
2006-12-05 10:38                                                                                                 ` Andreas Ericsson
2006-12-05 11:01                                                                                                   ` Jakub Narebski
2006-12-03 19:33                                                                                         ` Andy Parkins
2006-12-05  2:33                                                                                         ` Daniel Barkalow
2006-12-05 22:07                                                                                           ` sf
2006-12-09 21:34                                                                                         ` R. Steve McKown
2006-12-10 11:47                                                                                           ` Torgil Svensson
2006-12-14 21:27                                                                                             ` Torgil Svensson
2006-12-14 23:07                                                                                               ` Josef Weidendorfer
2006-12-15 17:43                                                                                                 ` Torgil Svensson
2006-12-15 21:42                                                                                                   ` Josef Weidendorfer
2006-12-15 23:43                                                                                                     ` Torgil Svensson
2006-12-16  1:13                                                                                                       ` Torgil Svensson
2006-12-16  1:20                                                                                                         ` Torgil Svensson
2006-12-16  1:34                                                                                                           ` Jakub Narebski
2006-12-16  8:40                                                                                                             ` Torgil Svensson
2006-12-16  9:57                                                                                                               ` Jakub Narebski
2006-12-16 10:25                                                                                                                 ` Junio C Hamano
2006-12-16 15:05                                                                                                                 ` Torgil Svensson
2006-12-16 15:38                                                                                                                   ` Torgil Svensson
2006-12-16 16:32                                                                                                                   ` Jakub Narebski
2006-12-17  0:21                                                                                                                     ` Torgil Svensson
2006-12-16  1:49                                                                                                         ` Linus Torvalds
2006-12-16  2:12                                                                                                           ` Linus Torvalds
2006-12-16  8:50                                                                                                             ` Torgil Svensson
2006-12-02 20:12                                                                                   ` Martin Waitz
2006-12-01 22:55                                                                               ` Josef Weidendorfer
2006-12-01 23:07                                                                                 ` Martin Waitz
2006-12-01 23:30                                                                                 ` Linus Torvalds
2006-12-02  0:14                                                                                   ` Josef Weidendorfer
2006-12-02  0:33                                                                                     ` Linus Torvalds
2006-12-02  9:27                                                                                       ` Andy Parkins
2006-12-04 18:56                                                                                       ` Michael K. Edwards
2006-12-05  1:31                                                                                         ` Sam Vilain
2006-12-01 22:35                                                                           ` sf
2006-12-08 18:29                                                                           ` Jon Loeliger
2006-12-08 18:45                                                                             ` Sven Verdoolaege
2006-12-12  8:32                                                                             ` Andreas Ericsson
2006-12-01 17:14                                                                       ` Martin Waitz
2006-12-01 16:57                                                                     ` Martin Waitz
2006-12-01 18:08                                                                       ` Andreas Ericsson
2006-12-01 18:51                                                                         ` Martin Waitz
2006-12-01 13:51                                                           ` Stephan Feder
2006-12-01 14:58                                                             ` Martin Waitz
2006-12-01 15:47                                                               ` Stephan Feder
2006-12-01 16:54                                                                 ` Martin Waitz
2006-12-01 17:33                                                                   ` Stephan Feder
2006-12-01 18:48                                                                     ` Martin Waitz
2006-12-01 23:34                                                                       ` sf
2006-12-02 19:46                                                                         ` Martin Waitz
2006-12-01 19:17                                                                     ` Andy Parkins
2006-12-01 19:38                                                                       ` Martin Waitz
2006-12-01 21:04                                                                         ` Andy Parkins
2006-12-01 21:37                                                                           ` Martin Waitz
2006-12-01 21:54                                                                             ` Andy Parkins
2006-12-01 22:08                                                                               ` Martin Waitz
2006-12-02 10:04                                                                                 ` Andy Parkins
2006-12-02 13:50                                                                                   ` Josef Weidendorfer
2006-12-02 20:43                                                                                     ` Martin Waitz
2006-12-03  1:02                                                                                       ` Josef Weidendorfer
2006-12-02 20:40                                                                                   ` Martin Waitz
2006-12-02 13:14                                                                           ` Jakub Narebski
2006-12-02 13:08                                                                     ` Jakub Narebski
2006-12-02 12:48                                                   ` Jakub Narebski
2006-11-28 17:28                                   ` Daniel Barkalow
2006-11-28 18:08                                     ` Sven Verdoolaege
2006-11-28 18:37                                       ` Daniel Barkalow
2006-11-28 19:06                                         ` Sven Verdoolaege
2006-11-28 20:41                                           ` Daniel Barkalow
2006-11-28 21:10                                             ` Shawn Pearce
2006-11-28 21:32                                               ` Daniel Barkalow
2006-11-28 21:53                                                 ` Linus Torvalds
2006-11-20 22:49 ` Jakub Narebski
2006-11-21  7:21 ` Shawn Pearce
2006-11-22  5:29 ` Petr Baudis
2006-12-02 20:16 ` Jakub Narebski
2006-12-03  1:24   ` Robin Rosenberg
2006-12-03  1:31     ` Jakub Narebski
2006-12-03 12:22       ` Robin Rosenberg
2006-12-03 12:31         ` Jakub Narebski
2006-12-03 11:00     ` Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200612030307.26429.Josef.Weidendorfer@gmx.de \
    --to=josef.weidendorfer@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=sf-gmane@stephan-feder.de \
    --cc=tali@admingilde.org \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).