* [PATCH 0/17] Sliding window mmap for packfiles.
@ 2006-12-23 7:33 Shawn O. Pearce
2006-12-23 9:37 ` Junio C Hamano
2006-12-24 8:56 ` Francis Moreau
0 siblings, 2 replies; 9+ messages in thread
From: Shawn O. Pearce @ 2006-12-23 7:33 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
This 17 patch series implements my much discussed, but never produced
(until now), 'mmap sliding window' for packfile data access.
The key idea behind this topic is to mmap large non-contiguous
segments of a packfile rather than the entire file. If available
virtual memory is getting low we unmap the least recently used
packfile segment to free up address space for the currently needed
segment.
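A minimal sketch of the idea above, not the code from the patches: keep a small, fixed number of mapped windows and, when a new one is needed, unmap the least recently used one. All names (`struct win`, `struct win_cache`, `win_use`, `NWIN`) are hypothetical, and the cap of two windows is chosen only to make eviction easy to see.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>

#define NWIN 2 /* tiny cap so that eviction is easy to observe */

/* All names here are hypothetical, not taken from the patches. */
struct win {
	unsigned char *base;     /* mapped address, NULL if the slot is free */
	off_t offset;            /* file offset where this window starts */
	size_t len;              /* number of bytes mapped */
	unsigned long last_used; /* logical clock value of the last access */
};

struct win_cache {
	int fd;
	off_t file_size;
	size_t win_size;         /* must be a multiple of the page size */
	unsigned long clock;
	unsigned long maps;      /* mmap() calls made so far */
	struct win slot[NWIN];
};

/* Return a pointer to the byte at pos, or NULL if mmap() fails. */
static const unsigned char *win_use(struct win_cache *c, off_t pos)
{
	off_t start = pos - pos % (off_t)c->win_size;
	struct win *victim = &c->slot[0];
	size_t len = c->win_size;
	int i;

	assert(pos >= 0 && pos < c->file_size);

	/* Already mapped? Refresh its timestamp and reuse it. */
	for (i = 0; i < NWIN; i++) {
		struct win *w = &c->slot[i];
		if (w->base && w->offset == start) {
			w->last_used = ++c->clock;
			return w->base + (pos - start);
		}
	}

	/* Otherwise take a free slot, or evict the least recently used one. */
	for (i = 1; i < NWIN; i++) {
		struct win *w = &c->slot[i];
		if (victim->base && (!w->base || w->last_used < victim->last_used))
			victim = w;
	}
	if (victim->base)
		munmap(victim->base, victim->len);

	if ((off_t)len > c->file_size - start)
		len = (size_t)(c->file_size - start); /* last window may be short */
	victim->base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, c->fd, start);
	if (victim->base == MAP_FAILED) {
		perror("mmap");
		victim->base = NULL;
		return NULL;
	}
	victim->offset = start;
	victim->len = len;
	victim->last_used = ++c->clock;
	c->maps++;
	return victim->base + (pos - start);
}
```

A caller zero-initializes a `struct win_cache`, fills in `fd`, `file_size` and a page-aligned `win_size`, and then asks `win_use()` for each offset it needs. The returned pointer is only valid until a later call evicts that window.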
This series also permits accessing packfiles up to 4 GiB in size,
even on systems which permit only 2 GiB of virtual memory within
a single process (e.g. Windows and some older UNIXes). Of course
4 GiB is still the upper limit on packfile size due to the current
format of the index file.
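To illustrate where a 4 GiB ceiling comes from, here is a small sketch that assumes each object offset is recorded in the index as an unsigned 32-bit value stored most significant byte first. That layout is an assumption made for this example, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Can this pack offset be recorded in a 4-byte index field? */
static int offset_is_indexable(uint64_t pack_offset)
{
	return pack_offset <= UINT32_MAX;
}

/* Store an offset in 4 bytes, most significant byte first. */
static void put_offset32(unsigned char out[4], uint64_t pack_offset)
{
	assert(offset_is_indexable(pack_offset));
	out[0] = (unsigned char)(pack_offset >> 24);
	out[1] = (unsigned char)(pack_offset >> 16);
	out[2] = (unsigned char)(pack_offset >> 8);
	out[3] = (unsigned char)pack_offset;
}

/* Read the offset back from its 4-byte form. */
static uint64_t get_offset32(const unsigned char in[4])
{
	return ((uint64_t)in[0] << 24) | ((uint64_t)in[1] << 16) |
	       ((uint64_t)in[2] << 8) | (uint64_t)in[3];
}
```

The largest offset such a field can hold is 2^32 - 1, so no object can start at or beyond the 4 GiB mark, whatever the address space of the machine reading the pack.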
This series is 'pu' ready, but it may be too early to bring it
directly into 'next'.
Patch summary
-------------
1 - Replace unpack_entry_gently with unpack_entry.
2 - Introduce new config option for mmap limit.
3 - Refactor packed_git to prepare for sliding mmap windows.
4 - Use off_t for index and pack file lengths.
5 - Create read_or_die utility routine.
6 - Refactor how we open pack files to prepare for multiple windows.
Most of the above changes are incremental refactorings to help
get the code in a state where we can start to implement and make
use of the struct pack_window concept this series introduces.
7 - Replace use_packed_git with window cursors.
8 - Loop over pack_windows when inflating/accessing data.
9 - Document why header parsing won't exceed a window.
10 - Unmap individual windows rather than entire files.
11 - Fully activate the sliding window pack access.
These commits actually implement the core of the mmap sliding
window implementation and the necessary garbage collection to
support unmapping the least recently used window.
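The "loop over windows" step can be sketched as follows. This is an illustration only: it simulates the pack with an in-memory buffer instead of mmap(), and `struct fake_pack`, `window_at` and `read_range` are hypothetical names. The point is that a reader can only trust the bytes up to the end of the current window, so a range that crosses a boundary has to be consumed in several steps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a packfile; a real reader would mmap() each window. */
struct fake_pack {
	const unsigned char *data;
	size_t size;
	size_t win_size;
	unsigned long lookups; /* how many windows were requested */
};

/*
 * Return a pointer to the byte at pos and report in *avail how many
 * bytes may be read before the current window (or the file) ends.
 */
static const unsigned char *window_at(struct fake_pack *p, size_t pos,
				      size_t *avail)
{
	size_t win_end = (pos / p->win_size + 1) * p->win_size;

	assert(pos < p->size);
	if (win_end > p->size)
		win_end = p->size;
	*avail = win_end - pos;
	p->lookups++;
	return p->data + pos;
}

/* Copy up to n bytes starting at pos; return the number copied. */
static size_t read_range(struct fake_pack *p, size_t pos,
			 unsigned char *dst, size_t n)
{
	size_t done = 0;

	while (done < n && pos < p->size) {
		size_t avail;
		const unsigned char *src = window_at(p, pos, &avail);

		if (avail > n - done)
			avail = n - done;
		memcpy(dst + done, src, avail);
		done += avail;
		pos += avail;
	}
	return done;
}
```

With a window size of 4, reading 7 bytes from offset 2 takes three window lookups: 2 bytes from the first window, 4 from the second and 1 from the third.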
12 - Load core configuration in git-verify-pack.
13 - Ensure core.packedGitWindowSize cannot be less than 2 pages.
14 - Improve error message when packfile mmap fails.
15 - Support unmapping windows on 'temporary' packfiles.
The above commits are bug fixes on top of the initial
commits. I did not fold these back into the earlier commits
as I felt the bug fix commit messages provided useful details.
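The two-page minimum from patch 13 could be enforced along these lines. This is a sketch under stated assumptions, not the patch itself: `clamp_window_size` is a hypothetical helper, the page size is passed in by the caller (on POSIX it could come from `sysconf(_SC_PAGESIZE)`), and rounding down to a whole number of pages is an extra assumption made here because mmap() offsets must be page-aligned.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return a usable window size: at least two pages, and a whole
 * number of pages (the rounding is an assumption of this sketch).
 */
static size_t clamp_window_size(size_t requested, size_t page)
{
	assert(page > 0);
	if (requested < 2 * page)
		return 2 * page;
	return requested - requested % page;
}
```

With 4096-byte pages, a configured value of 10000 becomes 8192, and a value of 12289 becomes 12288.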
16 - Create pack_report() as a debugging aid.
17 - Test suite for sliding window mmap implementation.
These provide debugging and testing tools.
* Re: [PATCH 0/17] Sliding window mmap for packfiles.
2006-12-23 7:33 [PATCH 0/17] Sliding window mmap for packfiles Shawn O. Pearce
@ 2006-12-23 9:37 ` Junio C Hamano
2006-12-23 9:42 ` Shawn Pearce
2006-12-24 8:56 ` Francis Moreau
1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2006-12-23 9:37 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: git
I have to say I am very much impressed (I've taken a look at
only the first half up to #11, though). How much has this been
used in real projects?
A couple of comments:
[3/17]
I think losing "p->next = NULL" does not matter with the callers
we have right now, but somehow this part makes me feel uneasy.
[5/17]
I think it makes sense to exit(0) for the existing write_or_die
upon EPIPE because that indicates we are the upstream of the
pipe and the reading process has exited (i.e. user said 'q' to
less while we still have more to say).
I suspect the symmetry would not hold for read_or_die; when we
are reading, EPIPE is not any different from any other errors
(except for EAGAIN or EINTR which we already take care of in
xread()) and the net effect is that we could not read what we
wanted to.
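The behaviour described above can be sketched like this: the low-level read retries on EINTR and EAGAIN, and everything else, EPIPE included, is simply a failed read. The names `retry_read`, `read_fully` and `read_or_die_sketch` are hypothetical and this is not the code from patch 5. Treating a short read as fatal and exiting with status 1 are assumptions of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Retry read() when it is interrupted or would block. */
static ssize_t retry_read(int fd, void *buf, size_t len)
{
	ssize_t n;

	do {
		n = read(fd, buf, len);
	} while (n < 0 && (errno == EINTR || errno == EAGAIN));
	return n;
}

/* Read exactly len bytes; return 0 on success, -1 on error or early EOF. */
static int read_fully(int fd, void *buf, size_t len)
{
	char *p = buf;

	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = retry_read(fd, p, len);

		if (n <= 0)
			return -1; /* no special case for EPIPE or any other errno */
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Die with a non-zero status if the data could not be read. */
static void read_or_die_sketch(int fd, void *buf, size_t len)
{
	if (read_fully(fd, buf, len) < 0) {
		fprintf(stderr, "fatal: read error or unexpected end of file\n");
		exit(1);
	}
}
```

Unlike the write side, there is no exit(0) path here: a reader that cannot get its data has nothing useful left to do, so every failure ends the same way.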
* Re: [PATCH 0/17] Sliding window mmap for packfiles.
2006-12-23 9:37 ` Junio C Hamano
@ 2006-12-23 9:42 ` Shawn Pearce
0 siblings, 0 replies; 9+ messages in thread
From: Shawn Pearce @ 2006-12-23 9:42 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Junio C Hamano <junkio@cox.net> wrote:
> I have to say I am very much impressed (I've taken a look at
> only the first half up to #11, though). How much has this been
> used in real projects?
None yet. I just finished remerging it onto current git.git code.
> A couple of comments:
>
> [3/17]
>
> I think losing "p->next = NULL" does not matter with the callers
> we have right now, but somehow this part makes me feel uneasy.
That's a bug in that patch. I removed it by mistake. Good catch.
> [5/17]
>
> I think it makes sense to exit(0) for the existing write_or_die
> upon EPIPE because that indicates we are the upstream of the
> pipe and the reading process has exited (i.e. user said 'q' to
> less while we still have more to say).
>
> I suspect the symmetry would not hold for read_or_die; when we
> are reading, EPIPE is not any different from any other errors
> (except for EAGAIN or EINTR which we already take care of in
> xread()) and the net effect is that we could not read what we
> wanted to.
Oh, good point.
--
Shawn.
* Re: [PATCH 0/17] Sliding window mmap for packfiles.
2006-12-23 7:33 [PATCH 0/17] Sliding window mmap for packfiles Shawn O. Pearce
2006-12-23 9:37 ` Junio C Hamano
@ 2006-12-24 8:56 ` Francis Moreau
2006-12-24 9:05 ` Shawn Pearce
2006-12-24 9:29 ` Linus Torvalds
1 sibling, 2 replies; 9+ messages in thread
From: Francis Moreau @ 2006-12-24 8:56 UTC (permalink / raw)
To: Shawn O. Pearce; +Cc: Junio C Hamano, git
Hi,
On 12/23/06, Shawn O. Pearce <spearce@spearce.org> wrote:
> This 17 patch series implements my much discussed, but never produced
[snip]
>
> This series also permits accessing packfiles up to 4 GiB in size,
> even on systems which permit only 2 GiB of virtual memory within
> a single process (e.g. Windows and some older UNIXes). Of course
Just out of curiosity, do you mean that there are some OSes running on
32-bit machines which allow 4 GiB of virtual memory within a
single process? If so, could you give an example of such an OS?
thanks
--
Francis
* Re: [PATCH 0/17] Sliding window mmap for packfiles.
2006-12-24 8:56 ` Francis Moreau
@ 2006-12-24 9:05 ` Shawn Pearce
2006-12-24 9:36 ` Francis Moreau
2006-12-24 9:29 ` Linus Torvalds
1 sibling, 1 reply; 9+ messages in thread
From: Shawn Pearce @ 2006-12-24 9:05 UTC (permalink / raw)
To: Francis Moreau; +Cc: Junio C Hamano, git
Francis Moreau <francis.moro@gmail.com> wrote:
> On 12/23/06, Shawn O. Pearce <spearce@spearce.org> wrote:
> >This 17 patch series implements my much discussed, but never produced
> [snip]
> >
> >This series also permits accessing packfiles up to 4 GiB in size,
> >even on systems which permit only 2 GiB of virtual memory within
> >a single process (e.g. Windows and some older UNIXes). Of course
>
> Just out of curiosity, do you mean that there are some OSes running on
> 32-bit machines which allow 4 GiB of virtual memory within a
> single process? If so, could you give an example of such an OS?
No. What I meant was the Git packfile/index format currently
supports up to 4 GiB of data in a single packfile. But *no*
OS using 32 bit virtual address space would permit us to access
that packfile prior to this series as we would have *no* memory
left for a stack, let alone for parsing commits, etc., as *all*
of the address space would have been dedicated to the packfile.
However with this series even a 32 bit OS which only permits
processes to have at most 2 GiB of address space (2 GiB split
between kernel space and userspace) can access packfiles up
to 4 GiB in size. That seems to be the split most OSes wind
up using, if they didn't push it out to 3.2 GiB like Linux
and Solaris have done.
This series is a good change because Git can now really make
full use of the space allowed by a single packfile. :-)
--
Shawn.
* Re: [PATCH 0/17] Sliding window mmap for packfiles.
2006-12-24 9:05 ` Shawn Pearce
@ 2006-12-24 9:36 ` Francis Moreau
2006-12-24 9:49 ` Shawn Pearce
0 siblings, 1 reply; 9+ messages in thread
From: Francis Moreau @ 2006-12-24 9:36 UTC (permalink / raw)
To: Shawn Pearce; +Cc: Junio C Hamano, git
On 12/24/06, Shawn Pearce <spearce@spearce.org> wrote:
> Francis Moreau <francis.moro@gmail.com> wrote:
> > On 12/23/06, Shawn O. Pearce <spearce@spearce.org> wrote:
> > >This 17 patch series implements my much discussed, but never produced
> > [snip]
> > >
> > >This series also permits accessing packfiles up to 4 GiB in size,
> > >even on systems which permit only 2 GiB of virtual memory within
> > >a single process (e.g. Windows and some older UNIXes). Of course
> >
> > Just out of curiosity, do you mean that there are some OSes running on
> > 32-bit machines which allow 4 GiB of virtual memory within a
> > single process? If so, could you give an example of such an OS?
>
> No. What I meant was the Git packfile/index format currently
> supports up to 4 GiB of data in a single packfile. But *no*
> OS using 32 bit virtual address space would permit us to access
> that packfile prior to this series as we would have *no* memory
> left for a stack, let alone for parsing commits, etc., as *all*
> of the address space would have been dedicated to the packfile.
>
ok.
> However with this series even a 32 bit OS which only permits
> processes to have at most 2 GiB of address space (2 GiB split
> between kernel space and userspace) can access packfiles up
> to 4 GiB in size. That seems to be the split most OSes wind
> up using, if they didn't push it out to 3.2 GiB like Linux
> and Solaris have done.
>
Is it still needed on a 64-bit OS?
If not, can the overhead (if it is significant) implied by
your rework be avoided in such cases?
> This series is a good change because Git can now really make
> full use of the space allowed by a single packfile. :-)
>
Yes I agree with you.
--
Francis
* Re: [PATCH 0/17] Sliding window mmap for packfiles.
2006-12-24 9:36 ` Francis Moreau
@ 2006-12-24 9:49 ` Shawn Pearce
2007-01-02 15:28 ` Andy Whitcroft
0 siblings, 1 reply; 9+ messages in thread
From: Shawn Pearce @ 2006-12-24 9:49 UTC (permalink / raw)
To: Francis Moreau; +Cc: Junio C Hamano, git
Francis Moreau <francis.moro@gmail.com> wrote:
> On 12/24/06, Shawn Pearce <spearce@spearce.org> wrote:
> >However with this series even a 32 bit OS which only permits
> >processes to have at most 2 GiB of address space (2 GiB split
> >between kernel space and userspace) can access packfiles up
> >to 4 GiB in size. That seems to be the split most OSes wind
> >up using, if they didn't push it out to 3.2 GiB like Linux
> >and Solaris have done.
> >
>
> Is it still needed on a 64-bit OS?
Not really. Almost any reasonable 64 bit OS which is also running
a Git compiled for 64 bit userspace would be able to mmap multiple
4 GiB packfiles without this series.
> If not, can the overhead (if it is significant) implied by
> your rework be avoided in such cases?
The overhead is rather low. I did try hard to make it only a handful
of machine instructions worth of additional work, and even then I
tried to amortize those over relatively large blocks of data to
reduce the impact. But yes, there is an overhead over the current
shipping version of Git.
However at least some of the overhead can be avoided by setting
core.packedGitWindowSize and core.packedGitLimit to higher values.
This will allow the implementation to mmap() larger windows of the
packfiles and retain a greater number of windows in memory at once.
If core.packedGitWindowSize is larger than your largest packfile
then most of the code will just "shut off" and won't get in the way.
Its default is 32 MiB (see Documentation/config.txt).
I think the additional overhead added by this series is negligible
and worth the more graceful degradation it allows when virtual
address space becomes limited.
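The effect of the two settings can be worked through with a little arithmetic. In this sketch the helper names are hypothetical; only the 32 MiB window default comes from the text above, and any limit value passed in is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Number of windows needed to cover a whole packfile. */
static uint64_t windows_to_cover(uint64_t pack_size, uint64_t win_size)
{
	assert(win_size > 0);
	return (pack_size + win_size - 1) / win_size;
}

/* Number of full-size windows that fit under the overall mmap limit. */
static uint64_t windows_resident(uint64_t limit, uint64_t win_size)
{
	assert(win_size > 0);
	return limit / win_size;
}
```

A 4 GiB pack read through 32 MiB windows needs 128 of them, of which only `limit / window size` can be mapped at once. A 100 MiB pack with a 1 GiB window needs exactly one, which is the case where the window never has to slide.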
--
Shawn.
* Re: [PATCH 0/17] Sliding window mmap for packfiles.
2006-12-24 9:49 ` Shawn Pearce
@ 2007-01-02 15:28 ` Andy Whitcroft
0 siblings, 0 replies; 9+ messages in thread
From: Andy Whitcroft @ 2007-01-02 15:28 UTC (permalink / raw)
To: Shawn Pearce; +Cc: Francis Moreau, Junio C Hamano, git
Shawn Pearce wrote:
> Francis Moreau <francis.moro@gmail.com> wrote:
>> On 12/24/06, Shawn Pearce <spearce@spearce.org> wrote:
>>> However with this series even a 32 bit OS which only permits
>>> processes to have at most 2 GiB of address space (2 GiB split
>>> between kernel space and userspace) can access packfiles up
>>> to 4 GiB in size. That seems to be the split most OSes wind
>>> up using, if they didn't push it out to 3.2 GiB like Linux
>>> and Solaris have done.
>>>
>> Is it still needed on a 64-bit OS?
>
> Not really. Almost any reasonable 64 bit OS which is also running
> a Git compiled for 64 bit userspace would be able to mmap multiple
> 4 GiB packfiles without this series.
>
>> If not, can the overhead (if it is significant) implied by
>> your rework be avoided in such cases?
>
> The overhead is rather low. I did try hard to make it only a handful
> of machine instructions worth of additional work, and even then I
> tried to amortize those over relatively large blocks of data to
> reduce the impact. But yes, there is an overhead over the current
> shipping version of Git.
>
> However at least some of the overhead can be avoided by setting
> core.packedGitWindowSize and core.packedGitLimit to higher values.
> This will allow the implementation to mmap() larger windows of the
> packfiles and retain a greater number of windows in memory at once.
>
> If core.packedGitWindowSize is larger than your largest packfile
> then most of the code will just "shut off" and won't get in the way.
> Its default is 32 MiB (see Documentation/config.txt).
>
> I think the additional overhead added by this series is negligible
> and worth the more graceful degradation it allows when virtual
> address space becomes limited.
You now change the default size based on NO_MMAP; could you not just
bump the window size to 4 GiB on 64-bit?
-apw
* Re: [PATCH 0/17] Sliding window mmap for packfiles.
2006-12-24 8:56 ` Francis Moreau
2006-12-24 9:05 ` Shawn Pearce
@ 2006-12-24 9:29 ` Linus Torvalds
1 sibling, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2006-12-24 9:29 UTC (permalink / raw)
To: Francis Moreau; +Cc: Shawn O. Pearce, Junio C Hamano, git
On Sun, 24 Dec 2006, Francis Moreau wrote:
>
> Just out of curiosity, do you mean that there are some OSes running on
> 32-bit machines which allow 4 GiB of virtual memory within a
> single process? If so, could you give an example of such an OS?
Actually, Linux will do it on certain architectures (some architectures
have separate "address spaces" for kernel and user). And even on x86, if
you apply the (insane) 4GB patches, user space will actually have almost
all of the 4GB, because there's only a _tiny_ trampoline thing that
switches the whole page table around that is kernel-mapped and takes away
from the 4GB thing.
In practice, though, most 32-bit architectures will have between 1-3GB of
user virtual memory. And obviously stack space, binaries, heap etc take up
space, so you often end up with just ~0.5 GB of actual dependable
contiguous virtual memory.
Linus