git@vger.kernel.org mailing list mirror (one of many)
* [GSoC][RFC] Proposal: Make pack access code thread-safe
@ 2019-04-07 20:48 Matheus Tavares Bernardino
  2019-04-07 22:52 ` Christian Couder
  0 siblings, 1 reply; 13+ messages in thread
From: Matheus Tavares Bernardino @ 2019-04-07 20:48 UTC (permalink / raw)
  To: git
  Cc: Duy Nguyen, Thomas Gummerer,
	Оля Тележная,
	Elijah Newren, Tanushree Tumane, Christian Couder

Hi, everyone

This is my proposal for GSoC with the subject "Make pack access code
thread-safe". I'm behind schedule, but I would like to ask for your
comments on it. Any feedback will be highly appreciated.

The "rendered" version can be seen here:
https://docs.google.com/document/d/1QXT3iiI5zjwusplcZNf6IbYc04-9diziVKdOGkTHeIU/edit?usp=sharing

I kindly ask you to read the text at the Google Docs link, because the
conversion to plain text discards some information :(
But for those who prefer to comment by email, here it is:

Thanks,
Matheus Tavares

=======================================

Making pack access code thread-safe
April, 2019

# Contact Info

Name: Matheus Tavares Bernardino
Timezone: GMT-3
Email: matheus.bernardino@usp.br
IRC Nick: matheustavares on #git-devel
Telephone: [...]
Postal address: [...]
GitHub: https://github.com/MatheusBernardino/
GitLab: https://gitlab.com/MatheusTavares

# About me

I’m a senior student at the University of São Paulo (USP), pursuing a
Bachelor’s degree in Computer Science. Currently, I’m at the end of a
one-year undergraduate research project in High-Performance Computing,
whose goal was to accelerate astrophysical software for black hole
studies using GPUs. I’m also working as a teaching assistant on
IME-USP’s Concurrent and Parallel Programming course, giving lectures
and developing/grading programming assignments. Besides parallel and
high-performance computing, I’m very passionate about software
development in general, especially low-level coding and FLOSS.

# About me and FLOSS

## Linux Kernel

Last year, I started contributing to the Linux Kernel in the IIO
subsystem, together with a group of colleagues. I worked with another
student to move the ad2s90 module out of the staging area and into the
Kernel’s mainline, which we accomplished by the end of the year. In
total, I authored 11 patches and co-authored 3 (all of which are already
in Torvalds’s repo). If you want to know more about my contributions to
the Linux Kernel, take a look at the Appendix section.

## FLUSP: FLOSS at USP

After the amazing experience contributing to the Linux Kernel, we
decided to found FLUSP: FLOSS at USP, a group open to undergraduate
and graduate students that aims to contribute to FLOSS projects. Since
then, the group has grown and evolved a lot: currently, we have members
contributing to the Kernel, GCC, IGT GPU Tools, Git and some projects of
our own, such as KernelWorkflow. In recognition of our work with free
software, we have received donations from Analog Devices and
DigitalOcean.

Besides administrative questions and contributions to FLOSS projects, at
FLUSP, I’ve been mentoring people who want to start contributing to the
Linux Kernel and now, to Git, as well.

# About me and Git

I joined the Git community in February and, so far, I have sent the
following patches:

        clone: test for our behavior on odd objects/* content
        clone: better handle symlinked files at .git/objects/
        dir-iterator: add flags parameter to dir_iterator_begin
        clone: copy hidden paths at local clone
        clone: extract function from copy_or_link_directory
        clone: use dir-iterator to avoid explicit dir traversal
        clone: Replace strcmp by fspathcmp

And three more patches for git.github.io:

        rn-50: Add git-send-email links to light readings
        SoC-2019-Microprojects: Remove git-credential-cache
        SoC-2019-Microprojects: Remove all trailing spaces

Through FLUSP, I’ve also taken part in some Git-related activities:

* I actively helped to organize a Git workshop for newcomer students.
* I’ve written an article on our website to help people configure and
use git-send-email to send patches.
* I’ve been writing a ‘First steps at Git’ article (not finished yet),
in which I’m recording what I’ve learned in the Git community so far,
from downloading the source, subscribing to the mailing list and
joining the IRC channel to using travis-ci and sending patches.

# The Project

Put as directly as possible, the goal of this project is to make more of
Git’s codebase thread-safe, so that we can improve parallelism in
various commands. The motivation is the complaints from developers
experiencing slow Git commands when working with large repositories[1],
such as chromium and Android. And since most personal computers nowadays
have multi-core CPUs, improving parallel support is a natural step
toward making better use of the available resources.

With this in mind, pack access code is a good target for improvement,
since it’s used by many Git commands (e.g., checkout, grep, blame, diff,
log, etc.). This section of the codebase is still sequential and has
many global states, which should be protected before we can work to
improve parallelism.

## The Pack Access Code

To better describe what the pack access code is, we must talk about
Git’s object storing (in a simplified way): Besides what are called
loose objects, Git has a very optimized mechanism to compactly store
objects (blobs, trees, commits, etc.) in packfiles[2]. These files are
created by[3]:

1. listing objects;
2. sorting the list with some good heuristics;
3. traversing the list with a sliding window to find similar objects in
the window, in order to do delta decomposing;
4. compress the objects with zlib and write them to the packfile.

What we call pack access code in this document is the set of
functions responsible for retrieving the objects stored in the
packfiles. This process consists, roughly speaking, of three parts:

1. Locate and read the blob from packfile, using the index file;
2. If the blob is a delta, locate and read the base object to apply the
delta on top of it;
3. Once the full content is read, decompress it (using zlib inflate).

Note: There is a delta cache for the second step so that if another
delta depends on the same base object, it is already in memory. This
cache is global; also, the sliding windows, are global per packfile.
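
As a rough illustration of what protecting a shared delta base cache
could look like, here is a minimal C sketch. It is not Git's actual
cache: the fixed-size table, the (pack_id, offset) key and the names
cache_put, cache_get and cache_lock are all hypothetical, chosen only to
show a lookup and an insertion serialized by a single pthread mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a delta base cache. */
#define CACHE_SLOTS 64

struct base_entry {
	int pack_id;    /* which packfile the base object lives in */
	size_t offset;  /* offset of the base object in that pack */
	void *data;     /* reconstructed base object, NULL if slot is empty */
	size_t size;
};

static struct base_entry cache[CACHE_SLOTS];
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

static size_t slot_for(int pack_id, size_t offset)
{
	return ((size_t)pack_id * 31u + offset) % CACHE_SLOTS;
}

/* Store a private copy of a base object, evicting the slot's old entry. */
int cache_put(int pack_id, size_t offset, const void *data, size_t size)
{
	void *copy = malloc(size ? size : 1);

	if (!copy)
		return -1;
	if (size)
		memcpy(copy, data, size);

	pthread_mutex_lock(&cache_lock);
	struct base_entry *e = &cache[slot_for(pack_id, offset)];
	free(e->data); /* free(NULL) is a no-op for an empty slot */
	e->pack_id = pack_id;
	e->offset = offset;
	e->data = copy;
	e->size = size;
	pthread_mutex_unlock(&cache_lock);
	return 0;
}

/*
 * Return a malloc'ed copy of the cached base object (caller frees), or
 * NULL on a miss. Copying under the lock means a concurrent eviction
 * cannot free the data while the caller is still using it.
 */
void *cache_get(int pack_id, size_t offset, size_t *size_out)
{
	void *copy = NULL;

	assert(size_out);
	pthread_mutex_lock(&cache_lock);
	struct base_entry *e = &cache[slot_for(pack_id, offset)];
	if (e->data && e->pack_id == pack_id && e->offset == offset) {
		copy = malloc(e->size ? e->size : 1);
		if (copy) {
			memcpy(copy, e->data, e->size);
			*size_out = e->size;
		}
	}
	pthread_mutex_unlock(&cache_lock);
	return copy;
}
```

Copying on every hit is the simplest way to stay safe, but it costs
memory bandwidth; reference counting the entries or using a thread-local
cache are alternatives that would need to be measured.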

If these steps were thread-safe, the ability to perform the delta
reconstruction (together with the delta cache lookup) and zlib inflation
in parallel could bring a good speedup. In git-blame, for example,
24%[4] of the time is spent in the call stack originating at
read_object_file_extended. Moreover, once this big section of the
codebase is thread-safe, we can work to parallelize even more work at
higher levels of the call stack. Therefore, with this project, we aim to
make room for many future optimizations in many Git commands.

# Plan

I will probably be working mainly with packfile.c, sha1-file.c,
object-store.h, object.c and pack.h; however, I may also need to tackle
other files. I will focus on the following three pack access call
chains, found in git-grep and/or git-blame:


read_object_file → repo_read_object_file → read_object_file_extended →
read_object → oid_object_info_extended → find_pack_entry →
fill_pack_entry → find_pack_entry_one → bsearch_pack and
nth_packed_object_offset

oid_object_info → oid_object_info_extended → <same as previous>

read_object_with_reference → read_object_file → <same as previous>


Ideally, at the end of the project, it will be possible to call
read_object_file, oid_object_info and read_object_with_reference in a
thread-safe way, so that these operations can later be performed in
parallel.

Here are some threads on Git’s mailing list where I started discussing
my project:

* https://public-inbox.org/git/CAHd-oW7onvn4ugEjXzAX_OSVEfCboH3-FnGR00dU8iaoc+b8=Q@mail.gmail.com/
* https://public-inbox.org/git/20190402005245.4983-1-matheus.bernardino@usp.br/#t

And also, a previous attempt to make part of the pack access code
thread-safe which I may use as a base:

* https://public-inbox.org/git/20140212015727.1D63A403D3@wince.sfo.corp.google.com/#Z30builtin:gc.c

# Points to work on

* Investigate pack access call chains and look for non-thread-safe
operations in them.
* Protect packfile.c read-and-write global variables, such as
pack_open_windows, pack_open_fds, etc., using mutexes.
* Just like the previous item, protect sha1-file.c global states such as
the object cache used by read_object_file(). (The object cache may be
made thread-local, though. This still needs to be studied.)
* Investigate the delta cache, sliding pack window, and maybe other
states that should be protected as well.
* Use GDB or GPROF to follow call chains inside pack access, looking for
functions with static variables in their scopes. These variables are
shared between threads, so they should be protected, or the functions to
which they belong should be refactored.
* Make sure tests cover the functions I’ll be working on, and refactor/add
tests as needed.
* [Bonus] Once pack access is thread-safe, refactor the critical
sections at git-grep to use more fine-grained mutexes. This will
hopefully increase git-grep performance, especially in large repositories.
* [Bonus] Check the other mutex-protected functions git-grep uses, not
related to pack access, to see if we can implement more fine-grained
parallelism there. These functions are: fill_textconv,
is_submodule_active, repo_submodule_init, repo_read_gitmodules and
add_to_alternates_memory.
* [Bonus] Once pack access is thread-safe, ensure the xdiff code used by
git-blame is thread-safe. I expect this to be easier.
* [Bonus] If the previous bonus item is completed, start discussing a
possible parallel git-blame implementation with the community. We could
build a producer-consumer mechanism in blame.c’s assign_blame() function
for good work sharing (90% of git-blame’s time is spent here[5]). Or we
could try threading in lower functions of the call stack that still use
a lot of execution time, such as the libxdiff ones.
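
Two of the techniques listed above can be sketched in a few lines of C.
This is only an illustration under stated assumptions: open_fds merely
stands in for a global such as pack_open_fds, and note_fd_opened,
open_fd_count, pack_name_unsafe, pack_name_r and run_openers are
hypothetical names, not functions from Git. The first half guards a
global counter with a mutex; the second half shows how a function with a
static buffer could be refactored so that the caller owns the storage.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical global, standing in for a counter like pack_open_fds. */
static unsigned int open_fds;
static pthread_mutex_t open_fds_lock = PTHREAD_MUTEX_INITIALIZER;

void note_fd_opened(void)
{
	pthread_mutex_lock(&open_fds_lock);
	open_fds++; /* read-modify-write must not interleave across threads */
	pthread_mutex_unlock(&open_fds_lock);
}

unsigned int open_fd_count(void)
{
	pthread_mutex_lock(&open_fds_lock);
	unsigned int n = open_fds;
	pthread_mutex_unlock(&open_fds_lock);
	return n;
}

/* Not thread-safe: every caller shares the same static buffer. */
const char *pack_name_unsafe(unsigned int id)
{
	static char buf[32];
	snprintf(buf, sizeof(buf), "pack-%u.pack", id);
	return buf;
}

/* Refactored: the caller provides the buffer, so no state is shared. */
int pack_name_r(unsigned int id, char *buf, size_t len)
{
	return snprintf(buf, len, "pack-%u.pack", id);
}

#define MAX_THREADS 16
#define OPENS_PER_THREAD 1000

static void *opener(void *arg)
{
	(void)arg;
	for (int i = 0; i < OPENS_PER_THREAD; i++)
		note_fd_opened();
	return NULL;
}

/* Run nthreads openers and return how much the counter grew. */
unsigned int run_openers(int nthreads)
{
	pthread_t tids[MAX_THREADS];
	unsigned int before = open_fd_count();

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	for (int i = 0; i < nthreads; i++) {
		int err = pthread_create(&tids[i], NULL, opener, NULL);
		assert(err == 0);
		(void)err;
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	return open_fd_count() - before;
}
```

With the lock in place, run_openers(4) always reports 4000 increments;
without it, some increments could be lost. Whether a mutex, a
thread-local copy or a refactoring fits best has to be decided for each
variable.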

# Schedule

This is the schedule I plan to follow. I would like to highlight,
though, that since there’s still a significant investigation period from
now until early May, it may change or gain additions along the way.

Timeline:

Investigation Time (Now - May, 5)
     * Gather information about global states.
     * Trace the pack access call chains used by Git commands like blame
and checkout.
     * Try to classify which global variables are updated during the pack
access call stack and, therefore, should be protected.
     * Adjust the schedule as needed.


Community Bonding and work on sha1-file.c global states (May, 6 - 26[6])
     * Talk with the community about my then refined plan and ask for
comments.
     * Protect object cache at sha1-file.c (or make it thread-local).
     * Work on other sha1-file.c global states and non-thread-safe
functions.


Work on packfile.c global states (not including delta cache) (May, 27 -
June, 23)
     * Conclude any unfinished work on sha1-file.c from the previous period.
     * Protect packfile.c variables (pack_open_windows, pack_open_fds
and etc.) and work on its non-protected functions.


Work on delta cache and other global states (June, 24 - July, 21)
     * Protect delta base cache operations (here we should study whether
to add mutexes to the cache itself or to the underlying hashmap).
     * Protect oid_* functions.
     * Work on the sliding window (this is the part I know the least
about so far, so it needs further study).


Work on bonus and leftovers (July, 22 - August 19)
     * This time will be reserved to finish any leftovers from the other
periods and, if we still have some spare time, work on the bonus items.
     * Note: I also plan to attend DebConf from July 21st to 28th.


# Availability

My university vacations start on June 29, but since this is my last year
and I’m attending just two courses plus the teaching assistance, I think
it won’t be a problem. Also, classes start again in early August, but
I won’t have any more courses to attend in the next semester. I don’t
have any scheduled trips besides DebConf, from July 21st to 28th (please
let me know if any other Git community members plan to attend too). All
changes in availability will be communicated to the mentors in advance.

# Project Relevance and after GSoC plans

As already pointed out, this project will make it feasible to improve
(or add) parallelism in many Git commands. And that’s what I plan to do
after (or even during) GSoC, mainly with git-blame and git-grep. I’m
also trying to form a local community at FLUSP to keep contributing to Git.

# Appendix

## Patches at Linux Kernel

        staging: iio: ad2s1210: fix 'assignment operator' style checks
        staging:iio:ad2s90: Make read_raw return spi_read's error code
        staging:iio:ad2s90: Make probe handle spi_setup failure
        staging:iio:ad2s90: Remove always overwritten assignment
        staging:iio:ad2s90: Move device registration to the end of probe
        staging:iio:ad2s90: Add IIO_CHAN_INFO_SCALE to channel spec and read_raw
        staging:iio:ad2s90: Check channel type at read_raw
        staging:iio:ad2s90: Add device tree support
        staging:iio:ad2s90: Remove spi setup that should be done via dt
        staging:iio:ad2s90: Add max frequency check at probe
        dt-bindings:iio:resolver: Add docs for ad2s90
        staging:iio:ad2s90: Replace license text w/ SPDX identifier
        staging:iio:ad2s90: Add comment to device state mutex
        staging:iio:ad2s90: Move out of staging


________________
[1] Some of them can be seen here:
https://groups.google.com/a/chromium.org/forum/#!topic/chromium-dev/oYe69KzyG_U
https://bugs.chromium.org/p/git/issues/detail?id=18
https://bugs.chromium.org/p/git/issues/detail?id=16
https://code.fb.com/core-data/scaling-mercurial-at-facebook/
https://public-inbox.org/git/CA+TurHgyUK5sfCKrK+3xY8AeOg0t66vEvFxX=JiA9wXww7eZXQ@mail.gmail.com/
https://public-inbox.org/git/20140213014229.GE4582@vauxhall.crustytoothpaste.net/
https://public-inbox.org/git/CACBZZX6A+35wGBYAYj7E9d4XwLby21TLbTh-zRX+fkSt_e2zeg@mail.gmail.com/
[2] https://git-scm.com/book/en/v2/Git-Internals-Packfiles
[3]
https://github.com/git/git/blob/master/Documentation/technical/pack-heuristics.txt
[4]  https://i.imgur.com/XmyJMuE.png
[5] https://i.imgur.com/XmyJMuE.png
[6] GSoC’s official start date


* Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
  2019-04-07 20:48 [GSoC][RFC] Proposal: Make pack access code thread-safe Matheus Tavares Bernardino
@ 2019-04-07 22:52 ` Christian Couder
  2019-04-08  1:23   ` Duy Nguyen
  2019-04-08 16:42   ` Matheus Tavares Bernardino
  0 siblings, 2 replies; 13+ messages in thread
From: Christian Couder @ 2019-04-07 22:52 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: git, Duy Nguyen, Thomas Gummerer,
	Оля Тележная,
	Elijah Newren, Tanushree Tumane

Hi Matheus

On Sun, Apr 7, 2019 at 10:48 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> This is my proposal for GSoC with the subject "Make pack access code
> thread-safe".

Thanks!

> I'm late in schedule but I would like to ask for your
> comments on it. Any feedback will be highly appreciated.
>
> The "rendered" version can be seen here:
> https://docs.google.com/document/d/1QXT3iiI5zjwusplcZNf6IbYc04-9diziVKdOGkTHeIU/edit?usp=sharing

Thanks for the link!

> Besides administrative questions and contributions to FLOSS projects, at
> FLUSP, I’ve been mentoring people who want to start contributing to the
> Linux Kernel and now, to Git, as well.

Nice! Do you have links about that?

> # The Project
>
> As direct as possible, the goal with this project is to make more of
> Git’s codebase thread-safe, so that we can improve parallelism in
> various commands. The motivation behind this are the complaints from
> developers experiencing slow Git commands when working with large
> repositories[1], such as chromium and Android. And since nowadays, most
> personal computers have multi-core CPUs, it is a natural step trying to
> improve parallel support so that we can better use the available resources.
>
> With this in mind, pack access code is a good target for improvement,
> since it’s used by many Git commands (e.g., checkout, grep, blame, diff,
> log, etc.). This section of the codebase is still sequential and has
> many global states, which should be protected before we can work to
> improve parallelism.

I think it's better if global state can be made local or perhaps
removed, rather than protected (though of course that's not always
possible).

> ## The Pack Access Code
>
> To better describe what the pack access code is, we must talk about
> Git’s object storing (in a simplified way):

Maybe s/storing/storage/

> Besides what are called loose objects,

s/loose object/loose object files/

> Git has a very optimized mechanism to compactly store
> objects (blobs, trees, commits, etc.) in packfiles[2]. These files are
> created by[3]:
>
> 1. listing objects;
> 2. sorting the list with some good heuristics;
> 3. traversing the list with a sliding window to find similar objects in
> the window, in order to do delta decomposing;
> 4. compress the objects with zlib and write them to the packfile.
>
> What we are calling pack access code in this document, is the set of
> functions responsible for retrieving the objects stored at the
> packfiles. This process consists, roughly speaking, in three parts:
>
> 1. Locate and read the blob from packfile, using the index file;
> 2. If the blob is a delta, locate and read the base object to apply the
> delta on top of it;
> 3. Once the full content is read, decompress it (using zlib inflate).
>
> Note: There is a delta cache for the second step so that if another
> delta depends on the same base object, it is already in memory. This
> cache is global; also, the sliding windows, are global per packfile.

Yeah, but the sliding windows are used only when creating pack files,
not when reading them, right?

> If these steps were thread-safe, the ability to perform the delta
> reconstruction (together with the delta cache lookup) and zlib inflation
> in parallel could bring a good speedup. At git-blame, for example,
> 24%[4] of the time is spent in the call stack originated at
> read_object_file_extended. Not only this but once we have this big
> section of the codebase thread-safe, we can work to parallelize even
> more work at higher levels of the call stack. Therefore, with this
> project, we aim to make room for many future optimizations in many Git
> commands.

Nice.

> # Plan
>
> I will probably be working mainly with packfile.c, sha1-file.c,
> object-store.h, object.c and pack.h, however, I may also need to tackle
> other files. I will be focusing on the following three pack access call
> chains, found in git-grep and/or git-blame:
>
> read_object_file → repo_read_object_file → read_object_file_extended →
> read_object → oid_object_info_extended → find_pack_entry →
> fill_pack_entry → find_pack_entry_one → bsearch_pack and
> nth_packed_object_offset
>
> oid_object_info → oid_object_info_extended → <same as previous>
>
> read_object_with_reference → read_object_file → <same as previous>
>
> Ideally, at the end of the project, it will be possible to call
> read_object_file, oid_object_info and read_object_with_reference with
> thread-safety, so that these operations can be, latter, performed in
> parallel.
>
> Here are some threads on Git’s mailing list where I started discussing
> my project:
>
> * https://public-inbox.org/git/CAHd-oW7onvn4ugEjXzAX_OSVEfCboH3-FnGR00dU8iaoc+b8=Q@mail.gmail.com/
> * https://public-inbox.org/git/20190402005245.4983-1-matheus.bernardino@usp.br/#t
>
> And also, a previous attempt to make part of the pack access code
> thread-safe which I may use as a base:
>
> * https://public-inbox.org/git/20140212015727.1D63A403D3@wince.sfo.corp.google.com/#Z30builtin:gc.c

Nice.

> # Points to work on
>
> * Investigate pack access call chains and look for non-thread-safe
> operations on then.
> * Protect packfile.c read-and-write global variables, such as
> pack_open_windows, pack_open_fds and etc., using mutexes.

Do you want to work on making both packfile reading and packfile
writing thread safe? Or just packfile reading?

If some variables are used for both reading and writing packfiles, do
you plan to protect them only when they are used for reading?

The rest of your proposal looks very good to me. Please make sure you
upload this or an updated version soon to the GSoC web site.

Thanks,
Christian.


* Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
  2019-04-07 22:52 ` Christian Couder
@ 2019-04-08  1:23   ` Duy Nguyen
  2019-04-08  3:32     ` Duy Nguyen
  2019-04-08  9:26     ` Philip Oakley
  2019-04-08 16:42   ` Matheus Tavares Bernardino
  1 sibling, 2 replies; 13+ messages in thread
From: Duy Nguyen @ 2019-04-08  1:23 UTC (permalink / raw)
  To: Christian Couder
  Cc: Matheus Tavares Bernardino, git, Thomas Gummerer,
	Оля Тележная,
	Elijah Newren, Tanushree Tumane

On Mon, Apr 8, 2019 at 5:52 AM Christian Couder
<christian.couder@gmail.com> wrote:
> > Git has a very optimized mechanism to compactly store
> > objects (blobs, trees, commits, etc.) in packfiles[2]. These files are
> > created by[3]:
> >
> > 1. listing objects;
> > 2. sorting the list with some good heuristics;
> > 3. traversing the list with a sliding window to find similar objects in
> > the window, in order to do delta decomposing;
> > 4. compress the objects with zlib and write them to the packfile.
> >
> > What we are calling pack access code in this document, is the set of
> > functions responsible for retrieving the objects stored at the
> > packfiles. This process consists, roughly speaking, in three parts:
> >
> > 1. Locate and read the blob from packfile, using the index file;
> > 2. If the blob is a delta, locate and read the base object to apply the
> > delta on top of it;
> > 3. Once the full content is read, decompress it (using zlib inflate).
> >
> > Note: There is a delta cache for the second step so that if another
> > delta depends on the same base object, it is already in memory. This
> > cache is global; also, the sliding windows, are global per packfile.
>
> Yeah, but the sliding windows are used only when creating pack files,
> not when reading them, right?

These windows are actually for reading. We used to just mmap the whole
pack file in the early days but that was impossible for 4+ GB packs on
32-bit platforms, which was one of the reasons, I think, that sliding
windows were added, to map just the parts we want to read.
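
The idea of mapping just the part of a file that is about to be read can
be sketched roughly as below. This is only an illustration using plain
POSIX mmap(); the struct window here and the window_map, window_at and
window_close helpers are hypothetical and much simpler than what
packfile.c actually does.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical window: one mapped slice of a larger file. */
struct window {
	unsigned char *base; /* start of the mapping */
	off_t offset;        /* file offset that base[0] corresponds to */
	size_t len;          /* number of mapped bytes */
};

/*
 * Map at most max_len bytes of `path`, starting at the page boundary at
 * or below `want`. Returns 0 on success and -1 on failure.
 */
int window_map(struct window *w, const char *path, off_t want, size_t max_len)
{
	struct stat st;
	long page = sysconf(_SC_PAGESIZE);
	int fd;

	assert(w && path);
	if (page <= 0 || max_len < (size_t)page)
		return -1;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || want < 0 || want >= st.st_size) {
		close(fd);
		return -1;
	}

	off_t start = want - (want % page); /* mmap offsets must be page-aligned */
	off_t remaining = st.st_size - start;
	size_t len = (unsigned long long)remaining < (unsigned long long)max_len
		? (size_t)remaining : max_len;

	void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, start);
	close(fd); /* the mapping stays valid after the descriptor is closed */
	if (p == MAP_FAILED)
		return -1;

	w->base = p;
	w->offset = start;
	w->len = len;
	return 0;
}

/* Pointer to the byte at file offset `at`, or NULL if it is not mapped. */
const unsigned char *window_at(const struct window *w, off_t at)
{
	if (at < w->offset ||
	    (unsigned long long)(at - w->offset) >= (unsigned long long)w->len)
		return NULL;
	return w->base + (at - w->offset);
}

void window_close(struct window *w)
{
	munmap(w->base, w->len);
	w->base = NULL;
	w->len = 0;
}
```

A reader that gets NULL back from window_at() would close the current
window and map a new one around the offset it needs, which keeps the
address space in use bounded by max_len regardless of the file size.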

> > # Points to work on
> >
> > * Investigate pack access call chains and look for non-thread-safe
> > operations on then.
> > * Protect packfile.c read-and-write global variables, such as
> > pack_open_windows, pack_open_fds and etc., using mutexes.
>
> Do you want to work on making both packfile reading and packfile
> writing thread safe? Or just packfile reading?

Packfile writing is probably already or pretty close to thread-safe
(at least the main writing code path in git-pack-objects; the
streaming blobs to a pack, I'm not so sure).
-- 
Duy


* Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
  2019-04-08  1:23   ` Duy Nguyen
@ 2019-04-08  3:32     ` Duy Nguyen
  2019-04-08  6:58       ` Christian Couder
  2019-04-08 15:58       ` Matheus Tavares Bernardino
  2019-04-08  9:26     ` Philip Oakley
  1 sibling, 2 replies; 13+ messages in thread
From: Duy Nguyen @ 2019-04-08  3:32 UTC (permalink / raw)
  To: Christian Couder
  Cc: Matheus Tavares Bernardino, git, Thomas Gummerer,
	Оля Тележная,
	Elijah Newren, Tanushree Tumane

On Mon, Apr 8, 2019 at 8:23 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Mon, Apr 8, 2019 at 5:52 AM Christian Couder
> <christian.couder@gmail.com> wrote:
> > > Git has a very optimized mechanism to compactly store
> > > objects (blobs, trees, commits, etc.) in packfiles[2]. These files are
> > > created by[3]:
> > >
> > > 1. listing objects;
> > > 2. sorting the list with some good heuristics;
> > > 3. traversing the list with a sliding window to find similar objects in
> > > the window, in order to do delta decomposing;
> > > 4. compress the objects with zlib and write them to the packfile.
> > >
> > > What we are calling pack access code in this document, is the set of
> > > functions responsible for retrieving the objects stored at the
> > > packfiles. This process consists, roughly speaking, in three parts:
> > >
> > > 1. Locate and read the blob from packfile, using the index file;
> > > 2. If the blob is a delta, locate and read the base object to apply the
> > > delta on top of it;
> > > 3. Once the full content is read, decompress it (using zlib inflate).
> > >
> > > Note: There is a delta cache for the second step so that if another
> > > delta depends on the same base object, it is already in memory. This
> > > cache is global; also, the sliding windows, are global per packfile.
> >
> > Yeah, but the sliding windows are used only when creating pack files,
> > not when reading them, right?
>
> These windows are actually for reading. We used to just mmap the whole
> pack file in the early days but that was impossible for 4+ GB packs on
> 32-bit platforms, which was one of the reasons, I think, that sliding
> windows were added, to map just the parts we want to read.

To clarify (I think I see why you mentioned pack creation now), there
are actually two window concepts. core.packedGitWindowSize is about
reading pack files. pack.window is for generating pack files. The
second window should already be thread-safe since we do all the
heuristics to find best base object candidates in threads.
-- 
Duy


* Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
  2019-04-08  3:32     ` Duy Nguyen
@ 2019-04-08  6:58       ` Christian Couder
  2019-04-08 16:03         ` Matheus Tavares Bernardino
  2019-04-08 15:58       ` Matheus Tavares Bernardino
  1 sibling, 1 reply; 13+ messages in thread
From: Christian Couder @ 2019-04-08  6:58 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Matheus Tavares Bernardino, git, Thomas Gummerer,
	Оля Тележная,
	Elijah Newren, Tanushree Tumane

On Mon, Apr 8, 2019 at 5:32 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Mon, Apr 8, 2019 at 8:23 AM Duy Nguyen <pclouds@gmail.com> wrote:
> >
> > On Mon, Apr 8, 2019 at 5:52 AM Christian Couder
> > <christian.couder@gmail.com> wrote:
> > > > Git has a very optimized mechanism to compactly store
> > > > objects (blobs, trees, commits, etc.) in packfiles[2]. These files are
> > > > created by[3]:
> > > >
> > > > 1. listing objects;
> > > > 2. sorting the list with some good heuristics;
> > > > 3. traversing the list with a sliding window to find similar objects in
> > > > the window, in order to do delta decomposing;
> > > > 4. compress the objects with zlib and write them to the packfile.
> > > >
> > > > What we are calling pack access code in this document, is the set of
> > > > functions responsible for retrieving the objects stored at the
> > > > packfiles. This process consists, roughly speaking, in three parts:
> > > >
> > > > 1. Locate and read the blob from packfile, using the index file;
> > > > 2. If the blob is a delta, locate and read the base object to apply the
> > > > delta on top of it;
> > > > 3. Once the full content is read, decompress it (using zlib inflate).
> > > >
> > > > Note: There is a delta cache for the second step so that if another
> > > > delta depends on the same base object, it is already in memory. This
> > > > cache is global; also, the sliding windows, are global per packfile.
> > >
> > > Yeah, but the sliding windows are used only when creating pack files,
> > > not when reading them, right?
> >
> > These windows are actually for reading. We used to just mmap the whole
> > pack file in the early days but that was impossible for 4+ GB packs on
> > 32-bit platforms, which was one of the reasons, I think, that sliding
> > windows were added, to map just the parts we want to read.
>
> To clarify (I think I see why you mentioned pack creation now), there
> are actually two window concepts. core.packedGitWindowSize is about
> reading pack files. pack.window is for generating pack files. The
> second window should already be thread-safe since we do all the
> heuristics to find best base object candidates in threads.

Yeah, it is not very clear in the proposal which windows it is talking
about as I think a window is first mentioned when describing the steps
to create a packfile in:

"3. traversing the list with a sliding window to find similar objects
in the window, in order to do delta decomposing;"

Also the proposal plans to "Protect packfile.c read-and-write global
variables ..." which made me wonder if it was also about improving
thread safety when generating pack files.

Thanks for clarifying!


* Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
  2019-04-08  1:23   ` Duy Nguyen
  2019-04-08  3:32     ` Duy Nguyen
@ 2019-04-08  9:26     ` Philip Oakley
  2019-04-08 17:04       ` Matheus Tavares Bernardino
  1 sibling, 1 reply; 13+ messages in thread
From: Philip Oakley @ 2019-04-08  9:26 UTC (permalink / raw)
  To: Duy Nguyen, Christian Couder
  Cc: Matheus Tavares Bernardino, git, Thomas Gummerer,
	Оля Тележная,
	Elijah Newren, Tanushree Tumane

On 08/04/2019 02:23, Duy Nguyen wrote:
> On Mon, Apr 8, 2019 at 5:52 AM Christian Couder
> <christian.couder@gmail.com> wrote:
>>> Git has a very optimized mechanism to compactly store
>>> objects (blobs, trees, commits, etc.) in packfiles[2]. These files are
>>> created by[3]:
>>>
>>> 1. listing objects;
>>> 2. sorting the list with some good heuristics;
>>> 3. traversing the list with a sliding window to find similar objects in
>>> the window, in order to do delta decomposing;
>>> 4. compress the objects with zlib and write them to the packfile.
>>>
>>> What we are calling pack access code in this document, is the set of
>>> functions responsible for retrieving the objects stored at the
>>> packfiles. This process consists, roughly speaking, in three parts:
>>>
>>> 1. Locate and read the blob from packfile, using the index file;
>>> 2. If the blob is a delta, locate and read the base object to apply the
>>> delta on top of it;
>>> 3. Once the full content is read, decompress it (using zlib inflate).
>>>
>>> Note: There is a delta cache for the second step so that if another
>>> delta depends on the same base object, it is already in memory. This
>>> cache is global; also, the sliding windows, are global per packfile.
>> Yeah, but the sliding windows are used only when creating pack files,
>> not when reading them, right?
> These windows are actually for reading. We used to just mmap the whole
> pack file in the early days but that was impossible for 4+ GB packs on
> 32-bit platforms, which was one of the reasons, I think, that sliding
> windows were added, to map just the parts we want to read.

Another "32-bit problem" should also be expressly considered during the
GSoC work: MS Windows defines uInt / long as only 32 bits, which leads
to much of the Git code failing on the Git for Windows port and on Git
LFS (for Windows) for packs and files greater than 4 GB.
https://github.com/git-for-windows/git/issues/1063

Mainly it is just substitution of size_t for long, but there can be 
unexpected coercions when mixed data types get coerced down to a local 
32-bit long. This is made worse by it being implementation defined, so 
one needs to be explicit about some casts up to pointer/memsized types.
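
To illustrate the coercion described above, here is a minimal sketch (not code from Git). The `win_ulong` typedef and all function names are hypothetical. `uint32_t` stands in for the 32-bit `long` of 64-bit Windows, so the behaviour is the same on any platform.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for "unsigned long" on 64-bit Windows, where it
 * is 32 bits wide. A fixed-width type keeps the demo platform-independent.
 */
typedef uint32_t win_ulong;

/* Returns 1 when a 64-bit size can be stored in win_ulong without loss. */
int fits_in_win_ulong(uint64_t size)
{
	return size <= UINT32_MAX;
}

/* Buggy pattern: the size passes through a 32-bit local and is truncated. */
uint64_t size_via_win_ulong(uint64_t size)
{
	win_ulong narrowed = (win_ulong)size; /* keeps only the low 32 bits */
	return narrowed;
}

/* Defensive pattern: make the narrowing explicit and checked. */
win_ulong checked_narrow(uint64_t size)
{
	assert(fits_in_win_ulong(size));
	return (win_ulong)size;
}
```

For a 5 GiB size (5 * 2^30), `size_via_win_ulong` returns 1 GiB, because only the low 32 bits survive the round trip through the narrow local.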
>>> # Points to work on
>>>
>>> * Investigate pack access call chains and look for non-thread-safe
>>> operations on then.
>>> * Protect packfile.c read-and-write global variables, such as
>>> pack_open_windows, pack_open_fds and etc., using mutexes.
>> Do you want to work on making both packfile reading and packfile
>> writing thread safe? Or just packfile reading?
> Packfile writing is probably already or pretty close to thread-safe
> (at least the main writing code path in git-pack-objects; the
> streaming blobs to a pack, i'm not so sure).
--
Philip

* Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
  2019-04-08  3:32     ` Duy Nguyen
  2019-04-08  6:58       ` Christian Couder
@ 2019-04-08 15:58       ` Matheus Tavares Bernardino
  1 sibling, 0 replies; 13+ messages in thread
From: Matheus Tavares Bernardino @ 2019-04-08 15:58 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Christian Couder, git, Thomas Gummerer,
	Оля Тележная,
	Elijah Newren, Tanushree Tumane

On Mon, Apr 8, 2019 at 12:32 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Mon, Apr 8, 2019 at 8:23 AM Duy Nguyen <pclouds@gmail.com> wrote:
> >
> > On Mon, Apr 8, 2019 at 5:52 AM Christian Couder
> > <christian.couder@gmail.com> wrote:
> > > > Git has a very optimized mechanism to compactly store
> > > > objects (blobs, trees, commits, etc.) in packfiles[2]. These files are
> > > > created by[3]:
> > > >
> > > > 1. listing objects;
> > > > 2. sorting the list with some good heuristics;
> > > > 3. traversing the list with a sliding window to find similar objects in
> > > > the window, in order to do delta decomposing;
> > > > 4. compress the objects with zlib and write them to the packfile.
> > > >
> > > > What we are calling pack access code in this document, is the set of
> > > > functions responsible for retrieving the objects stored at the
> > > > packfiles. This process consists, roughly speaking, in three parts:
> > > >
> > > > 1. Locate and read the blob from packfile, using the index file;
> > > > 2. If the blob is a delta, locate and read the base object to apply the
> > > > delta on top of it;
> > > > 3. Once the full content is read, decompress it (using zlib inflate).
> > > >
> > > > Note: There is a delta cache for the second step so that if another
> > > > delta depends on the same base object, it is already in memory. This
> > > > cache is global; also, the sliding windows, are global per packfile.
> > >
> > > Yeah, but the sliding windows are used only when creating pack files,
> > > not when reading them, right?
> >
> > These windows are actually for reading. We used to just mmap the whole
> > pack file in the early days but that was impossible for 4+ GB packs on
> > 32-bit platforms, which was one of the reasons, I think, that sliding
> > windows were added, to map just the parts we want to read.
>
> To clarify (I think I see why you mentioned pack creation now), there
> are actually two window concepts. core.packedGitWindowSize is about
> reading pack files. pack.window is for generating pack files. The
> second window should already be thread-safe since we do all the
> heuristics to find best base object candidates in threads.

I was indeed confusing these two concepts, thanks for clarifying it! I
took a quick look at how core.packedGitWindowSize is used in the code
(in packfile.c) and it seems to be thread-safe already (I may be
wrong, though).
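
As a rough illustration of the "map just the parts we want to read" idea discussed above, the sketch below computes which byte range of a file a read window would cover for a given offset. The struct, the function name and the alignment rule are assumptions made for illustration, not Git's actual windowing policy.

```c
#include <assert.h>
#include <stdint.h>

/* A byte range of a file to be mapped: [start, start + len). */
struct demo_window {
	uint64_t start;
	uint64_t len;
};

/*
 * Picks the window-aligned range that contains "offset". The alignment
 * rule is a simplification for illustration, not Git's actual policy.
 */
struct demo_window demo_window_for(uint64_t offset, uint64_t file_size,
				   uint64_t window_size)
{
	struct demo_window w;

	assert(window_size > 0 && offset < file_size);
	w.start = offset - (offset % window_size);
	w.len = file_size - w.start;
	if (w.len > window_size)
		w.len = window_size;	/* the last window may be shorter */
	return w;
}
```

With mmap(), `start` and `len` would become the offset and length arguments. In practice the window size would have to be a multiple of the page size, because mmap() offsets are expected to be page-aligned.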

> --
> Duy

* Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
  2019-04-08  6:58       ` Christian Couder
@ 2019-04-08 16:03         ` Matheus Tavares Bernardino
  0 siblings, 0 replies; 13+ messages in thread
From: Matheus Tavares Bernardino @ 2019-04-08 16:03 UTC (permalink / raw)
  To: Christian Couder
  Cc: Duy Nguyen, git, Thomas Gummerer,
	Оля Тележная,
	Elijah Newren, Tanushree Tumane

On Mon, Apr 8, 2019 at 3:58 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Mon, Apr 8, 2019 at 5:32 AM Duy Nguyen <pclouds@gmail.com> wrote:
> >
> > On Mon, Apr 8, 2019 at 8:23 AM Duy Nguyen <pclouds@gmail.com> wrote:
> > >
> > > On Mon, Apr 8, 2019 at 5:52 AM Christian Couder
> > > <christian.couder@gmail.com> wrote:
> > > > > Git has a very optimized mechanism to compactly store
> > > > > objects (blobs, trees, commits, etc.) in packfiles[2]. These files are
> > > > > created by[3]:
> > > > >
> > > > > 1. listing objects;
> > > > > 2. sorting the list with some good heuristics;
> > > > > 3. traversing the list with a sliding window to find similar objects in
> > > > > the window, in order to do delta decomposing;
> > > > > 4. compress the objects with zlib and write them to the packfile.
> > > > >
> > > > > What we are calling pack access code in this document, is the set of
> > > > > functions responsible for retrieving the objects stored at the
> > > > > packfiles. This process consists, roughly speaking, in three parts:
> > > > >
> > > > > 1. Locate and read the blob from packfile, using the index file;
> > > > > 2. If the blob is a delta, locate and read the base object to apply the
> > > > > delta on top of it;
> > > > > 3. Once the full content is read, decompress it (using zlib inflate).
> > > > >
> > > > > Note: There is a delta cache for the second step so that if another
> > > > > delta depends on the same base object, it is already in memory. This
> > > > > cache is global; also, the sliding windows, are global per packfile.
> > > >
> > > > Yeah, but the sliding windows are used only when creating pack files,
> > > > not when reading them, right?
> > >
> > > These windows are actually for reading. We used to just mmap the whole
> > > pack file in the early days but that was impossible for 4+ GB packs on
> > > 32-bit platforms, which was one of the reasons, I think, that sliding
> > > windows were added, to map just the parts we want to read.
> >
> > To clarify (I think I see why you mentioned pack creation now), there
> > are actually two window concepts. core.packedGitWindowSize is about
> > reading pack files. pack.window is for generating pack files. The
> > second window should already be thread-safe since we do all the
> > heuristics to find best base object candidates in threads.
>
> Yeah, it is not very clear in the proposal which windows it is talking
> about as I think a window is first mentioned when describing the steps
> to create a packfile in:
>
> "3. traversing the list with a sliding window to find similar objects
> in the window, in order to do delta decomposing;"
>
> Also the proposal plans to "Protect packfile.c read-and-write global
> variables ..." which made me wonder if it was also about improving
> thread safety when generating pack files.

Sorry, it is indeed unclear. The idea here was to say that variables
which are both read and updated in code that must be thread-safe
should be protected. I will refactor this, thanks.
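
A minimal sketch of that kind of protection, assuming POSIX threads: `demo_open_fds` is a hypothetical stand-in for a read-and-write global such as pack_open_fds, and none of the function names below exist in Git.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a read-and-write global like pack_open_fds. */
static unsigned int demo_open_fds;
static pthread_mutex_t demo_open_fds_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Every update of the shared counter happens under the mutex. */
void demo_fd_opened(void)
{
	pthread_mutex_lock(&demo_open_fds_mutex);
	demo_open_fds++;
	pthread_mutex_unlock(&demo_open_fds_mutex);
}

/* Reads take the same mutex so they are ordered with respect to updates. */
unsigned int demo_fd_count(void)
{
	unsigned int n;

	pthread_mutex_lock(&demo_open_fds_mutex);
	n = demo_open_fds;
	pthread_mutex_unlock(&demo_open_fds_mutex);
	return n;
}

static void *demo_worker(void *arg)
{
	int i, rounds = *(int *)arg;

	for (i = 0; i < rounds; i++)
		demo_fd_opened();
	return NULL;
}

/*
 * Starts up to 16 threads, each bumping the counter "rounds" times, and
 * returns how much the counter grew while they ran.
 */
unsigned int demo_run(int nr_threads, int rounds)
{
	pthread_t threads[16];
	unsigned int before = demo_fd_count();
	int i, started = 0;

	assert(nr_threads > 0 && nr_threads <= 16);
	for (i = 0; i < nr_threads; i++)
		if (!pthread_create(&threads[started], NULL, demo_worker, &rounds))
			started++;
	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	return demo_fd_count() - before;
}
```

Without the mutex, the increments from different threads could interleave and some updates would be lost. With it, four threads doing 10000 rounds each always add exactly 40000.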

Oh, also I'm targeting just packfile reading. The explanation of how
packfiles are created was written just for context. But perhaps it
led to some confusion about the proposal's objective.
Thanks for this feedback too.

> Thanks for clarifying!

* Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
  2019-04-07 22:52 ` Christian Couder
  2019-04-08  1:23   ` Duy Nguyen
@ 2019-04-08 16:42   ` Matheus Tavares Bernardino
  1 sibling, 0 replies; 13+ messages in thread
From: Matheus Tavares Bernardino @ 2019-04-08 16:42 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Duy Nguyen, Thomas Gummerer,
	Оля Тележная,
	Elijah Newren, Tanushree Tumane

On Sun, Apr 7, 2019 at 7:52 PM Christian Couder
<christian.couder@gmail.com> wrote:
>
> Hi Matheus
>
> On Sun, Apr 7, 2019 at 10:48 PM Matheus Tavares Bernardino
> <matheus.bernardino@usp.br> wrote:
> >
> > This is my proposal for GSoC with the subject "Make pack access code
> > thread-safe".
>
> Thanks!
>
> > I'm late in schedule but I would like to ask for your
> > comments on it. Any feedback will be highly appreciated.
> >
> > The "rendered" version can be seen here:
> > https://docs.google.com/document/d/1QXT3iiI5zjwusplcZNf6IbYc04-9diziVKdOGkTHeIU/edit?usp=sharing
>
> Thanks for the link!
>
> > Besides administrative questions and contributions to FLOSS projects, at
> > FLUSP, I’ve been mentoring people who want to start contributing to the
> > Linux Kernel and now, to Git, as well.
>
> Nice! Do you have links about that?

Unfortunately not :( Maybe just the mentoring slides (e.g.
https://flusp.ime.usp.br/materials/Kernel_Primeiros_Passos.pdf). But
they are all in Portuguese, so I don't know whether it would be
valuable to add them here...

> > # The Project
> >
> > As direct as possible, the goal with this project is to make more of
> > Git’s codebase thread-safe, so that we can improve parallelism in
> > various commands. The motivation behind this are the complaints from
> > developers experiencing slow Git commands when working with large
> > repositories[1], such as chromium and Android. And since nowadays, most
> > personal computers have multi-core CPUs, it is a natural step trying to
> > improve parallel support so that we can better use the available resources.
> >
> > With this in mind, pack access code is a good target for improvement,
> > since it’s used by many Git commands (e.g., checkout, grep, blame, diff,
> > log, etc.). This section of the codebase is still sequential and has
> > many global states, which should be protected before we can work to
> > improve parallelism.
>
> I think it's better if global state can be made local or perhaps
> removed, rather than protected (though of course that's not always
> possible).

Indeed! I just added this to the docs version. Thanks
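
One way to picture the "make it local" option is sketched below: the state lives in a struct that each caller (or each thread) owns, so no lock is needed. The struct and function names are hypothetical and do not come from Git's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-caller state replacing would-be globals. */
struct demo_pack_stats {
	unsigned int open_windows;
	size_t mapped_bytes;
};

void demo_stats_init(struct demo_pack_stats *s)
{
	s->open_windows = 0;
	s->mapped_bytes = 0;
}

/* Records that a window of "len" bytes was mapped. */
void demo_note_map(struct demo_pack_stats *s, size_t len)
{
	s->open_windows++;
	s->mapped_bytes += len;
}

/* Records that a previously mapped window of "len" bytes was released. */
void demo_note_unmap(struct demo_pack_stats *s, size_t len)
{
	assert(s->open_windows > 0 && s->mapped_bytes >= len);
	s->open_windows--;
	s->mapped_bytes -= len;
}
```

Two threads that each hold their own `struct demo_pack_stats` never touch the same memory, which is why this approach, when it is possible, avoids both the mutex and the contention that comes with it.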

> > ## The Pack Access Code
> >
> > To better describe what the pack access code is, we must talk about
> > Git’s object storing (in a simplified way):
>
> Maybe s/storing/storage/

Thanks. Already changed.

> > Besides what are called loose objects,
>
> s/loose object/loose object files/

Done, thanks!

> > Git has a very optimized mechanism to compactly store
> > objects (blobs, trees, commits, etc.) in packfiles[2]. These files are
> > created by[3]:
> >
> > 1. listing objects;
> > 2. sorting the list with some good heuristics;
> > 3. traversing the list with a sliding window to find similar objects in
> > the window, in order to do delta decomposing;
> > 4. compress the objects with zlib and write them to the packfile.
> >
> > What we are calling pack access code in this document, is the set of
> > functions responsible for retrieving the objects stored at the
> > packfiles. This process consists, roughly speaking, in three parts:
> >
> > 1. Locate and read the blob from packfile, using the index file;
> > 2. If the blob is a delta, locate and read the base object to apply the
> > delta on top of it;
> > 3. Once the full content is read, decompress it (using zlib inflate).
> >
> > Note: There is a delta cache for the second step so that if another
> > delta depends on the same base object, it is already in memory. This
> > cache is global; also, the sliding windows, are global per packfile.
>
> Yeah, but the sliding windows are used only when creating pack files,
> not when reading them, right?
>
> > If these steps were thread-safe, the ability to perform the delta
> > reconstruction (together with the delta cache lookup) and zlib inflation
> > in parallel could bring a good speedup. At git-blame, for example,
> > 24%[4] of the time is spent in the call stack originated at
> > read_object_file_extended. Not only this but once we have this big
> > section of the codebase thread-safe, we can work to parallelize even
> > more work at higher levels of the call stack. Therefore, with this
> > project, we aim to make room for many future optimizations in many Git
> > commands.
>
> Nice.
>
> > # Plan
> >
> > I will probably be working mainly with packfile.c, sha1-file.c,
> > object-store.h, object.c and pack.h, however, I may also need to tackle
> > other files. I will be focusing on the following three pack access call
> > chains, found in git-grep and/or git-blame:
> >
> > read_object_file → repo_read_object_file → read_object_file_extended →
> > read_object → oid_object_info_extended → find_pack_entry →
> > fill_pack_entry → find_pack_entry_one → bsearch_pack and
> > nth_packed_object_offset
> >
> > oid_object_info → oid_object_info_extended → <same as previous>
> >
> > read_object_with_reference → read_object_file → <same as previous>
> >
> > Ideally, at the end of the project, it will be possible to call
> > read_object_file, oid_object_info and read_object_with_reference with
> > thread-safety, so that these operations can be, latter, performed in
> > parallel.
> >
> > Here are some threads on Git’s mailing list where I started discussing
> > my project:
> >
> > * https://public-inbox.org/git/CAHd-oW7onvn4ugEjXzAX_OSVEfCboH3-FnGR00dU8iaoc+b8=Q@mail.gmail.com/
> > * https://public-inbox.org/git/20190402005245.4983-1-matheus.bernardino@usp.br/#t
> >
> > And also, a previous attempt to make part of the pack access code
> > thread-safe which I may use as a base:
> >
> > * https://public-inbox.org/git/20140212015727.1D63A403D3@wince.sfo.corp.google.com/#Z30builtin:gc.c
>
> Nice.
>
> > # Points to work on
> >
> > * Investigate pack access call chains and look for non-thread-safe
> > operations on then.
> > * Protect packfile.c read-and-write global variables, such as
> > pack_open_windows, pack_open_fds and etc., using mutexes.
>
> Do you want to work on making both packfile reading and packfile
> writing thread safe? Or just packfile reading?

I plan to work on packfile reading, only.

> If some variables are used for both reading and writing packfiles, do
> you plan to protect them only when they are used for reading?

Hm, I haven't thought of that before. But indeed, if they are used for
both, I think I should protect them in both cases.

> The rest of your proposal looks very good to me. Please make sure you
> upload this or an updated version soon to the GSoC web site.

Thanks, Christian. I will work on the final points today and submit it.

> Thanks,
> Christian.

* Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
  2019-04-08  9:26     ` Philip Oakley
@ 2019-04-08 17:04       ` Matheus Tavares Bernardino
  2019-04-08 19:19         ` Philip Oakley
  0 siblings, 1 reply; 13+ messages in thread
From: Matheus Tavares Bernardino @ 2019-04-08 17:04 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Duy Nguyen, Christian Couder, git, Thomas Gummerer,
	Оля Тележная,
	Elijah Newren, Tanushree Tumane

On Mon, Apr 8, 2019 at 6:26 AM Philip Oakley <philipoakley@iee.org> wrote:
>
> On 08/04/2019 02:23, Duy Nguyen wrote:
> > On Mon, Apr 8, 2019 at 5:52 AM Christian Couder
> > <christian.couder@gmail.com> wrote:
> >>> Git has a very optimized mechanism to compactly store
> >>> objects (blobs, trees, commits, etc.) in packfiles[2]. These files are
> >>> created by[3]:
> >>>
> >>> 1. listing objects;
> >>> 2. sorting the list with some good heuristics;
> >>> 3. traversing the list with a sliding window to find similar objects in
> >>> the window, in order to do delta decomposing;
> >>> 4. compress the objects with zlib and write them to the packfile.
> >>>
> >>> What we are calling pack access code in this document, is the set of
> >>> functions responsible for retrieving the objects stored at the
> >>> packfiles. This process consists, roughly speaking, in three parts:
> >>>
> >>> 1. Locate and read the blob from packfile, using the index file;
> >>> 2. If the blob is a delta, locate and read the base object to apply the
> >>> delta on top of it;
> >>> 3. Once the full content is read, decompress it (using zlib inflate).
> >>>
> >>> Note: There is a delta cache for the second step so that if another
> >>> delta depends on the same base object, it is already in memory. This
> >>> cache is global; also, the sliding windows, are global per packfile.
> >> Yeah, but the sliding windows are used only when creating pack files,
> >> not when reading them, right?
> > These windows are actually for reading. We used to just mmap the whole
> > pack file in the early days but that was impossible for 4+ GB packs on
> > 32-bit platforms, which was one of the reasons, I think, that sliding
> > windows were added, to map just the parts we want to read.
>
> Another "32-bit problem" should also be expressly considered during the
> GSoC work because of the MS Windows definition of uInt / long to be only
> 32 bits, leading to much of the Git code failing on the Git for Windows
> port and on the Git LFS (for Windows) for packs and files greater than
> 4Gb. https://github.com/git-for-windows/git/issues/1063

Thanks for pointing it out. I didn't get, though, whether your
suggestion was to also tackle this issue in this GSoC
project. Was it that? I read the link but it seems to be a problem
unrelated to what I'm planning to do with the pack access
code (which is thread-safety). I may have misunderstood it,
though. Please let me know if that's the case :)

> Mainly it is just substitution of size_t for long, but there can be
> unexpected coercions when mixed data types get coerced down to a local
> 32-bit long. This is made worse by it being implementation defined, so
> one needs to be explicit about some casts up to pointer/memsized types.
> >>> # Points to work on
> >>>
> >>> * Investigate pack access call chains and look for non-thread-safe
> >>> operations on then.
> >>> * Protect packfile.c read-and-write global variables, such as
> >>> pack_open_windows, pack_open_fds and etc., using mutexes.
> >> Do you want to work on making both packfile reading and packfile
> >> writing thread safe? Or just packfile reading?
> > Packfile writing is probably already or pretty close to thread-safe
> > (at least the main writing code path in git-pack-objects; the
> > streaming blobs to a pack, i'm not so sure).
> --
> Philip

* Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
  2019-04-08 17:04       ` Matheus Tavares Bernardino
@ 2019-04-08 19:19         ` Philip Oakley
  2019-04-08 19:36           ` Matheus Tavares Bernardino
  0 siblings, 1 reply; 13+ messages in thread
From: Philip Oakley @ 2019-04-08 19:19 UTC (permalink / raw)
  To: Matheus Tavares Bernardino
  Cc: Duy Nguyen, Christian Couder, git, Thomas Gummerer,
	Оля Тележная,
	Elijah Newren, Tanushree Tumane, Torsten Bögershausen

Hi Matheus

On 08/04/2019 18:04, Matheus Tavares Bernardino wrote:
>> Another "32-bit problem" should also be expressly considered during the
>> GSoC work because of the MS Windows definition of uInt / long to be only
>> 32 bits, leading to much of the Git code failing on the Git for Windows
>> port and on the Git LFS (for Windows) for packs and files greater than
>> 4Gb. https://github.com/git-for-windows/git/issues/1063

> Thanks for pointing it out. I didn't get, though, whether your
> suggestion was to also tackle this issue in this GSoC
> project. Was it that? I read the link but it seems to be a problem
> unrelated to what I'm planning to do with the pack access
> code (which is thread-safety). I may have misunderstood it,
> though. Please let me know if that's the case :)
> 
The main point was to avoid accidental regressions by re-introducing 
simple 'longs' where memsized types were more appropriate.

Torsten has already done a lot of work at 
https://github.com/tboegi/git/tree/tb.190402_1552_convert_size_t_only_git_master_181124_mk_size_t

HTH
Philip
(I'm off line for a few days)

* Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
  2019-04-08 19:19         ` Philip Oakley
@ 2019-04-08 19:36           ` Matheus Tavares Bernardino
  2019-04-09  5:54             ` Torsten Bögershausen
  0 siblings, 1 reply; 13+ messages in thread
From: Matheus Tavares Bernardino @ 2019-04-08 19:36 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Duy Nguyen, Christian Couder, git, Thomas Gummerer,
	Оля Тележная,
	Elijah Newren, Tanushree Tumane, Torsten Bögershausen

On Mon, Apr 8, 2019 at 4:19 PM Philip Oakley <philipoakley@iee.org> wrote:
>
> Hi Matheus
>
> On 08/04/2019 18:04, Matheus Tavares Bernardino wrote:
> >> Another "32-bit problem" should also be expressly considered during the
> >> GSoC work because of the MS Windows definition of uInt / long to be only
> >> 32 bits, leading to much of the Git code failing on the Git for Windows
> >> port and on the Git LFS (for Windows) for packs and files greater than
> >> 4Gb. https://github.com/git-for-windows/git/issues/1063
>
> > Thanks for pointing it out. I didn't get, though, whether your
> > suggestion was to also tackle this issue in this GSoC
> > project. Was it that? I read the link but it seems to be a problem
> > unrelated to what I'm planning to do with the pack access
> > code (which is thread-safety). I may have misunderstood it,
> > though. Please let me know if that's the case :)
> >
> The main point was to avoid accidental regressions by re-introducing
> simple 'longs' where memsized types were more appropriate.
>
> Torsten has already done a lot of work at
> https://github.com/tboegi/git/tree/tb.190402_1552_convert_size_t_only_git_master_181124_mk_size_t

Got it. Thanks, Philip!

> HTH
> Philip
> (I'm off line for a few days)

* Re: [GSoC][RFC] Proposal: Make pack access code thread-safe
  2019-04-08 19:36           ` Matheus Tavares Bernardino
@ 2019-04-09  5:54             ` Torsten Bögershausen
  0 siblings, 0 replies; 13+ messages in thread
From: Torsten Bögershausen @ 2019-04-09  5:54 UTC (permalink / raw)
  To: Matheus Tavares Bernardino, Philip Oakley
  Cc: Duy Nguyen, Christian Couder, git, Thomas Gummerer,
	Оля Тележная,
	Elijah Newren, Tanushree Tumane

On 2019-04-08 21:36, Matheus Tavares Bernardino wrote:
> On Mon, Apr 8, 2019 at 4:19 PM Philip Oakley <philipoakley@iee.org> wrote:
>>
>> Hi Matheus
>>
>> On 08/04/2019 18:04, Matheus Tavares Bernardino wrote:
>>>> Another "32-bit problem" should also be expressly considered during the
>>>> GSoC work because of the MS Windows definition of uInt / long to be only
>>>> 32 bits, leading to much of the Git code failing on the Git for Windows
>>>> port and on the Git LFS (for Windows) for packs and files greater than
>>>> 4Gb. https://github.com/git-for-windows/git/issues/1063
>>
>>> Thanks for pointing it out. I didn't get, though, whether your
>>> suggestion was to also tackle this issue in this GSoC
>>> project. Was it that? I read the link but it seems to be a problem
>>> unrelated to what I'm planning to do with the pack access
>>> code (which is thread-safety). I may have misunderstood it,
>>> though. Please let me know if that's the case :)
>>>
>> The main point was to avoid accidental regressions by re-introducing
>> simple 'longs' where memsized types were more appropriate.
>>
>> Torsten has already done a lot of work at
>> https://github.com/tboegi/git/tree/tb.190402_1552_convert_size_t_only_git_master_181124_mk_size_t
>
> Got it. Thanks, Philip!
>
>> HTH
>> Philip
>> (I'm off line for a few days)

Thanks for the reminder -
I will probably send something out in the next few days or weeks.

end of thread, other threads:[~2019-04-09  5:54 UTC | newest]

Thread overview: 13+ messages
2019-04-07 20:48 [GSoC][RFC] Proposal: Make pack access code thread-safe Matheus Tavares Bernardino
2019-04-07 22:52 ` Christian Couder
2019-04-08  1:23   ` Duy Nguyen
2019-04-08  3:32     ` Duy Nguyen
2019-04-08  6:58       ` Christian Couder
2019-04-08 16:03         ` Matheus Tavares Bernardino
2019-04-08 15:58       ` Matheus Tavares Bernardino
2019-04-08  9:26     ` Philip Oakley
2019-04-08 17:04       ` Matheus Tavares Bernardino
2019-04-08 19:19         ` Philip Oakley
2019-04-08 19:36           ` Matheus Tavares Bernardino
2019-04-09  5:54             ` Torsten Bögershausen
2019-04-08 16:42   ` Matheus Tavares Bernardino
