From: Zachary Turner <zturner@chromium.org>
To: Karsten Blees <karsten.blees@gmail.com>
Cc: Stefan Zager <szager@google.com>, Git Mailing List <git@vger.kernel.org>
Subject: Re: Make the git codebase thread-safe
Date: Fri, 14 Feb 2014 11:16:50 -0800
Message-ID: <CAAErz9j=_FpWLSyUk43pp8A6e7Ej0crT8ghW5-yxBEbGkd6O+A@mail.gmail.com>
In-Reply-To: <CAAErz9g7ND1htfk=yxRJJLbSEgBi4EV_AHC9uDRptugGWFWcXw@mail.gmail.com>

(Gah, sorry if you're receiving multiple emails to your personal
addresses, I need to get used to manually setting Plain-text mode
every time I send a message).

For the mixed read, we wouldn't be looking for another caller of
pread() (since it doesn't care what the file pointer is), but instead
a caller of read() or lseek() (since those do depend on the current
file pointer).  In index-pack.c, I see two possible culprits:

1) A call to xread() from inside fill()
2) A call to lseek() in parse_pack_objects()

Do you think these could be related?  If so, maybe that opens up some
other solutions?
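
To illustrate the distinction, here is a minimal POSIX sketch (not
Windows, and not code from index-pack.c; the helper name
offsets_after_reads() is made up for this example).  It shows that
pread() leaves the file pointer alone while read() advances it, which is
why only read()/lseek() callers can be affected by a pread() emulation
that moves the pointer:

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: reports the file pointer of a scratch file
 * after a pread() and then after a read().  Returns 0 on success.
 */
int offsets_after_reads(off_t *after_pread, off_t *after_read)
{
	char path[] = "/tmp/pread-demo-XXXXXX";
	char buf[4];
	int ret = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);	/* the open fd keeps the file alive */

	if (write(fd, "0123456789abcdef", 16) != 16)
		goto out;
	if (lseek(fd, 2, SEEK_SET) != 2)
		goto out;

	/* pread() takes an explicit offset and ignores the file pointer */
	if (pread(fd, buf, sizeof(buf), 8) != (ssize_t)sizeof(buf))
		goto out;
	*after_pread = lseek(fd, 0, SEEK_CUR);

	/* read() starts at the file pointer and advances it */
	if (read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
		goto out;
	*after_read = lseek(fd, 0, SEEK_CUR);
	ret = 0;
out:
	close(fd);
	return ret;
}
```

With the pointer parked at offset 2, it is still 2 after the pread() and
6 after the 4-byte read().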

BTW, the version you posted isn't thread safe.  Suppose thread A and
thread B execute this function at the same time.  A executes through
the ReadFile(), but has not yet reached the second lseek64() that
restores the file pointer.  B then executes the first lseek64(),
saving the file pointer that A's ReadFile() has already moved.  A then
finishes and restores the original position, and B finishes and
restores the modified one.  At the end, the file pointer is still
modified.
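
To make that interleaving concrete, here is a single-threaded POSIX
sketch that replays the steps in the problematic order.  It is only a
simulation under stated assumptions: lseek()+read() stands in for a
ReadFile() that moves the handle's file position (as reported earlier in
this thread), and all names (emu_save, emu_read, emu_restore,
final_offset_after_race) are invented for the example:

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <unistd.h>

/* State of one in-flight call of the save/read/restore emulation. */
struct emu_call { off_t saved; };

/* Step 1: remember the current file pointer (the first lseek64). */
static void emu_save(int fd, struct emu_call *c)
{
	c->saved = lseek(fd, 0, SEEK_CUR);
}

/* Step 2: a positioned read that moves the file pointer as a side effect. */
static int emu_read(int fd, off_t offset, size_t count)
{
	char buf[16];

	if (count > sizeof(buf) || lseek(fd, offset, SEEK_SET) < 0)
		return -1;
	return read(fd, buf, count) == (ssize_t)count ? 0 : -1;
}

/* Step 3: put the file pointer back (the second lseek64). */
static void emu_restore(int fd, const struct emu_call *c)
{
	lseek(fd, c->saved, SEEK_SET);
}

/* Replays calls A and B in the racy order; returns the final file pointer. */
off_t final_offset_after_race(off_t start)
{
	char path[] = "/tmp/race-demo-XXXXXX";
	char data[100] = { 0 };
	struct emu_call a, b;
	off_t end = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);
	if (write(fd, data, sizeof(data)) != (ssize_t)sizeof(data) ||
	    lseek(fd, start, SEEK_SET) != start)
		goto out;

	emu_save(fd, &a);		/* A saves the original position */
	if (emu_read(fd, 50, 8))	/* A reads; pointer is now 58 */
		goto out;
	emu_save(fd, &b);		/* B saves the already-moved pointer */
	emu_restore(fd, &a);		/* A restores the original position */
	if (emu_read(fd, 70, 8))	/* B reads; pointer is now 78 */
		goto out;
	emu_restore(fd, &b);		/* B "restores" A's leftover position */
	end = lseek(fd, 0, SEEK_CUR);
out:
	close(fd);
	return end;
}
```

Starting from offset 10, the final position is 58 (where A's read left
the pointer) rather than 10, so a later read() on the same fd would
start in the wrong place.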

On Fri, Feb 14, 2014 at 11:15 AM, Zachary Turner <zturner@chromium.org> wrote:
> For the mixed read, we wouldn't be looking for another caller of pread()
> (since it doesn't care what the file pointer is), but instead a caller of
> read() or lseek().  In index-pack.c, I see two possible culprits:
>
> 1) A call to xread() from inside fill()
> 2) A call to lseek in parse_pack_objects()
>
> Do you think these could be related?  If so, maybe that opens up some other
> solutions?
>
> BTW, the version you posted isn't thread safe.  Suppose thread A and thread
> B execute this function at the same time.  A executes through the
> ReadFile(), but does not yet reset the second lseek64.  B then executes the
> first lseek64(), storing off the modified file pointer.  Then A finishes,
> then B finishes.  At the end, the file pointer is still modified.
>
>
>
> On Fri, Feb 14, 2014 at 11:04 AM, Karsten Blees <karsten.blees@gmail.com>
> wrote:
>>
>> Am 14.02.2014 00:09, schrieb Zachary Turner:
>> > To elaborate a little bit more, you can verify with a sample program
>> > that ReadFile with OVERLAPPED does in fact modify the HANDLE's file
>> > position.  The documentation doesn't actually state one way or
>> > another.   My original attempt at a patch didn't have the ReOpenFile,
>> > and we experienced regular read corruption.  We scratched our heads
>> > over it for a bit, and then hypothesized that someone must be mixing
>> > read styles, which led to this ReOpenFile workaround, which
>> > incidentally also solved the corruption problems.  We wrote a similar
>> > sample program to verify that when using ReOpenFile, and changing
>> > the file pointer of the duplicated handle, that the file pointer of
>> > the original handle is not modified.
>> >
>> > We did not actually try to identify the source of the mixed read
>> > styles, but it seems like the only possible explanation.
>> >
>> > On Thu, Feb 13, 2014 at 2:53 PM, Stefan Zager <szager@google.com> wrote:
>> >> On Thu, Feb 13, 2014 at 2:51 PM, Karsten Blees
>> >> <karsten.blees@gmail.com> wrote:
>> >>> Am 13.02.2014 19:38, schrieb Zachary Turner:
>> >>>
>> >>>> The only reason ReOpenFile is necessary at
>> >>>> all is because some code somewhere is mixing read-styles against the
>> >>>> same
>> >>>> fd.
>> >>>>
>> >>>
>> >>> I don't understand...ReadFile with OVERLAPPED parameter doesn't modify
>> >>> the HANDLE's file position, so you should be able to mix read()/pread()
>> >>> however you like (as long as read() is only called from one thread).
>> >>
>> >> That is, apparently, a bald-faced lie in the ReadFile API doc.  First
>> >> implementation didn't use ReOpenFile, and it crashed all over the
>> >> place.  ReOpenFile fixed it.
>> >>
>> >> Stefan
>>
>> Damn...you're right, multi-threaded git-index-pack works fine, but some
>> tests fail badly. Mixed reads would have to be from git_mmap, which is the
>> only other caller of pread().
>>
>> A simple alternative to ReOpenFile is to reset the file pointer to its
>> original position, as in compat/pread.c::git_pread. Thus single-threaded code
>> can mix read()/pread() at will, but multi-threaded code has to use pread()
>> exclusively (which is usually the case anyway). A main thread using read()
>> and background threads using pread() (which is technically allowed by POSIX)
>> will fail with this solution.
>>
>> This version passes the test suite on msysgit:
>>
>> ----8<----
>> ssize_t mingw_pread(int fd, void *buf, size_t count, off64_t offset)
>> {
>>         DWORD bytes_read;
>>         OVERLAPPED overlapped;
>>         off64_t current;
>>         memset(&overlapped, 0, sizeof(overlapped));
>>         overlapped.Offset = (DWORD) offset;
>>         overlapped.OffsetHigh = (DWORD) (offset >> 32);
>>
>>         current = lseek64(fd, 0, SEEK_CUR);
>>
>>         if (!ReadFile((HANDLE)_get_osfhandle(fd), buf, count, &bytes_read, &overlapped)) {
>>                 errno = err_win_to_posix(GetLastError());
>>                 return -1;
>>         }
>>
>>         lseek64(fd, current, SEEK_SET);
>>
>>         return (ssize_t) bytes_read;
>> }
>>
>
