git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Nieder <jrnieder@gmail.com>
To: git@vger.kernel.org
Cc: David Barr <david.barr@cordelta.com>,
	Thomas Rast <trast@student.ethz.ch>,
	Ramkumar Ramachandra <artagnon@gmail.com>
Subject: [PATCH 0/4] teach vcs-svn/line_buffer to handle multiple input files
Date: Fri, 24 Dec 2010 02:05:05 -0600	[thread overview]
Message-ID: <20101224080505.GA29681@burratino> (raw)

Hi David et al,

This collection of patches comes from the svndiff0 series[1].  They
are not urgent --- the motivation is for svn-fe to be able to keep
separate track of input from stdin (svnrdump) and the report-fd (blobs
from fast-import) for the coming Text-delta support --- but ideally I
would like to see them applied at the start of the next merge window,
since they change API that other patches use.  See [*] below for the
open question.

The idea: instead of keeping the input file handle and input buffer as
global variables, pack them in a struct and let the calling program
keep track of them.

Patch 1 makes a previously global buffer local to the two functions
that use it.  Performance impact should be negligibile.  Ideally the
buffer would not be needed at all --- there is enough buffering at
lower layers already --- but stdio does not provide the calls that
would be needed to eliminate it (in particular a wrapper for
sendfile(2)).

Patch 2 replaces a use of the obj_pool library with a strbuf.  The
main immediate effect is to improve error handling behavior (more
importantly, this is needed for patches 3 and 4 since obj_pool is
defined to be global).

Patches 3 and 4 are the main patches, collecting input state in a
struct and moving resposibility for that struct to the calling
program, respectively.

The patches have already received a lot of testing.

[*]
I am not sure whether this is the right approach for reading from the
report-fd.  To avoid deadlock, we cannot issue a blocking read(2)
after the trailing newline has been read from an expected line or the
nth byte has been read in fixed-length input.  This would rule out
fread/fgets if implemented as follows with too large an internal
buffer:

 1. fill internal buffer completely (or stop when an error or
    end of file is encountered)
 2. fill caller's buffer from internal buffer

glibc fread/fgets do not work that way (and in fact will never
deadlock for us).  What about other platforms?  The standards (C,
POSIX) do not make it obvious.

 - maybe setvbuf(f, NULL, _IOLBF, 0) would make fgets safe to use
 - maybe setvbuf(f, NULL, _IONBF, 0) would make fread safe to use.
   On the other hand, it is not clear to me what unbuffered input
   means in this context.  That flag does not do anything meaningful
   on glibc, for example, for input streams.

If all else fails, setting the O_NONBLOCK flag with fcntl (this
could presumably be implemented as SetNamedPipeHandleState(...,
PIPE_NOWAIT) on Windows) would avoid trouble.  After any failing
operation we would have to check that the error is EAGAIN and
clear the error indicator.  Simple.  But it would be even better
to learn that that is not needed.

So far I have been playing it safe with the read(fd, buf, 1) trick
but that does not have great performance, as David noticed[2].

Thoughts?

Jonathan Nieder (4):
  vcs-svn: Eliminate global byte_buffer[] array
  vcs-svn: Replace buffer_read_string memory pool with a strbuf
  vcs-svn: Collect line_buffer data in a struct
  vcs-svn: Teach line_buffer to handle multiple input files

 test-line-buffer.c      |   17 ++++++-----
 vcs-svn/fast_export.c   |    6 ++--
 vcs-svn/fast_export.h   |    5 +++-
 vcs-svn/line_buffer.c   |   66 ++++++++++++++++++++--------------------------
 vcs-svn/line_buffer.h   |   25 +++++++++++++-----
 vcs-svn/line_buffer.txt |    5 ++-
 vcs-svn/svndump.c       |   29 +++++++++++---------
 7 files changed, 82 insertions(+), 71 deletions(-)

[1] http://thread.gmane.org/gmane.comp.version-control.git/158731
[2] http://colabti.org/irclogger/irclogger_log/git-devel?date=2010-12-18

             reply	other threads:[~2010-12-24  8:08 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-24  8:05 Jonathan Nieder [this message]
2010-12-24  8:08 ` [PATCH 1/4] vcs-svn: eliminate global byte_buffer Jonathan Nieder
2010-12-24  8:17 ` [PATCH 2/4] vcs-svn: replace buffer_read_string memory pool with a strbuf Jonathan Nieder
2010-12-24  8:18 ` [PATCH 3/4] vcs-svn: collect line_buffer data in a struct Jonathan Nieder
2010-12-24  8:28 ` [PATCH 4/4] vcs-svn: teach line_buffer to handle multiple input files Jonathan Nieder
2011-01-03  0:49 ` [PATCH 0/4] teach vcs-svn/line_buffer " Jonathan Nieder
2011-01-03  0:50   ` [PATCH 5/8] vcs-svn: make test-line-buffer input format more flexible Jonathan Nieder
2011-01-03  0:51   ` [PATCH 6/8] tests: give vcs-svn/line_buffer its own test script Jonathan Nieder
2011-01-03  0:52   ` [PATCH 7/8] vcs-svn: tweak test-line-buffer to not assume line-oriented input Jonathan Nieder
2011-01-03  1:07   ` [PATCH 8/8] t0081 (line-buffer): add buffering tests Jonathan Nieder
2011-01-03  1:34     ` Jonathan Nieder
2011-01-03  3:03   ` [PATCHES 9-12/12] line_buffer: more wrappers around stdio functions Jonathan Nieder
2011-01-03  3:05     ` [PATCH 09/12] vcs-svn: add binary-safe read function Jonathan Nieder
2011-01-03  3:06     ` [PATCH 10/12] vcs-svn: allow character-oriented input Jonathan Nieder
2011-01-03  3:09     ` [PATCH 11/12] vcs-svn: allow input from file descriptor Jonathan Nieder
2011-01-03  3:10     ` [PATCH 12/12] vcs-svn: teach line_buffer about temporary files Jonathan Nieder
2011-01-22  6:42       ` [FYI/PATCH] vcs-svn: give control over temporary file names Jonathan Nieder
2011-02-26 11:44 ` [PULL svn-fe] fast-import 'ls', line-buffer changes Jonathan Nieder
2011-02-26 12:03   ` David Michael Barr
2011-02-28  6:15   ` Junio C Hamano
2011-02-28 21:32     ` [PATCH svn-fe] fast-import: make code "-Wpointer-arith" clean Jonathan Nieder
2011-02-28 21:36       ` Sverre Rabbelier
2011-02-28 22:05         ` Junio C Hamano
2011-02-28 23:15       ` Jonathan Nieder
2011-03-01  0:41         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101224080505.GA29681@burratino \
    --to=jrnieder@gmail.com \
    --cc=artagnon@gmail.com \
    --cc=david.barr@cordelta.com \
    --cc=git@vger.kernel.org \
    --cc=trast@student.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).