git@vger.kernel.org mailing list mirror (one of many)
From: Jonathan Nieder <jrnieder@gmail.com>
To: Ramkumar Ramachandra <artagnon@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	David Michael Barr <david.barr@cordelta.com>,
	Sverre Rabbelier <srabbelier@gmail.com>
Subject: [PATCH/WIP 00/16] svn delta applier
Date: Sun, 10 Oct 2010 21:34:36 -0500
Message-ID: <20101011023435.GA706@burratino>
In-Reply-To: <20100810125317.GB3921@kytes>

Hi,

The svndiff format has proved more difficult to parse than expected.
This series documents the current state of things, and though it is
not complete, it should be ready for nitpicking by the masses.

Patches 1-4 modify the line_buffer API by introducing a struct
line_buffer to collect state that was previously held in global
variables.  Callers can use multiple line_buffers to manage input from
multiple files at a time.
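
To illustrate the idea (not the actual code in the series), here is a
minimal sketch in which all per-stream state lives in a struct, so
two inputs can be read side by side.  The names sketch_line_buffer,
sketch_lb_init and sketch_lb_read_line and the buffer size are
made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_LINE_LEN 1024

/* Hypothetical layout: state that used to be global is per-stream. */
struct sketch_line_buffer {
	char line[SKETCH_LINE_LEN];
	FILE *in;
};

/* Attach an already-open stream; the caller keeps ownership of it. */
static void sketch_lb_init(struct sketch_line_buffer *buf, FILE *in)
{
	assert(buf && in);
	buf->in = in;
	buf->line[0] = '\0';
}

/*
 * Return the next line without its trailing newline, or NULL at end
 * of input.  Lines longer than the buffer are returned in pieces.
 */
static char *sketch_lb_read_line(struct sketch_line_buffer *buf)
{
	size_t len;

	if (!fgets(buf->line, sizeof(buf->line), buf->in))
		return NULL;
	len = strlen(buf->line);
	if (len && buf->line[len - 1] == '\n')
		buf->line[len - 1] = '\0';
	return buf->line;
}
```

Since nothing is shared between two such structs, a caller can
interleave reads from a dump file and a delta without the two
clobbering each other's line storage.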

Patches 5-10 add various utility functions to the line_buffer API
(wrapping strbuf_fread(), fgetc(), etc.).  Putting the helpers there
instead of having callers work with the FILE* directly means one
could easily

 - tweak the input stream (to insert "link: " at the beginning
   for symlinks?);
 - trace reads, for debugging; or
 - use read() directly in place of stdio and limit the number of bytes
   buffered

if one wants to.
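
As a rough illustration of the "trace reads" case, the sketch below
wraps fgetc() so that a byte count and optional trace output can be
added in one place.  The struct sketch_input and the function
sketch_read_char are hypothetical and are not part of the
line_buffer API in this series.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper state; "trace" may be NULL to disable tracing. */
struct sketch_input {
	FILE *in;
	FILE *trace;
	unsigned long nread;	/* bytes successfully read so far */
};

/* Like fgetc(), but counts bytes and optionally logs each one. */
static int sketch_read_char(struct sketch_input *s)
{
	int ch;

	assert(s && s->in);
	ch = fgetc(s->in);
	if (ch != EOF) {
		s->nread++;
		if (s->trace)
			fprintf(s->trace, "read byte 0x%02x\n", (unsigned)ch);
	}
	return ch;
}
```

Callers that only ever go through such a helper would not notice if
it later switched from stdio to read().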

Patch 11 adds a data structure and function to manage a "sliding
window" without using mmap() or fseek().  See the svndiff0 spec[1] for
how this would be used.
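
One way such a window could work on a purely sequential stream is
sketched below.  This is an assumption-laden illustration, not the
code from patch 11: the names sketch_window and sketch_window_move
are invented, and the window is simply allowed to hold more bytes
than requested when the new range ends before data already read.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical state: buf holds stream bytes [off, off + len). */
struct sketch_window {
	FILE *in;
	size_t off;
	size_t len;
	size_t cap;
	char *buf;
};

/*
 * Make the window cover at least [off, off + len) of the stream.
 * Returns 0 on success and -1 if asked to move backward, if memory
 * runs out, or if the input ends early.  After a failure the window
 * should not be used again.
 */
static int sketch_window_move(struct sketch_window *w, size_t off, size_t len)
{
	size_t end, keep = 0;

	assert(w && w->in);
	end = w->off + w->len;	/* stream position of next unread byte */
	if (off < w->off)
		return -1;	/* no fseek(): cannot go back */
	if (off < end) {
		/* Overlap: slide the still-needed tail to the front. */
		keep = end - off;
		memmove(w->buf, w->buf + (off - w->off), keep);
	} else {
		/* Gap: read and discard the bytes in between. */
		size_t skip;
		for (skip = off - end; skip; skip--)
			if (fgetc(w->in) == EOF)
				return -1;
	}
	w->off = off;
	w->len = keep;
	if (len > w->cap) {
		char *p = realloc(w->buf, len);
		if (!p)
			return -1;
		w->buf = p;
		w->cap = len;
	}
	if (len > w->len) {
		size_t want = len - w->len;
		size_t got = fread(w->buf + w->len, 1, want, w->in);
		w->len += got;
		if (got < want)
			return -1;
	}
	return 0;
}
```

Only fgetc() and fread() are used, so the input can be a pipe.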

Patches 12 and 13 are some basic components for reading an svndiff0
file: reading variable-length integers and the opening magic bytes.
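
For readers unfamiliar with the format, a small sketch of both
pieces follows.  It assumes the encoding described in the svndiff
notes[1]: integers are stored seven bits per byte, most significant
group first, with the high bit set on every byte except the last.
The function names sketch_read_magic and sketch_read_int are
hypothetical and the error handling is simplified.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Return 0 if the next four bytes are "SVN\0", -1 otherwise. */
static int sketch_read_magic(FILE *in)
{
	static const char magic[4] = { 'S', 'V', 'N', '\0' };
	char buf[4];

	assert(in);
	if (fread(buf, 1, sizeof(buf), in) != sizeof(buf))
		return -1;
	return memcmp(buf, magic, sizeof(magic)) ? -1 : 0;
}

/*
 * Read one variable-length integer into *result.
 * Returns 0 on success, -1 on truncated input or overflow.
 */
static int sketch_read_int(FILE *in, uint64_t *result)
{
	uint64_t val = 0;
	int ch;

	assert(in && result);
	while ((ch = fgetc(in)) != EOF) {
		if (val >> 57)
			return -1;	/* another 7-bit shift would lose bits */
		val = (val << 7) | (uint64_t)(ch & 0x7f);
		if (!(ch & 0x80)) {
			/* High bit clear: this was the last byte. */
			*result = val;
			return 0;
		}
	}
	return -1;	/* input ended in the middle of an integer */
}
```

With this encoding the two bytes 0x81 0x02 decode to 130.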

Patch 14 adds a compat helper for detecting unsigned overflow.

Patch 15 makes the svn-fe test usable on systems (like Ram's) without
libsvn-perl installed.  It also should make the test easier to read
for people unfamiliar with lib-git-svn.sh.

Patch 16 is the delta parser/applier.  This patch does _not_ add it to
contrib/svn-fe, even though that would be useful, since the
command-line interface is not set in stone yet.  If you want to try it
out, use the test-svn-fe command:

	test-svn-fe -d <preimage> <delta> <delta length>

The preimage or delta arg can be /dev/stdin for use in a pipeline.
Both are only read sequentially; they do not need to be regular files.

One of the test cases is enormous.  The svn delta lib doesn't use
multiple windows except when dealing with relatively big files, but
probably the test case should be replaced with a smaller, artificial
example.

One of the test cases does not pass.  I also don't know how to apply
the delta by hand --- it seems to have some extra bytes at the end. :(
Unfortunately the svndiff0 spec is not as clear about when to stop
reading as one might like.

The code separately maintains nominal and actual lengths for a few
buffers, since truncated input is permitted (and even required) in the
deltas svn produces, though the svndiff0 spec does not document the
semantics of that.

For svn-fe changes to take advantage of this code to handle the
dumpfilev3 format, see <git://github.com/barrbrain/git.git>[2].  So
now the full svnrdump | svn-fe | fast-import pipeline can be
experienced.  It still chokes on some deltas in the wild.

Thoughts, cleanups, test cases, bug reports, improvements welcome. :)

Enjoy,
Jonathan Nieder (15):
  vcs-svn: Eliminate global byte_buffer[] array
  vcs-svn: Replace buffer_read_string()'s memory pool with a strbuf
  vcs-svn: Collect line_buffer data in a struct
  vcs-svn: Teach line_buffer to handle multiple input files
  vcs-svn: Make buffer_skip_bytes() report partial reads
  vcs-svn: Better support for reading large files
  vcs-svn: Add binary-safe read() function
  vcs-svn: Let callers peek ahead to find stream end
  vcs-svn: Allow input errors to be detected early
  vcs-svn: Allow character-oriented input
  vcs-svn: Add code to maintain a sliding view of a file
  vcs-svn: Learn to parse variable-length integers
  vcs-svn: Learn to check for SVN\0 magic
  compat: helper for detecting unsigned overflow
  vcs-svn: Add svn delta parser

Ramkumar Ramachandra (1):
  t9010 (svn-fe): Eliminate dependency on svn perl bindings

 Makefile                 |    5 +-
 vcs-svn/line_buffer.txt  |    8 +-
 vcs-svn/fast_export.c    |    6 +-
 vcs-svn/fast_export.h    |    5 +-
 vcs-svn/line_buffer.c    |   99 +-
 vcs-svn/line_buffer.h    |   29 +-
 vcs-svn/sliding_window.c |   65 +
 vcs-svn/sliding_window.h |   14 +
 vcs-svn/svndiff.c        |  344 +
 vcs-svn/svndiff.h        |    9 +
 vcs-svn/svndump.c        |   29 +-
 vcs-svn/LICENSE          |    2 +
 git-compat-util.h        |    6 +
 test-line-buffer.c       |   17 +-
 test-svn-fe.c            |   37 +-
 t/t9010-svn-fe.sh        |   29 +-
 t/t9010/Xerces.cpp.diff0 |  Bin 0 -> 12185 bytes
 t/t9010/Xerces.cpp.done  |54963 +++++++++++++++++++++++++++++++++++++++++++++
 t/t9010/Xerces.cpp.src   |55052 ++++++++++++++++++++++++++++++++++++++++++++++
 t/t9010/newdata.diff0    |  Bin 0 -> 19392 bytes
 t/t9010/newdata.done     |  522 +
 t/t9010/src.diff0        |  Bin 0 -> 74 bytes
 t/t9010/src.done         |  522 +
 23 files changed, 111677 insertions(+), 86 deletions(-)
 create mode 100644 vcs-svn/sliding_window.c
 create mode 100644 vcs-svn/sliding_window.h
 create mode 100644 vcs-svn/svndiff.c
 create mode 100644 vcs-svn/svndiff.h
 create mode 100644 t/t9010/Xerces.cpp.diff0
 create mode 100644 t/t9010/Xerces.cpp.done
 create mode 100644 t/t9010/Xerces.cpp.src
 create mode 100644 t/t9010/blank.done
 create mode 100644 t/t9010/newdata.diff0
 create mode 100644 t/t9010/newdata.done
 create mode 100644 t/t9010/src.diff0
 create mode 100644 t/t9010/src.done

[1] http://svn.apache.org/repos/asf/subversion/trunk/notes/svndiff
[2] And some design notes:
http://thread.gmane.org/gmane.comp.version-control.git/150005/focus=157119
