git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Han Xin <chiyutianyi@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Git List" <git@vger.kernel.org>, "Jeff King" <peff@peff.net>,
	"Jiang Xin" <zhiyou.jx@alibaba-inc.com>,
	"Philip Oakley" <philipoakley@iee.email>,
	"Derrick Stolee" <stolee@gmail.com>,
	"René Scharfe" <l.s.r@web.de>,
	"Jiang Xin" <worldhello.net@gmail.com>,
	"Han Xin" <hanxin.hx@alibaba-inc.com>
Subject: Re: [PATCH v9 0/5] unpack large blobs in stream
Date: Tue, 01 Feb 2022 22:24:46 +0100	[thread overview]
Message-ID: <220201.86wnie8eg0.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <20220120112114.47618-1-chiyutianyi@gmail.com>


On Thu, Jan 20 2022, Han Xin wrote:

> From: Han Xin <hanxin.hx@alibaba-inc.com>
>
> Changes since v8:
> * Rename "assert_no_loose ()" into "test_no_loose ()" in
>   "t5329-unpack-large-objects.sh". Remove "assert_no_pack ()" and use
>   "test_dir_is_empty" instead.
>
> * Revert changes to "create_tmpfile()" and error handling is now in
>   "start_loose_object_common()".
>
> * Remove "finalize_object_file_with_mtime()" which seems to be an overkill
>   for "write_loose_object()" now. 
>
> * Remove the commit "object-file.c: remove the slash for directory_size()",
>   it can be in a separate patch if necessary.
>
> Han Xin (4):
>   unpack-objects: low memory footprint for get_data() in dry_run mode
>   object-file.c: refactor write_loose_object() to several steps
>   object-file.c: add "stream_loose_object()" to handle large object
>   unpack-objects: unpack_non_delta_entry() read data in a stream
>
> Ævar Arnfjörð Bjarmason (1):
>   object-file API: add a format_object_header() function

I sent
https://lore.kernel.org/git/cover-00.10-00000000000-20220201T144803Z-avarab@gmail.com/
today which suggests splitting out the 5/5 cleanup you'd integrated.

I then rebased these patches of yours on top of that, the result is
here:
https://github.com/avar/git/tree/han-xin-avar/unpack-loose-object-streaming-9

The range-diff to your version is below. There's a few unrelated
fixes/nits in it.

I think with/without basing this on top of my series above your patches
here look good with the nits pointed out in the diff below addressed
(and some don't need to be). I.e. the dependency on it is rather
trivial, and the two could be split up.

What do you think is a good way to proceed? I could just submit the
below as a proposed v10 if you'd like & agree...

1:  553a9377eb3 ! 1:  61fcfe7b840 unpack-objects: low memory footprint for get_data() in dry_run mode
    @@ Commit message
         unpack-objects: low memory footprint for get_data() in dry_run mode
     
         As the name implies, "get_data(size)" will allocate and return a given
    -    size of memory. Allocating memory for a large blob object may cause the
    +    amount of memory. Allocating memory for a large blob object may cause the
         system to run out of memory. Before preparing to replace calling of
         "get_data()" to unpack large blob objects in latter commits, refactor
         "get_data()" to reduce memory footprint for dry_run mode.
    @@ Commit message
     
         Suggested-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
         Signed-off-by: Han Xin <hanxin.hx@alibaba-inc.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## builtin/unpack-objects.c ##
     @@ builtin/unpack-objects.c: static void use(int bytes)
    @@ t/t5328-unpack-large-objects.sh (new)
     +
     +test_no_loose () {
     +	glob=dest.git/objects/?? &&
    -+	echo "$glob" >expect &&
    -+	eval "echo $glob" >actual &&
    ++	echo $glob >expect &&
    ++	echo "$glob" >actual &&
     +	test_cmp expect actual
     +}
     +
-:  ----------- > 2:  c6b0437db03 object-file.c: do fsync() and close() before post-write die()
2:  88c91affd61 ! 3:  77bcfe3da6f object-file.c: refactor write_loose_object() to several steps
    @@ Commit message
         When writing a large blob using "write_loose_object()", we have to pass
         a buffer with the whole content of the blob, and this behavior will
         consume lots of memory and may cause OOM. We will introduce a stream
    -    version function ("stream_loose_object()") in latter commit to resolve
    +    version function ("stream_loose_object()") in later commit to resolve
         this issue.
     
    -    Before introducing a stream vesion function for writing loose object,
    -    do some refactoring on "write_loose_object()" to reuse code for both
    -    versions.
    +    Before introducing that streaming function, do some refactoring on
    +    "write_loose_object()" to reuse code for both versions.
     
         Rewrite "write_loose_object()" as follows:
     
    @@ Commit message
     
          3. Compress data.
     
    -     4. Move common steps for ending zlib stream into a new funciton
    +     4. Move common steps for ending zlib stream into a new function
             "end_loose_object_common()".
     
          5. Close fd and finalize the object file.
    @@ Commit message
         Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
         Helped-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
         Signed-off-by: Han Xin <hanxin.hx@alibaba-inc.com>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## object-file.c ##
     @@ object-file.c: static int create_tmpfile(struct strbuf *tmp, const char *filename)
      	return fd;
      }
      
    ++/**
    ++ * Common steps for loose object writers to start writing loose
    ++ * objects:
    ++ *
    ++ * - Create tmpfile for the loose object.
    ++ * - Setup zlib stream for compression.
    ++ * - Start to feed header to zlib stream.
    ++ *
    ++ * Returns a "fd", which should later be provided to
    ++ * end_loose_object_common().
    ++ */
     +static int start_loose_object_common(struct strbuf *tmp_file,
     +				     const char *filename, unsigned flags,
     +				     git_zstream *stream,
    @@ object-file.c: static int create_tmpfile(struct strbuf *tmp, const char *filenam
     +	return fd;
     +}
     +
    -+static void end_loose_object_common(int ret, git_hash_ctx *c,
    ++/**
    ++ * Common steps for loose object writers to end writing loose objects:
    ++ *
    ++ * - End the compression of zlib stream.
    ++ * - Get the calculated oid to "parano_oid".
    ++ * - fsync() and close() the "fd"
    ++ */
    ++static void end_loose_object_common(int fd, int ret, git_hash_ctx *c,
     +				    git_zstream *stream,
     +				    struct object_id *parano_oid,
     +				    const struct object_id *expected_oid,
    @@ object-file.c: static int create_tmpfile(struct strbuf *tmp, const char *filenam
     +	if (ret != Z_OK)
     +		die(_(die_msg2_fmt), ret, expected_oid);
     +	the_hash_algo->final_oid_fn(parano_oid, c);
    ++
    ++	/*
    ++	 * We already did a write_buffer() to the "fd", let's fsync()
    ++	 * and close().
    ++	 *
    ++	 * We might still die() on a subsequent sanity check, but
    ++	 * let's not add to that confusion by not flushing any
    ++	 * outstanding writes to disk first.
    ++	 */
    ++	close_loose_object(fd);
     +}
     +
      static int write_loose_object(const struct object_id *oid, char *hdr,
    @@ object-file.c: static int write_loose_object(const struct object_id *oid, char *
     -	while (git_deflate(&stream, 0) == Z_OK)
     -		; /* nothing */
     -	the_hash_algo->update_fn(&c, hdr, hdrlen);
    -+	/* Common steps for write_loose_object and stream_loose_object to
    -+	 * start writing loose oject:
    -+	 *
    -+	 *  - Create tmpfile for the loose object.
    -+	 *  - Setup zlib stream for compression.
    -+	 *  - Start to feed header to zlib stream.
    -+	 */
     +	fd = start_loose_object_common(&tmp_file, filename.buf, flags,
     +				       &stream, compressed, sizeof(compressed),
     +				       &c, hdr, hdrlen);
    @@ object-file.c: static int write_loose_object(const struct object_id *oid, char *
     -		die(_("deflateEnd on object %s failed (%d)"), oid_to_hex(oid),
     -		    ret);
     -	the_hash_algo->final_oid_fn(&parano_oid, &c);
    -+	/* Common steps for write_loose_object and stream_loose_object to
    -+	 * end writing loose oject:
    -+	 *
    -+	 *  - End the compression of zlib stream.
    -+	 *  - Get the calculated oid to "parano_oid".
    -+	 */
    -+	end_loose_object_common(ret, &c, &stream, &parano_oid, oid,
    +-
    +-	/*
    +-	 * We already did a write_buffer() to the "fd", let's fsync()
    +-	 * and close().
    +-	 *
    +-	 * We might still die() on a subsequent sanity check, but
    +-	 * let's not add to that confusion by not flushing any
    +-	 * outstanding writes to disk first.
    +-	 */
    +-	close_loose_object(fd);
    ++	end_loose_object_common(fd, ret, &c, &stream, &parano_oid, oid,
     +				N_("unable to deflate new object %s (%d)"),
     +				N_("deflateEnd on object %s failed (%d)"));
    -+
    + 
      	if (!oideq(oid, &parano_oid))
      		die(_("confused by unstable object source data for %s"),
    - 		    oid_to_hex(oid));
3:  054a00ed21d ! 4:  71c10e734d1 object-file.c: add "stream_loose_object()" to handle large object
    @@ Commit message
     
         Add a new function "stream_loose_object()", which is a stream version of
         "write_loose_object()" but with a low memory footprint. We will use this
    -    function to unpack large blob object in latter commit.
    +    function to unpack large blob object in later commit.
     
         Another difference with "write_loose_object()" is that we have no chance
         to run "write_object_file_prepare()" to calculate the oid in advance.
         In "write_loose_object()", we know the oid and we can write the
         temporary file in the same directory as the final object, but for an
         object with an undetermined oid, we don't know the exact directory for
    -    the object, so we have to save the temporary file in ".git/objects/"
    -    directory instead.
    +    the object.
    +
    +    Still, we need to save the temporary file we're preparing
    +    somewhere. We'll do that in the top-level ".git/objects/"
    +    directory (or whatever "GIT_OBJECT_DIRECTORY" is set to). Once we've
    +    streamed it we'll know the OID, and will move it to its canonical
    +    path.
     
         "freshen_packed_object()" or "freshen_loose_object()" will be called
         inside "stream_loose_object()" after obtaining the "oid".
    @@ object-file.c: static int freshen_packed_object(const struct object_id *oid)
     +
     +	/* Since oid is not determined, save tmp file to odb path. */
     +	strbuf_addf(&filename, "%s/", get_object_directory());
    -+	hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %"PRIuMAX, type_name(OBJ_BLOB), len) + 1;
    ++	hdrlen = format_object_header(hdr, sizeof(hdr), OBJ_BLOB, len);
     +
     +	/* Common steps for write_loose_object and stream_loose_object to
     +	 * start writing loose oject:
    @@ object-file.c: static int freshen_packed_object(const struct object_id *oid)
     +	 *  - End the compression of zlib stream.
     +	 *  - Get the calculated oid.
     +	 */
    -+	end_loose_object_common(ret, &c, &stream, oid, NULL,
    ++	end_loose_object_common(fd, ret, &c, &stream, oid, NULL,
     +				N_("unable to stream deflate new object (%d)"),
     +				N_("deflateEnd on stream object failed (%d)"));
     +
    -+	close_loose_object(fd);
    -+
     +	if (freshen_packed_object(oid) || freshen_loose_object(oid)) {
     +		unlink_or_warn(tmp_file.buf);
     +		goto cleanup;
    @@ object-file.c: static int freshen_packed_object(const struct object_id *oid)
     +}
     +
      int write_object_file_flags(const void *buf, unsigned long len,
    - 			    const char *type, struct object_id *oid,
    + 			    enum object_type type, struct object_id *oid,
      			    unsigned flags)
     
      ## object-store.h ##
    @@ object-store.h: static inline int write_object_file(const void *buf, unsigned lo
      
     +int stream_loose_object(struct input_stream *in_stream, size_t len,
     +			struct object_id *oid);
    -+
    - int hash_object_file_literally(const void *buf, unsigned long len,
    - 			       const char *type, struct object_id *oid,
    - 			       unsigned flags);
    + int hash_write_object_file_literally(const void *buf, unsigned long len,
    + 				     const char *type, struct object_id *oid,
    + 				     unsigned flags);
-:  ----------- > 5:  3c1d788d69d core doc: modernize core.bigFileThreshold documentation
4:  6bcba6bce66 ! 6:  8b83f6d6b83 unpack-objects: unpack_non_delta_entry() read data in a stream
    @@ Metadata
     Author: Han Xin <hanxin.hx@alibaba-inc.com>
     
      ## Commit message ##
    -    unpack-objects: unpack_non_delta_entry() read data in a stream
    +    unpack-objects: use stream_loose_object() to unpack large objects
     
    -    We used to call "get_data()" in "unpack_non_delta_entry()" to read the
    -    entire contents of a blob object, no matter how big it is. This
    -    implementation may consume all the memory and cause OOM.
    +    Make use of the stream_loose_object() function introduced in the
    +    preceding commit to unpack large objects. Before this we'd need to
    +    malloc() the size of the blob before unpacking it, which could cause
    +    OOM with very large blobs.
     
    -    By implementing a zstream version of input_stream interface, we can use
    -    a small fixed buffer for "unpack_non_delta_entry()". However, unpack
    -    non-delta objects from a stream instead of from an entrie buffer will
    -    have 10% performance penalty.
    +    We could use this new interface to unpack all blobs, but doing so
    +    would result in a performance penalty of around 10%, as the below
    +    "hyperfine" benchmark will show. We therefore limit this to files
    +    larger than "core.bigFileThreshold":
     
             $ hyperfine \
               --setup \
    @@ Commit message
                         -c core.bigFileThreshold=16k unpack-objects
                         <small.pack' in 'HEAD~1'
     
    -    Therefore, only unpack objects larger than the "core.bigFileThreshold"
    -    in zstream. Until now, the config variable has been used in the
    -    following cases, and our new case belongs to the packfile category.
    +    An earlier version of this patch introduced a new
    +    "core.bigFileStreamingThreshold" instead of re-using the existing
    +    "core.bigFileThreshold" variable[1]. As noted in a detailed overview
    +    of its users in [2] using it has several different meanings.
     
    -     * Archive:
    +    Still, we consider it good enough to simply re-use it. While it's
    +    possible that someone might want to e.g. consider objects "small" for
    +    the purposes of diffing but "big" for the purposes of writing them
    +    such use-cases are probably too obscure to worry about. We can always
    +    split up "core.bigFileThreshold" in the future if there's a need for
    +    that.
     
    -       + archive.c: write_entry(): write large blob entries to archive in
    -         stream.
    -
    -     * Loose objects:
    -
    -       + object-file.c: index_fd(): when hashing large files in worktree,
    -         read files in a stream, and create one packfile per large blob if
    -         want to save files to git object store.
    -
    -       + object-file.c: read_loose_object(): when checking loose objects
    -         using "git-fsck", do not read full content of large loose objects.
    -
    -     * Packfile:
    -
    -       + fast-import.c: parse_and_store_blob(): streaming large blob from
    -         foreign source to packfile.
    -
    -       + index-pack.c: check_collison(): read and check large blob in stream.
    -
    -       + index-pack.c: unpack_entry_data(): do not return the entire
    -         contents of the big blob from packfile, but uses a fixed buf to
    -         perform some integrity checks on the object.
    -
    -       + pack-check.c: verify_packfile(): used by "git-fsck" and will call
    -         check_object_signature() to check large blob in pack with the
    -         streaming interface.
    -
    -       + pack-objects.c: get_object_details(): set "no_try_delta" for large
    -         blobs when counting objects.
    -
    -       + pack-objects.c: write_no_reuse_object(): streaming large blob to
    -         pack.
    -
    -       + unpack-objects.c: unpack_non_delta_entry(): unpack large blob in
    -         stream from packfile.
    -
    -     * Others:
    -
    -       + diff.c: diff_populate_filespec(): treat large blob file as binary.
    -
    -       + streaming.c: istream_source(): as a helper of "open_istream()" to
    -         select proper streaming interface to read large blob from packfile.
    +    1. https://lore.kernel.org/git/20211210103435.83656-1-chiyutianyi@gmail.com/
    +    2. https://lore.kernel.org/git/20220120112114.47618-5-chiyutianyi@gmail.com/
     
         Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
         Helped-by: Derrick Stolee <stolee@gmail.com>
         Helped-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
         Signed-off-by: Han Xin <hanxin.hx@alibaba-inc.com>
     
    + ## Documentation/config/core.txt ##
    +@@ Documentation/config/core.txt: usage, at the slight expense of increased disk usage.
    + * Will be generally be streamed when written, which avoids excessive
    + memory usage, at the cost of some fixed overhead. Commands that make
    + use of this include linkgit:git-archive[1],
    +-linkgit:git-fast-import[1], linkgit:git-index-pack[1] and
    +-linkgit:git-fsck[1].
    ++linkgit:git-fast-import[1], linkgit:git-index-pack[1],
    ++linkgit:git-unpack-objects[1] and linkgit:git-fsck[1].
    + 
    + core.excludesFile::
    + 	Specifies the pathname to the file that contains patterns to
    +
      ## builtin/unpack-objects.c ##
     @@ builtin/unpack-objects.c: static void added_object(unsigned nr, enum object_type type,
      	}
5:  1bfaf89ee0b < -:  ----------- object-file API: add a format_object_header() function

  reply	other threads:[~2022-02-01 21:28 UTC|newest]

Thread overview: 211+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-09  8:20 [PATCH] unpack-objects: unpack large object in stream Han Xin
2021-10-19  7:37 ` Han Xin
2021-10-20 14:42 ` Philip Oakley
2021-10-21  3:42   ` Han Xin
2021-10-21 22:47     ` Philip Oakley
2021-11-03  1:48 ` Han Xin
2021-11-03 10:07   ` Philip Oakley
2021-11-12  9:40 ` [PATCH v2 1/6] object-file: refactor write_loose_object() to support inputstream Han Xin
2021-11-18  4:59   ` Jiang Xin
2021-11-18  6:45     ` Junio C Hamano
2021-11-12  9:40 ` [PATCH v2 2/6] object-file.c: add dry_run mode for write_loose_object() Han Xin
2021-11-18  5:42   ` Jiang Xin
2021-11-12  9:40 ` [PATCH v2 3/6] object-file.c: handle nil oid in write_loose_object() Han Xin
2021-11-18  5:49   ` Jiang Xin
2021-11-12  9:40 ` [PATCH v2 4/6] object-file.c: read input stream repeatedly " Han Xin
2021-11-18  5:56   ` Jiang Xin
2021-11-12  9:40 ` [PATCH v2 5/6] object-store.h: add write_loose_object() Han Xin
2021-11-12  9:40 ` [PATCH v2 6/6] unpack-objects: unpack large object in stream Han Xin
2021-11-18  7:14   ` Jiang Xin
2021-11-22  3:32 ` [PATCH v3 0/5] unpack large objects " Han Xin
2021-11-29  7:01   ` Han Xin
2021-11-29 19:12     ` Jeff King
2021-11-30  2:57       ` Han Xin
2021-12-03  9:35   ` [PATCH v4 " Han Xin
2021-12-07 16:18     ` Derrick Stolee
2021-12-10 10:34     ` [PATCH v5 0/6] unpack large blobs " Han Xin
2021-12-17 11:26       ` Han Xin
2021-12-21 11:51         ` [PATCH v7 0/5] " Han Xin
2021-12-21 11:51         ` [PATCH v7 1/5] unpack-objects.c: add dry_run mode for get_data() Han Xin
2021-12-21 14:09           ` Ævar Arnfjörð Bjarmason
2021-12-21 14:43             ` René Scharfe
2021-12-21 15:04               ` Ævar Arnfjörð Bjarmason
2021-12-22 11:15               ` Jiang Xin
2021-12-22 11:29             ` Jiang Xin
2021-12-31  3:06           ` Jiang Xin
2021-12-21 11:51         ` [PATCH v7 2/5] object-file API: add a format_object_header() function Han Xin
2021-12-21 14:30           ` René Scharfe
2022-02-01 14:28             ` C99 %z (was: [PATCH v7 2/5] object-file API: add a format_object_header() function) Ævar Arnfjörð Bjarmason
2021-12-31  3:12           ` [PATCH v7 2/5] object-file API: add a format_object_header() function Jiang Xin
2021-12-21 11:51         ` [PATCH v7 3/5] object-file.c: refactor write_loose_object() to reuse in stream version Han Xin
2021-12-21 14:16           ` Ævar Arnfjörð Bjarmason
2021-12-22 12:02             ` Jiang Xin
2021-12-21 11:52         ` [PATCH v7 4/5] object-file.c: add "write_stream_object_file()" to support read in stream Han Xin
2021-12-21 14:20           ` Ævar Arnfjörð Bjarmason
2021-12-21 15:05             ` Ævar Arnfjörð Bjarmason
2021-12-21 11:52         ` [PATCH v7 5/5] unpack-objects: unpack_non_delta_entry() read data in a stream Han Xin
2021-12-21 15:06           ` Ævar Arnfjörð Bjarmason
2021-12-31  3:19           ` Jiang Xin
2022-01-08  8:54         ` [PATCH v8 0/6] unpack large blobs in stream Han Xin
2022-01-20 11:21           ` [PATCH v9 0/5] " Han Xin
2022-02-01 21:24             ` Ævar Arnfjörð Bjarmason [this message]
2022-02-02  8:32               ` Han Xin
2022-02-02 10:59                 ` Ævar Arnfjörð Bjarmason
2022-02-04 14:07             ` [PATCH v10 0/6] unpack-objects: support streaming large objects to disk Ævar Arnfjörð Bjarmason
2022-02-04 14:07               ` [PATCH v10 1/6] unpack-objects: low memory footprint for get_data() in dry_run mode Ævar Arnfjörð Bjarmason
2022-02-04 14:07               ` [PATCH v10 2/6] object-file.c: do fsync() and close() before post-write die() Ævar Arnfjörð Bjarmason
2022-02-04 14:07               ` [PATCH v10 3/6] object-file.c: refactor write_loose_object() to several steps Ævar Arnfjörð Bjarmason
2022-02-04 14:07               ` [PATCH v10 4/6] object-file.c: add "stream_loose_object()" to handle large object Ævar Arnfjörð Bjarmason
2022-02-04 14:07               ` [PATCH v10 5/6] core doc: modernize core.bigFileThreshold documentation Ævar Arnfjörð Bjarmason
2022-02-04 14:07               ` [PATCH v10 6/6] unpack-objects: use stream_loose_object() to unpack large objects Ævar Arnfjörð Bjarmason
2022-03-19  0:23               ` [PATCH v11 0/8] unpack-objects: support streaming blobs to disk Ævar Arnfjörð Bjarmason
2022-03-19  0:23                 ` [PATCH v11 1/8] unpack-objects: low memory footprint for get_data() in dry_run mode Ævar Arnfjörð Bjarmason
2022-03-19  0:23                 ` [PATCH v11 2/8] object-file.c: do fsync() and close() before post-write die() Ævar Arnfjörð Bjarmason
2022-03-19  0:23                 ` [PATCH v11 3/8] object-file.c: refactor write_loose_object() to several steps Ævar Arnfjörð Bjarmason
2022-03-19 10:11                   ` René Scharfe
2022-03-19  0:23                 ` [PATCH v11 4/8] object-file.c: factor out deflate part of write_loose_object() Ævar Arnfjörð Bjarmason
2022-03-19  0:23                 ` [PATCH v11 5/8] object-file.c: add "stream_loose_object()" to handle large object Ævar Arnfjörð Bjarmason
2022-03-19  0:23                 ` [PATCH v11 6/8] core doc: modernize core.bigFileThreshold documentation Ævar Arnfjörð Bjarmason
2022-03-19  0:23                 ` [PATCH v11 7/8] unpack-objects: refactor away unpack_non_delta_entry() Ævar Arnfjörð Bjarmason
2022-03-19  0:23                 ` [PATCH v11 8/8] unpack-objects: use stream_loose_object() to unpack large objects Ævar Arnfjörð Bjarmason
2022-03-29 13:56                 ` [PATCH v12 0/8] unpack-objects: support streaming blobs to disk Ævar Arnfjörð Bjarmason
2022-03-29 13:56                   ` [PATCH v12 1/8] unpack-objects: low memory footprint for get_data() in dry_run mode Ævar Arnfjörð Bjarmason
2022-03-29 13:56                   ` [PATCH v12 2/8] object-file.c: do fsync() and close() before post-write die() Ævar Arnfjörð Bjarmason
2022-03-29 13:56                   ` [PATCH v12 3/8] object-file.c: refactor write_loose_object() to several steps Ævar Arnfjörð Bjarmason
2022-03-30  7:13                     ` Han Xin
2022-03-30 17:34                       ` Ævar Arnfjörð Bjarmason
2022-03-29 13:56                   ` [PATCH v12 4/8] object-file.c: factor out deflate part of write_loose_object() Ævar Arnfjörð Bjarmason
2022-03-29 13:56                   ` [PATCH v12 5/8] object-file.c: add "stream_loose_object()" to handle large object Ævar Arnfjörð Bjarmason
2022-03-31 19:54                     ` Neeraj Singh
2022-03-29 13:56                   ` [PATCH v12 6/8] core doc: modernize core.bigFileThreshold documentation Ævar Arnfjörð Bjarmason
2022-03-29 13:56                   ` [PATCH v12 7/8] unpack-objects: refactor away unpack_non_delta_entry() Ævar Arnfjörð Bjarmason
2022-03-30 19:40                     ` René Scharfe
2022-03-31 12:42                       ` Ævar Arnfjörð Bjarmason
2022-03-31 16:38                         ` René Scharfe
2022-03-29 13:56                   ` [PATCH v12 8/8] unpack-objects: use stream_loose_object() to unpack large objects Ævar Arnfjörð Bjarmason
2022-06-04 10:10                   ` [PATCH v13 0/7] unpack-objects: support streaming blobs to disk Ævar Arnfjörð Bjarmason
2022-06-04 10:10                     ` [PATCH v13 1/7] unpack-objects: low memory footprint for get_data() in dry_run mode Ævar Arnfjörð Bjarmason
2022-06-06 18:35                       ` Junio C Hamano
2022-06-09  4:10                         ` Han Xin
2022-06-09 18:27                           ` Junio C Hamano
2022-06-10  1:50                             ` Han Xin
2022-06-10  2:05                               ` Ævar Arnfjörð Bjarmason
2022-06-10 12:04                                 ` Han Xin
2022-06-04 10:10                     ` [PATCH v13 2/7] object-file.c: do fsync() and close() before post-write die() Ævar Arnfjörð Bjarmason
2022-06-06 18:45                       ` Junio C Hamano
2022-06-04 10:10                     ` [PATCH v13 3/7] object-file.c: refactor write_loose_object() to several steps Ævar Arnfjörð Bjarmason
2022-06-04 10:10                     ` [PATCH v13 4/7] object-file.c: factor out deflate part of write_loose_object() Ævar Arnfjörð Bjarmason
2022-06-04 10:10                     ` [PATCH v13 5/7] object-file.c: add "stream_loose_object()" to handle large object Ævar Arnfjörð Bjarmason
2022-06-06 19:44                       ` Junio C Hamano
2022-06-06 20:02                         ` Junio C Hamano
2022-06-09  6:04                           ` Han Xin
2022-06-09  6:14                         ` Han Xin
2022-06-07 19:53                       ` Neeraj Singh
2022-06-08 15:34                         ` Junio C Hamano
2022-06-09  3:05                         ` [RFC PATCH] object-file.c: batched disk flushes for stream_loose_object() Han Xin
2022-06-09  7:35                           ` Neeraj Singh
2022-06-09  9:30                           ` Johannes Schindelin
2022-06-10 12:55                             ` Han Xin
2022-06-04 10:10                     ` [PATCH v13 6/7] core doc: modernize core.bigFileThreshold documentation Ævar Arnfjörð Bjarmason
2022-06-06 19:50                       ` Junio C Hamano
2022-06-04 10:10                     ` [PATCH v13 7/7] unpack-objects: use stream_loose_object() to unpack large objects Ævar Arnfjörð Bjarmason
2022-06-10 14:46                     ` [PATCH v14 0/7] unpack-objects: support streaming blobs to disk Han Xin
2022-06-10 14:46                       ` [PATCH v14 1/7] unpack-objects: low memory footprint for get_data() in dry_run mode Han Xin
2022-06-10 14:46                       ` [PATCH v14 2/7] object-file.c: do fsync() and close() before post-write die() Han Xin
2022-06-10 21:10                         ` René Scharfe
2022-06-10 21:33                           ` Junio C Hamano
2022-06-11  1:50                             ` Han Xin
2022-06-10 14:46                       ` [PATCH v14 3/7] object-file.c: refactor write_loose_object() to several steps Han Xin
2022-06-10 14:46                       ` [PATCH v14 4/7] object-file.c: factor out deflate part of write_loose_object() Han Xin
2022-06-10 14:46                       ` [PATCH v14 5/7] object-file.c: add "stream_loose_object()" to handle large object Han Xin
2022-06-10 14:46                       ` [PATCH v14 6/7] core doc: modernize core.bigFileThreshold documentation Han Xin
2022-06-10 21:01                         ` Junio C Hamano
2022-06-10 14:46                       ` [PATCH v14 7/7] unpack-objects: use stream_loose_object() to unpack large objects Han Xin
2022-06-11  2:44                       ` [PATCH v15 0/6] unpack-objects: support streaming blobs to disk Han Xin
2022-06-11  2:44                         ` [PATCH v15 1/6] unpack-objects: low memory footprint for get_data() in dry_run mode Han Xin
2022-06-11  2:44                         ` [PATCH v15 2/6] object-file.c: refactor write_loose_object() to several steps Han Xin
2022-06-11  2:44                         ` [PATCH v15 3/6] object-file.c: factor out deflate part of write_loose_object() Han Xin
2022-06-11  2:44                         ` [PATCH v15 4/6] object-file.c: add "stream_loose_object()" to handle large object Han Xin
2022-06-11  2:44                         ` [PATCH v15 5/6] core doc: modernize core.bigFileThreshold documentation Han Xin
2022-06-11  2:44                         ` [PATCH v15 6/6] unpack-objects: use stream_loose_object() to unpack large objects Han Xin
2022-07-01  2:01                           ` Junio C Hamano
2022-05-20  3:05                 ` [PATCH 0/1] unpack-objects: low memory footprint for get_data() in dry_run mode Han Xin
2022-05-20  3:05                   ` [PATCH 1/1] " Han Xin
2022-01-20 11:21           ` [PATCH v9 1/5] " Han Xin
2022-01-20 11:21           ` [PATCH v9 2/5] object-file.c: refactor write_loose_object() to several steps Han Xin
2022-01-20 11:21           ` [PATCH v9 3/5] object-file.c: add "stream_loose_object()" to handle large object Han Xin
2022-01-20 11:21           ` [PATCH v9 4/5] unpack-objects: unpack_non_delta_entry() read data in a stream Han Xin
2022-01-20 11:21           ` [PATCH v9 5/5] object-file API: add a format_object_header() function Han Xin
2022-01-08  8:54         ` [PATCH v8 1/6] unpack-objects: low memory footprint for get_data() in dry_run mode Han Xin
2022-01-08 12:28           ` René Scharfe
2022-01-11 10:41             ` Han Xin
2022-01-08  8:54         ` [PATCH v8 2/6] object-file.c: refactor write_loose_object() to several steps Han Xin
2022-01-08 12:28           ` René Scharfe
2022-01-11 10:33             ` Han Xin
2022-01-08  8:54         ` [PATCH v8 3/6] object-file.c: remove the slash for directory_size() Han Xin
2022-01-08 17:24           ` René Scharfe
2022-01-11 10:14             ` Han Xin
2022-01-08  8:54         ` [PATCH v8 4/6] object-file.c: add "stream_loose_object()" to handle large object Han Xin
2022-01-08  8:54         ` [PATCH v8 5/6] unpack-objects: unpack_non_delta_entry() read data in a stream Han Xin
2022-01-08  8:54         ` [PATCH v8 6/6] object-file API: add a format_object_header() function Han Xin
2021-12-17 11:26       ` [PATCH v6 1/6] object-file.c: release strbuf in write_loose_object() Han Xin
2021-12-17 19:28         ` René Scharfe
2021-12-18  0:09           ` Junio C Hamano
2021-12-17 11:26       ` [PATCH v6 2/6] object-file.c: refactor object header generation into a function Han Xin
2021-12-20 12:10         ` [RFC PATCH] object-file API: add a format_loose_header() function Ævar Arnfjörð Bjarmason
2021-12-20 12:48           ` Philip Oakley
2021-12-20 22:25           ` Junio C Hamano
2021-12-21  1:42             ` Ævar Arnfjörð Bjarmason
2021-12-21  2:11               ` Junio C Hamano
2021-12-21  2:27                 ` Ævar Arnfjörð Bjarmason
2021-12-21 11:43           ` Han Xin
2021-12-17 11:26       ` [PATCH v6 3/6] object-file.c: refactor write_loose_object() to reuse in stream version Han Xin
2021-12-17 11:26       ` [PATCH v6 4/6] object-file.c: make "write_object_file_flags()" to support read in stream Han Xin
2021-12-17 22:52         ` René Scharfe
2021-12-17 11:26       ` [PATCH v6 5/6] unpack-objects.c: add dry_run mode for get_data() Han Xin
2021-12-17 21:22         ` René Scharfe
2021-12-17 11:26       ` [PATCH v6 6/6] unpack-objects: unpack_non_delta_entry() read data in a stream Han Xin
2021-12-10 10:34     ` [PATCH v5 1/6] object-file: refactor write_loose_object() to support read from stream Han Xin
2021-12-10 10:34     ` [PATCH v5 2/6] object-file.c: handle undetermined oid in write_loose_object() Han Xin
2021-12-13  7:32       ` Ævar Arnfjörð Bjarmason
2021-12-10 10:34     ` [PATCH v5 3/6] object-file.c: read stream in a loop " Han Xin
2021-12-10 10:34     ` [PATCH v5 4/6] unpack-objects.c: add dry_run mode for get_data() Han Xin
2021-12-10 10:34     ` [PATCH v5 5/6] object-file.c: make "write_object_file_flags()" to support "HASH_STREAM" Han Xin
2021-12-10 10:34     ` [PATCH v5 6/6] unpack-objects: unpack_non_delta_entry() read data in a stream Han Xin
2021-12-13  8:05       ` Ævar Arnfjörð Bjarmason
2021-12-03  9:35   ` [PATCH v4 1/5] object-file: refactor write_loose_object() to read buffer from stream Han Xin
2021-12-03 13:28     ` Ævar Arnfjörð Bjarmason
2021-12-06  2:07       ` Han Xin
2021-12-03  9:35   ` [PATCH v4 2/5] object-file.c: handle undetermined oid in write_loose_object() Han Xin
2021-12-03 13:21     ` Ævar Arnfjörð Bjarmason
2021-12-06  2:51       ` Han Xin
2021-12-03 13:41     ` Ævar Arnfjörð Bjarmason
2021-12-06  3:12       ` Han Xin
2021-12-03  9:35   ` [PATCH v4 3/5] object-file.c: read stream in a loop " Han Xin
2021-12-03  9:35   ` [PATCH v4 4/5] unpack-objects.c: add dry_run mode for get_data() Han Xin
2021-12-03 13:59     ` Ævar Arnfjörð Bjarmason
2021-12-06  3:20       ` Han Xin
2021-12-03  9:35   ` [PATCH v4 5/5] unpack-objects: unpack_non_delta_entry() read data in a stream Han Xin
2021-12-03 13:07     ` Ævar Arnfjörð Bjarmason
2021-12-07  6:42       ` Han Xin
2021-12-03 13:54     ` Ævar Arnfjörð Bjarmason
2021-12-07  6:17       ` Han Xin
2021-12-03 14:05     ` Ævar Arnfjörð Bjarmason
2021-12-07  6:48       ` Han Xin
2021-11-22  3:32 ` [PATCH v3 1/5] object-file: refactor write_loose_object() to read buffer from stream Han Xin
2021-11-23 23:24   ` Junio C Hamano
2021-11-24  9:00     ` Han Xin
2021-11-22  3:32 ` [PATCH v3 2/5] object-file.c: handle undetermined oid in write_loose_object() Han Xin
2021-11-29 15:10   ` Derrick Stolee
2021-11-29 20:44     ` Junio C Hamano
2021-11-29 22:18       ` Derrick Stolee
2021-11-30  3:23         ` Han Xin
2021-11-22  3:32 ` [PATCH v3 3/5] object-file.c: read stream in a loop " Han Xin
2021-11-22  3:32 ` [PATCH v3 4/5] unpack-objects.c: add dry_run mode for get_data() Han Xin
2021-11-22  3:32 ` [PATCH v3 5/5] unpack-objects: unpack_non_delta_entry() read data in a stream Han Xin
2021-11-29 17:37   ` Derrick Stolee
2021-11-30 13:49     ` Han Xin
2021-11-30 18:38       ` Derrick Stolee
2021-12-01 20:37         ` "git hyperfine" (was: [PATCH v3 5/5] unpack-objects[...]) Ævar Arnfjörð Bjarmason
2021-12-02  7:33         ` [PATCH v3 5/5] unpack-objects: unpack_non_delta_entry() read data in a stream Han Xin
2021-12-02 13:53           ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=220201.86wnie8eg0.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=chiyutianyi@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hanxin.hx@alibaba-inc.com \
    --cc=l.s.r@web.de \
    --cc=peff@peff.net \
    --cc=philipoakley@iee.email \
    --cc=stolee@gmail.com \
    --cc=worldhello.net@gmail.com \
    --cc=zhiyou.jx@alibaba-inc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).