[PATCH v2 0/4] Convert index writes to use hashfile API

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net, stolee@gmail.com,
	git@jeffhostetler.com, Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH v2 0/4] Convert index writes to use hashfile API
Date: Mon, 17 May 2021 12:24:48 +0000	[thread overview]
Message-ID: <pull.916.v2.git.1621254292.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.916.git.1616785928.gitgitgadget@gmail.com>

As I prepare some ideas on index v5, one thing that strikes me as an
interesting direction to try is to use the chunk-format API. This would make
our extension model extremely simple (they become optional chunks, easily
identified by the table of contents).

But there is a huge hurdle to even starting that investigation: the index
uses its own hashing methods, separate from the hashfile API in csum-file.c!

The internals of the algorithms are mostly identical. The only possible
change is that the buffer sizes are different: 8KB for hashfile and 128KB in
read-cache.c. This change is actually relatively recent! I was able to see a
performance difference before changing that size, so I do make them equal.
This improves some other commands, too, such as git multi-pack-index write.

There is a subtle point about how the EOIE extension works in that it needs
a hash of just the previous extension headers. This won't be needed by the
chunk-format API, so leave this mostly as-is, except where it collides
somewhat with the hashfile changes.


Updates in v2
=============

 * Sorry for the false start on v1. I'm correctly testing index writes this
   time, but I also verified other commands, such as "git add .", to be sure
   I was exercising everything. I also previously had incorrect assumptions
   about the EOIE extension, since it is not triggered with "small" repos
   like the Linux kernel repository ;).

 * This version is rebased onto a recent version of 'master' so the
   conflicts with ds/sparse-index-protections are resolved.

 * This version abandons the nested hashfile concept, since that was only
   needed for the incorrect interpretation of the EOIE extension's hash.

 * This version does unify the buffer size to 128 KB with justification.

 * There is also some more unification of writing logic to use
   write_in_full() in the hashfile API.

 * When using check_fd, the hashfile API is no longer thread safe due to a
   stack-allocated buffer moving to the global scope. I'm not sure if there
   is a better way to handle this case.

Thanks, -Stolee

Derrick Stolee (4):
  hashfile: use write_in_full()
  csum-file.h: increase hashfile buffer size
  read-cache: use hashfile instead of git_hash_ctx
  read-cache: delete unused hashing methods

 chunk-format.c |  12 ++--
 csum-file.c    |  45 ++++++------
 csum-file.h    |   4 +-
 read-cache.c   | 191 ++++++++++++++++---------------------------------
 4 files changed, 94 insertions(+), 158 deletions(-)


base-commit: 97eea85a0a1ec66d356567808a1e4ca2367e0ce7
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-916%2Fderrickstolee%2Findex-hashfile-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-916/derrickstolee/index-hashfile-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/916

Range-diff vs v1:

 -:  ------------ > 1:  b3e578ac0365 hashfile: use write_in_full()
 1:  0eca529766fc ! 2:  9dc602f6c422 csum-file: add nested_hashfile()
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    csum-file: add nested_hashfile()
     +    csum-file.h: increase hashfile buffer size
      
     -    The index writing code in do_write_index() uses a custom set of hashing
     -    code, in part because it was introduced before the hashfile API. But
     -    also, the End of Index Entries extension computes a hash of just the
     -    extension data, not the entire file preceding that extension.
     +    The hashfile API uses a hard-coded buffer size of 8KB and has ever since
     +    it was introduced in c38138c (git-pack-objects: write the pack files
     +    with a SHA1 csum, 2005-06-26). It performs a similar function to the
     +    hashing buffers in read-cache.c, but that code was updated from 8KB to
     +    128KB in f279894 (read-cache: make the index write buffer size 128K,
     +    2021-02-18). The justification there was that do_write_index() improves
     +    from 1.02s to 0.72s.
      
     -    Before converting the index writing code to use the hashfile API, create
     -    a concept of a "nested hashfile". By adding a 'base' member to 'struct
     -    hashfile', we indicate that any writes to this hashfile should be passed
     -    along to the base hashfile, too.
     +    There is a buffer, check_buffer, that is used to verify the check_fd
     +    file descriptor. When this buffer increases to 128K to fit the data
     +    being flushed, it causes the stack to overflow the limits placed in the
     +    test suite. By moving this to a static buffer, we stop using stack data
     +    for this purpose, but we lose some thread-safety. This change makes it
     +    unsafe to write to multiple hashfiles across different threads.
      
     -    In the next change, the index code will use this to create a new
     -    hashfile wose base is the hashfile for the index. The outer hashfile
     -    will compute the hash just for the extension details. Thus, it will
     -    finalize earlier than the base hashfile, hence there is no modification
     -    to finalize_hashfile() here.
     +    By adding a new trace2 region in the chunk-format API, we can see that
     +    the writing portion of 'git multi-pack-index write' lowers from ~1.49s
     +    to ~1.47s on a Linux machine. These effects may be more pronounced or
     +    diminished on other filesystems. The end-to-end timing is too noisy to
     +    have a definitive change either way.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     - ## csum-file.c ##
     -@@
     + ## chunk-format.c ##
     +@@ chunk-format.c: void add_chunk(struct chunkfile *cf,
       
     - static void flush(struct hashfile *f, const void *buf, unsigned int count)
     + int write_chunkfile(struct chunkfile *cf, void *data)
       {
     -+	if (f->base)
     -+		return;
     +-	int i;
     ++	int i, result = 0;
     + 	uint64_t cur_offset = hashfile_total(cf->f);
     + 
     ++	trace2_region_enter("chunkfile", "write", the_repository);
      +
     - 	if (0 <= f->check_fd && count)  {
     - 		unsigned char check_buffer[8192];
     - 		ssize_t ret = read_in_full(f->check_fd, check_buffer, count);
     -@@ csum-file.c: void hashwrite(struct hashfile *f, const void *buf, unsigned int count)
     - 		}
     - 		f->offset = offset;
     + 	/* Add the table of contents to the current offset */
     + 	cur_offset += (cf->chunks_nr + 1) * CHUNK_TOC_ENTRY_SIZE;
     + 
     +@@ chunk-format.c: int write_chunkfile(struct chunkfile *cf, void *data)
     + 
     + 	for (i = 0; i < cf->chunks_nr; i++) {
     + 		off_t start_offset = hashfile_total(cf->f);
     +-		int result = cf->chunks[i].write_fn(cf->f, data);
     ++		result = cf->chunks[i].write_fn(cf->f, data);
     + 
     + 		if (result)
     +-			return result;
     ++			goto cleanup;
     + 
     + 		if (hashfile_total(cf->f) - start_offset != cf->chunks[i].size)
     + 			BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead",
     +@@ chunk-format.c: int write_chunkfile(struct chunkfile *cf, void *data)
     + 			    hashfile_total(cf->f) - start_offset);
       	}
     -+
     -+	if (f->base)
     -+		hashwrite(f->base, buf, count);
     - }
       
     - struct hashfile *hashfd(int fd, const char *name)
     -@@ csum-file.c: struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp
     - 	f->name = name;
     - 	f->do_crc = 0;
     - 	the_hash_algo->init_fn(&f->ctx);
     -+	f->base = NULL;
     - 	return f;
     +-	return 0;
     ++cleanup:
     ++	trace2_region_leave("chunkfile", "write", the_repository);
     ++	return result;
       }
       
     -@@ csum-file.c: uint32_t crc32_end(struct hashfile *f)
     - 	f->do_crc = 0;
     - 	return f->crc32;
     - }
     -+
     -+struct hashfile *nested_hashfile(struct hashfile *f)
     + int read_table_of_contents(struct chunkfile *cf,
     +
     + ## csum-file.c ##
     +@@
     + #include "progress.h"
     + #include "csum-file.h"
     + 
     ++static void verify_buffer_or_die(struct hashfile *f,
     ++				 const void *buf,
     ++				 unsigned int count)
      +{
     -+	struct hashfile *n = xmalloc(sizeof(*f));
     -+	n->fd = -1;
     -+	n->check_fd = -1;
     -+	n->offset = 0;
     -+	n->total = 0;
     -+	n->tp = NULL;
     -+	n->name = NULL;
     -+	n->do_crc = 0;
     -+	the_hash_algo->init_fn(&n->ctx);
     -+	n->base = f;
     -+	return n;
     ++	static unsigned char check_buffer[WRITE_BUFFER_SIZE];
     ++	ssize_t ret = read_in_full(f->check_fd, check_buffer, count);
     ++
     ++	if (ret < 0)
     ++		die_errno("%s: sha1 file read error", f->name);
     ++	if (ret != count)
     ++		die("%s: sha1 file truncated", f->name);
     ++	if (memcmp(buf, check_buffer, count))
     ++		die("sha1 file '%s' validation error", f->name);
      +}
     ++
     + static void flush(struct hashfile *f, const void *buf, unsigned int count)
     + {
     +-	if (0 <= f->check_fd && count)  {
     +-		unsigned char check_buffer[8192];
     +-		ssize_t ret = read_in_full(f->check_fd, check_buffer, count);
     +-
     +-		if (ret < 0)
     +-			die_errno("%s: sha1 file read error", f->name);
     +-		if (ret != count)
     +-			die("%s: sha1 file truncated", f->name);
     +-		if (memcmp(buf, check_buffer, count))
     +-			die("sha1 file '%s' validation error", f->name);
     +-	}
     ++	if (0 <= f->check_fd && count)
     ++		verify_buffer_or_die(f, buf, count);
     + 
     + 	if (write_in_full(f->fd, buf, count) < 0) {
     + 		if (errno == ENOSPC)
      
       ## csum-file.h ##
     +@@
     + 
     + struct progress;
     + 
     ++#define WRITE_BUFFER_SIZE (128 * 1024)
     ++
     + /* A SHA1-protected file */
     + struct hashfile {
     + 	int fd;
      @@ csum-file.h: struct hashfile {
       	const char *name;
       	int do_crc;
       	uint32_t crc32;
     -+	struct hashfile *base;
     - 	unsigned char buffer[8192];
     +-	unsigned char buffer[8192];
     ++	unsigned char buffer[WRITE_BUFFER_SIZE];
       };
       
     -@@ csum-file.h: void hashflush(struct hashfile *f);
     - void crc32_begin(struct hashfile *);
     - uint32_t crc32_end(struct hashfile *);
     - 
     -+/*
     -+ * A nested hashfile uses the same interface as a hashfile, and computes
     -+ * a hash for the input bytes while passing them to the base hashfile
     -+ * instead of writing them to its own file. This is useful for computing
     -+ * a hash of a region within a file during the write.
     -+ */
     -+struct hashfile *nested_hashfile(struct hashfile *f);
     -+
     - /*
     -  * Returns the total number of bytes fed to the hashfile so far (including ones
     -  * that have not been written out to the descriptor yet).
     + /* Checkpoint */
 2:  e2611bbc007a ! 3:  b94172ccf5e9 read-cache: use hashfile instead of git_hash_ctx
     @@ Commit message
          that uses the chunk-format API. That API uses a hashfile as its base, so
          it is incompatible with the custom code in read-cache.c.
      
     -    This rewrite of the logic is rather straightforward, except for the
     -    special case of creating a nested hashfile to handle computing the hash
     -    of the extension data just for the End of Index Entries extension. The
     -    previous change introduced the concept for just this purpose.
     +    This rewrite is rather straightforward. It replaces all writes to the
     +    temporary file with writes to the hashfile struct. This takes care of
     +    many of the direct interactions with the_hash_algo.
      
     -    The internals of the algorithms are mostly identical. The only
     -    meaningful change is that the buffer sizes are different: 8KB for
     -    hashfile and 128KB in read-cache.c. I was unable to find a performance
     -    difference in these two implementations, despite testing on several repo
     -    sizes. I also tried adjusting the buffer size of the hashfile struct for
     -    a variety of sizes between 8KB and 128KB, and did not see a performance
     -    change for any of the commands that currently use hashfiles.
     +    There are still some remaining: the extension headers are hashed for use
     +    in the End of Index Entries (EOIE) extension. This use of the
     +    git_hash_ctx is left as-is. There are multiple reasons to not use a
     +    hashfile here, including the fact that the data is not actually writing
     +    to a file, just a hash computation. These hashes do not block our
     +    adoption of the chunk-format API in a future change to the index, so
     +    leave it as-is.
     +
     +    The internals of the algorithms are mostly identical. Previously, the
     +    hashfile API used a smaller 8KB buffer instead of the 128KB buffer from
     +    read-cache.c. The previous change already unified these sizes.
     +
     +    There is one subtle point: we do not pass the CSUM_FSYNC to the
     +    finalize_hashfile() method, which differs from most consumers of the
     +    hashfile API. The extra fsync() call indicated by this flag causes a
     +    significant peformance degradation that is noticeable for quick
     +    commands that write the index, such as "git add". Other consumers can
     +    absorb this cost with their more complicated data structure
     +    organization, and further writing structures such as pack-files and
     +    commit-graphs is rarely in the critical path for common user
     +    interactions.
      
          Some static methods become orphaned in this diff, so I marked them as
          MAYBE_UNUSED. The diff is much harder to read if they are deleted during
     @@ Commit message
          In addition to the test suite passing, I computed indexes using the
          previous binaries and the binaries compiled after this change, and found
          the index data to be exactly equal. Finally, I did extensive performance
     -    testing of "git update-index --really-refresh" on repos of various
     -    sizes, including one with over 2 million paths at HEAD. These tests
     +    testing of "git update-index --force-write" on repos of various sizes,
     +    including one with over 2 million paths at HEAD. These tests
          demonstrated less than 1% difference in behavior, so the performance
          should be considered identical.
      
     @@ Commit message
      
       ## read-cache.c ##
      @@
     - #include "fsmonitor.h"
       #include "thread-utils.h"
       #include "progress.h"
     + #include "sparse-index.h"
      +#include "csum-file.h"
       
       /* Mask for the name length in ce_flags in the on-disk index */
       
     -@@ read-cache.c: static struct index_entry_offset_table *read_ieot_extension(const char *mmap, si
     - static void write_ieot_extension(struct strbuf *sb, struct index_entry_offset_table *ieot);
     - 
     - static size_t read_eoie_extension(const char *mmap, size_t mmap_size);
     --static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, size_t offset);
     -+static void write_eoie_extension(struct strbuf *sb, const unsigned char *hash, size_t offset);
     - 
     - struct load_index_extensions
     - {
      @@ read-cache.c: int repo_index_has_changes(struct repository *repo,
       static unsigned char write_buffer[WRITE_BUFFER_SIZE];
       static unsigned long write_buffer_len;
     @@ read-cache.c: static int ce_write(git_hash_ctx *context, int fd, void *data, uns
       
      -static int write_index_ext_header(git_hash_ctx *context, git_hash_ctx *eoie_context,
      -				  int fd, unsigned int ext, unsigned int sz)
     -+static int write_index_ext_header(struct hashfile *f, unsigned int ext, unsigned int sz)
     ++static int write_index_ext_header(struct hashfile *f,
     ++				  git_hash_ctx *eoie_f,
     ++				  unsigned int ext,
     ++				  unsigned int sz)
       {
      -	ext = htonl(ext);
      -	sz = htonl(sz);
      -	if (eoie_context) {
      -		the_hash_algo->update_fn(eoie_context, &ext, 4);
      -		the_hash_algo->update_fn(eoie_context, &sz, 4);
     --	}
     --	return ((ce_write(context, fd, &ext, 4) < 0) ||
     --		(ce_write(context, fd, &sz, 4) < 0)) ? -1 : 0;
      +	hashwrite_be32(f, ext);
      +	hashwrite_be32(f, sz);
     ++
     ++	if (eoie_f) {
     ++		ext = htonl(ext);
     ++		sz = htonl(sz);
     ++		the_hash_algo->update_fn(eoie_f, &ext, sizeof(ext));
     ++		the_hash_algo->update_fn(eoie_f, &sz, sizeof(sz));
     + 	}
     +-	return ((ce_write(context, fd, &ext, 4) < 0) ||
     +-		(ce_write(context, fd, &sz, 4) < 0)) ? -1 : 0;
      +	return 0;
       }
       
     @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempf
       	uint64_t start = getnanotime();
      -	int newfd = tempfile->fd;
      -	git_hash_ctx c, eoie_c;
     -+	struct hashfile *f, *eoie_f;
     ++	struct hashfile *f;
     ++	git_hash_ctx *eoie_c = NULL;
       	struct cache_header hdr;
       	int i, err = 0, removed, extended, hdr_version;
       	struct cache_entry **cache = istate->cache;
     @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempf
       	}
       
      -	offset = lseek(newfd, 0, SEEK_CUR);
     -+	offset = lseek(f->fd, 0, SEEK_CUR);
     - 	if (offset < 0) {
     - 		free(ieot);
     - 		return -1;
     - 	}
     +-	if (offset < 0) {
     +-		free(ieot);
     +-		return -1;
     +-	}
      -	offset += write_buffer_len;
     ++	offset = hashfile_total(f);
      +
       	nr = 0;
       	previous_name = (hdr_version == 4) ? &previous_name_buf : NULL;
     @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempf
       				previous_name->buf[0] = 0;
       			nr = 0;
      -			offset = lseek(newfd, 0, SEEK_CUR);
     -+
     -+			offset = lseek(f->fd, 0, SEEK_CUR);
     - 			if (offset < 0) {
     - 				free(ieot);
     - 				return -1;
     - 			}
     +-			if (offset < 0) {
     +-				free(ieot);
     +-				return -1;
     +-			}
      -			offset += write_buffer_len;
     ++
     ++			offset = hashfile_total(f);
       		}
      -		if (ce_write_entry(&c, newfd, ce, previous_name, (struct ondisk_cache_entry *)&ondisk) < 0)
      +		if (ce_write_entry(f, ce, previous_name, (struct ondisk_cache_entry *)&ondisk) < 0)
     @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempf
       
      -	/* Write extension data here */
      -	offset = lseek(newfd, 0, SEEK_CUR);
     -+	offset = lseek(f->fd, 0, SEEK_CUR);
     - 	if (offset < 0) {
     - 		free(ieot);
     - 		return -1;
     - 	}
     --	offset += write_buffer_len;
     --	the_hash_algo->init_fn(&eoie_c);
     +-	if (offset < 0) {
     +-		free(ieot);
     +-		return -1;
     ++	offset = hashfile_total(f);
      +
      +	/*
     -+	 * The extensions must be hashed on their own for use in the EOIE
     -+	 * extension. Use a nested hashfile to compute the hash for this
     -+	 * region while passing the buffer to the original hashfile.
     ++	 * The extension headers must be hashed on their own for the
     ++	 * EOIE extension. Create a hashfile here to compute that hash.
      +	 */
     -+	if (offset && record_eoie())
     -+		eoie_f = nested_hashfile(f);
     -+	else
     -+		eoie_f = f;
     ++	if (offset && record_eoie()) {
     ++		CALLOC_ARRAY(eoie_c, 1);
     ++		the_hash_algo->init_fn(eoie_c);
     + 	}
     +-	offset += write_buffer_len;
     +-	the_hash_algo->init_fn(&eoie_c);
       
       	/*
       	 * Lets write out CACHE_EXT_INDEXENTRYOFFSETTABLE first so that we
     @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempf
       		write_ieot_extension(&sb, ieot);
      -		err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_INDEXENTRYOFFSETTABLE, sb.len) < 0
      -			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
     -+		err = write_index_ext_header(eoie_f, CACHE_EXT_INDEXENTRYOFFSETTABLE, sb.len) < 0;
     -+		hashwrite(eoie_f, sb.buf, sb.len);
     ++		err = write_index_ext_header(f, eoie_c, CACHE_EXT_INDEXENTRYOFFSETTABLE, sb.len) < 0;
     ++		hashwrite(f, sb.buf, sb.len);
       		strbuf_release(&sb);
       		free(ieot);
       		if (err)
     @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempf
      -			write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_LINK,
      -					       sb.len) < 0 ||
      -			ce_write(&c, newfd, sb.buf, sb.len) < 0;
     -+			write_index_ext_header(eoie_f, CACHE_EXT_LINK,
     ++			write_index_ext_header(f, eoie_c, CACHE_EXT_LINK,
      +					       sb.len) < 0;
     -+		hashwrite(eoie_f, sb.buf, sb.len);
     ++		hashwrite(f, sb.buf, sb.len);
       		strbuf_release(&sb);
       		if (err)
       			return -1;
     @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempf
       		cache_tree_write(&sb, istate->cache_tree);
      -		err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_TREE, sb.len) < 0
      -			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
     -+		err = write_index_ext_header(eoie_f, CACHE_EXT_TREE, sb.len) < 0;
     -+		hashwrite(eoie_f, sb.buf, sb.len);
     ++		err = write_index_ext_header(f, eoie_c, CACHE_EXT_TREE, sb.len) < 0;
     ++		hashwrite(f, sb.buf, sb.len);
       		strbuf_release(&sb);
       		if (err)
       			return -1;
     @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempf
      -		err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_RESOLVE_UNDO,
      -					     sb.len) < 0
      -			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
     -+		err = write_index_ext_header(eoie_f, CACHE_EXT_RESOLVE_UNDO,
     ++		err = write_index_ext_header(f, eoie_c, CACHE_EXT_RESOLVE_UNDO,
      +					     sb.len) < 0;
     -+		hashwrite(eoie_f, sb.buf, sb.len);
     ++		hashwrite(f, sb.buf, sb.len);
       		strbuf_release(&sb);
       		if (err)
       			return -1;
     @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempf
      -		err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_UNTRACKED,
      -					     sb.len) < 0 ||
      -			ce_write(&c, newfd, sb.buf, sb.len) < 0;
     -+		err = write_index_ext_header(eoie_f, CACHE_EXT_UNTRACKED,
     ++		err = write_index_ext_header(f, eoie_c, CACHE_EXT_UNTRACKED,
      +					     sb.len) < 0;
     -+		hashwrite(eoie_f, sb.buf, sb.len);
     ++		hashwrite(f, sb.buf, sb.len);
       		strbuf_release(&sb);
       		if (err)
       			return -1;
     @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempf
       		write_fsmonitor_extension(&sb, istate);
      -		err = write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_FSMONITOR, sb.len) < 0
      -			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
     -+		err = write_index_ext_header(eoie_f, CACHE_EXT_FSMONITOR, sb.len) < 0;
     -+		hashwrite(eoie_f, sb.buf, sb.len);
     ++		err = write_index_ext_header(f, eoie_c, CACHE_EXT_FSMONITOR, sb.len) < 0;
     ++		hashwrite(f, sb.buf, sb.len);
       		strbuf_release(&sb);
       		if (err)
       			return -1;
     + 	}
     + 	if (istate->sparse_index) {
     +-		if (write_index_ext_header(&c, &eoie_c, newfd, CACHE_EXT_SPARSE_DIRECTORIES, 0) < 0)
     ++		if (write_index_ext_header(f, eoie_c, CACHE_EXT_SPARSE_DIRECTORIES, 0) < 0)
     + 			return -1;
     + 	}
     + 
      @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
       	 * read.  Write it out regardless of the strip_extensions parameter as we need it
       	 * when loading the shared index.
       	 */
      -	if (offset && record_eoie()) {
     -+	if (f != eoie_f) {
     ++	if (eoie_c) {
       		struct strbuf sb = STRBUF_INIT;
     -+		unsigned char hash[GIT_MAX_RAWSZ];
       
      -		write_eoie_extension(&sb, &eoie_c, offset);
      -		err = write_index_ext_header(&c, NULL, newfd, CACHE_EXT_ENDOFINDEXENTRIES, sb.len) < 0
      -			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
     -+		finalize_hashfile(eoie_f, hash, 0);
     -+
     -+		write_eoie_extension(&sb, hash, offset);
     -+		err = write_index_ext_header(f, CACHE_EXT_ENDOFINDEXENTRIES, sb.len) < 0;
     ++		write_eoie_extension(&sb, eoie_c, offset);
     ++		err = write_index_ext_header(f, NULL, CACHE_EXT_ENDOFINDEXENTRIES, sb.len) < 0;
      +		hashwrite(f, sb.buf, sb.len);
       		strbuf_release(&sb);
       		if (err)
     @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempf
       
      -	if (ce_flush(&c, newfd, istate->oid.hash))
      -		return -1;
     -+	finalize_hashfile(f, istate->oid.hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
     ++	finalize_hashfile(f, istate->oid.hash, CSUM_HASH_IN_STREAM);
       	if (close_tempfile_gently(tempfile)) {
       		error(_("could not close '%s'"), get_tempfile_path(tempfile));
       		return -1;
     -@@ read-cache.c: static size_t read_eoie_extension(const char *mmap, size_t mmap_size)
     - 	return offset;
     - }
     - 
     --static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, size_t offset)
     -+static void write_eoie_extension(struct strbuf *sb, const unsigned char *hash, size_t offset)
     - {
     - 	uint32_t buffer;
     --	unsigned char hash[GIT_MAX_RAWSZ];
     - 
     - 	/* offset */
     - 	put_be32(&buffer, offset);
     - 	strbuf_add(sb, &buffer, sizeof(uint32_t));
     - 
     - 	/* hash */
     --	the_hash_algo->final_fn(hash, eoie_context);
     - 	strbuf_add(sb, hash, the_hash_algo->rawsz);
     - }
     - 
 3:  e2d5a8dc919b ! 4:  4b3814eb4c80 read-cache: delete unused hashing methods
     @@ read-cache.c: int repo_index_has_changes(struct repository *repo,
      -	return 0;
      -}
      -
     - static int write_index_ext_header(struct hashfile *f, unsigned int ext, unsigned int sz)
     - {
     - 	hashwrite_be32(f, ext);
     -@@ read-cache.c: static int write_index_ext_header(struct hashfile *f, unsigned int ext, unsigned
     + static int write_index_ext_header(struct hashfile *f,
     + 				  git_hash_ctx *eoie_f,
     + 				  unsigned int ext,
     +@@ read-cache.c: static int write_index_ext_header(struct hashfile *f,
       	return 0;
       }
       

-- 
gitgitgadget

next prev parent reply	other threads:[~2021-05-17 12:25 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-26 19:12 [PATCH 0/3] Convert index writes to use hashfile API Derrick Stolee via GitGitGadget
2021-03-26 19:12 ` [PATCH 1/3] csum-file: add nested_hashfile() Derrick Stolee via GitGitGadget
2021-03-26 19:12 ` [PATCH 2/3] read-cache: use hashfile instead of git_hash_ctx Derrick Stolee via GitGitGadget
2021-03-29 15:04   ` Derrick Stolee
2021-03-29 19:10     ` Derrick Stolee
2021-03-26 19:12 ` [PATCH 3/3] read-cache: delete unused hashing methods Derrick Stolee via GitGitGadget
2021-03-26 20:16 ` [PATCH 0/3] Convert index writes to use hashfile API Derrick Stolee
2021-05-17 12:24 ` Derrick Stolee via GitGitGadget [this message]
2021-05-17 12:24   ` [PATCH v2 1/4] hashfile: use write_in_full() Derrick Stolee via GitGitGadget
2021-05-17 12:24   ` [PATCH v2 2/4] csum-file.h: increase hashfile buffer size Derrick Stolee via GitGitGadget
2021-05-17 21:54     ` Junio C Hamano
2021-05-18  7:33       ` Jeff King
2021-05-18 14:44         ` Derrick Stolee
2021-05-18  7:31     ` Jeff King
2021-05-18  7:42       ` Jeff King
2021-05-17 12:24   ` [PATCH v2 3/4] read-cache: use hashfile instead of git_hash_ctx Derrick Stolee via GitGitGadget
2021-05-17 22:13     ` Junio C Hamano
2021-05-18 14:16       ` Derrick Stolee
2021-05-17 12:24   ` [PATCH v2 4/4] read-cache: delete unused hashing methods Derrick Stolee via GitGitGadget
2021-05-18 18:32   ` [PATCH v3 0/4] Convert index writes to use hashfile API Derrick Stolee via GitGitGadget
2021-05-18 18:32     ` [PATCH v3 1/4] hashfile: use write_in_full() Derrick Stolee via GitGitGadget
2021-05-18 18:32     ` [PATCH v3 2/4] csum-file.h: increase hashfile buffer size Derrick Stolee via GitGitGadget
2021-11-25 12:14       ` t4216-log-bloom.sh fails with -v (but not --verbose-log) Ævar Arnfjörð Bjarmason
2021-11-26  4:08         ` Jeff King
2021-11-29 13:49           ` Derrick Stolee
2021-05-18 18:32     ` [PATCH v3 3/4] read-cache: use hashfile instead of git_hash_ctx Derrick Stolee via GitGitGadget
2021-05-18 18:32     ` [PATCH v3 4/4] read-cache: delete unused hashing methods Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.916.v2.git.1621254292.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).