git@vger.kernel.org list mirror (unofficial, one of many)
* [PATCH 00/10] Szeder's commit-graph cleanups
@ 2020-06-05 13:00 Derrick Stolee via GitGitGadget
  2020-06-05 13:00 ` [PATCH 01/10] tree-walk.c: don't match submodule entries for 'submod/anything' SZEDER Gábor via GitGitGadget
                   ` (11 more replies)
  0 siblings, 12 replies; 16+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-06-05 13:00 UTC
  To: git; +Cc: me, szeder.dev, jnareb, peff, garimasigit, Derrick Stolee

This is based on ds/line-log-on-bloom.

Since Szeder so kindly shared his alternate Bloom filter implementation [1],
I thought it worth my time to start the process of updating the patches to
apply to more recent code in Git. Here is the effort to update the almost
obviously-good commit-graph cleanups that he presented in that series.

[1] https://lore.kernel.org/git/20200529085038.26008-1-szeder.dev@gmail.com/

The range-diff below was created by applying his entire series onto v2.25.0
and then doing cherry-picks as appropriate onto ds/line-log-on-bloom and
correcting conflicts. I have Szeder's original series available as
"szeder-bloom" on my fork [2].

[2] https://github.com/derrickstolee/git/tree/szeder-bloom

As expected, the write_commit_graph_file() cleanups were the most difficult,
in part because we added extra chunks with the changed-path Bloom filters.

I did not include this commit since we already handle it mostly with the
MAX_NUM_CHUNKS macro from 08fd81c9 (commit-graph: implement
write_commit_graph(), 2018-04-02):

11:  56e3c4f57b3 <  -:  ----------- commit-graph: allocate the 'struct chunk_info' array dinamically

There were a few cleanups that I did not apply because they are more
involved to handle the conflicts with the changed-path Bloom filters,
especially because we pass a "struct bloom_filter_settings" in the method
signatures for the write_graph_chunk_bloom_*() functions:

12:  28fb1b5bdfe <  -:  ----------- commit-graph: unify the signatures of all write_graph_chunk_*() functions
13:  1e1e59e2592 <  -:  ----------- commit-graph: simplify write_commit_graph_file() #3
14:  6f0d912e4b8 <  -:  ----------- commit-graph: check chunk sizes after writing

I'm not saying that we shouldn't do these changes. I'm just saying that they
are more involved and can wait for a second series. No need to rush things.

The rest will need to be completely re-implemented to keep the other things
in mind, like split commit-graphs and the existing changed-path Bloom
filters. However, the following commits would be particularly interesting to
have equivalents on top of our existing Bloom filter implementation. That
would allow a fairer comparison between the two options:

26:  3951fdedf6a <  -:  ----------- commit-graph: deduplicate modified path Bloom filters
27:  5aba19a2766 <  -:  ----------- commit-graph: load modified path Bloom filters for merge commits
29:  f87b37bf08e <  -:  ----------- commit-graph: extract init and free write_commit_graph_context
34:  8b40ec4cd30 <  -:  ----------- commit-graph: use modified path Bloom filters with wildcards, if possible

Full range diff:

 1:  7a8dbfba53a !  1:  af84c253b24 tree-walk.c: don't match submodule entries for 'submod/anything'
    @@ Commit message
         Fix this by rejecting submodules as partial pathnames when their
         trailing slash is followed by anything.

         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
    +    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

      ## t/t4010-diff-pathspec.sh ##
     @@ t/t4010-diff-pathspec.sh: test_expect_success 'setup submodules' '
 2:  df25e984c58 !  2:  1e1671e7c69 commit-graph: fix parsing the Chunk Lookup table
    @@ Commit message
         but that is a more invasive change, less suitable for 'maint', so that
         will be done in later patches.

    +    This additional flexibility of scanning more chunks breaks a test for
    +    "git commit-graph verify" so alter that test to mutate the commit-graph
    +    to have an even lower chunk count.
    +
         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
    +    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

      ## commit-graph.c ##
     @@ commit-graph.c: struct commit_graph *parse_commit_graph(void *graph_map, int fd,
    @@ commit-graph.c: struct commit_graph *parse_commit_graph(void *graph_map, int fd,
              uint32_t chunk_id;
              uint64_t chunk_offset;
              int chunk_repeated = 0;
    +
    + ## t/t5318-commit-graph.sh ##
    +@@ t/t5318-commit-graph.sh: test_expect_success 'detect bad hash version' '
    + '
    + 
    + test_expect_success 'detect low chunk count' '
    +-    corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\02" \
    ++    corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\01" \
    +         "missing the .* chunk"
    + '
    + 
 3:  598f7f9a978 !  3:  a09d0bd5b7a commit-graph-format.txt: all multi-byte numbers are in network byte order
    @@ Commit message

         Clarify that all multi-byte integers are in network byte order.

         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
    +    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

      ## Documentation/technical/commit-graph-format.txt ##
     @@ Documentation/technical/commit-graph-format.txt: the body into "chunks" and provide a binary lookup table at the beginning
 4:  b29e5d39ed6 !  4:  32a6f11cc47 commit-slab: add a function to deep free entries on the slab
    @@ Commit message
         Use it in get_shallow_commits() in 'shallow.c' to replace an
         open-coded iteration over a commit slab's entries.

         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
    +    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

      ## commit-slab-decl.h ##
     @@ commit-slab-decl.h: struct slabname {                            \
    @@ shallow.c: int is_repository_shallow(struct repository *r)
       * supports a "valid" flag.
       */
      define_commit_slab(commit_depth, int *);
    -+void free_depth_in_slab(int **ptr)
    ++static void free_depth_in_slab(int **ptr)
     +{
     +    FREE_AND_NULL(*ptr);
     +}
 5:  18f4db7bfb9 !  5:  1a70ff05aea diff.h: drop diff_tree_oid() & friends' return value
    @@ Commit message
         [2] diff_tree_oid() traces back to diff-tree.c:main() in 9174026cfe as
             well.

         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
    +    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

      ## diff.h ##
     @@ diff.h: struct combine_diff_path *diff_tree_paths(
    @@ revision.c: static int rev_compare_tree(struct rev_info *revs,
     -               &revs->pruning) < 0)
     -        return REV_TREE_DIFFERENT;
     +    diff_tree_oid(&t1->object.oid, &t2->object.oid, "", &revs->pruning);
    -     return tree_difference;
    - }
    + 
    +     if (!nth_parent)
    +         if (bloom_ret == 1 && tree_difference == REV_TREE_SAME)
    +@@ revision.c: static int rev_compare_tree(struct rev_info *revs,

      static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit)
      {
    @@ tree-diff.c: static struct combine_diff_path *ll_diff_tree_paths(
          const struct object_id **parents_oid, int nparent,
          struct strbuf *base, struct diff_options *opt);
     -static int ll_diff_tree_oid(const struct object_id *old_oid,
    +-                const struct object_id *new_oid,
    +-                struct strbuf *base, struct diff_options *opt);
     +static void ll_diff_tree_oid(const struct object_id *old_oid,
    -                 const struct object_id *new_oid,
    -                 struct strbuf *base, struct diff_options *opt);
    ++                 const struct object_id *new_oid,
    ++                 struct strbuf *base, struct diff_options *opt);

    + /*
    +  * Compare two tree entries, taking into account only path/S_ISDIR(mode),
     @@ tree-diff.c: static void try_to_follow_renames(const struct object_id *old_oid,
          q->nr = 1;
      }

     -static int ll_diff_tree_oid(const struct object_id *old_oid,
    +-                const struct object_id *new_oid,
    +-                struct strbuf *base, struct diff_options *opt)
     +static void ll_diff_tree_oid(const struct object_id *old_oid,
    -                 const struct object_id *new_oid,
    -                 struct strbuf *base, struct diff_options *opt)
    ++                 const struct object_id *new_oid,
    ++                 struct strbuf *base, struct diff_options *opt)
      {
    +     struct combine_diff_path phead, *p;
    +     pathchange_fn_t pathchange_old = opt->pathchange;
     @@ tree-diff.c: static int ll_diff_tree_oid(const struct object_id *old_oid,
          }

    @@ tree-diff.c: static int ll_diff_tree_oid(const struct object_id *old_oid,
      }

     -int diff_root_tree_oid(const struct object_id *new_oid, const char *base, struct diff_options *opt)
    -+void diff_root_tree_oid(const struct object_id *new_oid, const char *base,
    ++void diff_root_tree_oid(const struct object_id *new_oid,
    ++            const char *base,
     +            struct diff_options *opt)
      {
     -    return diff_tree_oid(NULL, new_oid, base, opt);
 6:  bf336f109e6 !  6:  636c2069659 commit-graph: clean up #includes
    @@ Commit message
         'commit-graph.c' includes 'dir.h', but doesn't actually use anything
         from there, so let's drop that #include as well.

         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
    +    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

      ## commit-graph.c ##
     @@
    @@ commit-graph.h
     -#include "repository.h"
     -#include "string-list.h"
     -#include "cache.h"
    -+#include "hash.h"
    + #include "object-store.h"

      #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH"
    - #define GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD "GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD"
    +@@ commit-graph.h: void git_test_write_commit_graph_or_die(void);

      struct commit;
    + struct bloom_filter_settings;
     +struct repository;
     +struct raw_object_store;
     +struct string_list;

    - char *get_commit_graph_filename(const char *obj_dir);
    + char *get_commit_graph_filename(struct object_directory *odb);
      int open_commit_graph(const char *graph_file, int *fd, struct stat *st);
 7:  b7f0f831bcf !  7:  cd9e033d1b1 commit-graph: simplify parse_commit_graph() #1
    @@ Commit message
         and, consequently, have to update the 'detect incorrect chunk count'
         test in 't5318-commit-graph.sh' as well.

         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
    +    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

      ## commit-graph.c ##
     @@ commit-graph.c: struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 8:  f2752000052 !  8:  83641b5e49e commit-graph: simplify parse_commit_graph() #2
    @@ Commit message
         iteration, so we can calculate the size of each chunk right away,
         right where we store its starting offset.

         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
    +    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

      ## commit-graph.c ##
     @@ commit-graph.c: struct commit_graph *parse_commit_graph(void *graph_map, int fd,
    @@ commit-graph.c: struct commit_graph *parse_commit_graph(void *graph_map, int fd,
     +    next_chunk_offset = get_be64(chunk_lookup + 4);
     +    for (i = 0; i < graph->num_chunks; i++) {
              uint32_t chunk_id;
    -         uint64_t chunk_offset;
    +-        uint64_t chunk_offset;
    ++        uint64_t chunk_offset = next_chunk_offset;
              int chunk_repeated = 0;

              chunk_id = get_be32(chunk_lookup + 0);
     -        chunk_offset = get_be64(chunk_lookup + 4);
    -+        chunk_offset = next_chunk_offset;
    -+        next_chunk_offset = get_be64(chunk_lookup + 4 +
    -+                         GRAPH_CHUNKLOOKUP_WIDTH);

              chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
    ++        next_chunk_offset = get_be64(chunk_lookup + 4);

    +         if (chunk_offset > graph_size - the_hash_algo->rawsz) {
    +             error(_("commit-graph improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32),
     @@ commit-graph.c: struct commit_graph *parse_commit_graph(void *graph_map, int fd,
              case GRAPH_CHUNKID_OIDLOOKUP:
                  if (graph->chunk_oid_lookup)
    @@ commit-graph.c: struct commit_graph *parse_commit_graph(void *graph_map, int fd,
     -        last_chunk_offset = chunk_offset;
          }

    -     hashcpy(graph->oid.hash, graph->data + graph->data_len - graph->hash_len);
    +     if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) {

I'm stepping out of the range-diff to point out that this change is mostly
stylistic.

 9:  4e184b8743c !  9:  9b818b9cb91 commit-graph: simplify write_commit_graph_file() #1
    @@ Commit message
         fill the arrays of chunk IDs and sizes in one go, eliminating one set
         of repeated conditions.

         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
    +    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

      ## commit-graph.c ##
     @@ commit-graph.c: static int write_commit_graph_file(struct write_commit_graph_context *ctx)
          struct hashfile *f;
          struct lock_file lk = LOCK_INIT;
    -     uint32_t chunk_ids[6];
    --    uint64_t chunk_offsets[6];
    -+    uint64_t chunk_sizes[6];
    +     uint32_t chunk_ids[MAX_NUM_CHUNKS + 1];
    +-    uint64_t chunk_offsets[MAX_NUM_CHUNKS + 1];
    ++    uint64_t chunk_sizes[MAX_NUM_CHUNKS + 1];
          const unsigned hashsz = the_hash_algo->rawsz;
          struct strbuf progress_title = STRBUF_INIT;
          int num_chunks = 3;
     +    uint64_t chunk_offset;
          struct object_id file_hash;
    +     const struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS;

    -     if (ctx->split) {
     @@ commit-graph.c: static int write_commit_graph_file(struct write_commit_graph_context *ctx)
          }

    @@ commit-graph.c: static int write_commit_graph_file(struct write_commit_graph_con
     +    chunk_sizes[1] = hashsz * ctx->commits.nr;
          chunk_ids[2] = GRAPH_CHUNKID_DATA;
     +    chunk_sizes[2] = (hashsz + 16) * ctx->commits.nr;
    ++
          if (ctx->num_extra_edges) {
              chunk_ids[num_chunks] = GRAPH_CHUNKID_EXTRAEDGES;
    -+        chunk_sizes[3] = 4 * ctx->num_extra_edges;
    ++        chunk_sizes[num_chunks] = 4 * ctx->num_extra_edges;
    +         num_chunks++;
    +     }
    +     if (ctx->changed_paths) {
    +         chunk_ids[num_chunks] = GRAPH_CHUNKID_BLOOMINDEXES;
    ++        chunk_sizes[num_chunks] = sizeof(uint32_t) * ctx->commits.nr;
    +         num_chunks++;
    +         chunk_ids[num_chunks] = GRAPH_CHUNKID_BLOOMDATA;
    ++        chunk_sizes[num_chunks] = sizeof(uint32_t) * 3
    ++                      + ctx->total_bloom_filter_data_size;
              num_chunks++;
          }
          if (ctx->num_commit_graphs_after > 1) {
              chunk_ids[num_chunks] = GRAPH_CHUNKID_BASE;
    -+        chunk_sizes[4] = hashsz * (ctx->num_commit_graphs_after - 1);
    ++        chunk_sizes[num_chunks] = hashsz * (ctx->num_commit_graphs_after - 1);
              num_chunks++;
          }

    @@ commit-graph.c: static int write_commit_graph_file(struct write_commit_graph_con
     -                        4 * ctx->num_extra_edges;
     -        num_chunks++;
     -    }
    +-    if (ctx->changed_paths) {
    +-        chunk_offsets[num_chunks + 1] = chunk_offsets[num_chunks] +
    +-                        sizeof(uint32_t) * ctx->commits.nr;
    +-        num_chunks++;
    +-
    +-        chunk_offsets[num_chunks + 1] = chunk_offsets[num_chunks] +
    +-                        sizeof(uint32_t) * 3 + ctx->total_bloom_filter_data_size;
    +-        num_chunks++;
    +-    }
     -    if (ctx->num_commit_graphs_after > 1) {
     -        chunk_offsets[num_chunks + 1] = chunk_offsets[num_chunks] +
     -                        hashsz * (ctx->num_commit_graphs_after - 1);
10:  344dd337da5 ! 10:  5984fb01ebc commit-graph: simplify write_commit_graph_file() #2
    @@ Commit message
         'struct chunk_info'.  This will allow more cleanups in the following
         patches.

         Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
    +    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

      ## commit-graph.c ##
     @@ commit-graph.c: static int write_graph_chunk_base(struct hashfile *f,
    @@ commit-graph.c: static int write_graph_chunk_base(struct hashfile *f,
          int fd;
          struct hashfile *f;
          struct lock_file lk = LOCK_INIT;
    --    uint32_t chunk_ids[6];
    --    uint64_t chunk_sizes[6];
    -+    struct chunk_info chunks[6];
    +-    uint32_t chunk_ids[MAX_NUM_CHUNKS + 1];
    +-    uint64_t chunk_sizes[MAX_NUM_CHUNKS + 1];
    ++    struct chunk_info chunks[MAX_NUM_CHUNKS + 1];
          const unsigned hashsz = the_hash_algo->rawsz;
          struct strbuf progress_title = STRBUF_INIT;
          int num_chunks = 3;
    @@ commit-graph.c: static int write_commit_graph_file(struct write_commit_graph_con
     -    chunk_sizes[1] = hashsz * ctx->commits.nr;
     -    chunk_ids[2] = GRAPH_CHUNKID_DATA;
     -    chunk_sizes[2] = (hashsz + 16) * ctx->commits.nr;
    +-
     +    chunks[0].id = GRAPH_CHUNKID_OIDFANOUT;
     +    chunks[0].size = GRAPH_FANOUT_SIZE;
     +    chunks[1].id = GRAPH_CHUNKID_OIDLOOKUP;
    @@ commit-graph.c: static int write_commit_graph_file(struct write_commit_graph_con
     +    chunks[2].size = (hashsz + 16) * ctx->commits.nr;
          if (ctx->num_extra_edges) {
     -        chunk_ids[num_chunks] = GRAPH_CHUNKID_EXTRAEDGES;
    --        chunk_sizes[3] = 4 * ctx->num_extra_edges;
    +-        chunk_sizes[num_chunks] = 4 * ctx->num_extra_edges;
     +        chunks[num_chunks].id = GRAPH_CHUNKID_EXTRAEDGES;
     +        chunks[num_chunks].size = 4 * ctx->num_extra_edges;
              num_chunks++;
          }
    +     if (ctx->changed_paths) {
    +-        chunk_ids[num_chunks] = GRAPH_CHUNKID_BLOOMINDEXES;
    +-        chunk_sizes[num_chunks] = sizeof(uint32_t) * ctx->commits.nr;
    ++        chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMINDEXES;
    ++        chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr;
    +         num_chunks++;
    +-        chunk_ids[num_chunks] = GRAPH_CHUNKID_BLOOMDATA;
    +-        chunk_sizes[num_chunks] = sizeof(uint32_t) * 3
    ++        chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMDATA;
    ++        chunks[num_chunks].size = sizeof(uint32_t) * 3
    +                       + ctx->total_bloom_filter_data_size;
    +         num_chunks++;
    +     }
          if (ctx->num_commit_graphs_after > 1) {
     -        chunk_ids[num_chunks] = GRAPH_CHUNKID_BASE;
    --        chunk_sizes[4] = hashsz * (ctx->num_commit_graphs_after - 1);
    +-        chunk_sizes[num_chunks] = hashsz * (ctx->num_commit_graphs_after - 1);
     +        chunks[num_chunks].id = GRAPH_CHUNKID_BASE;
     +        chunks[num_chunks].size = hashsz * (ctx->num_commit_graphs_after - 1);
              num_chunks++;
11:  56e3c4f57b3 <  -:  ----------- commit-graph: allocate the 'struct chunk_info' array dinamically
12:  28fb1b5bdfe <  -:  ----------- commit-graph: unify the signatures of all write_graph_chunk_*() functions
13:  1e1e59e2592 <  -:  ----------- commit-graph: simplify write_commit_graph_file() #3
14:  6f0d912e4b8 <  -:  ----------- commit-graph: check chunk sizes after writing
15:  0ab955aac32 <  -:  ----------- commit-graph-format.txt: document the modified path Bloom filter chunks
16:  4c128d51dfe <  -:  ----------- Add a generic and minimal Bloom filter implementation
17:  41f02bc38f7 <  -:  ----------- Import a streaming-capable Murmur3 hash function implementation
18:  e5fd1da48d4 <  -:  ----------- commit-graph: write "empty" Modified Path Bloom Filter Index chunk
19:  2dd882ec601 <  -:  ----------- commit-graph: add commit slab for modified path Bloom filters
20:  f30e495c2b0 <  -:  ----------- commit-graph: fill the Modified Path Bloom Filter Index chunk
21:  e904cb58301 <  -:  ----------- commit-graph: load and use the Modified Path Bloom Filter Index chunk
22:  c71647ca374 <  -:  ----------- commit-graph: write the Modified Path Bloom Filters chunk
23:  50898d42291 <  -:  ----------- commit-graph: load and use the Modified Path Bloom Filters chunk
24:  dc96f0d9822 <  -:  ----------- commit-graph: check all leading directories in modified path Bloom filters
25:  7cbf1bc6b66 <  -:  ----------- commit-graph: check embedded modified path Bloom filters with a mask
26:  3951fdedf6a <  -:  ----------- commit-graph: deduplicate modified path Bloom filters
27:  5aba19a2766 <  -:  ----------- commit-graph: load modified path Bloom filters for merge commits
28:  93fc6af1d2f <  -:  ----------- commit-graph: write Modified Path Bloom Filter Merge Index chunk
29:  f87b37bf08e <  -:  ----------- commit-graph: extract init and free write_commit_graph_context
30:  943b0d9554c <  -:  ----------- commit-graph: move write_commit_graph_reachable below write_commit_graph
31:  47b26ea61aa <  -:  ----------- t7007-show: make the first test compatible with the next patch
32:  9201b71071c <  -:  ----------- PoC commit-graph: use revision walk machinery for '--reachable'
33:  5c72d97e5e9 <  -:  ----------- commit-graph: write modified path Bloom filters in "history order"
34:  8b40ec4cd30 <  -:  ----------- commit-graph: use modified path Bloom filters with wildcards, if possible

Thanks, -Stolee

SZEDER Gábor (10):
  tree-walk.c: don't match submodule entries for 'submod/anything'
  commit-graph: fix parsing the Chunk Lookup table
  commit-graph-format.txt: all multi-byte numbers are in network byte
    order
  commit-slab: add a function to deep free entries on the slab
  diff.h: drop diff_tree_oid() & friends' return value
  commit-graph: clean up #includes
  commit-graph: simplify parse_commit_graph() #1
  commit-graph: simplify parse_commit_graph() #2
  commit-graph: simplify write_commit_graph_file() #1
  commit-graph: simplify write_commit_graph_file() #2

 .../technical/commit-graph-format.txt         |   2 +-
 commit-graph.c                                | 113 ++++++++----------
 commit-graph.h                                |   6 +-
 commit-slab-decl.h                            |   1 +
 commit-slab-impl.h                            |  13 ++
 commit-slab.h                                 |  10 ++
 diff.h                                        |  10 +-
 revision.c                                    |   9 +-
 shallow.c                                     |  14 +--
 t/t4010-diff-pathspec.sh                      |   4 +-
 t/t5318-commit-graph.sh                       |   5 +-
 tree-diff.c                                   |  30 +++--
 tree-walk.c                                   |   9 +-
 13 files changed, 117 insertions(+), 109 deletions(-)


base-commit: f32dde8c12d941065be848a9f66239df96bde216
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-650%2Fderrickstolee%2Fbloom-improvements-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-650/derrickstolee/bloom-improvements-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/650
-- 
gitgitgadget


* [PATCH 01/10] tree-walk.c: don't match submodule entries for 'submod/anything'
  2020-06-05 13:00 [PATCH 00/10] Szeder's commit-graph cleanups Derrick Stolee via GitGitGadget
@ 2020-06-05 13:00 ` SZEDER Gábor via GitGitGadget
  2020-06-05 13:00 ` [PATCH 02/10] commit-graph: fix parsing the Chunk Lookup table SZEDER Gábor via GitGitGadget
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: SZEDER Gábor via GitGitGadget @ 2020-06-05 13:00 UTC
  To: git
  Cc: me, szeder.dev, jnareb, peff, garimasigit, Derrick Stolee,
	SZEDER Gábor

From: SZEDER Gábor <szeder.dev@gmail.com>

Submodules should be handled the same as regular directories with
respect to the presence of a trailing slash, i.e. commands like:

  git diff rev1 rev2 -- $path
  git rev-list HEAD -- $path

should produce the same output whether $path is 'submod' or 'submod/'.
This has been fixed in commit 74b4f7f277 (tree-walk.c: ignore trailing
slash on submodule in tree_entry_interesting(), 2014-01-23).

Unfortunately, that commit had the unintended side effect of handling
'submod/anything' the same as 'submod' and 'submod/' as well, e.g.:

  $ git log --oneline --name-only -- sha1collisiondetection/whatever
  4125f78222 sha1dc: update from upstream
  sha1collisiondetection
  07a20f569b Makefile: fix unaligned loads in sha1dc with UBSan
  sha1collisiondetection
  23e37f8e9d sha1dc: update from upstream
  sha1collisiondetection
  86cfd61e6b sha1dc: optionally use sha1collisiondetection as a submodule
  sha1collisiondetection

Fix this by rejecting submodules as partial pathnames when their
trailing slash is followed by anything.
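
The rule above can be sketched in isolation. This is only an
illustration under simplifying assumptions, not the code in tree-walk.c:
the names entry_kind and entry_may_match() are hypothetical, whole
strings are compared instead of pathspec items, and the enum stands in
for the S_ISDIR()/S_ISGITLINK() checks on entry->mode.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the mode checks done on a tree entry. */
enum entry_kind { ENTRY_FILE, ENTRY_DIR, ENTRY_SUBMODULE };

/*
 * Return 1 if a tree entry called 'name' may match the pathspec 'match',
 * 0 otherwise.  A longer pathspec may continue below a directory, but
 * below a submodule only a single trailing slash is tolerated.
 */
int entry_may_match(enum entry_kind kind, const char *name, const char *match)
{
	size_t pathlen = strlen(name);
	size_t matchlen = strlen(match);

	/* The pathspec must start with the whole entry name. */
	if (matchlen < pathlen || strncmp(match, name, pathlen) != 0)
		return 0;
	if (matchlen == pathlen)
		return 1;	/* exact match */
	if (match[pathlen] != '/')
		return 0;	/* e.g. 'submod' vs. 'submodule' */
	if (kind == ENTRY_DIR)
		return 1;	/* 'dir/anything' may match inside 'dir' */
	/* Only 'submod/' itself is accepted for a submodule. */
	return kind == ENTRY_SUBMODULE && matchlen == pathlen + 1;
}
```

With this sketch, 'submod' and 'submod/' both match a submodule entry
named 'submod', while 'submod/whatever' does not.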

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t4010-diff-pathspec.sh | 4 +++-
 tree-walk.c              | 9 ++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/t/t4010-diff-pathspec.sh b/t/t4010-diff-pathspec.sh
index e5ca359edfa..65cc703c659 100755
--- a/t/t4010-diff-pathspec.sh
+++ b/t/t4010-diff-pathspec.sh
@@ -125,7 +125,9 @@ test_expect_success 'setup submodules' '
 test_expect_success 'diff-tree ignores trailing slash on submodule path' '
 	git diff --name-only HEAD^ HEAD submod >expect &&
 	git diff --name-only HEAD^ HEAD submod/ >actual &&
-	test_cmp expect actual
+	test_cmp expect actual &&
+	git diff --name-only HEAD^ HEAD -- submod/whatever >actual &&
+	test_must_be_empty actual
 '
 
 test_expect_success 'diff multiple wildcard pathspecs' '
diff --git a/tree-walk.c b/tree-walk.c
index bb0ad34c545..0160294712b 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -851,7 +851,14 @@ static int match_entry(const struct pathspec_item *item,
 	if (matchlen > pathlen) {
 		if (match[pathlen] != '/')
 			return 0;
-		if (!S_ISDIR(entry->mode) && !S_ISGITLINK(entry->mode))
+		/*
+		 * Reject non-directories as partial pathnames, except
+		 * when match is a submodule with a trailing slash and
+		 * nothing else (to handle 'submod/' and 'submod'
+		 * uniformly).
+		 */
+		if (!S_ISDIR(entry->mode) &&
+		    (!S_ISGITLINK(entry->mode) || matchlen > pathlen + 1))
 			return 0;
 	}
 
-- 
gitgitgadget



* [PATCH 02/10] commit-graph: fix parsing the Chunk Lookup table
  2020-06-05 13:00 [PATCH 00/10] Szeder's commit-graph cleanups Derrick Stolee via GitGitGadget
  2020-06-05 13:00 ` [PATCH 01/10] tree-walk.c: don't match submodule entries for 'submod/anything' SZEDER Gábor via GitGitGadget
@ 2020-06-05 13:00 ` SZEDER Gábor via GitGitGadget
  2020-06-05 13:00 ` [PATCH 03/10] commit-graph-format.txt: all multi-byte numbers are in network byte order SZEDER Gábor via GitGitGadget
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: SZEDER Gábor via GitGitGadget @ 2020-06-05 13:00 UTC
  To: git
  Cc: me, szeder.dev, jnareb, peff, garimasigit, Derrick Stolee,
	SZEDER Gábor

From: SZEDER Gábor <szeder.dev@gmail.com>

The commit-graph file format specifies that the chunks may be in any
order.  However, if the OID Lookup chunk happens to be the last one in
the file, then any command attempting to access the commit-graph data
will fail with:

  fatal: invalid commit position. commit-graph is likely corrupt

In this case the error is wrong: the commit-graph file does conform to
the specification, but the parsing of the Chunk Lookup table is a bit
buggy, and leaves the field holding the number of commits in the
commit-graph zero-initialized.

The number of commits in the commit-graph is determined while parsing
the Chunk Lookup table, by dividing the size of the OID Lookup chunk
by the hash size.  However, the Chunk Lookup table doesn't actually
store the size of the chunks, but it stores their starting offset.
Consequently, the size of a chunk can only be calculated by
subtracting the starting offset of that chunk from the offset of the
subsequent chunk, or in case of the last chunk from the offset
recorded in the terminating label.  This is currently implemented in a
somewhat complicated way: as we iterate over the entries of the Chunk
Lookup table, we check the ID of each chunk and store its starting
offset, then we check the ID of the last seen chunk and calculate its
size using its previously saved offset if necessary (at the moment
it's only necessary for the OID Lookup chunk).  Alas, while parsing
the Chunk Lookup table we only iterate through the "real" chunks, but
never look at the terminating label, thus don't even check whether
it's necessary to calculate the size of the last chunk.  Consequently,
if the OID Lookup chunk is the last one, then we don't calculate its
size and in turn don't run the piece of code determining the number of
commits in the commit-graph, leaving the field holding that number
unchanged (i.e. zero-initialized), eventually triggering the sanity
check in load_oid_from_graph().

Fix this by iterating through all entries in the Chunk Lookup table,
including the terminating label.
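
As a rough illustration of the size rule described above (and not the
code touched by this patch), a chunk's size is the next entry's offset
minus its own, where the entry after the last chunk is the terminating
label.  The struct lookup_entry type, the chunk_size() helper and the
small numeric IDs used with it are hypothetical and exist only for this
sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one Chunk Lookup table entry. */
struct lookup_entry {
	uint32_t id;		/* chunk ID; the terminating label comes last */
	uint64_t offset;	/* starting offset of the chunk in the file */
};

/*
 * 'table' holds 'num_chunks' real entries followed by the terminating
 * label, i.e. num_chunks + 1 entries in total.  Return the size of the
 * chunk with ID 'wanted_id', or 0 if it is absent or the offsets are
 * not increasing.
 */
uint64_t chunk_size(const struct lookup_entry *table, size_t num_chunks,
		    uint32_t wanted_id)
{
	size_t i;

	for (i = 0; i < num_chunks; i++) {
		if (table[i].id != wanted_id)
			continue;
		/* table[i + 1] is valid even for the last chunk. */
		if (table[i + 1].offset < table[i].offset)
			return 0;
		return table[i + 1].offset - table[i].offset;
	}
	return 0;
}
```

Because the terminating label is consulted, the result is the same
whether or not the wanted chunk happens to be the last one in the file;
dividing the size of the OID Lookup chunk by the hash size then gives
the number of commits.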

Note that this is the minimal fix, suitable for the maintenance track.
A better fix would be to simplify how the chunk sizes are calculated,
but that is a more invasive change, less suitable for 'maint', so that
will be done in later patches.

This additional flexibility of scanning more chunks breaks a test for
"git commit-graph verify" so alter that test to mutate the commit-graph
to have an even lower chunk count.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c          | 2 +-
 t/t5318-commit-graph.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 196e817a84c..7807d945626 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -277,7 +277,7 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 	last_chunk_id = 0;
 	last_chunk_offset = 8;
 	chunk_lookup = data + 8;
-	for (i = 0; i < graph->num_chunks; i++) {
+	for (i = 0; i <= graph->num_chunks; i++) {
 		uint32_t chunk_id;
 		uint64_t chunk_offset;
 		int chunk_repeated = 0;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 18304a65e4d..79e7fbcd40e 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -488,7 +488,7 @@ test_expect_success 'detect bad hash version' '
 '
 
 test_expect_success 'detect low chunk count' '
-	corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\02" \
+	corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\01" \
 		"missing the .* chunk"
 '
 
-- 
gitgitgadget



* [PATCH 03/10] commit-graph-format.txt: all multi-byte numbers are in network byte order
  2020-06-05 13:00 [PATCH 00/10] Szeder's commit-graph cleanups Derrick Stolee via GitGitGadget
  2020-06-05 13:00 ` [PATCH 01/10] tree-walk.c: don't match submodule entries for 'submod/anything' SZEDER Gábor via GitGitGadget
  2020-06-05 13:00 ` [PATCH 02/10] commit-graph: fix parsing the Chunk Lookup table SZEDER Gábor via GitGitGadget
@ 2020-06-05 13:00 ` SZEDER Gábor via GitGitGadget
  2020-06-05 13:00 ` [PATCH 04/10] commit-slab: add a function to deep free entries on the slab SZEDER Gábor via GitGitGadget
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: SZEDER Gábor via GitGitGadget @ 2020-06-05 13:00 UTC
  To: git
  Cc: me, szeder.dev, jnareb, peff, garimasigit, Derrick Stolee,
	SZEDER Gábor

From: SZEDER Gábor <szeder.dev@gmail.com>

The commit-graph format specifies that "All 4-byte numbers are in
network order", but the commit-graph contains 8-byte integers as well
(file offsets in the Chunk Lookup table), and their byte order is
unspecified.

Clarify that all multi-byte integers are in network byte order.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
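Reviewer note, not part of the commit message: a minimal sketch of what
"network byte order" means for the 8-byte file offsets in the Chunk
Lookup table. The helper names sketch_get_be32() and sketch_get_be64()
are hypothetical and exist only for this illustration; they are not
Git's own helpers.

```c
#include <stdint.h>

/* Hypothetical helper: decode a 4-byte big-endian (network order) integer. */
static uint32_t sketch_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Hypothetical helper: decode an 8-byte big-endian integer.  The most
 * significant 4 bytes come first, exactly as for the 4-byte case.
 */
static uint64_t sketch_get_be64(const unsigned char *p)
{
	return ((uint64_t)sketch_get_be32(p) << 32) | sketch_get_be32(p + 4);
}
```

Decoding byte by byte like this gives the same result on little-endian
and big-endian hosts, which is the point of fixing the order in the
format document.
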
 Documentation/technical/commit-graph-format.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
index 1beef171822..440541045d4 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -32,7 +32,7 @@ the body into "chunks" and provide a binary lookup table at the beginning
 of the body. The header includes certain values, such as number of chunks
 and hash type.
 
-All 4-byte numbers are in network order.
+All multi-byte numbers are in network byte order.
 
 HEADER:
 
-- 
gitgitgadget


* [PATCH 04/10] commit-slab: add a function to deep free entries on the slab
  2020-06-05 13:00 [PATCH 00/10] Szeder's commit-graph cleanups Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-06-05 13:00 ` [PATCH 03/10] commit-graph-format.txt: all multi-byte numbers are in network byte order SZEDER Gábor via GitGitGadget
@ 2020-06-05 13:00 ` SZEDER Gábor via GitGitGadget
  2020-06-18 20:59   ` René Scharfe
  2020-06-27 15:53   ` SZEDER Gábor
  2020-06-05 13:00 ` [PATCH 05/10] diff.h: drop diff_tree_oid() & friends' return value SZEDER Gábor via GitGitGadget
                   ` (7 subsequent siblings)
  11 siblings, 2 replies; 16+ messages in thread
From: SZEDER Gábor via GitGitGadget @ 2020-06-05 13:00 UTC (permalink / raw)
  To: git
  Cc: me, szeder.dev, jnareb, peff, garimasigit, Derrick Stolee,
	SZEDER Gábor

From: SZEDER Gábor <szeder.dev@gmail.com>

clear_##slabname() frees only the memory allocated for a commit slab
itself, but entries in the commit slab might own additional memory
outside the slab that should be freed as well.  We already have (at
least) one such commit slab, and this patch series is about to add one
more.

To free all additional memory owned by entries on the commit slab the
user of such a slab could iterate over all commits it knows about,
peek whether there is a valid entry associated with each commit, and
free the additional memory, if any.  Or it could rely on intimate
knowledge about the internals of the commit slab implementation, and
could itself iterate directly through all entries in the slab, and
free the additional memory.  Or it could just leak the additional
memory...

Introduce deep_clear_##slabname() to allow releasing memory owned by
commit slab entries by invoking the 'void free_fn(elemtype *ptr)'
function, specified as a parameter, for each entry in the slab.

Use it in get_shallow_commits() in 'shallow.c' to replace an
open-coded iteration over a commit slab's entries.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
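Reviewer note, not part of the commit message: a simplified sketch of
the deep-clear idea outside of the macro machinery. Everything named
sketch_* is hypothetical; the struct is a reduced stand-in for a commit
slab that assumes a stride of 1 and an element type of 'int *'.

```c
#include <stdlib.h>

typedef int *sketch_elem;

/* Reduced stand-in for a commit slab (assumes stride == 1). */
struct sketch_slab {
	unsigned int slab_count;	/* number of slab pointers */
	unsigned int slab_size;		/* entries per allocated slab */
	sketch_elem **slab;		/* slab[i] is NULL if never allocated */
};

/* Shallow clear: frees only the memory of the slab itself. */
static void sketch_clear(struct sketch_slab *s)
{
	unsigned int i;

	for (i = 0; i < s->slab_count; i++)
		free(s->slab[i]);
	free(s->slab);
	s->slab = NULL;
	s->slab_count = 0;
}

/* Deep clear: let the caller release what each entry owns first. */
static void sketch_deep_clear(struct sketch_slab *s,
			      void (*free_fn)(sketch_elem *))
{
	unsigned int i, j;

	for (i = 0; i < s->slab_count; i++) {
		if (!s->slab[i])
			continue;
		for (j = 0; j < s->slab_size; j++)
			free_fn(&s->slab[i][j]);
	}
	sketch_clear(s);
}

/* Example callback; free(NULL) is a no-op, so unused entries are safe. */
static void sketch_free_elem(sketch_elem *ptr)
{
	free(*ptr);
	*ptr = NULL;
}
```

Note that the callback is invoked for every entry of every allocated
slab, including zero-initialized entries that were never handed out, so
it has to tolerate a NULL payload.
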
 commit-slab-decl.h |  1 +
 commit-slab-impl.h | 13 +++++++++++++
 commit-slab.h      | 10 ++++++++++
 shallow.c          | 14 +++++---------
 4 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/commit-slab-decl.h b/commit-slab-decl.h
index adc7b46c83b..286164b7e27 100644
--- a/commit-slab-decl.h
+++ b/commit-slab-decl.h
@@ -32,6 +32,7 @@ struct slabname {							\
 void init_ ##slabname## _with_stride(struct slabname *s, unsigned stride); \
 void init_ ##slabname(struct slabname *s);				\
 void clear_ ##slabname(struct slabname *s);				\
+void deep_clear_ ##slabname(struct slabname *s, void (*free_fn)(elemtype *ptr)); \
 elemtype *slabname## _at_peek(struct slabname *s, const struct commit *c, int add_if_missing); \
 elemtype *slabname## _at(struct slabname *s, const struct commit *c);	\
 elemtype *slabname## _peek(struct slabname *s, const struct commit *c)
diff --git a/commit-slab-impl.h b/commit-slab-impl.h
index 5c0eb91a5d1..557738df271 100644
--- a/commit-slab-impl.h
+++ b/commit-slab-impl.h
@@ -38,6 +38,19 @@ scope void clear_ ##slabname(struct slabname *s)			\
 	FREE_AND_NULL(s->slab);						\
 }									\
 									\
+scope void deep_clear_ ##slabname(struct slabname *s, void (*free_fn)(elemtype *)) \
+{									\
+	unsigned int i;							\
+	for (i = 0; i < s->slab_count; i++) {				\
+		unsigned int j;						\
+		if (!s->slab[i])					\
+			continue;					\
+		for (j = 0; j < s->slab_size; j++)			\
+			free_fn(&s->slab[i][j * s->stride]);		\
+	}								\
+	clear_ ##slabname(s);						\
+}									\
+									\
 scope elemtype *slabname## _at_peek(struct slabname *s,			\
 						  const struct commit *c, \
 						  int add_if_missing)   \
diff --git a/commit-slab.h b/commit-slab.h
index 05b3f2804e7..8e72a305365 100644
--- a/commit-slab.h
+++ b/commit-slab.h
@@ -47,6 +47,16 @@
  *
  *   Call this function before the slab falls out of scope to avoid
  *   leaking memory.
+ *
+ * - void deep_clear_indegree(struct indegree *, void (*free_fn)(int*))
+ *
+ *   Empties the slab, similar to clear_indegree(), but in addition it
+ *   calls the given 'free_fn' for each slab entry to release any
+ *   additional memory that might be owned by the entry (but not the
+ *   entry itself!).
+ *   Note that 'free_fn' might be called even for entries for which no
+ *   indegree_at() call has been made; in this case 'free_fn' is invoked
+ *   with a pointer to a zero-initialized location.
  */
 
 #define define_commit_slab(slabname, elemtype) \
diff --git a/shallow.c b/shallow.c
index 7fd04afed19..c4ac8a73273 100644
--- a/shallow.c
+++ b/shallow.c
@@ -84,6 +84,10 @@ int is_repository_shallow(struct repository *r)
  * supports a "valid" flag.
  */
 define_commit_slab(commit_depth, int *);
+static void free_depth_in_slab(int **ptr)
+{
+	FREE_AND_NULL(*ptr);
+}
 struct commit_list *get_shallow_commits(struct object_array *heads, int depth,
 		int shallow_flag, int not_shallow_flag)
 {
@@ -150,15 +154,7 @@ struct commit_list *get_shallow_commits(struct object_array *heads, int depth,
 			}
 		}
 	}
-	for (i = 0; i < depths.slab_count; i++) {
-		int j;
-
-		if (!depths.slab[i])
-			continue;
-		for (j = 0; j < depths.slab_size; j++)
-			free(depths.slab[i][j]);
-	}
-	clear_commit_depth(&depths);
+	deep_clear_commit_depth(&depths, free_depth_in_slab);
 
 	return result;
 }
-- 
gitgitgadget


* [PATCH 05/10] diff.h: drop diff_tree_oid() & friends' return value
  2020-06-05 13:00 [PATCH 00/10] Szeder's commit-graph cleanups Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2020-06-05 13:00 ` [PATCH 04/10] commit-slab: add a function to deep free entries on the slab SZEDER Gábor via GitGitGadget
@ 2020-06-05 13:00 ` SZEDER Gábor via GitGitGadget
  2020-06-05 13:00 ` [PATCH 06/10] commit-graph: clean up #includes SZEDER Gábor via GitGitGadget
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: SZEDER Gábor via GitGitGadget @ 2020-06-05 13:00 UTC (permalink / raw)
  To: git
  Cc: me, szeder.dev, jnareb, peff, garimasigit, Derrick Stolee,
	SZEDER Gábor

From: SZEDER Gábor <szeder.dev@gmail.com>

ll_diff_tree_oid() has only ever returned 0 [1], so its return value
is basically useless.  Its only caller diff_tree_oid() has only ever
returned the return value of ll_diff_tree_oid() as-is [2], so its
return value is just as useless.  Most of diff_tree_oid()'s callers
simply ignore its return value, except:

  - diff_root_tree_oid() is a thin wrapper around diff_tree_oid() and
    returns its return value as-is, but all of diff_root_tree_oid()'s
    callers ignore its return value.

  - rev_compare_tree() and rev_same_tree_as_empty() do look at the
    return value in a condition, but, since the return value is always
    0, the former's < 0 condition is never fulfilled, while the
    latter's >= 0 condition is always fulfilled.

So let's drop the return value of ll_diff_tree_oid(), diff_tree_oid()
and diff_root_tree_oid(), and drop those conditions from
rev_compare_tree() and rev_same_tree_as_empty() as well.

[1] ll_diff_tree_oid() and its ancestors have been returning only 0
    ever since it was introduced as diff_tree() in 9174026cfe (Add
    "diff-tree" program to show which files have changed between two
    trees., 2005-04-09).
[2] diff_tree_oid() traces back to diff-tree.c:main() in 9174026cfe as
    well.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 diff.h      | 10 +++++-----
 revision.c  |  9 +++------
 tree-diff.c | 30 ++++++++++++++----------------
 3 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/diff.h b/diff.h
index 9443dc1b003..e0c0af6286b 100644
--- a/diff.h
+++ b/diff.h
@@ -431,11 +431,11 @@ struct combine_diff_path *diff_tree_paths(
 	struct combine_diff_path *p, const struct object_id *oid,
 	const struct object_id **parents_oid, int nparent,
 	struct strbuf *base, struct diff_options *opt);
-int diff_tree_oid(const struct object_id *old_oid,
-		  const struct object_id *new_oid,
-		  const char *base, struct diff_options *opt);
-int diff_root_tree_oid(const struct object_id *new_oid, const char *base,
-		       struct diff_options *opt);
+void diff_tree_oid(const struct object_id *old_oid,
+		   const struct object_id *new_oid,
+		   const char *base, struct diff_options *opt);
+void diff_root_tree_oid(const struct object_id *new_oid, const char *base,
+			struct diff_options *opt);
 
 struct combine_diff_path {
 	struct combine_diff_path *next;
diff --git a/revision.c b/revision.c
index cbf4b61aa67..c644c660917 100644
--- a/revision.c
+++ b/revision.c
@@ -791,9 +791,7 @@ static int rev_compare_tree(struct rev_info *revs,
 
 	tree_difference = REV_TREE_SAME;
 	revs->pruning.flags.has_changes = 0;
-	if (diff_tree_oid(&t1->object.oid, &t2->object.oid, "",
-			   &revs->pruning) < 0)
-		return REV_TREE_DIFFERENT;
+	diff_tree_oid(&t1->object.oid, &t2->object.oid, "", &revs->pruning);
 
 	if (!nth_parent)
 		if (bloom_ret == 1 && tree_difference == REV_TREE_SAME)
@@ -804,7 +802,6 @@ static int rev_compare_tree(struct rev_info *revs,
 
 static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit)
 {
-	int retval;
 	struct tree *t1 = get_commit_tree(commit);
 
 	if (!t1)
@@ -812,9 +809,9 @@ static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit)
 
 	tree_difference = REV_TREE_SAME;
 	revs->pruning.flags.has_changes = 0;
-	retval = diff_tree_oid(NULL, &t1->object.oid, "", &revs->pruning);
+	diff_tree_oid(NULL, &t1->object.oid, "", &revs->pruning);
 
-	return retval >= 0 && (tree_difference == REV_TREE_SAME);
+	return tree_difference == REV_TREE_SAME;
 }
 
 struct treesame_state {
diff --git a/tree-diff.c b/tree-diff.c
index f3d303c6e54..6ebad1a46f3 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -29,9 +29,9 @@ static struct combine_diff_path *ll_diff_tree_paths(
 	struct combine_diff_path *p, const struct object_id *oid,
 	const struct object_id **parents_oid, int nparent,
 	struct strbuf *base, struct diff_options *opt);
-static int ll_diff_tree_oid(const struct object_id *old_oid,
-			    const struct object_id *new_oid,
-			    struct strbuf *base, struct diff_options *opt);
+static void ll_diff_tree_oid(const struct object_id *old_oid,
+			     const struct object_id *new_oid,
+			     struct strbuf *base, struct diff_options *opt);
 
 /*
  * Compare two tree entries, taking into account only path/S_ISDIR(mode),
@@ -679,9 +679,9 @@ static void try_to_follow_renames(const struct object_id *old_oid,
 	q->nr = 1;
 }
 
-static int ll_diff_tree_oid(const struct object_id *old_oid,
-			    const struct object_id *new_oid,
-			    struct strbuf *base, struct diff_options *opt)
+static void ll_diff_tree_oid(const struct object_id *old_oid,
+			     const struct object_id *new_oid,
+			     struct strbuf *base, struct diff_options *opt)
 {
 	struct combine_diff_path phead, *p;
 	pathchange_fn_t pathchange_old = opt->pathchange;
@@ -697,29 +697,27 @@ static int ll_diff_tree_oid(const struct object_id *old_oid,
 	}
 
 	opt->pathchange = pathchange_old;
-	return 0;
 }
 
-int diff_tree_oid(const struct object_id *old_oid,
-		  const struct object_id *new_oid,
-		  const char *base_str, struct diff_options *opt)
+void diff_tree_oid(const struct object_id *old_oid,
+		   const struct object_id *new_oid,
+		   const char *base_str, struct diff_options *opt)
 {
 	struct strbuf base;
-	int retval;
 
 	strbuf_init(&base, PATH_MAX);
 	strbuf_addstr(&base, base_str);
 
-	retval = ll_diff_tree_oid(old_oid, new_oid, &base, opt);
+	ll_diff_tree_oid(old_oid, new_oid, &base, opt);
 	if (!*base_str && opt->flags.follow_renames && diff_might_be_rename())
 		try_to_follow_renames(old_oid, new_oid, &base, opt);
 
 	strbuf_release(&base);
-
-	return retval;
 }
 
-int diff_root_tree_oid(const struct object_id *new_oid, const char *base, struct diff_options *opt)
+void diff_root_tree_oid(const struct object_id *new_oid,
+			const char *base,
+			struct diff_options *opt)
 {
-	return diff_tree_oid(NULL, new_oid, base, opt);
+	diff_tree_oid(NULL, new_oid, base, opt);
 }
-- 
gitgitgadget


* [PATCH 06/10] commit-graph: clean up #includes
  2020-06-05 13:00 [PATCH 00/10] Szeder's commit-graph cleanups Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2020-06-05 13:00 ` [PATCH 05/10] diff.h: drop diff_tree_oid() & friends' return value SZEDER Gábor via GitGitGadget
@ 2020-06-05 13:00 ` SZEDER Gábor via GitGitGadget
  2020-06-05 13:00 ` [PATCH 07/10] commit-graph: simplify parse_commit_graph() #1 SZEDER Gábor via GitGitGadget
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: SZEDER Gábor via GitGitGadget @ 2020-06-05 13:00 UTC (permalink / raw)
  To: git
  Cc: me, szeder.dev, jnareb, peff, garimasigit, Derrick Stolee,
	SZEDER Gábor

From: SZEDER Gábor <szeder.dev@gmail.com>

Our CodingGuidelines says that it's sufficient to include one of
'git-compat-util.h' and 'cache.h', but both 'commit-graph.c' and
'commit-graph.h' include both.  Let's include only 'git-compat-util.h'
to lose a bunch of unnecessary dependencies; but include 'hash.h',
because 'commit-graph.h' does require the definition of 'struct
object_id'.

'commit-graph.h' explicitly includes 'repository.h' and
'string-list.h', but only needs the declaration of a few structs from
them.  Drop these includes and forward-declare the necessary structs
instead.

'commit-graph.c' includes 'dir.h', but doesn't actually use anything
from there, so let's drop that #include as well.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 4 +---
 commit-graph.h | 6 +++---
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 7807d945626..6ed649388d6 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1,7 +1,5 @@
-#include "cache.h"
-#include "config.h"
-#include "dir.h"
 #include "git-compat-util.h"
+#include "config.h"
 #include "lockfile.h"
 #include "pack.h"
 #include "packfile.h"
diff --git a/commit-graph.h b/commit-graph.h
index 39484482cc1..881c9b46e57 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -2,9 +2,6 @@
 #define COMMIT_GRAPH_H
 
 #include "git-compat-util.h"
-#include "repository.h"
-#include "string-list.h"
-#include "cache.h"
 #include "object-store.h"
 
 #define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH"
@@ -22,6 +19,9 @@ void git_test_write_commit_graph_or_die(void);
 
 struct commit;
 struct bloom_filter_settings;
+struct repository;
+struct raw_object_store;
+struct string_list;
 
 char *get_commit_graph_filename(struct object_directory *odb);
 int open_commit_graph(const char *graph_file, int *fd, struct stat *st);
-- 
gitgitgadget


* [PATCH 07/10] commit-graph: simplify parse_commit_graph() #1
  2020-06-05 13:00 [PATCH 00/10] Szeder's commit-graph cleanups Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2020-06-05 13:00 ` [PATCH 06/10] commit-graph: clean up #includes SZEDER Gábor via GitGitGadget
@ 2020-06-05 13:00 ` SZEDER Gábor via GitGitGadget
  2020-06-05 13:00 ` [PATCH 08/10] commit-graph: simplify parse_commit_graph() #2 SZEDER Gábor via GitGitGadget
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: SZEDER Gábor via GitGitGadget @ 2020-06-05 13:00 UTC (permalink / raw)
  To: git
  Cc: me, szeder.dev, jnareb, peff, garimasigit, Derrick Stolee,
	SZEDER Gábor

From: SZEDER Gábor <szeder.dev@gmail.com>

While we iterate over all entries of the Chunk Lookup table we make
sure that we don't attempt to read past the end of the mmap-ed
commit-graph file, and check in each iteration that the chunk ID and
offset we are about to read are still within the mmap-ed memory region.
However, these checks in each iteration are not really necessary,
because the number of chunks in the commit-graph file is already known
before this loop from the just parsed commit-graph header.

So let's check that the commit-graph file is large enough for all
entries in the Chunk Lookup table before we start iterating over those
entries, and drop those per-iteration checks.  While at it, take into
account the size of everything that is necessary to have a valid
commit-graph file, i.e. the size of the header, the size of the
mandatory OID Fanout chunk, and the size of the signature in the
trailer as well.

Note that this necessitates changing the error message as well, and,
consequently, we have to update the 'detect incorrect chunk count'
test in 't5318-commit-graph.sh' too.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
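Reviewer note, not part of the commit message: a minimal sketch of the
up-front size check described above. The SKETCH_* constants are assumed
values chosen for illustration (an 8-byte header, 12-byte lookup
entries, a fanout of 256 4-byte entries); the authoritative definitions
are the GRAPH_* macros in 'commit-graph.c'. The function name is
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define SKETCH_HEADER_SIZE 8
#define SKETCH_CHUNKLOOKUP_WIDTH 12
#define SKETCH_FANOUT_SIZE (4 * 256)

/*
 * Return 1 if a file of 'file_size' bytes can hold the header, a Chunk
 * Lookup table with 'num_chunks' entries plus the terminating label,
 * the mandatory OID Fanout chunk and the trailing checksum.
 */
static int sketch_file_large_enough(size_t file_size, uint32_t num_chunks,
				    size_t hash_size)
{
	size_t min_size = SKETCH_HEADER_SIZE +
			  ((size_t)num_chunks + 1) * SKETCH_CHUNKLOOKUP_WIDTH +
			  SKETCH_FANOUT_SIZE + hash_size;

	return file_size >= min_size;
}
```

With 3 chunks and a 20-byte hash the lower bound is
8 + 4 * 12 + 1024 + 20 = 1100 bytes, so a chunk count corrupted to 255
is rejected before the loop ever reads a lookup entry.
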
 commit-graph.c          | 16 +++++++++-------
 t/t5318-commit-graph.sh |  3 ++-
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 6ed649388d6..9927762f18c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -272,6 +272,15 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 	graph->data = graph_map;
 	graph->data_len = graph_size;
 
+	if (graph_size < GRAPH_HEADER_SIZE +
+			 (graph->num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH +
+			 GRAPH_FANOUT_SIZE + the_hash_algo->rawsz) {
+		error(_("commit-graph file is too small to hold %u chunks"),
+		      graph->num_chunks);
+		free(graph);
+		return NULL;
+	}
+
 	last_chunk_id = 0;
 	last_chunk_offset = 8;
 	chunk_lookup = data + 8;
@@ -280,13 +289,6 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 		uint64_t chunk_offset;
 		int chunk_repeated = 0;
 
-		if (data + graph_size - chunk_lookup <
-		    GRAPH_CHUNKLOOKUP_WIDTH) {
-			error(_("commit-graph chunk lookup table entry missing; file may be incomplete"));
-			free(graph);
-			return NULL;
-		}
-
 		chunk_id = get_be32(chunk_lookup + 0);
 		chunk_offset = get_be64(chunk_lookup + 4);
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 79e7fbcd40e..1073f9e3cf2 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -574,7 +574,8 @@ test_expect_success 'detect invalid checksum hash' '
 
 test_expect_success 'detect incorrect chunk count' '
 	corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\377" \
-		"chunk lookup table entry missing" $GRAPH_CHUNK_LOOKUP_OFFSET
+		"commit-graph file is too small to hold [0-9]* chunks" \
+		$GRAPH_CHUNK_LOOKUP_OFFSET
 '
 
 test_expect_success 'git fsck (checks commit-graph)' '
-- 
gitgitgadget


* [PATCH 08/10] commit-graph: simplify parse_commit_graph() #2
  2020-06-05 13:00 [PATCH 00/10] Szeder's commit-graph cleanups Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2020-06-05 13:00 ` [PATCH 07/10] commit-graph: simplify parse_commit_graph() #1 SZEDER Gábor via GitGitGadget
@ 2020-06-05 13:00 ` SZEDER Gábor via GitGitGadget
  2020-06-05 13:00 ` [PATCH 09/10] commit-graph: simplify write_commit_graph_file() #1 SZEDER Gábor via GitGitGadget
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: SZEDER Gábor via GitGitGadget @ 2020-06-05 13:00 UTC (permalink / raw)
  To: git
  Cc: me, szeder.dev, jnareb, peff, garimasigit, Derrick Stolee,
	SZEDER Gábor

From: SZEDER Gábor <szeder.dev@gmail.com>

The Chunk Lookup table stores the chunks' starting offset in the
commit-graph file, not their sizes.  Consequently, the size of a chunk
can only be calculated by subtracting its offset from the offset of
the subsequent chunk (or that of the terminating label).  This is
currently implemented in a somewhat complicated way: as we iterate over the
entries of the Chunk Lookup table, we check the id of each chunk and
store its starting offset, then we check the id of the last seen chunk
and calculate its size using its previously saved offset.  At the
moment there is only one chunk whose size we calculate, but this patch
series will add more, and the repeated chunk id checks are not that
pretty.

Instead let's read ahead the offset of the next chunk on each
iteration, so we can calculate the size of each chunk right away,
right where we store its starting offset.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
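Reviewer note, not part of the commit message: a standalone sketch of
the read-ahead approach. All sketch_* names are hypothetical, the
12-byte entry width (4-byte id followed by an 8-byte offset) is taken
as an assumption, and no validation of the offsets is done here.

```c
#include <stdint.h>

#define SKETCH_CHUNKLOOKUP_WIDTH 12	/* assumed: 4-byte id + 8-byte offset */

struct sketch_chunk {
	uint32_t id;
	uint64_t offset;
	uint64_t size;
};

static uint32_t sketch_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t sketch_be64(const unsigned char *p)
{
	return ((uint64_t)sketch_be32(p) << 32) | sketch_be32(p + 4);
}

/*
 * 'table' must hold num_chunks + 1 entries; the last one is the
 * terminating label whose offset marks the end of the last chunk.
 */
static void sketch_read_chunks(const unsigned char *table,
			       uint32_t num_chunks,
			       struct sketch_chunk *out)
{
	uint64_t next_offset = sketch_be64(table + 4);
	uint32_t i;

	for (i = 0; i < num_chunks; i++) {
		out[i].id = sketch_be32(table);
		out[i].offset = next_offset;

		/* Read ahead: the next entry's offset ends this chunk. */
		table += SKETCH_CHUNKLOOKUP_WIDTH;
		next_offset = sketch_be64(table + 4);
		out[i].size = next_offset - out[i].offset;
	}
}
```

Because the size is known in the same iteration that sees the chunk id,
no "last seen chunk" state has to be carried between iterations.
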
 commit-graph.c | 26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 9927762f18c..84206f0f512 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -230,8 +230,7 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 	const unsigned char *data, *chunk_lookup;
 	uint32_t i;
 	struct commit_graph *graph;
-	uint64_t last_chunk_offset;
-	uint32_t last_chunk_id;
+	uint64_t next_chunk_offset;
 	uint32_t graph_signature;
 	unsigned char graph_version, hash_version;
 
@@ -281,18 +280,17 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 		return NULL;
 	}
 
-	last_chunk_id = 0;
-	last_chunk_offset = 8;
 	chunk_lookup = data + 8;
-	for (i = 0; i <= graph->num_chunks; i++) {
+	next_chunk_offset = get_be64(chunk_lookup + 4);
+	for (i = 0; i < graph->num_chunks; i++) {
 		uint32_t chunk_id;
-		uint64_t chunk_offset;
+		uint64_t chunk_offset = next_chunk_offset;
 		int chunk_repeated = 0;
 
 		chunk_id = get_be32(chunk_lookup + 0);
-		chunk_offset = get_be64(chunk_lookup + 4);
 
 		chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
+		next_chunk_offset = get_be64(chunk_lookup + 4);
 
 		if (chunk_offset > graph_size - the_hash_algo->rawsz) {
 			error(_("commit-graph improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32),
@@ -312,8 +310,11 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 		case GRAPH_CHUNKID_OIDLOOKUP:
 			if (graph->chunk_oid_lookup)
 				chunk_repeated = 1;
-			else
+			else {
 				graph->chunk_oid_lookup = data + chunk_offset;
+				graph->num_commits = (next_chunk_offset - chunk_offset)
+						     / graph->hash_len;
+			}
 			break;
 
 		case GRAPH_CHUNKID_DATA:
@@ -368,15 +369,6 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd,
 			free(graph);
 			return NULL;
 		}
-
-		if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)
-		{
-			graph->num_commits = (chunk_offset - last_chunk_offset)
-					     / graph->hash_len;
-		}
-
-		last_chunk_id = chunk_id;
-		last_chunk_offset = chunk_offset;
 	}
 
 	if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) {
-- 
gitgitgadget


* [PATCH 09/10] commit-graph: simplify write_commit_graph_file() #1
  2020-06-05 13:00 [PATCH 00/10] Szeder's commit-graph cleanups Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2020-06-05 13:00 ` [PATCH 08/10] commit-graph: simplify parse_commit_graph() #2 SZEDER Gábor via GitGitGadget
@ 2020-06-05 13:00 ` SZEDER Gábor via GitGitGadget
  2020-06-05 13:00 ` [PATCH 10/10] commit-graph: simplify write_commit_graph_file() #2 SZEDER Gábor via GitGitGadget
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 16+ messages in thread
From: SZEDER Gábor via GitGitGadget @ 2020-06-05 13:00 UTC (permalink / raw)
  To: git
  Cc: me, szeder.dev, jnareb, peff, garimasigit, Derrick Stolee,
	SZEDER Gábor

From: SZEDER Gábor <szeder.dev@gmail.com>

In write_commit_graph_file() one block of code fills the array of
chunk IDs, another block of code fills the array of chunk offsets,
then the chunk IDs and offsets are written to the Chunk Lookup table,
and finally a third block of code writes the actual chunks.  In case
of optional chunks like Extra Edge List and Base Graphs List there is
also a condition checking whether that chunk is necessary/desired, and
that same condition is repeated in all those three blocks of code.
This patch series is about to add more optional chunks, so there would
be even more repeated conditions.

Those chunk offsets are relative to the beginning of the file, so they
inherently depend on the size of the Chunk Lookup table, which in turn
depends on the number of chunks that are to be written to the
commit-graph file.  IOW at the time we set the first chunk's ID we
can't yet know its offset, because we don't yet know how many chunks
there are.

Simplify this by initially filling an array of chunk sizes, not
offsets, and calculate the offsets based on the chunk sizes only
later, while we are writing the Chunk Lookup table.  This way we can
fill the arrays of chunk IDs and sizes in one go, eliminating one set
of repeated conditions.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
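Reviewer note, not part of the commit message: a small sketch of
deriving the offsets from the sizes as a running sum. The function name
and the SKETCH_* constants are hypothetical; the 8-byte header and
12-byte lookup entry are assumed to match the values used in the diff
below.

```c
#include <stdint.h>

#define SKETCH_HEADER_SIZE 8		/* assumed header size */
#define SKETCH_CHUNKLOOKUP_WIDTH 12	/* assumed lookup entry size */

/*
 * 'sizes' and 'offsets' both hold num_chunks + 1 entries; the last
 * entry stands for the terminating label and has size 0.
 */
static void sketch_offsets_from_sizes(const uint64_t *sizes, int num_chunks,
				      uint64_t *offsets)
{
	/* The first chunk starts right after the header and the table. */
	uint64_t offset = SKETCH_HEADER_SIZE +
			  (uint64_t)(num_chunks + 1) * SKETCH_CHUNKLOOKUP_WIDTH;
	int i;

	for (i = 0; i <= num_chunks; i++) {
		offsets[i] = offset;
		offset += sizes[i];
	}
}
```

The starting offset depends on the final chunk count, which is why it
can only be computed once all ids and sizes have been collected.
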
 commit-graph.c | 46 +++++++++++++++++-----------------------------
 1 file changed, 17 insertions(+), 29 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 84206f0f512..79cddabcd12 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1529,10 +1529,11 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	struct hashfile *f;
 	struct lock_file lk = LOCK_INIT;
 	uint32_t chunk_ids[MAX_NUM_CHUNKS + 1];
-	uint64_t chunk_offsets[MAX_NUM_CHUNKS + 1];
+	uint64_t chunk_sizes[MAX_NUM_CHUNKS + 1];
 	const unsigned hashsz = the_hash_algo->rawsz;
 	struct strbuf progress_title = STRBUF_INIT;
 	int num_chunks = 3;
+	uint64_t chunk_offset;
 	struct object_id file_hash;
 	const struct bloom_filter_settings bloom_settings = DEFAULT_BLOOM_FILTER_SETTINGS;
 
@@ -1573,50 +1574,34 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	}
 
 	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
+	chunk_sizes[0] = GRAPH_FANOUT_SIZE;
 	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
+	chunk_sizes[1] = hashsz * ctx->commits.nr;
 	chunk_ids[2] = GRAPH_CHUNKID_DATA;
+	chunk_sizes[2] = (hashsz + 16) * ctx->commits.nr;
+
 	if (ctx->num_extra_edges) {
 		chunk_ids[num_chunks] = GRAPH_CHUNKID_EXTRAEDGES;
+		chunk_sizes[num_chunks] = 4 * ctx->num_extra_edges;
 		num_chunks++;
 	}
 	if (ctx->changed_paths) {
 		chunk_ids[num_chunks] = GRAPH_CHUNKID_BLOOMINDEXES;
+		chunk_sizes[num_chunks] = sizeof(uint32_t) * ctx->commits.nr;
 		num_chunks++;
 		chunk_ids[num_chunks] = GRAPH_CHUNKID_BLOOMDATA;
+		chunk_sizes[num_chunks] = sizeof(uint32_t) * 3
+					  + ctx->total_bloom_filter_data_size;
 		num_chunks++;
 	}
 	if (ctx->num_commit_graphs_after > 1) {
 		chunk_ids[num_chunks] = GRAPH_CHUNKID_BASE;
+		chunk_sizes[num_chunks] = hashsz * (ctx->num_commit_graphs_after - 1);
 		num_chunks++;
 	}
 
 	chunk_ids[num_chunks] = 0;
-
-	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
-	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
-	chunk_offsets[2] = chunk_offsets[1] + hashsz * ctx->commits.nr;
-	chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * ctx->commits.nr;
-
-	num_chunks = 3;
-	if (ctx->num_extra_edges) {
-		chunk_offsets[num_chunks + 1] = chunk_offsets[num_chunks] +
-						4 * ctx->num_extra_edges;
-		num_chunks++;
-	}
-	if (ctx->changed_paths) {
-		chunk_offsets[num_chunks + 1] = chunk_offsets[num_chunks] +
-						sizeof(uint32_t) * ctx->commits.nr;
-		num_chunks++;
-
-		chunk_offsets[num_chunks + 1] = chunk_offsets[num_chunks] +
-						sizeof(uint32_t) * 3 + ctx->total_bloom_filter_data_size;
-		num_chunks++;
-	}
-	if (ctx->num_commit_graphs_after > 1) {
-		chunk_offsets[num_chunks + 1] = chunk_offsets[num_chunks] +
-						hashsz * (ctx->num_commit_graphs_after - 1);
-		num_chunks++;
-	}
+	chunk_sizes[num_chunks] = 0;
 
 	hashwrite_be32(f, GRAPH_SIGNATURE);
 
@@ -1625,13 +1610,16 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	hashwrite_u8(f, num_chunks);
 	hashwrite_u8(f, ctx->num_commit_graphs_after - 1);
 
+	chunk_offset = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
 	for (i = 0; i <= num_chunks; i++) {
 		uint32_t chunk_write[3];
 
 		chunk_write[0] = htonl(chunk_ids[i]);
-		chunk_write[1] = htonl(chunk_offsets[i] >> 32);
-		chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff);
+		chunk_write[1] = htonl(chunk_offset >> 32);
+		chunk_write[2] = htonl(chunk_offset & 0xffffffff);
 		hashwrite(f, chunk_write, 12);
+
+		chunk_offset += chunk_sizes[i];
 	}
 
 	if (ctx->report_progress) {
-- 
gitgitgadget


* [PATCH 10/10] commit-graph: simplify write_commit_graph_file() #2
  2020-06-05 13:00 [PATCH 00/10] Szeder's commit-graph cleanups Derrick Stolee via GitGitGadget
                   ` (8 preceding siblings ...)
  2020-06-05 13:00 ` [PATCH 09/10] commit-graph: simplify write_commit_graph_file() #1 SZEDER Gábor via GitGitGadget
@ 2020-06-05 13:00 ` SZEDER Gábor via GitGitGadget
  2020-06-08 17:39 ` [PATCH 00/10] Szeder's commit-graph cleanups Junio C Hamano
  2020-06-18  1:48 ` Derrick Stolee
  11 siblings, 0 replies; 16+ messages in thread
From: SZEDER Gábor via GitGitGadget @ 2020-06-05 13:00 UTC (permalink / raw)
  To: git
  Cc: me, szeder.dev, jnareb, peff, garimasigit, Derrick Stolee,
	SZEDER Gábor

From: SZEDER Gábor <szeder.dev@gmail.com>

Unify the 'chunk_ids' and 'chunk_sizes' arrays into an array of
'struct chunk_info'.  This will allow more cleanups in the following
patches.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 45 ++++++++++++++++++++++++---------------------
 1 file changed, 24 insertions(+), 21 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 79cddabcd12..887837e8826 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -1522,14 +1522,18 @@ static int write_graph_chunk_base(struct hashfile *f,
 	return 0;
 }
 
+struct chunk_info {
+	uint32_t id;
+	uint64_t size;
+};
+
 static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 {
 	uint32_t i;
 	int fd;
 	struct hashfile *f;
 	struct lock_file lk = LOCK_INIT;
-	uint32_t chunk_ids[MAX_NUM_CHUNKS + 1];
-	uint64_t chunk_sizes[MAX_NUM_CHUNKS + 1];
+	struct chunk_info chunks[MAX_NUM_CHUNKS + 1];
 	const unsigned hashsz = the_hash_algo->rawsz;
 	struct strbuf progress_title = STRBUF_INIT;
 	int num_chunks = 3;
@@ -1573,35 +1577,34 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 		f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
 	}
 
-	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
-	chunk_sizes[0] = GRAPH_FANOUT_SIZE;
-	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
-	chunk_sizes[1] = hashsz * ctx->commits.nr;
-	chunk_ids[2] = GRAPH_CHUNKID_DATA;
-	chunk_sizes[2] = (hashsz + 16) * ctx->commits.nr;
-
+	chunks[0].id = GRAPH_CHUNKID_OIDFANOUT;
+	chunks[0].size = GRAPH_FANOUT_SIZE;
+	chunks[1].id = GRAPH_CHUNKID_OIDLOOKUP;
+	chunks[1].size = hashsz * ctx->commits.nr;
+	chunks[2].id = GRAPH_CHUNKID_DATA;
+	chunks[2].size = (hashsz + 16) * ctx->commits.nr;
 	if (ctx->num_extra_edges) {
-		chunk_ids[num_chunks] = GRAPH_CHUNKID_EXTRAEDGES;
-		chunk_sizes[num_chunks] = 4 * ctx->num_extra_edges;
+		chunks[num_chunks].id = GRAPH_CHUNKID_EXTRAEDGES;
+		chunks[num_chunks].size = 4 * ctx->num_extra_edges;
 		num_chunks++;
 	}
 	if (ctx->changed_paths) {
-		chunk_ids[num_chunks] = GRAPH_CHUNKID_BLOOMINDEXES;
-		chunk_sizes[num_chunks] = sizeof(uint32_t) * ctx->commits.nr;
+		chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMINDEXES;
+		chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr;
 		num_chunks++;
-		chunk_ids[num_chunks] = GRAPH_CHUNKID_BLOOMDATA;
-		chunk_sizes[num_chunks] = sizeof(uint32_t) * 3
+		chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMDATA;
+		chunks[num_chunks].size = sizeof(uint32_t) * 3
 					  + ctx->total_bloom_filter_data_size;
 		num_chunks++;
 	}
 	if (ctx->num_commit_graphs_after > 1) {
-		chunk_ids[num_chunks] = GRAPH_CHUNKID_BASE;
-		chunk_sizes[num_chunks] = hashsz * (ctx->num_commit_graphs_after - 1);
+		chunks[num_chunks].id = GRAPH_CHUNKID_BASE;
+		chunks[num_chunks].size = hashsz * (ctx->num_commit_graphs_after - 1);
 		num_chunks++;
 	}
 
-	chunk_ids[num_chunks] = 0;
-	chunk_sizes[num_chunks] = 0;
+	chunks[num_chunks].id = 0;
+	chunks[num_chunks].size = 0;
 
 	hashwrite_be32(f, GRAPH_SIGNATURE);
 
@@ -1614,12 +1617,12 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
 	for (i = 0; i <= num_chunks; i++) {
 		uint32_t chunk_write[3];
 
-		chunk_write[0] = htonl(chunk_ids[i]);
+		chunk_write[0] = htonl(chunks[i].id);
 		chunk_write[1] = htonl(chunk_offset >> 32);
 		chunk_write[2] = htonl(chunk_offset & 0xffffffff);
 		hashwrite(f, chunk_write, 12);
 
-		chunk_offset += chunk_sizes[i];
+		chunk_offset += chunks[i].size;
 	}
 
 	if (ctx->report_progress) {
-- 
gitgitgadget
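
The chunk lookup loop touched by the last hunk can be sketched in
isolation as follows.  This is a minimal standalone illustration, not
code from Git: fill_chunk_lookup() and put_be32_bytes() are hypothetical
names, the table is written into a byte buffer instead of going through
hashwrite(), and the starting offset is supplied by the caller because
its computation lies outside the quoted hunks.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the struct introduced by the patch. */
struct chunk_info {
	uint32_t id;
	uint64_t size;
};

/* Store a 32-bit value in network byte order, one byte at a time. */
static void put_be32_bytes(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}

/*
 * Hypothetical helper: serialize the lookup table for 'nr' chunks plus
 * the terminating entry (id 0, size 0) stored at chunks[nr].
 * 'out' must hold (nr + 1) * 12 bytes.  'first_offset' is the file
 * offset of the first chunk (an assumption supplied by the caller).
 * Returns the offset just past the last chunk.
 */
static uint64_t fill_chunk_lookup(unsigned char *out,
				  const struct chunk_info *chunks,
				  size_t nr, uint64_t first_offset)
{
	uint64_t offset = first_offset;
	size_t i;

	for (i = 0; i <= nr; i++) {
		/* Each entry: 4-byte id, then the 8-byte offset as two halves. */
		put_be32_bytes(out + 12 * i, chunks[i].id);
		put_be32_bytes(out + 12 * i + 4, (uint32_t)(offset >> 32));
		put_be32_bytes(out + 12 * i + 8, (uint32_t)(offset & 0xffffffff));
		offset += chunks[i].size;
	}
	return offset;
}
```

Keeping id and size in one struct means each table entry is produced
from a single array element, so the two values cannot drift out of step
when a chunk is added.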


* Re: [PATCH 00/10] Szeder's commit-graph cleanups
  2020-06-05 13:00 [PATCH 00/10] Szeder's commit-graph cleanups Derrick Stolee via GitGitGadget
                   ` (9 preceding siblings ...)
  2020-06-05 13:00 ` [PATCH 10/10] commit-graph: simplify write_commit_graph_file() #2 SZEDER Gábor via GitGitGadget
@ 2020-06-08 17:39 ` Junio C Hamano
  2020-06-18  1:48 ` Derrick Stolee
  11 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2020-06-08 17:39 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, me, szeder.dev, jnareb, peff, garimasigit, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This is based on ds/line-log-on-bloom.
>
> Since Szeder so kindly shared his alternate Bloom filter implementation [1],
> I thought it worth my time to start the process of updating the patches to
> apply to more recent code in Git. Here is the effort to update the almost
> obviously-good commit-graph cleanups that he presented in that series.

Very pleased to see collaboration.


* Re: [PATCH 00/10] Szeder's commit-graph cleanups
  2020-06-05 13:00 [PATCH 00/10] Szeder's commit-graph cleanups Derrick Stolee via GitGitGadget
                   ` (10 preceding siblings ...)
  2020-06-08 17:39 ` [PATCH 00/10] Szeder's commit-graph cleanups Junio C Hamano
@ 2020-06-18  1:48 ` Derrick Stolee
  11 siblings, 0 replies; 16+ messages in thread
From: Derrick Stolee @ 2020-06-18  1:48 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget, git
  Cc: me, szeder.dev, jnareb, peff, garimasigit, Derrick Stolee

On 6/5/2020 9:00 AM, Derrick Stolee via GitGitGadget wrote:
> This is based on ds/line-log-on-bloom.
> 
> Since Szeder so kindly shared his alternate Bloom filter implementation [1],
> I thought it worth my time to start the process of updating the patches to
> apply to more recent code in Git. Here is the effort to update the almost
> obviously-good commit-graph cleanups that he presented in that series.
> 
> [1] https://lore.kernel.org/git/20200529085038.26008-1-szeder.dev@gmail.com/

Hello everyone,

Is anyone available to look at this series? Hopefully all of the changes
are simple to read. Szeder did a great job crafting small patches, and
hopefully I didn't butcher them too much applying them to recent commits.

Thanks,
-Stolee


* Re: [PATCH 04/10] commit-slab: add a function to deep free entries on the slab
  2020-06-05 13:00 ` [PATCH 04/10] commit-slab: add a function to deep free entries on the slab SZEDER Gábor via GitGitGadget
@ 2020-06-18 20:59   ` René Scharfe
  2020-06-19 12:52     ` Derrick Stolee
  2020-06-27 15:53   ` SZEDER Gábor
  1 sibling, 1 reply; 16+ messages in thread
From: René Scharfe @ 2020-06-18 20:59 UTC (permalink / raw)
  To: SZEDER Gábor via GitGitGadget, git
  Cc: me, szeder.dev, jnareb, peff, garimasigit, Derrick Stolee

On 05.06.20 at 15:00, SZEDER Gábor via GitGitGadget wrote:
> From: SZEDER Gábor <szeder.dev@gmail.com>
>
> clear_##slabname() frees only the memory allocated for a commit slab
> itself, but entries in the commit slab might own additional memory
> outside the slab that should be freed as well.  We already have (at
> least) one such commit slab, and this patch series is about to add one
> more.
>
> To free all additional memory owned by entries on the commit slab the
> user of such a slab could iterate over all commits it knows about,
> peek whether there is a valid entry associated with each commit, and
> free the additional memory, if any.  Or it could rely on intimate
> knowledge about the internals of the commit slab implementation, and
> could itself iterate directly through all entries in the slab, and
> free the additional memory.  Or it could just leak the additional
> memory...
>
> Introduce deep_clear_##slabname() to allow releasing memory owned by
> commit slab entries by invoking the 'void free_fn(elemtype *ptr)'
> function specified as parameter for each entry in the slab.

Adding a new function instead of extending the existing ones makes
sense, as this is a rare requirement.

>
> Use it in get_shallow_commits() in 'shallow.c' to replace an
> open-coded iteration over a commit slab's entries.
>
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  commit-slab-decl.h |  1 +
>  commit-slab-impl.h | 13 +++++++++++++
>  commit-slab.h      | 10 ++++++++++
>  shallow.c          | 14 +++++---------
>  4 files changed, 29 insertions(+), 9 deletions(-)
>
> diff --git a/commit-slab-decl.h b/commit-slab-decl.h
> index adc7b46c83b..286164b7e27 100644
> --- a/commit-slab-decl.h
> +++ b/commit-slab-decl.h
> @@ -32,6 +32,7 @@ struct slabname {							\
>  void init_ ##slabname## _with_stride(struct slabname *s, unsigned stride); \
>  void init_ ##slabname(struct slabname *s);				\
>  void clear_ ##slabname(struct slabname *s);				\
> +void deep_clear_ ##slabname(struct slabname *s, void (*free_fn)(elemtype *ptr)); \
>  elemtype *slabname## _at_peek(struct slabname *s, const struct commit *c, int add_if_missing); \
>  elemtype *slabname## _at(struct slabname *s, const struct commit *c);	\
>  elemtype *slabname## _peek(struct slabname *s, const struct commit *c)
> diff --git a/commit-slab-impl.h b/commit-slab-impl.h
> index 5c0eb91a5d1..557738df271 100644
> --- a/commit-slab-impl.h
> +++ b/commit-slab-impl.h
> @@ -38,6 +38,19 @@ scope void clear_ ##slabname(struct slabname *s)			\
>  	FREE_AND_NULL(s->slab);						\
>  }									\
>  									\
> +scope void deep_clear_ ##slabname(struct slabname *s, void (*free_fn)(elemtype *)) \
> +{									\
> +	unsigned int i;							\
> +	for (i = 0; i < s->slab_count; i++) {				\
> +		unsigned int j;						\
> +		if (!s->slab[i])					\
> +			continue;					\
> +		for (j = 0; j < s->slab_size; j++)			\
> +			free_fn(&s->slab[i][j * s->stride]);		\
> +	}								\
> +	clear_ ##slabname(s);						\
> +}									\


Why pass an elemtype pointer to the callback function instead of
a plain elemtype?  Because it matches the return type of _at() and
_peek().  Consistency, good.  Handing it a pointer allows the
callback to pass it on to free(), though, which would be bad,
since we do that in clear_() as well.  Hmm.

> +									\
>  scope elemtype *slabname## _at_peek(struct slabname *s,			\
>  						  const struct commit *c, \
>  						  int add_if_missing)   \
> diff --git a/commit-slab.h b/commit-slab.h
> index 05b3f2804e7..8e72a305365 100644
> --- a/commit-slab.h
> +++ b/commit-slab.h
> @@ -47,6 +47,16 @@
>   *
>   *   Call this function before the slab falls out of scope to avoid
>   *   leaking memory.
> + *
> + * - void deep_clear_indegree(struct indegree *, void (*free_fn)(int*))
> + *
> + *   Empties the slab, similar to clear_indegree(), but in addition it
> + *   calls the given 'free_fn' for each slab entry to release any
> + *   additional memory that might be owned by the entry (but not the
> + *   entry itself!).
> + *   Note that 'free_fn' might be called even for entries for which no
> + *   indegree_at() call has been made; in this case 'free_fn' is invoked
> + *   with a pointer to a zero-initialized location.
>   */
>
>  #define define_commit_slab(slabname, elemtype) \
> diff --git a/shallow.c b/shallow.c
> index 7fd04afed19..c4ac8a73273 100644
> --- a/shallow.c
> +++ b/shallow.c
> @@ -84,6 +84,10 @@ int is_repository_shallow(struct repository *r)
>   * supports a "valid" flag.
>   */
>  define_commit_slab(commit_depth, int *);
> +static void free_depth_in_slab(int **ptr)
> +{
> +	FREE_AND_NULL(*ptr);
> +}

Why FREE_AND_NULL?  The original loop below called free().  The slabs
are all released by deep_clear_() immediately after the callbacks are
done anyway, so what's the point in zeroing these pointers?

>  struct commit_list *get_shallow_commits(struct object_array *heads, int depth,
>  		int shallow_flag, int not_shallow_flag)
>  {
> @@ -150,15 +154,7 @@ struct commit_list *get_shallow_commits(struct object_array *heads, int depth,
>  			}
>  		}
>  	}
> -	for (i = 0; i < depths.slab_count; i++) {
> -		int j;
> -
> -		if (!depths.slab[i])
> -			continue;
> -		for (j = 0; j < depths.slab_size; j++)
> -			free(depths.slab[i][j]);
> -	}
> -	clear_commit_depth(&depths);
> +	deep_clear_commit_depth(&depths, free_depth_in_slab);

What a relief!

>
>  	return result;
>  }
>



* Re: [PATCH 04/10] commit-slab: add a function to deep free entries on the slab
  2020-06-18 20:59   ` René Scharfe
@ 2020-06-19 12:52     ` Derrick Stolee
  0 siblings, 0 replies; 16+ messages in thread
From: Derrick Stolee @ 2020-06-19 12:52 UTC (permalink / raw)
  To: René Scharfe, SZEDER Gábor via GitGitGadget, git
  Cc: me, szeder.dev, jnareb, peff, garimasigit, Derrick Stolee

On 6/18/2020 4:59 PM, René Scharfe wrote:
> On 05.06.20 at 15:00, SZEDER Gábor via GitGitGadget wrote:
>> From: SZEDER Gábor <szeder.dev@gmail.com>
>>
>> clear_##slabname() frees only the memory allocated for a commit slab
>> itself, but entries in the commit slab might own additional memory
>> outside the slab that should be freed as well.  We already have (at
>> least) one such commit slab, and this patch series is about to add one
>> more.
>>
>> To free all additional memory owned by entries on the commit slab the
>> user of such a slab could iterate over all commits it knows about,
>> peek whether there is a valid entry associated with each commit, and
>> free the additional memory, if any.  Or it could rely on intimate
>> knowledge about the internals of the commit slab implementation, and
>> could itself iterate directly through all entries in the slab, and
>> free the additional memory.  Or it could just leak the additional
>> memory...
>>
>> Introduce deep_clear_##slabname() to allow releasing memory owned by
>> commit slab entries by invoking the 'void free_fn(elemtype *ptr)'
>> function specified as parameter for each entry in the slab.
> 
> Adding a new function instead of extending the existing ones makes
> sense, as this is a rare requirement.
> 
>>
>> Use it in get_shallow_commits() in 'shallow.c' to replace an
>> open-coded iteration over a commit slab's entries.
>>
>> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  commit-slab-decl.h |  1 +
>>  commit-slab-impl.h | 13 +++++++++++++
>>  commit-slab.h      | 10 ++++++++++
>>  shallow.c          | 14 +++++---------
>>  4 files changed, 29 insertions(+), 9 deletions(-)
>>
>> diff --git a/commit-slab-decl.h b/commit-slab-decl.h
>> index adc7b46c83b..286164b7e27 100644
>> --- a/commit-slab-decl.h
>> +++ b/commit-slab-decl.h
>> @@ -32,6 +32,7 @@ struct slabname {							\
>>  void init_ ##slabname## _with_stride(struct slabname *s, unsigned stride); \
>>  void init_ ##slabname(struct slabname *s);				\
>>  void clear_ ##slabname(struct slabname *s);				\
>> +void deep_clear_ ##slabname(struct slabname *s, void (*free_fn)(elemtype *ptr)); \
>>  elemtype *slabname## _at_peek(struct slabname *s, const struct commit *c, int add_if_missing); \
>>  elemtype *slabname## _at(struct slabname *s, const struct commit *c);	\
>>  elemtype *slabname## _peek(struct slabname *s, const struct commit *c)
>> diff --git a/commit-slab-impl.h b/commit-slab-impl.h
>> index 5c0eb91a5d1..557738df271 100644
>> --- a/commit-slab-impl.h
>> +++ b/commit-slab-impl.h
>> @@ -38,6 +38,19 @@ scope void clear_ ##slabname(struct slabname *s)			\
>>  	FREE_AND_NULL(s->slab);						\
>>  }									\
>>  									\
>> +scope void deep_clear_ ##slabname(struct slabname *s, void (*free_fn)(elemtype *)) \
>> +{									\
>> +	unsigned int i;							\
>> +	for (i = 0; i < s->slab_count; i++) {				\
>> +		unsigned int j;						\
>> +		if (!s->slab[i])					\
>> +			continue;					\
>> +		for (j = 0; j < s->slab_size; j++)			\
>> +			free_fn(&s->slab[i][j * s->stride]);		\
>> +	}								\
>> +	clear_ ##slabname(s);						\
>> +}									\
> 
> 
> Why pass an elemtype pointer to the callback function instead of
> a plain elemtype?  Because it matches the return type of _at() and
> _peek().  Consistency, good.  Handing it a pointer allows the
> callback to pass it on to free(), though, which would be bad,
> since we do that in clear_() as well.  Hmm.
> 
>> +									\
>>  scope elemtype *slabname## _at_peek(struct slabname *s,			\
>>  						  const struct commit *c, \
>>  						  int add_if_missing)   \
>> diff --git a/commit-slab.h b/commit-slab.h
>> index 05b3f2804e7..8e72a305365 100644
>> --- a/commit-slab.h
>> +++ b/commit-slab.h
>> @@ -47,6 +47,16 @@
>>   *
>>   *   Call this function before the slab falls out of scope to avoid
>>   *   leaking memory.
>> + *
>> + * - void deep_clear_indegree(struct indegree *, void (*free_fn)(int*))
>> + *
>> + *   Empties the slab, similar to clear_indegree(), but in addition it
>> + *   calls the given 'free_fn' for each slab entry to release any
>> + *   additional memory that might be owned by the entry (but not the
>> + *   entry itself!).
>> + *   Note that 'free_fn' might be called even for entries for which no
>> + *   indegree_at() call has been made; in this case 'free_fn' is invoked
>> + *   with a pointer to a zero-initialized location.
>>   */
>>
>>  #define define_commit_slab(slabname, elemtype) \
>> diff --git a/shallow.c b/shallow.c
>> index 7fd04afed19..c4ac8a73273 100644
>> --- a/shallow.c
>> +++ b/shallow.c
>> @@ -84,6 +84,10 @@ int is_repository_shallow(struct repository *r)
>>   * supports a "valid" flag.
>>   */
>>  define_commit_slab(commit_depth, int *);
>> +static void free_depth_in_slab(int **ptr)
>> +{
>> +	FREE_AND_NULL(*ptr);
>> +}
> 
> Why FREE_AND_NULL?  The original loop below called free().  The slabs
> are all released by deep_clear_() immediately after the callbacks are
> done anyway, so what's the point in zeroing these pointers?

I think the point was that a later change was going to free
elements in the slab on a one-by-one basis while computing
the filters, to save memory overall. To be future-proof
against such a change, we need to NULL the pointers here.

Perhaps that viewpoint also answers your other comment about
"why pass the pointer?"
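
To make that concrete, here is a minimal sketch using a toy slab rather
than the real commit-slab macros.  All toy_* names and the counter are
hypothetical, and the stride is fixed at 1.  The callback resets the
entry the way FREE_AND_NULL(*ptr) would, so an entry that was released
early is seen as NULL by the later deep clear and is not freed twice.

```c
#include <stdlib.h>

/* Toy stand-in for a commit slab with elemtype 'int *' and stride 1. */
struct toy_slab {
	unsigned int slab_size;		/* entries per slab */
	unsigned int slab_count;	/* number of slabs */
	int ***slab;			/* slab[i][j] is one 'int *' entry */
};

static void *toy_calloc(size_t nmemb, size_t size)
{
	void *p = calloc(nmemb, size);
	if (!p)
		abort();
	return p;
}

static void toy_slab_init(struct toy_slab *s, unsigned int count,
			  unsigned int size)
{
	unsigned int i;

	s->slab_size = size;
	s->slab_count = count;
	s->slab = toy_calloc(count, sizeof(*s->slab));
	for (i = 0; i < count; i++)
		s->slab[i] = toy_calloc(size, sizeof(**s->slab));
}

/* Frees only the slab storage, like clear_##slabname(). */
static void toy_slab_clear(struct toy_slab *s)
{
	unsigned int i;

	for (i = 0; i < s->slab_count; i++)
		free(s->slab[i]);
	s->slab_count = 0;
	free(s->slab);
	s->slab = NULL;
}

/* Calls free_fn on every entry, then frees the slab storage. */
static void toy_slab_deep_clear(struct toy_slab *s, void (*free_fn)(int **))
{
	unsigned int i, j;

	for (i = 0; i < s->slab_count; i++) {
		if (!s->slab[i])
			continue;
		for (j = 0; j < s->slab_size; j++)
			free_fn(&s->slab[i][j]);
	}
	toy_slab_clear(s);
}

static unsigned int payloads_freed;

/* Releases what the entry owns and resets the entry to NULL. */
static void free_depth(int **ptr)
{
	if (*ptr)
		payloads_freed++;
	free(*ptr);
	*ptr = NULL;
}

/* Frees one entry early, then deep-clears; returns payloads freed. */
static unsigned int toy_demo(void)
{
	struct toy_slab s;

	payloads_freed = 0;
	toy_slab_init(&s, 2, 4);
	s.slab[0][1] = toy_calloc(1, sizeof(int));
	s.slab[1][2] = toy_calloc(1, sizeof(int));

	free_depth(&s.slab[0][1]);		/* early, one-by-one release */
	toy_slab_deep_clear(&s, free_depth);	/* finds NULL there, no double free */
	return payloads_freed;
}
```

With a callback that only called free(*ptr), the early release in
toy_demo() would leave a dangling pointer in the slab and the deep clear
would free it a second time.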

Thanks,
-Stolee



* Re: [PATCH 04/10] commit-slab: add a function to deep free entries on the slab
  2020-06-05 13:00 ` [PATCH 04/10] commit-slab: add a function to deep free entries on the slab SZEDER Gábor via GitGitGadget
  2020-06-18 20:59   ` René Scharfe
@ 2020-06-27 15:53   ` SZEDER Gábor
  1 sibling, 0 replies; 16+ messages in thread
From: SZEDER Gábor @ 2020-06-27 15:53 UTC (permalink / raw)
  To: SZEDER Gábor via GitGitGadget
  Cc: git, me, jnareb, peff, garimasigit, Derrick Stolee

On Fri, Jun 05, 2020 at 01:00:26PM +0000, SZEDER Gábor via GitGitGadget wrote:
> From: SZEDER Gábor <szeder.dev@gmail.com>
> 
> clear_##slabname() frees only the memory allocated for a commit slab
> itself, but entries in the commit slab might own additional memory
> outside the slab that should be freed as well.  We already have (at
> least) one such commit slab, and this patch series is about to add one
> more.

This was only true in my original submission, but not anymore: this
patch series doesn't add another such slab, and, more importantly, now
we have at least two such commit slabs.  deep_clear_##slabname() could
be used to clear the bloom_filter_slab and all memory attached to it,
which at the moment is just leaked.
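
A callback for a slab whose entries are structs owning a buffer might
look roughly like the sketch below.  'struct toy_filter' and both
functions are hypothetical stand-ins; they do not reproduce the actual
bloom_filter_slab code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical entry type: the slab stores the struct, the struct owns 'data'. */
struct toy_filter {
	unsigned char *data;
	size_t len;
};

/*
 * Callback with the shape 'void free_fn(elemtype *ptr)'.  It frees the
 * buffer the entry owns but never 'f' itself, which lives inside the
 * slab.  A zero-initialized entry is fine because free(NULL) is a no-op.
 */
static void free_one_toy_filter(struct toy_filter *f)
{
	free(f->data);
	f->data = NULL;
	f->len = 0;
}

/* Applies the callback to 'nr' contiguous entries, as a deep clear would. */
static void release_toy_filters(struct toy_filter *entries, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		free_one_toy_filter(&entries[i]);
}
```

Because the callback may also be invoked for entries that were never
filled in, it must tolerate an all-zero struct, which this one does.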

> To free all additional memory owned by entries on the commit slab the
> user of such a slab could iterate over all commits it knows about,
> peek whether there is a valid entry associated with each commit, and
> free the additional memory, if any.  Or it could rely on intimate
> knowledge about the internals of the commit slab implementation, and
> could itself iterate directly through all entries in the slab, and
> free the additional memory.  Or it could just leak the additional
> memory...
> 
> Introduce deep_clear_##slabname() to allow releasing memory owned by
> commit slab entries by invoking the 'void free_fn(elemtype *ptr)'
> function specified as parameter for each entry in the slab.
> 
> Use it in get_shallow_commits() in 'shallow.c' to replace an
> open-coded iteration over a commit slab's entries.
> 
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  commit-slab-decl.h |  1 +
>  commit-slab-impl.h | 13 +++++++++++++
>  commit-slab.h      | 10 ++++++++++
>  shallow.c          | 14 +++++---------
>  4 files changed, 29 insertions(+), 9 deletions(-)
> 
> diff --git a/commit-slab-decl.h b/commit-slab-decl.h
> index adc7b46c83b..286164b7e27 100644
> --- a/commit-slab-decl.h
> +++ b/commit-slab-decl.h
> @@ -32,6 +32,7 @@ struct slabname {							\
>  void init_ ##slabname## _with_stride(struct slabname *s, unsigned stride); \
>  void init_ ##slabname(struct slabname *s);				\
>  void clear_ ##slabname(struct slabname *s);				\
> +void deep_clear_ ##slabname(struct slabname *s, void (*free_fn)(elemtype *ptr)); \
>  elemtype *slabname## _at_peek(struct slabname *s, const struct commit *c, int add_if_missing); \
>  elemtype *slabname## _at(struct slabname *s, const struct commit *c);	\
>  elemtype *slabname## _peek(struct slabname *s, const struct commit *c)
> diff --git a/commit-slab-impl.h b/commit-slab-impl.h
> index 5c0eb91a5d1..557738df271 100644
> --- a/commit-slab-impl.h
> +++ b/commit-slab-impl.h
> @@ -38,6 +38,19 @@ scope void clear_ ##slabname(struct slabname *s)			\
>  	FREE_AND_NULL(s->slab);						\
>  }									\
>  									\
> +scope void deep_clear_ ##slabname(struct slabname *s, void (*free_fn)(elemtype *)) \
> +{									\
> +	unsigned int i;							\
> +	for (i = 0; i < s->slab_count; i++) {				\
> +		unsigned int j;						\
> +		if (!s->slab[i])					\
> +			continue;					\
> +		for (j = 0; j < s->slab_size; j++)			\
> +			free_fn(&s->slab[i][j * s->stride]);		\
> +	}								\
> +	clear_ ##slabname(s);						\
> +}									\
> +									\
>  scope elemtype *slabname## _at_peek(struct slabname *s,			\
>  						  const struct commit *c, \
>  						  int add_if_missing)   \
> diff --git a/commit-slab.h b/commit-slab.h
> index 05b3f2804e7..8e72a305365 100644
> --- a/commit-slab.h
> +++ b/commit-slab.h
> @@ -47,6 +47,16 @@
>   *
>   *   Call this function before the slab falls out of scope to avoid
>   *   leaking memory.
> + *
> + * - void deep_clear_indegree(struct indegree *, void (*free_fn)(int*))
> + *
> + *   Empties the slab, similar to clear_indegree(), but in addition it
> + *   calls the given 'free_fn' for each slab entry to release any
> + *   additional memory that might be owned by the entry (but not the
> + *   entry itself!).
> + *   Note that 'free_fn' might be called even for entries for which no
> + *   indegree_at() call has been made; in this case 'free_fn' is invoked
> + *   with a pointer to a zero-initialized location.
>   */
>  
>  #define define_commit_slab(slabname, elemtype) \
> diff --git a/shallow.c b/shallow.c
> index 7fd04afed19..c4ac8a73273 100644
> --- a/shallow.c
> +++ b/shallow.c
> @@ -84,6 +84,10 @@ int is_repository_shallow(struct repository *r)
>   * supports a "valid" flag.
>   */
>  define_commit_slab(commit_depth, int *);
> +static void free_depth_in_slab(int **ptr)
> +{
> +	FREE_AND_NULL(*ptr);
> +}
>  struct commit_list *get_shallow_commits(struct object_array *heads, int depth,
>  		int shallow_flag, int not_shallow_flag)
>  {
> @@ -150,15 +154,7 @@ struct commit_list *get_shallow_commits(struct object_array *heads, int depth,
>  			}
>  		}
>  	}
> -	for (i = 0; i < depths.slab_count; i++) {
> -		int j;
> -
> -		if (!depths.slab[i])
> -			continue;
> -		for (j = 0; j < depths.slab_size; j++)
> -			free(depths.slab[i][j]);
> -	}
> -	clear_commit_depth(&depths);
> +	deep_clear_commit_depth(&depths, free_depth_in_slab);
>  
>  	return result;
>  }
> -- 
> gitgitgadget
> 


Thread overview: 16+ messages
2020-06-05 13:00 [PATCH 00/10] Szeder's commit-graph cleanups Derrick Stolee via GitGitGadget
2020-06-05 13:00 ` [PATCH 01/10] tree-walk.c: don't match submodule entries for 'submod/anything' SZEDER Gábor via GitGitGadget
2020-06-05 13:00 ` [PATCH 02/10] commit-graph: fix parsing the Chunk Lookup table SZEDER Gábor via GitGitGadget
2020-06-05 13:00 ` [PATCH 03/10] commit-graph-format.txt: all multi-byte numbers are in network byte order SZEDER Gábor via GitGitGadget
2020-06-05 13:00 ` [PATCH 04/10] commit-slab: add a function to deep free entries on the slab SZEDER Gábor via GitGitGadget
2020-06-18 20:59   ` René Scharfe
2020-06-19 12:52     ` Derrick Stolee
2020-06-27 15:53   ` SZEDER Gábor
2020-06-05 13:00 ` [PATCH 05/10] diff.h: drop diff_tree_oid() & friends' return value SZEDER Gábor via GitGitGadget
2020-06-05 13:00 ` [PATCH 06/10] commit-graph: clean up #includes SZEDER Gábor via GitGitGadget
2020-06-05 13:00 ` [PATCH 07/10] commit-graph: simplify parse_commit_graph() #1 SZEDER Gábor via GitGitGadget
2020-06-05 13:00 ` [PATCH 08/10] commit-graph: simplify parse_commit_graph() #2 SZEDER Gábor via GitGitGadget
2020-06-05 13:00 ` [PATCH 09/10] commit-graph: simplify write_commit_graph_file() #1 SZEDER Gábor via GitGitGadget
2020-06-05 13:00 ` [PATCH 10/10] commit-graph: simplify write_commit_graph_file() #2 SZEDER Gábor via GitGitGadget
2020-06-08 17:39 ` [PATCH 00/10] Szeder's commit-graph cleanups Junio C Hamano
2020-06-18  1:48 ` Derrick Stolee
