git@vger.kernel.org mailing list mirror (one of many)
* [PATCH v5 00/13] Serialized Git Commit Graph
@ 2018-02-27  2:32 Derrick Stolee
  2018-02-27  2:32 ` [PATCH v5 01/13] commit-graph: add format document Derrick Stolee
                   ` (14 more replies)
  0 siblings, 15 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-02-27  2:32 UTC (permalink / raw)
  To: git
  Cc: gitster, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

This patch series again differs substantially from version 4, but I do
think we are converging on a stable design.

This series depends on a few things in flight:

* jt/binsearch-with-fanout for bsearch_graph()

* 'master' includes the sha1file -> hashfile rename in (98a3beab).

* [PATCH] commit: drop uses of get_cached_commit_buffer(). [1] I
  couldn't find a ds/* branch for this one, but it is necessary or
  else the commit graph test script should fail.

Here are some of the inter-patch changes:

* The single commit graph file is stored in the fixed filename
  .git/objects/info/commit-graph

* Because of this change, I struggled with the right way to pair the
  lockfile API with the hashfile API. Perhaps they were not meant to
  interact like this. I include a new patch step that adds a flag for
  hashclose() to keep the file descriptor open so commit_lock_file()
  can succeed. Please let me know if this is the wrong approach.

* A side-benefit of this change is that the "--set-latest" and
  "--delete-expired" arguments are no longer useful.

* I re-ran the performance tests since I rebased onto master. I had
  moved my "master" branch on my copy of Linux from another perf test,
  which changed the data shape a bit.

* There was some confusion between v3 and v4 about whether commits in
  an existing commit-graph file are automatically added to the new
  file during a write. I think I cleared up all of the documentation
  that referenced this to the new behavior: we only include commits
  reachable from the starting commits (depending on --stdin-commits,
  --stdin-packs, or neither) unless the new "--additive" argument
  is specified.

Thanks,
-Stolee

[1] https://public-inbox.org/git/1519240631-221761-1-git-send-email-dstolee@microsoft.com/

-- >8 --

This patch contains a way to serialize the commit graph.

The current implementation defines a new file format to store the graph
structure (parent relationships) and basic commit metadata (commit date,
root tree OID) in order to prevent parsing raw commits while performing
basic graph walks. For example, we do not need to parse the full commit
when performing these walks:

* 'git log --topo-order -1000' walks all reachable commits to avoid
  incorrect topological orders, but only needs the commit message for
  the top 1000 commits.

* 'git merge-base <A> <B>' may walk many commits to find the correct
  boundary between the commits reachable from A and those reachable
  from B. No commit messages are needed.

* 'git branch -vv' checks ahead/behind status for all local branches
  compared to their upstream remote branches. This is essentially as
  hard as computing merge bases for each.

The current patch speeds up these calculations by injecting a check in
parse_commit_gently() to check if there is a graph file and using that
to provide the required metadata to the struct commit.

The file format has room to store generation numbers, which will be
provided as a patch after this framework is merged. Generation numbers
are referenced by the design document but not implemented in order to
make the current patch focus on the graph construction process. Once
that is stable, it will be easier to add generation numbers and make
graph walks aware of generation numbers one-by-one.

Here are some performance results for a copy of the Linux repository
where 'master' has 664,185 reachable commits and is behind 'origin/master'
by 60,191 commits.

| Command                          | Before | After  | Rel % |
|----------------------------------|--------|--------|-------|
| log --oneline --topo-order -1000 |  6.56s |  0.66s | -89%  |
| branch -vv                       |  1.35s |  0.32s | -76%  |
| rev-list --all                   |  6.7s  |  0.83s | -87%  |
| rev-list --all --objects         | 33.0s  | 27.5s  | -16%  |

To test this yourself, run the following on your repo:

  git config core.commitGraph true
  git show-ref -s | git commit-graph write --stdin-commits

The second command writes a commit graph file containing every commit
reachable from your refs. Now, all git commands that walk commits will
check your graph first before consulting the ODB. You can run your own
performance comparisons by toggling the 'core.commitGraph' setting.

[1] https://github.com/derrickstolee/git/pull/2
    A GitHub pull request containing the latest version of this patch.

Derrick Stolee (13):
  commit-graph: add format document
  graph: add commit graph design document
  commit-graph: create git-commit-graph builtin
  csum-file: add CSUM_KEEP_OPEN flag
  commit-graph: implement write_commit_graph()
  commit-graph: implement 'git-commit-graph write'
  commit-graph: implement git commit-graph read
  commit-graph: add core.commitGraph setting
  commit-graph: close under reachability
  commit: integrate commit graph with commit parsing
  commit-graph: read only from specific pack-indexes
  commit-graph: build graph from starting commits
  commit-graph: implement "--additive" option

 .gitignore                                    |   1 +
 Documentation/config.txt                      |   3 +
 Documentation/git-commit-graph.txt            |  93 +++
 .../technical/commit-graph-format.txt         |  98 +++
 Documentation/technical/commit-graph.txt      | 164 ++++
 Makefile                                      |   2 +
 alloc.c                                       |   1 +
 builtin.h                                     |   1 +
 builtin/commit-graph.c                        | 172 +++++
 cache.h                                       |   1 +
 command-list.txt                              |   1 +
 commit-graph.c                                | 720 ++++++++++++++++++
 commit-graph.h                                |  47 ++
 commit.c                                      |   3 +
 commit.h                                      |   3 +
 config.c                                      |   5 +
 contrib/completion/git-completion.bash        |   2 +
 csum-file.c                                   |  10 +-
 csum-file.h                                   |   1 +
 environment.c                                 |   1 +
 git.c                                         |   1 +
 packfile.c                                    |   4 +-
 packfile.h                                    |   2 +
 t/t5318-commit-graph.sh                       | 225 ++++++
 24 files changed, 1556 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/git-commit-graph.txt
 create mode 100644 Documentation/technical/commit-graph-format.txt
 create mode 100644 Documentation/technical/commit-graph.txt
 create mode 100644 builtin/commit-graph.c
 create mode 100644 commit-graph.c
 create mode 100644 commit-graph.h
 create mode 100755 t/t5318-commit-graph.sh

-- 
2.16.2.282.g5029fe8.dirty



* [PATCH v5 01/13] commit-graph: add format document
  2018-02-27  2:32 [PATCH v5 00/13] Serialized Git Commit Graph Derrick Stolee
@ 2018-02-27  2:32 ` Derrick Stolee
  2018-02-27  2:32 ` [PATCH v5 02/13] graph: add commit graph design document Derrick Stolee
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-02-27  2:32 UTC (permalink / raw)
  To: git
  Cc: gitster, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

Add document specifying the binary format for commit graphs. This
format allows for:

* New versions.
* New hash functions and hash lengths.
* Optional extensions.

Basic header information is followed by a binary table of contents
into "chunks" that include:

* An ordered list of commit object IDs.
* A 256-entry fanout into that list of OIDs.
* A list of metadata for the commits.
* A list of "large edges" to enable octopus merges.
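
As a rough illustration of how a reader could walk the header and the
table of contents, here is a minimal sketch. It is not the code in this
series: the sketch_* and helper names are made up, and it assumes the
8-byte chunk offsets are stored big-endian like the 4-byte values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SIGNATURE 0x43475048u /* "CGPH" */
#define SKETCH_HEADER_SIZE 8         /* signature, version, hash version, C, reserved */
#define SKETCH_TOC_ENTRY_SIZE 12     /* 4-byte chunk id + 8-byte offset */

/* Read a 32-bit value stored in network (big-endian) order. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a 64-bit value, assumed here to be big-endian as well. */
static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/*
 * Return the byte offset of the chunk with the given id, or 0 if the
 * header is not recognized or the chunk is absent.
 */
static uint64_t find_chunk_offset(const unsigned char *data, size_t len,
				  uint32_t chunk_id)
{
	size_t i, num_chunks;

	assert(data != NULL);
	if (len < SKETCH_HEADER_SIZE || read_be32(data) != SKETCH_SIGNATURE)
		return 0;
	if (data[4] != 1 || data[5] != 1) /* version 1, hash version 1 (SHA-1) */
		return 0;
	num_chunks = data[6];
	if (len < SKETCH_HEADER_SIZE + (num_chunks + 1) * SKETCH_TOC_ENTRY_SIZE)
		return 0;

	for (i = 0; i <= num_chunks; i++) {
		const unsigned char *entry =
			data + SKETCH_HEADER_SIZE + i * SKETCH_TOC_ENTRY_SIZE;
		uint32_t id = read_be32(entry);

		if (id == 0) /* terminating label */
			break;
		if (id == chunk_id)
			return read_be64(entry + 4);
	}
	return 0;
}
```

Because chunks are located through this table, a reader that does not
know a chunk id can skip it, which is what leaves room for optional
extensions.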

The format automatically includes two parent positions for every
commit. This favors speed over space, since using only one position
per commit would cause an extra level of indirection for every merge
commit. (Octopus merges suffer from this indirection, but they are
very rare.)
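
To make the parent encoding concrete, here is a minimal decoding sketch
for one commit data record, assuming H = 20 (SHA-1). The struct and
function names are hypothetical; the constant values mirror the GRAPH_*
defines in commit-graph.c later in this series, and the edge list is
assumed to be an array of big-endian 4-byte values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_OID_LEN 20
#define SKETCH_PARENT_NONE 0x70000000u          /* no parent in this slot */
#define SKETCH_OCTOPUS_EDGES_NEEDED 0x80000000u /* second slot indexes the edge list */
#define SKETCH_LAST_EDGE 0x80000000u            /* marks the final edge list entry */
#define SKETCH_EDGE_LAST_MASK 0x7fffffffu

struct sketch_commit {
	uint32_t parent1;    /* position, or SKETCH_PARENT_NONE */
	uint32_t second;     /* position, SKETCH_PARENT_NONE, or edge list index */
	int has_large_edges; /* nonzero when 'second' is an edge list index */
	uint32_t generation; /* 0 means "not computed" */
	uint64_t date;       /* 34-bit commit time in seconds */
};

static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the H + 16 bytes that describe one commit. */
static void decode_commit_data(const unsigned char *rec,
			       struct sketch_commit *out)
{
	const unsigned char *p = rec + SKETCH_OID_LEN; /* skip the root tree OID */
	uint32_t second = read_be32(p + 4);
	uint32_t gen_and_date_hi = read_be32(p + 8);
	uint32_t date_lo = read_be32(p + 12);

	out->parent1 = read_be32(p);
	out->has_large_edges = (second & SKETCH_OCTOPUS_EDGES_NEEDED) != 0;
	out->second = second & SKETCH_EDGE_LAST_MASK;
	out->generation = gen_and_date_hi >> 2;
	out->date = ((uint64_t)(gen_and_date_hi & 0x3) << 32) | date_lo;
}

/*
 * Copy the second through nth parents of an octopus merge into 'out',
 * starting at edge list index 'start'. Returns how many were copied.
 */
static int read_extra_parents(const unsigned char *edges, uint32_t nr_edges,
			      uint32_t start, uint32_t *out, int max)
{
	int n = 0;
	uint32_t pos;

	for (pos = start; pos < nr_edges && n < max; pos++) {
		uint32_t value = read_be32(edges + 4 * (size_t)pos);

		out[n++] = value & SKETCH_EDGE_LAST_MASK;
		if (value & SKETCH_LAST_EDGE)
			break;
	}
	return n;
}
```

Only commits with more than two parents pay for the second lookup in
read_extra_parents(); ordinary merges are resolved from the fixed-width
record alone.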

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/commit-graph-format.txt | 98 +++++++++++++++++++++++++
 1 file changed, 98 insertions(+)
 create mode 100644 Documentation/technical/commit-graph-format.txt

diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
new file mode 100644
index 0000000..4402baa
--- /dev/null
+++ b/Documentation/technical/commit-graph-format.txt
@@ -0,0 +1,98 @@
+Git commit graph format
+=======================
+
+The Git commit graph stores a list of commit OIDs and some associated
+metadata, including:
+
+- The generation number of the commit. Commits with no parents have
+  generation number 1; a commit with parents has generation number
+  one more than the maximum generation number of its parents. We
+  reserve zero as special; it can be used to mark a generation
+  number as invalid or "not computed".
+
+- The root tree OID.
+
+- The commit date.
+
+- The parents of the commit, stored using positional references within
+  the graph file.
+
+These positional references are stored as 32-bit integers corresponding to
+the array position within the list of commit OIDs. We use the most-significant
+bit for special purposes, so we can store at most (1 << 31) - 1 (around 2
+billion) commits.
+
+== Commit graph files have the following format:
+
+In order to allow extensions that add extra data to the graph, we organize
+the body into "chunks" and provide a binary lookup table at the beginning
+of the body. The header includes certain values, such as number of chunks
+and hash type.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+  4-byte signature:
+      The signature is: {'C', 'G', 'P', 'H'}
+
+  1-byte version number:
+      Currently, the only valid version is 1.
+
+  1-byte Hash Version (1 = SHA-1)
+      We infer the hash length (H) from this value.
+
+  1-byte number (C) of "chunks"
+
+  1-byte (reserved for later use)
+     Current clients should ignore this value.
+
+CHUNK LOOKUP:
+
+  (C + 1) * 12 bytes listing the table of contents for the chunks:
+      First 4 bytes describe the chunk id. Value 0 is a terminating label.
+      Other 8 bytes provide the byte-offset in current file for chunk to
+      start. (Chunks are ordered contiguously in the file, so you can infer
+      the length using the next chunk position if necessary.) Each chunk
+      type appears at most once.
+
+  The remaining data in the body is described one chunk at a time, and
+  these chunks may be given in any order. Chunks are required unless
+  otherwise specified.
+
+CHUNK DATA:
+
+  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
+      The ith entry, F[i], stores the number of OIDs with first
+      byte at most i. Thus F[255] stores the total
+      number of commits (N).
+
+  OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
+      The OIDs for all commits in the graph, sorted in ascending order.
+
+  Commit Data (ID: {'C', 'D', 'A', 'T' }) (N * (H + 16) bytes)
+    * The first H bytes are for the OID of the root tree.
+    * The next 8 bytes are for the positions of the first two parents
+      of the ith commit. Stores value 0x70000000 if no parent in that
+      position. If there are more than two parents, the second value
+      has its most-significant bit on and the other bits store an array
+      position into the Large Edge List chunk.
+    * The next 8 bytes store the generation number of the commit and
+      the commit time in seconds since EPOCH. The generation number
+      uses the higher 30 bits of the first 4 bytes, while the commit
+      time uses the 32 bits of the second 4 bytes, along with the lowest
+      2 bits of the lowest byte, storing the 33rd and 34th bit of the
+      commit time.
+
+  Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
+      This list of 4-byte values stores the second through nth parents for
+      all octopus merges. The second parent value in the commit data stores
+      an array position within this list along with the most-significant bit
+      on. Starting at that array position, iterate through this list of commit
+      positions for the parents until reaching a value with the most-significant
+      bit on. The other bits correspond to the position of the last parent.
+
+TRAILER:
+
+	H-byte HASH-checksum of all of the above.
+
-- 
2.7.4



* [PATCH v5 02/13] graph: add commit graph design document
  2018-02-27  2:32 [PATCH v5 00/13] Serialized Git Commit Graph Derrick Stolee
  2018-02-27  2:32 ` [PATCH v5 01/13] commit-graph: add format document Derrick Stolee
@ 2018-02-27  2:32 ` Derrick Stolee
  2018-02-27  2:32 ` [PATCH v5 03/13] commit-graph: create git-commit-graph builtin Derrick Stolee
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-02-27  2:32 UTC (permalink / raw)
  To: git
  Cc: gitster, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

Add Documentation/technical/commit-graph.txt with details of the planned
commit graph feature, including future plans.
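
The design document defines generation numbers recursively: a root
commit has generation number one, and any other commit has one more
than the largest generation number among its parents. A minimal sketch
of that definition and of the "cannot reach" property follows. Nothing
here is implemented by this series; the names are hypothetical, each
commit is limited to two parents for brevity, and the recursion would
need to become an explicit stack for real histories, which can be very
deep.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NO_PARENT UINT32_MAX /* hypothetical "no parent" marker */

struct sketch_node {
	uint32_t parent[2];  /* positions of up to two parents */
	uint32_t generation; /* 0 means "not computed" */
};

/* Compute (and cache) the generation number of the commit at 'pos'. */
static uint32_t compute_generation(struct sketch_node *g, uint32_t pos)
{
	uint32_t max_parent = 0;
	int i;

	if (g[pos].generation)
		return g[pos].generation;

	for (i = 0; i < 2; i++) {
		uint32_t p = g[pos].parent[i];

		if (p != SKETCH_NO_PARENT) {
			uint32_t gen = compute_generation(g, p);

			if (gen > max_parent)
				max_parent = gen;
		}
	}
	/* A root has no parents, so max_parent stays 0 and the result is 1. */
	g[pos].generation = max_parent + 1;
	return g[pos].generation;
}

/*
 * Return 1 when generation numbers alone prove that commit 'a' cannot
 * reach commit 'b'; return 0 when a walk would still be needed.
 */
static int cannot_reach(struct sketch_node *g, uint32_t a, uint32_t b)
{
	return a != b && compute_generation(g, a) <= compute_generation(g, b);
}
```

A result of 0 from cannot_reach() proves nothing by itself; it only
means the walk cannot be skipped.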

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/commit-graph.txt | 164 +++++++++++++++++++++++++++++++
 1 file changed, 164 insertions(+)
 create mode 100644 Documentation/technical/commit-graph.txt

diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
new file mode 100644
index 0000000..d11753a
--- /dev/null
+++ b/Documentation/technical/commit-graph.txt
@@ -0,0 +1,164 @@
+Git Commit Graph Design Notes
+=============================
+
+Git walks the commit graph for many reasons, including:
+
+1. Listing and filtering commit history.
+2. Computing merge bases.
+
+These operations can become slow as the commit count grows. The merge
+base calculation shows up in many user-facing commands, such as 'merge-base'
+or 'status' and can take minutes to compute depending on history shape.
+
+There are two main costs here:
+
+1. Decompressing and parsing commits.
+2. Walking the entire graph to satisfy topological order constraints.
+
+The commit graph file is a supplemental data structure that accelerates
+commit graph walks. If a user downgrades or disables the 'core.commitGraph'
+config setting, then the existing ODB is sufficient. The file is stored
+as "commit-graph" either in the .git/objects/info directory or in the info
+directory of an alternate.
+
+The commit graph file stores the commit graph structure along with some
+extra metadata to speed up graph walks. By listing commit OIDs in
+lexicographic order, we can identify an integer position for each commit and
+refer to the parents of a commit using those integer positions. We use
+binary search to find initial commits and then use the integer positions
+for fast lookups during the walk.
+
+A consumer may load the following info for a commit from the graph:
+
+1. The commit OID.
+2. The list of parents, along with their integer position.
+3. The commit date.
+4. The root tree OID.
+5. The generation number (see definition below).
+
+Values 1-4 satisfy the requirements of parse_commit_gently().
+
+Define the "generation number" of a commit recursively as follows:
+
+ * A commit with no parents (a root commit) has generation number one.
+
+ * A commit with at least one parent has generation number one more than
+   the largest generation number among its parents.
+
+Equivalently, the generation number of a commit A is one more than the
+length of a longest path from A to a root commit. The recursive definition
+is easier to use for computation and for observing the following property:
+
+    If A and B are commits with generation numbers N and M, respectively,
+    and N <= M, then A cannot reach B. That is, we know without searching
+    that B is not an ancestor of A because it is further from a root commit
+    than A.
+
+    Conversely, when checking if A is an ancestor of B, then we only need
+    to walk commits until all commits on the walk boundary have generation
+    number at most N. If we walk commits using a priority queue seeded by
+    generation numbers, then we always expand the boundary commit with highest
+    generation number and can easily detect the stopping condition.
+
+This property can be used to significantly reduce the time it takes to
+walk commits and determine topological relationships. Without generation
+numbers, the general heuristic is the following:
+
+    If A and B are commits with commit time X and Y, respectively, and
+    X < Y, then A _probably_ cannot reach B.
+
+This heuristic is currently used whenever the computation is allowed to
+violate topological relationships due to clock skew (such as "git log"
+with default order), but is not used when the topological order is
+required (such as merge base calculations, "git log --graph").
+
+In practice, we expect some commits to be created recently and not stored
+in the commit graph. We can treat these commits as having "infinite"
+generation number and walk until reaching commits with known generation
+number.
+
+Design Details
+--------------
+
+- The commit graph file is stored in a file named 'commit-graph' in the
+  .git/objects/info directory. This could be stored in the info directory
+  of an alternate.
+
+- The core.commitGraph config setting must be on to consume graph files.
+
+- The file format includes parameters for the object ID hash function,
+  so a future change of hash algorithm does not require a change in format.
+
+Future Work
+-----------
+
+- The commit graph feature currently does not honor commit grafts. This can
+  be remedied by duplicating or refactoring the current graft logic.
+
+- The 'commit-graph' subcommand does not have a "verify" mode that is
+  necessary for integration with fsck.
+
+- The file format includes room for precomputed generation numbers. These
+  are not currently computed, so all generation numbers will be marked as
+  0 (or "uncomputed"). A later patch will include this calculation.
+
+- After computing and storing generation numbers, we must make graph
+  walks aware of generation numbers to gain the performance benefits they
+  enable. This will mostly be accomplished by swapping a commit-date-ordered
+  priority queue with one ordered by generation number. The following
+  operations are important candidates:
+
+    - paint_down_to_common()
+    - 'log --topo-order'
+
+- Currently, parse_commit_gently() requires filling in the root tree
+  object for a commit. This passes through lookup_tree() and consequently
+  lookup_object(). Also, it calls lookup_commit() when loading the parents.
+  These method calls check the ODB for object existence, even if the
+  consumer does not need the content. For example, we do not need the
+  tree contents when computing merge bases. Now that commit parsing is
+  removed from the computation time, these lookup operations are the
+  slowest operations keeping graph walks from being fast. Consider
+  loading these objects without verifying their existence in the ODB and
+  only loading them fully when consumers need them. Consider a method
+  such as "ensure_tree_loaded(commit)" that fully loads a tree before
+  using commit->tree.
+
+- The current design uses the 'commit-graph' subcommand to generate the graph.
+  When this feature stabilizes enough to recommend to most users, we should
+  add automatic graph writes to common operations that create many commits.
+  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
+  commands.
+
+- A server could provide a commit graph file as part of the network protocol
+  to avoid extra calculations by clients. This feature is only of benefit if
+  the user is willing to trust the file, because verifying the file is correct
+  is as hard as computing it from scratch.
+
+Related Links
+-------------
+[0] https://bugs.chromium.org/p/git/issues/detail?id=8
+    Chromium work item for: Serialized Commit Graph
+
+[1] https://public-inbox.org/git/20110713070517.GC18566@sigill.intra.peff.net/
+    An abandoned patch that introduced generation numbers.
+
+[2] https://public-inbox.org/git/20170908033403.q7e6dj7benasrjes@sigill.intra.peff.net/
+    Discussion about generation numbers on commits and how they interact
+    with fsck.
+
+[3] https://public-inbox.org/git/20170908034739.4op3w4f2ma5s65ku@sigill.intra.peff.net/
+    More discussion about generation numbers and not storing them inside
+    commit objects. A valuable quote:
+
+    "I think we should be moving more in the direction of keeping
+     repo-local caches for optimizations. Reachability bitmaps have been
+     a big performance win. I think we should be doing the same with our
+     properties of commits. Not just generation numbers, but making it
+     cheap to access the graph structure without zlib-inflating whole
+     commit objects (i.e., packv4 or something like the "metapacks" I
+     proposed a few years ago)."
+
+[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u
+    A patch to remove the ahead-behind calculation from 'status'.
+
-- 
2.7.4



* [PATCH v5 03/13] commit-graph: create git-commit-graph builtin
  2018-02-27  2:32 [PATCH v5 00/13] Serialized Git Commit Graph Derrick Stolee
  2018-02-27  2:32 ` [PATCH v5 01/13] commit-graph: add format document Derrick Stolee
  2018-02-27  2:32 ` [PATCH v5 02/13] graph: add commit graph design document Derrick Stolee
@ 2018-02-27  2:32 ` Derrick Stolee
  2018-02-27  2:32 ` [PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag Derrick Stolee
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-02-27  2:32 UTC (permalink / raw)
  To: git
  Cc: gitster, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

Thanks for the help in getting all the details right in setting up a
builtin.

-- >8 --

Teach git the 'commit-graph' builtin that will be used for writing and
reading packed graph files. The current implementation is mostly
empty, except for an '--object-dir' option.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .gitignore                             |  1 +
 Documentation/git-commit-graph.txt     | 11 ++++++++++
 Makefile                               |  1 +
 builtin.h                              |  1 +
 builtin/commit-graph.c                 | 37 ++++++++++++++++++++++++++++++++++
 command-list.txt                       |  1 +
 contrib/completion/git-completion.bash |  2 ++
 git.c                                  |  1 +
 8 files changed, 55 insertions(+)
 create mode 100644 Documentation/git-commit-graph.txt
 create mode 100644 builtin/commit-graph.c

diff --git a/.gitignore b/.gitignore
index 833ef3b..e82f901 100644
--- a/.gitignore
+++ b/.gitignore
@@ -34,6 +34,7 @@
 /git-clone
 /git-column
 /git-commit
+/git-commit-graph
 /git-commit-tree
 /git-config
 /git-count-objects
diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
new file mode 100644
index 0000000..5913340
--- /dev/null
+++ b/Documentation/git-commit-graph.txt
@@ -0,0 +1,11 @@
+git-commit-graph(1)
+===================
+
+NAME
+----
+git-commit-graph - Write and verify Git commit graph files
+
+GIT
+---
+Part of the linkgit:git[1] suite
+
diff --git a/Makefile b/Makefile
index c56fdc1..2e4956f 100644
--- a/Makefile
+++ b/Makefile
@@ -946,6 +946,7 @@ BUILTIN_OBJS += builtin/clone.o
 BUILTIN_OBJS += builtin/column.o
 BUILTIN_OBJS += builtin/commit-tree.o
 BUILTIN_OBJS += builtin/commit.o
+BUILTIN_OBJS += builtin/commit-graph.o
 BUILTIN_OBJS += builtin/config.o
 BUILTIN_OBJS += builtin/count-objects.o
 BUILTIN_OBJS += builtin/credential.o
diff --git a/builtin.h b/builtin.h
index 42378f3..079855b 100644
--- a/builtin.h
+++ b/builtin.h
@@ -149,6 +149,7 @@ extern int cmd_clone(int argc, const char **argv, const char *prefix);
 extern int cmd_clean(int argc, const char **argv, const char *prefix);
 extern int cmd_column(int argc, const char **argv, const char *prefix);
 extern int cmd_commit(int argc, const char **argv, const char *prefix);
+extern int cmd_commit_graph(int argc, const char **argv, const char *prefix);
 extern int cmd_commit_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_config(int argc, const char **argv, const char *prefix);
 extern int cmd_count_objects(int argc, const char **argv, const char *prefix);
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
new file mode 100644
index 0000000..8ff7336
--- /dev/null
+++ b/builtin/commit-graph.c
@@ -0,0 +1,37 @@
+#include "builtin.h"
+#include "config.h"
+#include "parse-options.h"
+
+static char const * const builtin_commit_graph_usage[] = {
+	N_("git commit-graph [--object-dir <objdir>]"),
+	NULL
+};
+
+static struct opts_commit_graph {
+	const char *obj_dir;
+} opts;
+
+
+int cmd_commit_graph(int argc, const char **argv, const char *prefix)
+{
+	static struct option builtin_commit_graph_options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			N_("dir"),
+			N_("The object directory to store the graph")),
+		OPT_END(),
+	};
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(builtin_commit_graph_usage,
+				   builtin_commit_graph_options);
+
+	git_config(git_default_config, NULL);
+	argc = parse_options(argc, argv, prefix,
+			     builtin_commit_graph_options,
+			     builtin_commit_graph_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+
+	usage_with_options(builtin_commit_graph_usage,
+			   builtin_commit_graph_options);
+}
+
diff --git a/command-list.txt b/command-list.txt
index a1fad28..835c589 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -34,6 +34,7 @@ git-clean                               mainporcelain
 git-clone                               mainporcelain           init
 git-column                              purehelpers
 git-commit                              mainporcelain           history
+git-commit-graph                        plumbingmanipulators
 git-commit-tree                         plumbingmanipulators
 git-config                              ancillarymanipulators
 git-count-objects                       ancillaryinterrogators
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 88813e9..b060b6f 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -841,6 +841,7 @@ __git_list_porcelain_commands ()
 		check-ref-format) : plumbing;;
 		checkout-index)   : plumbing;;
 		column)           : internal helper;;
+		commit-graph)     : plumbing;;
 		commit-tree)      : plumbing;;
 		count-objects)    : infrequent;;
 		credential)       : credentials;;
@@ -2419,6 +2420,7 @@ _git_config ()
 		core.bigFileThreshold
 		core.checkStat
 		core.commentChar
+		core.commitGraph
 		core.compression
 		core.createObject
 		core.deltaBaseCacheLimit
diff --git a/git.c b/git.c
index c870b97..c7b5ada 100644
--- a/git.c
+++ b/git.c
@@ -388,6 +388,7 @@ static struct cmd_struct commands[] = {
 	{ "clone", cmd_clone },
 	{ "column", cmd_column, RUN_SETUP_GENTLY },
 	{ "commit", cmd_commit, RUN_SETUP | NEED_WORK_TREE },
+	{ "commit-graph", cmd_commit_graph, RUN_SETUP },
 	{ "commit-tree", cmd_commit_tree, RUN_SETUP },
 	{ "config", cmd_config, RUN_SETUP_GENTLY },
 	{ "count-objects", cmd_count_objects, RUN_SETUP },
-- 
2.7.4



* [PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag
  2018-02-27  2:32 [PATCH v5 00/13] Serialized Git Commit Graph Derrick Stolee
                   ` (2 preceding siblings ...)
  2018-02-27  2:32 ` [PATCH v5 03/13] commit-graph: create git-commit-graph builtin Derrick Stolee
@ 2018-02-27  2:32 ` Derrick Stolee
  2018-03-12 13:55   ` Derrick Stolee
  2018-02-27  2:32 ` [PATCH v5 05/13] commit-graph: implement write_commit_graph() Derrick Stolee
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-02-27  2:32 UTC (permalink / raw)
  To: git
  Cc: gitster, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

This patch is new to the series due to the interactions with the lockfile API
and the hashfile API. I need to ensure the hashfile writes the hash value at
the end of the file, but keep the file descriptor open so the lock is valid.

I welcome any suggestions to this patch or to the way I use it in the commit
that follows.

-- >8 --

If we want to use a hashfile on the temporary file for a lockfile, then
we need hashclose() to fully write the trailing hash but also keep the
file descriptor open.
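
The intended behavior can be sketched outside of Git with plain POSIX
calls. This is only an illustration under assumptions: sketch_finish()
and the SKETCH_* flag are hypothetical stand-ins, not hashclose() or the
csum-file flags, and the "trailer" is whatever bytes the caller passes.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define SKETCH_KEEP_OPEN 4 /* hypothetical counterpart of CSUM_KEEP_OPEN */

/*
 * Write the trailing bytes, then either close the descriptor or hand it
 * back to the caller. Returns the still-open fd when SKETCH_KEEP_OPEN is
 * set, 0 when the fd was closed, and -1 on error.
 */
static int sketch_finish(int fd, const void *trailer, size_t len,
			 unsigned int flags)
{
	ssize_t written;

	assert(trailer != NULL);
	written = write(fd, trailer, len);
	if (written < 0 || (size_t)written != len)
		return -1;

	if (flags & SKETCH_KEEP_OPEN)
		return fd; /* the caller (e.g. a lockfile owner) closes it later */

	return close(fd) ? -1 : 0;
}
```

With the flag set, the code that owns the lock still holds a valid
descriptor and can finish its own close-and-rename step afterwards.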

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 csum-file.c | 10 +++++++---
 csum-file.h |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/csum-file.c b/csum-file.c
index 5eda7fb..302e6ae 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -66,9 +66,13 @@ int hashclose(struct hashfile *f, unsigned char *result, unsigned int flags)
 		flush(f, f->buffer, the_hash_algo->rawsz);
 		if (flags & CSUM_FSYNC)
 			fsync_or_die(f->fd, f->name);
-		if (close(f->fd))
-			die_errno("%s: sha1 file error on close", f->name);
-		fd = 0;
+		if (flags & CSUM_KEEP_OPEN)
+			fd = f->fd;
+		else {
+			if (close(f->fd))
+				die_errno("%s: sha1 file error on close", f->name);
+			fd = 0;
+		}
 	} else
 		fd = f->fd;
 	if (0 <= f->check_fd) {
diff --git a/csum-file.h b/csum-file.h
index 992e5c0..b7c0e48 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -29,6 +29,7 @@ extern int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *);
 /* hashclose flags */
 #define CSUM_CLOSE	1
 #define CSUM_FSYNC	2
+#define CSUM_KEEP_OPEN	4
 
 extern struct hashfile *hashfd(int fd, const char *name);
 extern struct hashfile *hashfd_check(const char *name);
-- 
2.7.4



* [PATCH v5 05/13] commit-graph: implement write_commit_graph()
  2018-02-27  2:32 [PATCH v5 00/13] Serialized Git Commit Graph Derrick Stolee
                   ` (3 preceding siblings ...)
  2018-02-27  2:32 ` [PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag Derrick Stolee
@ 2018-02-27  2:32 ` Derrick Stolee
  2018-02-27  2:33 ` [PATCH v5 06/13] commit-graph: implement 'git-commit-graph write' Derrick Stolee
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-02-27  2:32 UTC (permalink / raw)
  To: git
  Cc: gitster, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

Teach Git to write a commit graph file by checking all packed objects
to see if they are commits, then store the file in the given object
directory.
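
One of the chunks written here is the 256-entry OID fanout. The sketch
below shows, under assumptions, how such a table can be built from a
sorted OID list and then used to narrow a binary search. It is not the
write_graph_chunk_fanout() from this patch: the function names are
hypothetical, OIDs are 20-byte SHA-1 values stored back to back, and the
table is kept in host integers rather than written in network order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_OID_LEN 20

/*
 * Fill fanout[i] with the number of OIDs whose first byte is at most i.
 * 'oids' holds 'nr' OIDs of SKETCH_OID_LEN bytes each, sorted ascending.
 */
static void build_fanout(const unsigned char *oids, uint32_t nr,
			 uint32_t fanout[256])
{
	uint32_t count = 0;
	int i;

	assert(oids != NULL || nr == 0);
	for (i = 0; i < 256; i++) {
		while (count < nr && oids[(size_t)count * SKETCH_OID_LEN] <= i)
			count++;
		fanout[i] = count;
	}
}

/* Return the position of 'target' in the sorted list, or -1 if absent. */
static int find_oid(const unsigned char *oids, const uint32_t fanout[256],
		    const unsigned char *target)
{
	/* The fanout bounds the search to OIDs sharing the first byte. */
	uint32_t lo = target[0] ? fanout[target[0] - 1] : 0;
	uint32_t hi = fanout[target[0]];

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(oids + (size_t)mid * SKETCH_OID_LEN, target,
				 SKETCH_OID_LEN);

		if (!cmp)
			return (int)mid;
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return -1;
}
```

Since 256 is 2^8, starting from the fanout range stands in for roughly
the first eight halving steps of a plain binary search over all OIDs.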

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Makefile       |   1 +
 commit-graph.c | 360 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 commit-graph.h |   7 ++
 3 files changed, 368 insertions(+)
 create mode 100644 commit-graph.c
 create mode 100644 commit-graph.h

diff --git a/Makefile b/Makefile
index 2e4956f..bf91b2d 100644
--- a/Makefile
+++ b/Makefile
@@ -771,6 +771,7 @@ LIB_OBJS += color.o
 LIB_OBJS += column.o
 LIB_OBJS += combine-diff.o
 LIB_OBJS += commit.o
+LIB_OBJS += commit-graph.o
 LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += config.o
diff --git a/commit-graph.c b/commit-graph.c
new file mode 100644
index 0000000..2251397
--- /dev/null
+++ b/commit-graph.c
@@ -0,0 +1,360 @@
+#include "cache.h"
+#include "config.h"
+#include "git-compat-util.h"
+#include "lockfile.h"
+#include "pack.h"
+#include "packfile.h"
+#include "commit.h"
+#include "object.h"
+#include "revision.h"
+#include "sha1-lookup.h"
+#include "commit-graph.h"
+
+#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
+#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
+#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
+#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
+#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
+
+#define GRAPH_DATA_WIDTH 36
+
+#define GRAPH_VERSION_1 0x1
+#define GRAPH_VERSION GRAPH_VERSION_1
+
+#define GRAPH_OID_VERSION_SHA1 1
+#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
+#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
+#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
+
+#define GRAPH_OCTOPUS_EDGES_NEEDED 0x80000000
+#define GRAPH_PARENT_MISSING 0x7fffffff
+#define GRAPH_EDGE_LAST_MASK 0x7fffffff
+#define GRAPH_PARENT_NONE 0x70000000
+
+#define GRAPH_LAST_EDGE 0x80000000
+
+#define GRAPH_FANOUT_SIZE (4 * 256)
+#define GRAPH_CHUNKLOOKUP_WIDTH 12
+#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
+			GRAPH_OID_LEN + 8)
+
+
+static char *get_commit_graph_filename(const char *obj_dir)
+{
+	return xstrfmt("%s/info/commit-graph", obj_dir);
+}
+
+static void write_graph_chunk_fanout(struct hashfile *f,
+				     struct commit **commits,
+				     int nr_commits)
+{
+	int i, count = 0;
+	struct commit **list = commits;
+
+	/*
+	 * Write the first-level table (the list is sorted,
+	 * but we use a 256-entry lookup to be able to avoid
+	 * having to do eight extra binary search iterations).
+	 */
+	for (i = 0; i < 256; i++) {
+		while (count < nr_commits) {
+			if ((*list)->object.oid.hash[0] != i)
+				break;
+			count++;
+			list++;
+		}
+
+		hashwrite_be32(f, count);
+	}
+}
+
+static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
+				   struct commit **commits, int nr_commits)
+{
+	struct commit **list = commits;
+	int count;
+	for (count = 0; count < nr_commits; count++, list++)
+		hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
+}
+
+static const unsigned char *commit_to_sha1(size_t index, void *table)
+{
+	struct commit **commits = table;
+	return commits[index]->object.oid.hash;
+}
+
+static void write_graph_chunk_data(struct hashfile *f, int hash_len,
+				   struct commit **commits, int nr_commits)
+{
+	struct commit **list = commits;
+	struct commit **last = commits + nr_commits;
+	uint32_t num_extra_edges = 0;
+
+	while (list < last) {
+		struct commit_list *parent;
+		int edge_value;
+		uint32_t packedDate[2];
+
+		parse_commit(*list);
+		hashwrite(f, (*list)->tree->object.oid.hash, hash_len);
+
+		parent = (*list)->parents;
+
+		if (!parent)
+			edge_value = GRAPH_PARENT_NONE;
+		else {
+			edge_value = sha1_pos(parent->item->object.oid.hash,
+					      commits,
+					      nr_commits,
+					      commit_to_sha1);
+
+			if (edge_value < 0)
+				edge_value = GRAPH_PARENT_MISSING;
+		}
+
+		hashwrite_be32(f, edge_value);
+
+		if (parent)
+			parent = parent->next;
+
+		if (!parent)
+			edge_value = GRAPH_PARENT_NONE;
+		else if (parent->next)
+			edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | num_extra_edges;
+		else {
+			edge_value = sha1_pos(parent->item->object.oid.hash,
+					      commits,
+					      nr_commits,
+					      commit_to_sha1);
+			if (edge_value < 0)
+				edge_value = GRAPH_PARENT_MISSING;
+		}
+
+		hashwrite_be32(f, edge_value);
+
+		if (edge_value & GRAPH_OCTOPUS_EDGES_NEEDED) {
+			do {
+				num_extra_edges++;
+				parent = parent->next;
+			} while (parent);
+		}
+
+		if (sizeof((*list)->date) > 4)
+			packedDate[0] = htonl(((*list)->date >> 32) & 0x3);
+		else
+			packedDate[0] = 0;
+
+		packedDate[1] = htonl((*list)->date);
+		hashwrite(f, packedDate, 8);
+
+		list++;
+	}
+}
+
+static void write_graph_chunk_large_edges(struct hashfile *f,
+					  struct commit **commits,
+					  int nr_commits)
+{
+	struct commit **list = commits;
+	struct commit **last = commits + nr_commits;
+	struct commit_list *parent;
+
+	while (list < last) {
+		int num_parents = 0;
+		for (parent = (*list)->parents; num_parents < 3 && parent;
+		     parent = parent->next)
+			num_parents++;
+
+		if (num_parents <= 2) {
+			list++;
+			continue;
+		}
+
+		/* Since num_parents > 2, this initializer is safe. */
+		for (parent = (*list)->parents->next; parent; parent = parent->next) {
+			int edge_value = sha1_pos(parent->item->object.oid.hash,
+						  commits,
+						  nr_commits,
+						  commit_to_sha1);
+
+			if (edge_value < 0)
+				edge_value = GRAPH_PARENT_MISSING;
+			else if (!parent->next)
+				edge_value |= GRAPH_LAST_EDGE;
+
+			hashwrite_be32(f, edge_value);
+		}
+
+		list++;
+	}
+}
+
+static int commit_compare(const void *_a, const void *_b)
+{
+	const struct object_id *a = (const struct object_id *)_a;
+	const struct object_id *b = (const struct object_id *)_b;
+	return oidcmp(a, b);
+}
+
+struct packed_commit_list {
+	struct commit **list;
+	int nr;
+	int alloc;
+};
+
+struct packed_oid_list {
+	struct object_id *list;
+	int nr;
+	int alloc;
+};
+
+static int add_packed_commits(const struct object_id *oid,
+			      struct packed_git *pack,
+			      uint32_t pos,
+			      void *data)
+{
+	struct packed_oid_list *list = (struct packed_oid_list*)data;
+	enum object_type type;
+	unsigned long size;
+	void *inner_data;
+	off_t offset = nth_packed_object_offset(pack, pos);
+	inner_data = unpack_entry(pack, offset, &type, &size);
+	FREE_AND_NULL(inner_data);
+
+	if (type != OBJ_COMMIT)
+		return 0;
+
+	ALLOC_GROW(list->list, list->nr + 1, list->alloc);
+	oidcpy(&(list->list[list->nr]), oid);
+	(list->nr)++;
+
+	return 0;
+}
+
+void write_commit_graph(const char *obj_dir)
+{
+	struct packed_oid_list oids;
+	struct packed_commit_list commits;
+	struct hashfile *f;
+	uint32_t i, count_distinct = 0;
+	unsigned char final_hash[GIT_MAX_RAWSZ];
+	char *graph_name;
+	int fd;
+	struct lock_file lk = LOCK_INIT;
+	uint32_t chunk_ids[5];
+	uint64_t chunk_offsets[5];
+	int num_chunks;
+	int num_extra_edges;
+	struct commit_list *parent;
+
+	oids.nr = 0;
+	oids.alloc = approximate_object_count() / 4;
+
+	if (oids.alloc < 1024)
+		oids.alloc = 1024;
+	ALLOC_ARRAY(oids.list, oids.alloc);
+
+	for_each_packed_object(add_packed_commits, &oids, 0);
+
+	QSORT(oids.list, oids.nr, commit_compare);
+
+	count_distinct = 1;
+	for (i = 1; i < oids.nr; i++) {
+		if (oidcmp(&oids.list[i-1], &oids.list[i]))
+			count_distinct++;
+	}
+
+	if (count_distinct >= GRAPH_PARENT_MISSING)
+		die(_("the commit graph format cannot write %d commits"), count_distinct);
+
+	commits.nr = 0;
+	commits.alloc = count_distinct;
+	ALLOC_ARRAY(commits.list, commits.alloc);
+
+	num_extra_edges = 0;
+	for (i = 0; i < oids.nr; i++) {
+		int num_parents = 0;
+		if (i > 0 && !oidcmp(&oids.list[i-1], &oids.list[i]))
+			continue;
+
+		commits.list[commits.nr] = lookup_commit(&oids.list[i]);
+		parse_commit(commits.list[commits.nr]);
+
+		for (parent = commits.list[commits.nr]->parents;
+		     parent; parent = parent->next)
+			num_parents++;
+
+		if (num_parents > 2)
+			num_extra_edges += num_parents - 1;
+
+		commits.nr++;
+	}
+	num_chunks = num_extra_edges ? 4 : 3;
+
+	if (commits.nr >= GRAPH_PARENT_MISSING)
+		die(_("too many commits to write graph"));
+
+	graph_name = get_commit_graph_filename(obj_dir);
+	fd = hold_lock_file_for_update(&lk, graph_name, 0);
+
+	if (fd < 0) {
+		struct strbuf folder = STRBUF_INIT;
+		strbuf_addstr(&folder, graph_name);
+		strbuf_setlen(&folder, strrchr(folder.buf, '/') - folder.buf);
+
+		if (mkdir(folder.buf, 0777) < 0)
+			die_errno(_("cannot mkdir %s"), folder.buf);
+		strbuf_release(&folder);
+
+		fd = hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
+
+		if (fd < 0)
+			die_errno("unable to create '%s'", graph_name);
+	}
+
+	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
+
+	hashwrite_be32(f, GRAPH_SIGNATURE);
+
+	hashwrite_u8(f, GRAPH_VERSION);
+	hashwrite_u8(f, GRAPH_OID_VERSION);
+	hashwrite_u8(f, num_chunks);
+	hashwrite_u8(f, 0); /* unused padding byte */
+
+	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
+	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
+	chunk_ids[2] = GRAPH_CHUNKID_DATA;
+	if (num_extra_edges)
+		chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES;
+	else
+		chunk_ids[3] = 0;
+	chunk_ids[4] = 0;
+
+	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
+	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
+	chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
+	chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
+	chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
+
+	for (i = 0; i <= num_chunks; i++) {
+		uint32_t chunk_write[3];
+
+		chunk_write[0] = htonl(chunk_ids[i]);
+		chunk_write[1] = htonl(chunk_offsets[i] >> 32);
+		chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff);
+		hashwrite(f, chunk_write, 12);
+	}
+
+	write_graph_chunk_fanout(f, commits.list, commits.nr);
+	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
+	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+	write_graph_chunk_large_edges(f, commits.list, commits.nr);
+
+	hashclose(f, final_hash, CSUM_CLOSE | CSUM_FSYNC | CSUM_KEEP_OPEN);
+	commit_lock_file(&lk);
+
+	free(oids.list);
+	oids.alloc = 0;
+	oids.nr = 0;
+}
+
diff --git a/commit-graph.h b/commit-graph.h
new file mode 100644
index 0000000..4cb3f12
--- /dev/null
+++ b/commit-graph.h
@@ -0,0 +1,7 @@
+#ifndef COMMIT_GRAPH_H
+#define COMMIT_GRAPH_H
+
+void write_commit_graph(const char *obj_dir);
+
+#endif
+
-- 
2.7.4
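
The chunk offsets that write_commit_graph() stores in the chunk lookup table
follow directly from the commit count and the number of extra edges. A minimal
sketch of that arithmetic, assuming SHA-1 object IDs; compute_chunk_offsets()
and the macro names below are hypothetical and only mirror the constants used
in the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEADER_SIZE 8              /* 4-byte signature + 4 one-byte fields */
#define CHUNK_LOOKUP_WIDTH 12      /* 4-byte chunk ID + 8-byte offset */
#define FANOUT_SIZE (4 * 256)
#define OID_LEN 20                 /* SHA-1 */
#define COMMIT_DATA_WIDTH (OID_LEN + 16)

/*
 * offsets[0..3] are the starts of the fan-out, OID lookup, commit data and
 * large-edge chunks. offsets[4] is the end of the large-edge chunk. When
 * there are no extra edges, offsets[3] == offsets[4] and the fourth table
 * row is the terminating row (chunk ID 0) that marks the end of the data.
 */
static void compute_chunk_offsets(uint32_t nr_commits, uint32_t nr_extra_edges,
				  uint64_t offsets[5])
{
	int num_chunks = nr_extra_edges ? 4 : 3;

	assert(offsets != NULL);

	/* The lookup table has one row per chunk plus a terminating row. */
	offsets[0] = HEADER_SIZE + (uint64_t)(num_chunks + 1) * CHUNK_LOOKUP_WIDTH;
	offsets[1] = offsets[0] + FANOUT_SIZE;
	offsets[2] = offsets[1] + (uint64_t)OID_LEN * nr_commits;
	offsets[3] = offsets[2] + (uint64_t)COMMIT_DATA_WIDTH * nr_commits;
	offsets[4] = offsets[3] + (uint64_t)4 * nr_extra_edges;
}
```

Because consecutive offsets delimit each chunk, a reader can recover the
number of commits as (offsets[2] - offsets[1]) / OID_LEN without a separate
count field.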



* [PATCH v5 06/13] commit-graph: implement 'git-commit-graph write'
  2018-02-27  2:32 [PATCH v5 00/13] Serialized Git Commit Graph Derrick Stolee
                   ` (4 preceding siblings ...)
  2018-02-27  2:32 ` [PATCH v5 05/13] commit-graph: implement write_commit_graph() Derrick Stolee
@ 2018-02-27  2:33 ` Derrick Stolee
  2018-02-27  2:33 ` [PATCH v5 07/13] commit-graph: implement git commit-graph read Derrick Stolee
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-02-27  2:33 UTC (permalink / raw)
  To: git
  Cc: gitster, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

Teach git-commit-graph to write graph files. Create a new test script to
verify that this command succeeds.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  39 ++++++++++++
 builtin/commit-graph.c             |  33 ++++++++++
 t/t5318-commit-graph.sh            | 125 +++++++++++++++++++++++++++++++++++++
 3 files changed, 197 insertions(+)
 create mode 100755 t/t5318-commit-graph.sh

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 5913340..e688843 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -5,6 +5,45 @@ NAME
 ----
 git-commit-graph - Write and verify Git commit graph files
 
+
+SYNOPSIS
+--------
+[verse]
+'git commit-graph write' <options> [--object-dir <dir>]
+
+
+DESCRIPTION
+-----------
+
+Manage the serialized commit graph file.
+
+
+OPTIONS
+-------
+--object-dir::
+	Use given directory for the location of packfiles and commit graph
+	file. The commit graph file is expected to be at <dir>/info/commit-graph
+	and the packfiles are expected to be in <dir>/pack.
+
+
+COMMANDS
+--------
+'write'::
+
+Write a commit graph file based on the commits found in packfiles.
+Includes all commits from the existing commit graph file.
+
+
+EXAMPLES
+--------
+
+* Write a commit graph file for the packed commits in your local .git folder.
++
+------------------------------------------------
+$ git commit-graph write
+------------------------------------------------
+
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 8ff7336..a9d61f6 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -1,9 +1,18 @@
 #include "builtin.h"
 #include "config.h"
+#include "dir.h"
+#include "lockfile.h"
 #include "parse-options.h"
+#include "commit-graph.h"
 
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
+	N_("git commit-graph write [--object-dir <objdir>]"),
+	NULL
+};
+
+static const char * const builtin_commit_graph_write_usage[] = {
+	N_("git commit-graph write [--object-dir <objdir>]"),
 	NULL
 };
 
@@ -11,6 +20,25 @@ static struct opts_commit_graph {
 	const char *obj_dir;
 } opts;
 
+static int graph_write(int argc, const char **argv)
+{
+	static struct option builtin_commit_graph_write_options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			N_("dir"),
+			N_("The object directory to store the graph")),
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL,
+			     builtin_commit_graph_write_options,
+			     builtin_commit_graph_write_usage, 0);
+
+	if (!opts.obj_dir)
+		opts.obj_dir = get_object_directory();
+
+	write_commit_graph(opts.obj_dir);
+	return 0;
+}
 
 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 {
@@ -31,6 +59,11 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 			     builtin_commit_graph_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
+	if (argc > 0) {
+		if (!strcmp(argv[0], "write"))
+			return graph_write(argc, argv);
+	}
+
 	usage_with_options(builtin_commit_graph_usage,
 			   builtin_commit_graph_options);
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
new file mode 100755
index 0000000..7524d2a
--- /dev/null
+++ b/t/t5318-commit-graph.sh
@@ -0,0 +1,125 @@
+#!/bin/sh
+
+test_description='commit graph'
+. ./test-lib.sh
+
+test_expect_success 'setup full repo' '
+	mkdir full &&
+	cd "$TRASH_DIRECTORY/full" &&
+	git init &&
+	objdir=".git/objects"
+'
+
+test_expect_success 'write graph with no packs' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write --object-dir . &&
+	test_path_is_file info/commit-graph
+'
+
+test_expect_success 'create commits and repack' '
+	cd "$TRASH_DIRECTORY/full" &&
+	for i in $(test_seq 3)
+	do
+		test_commit $i &&
+		git branch commits/$i
+	done &&
+	git repack
+'
+
+test_expect_success 'write graph' '
+	cd "$TRASH_DIRECTORY/full" &&
+	graph1=$(git commit-graph write) &&
+	test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'Add more commits' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git reset --hard commits/1 &&
+	for i in $(test_seq 4 5)
+	do
+		test_commit $i &&
+		git branch commits/$i
+	done &&
+	git reset --hard commits/2 &&
+	for i in $(test_seq 6 7)
+	do
+		test_commit $i &&
+		git branch commits/$i
+	done &&
+	git reset --hard commits/2 &&
+	git merge commits/4 &&
+	git branch merge/1 &&
+	git reset --hard commits/4 &&
+	git merge commits/6 &&
+	git branch merge/2 &&
+	git reset --hard commits/3 &&
+	git merge commits/5 commits/7 &&
+	git branch merge/3 &&
+	git repack
+'
+
+# Current graph structure:
+#
+#   __M3___
+#  /   |   \
+# 3 M1 5 M2 7
+# |/  \|/  \|
+# 2    4    6
+# |___/____/
+# 1
+
+
+test_expect_success 'write graph with merges' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write &&
+	test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'Add one more commit' '
+	cd "$TRASH_DIRECTORY/full" &&
+	test_commit 8 &&
+	git branch commits/8 &&
+	ls $objdir/pack | grep idx >existing-idx &&
+	git repack &&
+	ls $objdir/pack| grep idx | grep -v --file=existing-idx >new-idx
+'
+
+# Current graph structure:
+#
+#      8
+#      |
+#   __M3___
+#  /   |   \
+# 3 M1 5 M2 7
+# |/  \|/  \|
+# 2    4    6
+# |___/____/
+# 1
+
+test_expect_success 'write graph with new commit' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write &&
+	test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'write graph with nothing new' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write &&
+	test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'setup bare repo' '
+	cd "$TRASH_DIRECTORY" &&
+	git clone --bare --no-local full bare &&
+	cd bare &&
+	baredir="./objects"
+'
+
+test_expect_success 'write graph in bare repo' '
+	cd "$TRASH_DIRECTORY/bare" &&
+	git commit-graph write &&
+	test_path_is_file $baredir/info/commit-graph
+'
+
+test_done
+
-- 
2.7.4



* [PATCH v5 07/13] commit-graph: implement git commit-graph read
  2018-02-27  2:32 [PATCH v5 00/13] Serialized Git Commit Graph Derrick Stolee
                   ` (5 preceding siblings ...)
  2018-02-27  2:33 ` [PATCH v5 06/13] commit-graph: implement 'git-commit-graph write' Derrick Stolee
@ 2018-02-27  2:33 ` Derrick Stolee
  2018-02-27  2:33 ` [PATCH v5 08/13] commit-graph: add core.commitGraph setting Derrick Stolee
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-02-27  2:33 UTC (permalink / raw)
  To: git
  Cc: gitster, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

Teach git-commit-graph to read commit graph files and summarize their contents.

Use the read subcommand to verify the contents of a commit graph file in the
tests.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
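The header checks in load_commit_graph_one() and the 64-bit offset decoding
in its chunk loop can be illustrated on a plain byte buffer. The sketch below
is not the code from this patch: struct graph_header, read_be32(),
parse_graph_header() and read_chunk_offset() are hypothetical helpers, and
only the signature is validated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GRAPH_SIGNATURE 0x43475048u /* "CGPH" */

struct graph_header {
	uint32_t signature;
	unsigned char version;
	unsigned char hash_version;
	unsigned char num_chunks;
};

/* Read a 32-bit big-endian value without relying on alignment. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return 0 on success, -1 on a short buffer or a wrong signature. */
static int parse_graph_header(const unsigned char *data, size_t len,
			      struct graph_header *out)
{
	assert(data != NULL && out != NULL);

	if (len < 8)
		return -1;
	out->signature = read_be32(data);
	if (out->signature != GRAPH_SIGNATURE)
		return -1;
	out->version = data[4];
	out->hash_version = data[5];
	out->num_chunks = data[6];
	/* data[7] is the unused padding byte. */
	return 0;
}

/* Decode the 8-byte offset stored after the 4-byte ID in a lookup row. */
static uint64_t read_chunk_offset(const unsigned char *row)
{
	return ((uint64_t)read_be32(row + 4) << 32) | read_be32(row + 8);
}
```

Reading byte by byte avoids the unaligned uint32_t loads that a cast such as
*(uint32_t*)data could cause on strict-alignment platforms.
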
 Documentation/git-commit-graph.txt |  12 ++++
 builtin/commit-graph.c             |  56 +++++++++++++++
 commit-graph.c                     | 140 ++++++++++++++++++++++++++++++++++++-
 commit-graph.h                     |  23 ++++++
 t/t5318-commit-graph.sh            |  32 +++++++--
 5 files changed, 257 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index e688843..51cb038 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -9,6 +9,7 @@ git-commit-graph - Write and verify Git commit graph files
 SYNOPSIS
 --------
 [verse]
+'git commit-graph read' <options> [--object-dir <dir>]
 'git commit-graph write' <options> [--object-dir <dir>]
 
 
@@ -33,6 +34,11 @@ COMMANDS
 Write a commit graph file based on the commits found in packfiles.
 Includes all commits from the existing commit graph file.
 
+'read'::
+
+Read the commit-graph file and output basic details about it.
+Used for debugging purposes.
+
 
 EXAMPLES
 --------
@@ -43,6 +49,12 @@ EXAMPLES
 $ git commit-graph write
 ------------------------------------------------
 
+* Read basic information from the commit-graph file.
++
+------------------------------------------------
+$ git commit-graph read
+------------------------------------------------
+
 
 GIT
 ---
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index a9d61f6..0e164be 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -7,10 +7,16 @@
 
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
+	N_("git commit-graph read [--object-dir <objdir>]"),
 	N_("git commit-graph write [--object-dir <objdir>]"),
 	NULL
 };
 
+static const char * const builtin_commit_graph_read_usage[] = {
+	N_("git commit-graph read [--object-dir <objdir>]"),
+	NULL
+};
+
 static const char * const builtin_commit_graph_write_usage[] = {
 	N_("git commit-graph write [--object-dir <objdir>]"),
 	NULL
@@ -20,6 +26,54 @@ static struct opts_commit_graph {
 	const char *obj_dir;
 } opts;
 
+static int graph_read(int argc, const char **argv)
+{
+	struct commit_graph *graph = NULL;
+	char *graph_name;
+
+	static struct option builtin_commit_graph_read_options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			N_("dir"),
+			N_("The object directory to store the graph")),
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL,
+			     builtin_commit_graph_read_options,
+			     builtin_commit_graph_read_usage, 0);
+
+	if (!opts.obj_dir)
+		opts.obj_dir = get_object_directory();
+
+	graph_name = get_commit_graph_filename(opts.obj_dir);
+	graph = load_commit_graph_one(graph_name);
+
+	if (!graph)
+		die("graph file %s does not exist", graph_name);
+	FREE_AND_NULL(graph_name);
+
+	printf("header: %08x %d %d %d %d\n",
+		ntohl(*(uint32_t*)graph->data),
+		*(unsigned char*)(graph->data + 4),
+		*(unsigned char*)(graph->data + 5),
+		*(unsigned char*)(graph->data + 6),
+		*(unsigned char*)(graph->data + 7));
+	printf("num_commits: %u\n", graph->num_commits);
+	printf("chunks:");
+
+	if (graph->chunk_oid_fanout)
+		printf(" oid_fanout");
+	if (graph->chunk_oid_lookup)
+		printf(" oid_lookup");
+	if (graph->chunk_commit_data)
+		printf(" commit_metadata");
+	if (graph->chunk_large_edges)
+		printf(" large_edges");
+	printf("\n");
+
+	return 0;
+}
+
 static int graph_write(int argc, const char **argv)
 {
 	static struct option builtin_commit_graph_write_options[] = {
@@ -60,6 +114,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
 	if (argc > 0) {
+		if (!strcmp(argv[0], "read"))
+			return graph_read(argc, argv);
 		if (!strcmp(argv[0], "write"))
 			return graph_write(argc, argv);
 	}
diff --git a/commit-graph.c b/commit-graph.c
index 2251397..7b0cfb4 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -39,11 +39,149 @@
 			GRAPH_OID_LEN + 8)
 
 
-static char *get_commit_graph_filename(const char *obj_dir)
+char *get_commit_graph_filename(const char *obj_dir)
 {
 	return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static struct commit_graph *alloc_commit_graph(void)
+{
+	struct commit_graph *g = xmalloc(sizeof(*g));
+	memset(g, 0, sizeof(*g));
+	g->graph_fd = -1;
+
+	return g;
+}
+
+struct commit_graph *load_commit_graph_one(const char *graph_file)
+{
+	void *graph_map;
+	const unsigned char *data, *chunk_lookup;
+	size_t graph_size;
+	struct stat st;
+	uint32_t i;
+	struct commit_graph *graph;
+	int fd = git_open(graph_file);
+	uint64_t last_chunk_offset;
+	uint32_t last_chunk_id;
+	uint32_t graph_signature;
+	unsigned char graph_version, hash_version;
+
+	if (fd < 0)
+		return NULL;
+	if (fstat(fd, &st)) {
+		close(fd);
+		return NULL;
+	}
+	graph_size = xsize_t(st.st_size);
+
+	if (graph_size < GRAPH_MIN_SIZE) {
+		close(fd);
+		die("graph file %s is too small", graph_file);
+	}
+	graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	data = (const unsigned char *)graph_map;
+
+	graph_signature = ntohl(*(uint32_t*)data);
+	if (graph_signature != GRAPH_SIGNATURE) {
+		error("graph signature %X does not match signature %X",
+		      graph_signature, GRAPH_SIGNATURE);
+		goto cleanup_fail;
+	}
+
+	graph_version = *(unsigned char*)(data + 4);
+	if (graph_version != GRAPH_VERSION) {
+		error("graph version %X does not match version %X",
+		      graph_version, GRAPH_VERSION);
+		goto cleanup_fail;
+	}
+
+	hash_version = *(unsigned char*)(data + 5);
+	if (hash_version != GRAPH_OID_VERSION) {
+		error("hash version %X does not match version %X",
+		      hash_version, GRAPH_OID_VERSION);
+		goto cleanup_fail;
+	}
+
+	graph = alloc_commit_graph();
+
+	graph->hash_len = GRAPH_OID_LEN;
+	graph->num_chunks = *(unsigned char*)(data + 6);
+	graph->graph_fd = fd;
+	graph->data = graph_map;
+	graph->data_len = graph_size;
+
+	last_chunk_id = 0;
+	last_chunk_offset = 8;
+	chunk_lookup = data + 8;
+	for (i = 0; i < graph->num_chunks; i++) {
+		uint32_t chunk_id = get_be32(chunk_lookup + 0);
+		uint64_t chunk_offset1 = get_be32(chunk_lookup + 4);
+		uint32_t chunk_offset2 = get_be32(chunk_lookup + 8);
+		uint64_t chunk_offset = (chunk_offset1 << 32) | chunk_offset2;
+		int chunk_repeated = 0;
+
+		chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
+
+		if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
+			error("improper chunk offset %08x%08x", (uint32_t)(chunk_offset >> 32),
+			      (uint32_t)chunk_offset);
+			goto cleanup_fail;
+		}
+
+		switch (chunk_id) {
+		case GRAPH_CHUNKID_OIDFANOUT:
+			if (graph->chunk_oid_fanout)
+				chunk_repeated = 1;
+			else
+				graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset);
+			break;
+
+		case GRAPH_CHUNKID_OIDLOOKUP:
+			if (graph->chunk_oid_lookup)
+				chunk_repeated = 1;
+			else
+				graph->chunk_oid_lookup = data + chunk_offset;
+			break;
+
+		case GRAPH_CHUNKID_DATA:
+			if (graph->chunk_commit_data)
+				chunk_repeated = 1;
+			else
+				graph->chunk_commit_data = data + chunk_offset;
+			break;
+
+		case GRAPH_CHUNKID_LARGEEDGES:
+			if (graph->chunk_large_edges)
+				chunk_repeated = 1;
+			else
+				graph->chunk_large_edges = data + chunk_offset;
+			break;
+		}
+
+		if (chunk_repeated) {
+			error("chunk id %08x appears multiple times", chunk_id);
+			goto cleanup_fail;
+		}
+
+		if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)
+		{
+			graph->num_commits = (chunk_offset - last_chunk_offset)
+					     / graph->hash_len;
+		}
+
+		last_chunk_id = chunk_id;
+		last_chunk_offset = chunk_offset;
+	}
+
+	return graph;
+
+cleanup_fail:
+	munmap(graph_map, graph_size);
+	close(fd);
+	exit(1);
+}
+
 static void write_graph_chunk_fanout(struct hashfile *f,
 				     struct commit **commits,
 				     int nr_commits)
diff --git a/commit-graph.h b/commit-graph.h
index 4cb3f12..8b4b0f9 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -1,6 +1,29 @@
 #ifndef COMMIT_GRAPH_H
 #define COMMIT_GRAPH_H
 
+#include "git-compat-util.h"
+
+char *get_commit_graph_filename(const char *obj_dir);
+
+struct commit_graph {
+	int graph_fd;
+
+	const unsigned char *data;
+	size_t data_len;
+
+	unsigned char hash_len;
+	unsigned char num_chunks;
+	uint32_t num_commits;
+	struct object_id oid;
+
+	const uint32_t *chunk_oid_fanout;
+	const unsigned char *chunk_oid_lookup;
+	const unsigned char *chunk_commit_data;
+	const unsigned char *chunk_large_edges;
+};
+
+struct commit_graph *load_commit_graph_one(const char *graph_file);
+
 void write_commit_graph(const char *obj_dir);
 
 #endif
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 7524d2a..0085e23 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -26,10 +26,28 @@ test_expect_success 'create commits and repack' '
 	git repack
 '
 
+graph_read_expect() {
+	OPTIONAL=""
+	NUM_CHUNKS=3
+	if test -n "$2"
+	then
+		OPTIONAL=" $2"
+		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
+	fi
+	cat >expect <<- EOF
+	header: 43475048 1 1 $NUM_CHUNKS 0
+	num_commits: $1
+	chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
+	EOF
+	git commit-graph read >output &&
+	test_cmp expect output
+}
+
 test_expect_success 'write graph' '
 	cd "$TRASH_DIRECTORY/full" &&
 	graph1=$(git commit-graph write) &&
-	test_path_is_file $objdir/info/commit-graph
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "3"
 '
 
 test_expect_success 'Add more commits' '
@@ -72,7 +90,8 @@ test_expect_success 'Add more commits' '
 test_expect_success 'write graph with merges' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
-	test_path_is_file $objdir/info/commit-graph
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "10" "large_edges"
 '
 
 test_expect_success 'Add one more commit' '
@@ -99,13 +118,15 @@ test_expect_success 'Add one more commit' '
 test_expect_success 'write graph with new commit' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
-	test_path_is_file $objdir/info/commit-graph
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "11" "large_edges"
 '
 
 test_expect_success 'write graph with nothing new' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
-	test_path_is_file $objdir/info/commit-graph
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "11" "large_edges"
 '
 
 test_expect_success 'setup bare repo' '
@@ -118,7 +139,8 @@ test_expect_success 'setup bare repo' '
 test_expect_success 'write graph in bare repo' '
 	cd "$TRASH_DIRECTORY/bare" &&
 	git commit-graph write &&
-	test_path_is_file $baredir/info/commit-graph
+	test_path_is_file $baredir/info/commit-graph &&
+	graph_read_expect "11" "large_edges"
 '
 
 test_done
-- 
2.7.4



* [PATCH v5 08/13] commit-graph: add core.commitGraph setting
  2018-02-27  2:32 [PATCH v5 00/13] Serialized Git Commit Graph Derrick Stolee
                   ` (6 preceding siblings ...)
  2018-02-27  2:33 ` [PATCH v5 07/13] commit-graph: implement git commit-graph read Derrick Stolee
@ 2018-02-27  2:33 ` Derrick Stolee
  2018-02-27  2:33 ` [PATCH v5 09/13] commit-graph: close under reachability Derrick Stolee
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-02-27  2:33 UTC (permalink / raw)
  To: git
  Cc: gitster, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

The commit graph feature is controlled by the new core.commitGraph config
setting. This defaults to 0, so the feature is opt-in.

The intent is that setting core.commitGraph=0 always stops Git from checking
for or parsing commit graph files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config.txt | 3 +++
 cache.h                  | 1 +
 config.c                 | 5 +++++
 environment.c            | 1 +
 4 files changed, 10 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index f57e9cf..77fcd53 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -898,6 +898,9 @@ core.notesRef::
 This setting defaults to "refs/notes/commits", and it can be overridden by
 the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
 
+core.commitGraph::
+	Enable git commit graph feature. Allows reading from the commit-graph file.
+
 core.sparseCheckout::
 	Enable "sparse checkout" feature. See section "Sparse checkout" in
 	linkgit:git-read-tree[1] for more information.
diff --git a/cache.h b/cache.h
index 21fbcc2..3a18a02 100644
--- a/cache.h
+++ b/cache.h
@@ -801,6 +801,7 @@ extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
 extern int core_preload_index;
+extern int core_commit_graph;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index b0c20e6..25ee4a6 100644
--- a/config.c
+++ b/config.c
@@ -1226,6 +1226,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.commitgraph")) {
+		core_commit_graph = git_config_bool(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.sparsecheckout")) {
 		core_apply_sparse_checkout = git_config_bool(var, value);
 		return 0;
diff --git a/environment.c b/environment.c
index de8431e..0e96be3 100644
--- a/environment.c
+++ b/environment.c
@@ -62,6 +62,7 @@ enum push_default_type push_default = PUSH_DEFAULT_UNSPECIFIED;
 enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
 char *notes_ref_name;
 int grafts_replace_parents = 1;
+int core_commit_graph;
 int core_apply_sparse_checkout;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
-- 
2.7.4



* [PATCH v5 09/13] commit-graph: close under reachability
  2018-02-27  2:32 [PATCH v5 00/13] Serialized Git Commit Graph Derrick Stolee
                   ` (7 preceding siblings ...)
  2018-02-27  2:33 ` [PATCH v5 08/13] commit-graph: add core.commitGraph setting Derrick Stolee
@ 2018-02-27  2:33 ` Derrick Stolee
  2018-02-27  2:33 ` [PATCH v5 10/13] commit: integrate commit graph with commit parsing Derrick Stolee
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-02-27  2:33 UTC (permalink / raw)
  To: git
  Cc: gitster, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

Teach write_commit_graph() to walk all parents from the commits
discovered in packfiles. This prevents gaps caused by loose objects or
previously missed packfiles.

Also automatically add commits from the existing graph file, if it
exists.
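
As an illustration only, closing a set of commits under reachability can
be sketched on a toy graph. This is not the code in this patch, which
uses the revision-walk machinery; the names toy_commit,
toy_close_reachable, MAX_PARENTS and NO_PARENT are hypothetical, and
parents are assumed to be plain array indexes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARENTS 2
#define NO_PARENT (-1)

/* Hypothetical toy commit: parents are indexes into the same array. */
struct toy_commit {
	int parents[MAX_PARENTS];
};

/*
 * Mark every commit reachable from the given seeds.
 * 'seen' must hold nr_commits flags, all zero on entry.
 * 'stack' must hold nr_commits entries; each commit is pushed at most once.
 * Returns the number of commits in the closed set.
 */
size_t toy_close_reachable(const struct toy_commit *commits,
			   size_t nr_commits,
			   const int *seeds, size_t nr_seeds,
			   unsigned char *seen, int *stack)
{
	size_t top = 0, count = 0, i;

	assert(commits && seeds && seen && stack);

	/* Start from the seeds, skipping duplicates and bad indexes. */
	for (i = 0; i < nr_seeds; i++) {
		int s = seeds[i];
		if (s < 0 || (size_t)s >= nr_commits || seen[s])
			continue;
		seen[s] = 1;
		stack[top++] = s;
	}

	/* Depth-first walk over parent edges. */
	while (top) {
		int cur = stack[--top];
		int p;

		count++;
		for (p = 0; p < MAX_PARENTS; p++) {
			int parent = commits[cur].parents[p];
			if (parent < 0 || (size_t)parent >= nr_commits ||
			    seen[parent])
				continue;
			seen[parent] = 1;
			stack[top++] = parent;
		}
	}
	return count;
}
```

Marking a commit as seen when it is pushed, rather than when it is
popped, keeps the stack bounded by the number of commits.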

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 7b0cfb4..01aa23d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -369,6 +369,28 @@ static int add_packed_commits(const struct object_id *oid,
 	return 0;
 }
 
+static void close_reachable(struct packed_oid_list *oids)
+{
+	int i;
+	struct rev_info revs;
+	struct commit *commit;
+	init_revisions(&revs, NULL);
+	for (i = 0; i < oids->nr; i++) {
+		commit = lookup_commit(&oids->list[i]);
+		if (commit && !parse_commit(commit))
+			revs.commits = commit_list_insert(commit, &revs.commits);
+	}
+
+	if (prepare_revision_walk(&revs))
+		die(_("revision walk setup failed"));
+
+	while ((commit = get_revision(&revs)) != NULL) {
+		ALLOC_GROW(oids->list, oids->nr + 1, oids->alloc);
+		oidcpy(&oids->list[oids->nr], &(commit->object.oid));
+		(oids->nr)++;
+	}
+}
+
 void write_commit_graph(const char *obj_dir)
 {
 	struct packed_oid_list oids;
@@ -393,6 +415,7 @@ void write_commit_graph(const char *obj_dir)
 	ALLOC_ARRAY(oids.list, oids.alloc);
 
 	for_each_packed_object(add_packed_commits, &oids, 0);
+	close_reachable(&oids);
 
 	QSORT(oids.list, oids.nr, commit_compare);
 
-- 
2.7.4



* [PATCH v5 10/13] commit: integrate commit graph with commit parsing
  2018-02-27  2:32 [PATCH v5 00/13] Serialized Git Commit Graph Derrick Stolee
                   ` (8 preceding siblings ...)
  2018-02-27  2:33 ` [PATCH v5 09/13] commit-graph: close under reachability Derrick Stolee
@ 2018-02-27  2:33 ` Derrick Stolee
  2018-02-27  2:33 ` [PATCH v5 11/13] commit-graph: read only from specific pack-indexes Derrick Stolee
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-02-27  2:33 UTC (permalink / raw)
  To: git
  Cc: gitster, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

Teach Git to inspect a commit graph file to supply the contents of a
struct commit when calling parse_commit_gently(). This implementation
satisfies all post-conditions on the struct commit, including loading
parents, the root tree, and the commit date.
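
As a rough sketch of the date decoding done in fill_commit_in_graph(),
assuming the last 8 bytes of a commit data entry hold the date as
described there: the low 2 bits of the first big-endian word are the
high bits of the date, and the second word is its low 32 bits. The
helper names toy_get_be32 and toy_decode_date are hypothetical. Reading
byte by byte, instead of casting to uint32_t* and calling ntohl(), is a
substitution made here to avoid alignment assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit big-endian value without relying on alignment. */
uint32_t toy_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode a 34-bit commit date from an 8-byte field:
 * bits 0-1 of the first word are date bits 32-33,
 * and the second word is date bits 0-31.
 */
uint64_t toy_decode_date(const unsigned char *date_field)
{
	uint64_t high, low;

	assert(date_field);
	high = toy_get_be32(date_field) & 0x3;
	low = toy_get_be32(date_field + 4);
	return (high << 32) | low;
}
```

The upper 30 bits of the first word are masked off, so whatever else is
stored there does not leak into the date.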

If core.commitGraph is false, then do not check graph files.

In test script t5318-commit-graph.sh, add output-matching conditions on
read-only graph operations.

By loading commits from the graph instead of parsing commit buffers, we
save a lot of time on long commit walks. Here are some performance
results for a copy of the Linux repository where 'master' has 664,185
reachable commits and is behind 'origin/master' by 60,191 commits.

| Command                          | Before | After  | Rel % |
|----------------------------------|--------|--------|-------|
| log --oneline --topo-order -1000 |  6.56s |  0.66s | -89%  |
| branch -vv                       |  1.35s |  0.32s | -76%  |
| rev-list --all                   |  6.7s  |  0.83s | -87%  |
| rev-list --all --objects         | 33.0s  | 27.5s  | -16%  |

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 alloc.c                 |   1 +
 commit-graph.c          | 141 +++++++++++++++++++++++++++++++++++++++++++++++-
 commit-graph.h          |  12 +++++
 commit.c                |   3 ++
 commit.h                |   3 ++
 t/t5318-commit-graph.sh |  47 +++++++++++++++-
 6 files changed, 205 insertions(+), 2 deletions(-)

diff --git a/alloc.c b/alloc.c
index 12afadf..cf4f8b6 100644
--- a/alloc.c
+++ b/alloc.c
@@ -93,6 +93,7 @@ void *alloc_commit_node(void)
 	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
 	c->index = alloc_commit_index();
+	c->graph_pos = COMMIT_NOT_FROM_GRAPH;
 	return c;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index 01aa23d..184b8da 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -38,7 +38,6 @@
 #define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
 			GRAPH_OID_LEN + 8)
 
-
 char *get_commit_graph_filename(const char *obj_dir)
 {
 	return xstrfmt("%s/info/commit-graph", obj_dir);
@@ -182,6 +181,145 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 	exit(1);
 }
 
+/* global storage */
+struct commit_graph *commit_graph = NULL;
+
+static void prepare_commit_graph_one(const char *obj_dir)
+{
+	char *graph_name;
+
+	if (commit_graph)
+		return;
+
+	graph_name = get_commit_graph_filename(obj_dir);
+	commit_graph = load_commit_graph_one(graph_name);
+
+	FREE_AND_NULL(graph_name);
+}
+
+static int prepare_commit_graph_run_once = 0;
+static void prepare_commit_graph(void)
+{
+	struct alternate_object_database *alt;
+	char *obj_dir;
+
+	if (prepare_commit_graph_run_once)
+		return;
+	prepare_commit_graph_run_once = 1;
+
+	obj_dir = get_object_directory();
+	prepare_commit_graph_one(obj_dir);
+	prepare_alt_odb();
+	for (alt = alt_odb_list; !commit_graph && alt; alt = alt->next)
+		prepare_commit_graph_one(alt->path);
+}
+
+static void close_commit_graph(void)
+{
+	if (!commit_graph)
+		return;
+
+	if (commit_graph->graph_fd >= 0) {
+		munmap((void *)commit_graph->data, commit_graph->data_len);
+		commit_graph->data = NULL;
+		close(commit_graph->graph_fd);
+	}
+
+	FREE_AND_NULL(commit_graph);
+}
+
+static int bsearch_graph(struct commit_graph *g, struct object_id *oid, uint32_t *pos)
+{
+	return bsearch_hash(oid->hash, g->chunk_oid_fanout,
+			    g->chunk_oid_lookup, g->hash_len, pos);
+}
+
+static struct commit_list **insert_parent_or_die(struct commit_graph *g,
+						 uint64_t pos,
+						 struct commit_list **pptr)
+{
+	struct commit *c;
+	struct object_id oid;
+	hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
+	c = lookup_commit(&oid);
+	if (!c)
+		die("could not find commit %s", oid_to_hex(&oid));
+	c->graph_pos = pos;
+	return &commit_list_insert(c, pptr)->next;
+}
+
+static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t pos)
+{
+	struct object_id oid;
+	uint32_t edge_value;
+	uint32_t *parent_data_ptr;
+	uint64_t date_low, date_high;
+	struct commit_list **pptr;
+	const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len + 16) * pos;
+
+	item->object.parsed = 1;
+	item->graph_pos = pos;
+
+	hashcpy(oid.hash, commit_data);
+	item->tree = lookup_tree(&oid);
+
+	date_high = ntohl(*(uint32_t*)(commit_data + g->hash_len + 8)) & 0x3;
+	date_low = ntohl(*(uint32_t*)(commit_data + g->hash_len + 12));
+	item->date = (timestamp_t)((date_high << 32) | date_low);
+
+	pptr = &item->parents;
+
+	edge_value = ntohl(*(uint32_t*)(commit_data + g->hash_len));
+	if (edge_value == GRAPH_PARENT_NONE)
+		return 1;
+	pptr = insert_parent_or_die(g, edge_value, pptr);
+
+	edge_value = ntohl(*(uint32_t*)(commit_data + g->hash_len + 4));
+	if (edge_value == GRAPH_PARENT_NONE)
+		return 1;
+	if (!(edge_value & GRAPH_OCTOPUS_EDGES_NEEDED)) {
+		pptr = insert_parent_or_die(g, edge_value, pptr);
+		return 1;
+	}
+
+	parent_data_ptr = (uint32_t*)(g->chunk_large_edges +
+			  4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK));
+	do {
+		edge_value = ntohl(*parent_data_ptr);
+		pptr = insert_parent_or_die(g,
+					    edge_value & GRAPH_EDGE_LAST_MASK,
+					    pptr);
+		parent_data_ptr++;
+	} while (!(edge_value & GRAPH_LAST_EDGE));
+
+	return 1;
+}
+
+int parse_commit_in_graph(struct commit *item)
+{
+	if (!core_commit_graph)
+		return 0;
+	if (item->object.parsed)
+		return 1;
+
+	prepare_commit_graph();
+	if (commit_graph) {
+		uint32_t pos;
+		int found;
+		if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
+			pos = item->graph_pos;
+			found = 1;
+		} else {
+			found = bsearch_graph(commit_graph, &(item->object.oid), &pos);
+		}
+
+		if (found)
+			return fill_commit_in_graph(item, commit_graph, pos);
+	}
+
+	return 0;
+}
+
 static void write_graph_chunk_fanout(struct hashfile *f,
 				     struct commit **commits,
 				     int nr_commits)
@@ -511,6 +649,7 @@ void write_commit_graph(const char *obj_dir)
 	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
 	write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
+	close_commit_graph();
 	hashclose(f, final_hash, CSUM_CLOSE | CSUM_FSYNC | CSUM_KEEP_OPEN);
 	commit_lock_file(&lk);
 
diff --git a/commit-graph.h b/commit-graph.h
index 8b4b0f9..b223b9b 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -5,6 +5,18 @@
 
 char *get_commit_graph_filename(const char *obj_dir);
 
+/*
+ * Given a commit struct, try to fill the commit struct info, including:
+ *  1. tree object
+ *  2. date
+ *  3. parents.
+ *
+ * Returns 1 if and only if the commit was found in the packed graph.
+ *
+ * See parse_commit_buffer() for the fallback after this call.
+ */
+int parse_commit_in_graph(struct commit *item);
+
 struct commit_graph {
 	int graph_fd;
 
diff --git a/commit.c b/commit.c
index e8a49b9..eb61729 100644
--- a/commit.c
+++ b/commit.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "tag.h"
 #include "commit.h"
+#include "commit-graph.h"
 #include "pkt-line.h"
 #include "utf8.h"
 #include "diff.h"
@@ -383,6 +384,8 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
 		return -1;
 	if (item->object.parsed)
 		return 0;
+	if (parse_commit_in_graph(item))
+		return 0;
 	buffer = read_sha1_file(item->object.oid.hash, &type, &size);
 	if (!buffer)
 		return quiet_on_missing ? -1 :
diff --git a/commit.h b/commit.h
index 0fb8271..e57ae4b 100644
--- a/commit.h
+++ b/commit.h
@@ -9,6 +9,8 @@
 #include "string-list.h"
 #include "pretty.h"
 
+#define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF
+
 struct commit_list {
 	struct commit *item;
 	struct commit_list *next;
@@ -21,6 +23,7 @@ struct commit {
 	timestamp_t date;
 	struct commit_list *parents;
 	struct tree *tree;
+	uint32_t graph_pos;
 };
 
 extern int save_commit_buffer;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 0085e23..9a0cd71 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -7,6 +7,7 @@ test_expect_success 'setup full repo' '
 	mkdir full &&
 	cd "$TRASH_DIRECTORY/full" &&
 	git init &&
+	git config core.commitGraph true &&
 	objdir=".git/objects"
 '
 
@@ -26,6 +27,29 @@ test_expect_success 'create commits and repack' '
 	git repack
 '
 
+graph_git_two_modes() {
+	git -c core.commitGraph=true $1 >output
+	git -c core.commitGraph=false $1 >expect
+	test_cmp output expect
+}
+
+graph_git_behavior() {
+	MSG=$1
+	DIR=$2
+	BRANCH=$3
+	COMPARE=$4
+	test_expect_success "check normal git operations: $MSG" '
+		cd "$TRASH_DIRECTORY/$DIR" &&
+		graph_git_two_modes "log --oneline $BRANCH" &&
+		graph_git_two_modes "log --topo-order $BRANCH" &&
+		graph_git_two_modes "log --graph $COMPARE..$BRANCH" &&
+		graph_git_two_modes "branch -vv" &&
+		graph_git_two_modes "merge-base -a $BRANCH $COMPARE"
+	'
+}
+
+graph_git_behavior 'no graph' full commits/3 commits/1
+
 graph_read_expect() {
 	OPTIONAL=""
 	NUM_CHUNKS=3
@@ -50,6 +74,8 @@ test_expect_success 'write graph' '
 	graph_read_expect "3"
 '
 
+graph_git_behavior 'graph exists' full commits/3 commits/1
+
 test_expect_success 'Add more commits' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git reset --hard commits/1 &&
@@ -86,7 +112,6 @@ test_expect_success 'Add more commits' '
 # |___/____/
 # 1
 
-
 test_expect_success 'write graph with merges' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
@@ -94,6 +119,10 @@ test_expect_success 'write graph with merges' '
 	graph_read_expect "10" "large_edges"
 '
 
+graph_git_behavior 'merge 1 vs 2' full merge/1 merge/2
+graph_git_behavior 'merge 1 vs 3' full merge/1 merge/3
+graph_git_behavior 'merge 2 vs 3' full merge/2 merge/3
+
 test_expect_success 'Add one more commit' '
 	cd "$TRASH_DIRECTORY/full" &&
 	test_commit 8 &&
@@ -115,6 +144,9 @@ test_expect_success 'Add one more commit' '
 # |___/____/
 # 1
 
+graph_git_behavior 'mixed mode, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'mixed mode, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'write graph with new commit' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
@@ -122,6 +154,9 @@ test_expect_success 'write graph with new commit' '
 	graph_read_expect "11" "large_edges"
 '
 
+graph_git_behavior 'full graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'full graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'write graph with nothing new' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
@@ -129,13 +164,20 @@ test_expect_success 'write graph with nothing new' '
 	graph_read_expect "11" "large_edges"
 '
 
+graph_git_behavior 'cleared graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'cleared graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
 	cd bare &&
+	git config core.commitGraph true &&
 	baredir="./objects"
 '
 
+graph_git_behavior 'bare repo, commit 8 vs merge 1' bare commits/8 merge/1
+graph_git_behavior 'bare repo, commit 8 vs merge 2' bare commits/8 merge/2
+
 test_expect_success 'write graph in bare repo' '
 	cd "$TRASH_DIRECTORY/bare" &&
 	git commit-graph write &&
@@ -143,5 +185,8 @@ test_expect_success 'write graph in bare repo' '
 	graph_read_expect "11" "large_edges"
 '
 
+graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare commits/8 merge/1
+graph_git_behavior 'bare repo with graph, commit 8 vs merge 2' bare commits/8 merge/2
+
 test_done
 
-- 
2.7.4



* [PATCH v5 11/13] commit-graph: read only from specific pack-indexes
  2018-02-27  2:32 [PATCH v5 00/13] Serialized Git Commit Graph Derrick Stolee
                   ` (9 preceding siblings ...)
  2018-02-27  2:33 ` [PATCH v5 10/13] commit: integrate commit graph with commit parsing Derrick Stolee
@ 2018-02-27  2:33 ` Derrick Stolee
  2018-02-27 20:15   ` Stefan Beller
  2018-02-27  2:33 ` [PATCH v5 12/13] commit-graph: build graph from starting commits Derrick Stolee
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-02-27  2:33 UTC (permalink / raw)
  To: git
  Cc: gitster, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

Teach git-commit-graph to inspect the objects only in a certain list
of pack-indexes within the given pack directory. This allows updating
the commit graph iteratively.
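
For illustration, collecting one pack-index name per line from an input
stream can be sketched in plain C as follows. This is a standalone
sketch and not the strbuf-based code in this patch; toy_lines,
toy_read_lines and toy_free_lines are hypothetical names, and lines
longer than the fixed buffer would be split, which the real code does
not do.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable list of heap-allocated lines. */
struct toy_lines {
	char **items;
	size_t nr, alloc;
};

/*
 * Read newline-terminated entries from 'in' and append them to 'out',
 * growing the array geometrically.
 * Returns 0 on success, -1 on allocation failure.
 * Limitation: lines longer than the buffer are split into pieces.
 */
int toy_read_lines(FILE *in, struct toy_lines *out)
{
	char buf[4096];

	assert(in && out);
	while (fgets(buf, sizeof(buf), in)) {
		size_t len = strlen(buf);
		char *copy;

		/* Drop the trailing newline, if any. */
		if (len && buf[len - 1] == '\n')
			buf[--len] = '\0';

		if (out->nr == out->alloc) {
			size_t new_alloc = out->alloc ? 2 * out->alloc : 16;
			char **grown = realloc(out->items,
					       new_alloc * sizeof(*grown));
			if (!grown)
				return -1;
			out->items = grown;
			out->alloc = new_alloc;
		}

		copy = malloc(len + 1);
		if (!copy)
			return -1;
		memcpy(copy, buf, len + 1);
		out->items[out->nr++] = copy;
	}
	return 0;
}

/* Release every line and reset the list to empty. */
void toy_free_lines(struct toy_lines *l)
{
	size_t i;

	for (i = 0; i < l->nr; i++)
		free(l->items[i]);
	free(l->items);
	l->items = NULL;
	l->nr = l->alloc = 0;
}
```

The caller owns the resulting strings and releases them with
toy_free_lines() once the graph has been written.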

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt | 11 ++++++++++-
 builtin/commit-graph.c             | 33 ++++++++++++++++++++++++++++++---
 commit-graph.c                     | 26 ++++++++++++++++++++++++--
 commit-graph.h                     |  4 +++-
 packfile.c                         |  4 ++--
 packfile.h                         |  2 ++
 t/t5318-commit-graph.sh            | 10 ++++++++++
 7 files changed, 81 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 51cb038..b945510 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -32,7 +32,9 @@ COMMANDS
 'write'::
 
 Write a commit graph file based on the commits found in packfiles.
-Includes all commits from the existing commit graph file.
++
+With the `--stdin-packs` option, generate the new commit graph by
+walking objects only in the specified packfiles.
 
 'read'::
 
@@ -49,6 +51,13 @@ EXAMPLES
 $ git commit-graph write
 ------------------------------------------------
 
+* Write a graph file, extending the current graph file using commits
+in <pack-index>.
++
+------------------------------------------------
+$ echo <pack-index> | git commit-graph write --stdin-packs
+------------------------------------------------
+
 * Read basic information from the commit-graph file.
 +
 ------------------------------------------------
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 0e164be..eebca57 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs]"),
 	NULL
 };
 
@@ -18,12 +18,13 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs]"),
 	NULL
 };
 
 static struct opts_commit_graph {
 	const char *obj_dir;
+	int stdin_packs;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -76,10 +77,18 @@ static int graph_read(int argc, const char **argv)
 
 static int graph_write(int argc, const char **argv)
 {
+	const char **pack_indexes = NULL;
+	int packs_nr = 0;
+	const char **lines = NULL;
+	int lines_nr = 0;
+	int lines_alloc = 0;
+
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
 			N_("dir"),
 			N_("The object directory to store the graph")),
+		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
+			N_("scan packfiles listed by stdin for commits")),
 		OPT_END(),
 	};
 
@@ -90,7 +99,25 @@ static int graph_write(int argc, const char **argv)
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
 
-	write_commit_graph(opts.obj_dir);
+	if (opts.stdin_packs) {
+		struct strbuf buf = STRBUF_INIT;
+		lines_nr = 0;
+		lines_alloc = 128;
+		ALLOC_ARRAY(lines, lines_alloc);
+
+		while (strbuf_getline(&buf, stdin) != EOF) {
+			ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
+			lines[lines_nr++] = strbuf_detach(&buf, NULL);
+		}
+
+		pack_indexes = lines;
+		packs_nr = lines_nr;
+	}
+
+	write_commit_graph(opts.obj_dir,
+			   pack_indexes,
+			   packs_nr);
+
 	return 0;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index 184b8da..4e9f1d5 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -529,7 +529,9 @@ static void close_reachable(struct packed_oid_list *oids)
 	}
 }
 
-void write_commit_graph(const char *obj_dir)
+void write_commit_graph(const char *obj_dir,
+			const char **pack_indexes,
+			int nr_packs)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -552,7 +554,27 @@ void write_commit_graph(const char *obj_dir)
 		oids.alloc = 1024;
 	ALLOC_ARRAY(oids.list, oids.alloc);
 
-	for_each_packed_object(add_packed_commits, &oids, 0);
+	if (pack_indexes) {
+		struct strbuf packname = STRBUF_INIT;
+		int dirlen;
+		strbuf_addf(&packname, "%s/pack/", obj_dir);
+		dirlen = packname.len;
+		for (i = 0; i < nr_packs; i++) {
+			struct packed_git *p;
+			strbuf_setlen(&packname, dirlen);
+			strbuf_addstr(&packname, pack_indexes[i]);
+			p = add_packed_git(packname.buf, packname.len, 1);
+			if (!p)
+				die("error adding pack %s", packname.buf);
+			if (open_pack_index(p))
+				die("error opening index for %s", packname.buf);
+			for_each_object_in_pack(p, add_packed_commits, &oids);
+			close_pack(p);
+		}
+		strbuf_release(&packname);
+	} else
+		for_each_packed_object(add_packed_commits, &oids, 0);
+
 	close_reachable(&oids);
 
 	QSORT(oids.list, oids.nr, commit_compare);
diff --git a/commit-graph.h b/commit-graph.h
index b223b9b..65fe770 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -36,7 +36,9 @@ struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
-void write_commit_graph(const char *obj_dir);
+void write_commit_graph(const char *obj_dir,
+			const char **pack_indexes,
+			int nr_packs);
 
 #endif
 
diff --git a/packfile.c b/packfile.c
index 5d07f33..f14179f 100644
--- a/packfile.c
+++ b/packfile.c
@@ -304,7 +304,7 @@ void close_pack_index(struct packed_git *p)
 	}
 }
 
-static void close_pack(struct packed_git *p)
+void close_pack(struct packed_git *p)
 {
 	close_pack_windows(p);
 	close_pack_fd(p);
@@ -1850,7 +1850,7 @@ int has_pack_index(const unsigned char *sha1)
 	return 1;
 }
 
-static int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data)
+int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data)
 {
 	uint32_t i;
 	int r = 0;
diff --git a/packfile.h b/packfile.h
index a7fca59..b341f2b 100644
--- a/packfile.h
+++ b/packfile.h
@@ -63,6 +63,7 @@ extern void close_pack_index(struct packed_git *);
 
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *);
 extern void close_pack_windows(struct packed_git *);
+extern void close_pack(struct packed_git *);
 extern void close_all_packs(void);
 extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
@@ -140,6 +141,7 @@ typedef int each_packed_object_fn(const struct object_id *oid,
 				  struct packed_git *pack,
 				  uint32_t pos,
 				  void *data);
+extern int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn, void *data);
 extern int for_each_packed_object(each_packed_object_fn, void *, unsigned flags);
 
 /*
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 9a0cd71..6bc529c 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -167,6 +167,16 @@ test_expect_success 'write graph with nothing new' '
 graph_git_behavior 'cleared graph, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'cleared graph, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph from latest pack with closure' '
+	cd "$TRASH_DIRECTORY/full" &&
+	cat new-idx | git commit-graph write --stdin-packs &&
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "9" "large_edges"
+'
+
+graph_git_behavior 'graph from pack, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'graph from pack, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
-- 
2.7.4



* [PATCH v5 12/13] commit-graph: build graph from starting commits
  2018-02-27  2:32 [PATCH v5 00/13] Serialized Git Commit Graph Derrick Stolee
                   ` (10 preceding siblings ...)
  2018-02-27  2:33 ` [PATCH v5 11/13] commit-graph: read only from specific pack-indexes Derrick Stolee
@ 2018-02-27  2:33 ` Derrick Stolee
  2018-02-27  2:33 ` [PATCH v5 13/13] commit-graph: implement "--additive" option Derrick Stolee
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-02-27  2:33 UTC (permalink / raw)
  To: git
  Cc: gitster, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

Teach git-commit-graph to read commits from stdin when the
--stdin-commits flag is specified. Commits reachable from these
commits are added to the graph. This is a much faster way to construct
the graph than inspecting all packed objects, but is restricted to
known tips.
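
As a sketch of the input handling, each stdin line is expected to carry
a 40-digit hex OID, and malformed lines are skipped. The standalone
parser below stands in for parse_oid_hex() purely for illustration;
toy_hexval, toy_parse_oid_hex, TOY_RAWSZ and TOY_HEXSZ are hypothetical
names, and a 20-byte hash is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RAWSZ 20
#define TOY_HEXSZ (2 * TOY_RAWSZ)

/* Value of one hex digit, or -1 if 'c' is not a hex digit. */
static int toy_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parse the first TOY_HEXSZ characters of 'hex' into TOY_RAWSZ bytes.
 * Returns 0 on success, -1 if 'hex' is NULL, too short, or contains a
 * non-hex character. Anything after the OID is ignored.
 */
int toy_parse_oid_hex(const char *hex, unsigned char *out)
{
	size_t i;

	assert(out);
	if (!hex)
		return -1;

	for (i = 0; i < TOY_RAWSZ; i++) {
		int hi, lo;

		/* A NUL byte fails here, so we never read past the end. */
		hi = toy_hexval(hex[2 * i]);
		if (hi < 0)
			return -1;
		lo = toy_hexval(hex[2 * i + 1]);
		if (lo < 0)
			return -1;
		out[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}
```

A caller would skip any line for which this returns -1 and look up the
commit only for lines that parse cleanly.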

For the Linux repository, 700,000+ commits were added to the graph
file starting from 'master' in 7-9 seconds, depending on the number
of packfiles in the repo (1, 24, or 120).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt | 14 +++++++++++++-
 builtin/commit-graph.c             | 27 +++++++++++++++++++++------
 commit-graph.c                     | 27 +++++++++++++++++++++++++--
 commit-graph.h                     |  4 +++-
 t/t5318-commit-graph.sh            | 13 +++++++++++++
 5 files changed, 75 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index b945510..0710a68 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -34,7 +34,13 @@ COMMANDS
 Write a commit graph file based on the commits found in packfiles.
 +
 With the `--stdin-packs` option, generate the new commit graph by
-walking objects only in the specified packfiles.
+walking objects only in the specified packfiles. (Cannot be combined
+with --stdin-commits.)
++
+With the `--stdin-commits` option, generate the new commit graph by
+walking commits starting at the commits specified in stdin as a list
+of OIDs in hex, one OID per line. (Cannot be combined with
+--stdin-packs.)
 
 'read'::
 
@@ -58,6 +64,12 @@ $ git commit-graph write
 $ echo <pack-index> | git commit-graph write --stdin-packs
 ------------------------------------------------
 
+* Write a graph file containing all reachable commits.
++
+------------------------------------------------
+$ git show-ref -s | git commit-graph write --stdin-commits
+------------------------------------------------
+
 * Read basic information from the commit-graph file.
 +
 ------------------------------------------------
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index eebca57..1c7b7e7 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
@@ -18,13 +18,14 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
 static struct opts_commit_graph {
 	const char *obj_dir;
 	int stdin_packs;
+	int stdin_commits;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -79,6 +80,8 @@ static int graph_write(int argc, const char **argv)
 {
 	const char **pack_indexes = NULL;
 	int packs_nr = 0;
+	const char **commit_hex = NULL;
+	int commits_nr = 0;
 	const char **lines = NULL;
 	int lines_nr = 0;
 	int lines_alloc = 0;
@@ -89,6 +92,8 @@ static int graph_write(int argc, const char **argv)
 			N_("The object directory to store the graph")),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
 			N_("scan packfiles listed by stdin for commits")),
+		OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
+			N_("start walk at commits listed by stdin")),
 		OPT_END(),
 	};
 
@@ -96,10 +101,12 @@ static int graph_write(int argc, const char **argv)
 			     builtin_commit_graph_write_options,
 			     builtin_commit_graph_write_usage, 0);
 
+	if (opts.stdin_packs && opts.stdin_commits)
+		die(_("cannot use both --stdin-commits and --stdin-packs"));
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
 
-	if (opts.stdin_packs) {
+	if (opts.stdin_packs || opts.stdin_commits) {
 		struct strbuf buf = STRBUF_INIT;
 		lines_nr = 0;
 		lines_alloc = 128;
@@ -110,13 +117,21 @@ static int graph_write(int argc, const char **argv)
 			lines[lines_nr++] = strbuf_detach(&buf, NULL);
 		}
 
-		pack_indexes = lines;
-		packs_nr = lines_nr;
+		if (opts.stdin_packs) {
+			pack_indexes = lines;
+			packs_nr = lines_nr;
+		}
+		if (opts.stdin_commits) {
+			commit_hex = lines;
+			commits_nr = lines_nr;
+		}
 	}
 
 	write_commit_graph(opts.obj_dir,
 			   pack_indexes,
-			   packs_nr);
+			   packs_nr,
+			   commit_hex,
+			   commits_nr);
 
 	return 0;
 }
diff --git a/commit-graph.c b/commit-graph.c
index 4e9f1d5..dbb9801 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -531,7 +531,9 @@ static void close_reachable(struct packed_oid_list *oids)
 
 void write_commit_graph(const char *obj_dir,
 			const char **pack_indexes,
-			int nr_packs)
+			int nr_packs,
+			const char **commit_hex,
+			int nr_commits)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -572,7 +574,28 @@ void write_commit_graph(const char *obj_dir,
 			close_pack(p);
 		}
 		strbuf_release(&packname);
-	} else
+	}
+
+	if (commit_hex) {
+		for (i = 0; i < nr_commits; i++) {
+			const char *end;
+			struct object_id oid;
+			struct commit *result;
+
+			if (!commit_hex[i] || parse_oid_hex(commit_hex[i], &oid, &end))
+				continue;
+
+			result = lookup_commit_reference_gently(&oid, 1);
+
+			if (result) {
+				ALLOC_GROW(oids.list, oids.nr + 1, oids.alloc);
+				oidcpy(&oids.list[oids.nr], &(result->object.oid));
+				oids.nr++;
+			}
+		}
+	}
+
+	if (!pack_indexes && !commit_hex)
 		for_each_packed_object(add_packed_commits, &oids, 0);
 
 	close_reachable(&oids);
diff --git a/commit-graph.h b/commit-graph.h
index 65fe770..4c70281 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -38,7 +38,9 @@ struct commit_graph *load_commit_graph_one(const char *graph_file);
 
 void write_commit_graph(const char *obj_dir,
 			const char **pack_indexes,
-			int nr_packs);
+			int nr_packs,
+			const char **commit_hex,
+			int nr_commits);
 
 #endif
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 6bc529c..9589238 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -177,6 +177,19 @@ test_expect_success 'build graph from latest pack with closure' '
 graph_git_behavior 'graph from pack, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'graph from pack, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph from commits with closure' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git tag -a -m "merge" tag/merge merge/2 &&
+	git rev-parse tag/merge >commits-in &&
+	git rev-parse merge/1 >>commits-in &&
+	cat commits-in | git commit-graph write --stdin-commits &&
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "6"
+'
+
+graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'graph from commits, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
-- 
2.7.4



* [PATCH v5 13/13] commit-graph: implement "--additive" option
  2018-02-27  2:32 [PATCH v5 00/13] Serialized Git Commit Graph Derrick Stolee
                   ` (11 preceding siblings ...)
  2018-02-27  2:33 ` [PATCH v5 12/13] commit-graph: build graph from starting commits Derrick Stolee
@ 2018-02-27  2:33 ` Derrick Stolee
  2018-02-27 18:50 ` [PATCH v5 00/13] Serialized Git Commit Graph Stefan Beller
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
  14 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-02-27  2:33 UTC (permalink / raw)
  To: git
  Cc: gitster, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

Teach git-commit-graph to add all commits from the existing
commit-graph file to the file about to be written. This should be
used when adding new commits without performing garbage collection.
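
One way to picture the additive behavior is as a union of two OID
lists: the newly discovered commits and those already in the existing
file, sorted with duplicates dropped. The sketch below is illustrative
only and is not the code in this patch; toy_oid, toy_oid_cmp and
toy_merge_oids are hypothetical names, and a 20-byte hash is assumed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_RAWSZ 20

struct toy_oid {
	unsigned char hash[TOY_RAWSZ];
};

/* Order OIDs by their raw bytes, as memcmp() does. */
static int toy_oid_cmp(const void *a, const void *b)
{
	return memcmp(((const struct toy_oid *)a)->hash,
		      ((const struct toy_oid *)b)->hash, TOY_RAWSZ);
}

/*
 * 'list' holds nr_new fresh OIDs and must have room for
 * nr_new + nr_existing entries. Append the existing OIDs, sort the
 * result, and remove duplicates in place.
 * Returns the number of unique OIDs left at the start of 'list'.
 */
size_t toy_merge_oids(struct toy_oid *list, size_t nr_new,
		      const struct toy_oid *existing, size_t nr_existing)
{
	size_t nr = nr_new + nr_existing, i, out = 0;

	assert(list);
	if (nr_existing) {
		assert(existing);
		memcpy(list + nr_new, existing, nr_existing * sizeof(*list));
	}
	if (!nr)
		return 0;

	qsort(list, nr, sizeof(*list), toy_oid_cmp);

	/* Keep the first entry of each run of equal OIDs. */
	for (i = 0; i < nr; i++) {
		if (out && !toy_oid_cmp(&list[out - 1], &list[i]))
			continue;
		if (out != i)
			list[out] = list[i];
		out++;
	}
	return out;
}
```

Sorting before deduplicating means equal OIDs are adjacent, so a single
linear pass is enough to drop the repeats.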

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt | 10 ++++++++++
 builtin/commit-graph.c             | 10 +++++++---
 commit-graph.c                     | 17 ++++++++++++++++-
 commit-graph.h                     |  3 ++-
 t/t5318-commit-graph.sh            | 10 ++++++++++
 5 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 0710a68..ccf5e20 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -41,6 +41,9 @@ With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
 of OIDs in hex, one OID per line. (Cannot be combined with
 --stdin-packs.)
++
+With the `--additive` option, include all commits that are present
+in the existing commit-graph file.
 
 'read'::
 
@@ -70,6 +73,13 @@ $ echo <pack-index> | git commit-graph write --stdin-packs
 $ git show-ref -s | git commit-graph write --stdin-commits
 ------------------------------------------------
 
+* Write a graph file containing all commits in the current
+  commit-graph file along with those reachable from HEAD.
++
+------------------------------------------------
+$ git rev-parse HEAD | git commit-graph write --stdin-commits --additive
+------------------------------------------------
+
 * Read basic information from the commit-graph file.
 +
 ------------------------------------------------
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 1c7b7e7..d26a6d6 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--additive] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
@@ -18,7 +18,7 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--additive] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
@@ -26,6 +26,7 @@ static struct opts_commit_graph {
 	const char *obj_dir;
 	int stdin_packs;
 	int stdin_commits;
+	int additive;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -94,6 +95,8 @@ static int graph_write(int argc, const char **argv)
 			N_("scan packfiles listed by stdin for commits")),
 		OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
 			N_("start walk at commits listed by stdin")),
+		OPT_BOOL(0, "additive", &opts.additive,
+			N_("include all commits already in the commit-graph file")),
 		OPT_END(),
 	};
 
@@ -131,7 +134,8 @@ static int graph_write(int argc, const char **argv)
 			   pack_indexes,
 			   packs_nr,
 			   commit_hex,
-			   commits_nr);
+			   commits_nr,
+			   opts.additive);
 
 	return 0;
 }
diff --git a/commit-graph.c b/commit-graph.c
index dbb9801..c111717 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -533,7 +533,8 @@ void write_commit_graph(const char *obj_dir,
 			const char **pack_indexes,
 			int nr_packs,
 			const char **commit_hex,
-			int nr_commits)
+			int nr_commits,
+			int additive)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -552,10 +553,24 @@ void write_commit_graph(const char *obj_dir,
 	oids.nr = 0;
 	oids.alloc = approximate_object_count() / 4;
 
+	if (additive) {
+		prepare_commit_graph_one(obj_dir);
+		if (commit_graph)
+			oids.alloc += commit_graph->num_commits;
+	}
+
 	if (oids.alloc < 1024)
 		oids.alloc = 1024;
 	ALLOC_ARRAY(oids.list, oids.alloc);
 
+	if (additive && commit_graph) {
+		for (i = 0; i < commit_graph->num_commits; i++) {
+			const unsigned char *hash = commit_graph->chunk_oid_lookup +
+				commit_graph->hash_len * i;
+			hashcpy(oids.list[oids.nr++].hash, hash);
+		}
+	}
+
 	if (pack_indexes) {
 		struct strbuf packname = STRBUF_INIT;
 		int dirlen;
diff --git a/commit-graph.h b/commit-graph.h
index 4c70281..c10e436 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -40,7 +40,8 @@ void write_commit_graph(const char *obj_dir,
 			const char **pack_indexes,
 			int nr_packs,
 			const char **commit_hex,
-			int nr_commits);
+			int nr_commits,
+			int additive);
 
 #endif
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 9589238..518eb92 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -190,6 +190,16 @@ test_expect_success 'build graph from commits with closure' '
 graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'graph from commits, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph from commits additively' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git rev-parse merge/3 | git commit-graph write --stdin-commits --additive &&
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "10" "large_edges"
+'
+
+graph_git_behavior 'additive graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'additive graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* Re: [PATCH v5 00/13] Serialized Git Commit Graph
  2018-02-27  2:32 [PATCH v5 00/13] Serialized Git Commit Graph Derrick Stolee
                   ` (12 preceding siblings ...)
  2018-02-27  2:33 ` [PATCH v5 13/13] commit-graph: implement "--additive" option Derrick Stolee
@ 2018-02-27 18:50 ` Stefan Beller
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
  14 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2018-02-27 18:50 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Junio C Hamano, Jeff King, Jeff Hostetler, Jonathan Tan,
	SZEDER Gábor, Ramsay Jones, Derrick Stolee

On Mon, Feb 26, 2018 at 6:32 PM, Derrick Stolee <stolee@gmail.com> wrote:
> This patch series is another big difference from version 4, but I do
> think we are converging on a stable design.
>
> This series depends on a few things in flight:
>
> * jt/binsearch-with-fanout for bsearch_graph()
>
> * 'master' includes the sha1file -> hashfile rename in (98a3beab).
>
> * [PATCH] commit: drop uses of get_cached_commit_buffer(). [1] I
>   couldn't find a ds/* branch for this one, but it is necessary or
>   else the commit graph test script should fail.

'jk/cached-commit-buffer', 'jk' as the first commit in that series
is by Jeff King ?

I found this commit by searching for its verbatim title in
'git log --oneline origin/pu' and then using
https://github.com/mhagger/git-when-merged
to find 51ff16f5f3a (Merge branch 'jk/cached-commit-buffer'
into jch, 2018-02-23)

>
> Here are some of the inter-patch changes:
>
> * The single commit graph file is stored in the fixed filename
>   .git/objects/info/commit-graph
>
> * Because of this change, I struggled with the right way to pair the
>   lockfile API with the hashfile API. Perhaps they were not meant to
>   interact like this. I include a new patch step that adds a flag for
>   hashclose() to keep the file descriptor open so commit_lock_file()
>   can succeed. Please let me know if this is the wrong approach.

This sounds like an interesting thing to review.

>
> * A side-benefit of this change is that the "--set-latest" and
>   "--delete-expired" arguments are no longer useful.
>
> * I re-ran the performance tests since I rebased onto master. I had
>   moved my "master" branch on my copy of Linux from another perf test,
>   which changed the data shape a bit.
>
> * There was some confusion between v3 and v4 about whether commits in
>   an existing commit-graph file are automatically added to the new
>   file during a write. I think I cleared up all of the documentation
>   that referenced this to the new behavior: we only include commits
>   reachable from the starting commits (depending on --stdin-commits,
>   --stdin-packs, or neither) unless the new "--additive" argument
>   is specified.
>

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v5 11/13] commit-graph: read only from specific pack-indexes
  2018-02-27  2:33 ` [PATCH v5 11/13] commit-graph: read only from specific pack-indexes Derrick Stolee
@ 2018-02-27 20:15   ` Stefan Beller
  0 siblings, 0 replies; 110+ messages in thread
From: Stefan Beller @ 2018-02-27 20:15 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Junio C Hamano, Jeff King, Jeff Hostetler, Jonathan Tan,
	SZEDER Gábor, Ramsay Jones, Derrick Stolee

> @@ -76,10 +77,18 @@ static int graph_read(int argc, const char **argv)
>
>  static int graph_write(int argc, const char **argv)
>  {
> +       const char **pack_indexes = NULL;
> +       int packs_nr = 0;
> +       const char **lines = NULL;
> +       int lines_nr = 0;
> +       int lines_alloc = 0;
> +
>         static struct option builtin_commit_graph_write_options[] = {
>                 OPT_STRING(0, "object-dir", &opts.obj_dir,
>                         N_("dir"),
>                         N_("The object directory to store the graph")),
> +               OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
> +                       N_("scan packfiles listed by stdin for commits")),
>                 OPT_END(),
>         };
>
> @@ -90,7 +99,25 @@ static int graph_write(int argc, const char **argv)
>         if (!opts.obj_dir)
>                 opts.obj_dir = get_object_directory();
>
> -       write_commit_graph(opts.obj_dir);
> +       if (opts.stdin_packs) {
> +               struct strbuf buf = STRBUF_INIT;
> +               lines_nr = 0;
> +               lines_alloc = 128;

both lines_nr as well as lines_alloc are already initialized?

^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag
  2018-02-27  2:32 ` [PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag Derrick Stolee
@ 2018-03-12 13:55   ` Derrick Stolee
  2018-03-13 21:42     ` Junio C Hamano
  0 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-03-12 13:55 UTC (permalink / raw)
  To: git
  Cc: gitster, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

On 2/26/2018 9:32 PM, Derrick Stolee wrote:
> This patch is new to the series due to the interactions with the lockfile API
> and the hashfile API. I need to ensure the hashfile writes the hash value at
> the end of the file, but keep the file descriptor open so the lock is valid.
>
> I welcome any suggestions to this patch or to the way I use it in the commit
> that follows.
>
> -- >8 --

I haven't gotten any feedback on this step of the patch. Could someone 
take a look and let me know what you think?

Thanks,
-Stolee

> If we want to use a hashfile on the temporary file for a lockfile, then
> we need hashclose() to fully write the trailing hash but also keep the
> file descriptor open.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>   csum-file.c | 10 +++++++---
>   csum-file.h |  1 +
>   2 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/csum-file.c b/csum-file.c
> index 5eda7fb..302e6ae 100644
> --- a/csum-file.c
> +++ b/csum-file.c
> @@ -66,9 +66,13 @@ int hashclose(struct hashfile *f, unsigned char *result, unsigned int flags)
>   		flush(f, f->buffer, the_hash_algo->rawsz);
>   		if (flags & CSUM_FSYNC)
>   			fsync_or_die(f->fd, f->name);
> -		if (close(f->fd))
> -			die_errno("%s: sha1 file error on close", f->name);
> -		fd = 0;
> +		if (flags & CSUM_KEEP_OPEN)
> +			fd = f->fd;
> +		else {
> +			if (close(f->fd))
> +				die_errno("%s: sha1 file error on close", f->name);
> +			fd = 0;
> +		}
>   	} else
>   		fd = f->fd;
>   	if (0 <= f->check_fd) {
> diff --git a/csum-file.h b/csum-file.h
> index 992e5c0..b7c0e48 100644
> --- a/csum-file.h
> +++ b/csum-file.h
> @@ -29,6 +29,7 @@ extern int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *);
>   /* hashclose flags */
>   #define CSUM_CLOSE	1
>   #define CSUM_FSYNC	2
> +#define CSUM_KEEP_OPEN	4
>   
>   extern struct hashfile *hashfd(int fd, const char *name);
>   extern struct hashfile *hashfd_check(const char *name);


^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag
  2018-03-12 13:55   ` Derrick Stolee
@ 2018-03-13 21:42     ` Junio C Hamano
  2018-03-14  2:26       ` Derrick Stolee
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2018-03-13 21:42 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> On 2/26/2018 9:32 PM, Derrick Stolee wrote:
>> This patch is new to the series due to the interactions with the lockfile API
>> and the hashfile API. I need to ensure the hashfile writes the hash value at
>> the end of the file, but keep the file descriptor open so the lock is valid.
>>
>> I welcome any suggestions to this patch or to the way I use it in the commit
>> that follows.
>>
>> -- >8 --
>
> I haven't gotten any feedback on this step of the patch. Could someone
> take a look and let me know what you think?

Let's follow the commit-graph writing codepath to see what happens:

	fd = hold_lock_file_for_update(&lk, graph_name, 0);
	...
	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);

The caller creates a lockfile, and then wraps its file descriptor in
a hashfile.

	hashwrite_be32(f, GRAPH_SIGNATURE);
	...

Then it goes on writing to the hashfile, growing the lockfile.

        ...
	write_graph_chunk_large_edges(f, commits.list, commits.nr);

	close_commit_graph();

And after writing all data out (oh by the way, why aren't we passing
commit_graph instance around and instead relying on a file-scope
static global?)...

	hashclose(f, final_hash, CSUM_CLOSE | CSUM_FSYNC | CSUM_KEEP_OPEN);

We ask for the final hash value to be written to the file (and also
returned to us---although you do not seem to use that at all).  See
a comment on this, though, at the end.

	commit_lock_file(&lk);

And then, we put the lockfile to its final place, while closing its
file descriptor.

The overall API sounds sensible, from the above.

However.

The function whose name is hashclose() that takes a flag word whose
possible bit value includes "Please close this thing" feels strange
enough (does it mean the hashclose() function does not close it if
CSUM_CLOSE is not given?), but adding another to the mix that lets
us say "Please close this (with or without FSYNC), oh by the way
please leave it open" feels a bit borderline to insanity.

I _think_ the word "close" in the name hashclose() is about closing
the (virtual) stream for the hashing that is overlaid on top of the
underlying file descriptor, and being able to choose between closing
and not closing the underlying file descriptor when "closing" the
hashing layer sort of makes sense.  So I won't complain too much
about hashclose() that takes optional CSUM_CLOSE flag.

But then what does it mean to give KEEP_OPEN and CLOSE together?

The new caller (which is the only one that wants the nominally
nonsensical CLOSE|KEEP_OPEN combination, which is shown above) wants
the final checksum of the data sent over the (virtual) stream
computed and written, and the file descriptor fsync'ed, but the file
descriptor kept open.  As we _DO_ want to keep the verbs in flags
CSUM_CLOSE and CSUM_FSYNC to be about the underlying file
descriptor, I think your new code for KEEP_OPEN that is inside the
if() block that is for CSUM_CLOSE is an ugly hack, and your asking
for improvements is very much appreciated.

Let's step back and see what different behaviours the existing code
wants to support before your patch:

    - hashclose() is always about finalizing the hash computation
      over the data sent through the struct hashfile (i.e. the
      virtual stream opened by hashfd()).  The optional *result can
      be used to receive this hash value, even when the caller does
      not want to write that hash value to the output stream.

    - when CSUM_CLOSE is given, however, the hash value is written
      out as the trailing record to the output stream and the stream
      is closed.  CSUM_FSYNC can instead be used to ensure that the
      data hits the disk platter when the output stream is closed.

    - when neither CSUM_CLOSE nor CSUM_FSYNC is given, the hash value is not
      written to the output stream (the caller takes responsibility
      of using *result), and the output stream is left open.

I think the first mistake in the existing code is to associate
"close the underlying stream" and "write the hash out to the
underlying stream" more closely than it should.  It should be
possible to "close the underlying stream" without first "writing the
hash out to the underlying stream", and vice versa.

IOW, I think

        hashclose() {
                hashflush();
                the_hash_algo->final_fn();
                if (result)             
                        hashcpy(result, f->buffer);
        +       if (flags & CSUM_HASH_IN_STREAM)
        +               flush(f, f->buffer, the_hash_algo->rawsz);
        +       if (flags & CSUM_FSYNC)
        +               fsync_or_die();
                if (flags & (CSUM_CLOSE | CSUM_FSYNC)) {
        -               flush();                
        -               if (flags & CSUM_FSYNC)
        -                       fsync_or_die();
                        if (close(f->fd))
                                die_errno();
                        fd = 0;
                } else
                        fd = f->fd;
                if (0 <= f->check_fd) {
                        ...
                }
                free(f);
                return fd;
        }

would be a good first "preliminary preparation" step.

Existing callers that pass CSUM_FSYNC or CSUM_CLOSE now need to also
say "I want the resulting hash in the output stream", but that
allows your later caller to omit CSUM_CLOSE and then ask for
HASH_IN_STREAM alone.

Existing callers can expect that FSYNC alone means fsync and close,
but your caller wants hashclose() to compute the hash, write the hash
to the output stream, and fsync the output stream, and return
without closing the output stream.  For that, you'd make FSYNC not
to imply CLOSE, and you'd need to vet that all the existing callers
that use FSYNC are OK with such a change.  And then the above would
become

        hashclose() {
                hashflush();
                the_hash_algo->final_fn();
                if (result)             
                        hashcpy(result, f->buffer);
                if (flags & CSUM_HASH_IN_STREAM)
                        flush(f, f->buffer, the_hash_algo->rawsz);
                if (flags & CSUM_FSYNC)
                        fsync_or_die();
                if (flags & CSUM_CLOSE) {
                        if (close(f->fd))
                                die_errno();
                        fd = 0;
                } else
                        fd = f->fd;
                if (0 <= f->check_fd) {
                        ...
                }
                free(f);
                return fd;
        }

Once we reach that state, the new caller in write_commit_graph()
does not have to pass nonsensical CLOSE|KEEP_OPEN combination.
Instead we can do

	hashclose(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);

or something like that, I would think, without having KEEP_OPEN.
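
To make the decoupled semantics concrete, here is a minimal,
self-contained sketch of the control flow proposed above. It is a model
only, not the csum-file.c implementation: the flag values, the struct
finalize_steps type and the model_hashclose() name are assumptions made
for illustration, and no real file I/O happens.

```c
#include <assert.h>

/* Flag values are illustrative; only the names come from this thread. */
#define CSUM_CLOSE		1
#define CSUM_FSYNC		2
#define CSUM_HASH_IN_STREAM	4

/* Hypothetical record of which steps a finalize call would perform. */
struct finalize_steps {
	int wrote_hash;	/* trailing hash appended to the stream */
	int fsynced;	/* fsync requested on the descriptor */
	int closed;	/* descriptor closed */
};

/*
 * Model of the proposed hashclose() flow: each flag triggers exactly
 * one step, and no flag implies another.
 */
static struct finalize_steps model_hashclose(unsigned int flags)
{
	struct finalize_steps s = { 0, 0, 0 };

	if (flags & CSUM_HASH_IN_STREAM)
		s.wrote_hash = 1;
	if (flags & CSUM_FSYNC)
		s.fsynced = 1;
	if (flags & CSUM_CLOSE)
		s.closed = 1;
	return s;
}
```

Under this model, CSUM_HASH_IN_STREAM | CSUM_FSYNC writes the hash and
syncs but leaves the descriptor open, which is the combination the
lockfile caller needs before commit_lock_file().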

I am actually wondering if it is worth making CSUM_FSYNC not imply
CSUM_CLOSE.  There aren't that many existing callers of hashclose()
that use FSYNC, so vetting all of them and replacing their FSYNC
with (FSYNC|CLOSE) is not all that difficult, but if this new caller
is an oddball then another strategy may be to do the fsync_or_die()
on the caller side, something like:

                hashclose(f, NULL, CSUM_HASH_IN_STREAM);
        +       fsync_or_die(fd, get_lock_file_path(&lk));
                commit_lock_file(&lk);

And then we can keep the "FSYNC means fsync and then close" behavior
that the current set of callers rely on.  I dunno if that is a major issue,
but I do think "close this, or no, keep it open" is far worse than
"do we want the resulting hash in the stream?"

An alternative design of the above is without making
CSUM_HASH_IN_STREAM a new flag bit.  I highly suspect that the
calling codepath _knows_ whether the resulting final hash will be
written out at the end of the stream or not when it wraps an fd with
a hashfile structure, so "struct hashfile" could gain a bit to tell
hashclose() whether the resulting hash needs to be written (or not).
That would be a bit larger change than what I outlined above, and I
do not know if it is worth doing, though.




^ permalink raw reply	[flat|nested] 110+ messages in thread

* Re: [PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag
  2018-03-13 21:42     ` Junio C Hamano
@ 2018-03-14  2:26       ` Derrick Stolee
  2018-03-14 17:00         ` Junio C Hamano
  0 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-03-14  2:26 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

On 3/13/2018 5:42 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>> On 2/26/2018 9:32 PM, Derrick Stolee wrote:
>>> This patch is new to the series due to the interactions with the lockfile API
>>> and the hashfile API. I need to ensure the hashfile writes the hash value at
>>> the end of the file, but keep the file descriptor open so the lock is valid.
>>>
>>> I welcome any suggestions to this patch or to the way I use it in the commit
>>> that follows.
>>>
>>> -- >8 --
>> I haven't gotten any feedback on this step of the patch. Could someone
>> take a look and let me know what you think?

Let me just say that I appreciate the level of detail you provided in 
answering this question. The discussion below is very illuminating.

> Let's follow the commit-graph writing codepath to see what happens:
>
> 	fd = hold_lock_file_for_update(&lk, graph_name, 0);
> 	...
> 	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
>
> The caller creates a lockfile, and then wraps its file descriptor in
> a hashfile.
>
> 	hashwrite_be32(f, GRAPH_SIGNATURE);
> 	...
>
> Then it goes on writing to the hashfile, growing the lockfile.
>
>          ...
> 	write_graph_chunk_large_edges(f, commits.list, commits.nr);
>
> 	close_commit_graph();
>
> And after writing all data out (oh by the way, why aren't we passing
> commit_graph instance around and instead relying on a file-scope
> static global?)...

Yeah, we should remove the global dependence. Is this a blocker for the 
series?

After this lands, I plan to add a one-commit patch that moves all global 
"commit_graph" state to the_repository in the spirit of Stefan's 
patches. For now, this is equivalent to "close_all_packs()".

> 	hashclose(f, final_hash, CSUM_CLOSE | CSUM_FSYNC | CSUM_KEEP_OPEN);
>
> We ask for the final hash value to be written to the file (and also
> returned to us---although you do not seem to use that at all).  See
> a comment on this, though, at the end.

I'm not using the final_hash anymore, since we are using the 
"$GIT_DIR/objects/info/commit-graph" filename always. If we make the 
file incremental, then we can use "final_hash" to name the split files. 
For now, I'll remove final_hash in favor of NULL.

> 	commit_lock_file(&lk);
>
> And then, we put the lockfile to its final place, while closing its
> file descriptor.
>
> The overall API sounds sensible, from the above.
>
> However.
>
> The function whose name is hashclose() that takes a flag word whose
> possible bit value includes "Please close this thing" feels strange
> enough (does it mean the hashclose() function does not close it if
> CSUM_CLOSE is not given?), but adding another to the mix that lets
> us say "Please close this (with or without FSYNC), oh by the way
> please leave it open" feels a bit borderline to insanity.
>
> I _think_ the word "close" in the name hashclose() is about closing
> the (virtual) stream for the hashing that is overlaid on top of the
> underlying file descriptor, and being able to choose between closing
> and not closing the underlying file descriptor when "closing" the
> hashing layer sort of makes sense.  So I won't complain too much
> about hashclose() that takes optional CSUM_CLOSE flag.

I agree this "close" word is incorrect. We really want 
"finalize_hashfile()" which may include closing the file.

> But then what does it mean to give KEEP_OPEN and CLOSE together?

This should have been a red flag that something was wrong.

> The new caller (which is the only one that wants the nominally
> nonsensical CLOSE|KEEP_OPEN combination, which is shown above) wants
> the final checksum of the data sent over the (virtual) stream
> computed and written, and the file descriptor fsync'ed, but the file
> descriptor kept open.  As we _DO_ want to keep the verbs in flags
> CSUM_CLOSE and CSUM_FSYNC to be about the underlying file
> descriptor, I think your new code for KEEP_OPEN that is inside the
> if() block that is for CSUM_CLOSE is an ugly hack, and your asking
> for improvements is very much appreciated.
>
> Let's step back and see what different behaviours the existing code
> wants to support before your patch:
>
>      - hashclose() is always about finalizing the hash computation
>        over the data sent through the struct hashfile (i.e. the
>        virtual stream opened by hashfd()).  The optional *result can
>        be used to receive this hash value, even when the caller does
>        not want to write that hash value to the output stream.
>
>      - when CSUM_CLOSE is given, however, the hash value is written
>        out as the trailing record to the output stream and the stream
>        is closed.  CSUM_FSYNC can instead be used to ensure that the
>        data hits the disk platter when the output stream is closed.
>
>      - when neither CSUM_CLOSE nor CSUM_FSYNC is given, the hash value is not
>        written to the output stream (the caller takes responsibility
>        of using *result), and the output stream is left open.
>
> I think the first mistake in the existing code is to associate
> "close the underlying stream" and "write the hash out to the
> underlying stream" more closely than it should.  It should be
> possible to "close the underlying stream" without first "writing the
> hash out to the underlying stream", and vice versa.
>
> IOW, I think
>
>          hashclose() {
>                  hashflush();
>                  the_hash_algo->final_fn();
>                  if (result)
>                          hashcpy(result, f->buffer);
>          +       if (flags & CSUM_HASH_IN_STREAM)
>          +               flush(f, f->buffer, the_hash_algo->rawsz);
>          +       if (flags & CSUM_FSYNC)
>          +               fsync_or_die();
>                  if (flags & (CSUM_CLOSE | CSUM_FSYNC)) {
>          -               flush();
>          -               if (flags & CSUM_FSYNC)
>          -                       fsync_or_die();
>                          if (close(f->fd))
>                                  die_errno();
>                          fd = 0;
>                  } else
>                          fd = f->fd;
>                  if (0 <= f->check_fd) {
>                          ...
>                  }
>                  free(f);
>                  return fd;
>          }
>
> would be a good first "preliminary preparation" step.
>
> Existing callers that pass CSUM_FSYNC or CSUM_CLOSE now need to also
> say "I want the resulting hash in the output stream", but that
> allows your later caller to omit CSUM_CLOSE and then ask for
> HASH_IN_STREAM alone.
>
> Existing callers can expect that FSYNC alone means fsync and close,
> but your caller wants hashclose() to compute the hash, write the hash
> to the output stream, and fsync the output stream, and return
> without closing the output stream.  For that, you'd make FSYNC not
> to imply CLOSE, and you'd need to vet that all the existing callers
> that use FSYNC are OK with such a change.  And then the above would
> become
>
>          hashclose() {
>                  hashflush();
>                  the_hash_algo->final_fn();
>                  if (result)
>                          hashcpy(result, f->buffer);
>                  if (flags & CSUM_HASH_IN_STREAM)
>                          flush(f, f->buffer, the_hash_algo->rawsz);
>                  if (flags & CSUM_FSYNC)
>                          fsync_or_die();
>                  if (flags & CSUM_CLOSE) {
>                          if (close(f->fd))
>                                  die_errno();
>                          fd = 0;
>                  } else
>                          fd = f->fd;
>                  if (0 <= f->check_fd) {
>                          ...
>                  }
>                  free(f);
>                  return fd;
>          }
>
> Once we reach that state, the new caller in write_commit_graph()
> does not have to pass nonsensical CLOSE|KEEP_OPEN combination.
> Instead we can do
>
> 	hashclose(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
>
> or something like that, I would think, without having KEEP_OPEN.

My new solution works this way. The only caveat is that existing callers 
end up with this diff:

- hashclose(f, _, CSUM_FSYNC);
+ hashclose(f, _, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);

Perhaps this is acceptable, as it is very clear about what will happen. 
It is easy to recognize that in the existing implementation, CSUM_FSYNC 
implies CSUM_CLOSE which further implies CSUM_HASH_IN_STREAM.
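
The implication chain can be written down as a small translation from
the old flag meaning to the explicit new one. This is only a sketch for
checking call-site conversions: the OLD_* names, the numeric values and
upgrade_hashclose_flags() are hypothetical and are not part of the
patch below.

```c
#include <assert.h>

/* Old-style flags (values illustrative). */
#define OLD_CSUM_CLOSE	1
#define OLD_CSUM_FSYNC	2

/* New-style flags, where no flag implies another (values illustrative). */
#define CSUM_CLOSE		1
#define CSUM_FSYNC		2
#define CSUM_HASH_IN_STREAM	4

/*
 * Spell out what an old flag word meant in terms of the new flags:
 * either old flag wrote the trailing hash and closed the descriptor,
 * and the old FSYNC additionally synced before closing.
 */
static unsigned int upgrade_hashclose_flags(unsigned int old_flags)
{
	unsigned int new_flags = 0;

	if (old_flags & (OLD_CSUM_CLOSE | OLD_CSUM_FSYNC))
		new_flags |= CSUM_HASH_IN_STREAM | CSUM_CLOSE;
	if (old_flags & OLD_CSUM_FSYNC)
		new_flags |= CSUM_FSYNC;
	return new_flags;
}
```

A flag word of 0 stays 0, which matches callers that only want the
computed hash back and keep using the descriptor afterwards.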

> I am actually wondering if it is worth making CSUM_FSYNC not imply
> CSUM_CLOSE.  There aren't that many existing callers of hashclose()
> that use FSYNC, so vetting all of them and replacing their FSYNC
> with (FSYNC|CLOSE) is not all that difficult, but if this new caller
> is an oddball then another strategy may be to do the fsync_or_die()
> on the caller side, something like:
>
>                  hashclose(f, NULL, CSUM_HASH_IN_STREAM);
>          +       fsync_or_die(fd, get_lock_file_path(&lk));
>                  commit_lock_file(&lk);
>
> And then we can keep the "FSYNC means fsync and then close" behavior
> that the current set of callers rely on.  I dunno if that is a major issue,
> but I do think "close this, or no, keep it open" is far worse than
> "do we want the resulting hash in the stream?"

I'm not happy with this solution of needing an extra call like this 
in-between, especially since hashclose() knows how to FSYNC.

> An alternative design of the above is without making
> CSUM_HASH_IN_STREAM a new flag bit.  I highly suspect that the
> calling codepath _knows_ whether the resulting final hash will be
> written out at the end of the stream or not when it wraps an fd with
> a hashfile structure, so "struct hashfile" could gain a bit to tell
> hashclose() whether the resulting hash needs to be written (or not).
> That would be a bit larger change than what I outlined above, and I
> do not know if it is worth doing, though.

This certainly seems trickier to get right, but if we think it is the 
right solution I'll spend the time pairing struct creations with stream 
closings. At the moment, I'm not sure what to do with the following 
snippet from builtin/pack-objects.c that uses different flags (including 
0) depending on the situation:

         /*
          * Did we write the wrong # entries in the header?
          * If so, rewrite it like in fast-import
          */
         if (pack_to_stdout) {
             hashclose(f, oid.hash, CSUM_CLOSE);
         } else if (nr_written == nr_remaining) {
             hashclose(f, oid.hash, CSUM_FSYNC);
         } else {
             int fd = hashclose(f, oid.hash, 0);

It may be solvable if I dig a bit deeper than I have so far.

Below is my attempt at making the proposed change concrete, including 
adding flags to all existing callers to preserve behavior.

-- >8 --

 From 976ff3902f8a5a1b0132a4032a4000bb330737f7 Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Mon, 26 Feb 2018 14:45:44 -0500
Subject: [PATCH] csum-file: refactor hashclose() method

If we want to use a hashfile on the temporary file for a lockfile, then
we need hashclose() to fully write the trailing hash but also keep the
file descriptor open.

Do this by adding a new CSUM_HASH_IN_STREAM flag along with a functional
change that checks this flag before writing the checksum to the stream.
This differs from the previous behavior, in which the checksum was
written whenever CSUM_CLOSE or CSUM_FSYNC was provided.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
  builtin/pack-objects.c | 4 ++--
  bulk-checkin.c         | 2 +-
  csum-file.c            | 8 ++++----
  csum-file.h            | 5 +++--
  pack-bitmap-write.c    | 2 +-
  pack-write.c           | 5 +++--
  6 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a197926eaa..530ddd0677 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -837,9 +837,9 @@ static void write_pack_file(void)
                  * If so, rewrite it like in fast-import
                  */
                 if (pack_to_stdout) {
-                       hashclose(f, oid.hash, CSUM_CLOSE);
+                       hashclose(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_CLOSE);
                 } else if (nr_written == nr_remaining) {
-                       hashclose(f, oid.hash, CSUM_FSYNC);
+                       hashclose(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
                 } else {
                         int fd = hashclose(f, oid.hash, 0);
                         fixup_pack_header_footer(fd, oid.hash, pack_tmp_name,
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 9d87eac07b..8108bacc79 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -35,7 +35,7 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
                 unlink(state->pack_tmp_name);
                 goto clear_exit;
         } else if (state->nr_written == 1) {
-               hashclose(state->f, oid.hash, CSUM_FSYNC);
+               hashclose(state->f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
         } else {
                 int fd = hashclose(state->f, oid.hash, 0);
                 fixup_pack_header_footer(fd, oid.hash, state->pack_tmp_name,
diff --git a/csum-file.c b/csum-file.c
index 5eda7fb6af..735cd2d3d5 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -61,11 +61,11 @@ int hashclose(struct hashfile *f, unsigned char *result, unsigned int flags)
         the_hash_algo->final_fn(f->buffer, &f->ctx);
         if (result)
                 hashcpy(result, f->buffer);
-       if (flags & (CSUM_CLOSE | CSUM_FSYNC)) {
-               /* write checksum and close fd */
+       if (flags & CSUM_HASH_IN_STREAM)
                 flush(f, f->buffer, the_hash_algo->rawsz);
-               if (flags & CSUM_FSYNC)
-                       fsync_or_die(f->fd, f->name);
+       if (flags & CSUM_FSYNC)
+               fsync_or_die(f->fd, f->name);
+       if (flags & CSUM_CLOSE) {
                 if (close(f->fd))
                         die_errno("%s: sha1 file error on close", f->name);
                 fd = 0;
diff --git a/csum-file.h b/csum-file.h
index 992e5c0141..a5790ca266 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -27,8 +27,9 @@ extern void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *)
  extern int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *);

  /* hashclose flags */
-#define CSUM_CLOSE     1
-#define CSUM_FSYNC     2
+#define CSUM_CLOSE             1
+#define CSUM_FSYNC             2
+#define CSUM_HASH_IN_STREAM    4

  extern struct hashfile *hashfd(int fd, const char *name);
  extern struct hashfile *hashfd_check(const char *name);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index e01f992884..471481f461 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -535,7 +535,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
         if (options & BITMAP_OPT_HASH_CACHE)
                 write_hash_cache(f, index, index_nr);

-       hashclose(f, NULL, CSUM_FSYNC);
+       hashclose(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);

         if (adjust_shared_perm(tmp_file.buf))
                 die_errno("unable to make temporary bitmap file readable");
diff --git a/pack-write.c b/pack-write.c
index d775c7406d..f72df5a836 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -170,8 +170,9 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
         }

         hashwrite(f, sha1, the_hash_algo->rawsz);
-       hashclose(f, NULL, ((opts->flags & WRITE_IDX_VERIFY)
-                           ? CSUM_CLOSE : CSUM_FSYNC));
+       hashclose(f, NULL, CSUM_HASH_IN_STREAM | CSUM_CLOSE |
+                          ((opts->flags & WRITE_IDX_VERIFY)
+                          ? 0 : CSUM_FSYNC));
         return index_name;
  }

--
2.14.1



* Re: [PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag
  2018-03-14  2:26       ` Derrick Stolee
@ 2018-03-14 17:00         ` Junio C Hamano
  0 siblings, 0 replies; 110+ messages in thread
From: Junio C Hamano @ 2018-03-14 17:00 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, peff, git, jonathantanmy, sbeller, szeder.dev, ramsay,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

>> 	close_commit_graph();
>>
>> And after writing all data out (oh by the way, why aren't we passing
>> commit_graph instance around and instead relying on a file-scope
>> static global?)...
>
> Yeah, we should remove the global dependence. Is this a blocker for
> the series?

I do not think it is such a big deal.  It was just that I found it a
bit curious while reading it through, knowing that you are already
familiar with the work being done in that "the_repository" area.

>> I _think_ the word "close" in the name hashclose() is about closing
>> the (virtual) stream for the hashing that is overlayed on top of the
>> underlying file descriptor, and being able to choose between closing
>> and not closing the underlying file descriptor when "closing" the
>> hashing layer sort of makes sense.  So I won't complain too much
>> about hashclose() that takes optional CSUM_CLOSE flag.
>
> I agree this "close" word is incorrect. We really want
> "finalize_hashfile()" which may include closing the file.

Yeah, that is much better.  I do not think I'd mind seeing a prelim
step at the very beginning of the series to just rename the function
before the series starts to change anything else (there aren't that
many callers and I do not think there is any topic in flight that
changes these existing callsites).  Or we can leave it for clean-up
after the dust settles.  Either is fine as long as we know that we
eventually get there.

> My new solution works this way. The only caveat is that existing
> callers end up with this diff:
>
> - hashclose(f, _, CSUM_FSYNC);
> + hashclose(f, _, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);

I think I am fine with that.  It feels a bit nonsensical for a
caller to ask fsync when it is not asking fd to be closed, as I'd
imagine that the typical reason why the caller wants the fd left
open is because the caller still wants to do something to it
(e.g. write some more things into it) and a caller who would care
about fsync would want to do so _after_ finishing its own writing,
but that may be just me.

>> And then we can keep the "FSYNC means fsync and then close" the
>> current set of callers rely on.  I dunno if that is a major issue,
>> but I do think "close this, or no, keep it open" is far worse than
>> "do we want the resulting hash in the stream?"
>
> I'm not happy with this solution of needing an extra call like this
> in-between, especially since hashclose() knows how to FSYNC.

I guess we are repeating the same as above ;-)  As I said, I do not
care too deeply either way.

>> An alternative design of the above is without making
>> CSUM_HASH_IN_STREAM a new flag bit.  I highly suspect that the
>> calling codepath _knows_ whether the resulting final hash will be
>> written out at the end of the stream or not when it wraps an fd with
>> a hashfile structure, so "struct hashfile" could gain a bit to tell
>> hashclose() whether the resulting hash needs to be written (or not).
>> That would be a bit larger change than what I outlined above, and I
>> do not know if it is worth doing, though.
>
> This certainly seems trickier to get right, but if we think it is the
> right solution I'll spend the time pairing struct creations with
> stream closings.

I still do not think of a compelling reason why such an alternative
approach would be worth taking, and do prefer the approach to let
the caller choose when finalize function is called via a flag bit.

Thanks.



* [PATCH v6 00/14] Serialized Git Commit Graph
  2018-02-27  2:32 [PATCH v5 00/13] Serialized Git Commit Graph Derrick Stolee
                   ` (13 preceding siblings ...)
  2018-02-27 18:50 ` [PATCH v5 00/13] Serialized Git Commit Graph Stefan Beller
@ 2018-03-14 19:27 ` Derrick Stolee
  2018-03-14 19:27   ` [PATCH v6 01/14] csum-file: rename hashclose() to finalize_hashfile() Derrick Stolee
                     ` (17 more replies)
  14 siblings, 18 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-03-14 19:27 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, sbeller, szeder.dev, ramsay, git, Derrick Stolee

This v6 includes feedback around csum-file.c and the rename of hashclose()
to finalize_hashfile(). These are the first two commits of the series, so
they could be pulled out independently.

The only other change since v5 is that I re-ran the performance numbers
in "commit: integrate commit graph with commit parsing".

Hopefully this version is ready to merge. I have several follow-up topics
in mind to submit soon after, including:

* Auto-generate the commit graph as the repo changes:
   i. teach git-commit-graph an "fsck" subcommand and integrate with git-fsck
  ii. teach git-repack to call git-commit-graph
* Generation numbers:
   i. teach git-commit-graph to compute generation numbers
  ii. consume generation numbers in paint_down_to_common()
* Move globals from commit-graph.c to the_repository

The three bullets (*) are relatively independent but have sub-items that
appear in priority order.

Derrick Stolee (14):
  csum-file: rename hashclose() to finalize_hashfile()
  csum-file: refactor finalize_hashfile() method
  commit-graph: add format document
  graph: add commit graph design document
  commit-graph: create git-commit-graph builtin
  commit-graph: implement write_commit_graph()
  commit-graph: implement 'git-commit-graph write'
  commit-graph: implement git commit-graph read
  commit-graph: add core.commitGraph setting
  commit-graph: close under reachability
  commit: integrate commit graph with commit parsing
  commit-graph: read only from specific pack-indexes
  commit-graph: build graph from starting commits
  commit-graph: implement "--additive" option

 .gitignore                                      |   1 +
 Documentation/config.txt                        |   3 +
 Documentation/git-commit-graph.txt              |  93 +++
 Documentation/technical/commit-graph-format.txt |  98 ++++
 Documentation/technical/commit-graph.txt        | 164 ++++++
 Makefile                                        |   2 +
 alloc.c                                         |   1 +
 builtin.h                                       |   1 +
 builtin/commit-graph.c                          | 172 ++++++
 builtin/index-pack.c                            |   2 +-
 builtin/pack-objects.c                          |   6 +-
 bulk-checkin.c                                  |   4 +-
 cache.h                                         |   1 +
 command-list.txt                                |   1 +
 commit-graph.c                                  | 719 ++++++++++++++++++++++++
 commit-graph.h                                  |  47 ++
 commit.c                                        |   3 +
 commit.h                                        |   3 +
 config.c                                        |   5 +
 contrib/completion/git-completion.bash          |   2 +
 csum-file.c                                     |  10 +-
 csum-file.h                                     |   9 +-
 environment.c                                   |   1 +
 fast-import.c                                   |   2 +-
 git.c                                           |   1 +
 pack-bitmap-write.c                             |   2 +-
 pack-write.c                                    |   5 +-
 packfile.c                                      |   4 +-
 packfile.h                                      |   2 +
 t/t5318-commit-graph.sh                         | 225 ++++++++
 30 files changed, 1568 insertions(+), 21 deletions(-)
 create mode 100644 Documentation/git-commit-graph.txt
 create mode 100644 Documentation/technical/commit-graph-format.txt
 create mode 100644 Documentation/technical/commit-graph.txt
 create mode 100644 builtin/commit-graph.c
 create mode 100644 commit-graph.c
 create mode 100644 commit-graph.h
 create mode 100755 t/t5318-commit-graph.sh


base-commit: d0db9edba0050ada6f6eac68061599690d2a4333
-- 
2.14.1



* [PATCH v6 01/14] csum-file: rename hashclose() to finalize_hashfile()
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
@ 2018-03-14 19:27   ` Derrick Stolee
  2018-03-14 19:27   ` [PATCH v6 02/14] csum-file: refactor finalize_hashfile() method Derrick Stolee
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-03-14 19:27 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, sbeller, szeder.dev, ramsay, git, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The hashclose() method behaves very differently depending on the flags
parameter. In particular, the file descriptor is not always closed.

Perform a simple rename of "hashclose()" to "finalize_hashfile()" in
preparation for functional changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/index-pack.c   | 2 +-
 builtin/pack-objects.c | 6 +++---
 bulk-checkin.c         | 4 ++--
 csum-file.c            | 2 +-
 csum-file.h            | 4 ++--
 fast-import.c          | 2 +-
 pack-bitmap-write.c    | 2 +-
 pack-write.c           | 4 ++--
 8 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 59878e70b8..157bceb264 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1269,7 +1269,7 @@ static void conclude_pack(int fix_thin_pack, const char *curr_pack, unsigned cha
 			    nr_objects - nr_objects_initial);
 		stop_progress_msg(&progress, msg.buf);
 		strbuf_release(&msg);
-		hashclose(f, tail_hash, 0);
+		finalize_hashfile(f, tail_hash, 0);
 		hashcpy(read_hash, pack_hash);
 		fixup_pack_header_footer(output_fd, pack_hash,
 					 curr_pack, nr_objects,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a197926eaa..84e9f57b7f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -837,11 +837,11 @@ static void write_pack_file(void)
 		 * If so, rewrite it like in fast-import
 		 */
 		if (pack_to_stdout) {
-			hashclose(f, oid.hash, CSUM_CLOSE);
+			finalize_hashfile(f, oid.hash, CSUM_CLOSE);
 		} else if (nr_written == nr_remaining) {
-			hashclose(f, oid.hash, CSUM_FSYNC);
+			finalize_hashfile(f, oid.hash, CSUM_FSYNC);
 		} else {
-			int fd = hashclose(f, oid.hash, 0);
+			int fd = finalize_hashfile(f, oid.hash, 0);
 			fixup_pack_header_footer(fd, oid.hash, pack_tmp_name,
 						 nr_written, oid.hash, offset);
 			close(fd);
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 9d87eac07b..227cc9f3b1 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -35,9 +35,9 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
 		unlink(state->pack_tmp_name);
 		goto clear_exit;
 	} else if (state->nr_written == 1) {
-		hashclose(state->f, oid.hash, CSUM_FSYNC);
+		finalize_hashfile(state->f, oid.hash, CSUM_FSYNC);
 	} else {
-		int fd = hashclose(state->f, oid.hash, 0);
+		int fd = finalize_hashfile(state->f, oid.hash, 0);
 		fixup_pack_header_footer(fd, oid.hash, state->pack_tmp_name,
 					 state->nr_written, oid.hash,
 					 state->offset);
diff --git a/csum-file.c b/csum-file.c
index 5eda7fb6af..e6c95a6915 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -53,7 +53,7 @@ void hashflush(struct hashfile *f)
 	}
 }
 
-int hashclose(struct hashfile *f, unsigned char *result, unsigned int flags)
+int finalize_hashfile(struct hashfile *f, unsigned char *result, unsigned int flags)
 {
 	int fd;
 
diff --git a/csum-file.h b/csum-file.h
index 992e5c0141..9ba87f0a6c 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -26,14 +26,14 @@ struct hashfile_checkpoint {
 extern void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *);
 extern int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *);
 
-/* hashclose flags */
+/* finalize_hashfile flags */
 #define CSUM_CLOSE	1
 #define CSUM_FSYNC	2
 
 extern struct hashfile *hashfd(int fd, const char *name);
 extern struct hashfile *hashfd_check(const char *name);
 extern struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp);
-extern int hashclose(struct hashfile *, unsigned char *, unsigned int);
+extern int finalize_hashfile(struct hashfile *, unsigned char *, unsigned int);
 extern void hashwrite(struct hashfile *, const void *, unsigned int);
 extern void hashflush(struct hashfile *f);
 extern void crc32_begin(struct hashfile *);
diff --git a/fast-import.c b/fast-import.c
index 58ef360da4..2e5d17318d 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1016,7 +1016,7 @@ static void end_packfile(void)
 		struct tag *t;
 
 		close_pack_windows(pack_data);
-		hashclose(pack_file, cur_pack_oid.hash, 0);
+		finalize_hashfile(pack_file, cur_pack_oid.hash, 0);
 		fixup_pack_header_footer(pack_data->pack_fd, pack_data->sha1,
 				    pack_data->pack_name, object_count,
 				    cur_pack_oid.hash, pack_size);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index e01f992884..662b44f97d 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -535,7 +535,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	if (options & BITMAP_OPT_HASH_CACHE)
 		write_hash_cache(f, index, index_nr);
 
-	hashclose(f, NULL, CSUM_FSYNC);
+	finalize_hashfile(f, NULL, CSUM_FSYNC);
 
 	if (adjust_shared_perm(tmp_file.buf))
 		die_errno("unable to make temporary bitmap file readable");
diff --git a/pack-write.c b/pack-write.c
index d775c7406d..044f427392 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -170,8 +170,8 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 	}
 
 	hashwrite(f, sha1, the_hash_algo->rawsz);
-	hashclose(f, NULL, ((opts->flags & WRITE_IDX_VERIFY)
-			    ? CSUM_CLOSE : CSUM_FSYNC));
+	finalize_hashfile(f, NULL, ((opts->flags & WRITE_IDX_VERIFY)
+				    ? CSUM_CLOSE : CSUM_FSYNC));
 	return index_name;
 }
 
-- 
2.14.1



* [PATCH v6 02/14] csum-file: refactor finalize_hashfile() method
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
  2018-03-14 19:27   ` [PATCH v6 01/14] csum-file: rename hashclose() to finalize_hashfile() Derrick Stolee
@ 2018-03-14 19:27   ` Derrick Stolee
  2018-03-14 19:27   ` [PATCH v6 03/14] commit-graph: add format document Derrick Stolee
                     ` (15 subsequent siblings)
  17 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-03-14 19:27 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, sbeller, szeder.dev, ramsay, git, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

If we want to use a hashfile on the temporary file for a lockfile, then
we need finalize_hashfile() to fully write the trailing hash but also keep
the file descriptor open.

Do this by adding a new CSUM_HASH_IN_STREAM flag along with a functional
change that checks this flag before writing the checksum to the stream.
This differs from the previous behavior, in which the checksum was
written whenever CSUM_CLOSE or CSUM_FSYNC was provided.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/pack-objects.c | 4 ++--
 bulk-checkin.c         | 2 +-
 csum-file.c            | 8 ++++----
 csum-file.h            | 5 +++--
 pack-bitmap-write.c    | 2 +-
 pack-write.c           | 5 +++--
 6 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 84e9f57b7f..2b15afd932 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -837,9 +837,9 @@ static void write_pack_file(void)
 		 * If so, rewrite it like in fast-import
 		 */
 		if (pack_to_stdout) {
-			finalize_hashfile(f, oid.hash, CSUM_CLOSE);
+			finalize_hashfile(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_CLOSE);
 		} else if (nr_written == nr_remaining) {
-			finalize_hashfile(f, oid.hash, CSUM_FSYNC);
+			finalize_hashfile(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
 		} else {
 			int fd = finalize_hashfile(f, oid.hash, 0);
 			fixup_pack_header_footer(fd, oid.hash, pack_tmp_name,
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 227cc9f3b1..70b14fdf41 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -35,7 +35,7 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
 		unlink(state->pack_tmp_name);
 		goto clear_exit;
 	} else if (state->nr_written == 1) {
-		finalize_hashfile(state->f, oid.hash, CSUM_FSYNC);
+		finalize_hashfile(state->f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
 	} else {
 		int fd = finalize_hashfile(state->f, oid.hash, 0);
 		fixup_pack_header_footer(fd, oid.hash, state->pack_tmp_name,
diff --git a/csum-file.c b/csum-file.c
index e6c95a6915..53ce37f7ca 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -61,11 +61,11 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result, unsigned int fl
 	the_hash_algo->final_fn(f->buffer, &f->ctx);
 	if (result)
 		hashcpy(result, f->buffer);
-	if (flags & (CSUM_CLOSE | CSUM_FSYNC)) {
-		/* write checksum and close fd */
+	if (flags & CSUM_HASH_IN_STREAM)
 		flush(f, f->buffer, the_hash_algo->rawsz);
-		if (flags & CSUM_FSYNC)
-			fsync_or_die(f->fd, f->name);
+	if (flags & CSUM_FSYNC)
+		fsync_or_die(f->fd, f->name);
+	if (flags & CSUM_CLOSE) {
 		if (close(f->fd))
 			die_errno("%s: sha1 file error on close", f->name);
 		fd = 0;
diff --git a/csum-file.h b/csum-file.h
index 9ba87f0a6c..c5a2e335e7 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -27,8 +27,9 @@ extern void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *)
 extern int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *);
 
 /* finalize_hashfile flags */
-#define CSUM_CLOSE	1
-#define CSUM_FSYNC	2
+#define CSUM_CLOSE		1
+#define CSUM_FSYNC		2
+#define CSUM_HASH_IN_STREAM	4
 
 extern struct hashfile *hashfd(int fd, const char *name);
 extern struct hashfile *hashfd_check(const char *name);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 662b44f97d..db4c832428 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -535,7 +535,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	if (options & BITMAP_OPT_HASH_CACHE)
 		write_hash_cache(f, index, index_nr);
 
-	finalize_hashfile(f, NULL, CSUM_FSYNC);
+	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
 
 	if (adjust_shared_perm(tmp_file.buf))
 		die_errno("unable to make temporary bitmap file readable");
diff --git a/pack-write.c b/pack-write.c
index 044f427392..a9d46bc03f 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -170,8 +170,9 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 	}
 
 	hashwrite(f, sha1, the_hash_algo->rawsz);
-	finalize_hashfile(f, NULL, ((opts->flags & WRITE_IDX_VERIFY)
-				    ? CSUM_CLOSE : CSUM_FSYNC));
+	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_CLOSE |
+				    ((opts->flags & WRITE_IDX_VERIFY)
+				    ? 0 : CSUM_FSYNC));
 	return index_name;
 }
 
-- 
2.14.1
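
As a reading aid for the csum-file.c hunk above, here is a minimal
sketch that models which steps finalize_hashfile() takes for a given
flag set, before and after this patch. It does not touch any file
descriptor; the DID_* names and the old_steps()/new_steps() helpers are
hypothetical and exist only for this illustration.

```c
/* Flag values as defined in csum-file.h by this patch. */
#define CSUM_CLOSE		1
#define CSUM_FSYNC		2
#define CSUM_HASH_IN_STREAM	4

/* Hypothetical names for the three side effects, used only in this sketch. */
#define DID_WRITE_HASH	1
#define DID_FSYNC	2
#define DID_CLOSE	4

/* Before the patch: either flag implied "write the hash and close the fd". */
static unsigned int old_steps(unsigned int flags)
{
	unsigned int done = 0;

	if (flags & (CSUM_CLOSE | CSUM_FSYNC)) {
		done |= DID_WRITE_HASH;
		if (flags & CSUM_FSYNC)
			done |= DID_FSYNC;
		done |= DID_CLOSE;
	}
	return done;
}

/* After the patch: each flag controls exactly one step. */
static unsigned int new_steps(unsigned int flags)
{
	unsigned int done = 0;

	if (flags & CSUM_HASH_IN_STREAM)
		done |= DID_WRITE_HASH;
	if (flags & CSUM_FSYNC)
		done |= DID_FSYNC;
	if (flags & CSUM_CLOSE)
		done |= DID_CLOSE;
	return done;
}
```

Under this model, old_steps(CSUM_FSYNC) equals
new_steps(CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE), which is why
the existing callers are updated the way they are. A lockfile-based
writer could presumably pass CSUM_HASH_IN_STREAM | CSUM_FSYNC alone, so
the trailing hash is written and synced while the descriptor stays open.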



* [PATCH v6 03/14] commit-graph: add format document
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
  2018-03-14 19:27   ` [PATCH v6 01/14] csum-file: rename hashclose() to finalize_hashfile() Derrick Stolee
  2018-03-14 19:27   ` [PATCH v6 02/14] csum-file: refactor finalize_hashfile() method Derrick Stolee
@ 2018-03-14 19:27   ` Derrick Stolee
  2018-03-14 19:27   ` [PATCH v6 04/14] graph: add commit graph design document Derrick Stolee
                     ` (14 subsequent siblings)
  17 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-03-14 19:27 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, sbeller, szeder.dev, ramsay, git, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Add document specifying the binary format for commit graphs. This
format allows for:

* New versions.
* New hash functions and hash lengths.
* Optional extensions.

Basic header information is followed by a binary table of contents
into "chunks" that include:

* An ordered list of commit object IDs.
* A 256-entry fanout into that list of OIDs.
* A list of metadata for the commits.
* A list of "large edges" to enable octopus merges.

The format automatically includes two parent positions for every
commit. This favors speed over space, since using only one position
per commit would cause an extra level of indirection for every merge
commit. (Octopus merges suffer from this indirection, but they are
very rare.)

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/commit-graph-format.txt | 98 +++++++++++++++++++++++++
 1 file changed, 98 insertions(+)
 create mode 100644 Documentation/technical/commit-graph-format.txt

diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
new file mode 100644
index 0000000000..4402baa131
--- /dev/null
+++ b/Documentation/technical/commit-graph-format.txt
@@ -0,0 +1,98 @@
+Git commit graph format
+=======================
+
+The Git commit graph stores a list of commit OIDs and some associated
+metadata, including:
+
+- The generation number of the commit. Commits with no parents have
+  generation number 1; commits with parents have generation number
+  one more than the maximum generation number of their parents. We
+  reserve zero as special; it can be used to mark a generation
+  number invalid or as "not computed".
+
+- The root tree OID.
+
+- The commit date.
+
+- The parents of the commit, stored using positional references within
+  the graph file.
+
+These positional references are stored as 32-bit integers corresponding to
+the array position within the list of commit OIDs. We use the most-significant
+bit for special purposes, so we can store at most (1 << 31) - 1 (around 2
+billion) commits.
+
+== Commit graph files have the following format:
+
+In order to allow extensions that add extra data to the graph, we organize
+the body into "chunks" and provide a binary lookup table at the beginning
+of the body. The header includes certain values, such as number of chunks
+and hash type.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+  4-byte signature:
+      The signature is: {'C', 'G', 'P', 'H'}
+
+  1-byte version number:
+      Currently, the only valid version is 1.
+
+  1-byte Hash Version (1 = SHA-1)
+      We infer the hash length (H) from this value.
+
+  1-byte number (C) of "chunks"
+
+  1-byte (reserved for later use)
+     Current clients should ignore this value.
+
+CHUNK LOOKUP:
+
+  (C + 1) * 12 bytes listing the table of contents for the chunks:
+      First 4 bytes describe the chunk id. Value 0 is a terminating label.
+      Other 8 bytes provide the byte-offset in current file for chunk to
+      start. (Chunks are ordered contiguously in the file, so you can infer
+      the length using the next chunk position if necessary.) Each chunk
+      type appears at most once.
+
+  The remaining data in the body is described one chunk at a time, and
+  these chunks may be given in any order. Chunks are required unless
+  otherwise specified.
+
+CHUNK DATA:
+
+  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
+      The ith entry, F[i], stores the number of OIDs with first
+      byte at most i. Thus F[255] stores the total
+      number of commits (N).
+
+  OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
+      The OIDs for all commits in the graph, sorted in ascending order.
+
+  Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
+    * The first H bytes are for the OID of the root tree.
+    * The next 8 bytes are for the positions of the first two parents
+      of the ith commit. Stores value 0xffffffff if no parent in that
+      position. If there are more than two parents, the second value
+      has its most-significant bit on and the other bits store an array
+      position into the Large Edge List chunk.
+    * The next 8 bytes store the generation number of the commit and
+      the commit time in seconds since EPOCH. The generation number
+      uses the higher 30 bits of the first 4 bytes, while the commit
+      time uses the 32 bits of the second 4 bytes, along with the lowest
+      2 bits of the first 4 bytes, which store the 33rd and 34th bits of
+      the commit time.
+
+  Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
+      This list of 4-byte values stores the second through nth parents for
+      all octopus merges. The second parent value in the commit data stores
+      an array position within this list along with the most-significant bit
+      on. Starting at that array position, iterate through this list of commit
+      positions for the parents until reaching a value with the most-significant
+      bit on. The other bits correspond to the position of the last parent.
+
+TRAILER:
+
+	H-byte HASH-checksum of all of the above.
+
-- 
2.14.1
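
To make the Commit Data layout above concrete, here is a minimal sketch
of decoding one (H + 16)-byte entry. It assumes that the two extra
commit-time bits are the lowest 2 bits of the 4-byte word holding the
generation number, which is one reading of the format text. The struct
and the helper names (commit_entry, read_be32, decode_commit_entry) are
hypothetical and not taken from the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one Commit Data entry. */
struct commit_entry {
	const unsigned char *tree_oid;	/* points at the first H bytes */
	uint32_t parent1;		/* 0xffffffff means "no parent" */
	uint32_t parent2;		/* high bit set: index into Large Edge List */
	uint32_t generation;		/* 0 means invalid or "not computed" */
	uint64_t commit_time;		/* 34-bit seconds since the epoch */
};

/* Read a 4-byte number stored in network (big-endian) order. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the entry at 'entry', where hash_len is H from the header. */
static void decode_commit_entry(const unsigned char *entry, size_t hash_len,
				struct commit_entry *out)
{
	/* Upper 30 bits: generation. Lower 2 bits: bits 33-34 of the time. */
	uint32_t gen_and_time_high = read_be32(entry + hash_len + 8);
	uint32_t time_low = read_be32(entry + hash_len + 12);

	out->tree_oid = entry;
	out->parent1 = read_be32(entry + hash_len);
	out->parent2 = read_be32(entry + hash_len + 4);
	out->generation = gen_and_time_high >> 2;
	out->commit_time = ((uint64_t)(gen_and_time_high & 0x3) << 32) | time_low;
}
```

For example, with H = 20 and a third word of 0x0000001d, this sketch
yields generation 7 and sets bit 33 of the commit time.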



* [PATCH v6 04/14] graph: add commit graph design document
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
                     ` (2 preceding siblings ...)
  2018-03-14 19:27   ` [PATCH v6 03/14] commit-graph: add format document Derrick Stolee
@ 2018-03-14 19:27   ` Derrick Stolee
  2018-03-14 19:27   ` [PATCH v6 05/14] commit-graph: create git-commit-graph builtin Derrick Stolee
                     ` (13 subsequent siblings)
  17 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-03-14 19:27 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, sbeller, szeder.dev, ramsay, git, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Add Documentation/technical/commit-graph.txt with details of the planned
commit graph feature, including future plans.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/commit-graph.txt | 164 +++++++++++++++++++++++++++++++
 1 file changed, 164 insertions(+)
 create mode 100644 Documentation/technical/commit-graph.txt

diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
new file mode 100644
index 0000000000..d11753ac6f
--- /dev/null
+++ b/Documentation/technical/commit-graph.txt
@@ -0,0 +1,164 @@
+Git Commit Graph Design Notes
+=============================
+
+Git walks the commit graph for many reasons, including:
+
+1. Listing and filtering commit history.
+2. Computing merge bases.
+
+These operations can become slow as the commit count grows. The merge
+base calculation shows up in many user-facing commands, such as 'merge-base'
+or 'status' and can take minutes to compute depending on history shape.
+
+There are two main costs here:
+
+1. Decompressing and parsing commits.
+2. Walking the entire graph to satisfy topological order constraints.
+
+The commit graph file is a supplemental data structure that accelerates
+commit graph walks. If a user downgrades Git or disables the
+'core.commitGraph' config setting, then the existing ODB is sufficient. The
+file is stored as "commit-graph" either in the .git/objects/info directory
+or in the info directory of an alternate.
+
+The commit graph file stores the commit graph structure along with some
+extra metadata to speed up graph walks. By listing commit OIDs in
+lexicographic order, we can identify an integer position for each commit and
+refer to the parents of a commit using those integer positions. We use
+binary search to find initial commits and then use the integer positions
+for fast lookups during the walk.
+
+A consumer may load the following info for a commit from the graph:
+
+1. The commit OID.
+2. The list of parents, along with their integer positions.
+3. The commit date.
+4. The root tree OID.
+5. The generation number (see definition below).
+
+Values 1-4 satisfy the requirements of parse_commit_gently().
+
+Define the "generation number" of a commit recursively as follows:
+
+ * A commit with no parents (a root commit) has generation number one.
+
+ * A commit with at least one parent has generation number one more than
+   the largest generation number among its parents.
+
+Equivalently, the generation number of a commit A is one more than the
+length of a longest path from A to a root commit. The recursive definition
+is easier to use for computation and for observing the following property:
+
+    If A and B are commits with generation numbers N and M, respectively,
+    and N <= M, then A cannot reach B. That is, we know without searching
+    that B is not an ancestor of A because it is further from a root commit
+    than A.
+
+    Conversely, when checking if A is an ancestor of B, then we only need
+    to walk commits until all commits on the walk boundary have generation
+    number at most N. If we walk commits using a priority queue seeded by
+    generation numbers, then we always expand the boundary commit with highest
+    generation number and can easily detect the stopping condition.
+
+This property can be used to significantly reduce the time it takes to
+walk commits and determine topological relationships. Without generation
+numbers, the general heuristic is the following:
+
+    If A and B are commits with commit time X and Y, respectively, and
+    X < Y, then A _probably_ cannot reach B.
+
+This heuristic is currently used whenever the computation is allowed to
+violate topological relationships due to clock skew (such as "git log"
+with default order), but is not used when the topological order is
+required (such as merge base calculations, "git log --graph").
+
+In practice, we expect some commits to be created recently and not stored
+in the commit graph. We can treat these commits as having "infinite"
+generation number and walk until reaching commits with known generation
+number.
+
+Design Details
+--------------
+
+- The commit graph file is stored in a file named 'commit-graph' in the
+  .git/objects/info directory. This could be stored in the info directory
+  of an alternate.
+
+- The core.commitGraph config setting must be on to consume graph files.
+
+- The file format includes parameters for the object ID hash function,
+  so a future change of hash algorithm does not require a change in format.
+
+Future Work
+-----------
+
+- The commit graph feature currently does not honor commit grafts. This can
+  be remedied by duplicating or refactoring the current graft logic.
+
+- The 'commit-graph' subcommand does not have a "verify" mode that is
+  necessary for integration with fsck.
+
+- The file format includes room for precomputed generation numbers. These
+  are not currently computed, so all generation numbers will be marked as
+  0 (or "uncomputed"). A later patch will include this calculation.
+
+- After computing and storing generation numbers, we must make graph
+  walks aware of generation numbers to gain the performance benefits they
+  enable. This will mostly be accomplished by swapping a commit-date-ordered
+  priority queue with one ordered by generation number. The following
+  operations are important candidates:
+
+    - paint_down_to_common()
+    - 'log --topo-order'
+
+- Currently, parse_commit_gently() requires filling in the root tree
+  object for a commit. This passes through lookup_tree() and consequently
+  lookup_object(). Also, it calls lookup_commit() when loading the parents.
+  These method calls check the ODB for object existence, even if the
+  consumer does not need the content. For example, we do not need the
+  tree contents when computing merge bases. Now that commit parsing is
+  removed from the computation time, these lookup operations are the
+  slowest operations keeping graph walks from being fast. Consider
+  loading these objects without verifying their existence in the ODB and
+  only loading them fully when consumers need them. Consider a method
+  such as "ensure_tree_loaded(commit)" that fully loads a tree before
+  using commit->tree.
+
+- The current design uses the 'commit-graph' subcommand to generate the graph.
+  When this feature stabilizes enough to recommend to most users, we should
+  add automatic graph writes to common operations that create many commits.
+  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
+  commands.
+
+- A server could provide a commit graph file as part of the network protocol
+  to avoid extra calculations by clients. This feature is only of benefit if
+  the user is willing to trust the file, because verifying the file is correct
+  is as hard as computing it from scratch.
+
+Related Links
+-------------
+[0] https://bugs.chromium.org/p/git/issues/detail?id=8
+    Chromium work item for: Serialized Commit Graph
+
+[1] https://public-inbox.org/git/20110713070517.GC18566@sigill.intra.peff.net/
+    An abandoned patch that introduced generation numbers.
+
+[2] https://public-inbox.org/git/20170908033403.q7e6dj7benasrjes@sigill.intra.peff.net/
+    Discussion about generation numbers on commits and how they interact
+    with fsck.
+
+[3] https://public-inbox.org/git/20170908034739.4op3w4f2ma5s65ku@sigill.intra.peff.net/
+    More discussion about generation numbers and not storing them inside
+    commit objects. A valuable quote:
+
+    "I think we should be moving more in the direction of keeping
+     repo-local caches for optimizations. Reachability bitmaps have been
+     a big performance win. I think we should be doing the same with our
+     properties of commits. Not just generation numbers, but making it
+     cheap to access the graph structure without zlib-inflating whole
+     commit objects (i.e., packv4 or something like the "metapacks" I
+     proposed a few years ago)."
+
+[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u
+    A patch to remove the ahead-behind calculation from 'status'.
+
-- 
2.14.1
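
Editorial sketch, not part of the patch: the recursive definition of
generation numbers in the design notes, and the pruning rule that follows
from it, might look like this on a toy in-memory graph. struct toy_commit,
compute_generation() and definitely_cannot_reach() are hypothetical names
and are not Git's data structures. The recursion is for illustration only;
a real history would likely need an iterative walk to avoid deep stacks.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PARENTS 4

/* Toy commit record for illustration; this is not Git's struct commit. */
struct toy_commit {
	int nr_parents;
	int parents[MAX_PARENTS]; /* indexes into the same array */
	uint32_t generation;      /* 0 means "not computed yet" */
};

/*
 * A root commit has generation 1; any other commit has one more than
 * the largest generation among its parents. Results are memoized in
 * the 'generation' field.
 */
uint32_t compute_generation(struct toy_commit *commits, int pos)
{
	struct toy_commit *c = &commits[pos];
	uint32_t max_parent = 0;
	int i;

	if (c->generation)
		return c->generation;
	assert(c->nr_parents >= 0 && c->nr_parents <= MAX_PARENTS);
	for (i = 0; i < c->nr_parents; i++) {
		uint32_t g = compute_generation(commits, c->parents[i]);
		if (g > max_parent)
			max_parent = g;
	}
	c->generation = max_parent + 1;
	return c->generation;
}

/*
 * Pruning rule for two distinct commits with computed generations:
 * if gen(A) <= gen(B), then A cannot reach B. A return value of 0
 * only means that a walk is still required.
 */
int definitely_cannot_reach(uint32_t gen_a, uint32_t gen_b)
{
	return gen_a <= gen_b;
}
```

The sketch does not model commits that are missing from the graph file,
which the notes treat as having "infinite" generation number.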



* [PATCH v6 05/14] commit-graph: create git-commit-graph builtin
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
                     ` (3 preceding siblings ...)
  2018-03-14 19:27   ` [PATCH v6 04/14] graph: add commit graph design document Derrick Stolee
@ 2018-03-14 19:27   ` Derrick Stolee
  2018-03-14 19:27   ` [PATCH v6 06/14] commit-graph: implement write_commit_graph() Derrick Stolee
                     ` (12 subsequent siblings)
  17 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-03-14 19:27 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, sbeller, szeder.dev, ramsay, git, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git the 'commit-graph' builtin that will be used for writing and
reading packed graph files. The current implementation is mostly
empty, except for an '--object-dir' option.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .gitignore                             |  1 +
 Documentation/git-commit-graph.txt     | 11 ++++++++++
 Makefile                               |  1 +
 builtin.h                              |  1 +
 builtin/commit-graph.c                 | 37 ++++++++++++++++++++++++++++++++++
 command-list.txt                       |  1 +
 contrib/completion/git-completion.bash |  2 ++
 git.c                                  |  1 +
 8 files changed, 55 insertions(+)
 create mode 100644 Documentation/git-commit-graph.txt
 create mode 100644 builtin/commit-graph.c

diff --git a/.gitignore b/.gitignore
index 833ef3b0b7..e82f90184d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -34,6 +34,7 @@
 /git-clone
 /git-column
 /git-commit
+/git-commit-graph
 /git-commit-tree
 /git-config
 /git-count-objects
diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
new file mode 100644
index 0000000000..5913340fad
--- /dev/null
+++ b/Documentation/git-commit-graph.txt
@@ -0,0 +1,11 @@
+git-commit-graph(1)
+===================
+
+NAME
+----
+git-commit-graph - Write and verify Git commit graph files
+
+GIT
+---
+Part of the linkgit:git[1] suite
+
diff --git a/Makefile b/Makefile
index de4b8f0c02..a928d4de66 100644
--- a/Makefile
+++ b/Makefile
@@ -946,6 +946,7 @@ BUILTIN_OBJS += builtin/clone.o
 BUILTIN_OBJS += builtin/column.o
 BUILTIN_OBJS += builtin/commit-tree.o
 BUILTIN_OBJS += builtin/commit.o
+BUILTIN_OBJS += builtin/commit-graph.o
 BUILTIN_OBJS += builtin/config.o
 BUILTIN_OBJS += builtin/count-objects.o
 BUILTIN_OBJS += builtin/credential.o
diff --git a/builtin.h b/builtin.h
index 42378f3aa4..079855b6d4 100644
--- a/builtin.h
+++ b/builtin.h
@@ -149,6 +149,7 @@ extern int cmd_clone(int argc, const char **argv, const char *prefix);
 extern int cmd_clean(int argc, const char **argv, const char *prefix);
 extern int cmd_column(int argc, const char **argv, const char *prefix);
 extern int cmd_commit(int argc, const char **argv, const char *prefix);
+extern int cmd_commit_graph(int argc, const char **argv, const char *prefix);
 extern int cmd_commit_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_config(int argc, const char **argv, const char *prefix);
 extern int cmd_count_objects(int argc, const char **argv, const char *prefix);
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
new file mode 100644
index 0000000000..8ff7336527
--- /dev/null
+++ b/builtin/commit-graph.c
@@ -0,0 +1,37 @@
+#include "builtin.h"
+#include "config.h"
+#include "parse-options.h"
+
+static char const * const builtin_commit_graph_usage[] = {
+	N_("git commit-graph [--object-dir <objdir>]"),
+	NULL
+};
+
+static struct opts_commit_graph {
+	const char *obj_dir;
+} opts;
+
+
+int cmd_commit_graph(int argc, const char **argv, const char *prefix)
+{
+	static struct option builtin_commit_graph_options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			N_("dir"),
+			N_("The object directory to store the graph")),
+		OPT_END(),
+	};
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(builtin_commit_graph_usage,
+				   builtin_commit_graph_options);
+
+	git_config(git_default_config, NULL);
+	argc = parse_options(argc, argv, prefix,
+			     builtin_commit_graph_options,
+			     builtin_commit_graph_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+
+	usage_with_options(builtin_commit_graph_usage,
+			   builtin_commit_graph_options);
+}
+
diff --git a/command-list.txt b/command-list.txt
index a1fad28fd8..835c5890be 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -34,6 +34,7 @@ git-clean                               mainporcelain
 git-clone                               mainporcelain           init
 git-column                              purehelpers
 git-commit                              mainporcelain           history
+git-commit-graph                        plumbingmanipulators
 git-commit-tree                         plumbingmanipulators
 git-config                              ancillarymanipulators
 git-count-objects                       ancillaryinterrogators
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 91536d831c..a24af902d8 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -841,6 +841,7 @@ __git_list_porcelain_commands ()
 		check-ref-format) : plumbing;;
 		checkout-index)   : plumbing;;
 		column)           : internal helper;;
+		commit-graph)     : plumbing;;
 		commit-tree)      : plumbing;;
 		count-objects)    : infrequent;;
 		credential)       : credentials;;
@@ -2419,6 +2420,7 @@ _git_config ()
 		core.bigFileThreshold
 		core.checkStat
 		core.commentChar
+		core.commitGraph
 		core.compression
 		core.createObject
 		core.deltaBaseCacheLimit
diff --git a/git.c b/git.c
index 96cd734f12..ea777f5d6a 100644
--- a/git.c
+++ b/git.c
@@ -388,6 +388,7 @@ static struct cmd_struct commands[] = {
 	{ "clone", cmd_clone },
 	{ "column", cmd_column, RUN_SETUP_GENTLY },
 	{ "commit", cmd_commit, RUN_SETUP | NEED_WORK_TREE },
+	{ "commit-graph", cmd_commit_graph, RUN_SETUP },
 	{ "commit-tree", cmd_commit_tree, RUN_SETUP },
 	{ "config", cmd_config, RUN_SETUP_GENTLY },
 	{ "count-objects", cmd_count_objects, RUN_SETUP },
-- 
2.14.1



* [PATCH v6 06/14] commit-graph: implement write_commit_graph()
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
                     ` (4 preceding siblings ...)
  2018-03-14 19:27   ` [PATCH v6 05/14] commit-graph: create git-commit-graph builtin Derrick Stolee
@ 2018-03-14 19:27   ` Derrick Stolee
  2018-03-14 19:27   ` [PATCH v6 07/14] commit-graph: implement 'git-commit-graph write' Derrick Stolee
                     ` (11 subsequent siblings)
  17 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-03-14 19:27 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, sbeller, szeder.dev, ramsay, git, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach Git to write a commit graph file by checking all packed objects
to see if they are commits, then store the file in the given object
directory.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Makefile       |   1 +
 commit-graph.c | 359 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 commit-graph.h |   7 ++
 3 files changed, 367 insertions(+)
 create mode 100644 commit-graph.c
 create mode 100644 commit-graph.h

diff --git a/Makefile b/Makefile
index a928d4de66..49492c3e1c 100644
--- a/Makefile
+++ b/Makefile
@@ -771,6 +771,7 @@ LIB_OBJS += color.o
 LIB_OBJS += column.o
 LIB_OBJS += combine-diff.o
 LIB_OBJS += commit.o
+LIB_OBJS += commit-graph.o
 LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += config.o
diff --git a/commit-graph.c b/commit-graph.c
new file mode 100644
index 0000000000..9bef691d9b
--- /dev/null
+++ b/commit-graph.c
@@ -0,0 +1,359 @@
+#include "cache.h"
+#include "config.h"
+#include "git-compat-util.h"
+#include "lockfile.h"
+#include "pack.h"
+#include "packfile.h"
+#include "commit.h"
+#include "object.h"
+#include "revision.h"
+#include "sha1-lookup.h"
+#include "commit-graph.h"
+
+#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
+#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
+#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
+#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
+#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
+
+#define GRAPH_DATA_WIDTH 36
+
+#define GRAPH_VERSION_1 0x1
+#define GRAPH_VERSION GRAPH_VERSION_1
+
+#define GRAPH_OID_VERSION_SHA1 1
+#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
+#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
+#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
+
+#define GRAPH_OCTOPUS_EDGES_NEEDED 0x80000000
+#define GRAPH_PARENT_MISSING 0x7fffffff
+#define GRAPH_EDGE_LAST_MASK 0x7fffffff
+#define GRAPH_PARENT_NONE 0x70000000
+
+#define GRAPH_LAST_EDGE 0x80000000
+
+#define GRAPH_FANOUT_SIZE (4 * 256)
+#define GRAPH_CHUNKLOOKUP_WIDTH 12
+#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
+			GRAPH_OID_LEN + 8)
+
+
+static char *get_commit_graph_filename(const char *obj_dir)
+{
+	return xstrfmt("%s/info/commit-graph", obj_dir);
+}
+
+static void write_graph_chunk_fanout(struct hashfile *f,
+				     struct commit **commits,
+				     int nr_commits)
+{
+	int i, count = 0;
+	struct commit **list = commits;
+
+	/*
+	 * Write the first-level table (the list is sorted,
+	 * but we use a 256-entry lookup to be able to avoid
+	 * having to do eight extra binary search iterations).
+	 */
+	for (i = 0; i < 256; i++) {
+		while (count < nr_commits) {
+			if ((*list)->object.oid.hash[0] != i)
+				break;
+			count++;
+			list++;
+		}
+
+		hashwrite_be32(f, count);
+	}
+}
+
+static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
+				   struct commit **commits, int nr_commits)
+{
+	struct commit **list = commits;
+	int count;
+	for (count = 0; count < nr_commits; count++, list++)
+		hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
+}
+
+static const unsigned char *commit_to_sha1(size_t index, void *table)
+{
+	struct commit **commits = table;
+	return commits[index]->object.oid.hash;
+}
+
+static void write_graph_chunk_data(struct hashfile *f, int hash_len,
+				   struct commit **commits, int nr_commits)
+{
+	struct commit **list = commits;
+	struct commit **last = commits + nr_commits;
+	uint32_t num_extra_edges = 0;
+
+	while (list < last) {
+		struct commit_list *parent;
+		int edge_value;
+		uint32_t packedDate[2];
+
+		parse_commit(*list);
+		hashwrite(f, (*list)->tree->object.oid.hash, hash_len);
+
+		parent = (*list)->parents;
+
+		if (!parent)
+			edge_value = GRAPH_PARENT_NONE;
+		else {
+			edge_value = sha1_pos(parent->item->object.oid.hash,
+					      commits,
+					      nr_commits,
+					      commit_to_sha1);
+
+			if (edge_value < 0)
+				edge_value = GRAPH_PARENT_MISSING;
+		}
+
+		hashwrite_be32(f, edge_value);
+
+		if (parent)
+			parent = parent->next;
+
+		if (!parent)
+			edge_value = GRAPH_PARENT_NONE;
+		else if (parent->next)
+			edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | num_extra_edges;
+		else {
+			edge_value = sha1_pos(parent->item->object.oid.hash,
+					      commits,
+					      nr_commits,
+					      commit_to_sha1);
+			if (edge_value < 0)
+				edge_value = GRAPH_PARENT_MISSING;
+		}
+
+		hashwrite_be32(f, edge_value);
+
+		if (edge_value & GRAPH_OCTOPUS_EDGES_NEEDED) {
+			do {
+				num_extra_edges++;
+				parent = parent->next;
+			} while (parent);
+		}
+
+		if (sizeof((*list)->date) > 4)
+			packedDate[0] = htonl(((*list)->date >> 32) & 0x3);
+		else
+			packedDate[0] = 0;
+
+		packedDate[1] = htonl((*list)->date);
+		hashwrite(f, packedDate, 8);
+
+		list++;
+	}
+}
+
+static void write_graph_chunk_large_edges(struct hashfile *f,
+					  struct commit **commits,
+					  int nr_commits)
+{
+	struct commit **list = commits;
+	struct commit **last = commits + nr_commits;
+	struct commit_list *parent;
+
+	while (list < last) {
+		int num_parents = 0;
+		for (parent = (*list)->parents; num_parents < 3 && parent;
+		     parent = parent->next)
+			num_parents++;
+
+		if (num_parents <= 2) {
+			list++;
+			continue;
+		}
+
+		/* Since num_parents > 2, this initializer is safe. */
+		for (parent = (*list)->parents->next; parent; parent = parent->next) {
+			int edge_value = sha1_pos(parent->item->object.oid.hash,
+						  commits,
+						  nr_commits,
+						  commit_to_sha1);
+
+			if (edge_value < 0)
+				edge_value = GRAPH_PARENT_MISSING;
+			else if (!parent->next)
+				edge_value |= GRAPH_LAST_EDGE;
+
+			hashwrite_be32(f, edge_value);
+		}
+
+		list++;
+	}
+}
+
+static int commit_compare(const void *_a, const void *_b)
+{
+	const struct object_id *a = (const struct object_id *)_a;
+	const struct object_id *b = (const struct object_id *)_b;
+	return oidcmp(a, b);
+}
+
+struct packed_commit_list {
+	struct commit **list;
+	int nr;
+	int alloc;
+};
+
+struct packed_oid_list {
+	struct object_id *list;
+	int nr;
+	int alloc;
+};
+
+static int add_packed_commits(const struct object_id *oid,
+			      struct packed_git *pack,
+			      uint32_t pos,
+			      void *data)
+{
+	struct packed_oid_list *list = (struct packed_oid_list*)data;
+	enum object_type type;
+	unsigned long size;
+	void *inner_data;
+	off_t offset = nth_packed_object_offset(pack, pos);
+	inner_data = unpack_entry(pack, offset, &type, &size);
+	FREE_AND_NULL(inner_data);
+
+	if (type != OBJ_COMMIT)
+		return 0;
+
+	ALLOC_GROW(list->list, list->nr + 1, list->alloc);
+	oidcpy(&(list->list[list->nr]), oid);
+	(list->nr)++;
+
+	return 0;
+}
+
+void write_commit_graph(const char *obj_dir)
+{
+	struct packed_oid_list oids;
+	struct packed_commit_list commits;
+	struct hashfile *f;
+	uint32_t i, count_distinct = 0;
+	char *graph_name;
+	int fd;
+	struct lock_file lk = LOCK_INIT;
+	uint32_t chunk_ids[5];
+	uint64_t chunk_offsets[5];
+	int num_chunks;
+	int num_extra_edges;
+	struct commit_list *parent;
+
+	oids.nr = 0;
+	oids.alloc = approximate_object_count() / 4;
+
+	if (oids.alloc < 1024)
+		oids.alloc = 1024;
+	ALLOC_ARRAY(oids.list, oids.alloc);
+
+	for_each_packed_object(add_packed_commits, &oids, 0);
+
+	QSORT(oids.list, oids.nr, commit_compare);
+
+	count_distinct = 1;
+	for (i = 1; i < oids.nr; i++) {
+		if (oidcmp(&oids.list[i-1], &oids.list[i]))
+			count_distinct++;
+	}
+
+	if (count_distinct >= GRAPH_PARENT_MISSING)
+		die(_("the commit graph format cannot write %d commits"), count_distinct);
+
+	commits.nr = 0;
+	commits.alloc = count_distinct;
+	ALLOC_ARRAY(commits.list, commits.alloc);
+
+	num_extra_edges = 0;
+	for (i = 0; i < oids.nr; i++) {
+		int num_parents = 0;
+		if (i > 0 && !oidcmp(&oids.list[i-1], &oids.list[i]))
+			continue;
+
+		commits.list[commits.nr] = lookup_commit(&oids.list[i]);
+		parse_commit(commits.list[commits.nr]);
+
+		for (parent = commits.list[commits.nr]->parents;
+		     parent; parent = parent->next)
+			num_parents++;
+
+		if (num_parents > 2)
+			num_extra_edges += num_parents - 1;
+
+		commits.nr++;
+	}
+	num_chunks = num_extra_edges ? 4 : 3;
+
+	if (commits.nr >= GRAPH_PARENT_MISSING)
+		die(_("too many commits to write graph"));
+
+	graph_name = get_commit_graph_filename(obj_dir);
+	fd = hold_lock_file_for_update(&lk, graph_name, 0);
+
+	if (fd < 0) {
+		struct strbuf folder = STRBUF_INIT;
+		strbuf_addstr(&folder, graph_name);
+		strbuf_setlen(&folder, strrchr(folder.buf, '/') - folder.buf);
+
+		if (mkdir(folder.buf, 0777) < 0)
+			die_errno(_("cannot mkdir %s"), folder.buf);
+		strbuf_release(&folder);
+
+		fd = hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
+
+		if (fd < 0)
+			die_errno("unable to create '%s'", graph_name);
+	}
+
+	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
+
+	hashwrite_be32(f, GRAPH_SIGNATURE);
+
+	hashwrite_u8(f, GRAPH_VERSION);
+	hashwrite_u8(f, GRAPH_OID_VERSION);
+	hashwrite_u8(f, num_chunks);
+	hashwrite_u8(f, 0); /* unused padding byte */
+
+	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
+	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
+	chunk_ids[2] = GRAPH_CHUNKID_DATA;
+	if (num_extra_edges)
+		chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES;
+	else
+		chunk_ids[3] = 0;
+	chunk_ids[4] = 0;
+
+	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
+	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
+	chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
+	chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
+	chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
+
+	for (i = 0; i <= num_chunks; i++) {
+		uint32_t chunk_write[3];
+
+		chunk_write[0] = htonl(chunk_ids[i]);
+		chunk_write[1] = htonl(chunk_offsets[i] >> 32);
+		chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff);
+		hashwrite(f, chunk_write, 12);
+	}
+
+	write_graph_chunk_fanout(f, commits.list, commits.nr);
+	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
+	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+	write_graph_chunk_large_edges(f, commits.list, commits.nr);
+
+	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
+	commit_lock_file(&lk);
+
+	free(oids.list);
+	oids.alloc = 0;
+	oids.nr = 0;
+}
+
diff --git a/commit-graph.h b/commit-graph.h
new file mode 100644
index 0000000000..4cb3f12d33
--- /dev/null
+++ b/commit-graph.h
@@ -0,0 +1,7 @@
+#ifndef COMMIT_GRAPH_H
+#define COMMIT_GRAPH_H
+
+void write_commit_graph(const char *obj_dir);
+
+#endif
+
-- 
2.14.1
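
Editorial sketch, not part of the patch: write_graph_chunk_fanout() emits
256 cumulative counts so that a reader can narrow a binary search to the
OIDs sharing a first byte. A lookup over such a table could look roughly
like this. find_oid_pos() and OID_LEN are hypothetical names, the fanout
values are assumed to be in host byte order already, and this is not the
bsearch_graph() helper the series depends on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OID_LEN 20 /* SHA-1 length, as GRAPH_OID_LEN in the patch */

/*
 * Hypothetical lookup helper. fanout[b] is the number of OIDs whose
 * first byte is <= b. 'oids' is the sorted OID table, OID_LEN bytes
 * per entry. Returns the position of 'oid', or -1 if it is absent.
 */
long find_oid_pos(const uint32_t *fanout, const unsigned char *oids,
		  const unsigned char *oid)
{
	uint32_t lo, hi;

	assert(fanout && oids && oid);
	lo = oid[0] ? fanout[oid[0] - 1] : 0;
	hi = fanout[oid[0]];

	/* Binary search restricted to entries with the same first byte. */
	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(oid, oids + (size_t)mid * OID_LEN, OID_LEN);

		if (!cmp)
			return (long)mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}
```

The returned position is the integer the commit data chunk uses when it
refers to a parent.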


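Editorial sketch, not part of the patch: write_graph_chunk_data() stores
the commit date as two 32-bit words, with bits 32-33 of the date in the
low two bits of the first word and the low 32 bits in the second. The
round trip could be sketched as below. pack_commit_date() and
unpack_commit_date() are hypothetical names, the words are kept in host
byte order here (the patch applies htonl() to each), and the assert() is
an addition of this sketch: the patch itself does not check the range.

```c
#include <assert.h>
#include <stdint.h>

/* Split a timestamp into the two words written for one commit. */
void pack_commit_date(uint64_t date, uint32_t words[2])
{
	assert(date < ((uint64_t)1 << 34)); /* only 34 bits are stored */
	words[0] = (uint32_t)(date >> 32) & 0x3;
	words[1] = (uint32_t)(date & 0xffffffffu);
}

/* Rebuild the timestamp, ignoring the upper 30 bits of the first word. */
uint64_t unpack_commit_date(const uint32_t words[2])
{
	return ((uint64_t)(words[0] & 0x3) << 32) | words[1];
}
```

Masking the first word on read keeps the sketch usable if its upper 30
bits are later given another meaning.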

* [PATCH v6 07/14] commit-graph: implement 'git-commit-graph write'
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
                     ` (5 preceding siblings ...)
  2018-03-14 19:27   ` [PATCH v6 06/14] commit-graph: implement write_commit_graph() Derrick Stolee
@ 2018-03-14 19:27   ` Derrick Stolee
  2018-03-18 13:25     ` Ævar Arnfjörð Bjarmason
  2018-03-14 19:27   ` [PATCH v6 08/14] commit-graph: implement git commit-graph read Derrick Stolee
                     ` (10 subsequent siblings)
  17 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-03-14 19:27 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, sbeller, szeder.dev, ramsay, git, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git-commit-graph to write graph files. Add a new test script to verify
that this command succeeds.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  39 ++++++++++++
 builtin/commit-graph.c             |  33 ++++++++++
 t/t5318-commit-graph.sh            | 125 +++++++++++++++++++++++++++++++++++++
 3 files changed, 197 insertions(+)
 create mode 100755 t/t5318-commit-graph.sh

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 5913340fad..e688843808 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -5,6 +5,45 @@ NAME
 ----
 git-commit-graph - Write and verify Git commit graph files
 
+
+SYNOPSIS
+--------
+[verse]
+'git commit-graph write' <options> [--object-dir <dir>]
+
+
+DESCRIPTION
+-----------
+
+Manage the serialized commit graph file.
+
+
+OPTIONS
+-------
+--object-dir::
+	Use given directory for the location of packfiles and commit graph
+	file. The commit graph file is expected to be at <dir>/info/commit-graph
+	and the packfiles are expected to be in <dir>/pack.
+
+
+COMMANDS
+--------
+'write'::
+
+Write a commit graph file based on the commits found in packfiles.
+Includes all commits from the existing commit graph file.
+
+
+EXAMPLES
+--------
+
+* Write a commit graph file for the packed commits in your local .git folder.
++
+------------------------------------------------
+$ git commit-graph write
+------------------------------------------------
+
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 8ff7336527..a9d61f649a 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -1,9 +1,18 @@
 #include "builtin.h"
 #include "config.h"
+#include "dir.h"
+#include "lockfile.h"
 #include "parse-options.h"
+#include "commit-graph.h"
 
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
+	N_("git commit-graph write [--object-dir <objdir>]"),
+	NULL
+};
+
+static const char * const builtin_commit_graph_write_usage[] = {
+	N_("git commit-graph write [--object-dir <objdir>]"),
 	NULL
 };
 
@@ -11,6 +20,25 @@ static struct opts_commit_graph {
 	const char *obj_dir;
 } opts;
 
+static int graph_write(int argc, const char **argv)
+{
+	static struct option builtin_commit_graph_write_options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			N_("dir"),
+			N_("The object directory to store the graph")),
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL,
+			     builtin_commit_graph_write_options,
+			     builtin_commit_graph_write_usage, 0);
+
+	if (!opts.obj_dir)
+		opts.obj_dir = get_object_directory();
+
+	write_commit_graph(opts.obj_dir);
+	return 0;
+}
 
 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 {
@@ -31,6 +59,11 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 			     builtin_commit_graph_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
+	if (argc > 0) {
+		if (!strcmp(argv[0], "write"))
+			return graph_write(argc, argv);
+	}
+
 	usage_with_options(builtin_commit_graph_usage,
 			   builtin_commit_graph_options);
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
new file mode 100755
index 0000000000..43707ce5bb
--- /dev/null
+++ b/t/t5318-commit-graph.sh
@@ -0,0 +1,125 @@
+#!/bin/sh
+
+test_description='commit graph'
+. ./test-lib.sh
+
+test_expect_success 'setup full repo' '
+	mkdir full &&
+	cd "$TRASH_DIRECTORY/full" &&
+	git init &&
+	objdir=".git/objects"
+'
+
+test_expect_success 'write graph with no packs' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write --object-dir . &&
+	test_path_is_file info/commit-graph
+'
+
+test_expect_success 'create commits and repack' '
+	cd "$TRASH_DIRECTORY/full" &&
+	for i in $(test_seq 3)
+	do
+		test_commit $i &&
+		git branch commits/$i
+	done &&
+	git repack
+'
+
+test_expect_success 'write graph' '
+	cd "$TRASH_DIRECTORY/full" &&
+	graph1=$(git commit-graph write) &&
+	test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'Add more commits' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git reset --hard commits/1 &&
+	for i in $(test_seq 4 5)
+	do
+		test_commit $i &&
+		git branch commits/$i
+	done &&
+	git reset --hard commits/2 &&
+	for i in $(test_seq 6 7)
+	do
+		test_commit $i &&
+		git branch commits/$i
+	done &&
+	git reset --hard commits/2 &&
+	git merge commits/4 &&
+	git branch merge/1 &&
+	git reset --hard commits/4 &&
+	git merge commits/6 &&
+	git branch merge/2 &&
+	git reset --hard commits/3 &&
+	git merge commits/5 commits/7 &&
+	git branch merge/3 &&
+	git repack
+'
+
+# Current graph structure:
+#
+#   __M3___
+#  /   |   \
+# 3 M1 5 M2 7
+# |/  \|/  \|
+# 2    4    6
+# |___/____/
+# 1
+
+
+test_expect_success 'write graph with merges' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write &&
+	test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'Add one more commit' '
+	cd "$TRASH_DIRECTORY/full" &&
+	test_commit 8 &&
+	git branch commits/8 &&
+	ls $objdir/pack | grep idx >existing-idx &&
+	git repack &&
+	ls $objdir/pack | grep idx | grep -v --file=existing-idx >new-idx
+'
+
+# Current graph structure:
+#
+#      8
+#      |
+#   __M3___
+#  /   |   \
+# 3 M1 5 M2 7
+# |/  \|/  \|
+# 2    4    6
+# |___/____/
+# 1
+
+test_expect_success 'write graph with new commit' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write &&
+	test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'write graph with nothing new' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write &&
+	test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'setup bare repo' '
+	cd "$TRASH_DIRECTORY" &&
+	git clone --bare --no-local full bare &&
+	cd bare &&
+	baredir="./objects"
+'
+
+test_expect_success 'write graph in bare repo' '
+	cd "$TRASH_DIRECTORY/bare" &&
+	git commit-graph write &&
+	test_path_is_file $baredir/info/commit-graph
+'
+
+test_done
+
-- 
2.14.1



* [PATCH v6 08/14] commit-graph: implement git commit-graph read
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
                     ` (6 preceding siblings ...)
  2018-03-14 19:27   ` [PATCH v6 07/14] commit-graph: implement 'git-commit-graph write' Derrick Stolee
@ 2018-03-14 19:27   ` Derrick Stolee
  2018-03-14 19:27   ` [PATCH v6 09/14] commit-graph: add core.commitGraph setting Derrick Stolee
                     ` (9 subsequent siblings)
  17 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-03-14 19:27 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, sbeller, szeder.dev, ramsay, git, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git-commit-graph to read commit graph files and summarize their contents.

Use the read subcommand to verify the contents of a commit graph file in the
tests.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  12 ++++
 builtin/commit-graph.c             |  56 +++++++++++++++
 commit-graph.c                     | 140 ++++++++++++++++++++++++++++++++++++-
 commit-graph.h                     |  23 ++++++
 t/t5318-commit-graph.sh            |  32 +++++++--
 5 files changed, 257 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index e688843808..51cb038f3d 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -9,6 +9,7 @@ git-commit-graph - Write and verify Git commit graph files
 SYNOPSIS
 --------
 [verse]
+'git commit-graph read' <options> [--object-dir <dir>]
 'git commit-graph write' <options> [--object-dir <dir>]
 
 
@@ -33,6 +34,11 @@ COMMANDS
 Write a commit graph file based on the commits found in packfiles.
 Includes all commits from the existing commit graph file.
 
+'read'::
+
+Read the commit-graph file and output basic details about it.
+Used for debugging purposes.
+
 
 EXAMPLES
 --------
@@ -43,6 +49,12 @@ EXAMPLES
 $ git commit-graph write
 ------------------------------------------------
 
+* Read basic information from the commit-graph file.
++
+------------------------------------------------
+$ git commit-graph read
+------------------------------------------------
+
 
 GIT
 ---
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index a9d61f649a..0e164becff 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -7,10 +7,16 @@
 
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
+	N_("git commit-graph read [--object-dir <objdir>]"),
 	N_("git commit-graph write [--object-dir <objdir>]"),
 	NULL
 };
 
+static const char * const builtin_commit_graph_read_usage[] = {
+	N_("git commit-graph read [--object-dir <objdir>]"),
+	NULL
+};
+
 static const char * const builtin_commit_graph_write_usage[] = {
 	N_("git commit-graph write [--object-dir <objdir>]"),
 	NULL
@@ -20,6 +26,54 @@ static struct opts_commit_graph {
 	const char *obj_dir;
 } opts;
 
+static int graph_read(int argc, const char **argv)
+{
+	struct commit_graph *graph = 0;
+	char *graph_name;
+
+	static struct option builtin_commit_graph_read_options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			N_("dir"),
+			N_("The object directory to store the graph")),
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL,
+			     builtin_commit_graph_read_options,
+			     builtin_commit_graph_read_usage, 0);
+
+	if (!opts.obj_dir)
+		opts.obj_dir = get_object_directory();
+
+	graph_name = get_commit_graph_filename(opts.obj_dir);
+	graph = load_commit_graph_one(graph_name);
+
+	if (!graph)
+		die("graph file %s does not exist", graph_name);
+	FREE_AND_NULL(graph_name);
+
+	printf("header: %08x %d %d %d %d\n",
+		ntohl(*(uint32_t*)graph->data),
+		*(unsigned char*)(graph->data + 4),
+		*(unsigned char*)(graph->data + 5),
+		*(unsigned char*)(graph->data + 6),
+		*(unsigned char*)(graph->data + 7));
+	printf("num_commits: %u\n", graph->num_commits);
+	printf("chunks:");
+
+	if (graph->chunk_oid_fanout)
+		printf(" oid_fanout");
+	if (graph->chunk_oid_lookup)
+		printf(" oid_lookup");
+	if (graph->chunk_commit_data)
+		printf(" commit_metadata");
+	if (graph->chunk_large_edges)
+		printf(" large_edges");
+	printf("\n");
+
+	return 0;
+}
+
 static int graph_write(int argc, const char **argv)
 {
 	static struct option builtin_commit_graph_write_options[] = {
@@ -60,6 +114,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
 	if (argc > 0) {
+		if (!strcmp(argv[0], "read"))
+			return graph_read(argc, argv);
 		if (!strcmp(argv[0], "write"))
 			return graph_write(argc, argv);
 	}
diff --git a/commit-graph.c b/commit-graph.c
index 9bef691d9b..2f2e2c7083 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -39,11 +39,149 @@
 			GRAPH_OID_LEN + 8)
 
 
-static char *get_commit_graph_filename(const char *obj_dir)
+char *get_commit_graph_filename(const char *obj_dir)
 {
 	return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static struct commit_graph *alloc_commit_graph(void)
+{
+	struct commit_graph *g = xmalloc(sizeof(*g));
+	memset(g, 0, sizeof(*g));
+	g->graph_fd = -1;
+
+	return g;
+}
+
+struct commit_graph *load_commit_graph_one(const char *graph_file)
+{
+	void *graph_map;
+	const unsigned char *data, *chunk_lookup;
+	size_t graph_size;
+	struct stat st;
+	uint32_t i;
+	struct commit_graph *graph;
+	int fd = git_open(graph_file);
+	uint64_t last_chunk_offset;
+	uint32_t last_chunk_id;
+	uint32_t graph_signature;
+	unsigned char graph_version, hash_version;
+
+	if (fd < 0)
+		return NULL;
+	if (fstat(fd, &st)) {
+		close(fd);
+		return NULL;
+	}
+	graph_size = xsize_t(st.st_size);
+
+	if (graph_size < GRAPH_MIN_SIZE) {
+		close(fd);
+		die("graph file %s is too small", graph_file);
+	}
+	graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	data = (const unsigned char *)graph_map;
+
+	graph_signature = ntohl(*(uint32_t*)data);
+	if (graph_signature != GRAPH_SIGNATURE) {
+		error("graph signature %X does not match signature %X",
+		      graph_signature, GRAPH_SIGNATURE);
+		goto cleanup_fail;
+	}
+
+	graph_version = *(unsigned char*)(data + 4);
+	if (graph_version != GRAPH_VERSION) {
+		error("graph version %X does not match version %X",
+		      graph_version, GRAPH_VERSION);
+		goto cleanup_fail;
+	}
+
+	hash_version = *(unsigned char*)(data + 5);
+	if (hash_version != GRAPH_OID_VERSION) {
+		error("hash version %X does not match version %X",
+		      hash_version, GRAPH_OID_VERSION);
+		goto cleanup_fail;
+	}
+
+	graph = alloc_commit_graph();
+
+	graph->hash_len = GRAPH_OID_LEN;
+	graph->num_chunks = *(unsigned char*)(data + 6);
+	graph->graph_fd = fd;
+	graph->data = graph_map;
+	graph->data_len = graph_size;
+
+	last_chunk_id = 0;
+	last_chunk_offset = 8;
+	chunk_lookup = data + 8;
+	for (i = 0; i < graph->num_chunks; i++) {
+		uint32_t chunk_id = get_be32(chunk_lookup + 0);
+		uint64_t chunk_offset1 = get_be32(chunk_lookup + 4);
+		uint32_t chunk_offset2 = get_be32(chunk_lookup + 8);
+		uint64_t chunk_offset = (chunk_offset1 << 32) | chunk_offset2;
+		int chunk_repeated = 0;
+
+		chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
+
+		if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
+			error("improper chunk offset %08x%08x", (uint32_t)(chunk_offset >> 32),
+			      (uint32_t)chunk_offset);
+			goto cleanup_fail;
+		}
+
+		switch (chunk_id) {
+		case GRAPH_CHUNKID_OIDFANOUT:
+			if (graph->chunk_oid_fanout)
+				chunk_repeated = 1;
+			else
+				graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset);
+			break;
+
+		case GRAPH_CHUNKID_OIDLOOKUP:
+			if (graph->chunk_oid_lookup)
+				chunk_repeated = 1;
+			else
+				graph->chunk_oid_lookup = data + chunk_offset;
+			break;
+
+		case GRAPH_CHUNKID_DATA:
+			if (graph->chunk_commit_data)
+				chunk_repeated = 1;
+			else
+				graph->chunk_commit_data = data + chunk_offset;
+			break;
+
+		case GRAPH_CHUNKID_LARGEEDGES:
+			if (graph->chunk_large_edges)
+				chunk_repeated = 1;
+			else
+				graph->chunk_large_edges = data + chunk_offset;
+			break;
+		}
+
+		if (chunk_repeated) {
+			error("chunk id %08x appears multiple times", chunk_id);
+			goto cleanup_fail;
+		}
+
+		if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)
+		{
+			graph->num_commits = (chunk_offset - last_chunk_offset)
+					     / graph->hash_len;
+		}
+
+		last_chunk_id = chunk_id;
+		last_chunk_offset = chunk_offset;
+	}
+
+	return graph;
+
+cleanup_fail:
+	munmap(graph_map, graph_size);
+	close(fd);
+	exit(1);
+}
+
 static void write_graph_chunk_fanout(struct hashfile *f,
 				     struct commit **commits,
 				     int nr_commits)
diff --git a/commit-graph.h b/commit-graph.h
index 4cb3f12d33..8b4b0f9f04 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -1,6 +1,29 @@
 #ifndef COMMIT_GRAPH_H
 #define COMMIT_GRAPH_H
 
+#include "git-compat-util.h"
+
+char *get_commit_graph_filename(const char *obj_dir);
+
+struct commit_graph {
+	int graph_fd;
+
+	const unsigned char *data;
+	size_t data_len;
+
+	unsigned char hash_len;
+	unsigned char num_chunks;
+	uint32_t num_commits;
+	struct object_id oid;
+
+	const uint32_t *chunk_oid_fanout;
+	const unsigned char *chunk_oid_lookup;
+	const unsigned char *chunk_commit_data;
+	const unsigned char *chunk_large_edges;
+};
+
+struct commit_graph *load_commit_graph_one(const char *graph_file);
+
 void write_commit_graph(const char *obj_dir);
 
 #endif
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 43707ce5bb..03b75882a0 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -26,10 +26,28 @@ test_expect_success 'create commits and repack' '
 	git repack
 '
 
+graph_read_expect() {
+	OPTIONAL=""
+	NUM_CHUNKS=3
+	if test -n "$2"
+	then
+		OPTIONAL=" $2"
+		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
+	fi
+	cat >expect <<- EOF
+	header: 43475048 1 1 $NUM_CHUNKS 0
+	num_commits: $1
+	chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
+	EOF
+	git commit-graph read >output &&
+	test_cmp expect output
+}
+
 test_expect_success 'write graph' '
         cd "$TRASH_DIRECTORY/full" &&
 	graph1=$(git commit-graph write) &&
-	test_path_is_file $objdir/info/commit-graph
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "3"
 '
 
 test_expect_success 'Add more commits' '
@@ -72,7 +90,8 @@ test_expect_success 'Add more commits' '
 test_expect_success 'write graph with merges' '
         cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
-	test_path_is_file $objdir/info/commit-graph
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "10" "large_edges"
 '
 
 test_expect_success 'Add one more commit' '
@@ -99,13 +118,15 @@ test_expect_success 'Add one more commit' '
 test_expect_success 'write graph with new commit' '
         cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
-	test_path_is_file $objdir/info/commit-graph
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "11" "large_edges"
 '
 
 test_expect_success 'write graph with nothing new' '
         cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
-	test_path_is_file $objdir/info/commit-graph
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "11" "large_edges"
 '
 
 test_expect_success 'setup bare repo' '
@@ -118,7 +139,8 @@ test_expect_success 'setup bare repo' '
 test_expect_success 'write graph in bare repo' '
         cd "$TRASH_DIRECTORY/bare" &&
 	git commit-graph write &&
-	test_path_is_file $baredir/info/commit-graph
+	test_path_is_file $baredir/info/commit-graph &&
+	graph_read_expect "11" "large_edges"
 '
 
 test_done
-- 
2.14.1
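
As an illustration of the chunk lookup loop in load_commit_graph_one()
above, here is a minimal standalone sketch. It is not the patch's code.
It assumes the layout the patch reads: an 8-byte header whose byte 6 is
the chunk count, followed by 12-byte table entries (a 4-byte chunk id and
an 8-byte big-endian offset). The commit count is derived from the
distance between the OID lookup chunk and the chunk that follows it. The
names read_be32, read_be64 and toy_num_commits are hypothetical, and both
the 20-byte hash length and the "OIDL" chunk id value are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HEADER_SIZE 8
#define TOY_ENTRY_WIDTH 12                /* 4-byte id + 8-byte offset */
#define TOY_HASH_LEN 20                   /* assumption: SHA-1 sized ids */
#define TOY_CHUNKID_OIDLOOKUP 0x4f49444cu /* "OIDL"; assumed value */

/* Read a big-endian 32-bit value without relying on alignment. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a big-endian 64-bit value as two 32-bit halves. */
static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/*
 * Walk the chunk table that follows the header and return the number of
 * commits implied by the size of the OID lookup chunk. Returns 0 if the
 * buffer is too short, or if the OID lookup chunk is missing or is the
 * last table entry.
 */
static uint32_t toy_num_commits(const unsigned char *data, size_t len)
{
	uint32_t last_id = 0, num_commits = 0;
	uint64_t last_offset = TOY_HEADER_SIZE;
	const unsigned char *entry;
	unsigned int i, num_chunks;

	assert(data != NULL);

	if (len < TOY_HEADER_SIZE)
		return 0;
	num_chunks = data[6];
	if (len < TOY_HEADER_SIZE + (size_t)num_chunks * TOY_ENTRY_WIDTH)
		return 0;

	entry = data + TOY_HEADER_SIZE;
	for (i = 0; i < num_chunks; i++, entry += TOY_ENTRY_WIDTH) {
		uint32_t id = read_be32(entry);
		uint64_t offset = read_be64(entry + 4);

		/* The previous chunk ends where this one begins. */
		if (last_id == TOY_CHUNKID_OIDLOOKUP && offset >= last_offset)
			num_commits = (uint32_t)((offset - last_offset) /
						 TOY_HASH_LEN);

		last_id = id;
		last_offset = offset;
	}
	return num_commits;
}
```

A caller would pass the mapped file contents and size, for example
toy_num_commits(graph_map, graph_size). Reading bytes individually avoids
the unaligned uint32_t casts that a direct pointer dereference would need.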



* [PATCH v6 09/14] commit-graph: add core.commitGraph setting
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
                     ` (7 preceding siblings ...)
  2018-03-14 19:27   ` [PATCH v6 08/14] commit-graph: implement git commit-graph read Derrick Stolee
@ 2018-03-14 19:27   ` Derrick Stolee
  2018-03-14 19:27   ` [PATCH v6 10/14] commit-graph: close under reachability Derrick Stolee
                     ` (8 subsequent siblings)
  17 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-03-14 19:27 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, sbeller, szeder.dev, ramsay, git, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The commit graph feature is controlled by the new core.commitGraph config
setting. This defaults to 0, so the feature is opt-in.

The intent is that a user can always make Git stop checking for or
parsing commit graph files by setting core.commitGraph=0.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config.txt | 3 +++
 cache.h                  | 1 +
 config.c                 | 5 +++++
 environment.c            | 1 +
 4 files changed, 10 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ce9102cea8..9e3da629b8 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -898,6 +898,9 @@ core.notesRef::
 This setting defaults to "refs/notes/commits", and it can be overridden by
 the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
 
+core.commitGraph::
+	Enable the commit graph feature. Allows reading from the commit-graph file.
+
 core.sparseCheckout::
 	Enable "sparse checkout" feature. See section "Sparse checkout" in
 	linkgit:git-read-tree[1] for more information.
diff --git a/cache.h b/cache.h
index d06932ed0b..e62569fbb1 100644
--- a/cache.h
+++ b/cache.h
@@ -801,6 +801,7 @@ extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
 extern int core_preload_index;
+extern int core_commit_graph;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index b0c20e6cb8..25ee4a676c 100644
--- a/config.c
+++ b/config.c
@@ -1226,6 +1226,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.commitgraph")) {
+		core_commit_graph = git_config_bool(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.sparsecheckout")) {
 		core_apply_sparse_checkout = git_config_bool(var, value);
 		return 0;
diff --git a/environment.c b/environment.c
index d6dd64662c..8853e2f0dd 100644
--- a/environment.c
+++ b/environment.c
@@ -62,6 +62,7 @@ enum push_default_type push_default = PUSH_DEFAULT_UNSPECIFIED;
 enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
 char *notes_ref_name;
 int grafts_replace_parents = 1;
+int core_commit_graph;
 int core_apply_sparse_checkout;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
-- 
2.14.1



* [PATCH v6 10/14] commit-graph: close under reachability
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
                     ` (8 preceding siblings ...)
  2018-03-14 19:27   ` [PATCH v6 09/14] commit-graph: add core.commitGraph setting Derrick Stolee
@ 2018-03-14 19:27   ` Derrick Stolee
  2018-03-14 19:27   ` [PATCH v6 11/14] commit: integrate commit graph with commit parsing Derrick Stolee
                     ` (7 subsequent siblings)
  17 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-03-14 19:27 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, sbeller, szeder.dev, ramsay, git, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach write_commit_graph() to walk all parents from the commits
discovered in packfiles. This prevents gaps caused by loose objects or
previously-missed packfiles.

Also automatically add commits from the existing graph file, if it
exists.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 2f2e2c7083..fc7b4fa622 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -369,6 +369,28 @@ static int add_packed_commits(const struct object_id *oid,
 	return 0;
 }
 
+static void close_reachable(struct packed_oid_list *oids)
+{
+	int i;
+	struct rev_info revs;
+	struct commit *commit;
+	init_revisions(&revs, NULL);
+	for (i = 0; i < oids->nr; i++) {
+		commit = lookup_commit(&oids->list[i]);
+		if (commit && !parse_commit(commit))
+			revs.commits = commit_list_insert(commit, &revs.commits);
+	}
+
+	if (prepare_revision_walk(&revs))
+		die(_("revision walk setup failed"));
+
+	while ((commit = get_revision(&revs)) != NULL) {
+		ALLOC_GROW(oids->list, oids->nr + 1, oids->alloc);
+		oidcpy(&oids->list[oids->nr], &(commit->object.oid));
+		(oids->nr)++;
+	}
+}
+
 void write_commit_graph(const char *obj_dir)
 {
 	struct packed_oid_list oids;
@@ -392,6 +414,7 @@ void write_commit_graph(const char *obj_dir)
 	ALLOC_ARRAY(oids.list, oids.alloc);
 
 	for_each_packed_object(add_packed_commits, &oids, 0);
+	close_reachable(&oids);
 
 	QSORT(oids.list, oids.nr, commit_compare);
 
-- 
2.14.1
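
To make "closed under reachability" concrete, here is a minimal sketch on
a toy graph. It does not use Git's revision walk as close_reachable()
above does. It swaps in a plain worklist over an array of commits, and
every name in it (toy_commit, toy_close_reachable, the TOY_* limits) is
hypothetical. The point is only the invariant: once the function returns,
every parent of a commit in the set is also in the set.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_COMMITS 16
#define TOY_MAX_PARENTS 3
#define TOY_NO_PARENT (-1)

struct toy_commit {
	/* Indexes into the graph array, or TOY_NO_PARENT. */
	int parents[TOY_MAX_PARENTS];
};

/*
 * Extend 'set' (a 0/1 membership array with one slot per commit) so that
 * it also contains every ancestor of its members. Returns the number of
 * commits that were added.
 */
static int toy_close_reachable(const struct toy_commit *graph, int nr,
			       unsigned char *set)
{
	int stack[TOY_MAX_COMMITS];
	int top = 0, added = 0, i, j;

	assert(nr <= TOY_MAX_COMMITS);

	/* Seed the worklist with the commits we already know about. */
	for (i = 0; i < nr; i++)
		if (set[i])
			stack[top++] = i;

	/* Each commit is pushed at most once, so the stack cannot overflow. */
	while (top > 0) {
		int c = stack[--top];

		for (j = 0; j < TOY_MAX_PARENTS; j++) {
			int p = graph[c].parents[j];

			if (p == TOY_NO_PARENT || set[p])
				continue;
			set[p] = 1;
			added++;
			stack[top++] = p;
		}
	}
	return added;
}
```

Starting from a single merge commit, the call marks the merge's parents
and their ancestors, and leaves unrelated commits untouched. A second call
on the same set adds nothing, which is what "closed" means here.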



* [PATCH v6 11/14] commit: integrate commit graph with commit parsing
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
                     ` (9 preceding siblings ...)
  2018-03-14 19:27   ` [PATCH v6 10/14] commit-graph: close under reachability Derrick Stolee
@ 2018-03-14 19:27   ` Derrick Stolee
  2018-03-14 19:27   ` [PATCH v6 12/14] commit-graph: read only from specific pack-indexes Derrick Stolee
                     ` (6 subsequent siblings)
  17 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-03-14 19:27 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, sbeller, szeder.dev, ramsay, git, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach Git to inspect a commit graph file to supply the contents of a
struct commit when calling parse_commit_gently(). This implementation
satisfies all post-conditions on the struct commit, including loading
parents, the root tree, and the commit date.

If core.commitGraph is false, then do not check graph files.

In test script t5318-commit-graph.sh, add output-matching conditions on
read-only graph operations.

By loading commits from the graph instead of parsing commit buffers, we
save a lot of time on long commit walks. Here are some performance
results for a copy of the Linux repository where 'master' has 678,653
reachable commits and is behind 'origin/master' by 59,929 commits.

| Command                          | Before | After  | Rel % |
|----------------------------------|--------|--------|-------|
| log --oneline --topo-order -1000 |  8.31s |  0.94s | -88%  |
| branch -vv                       |  1.02s |  0.14s | -86%  |
| rev-list --all                   |  5.89s |  1.07s | -81%  |
| rev-list --all --objects         | 66.15s | 58.45s | -11%  |

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 alloc.c                 |   1 +
 commit-graph.c          | 141 +++++++++++++++++++++++++++++++++++++++++++++++-
 commit-graph.h          |  12 +++++
 commit.c                |   3 ++
 commit.h                |   3 ++
 t/t5318-commit-graph.sh |  47 +++++++++++++++-
 6 files changed, 205 insertions(+), 2 deletions(-)

diff --git a/alloc.c b/alloc.c
index 12afadfacd..cf4f8b61e1 100644
--- a/alloc.c
+++ b/alloc.c
@@ -93,6 +93,7 @@ void *alloc_commit_node(void)
 	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
 	c->index = alloc_commit_index();
+	c->graph_pos = COMMIT_NOT_FROM_GRAPH;
 	return c;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index fc7b4fa622..98e2b89b94 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -38,7 +38,6 @@
 #define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
 			GRAPH_OID_LEN + 8)
 
-
 char *get_commit_graph_filename(const char *obj_dir)
 {
 	return xstrfmt("%s/info/commit-graph", obj_dir);
@@ -182,6 +181,145 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 	exit(1);
 }
 
+/* global storage */
+struct commit_graph *commit_graph = NULL;
+
+static void prepare_commit_graph_one(const char *obj_dir)
+{
+	char *graph_name;
+
+	if (commit_graph)
+		return;
+
+	graph_name = get_commit_graph_filename(obj_dir);
+	commit_graph = load_commit_graph_one(graph_name);
+
+	FREE_AND_NULL(graph_name);
+}
+
+static int prepare_commit_graph_run_once = 0;
+static void prepare_commit_graph(void)
+{
+	struct alternate_object_database *alt;
+	char *obj_dir;
+
+	if (prepare_commit_graph_run_once)
+		return;
+	prepare_commit_graph_run_once = 1;
+
+	obj_dir = get_object_directory();
+	prepare_commit_graph_one(obj_dir);
+	prepare_alt_odb();
+	for (alt = alt_odb_list; !commit_graph && alt; alt = alt->next)
+		prepare_commit_graph_one(alt->path);
+}
+
+static void close_commit_graph(void)
+{
+	if (!commit_graph)
+		return;
+
+	if (commit_graph->graph_fd >= 0) {
+		munmap((void *)commit_graph->data, commit_graph->data_len);
+		commit_graph->data = NULL;
+		close(commit_graph->graph_fd);
+	}
+
+	FREE_AND_NULL(commit_graph);
+}
+
+static int bsearch_graph(struct commit_graph *g, struct object_id *oid, uint32_t *pos)
+{
+	return bsearch_hash(oid->hash, g->chunk_oid_fanout,
+			    g->chunk_oid_lookup, g->hash_len, pos);
+}
+
+static struct commit_list **insert_parent_or_die(struct commit_graph *g,
+						 uint64_t pos,
+						 struct commit_list **pptr)
+{
+	struct commit *c;
+	struct object_id oid;
+	hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
+	c = lookup_commit(&oid);
+	if (!c)
+		die("could not find commit %s", oid_to_hex(&oid));
+	c->graph_pos = pos;
+	return &commit_list_insert(c, pptr)->next;
+}
+
+static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t pos)
+{
+	struct object_id oid;
+	uint32_t edge_value;
+	uint32_t *parent_data_ptr;
+	uint64_t date_low, date_high;
+	struct commit_list **pptr;
+	const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len + 16) * pos;
+
+	item->object.parsed = 1;
+	item->graph_pos = pos;
+
+	hashcpy(oid.hash, commit_data);
+	item->tree = lookup_tree(&oid);
+
+	date_high = ntohl(*(uint32_t*)(commit_data + g->hash_len + 8)) & 0x3;
+	date_low = ntohl(*(uint32_t*)(commit_data + g->hash_len + 12));
+	item->date = (timestamp_t)((date_high << 32) | date_low);
+
+	pptr = &item->parents;
+
+	edge_value = ntohl(*(uint32_t*)(commit_data + g->hash_len));
+	if (edge_value == GRAPH_PARENT_NONE)
+		return 1;
+	pptr = insert_parent_or_die(g, edge_value, pptr);
+
+	edge_value = ntohl(*(uint32_t*)(commit_data + g->hash_len + 4));
+	if (edge_value == GRAPH_PARENT_NONE)
+		return 1;
+	if (!(edge_value & GRAPH_OCTOPUS_EDGES_NEEDED)) {
+		pptr = insert_parent_or_die(g, edge_value, pptr);
+		return 1;
+	}
+
+	parent_data_ptr = (uint32_t*)(g->chunk_large_edges +
+			  4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK));
+	do {
+		edge_value = ntohl(*parent_data_ptr);
+		pptr = insert_parent_or_die(g,
+					    edge_value & GRAPH_EDGE_LAST_MASK,
+					    pptr);
+		parent_data_ptr++;
+	} while (!(edge_value & GRAPH_LAST_EDGE));
+
+	return 1;
+}
+
+int parse_commit_in_graph(struct commit *item)
+{
+	if (!core_commit_graph)
+		return 0;
+	if (item->object.parsed)
+		return 1;
+
+	prepare_commit_graph();
+	if (commit_graph) {
+		uint32_t pos;
+		int found;
+		if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
+			pos = item->graph_pos;
+			found = 1;
+		} else {
+			found = bsearch_graph(commit_graph, &(item->object.oid), &pos);
+		}
+
+		if (found)
+			return fill_commit_in_graph(item, commit_graph, pos);
+	}
+
+	return 0;
+}
+
 static void write_graph_chunk_fanout(struct hashfile *f,
 				     struct commit **commits,
 				     int nr_commits)
@@ -510,6 +648,7 @@ void write_commit_graph(const char *obj_dir)
 	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
 	write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
+	close_commit_graph();
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	commit_lock_file(&lk);
 
diff --git a/commit-graph.h b/commit-graph.h
index 8b4b0f9f04..b223b9b078 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -5,6 +5,18 @@
 
 char *get_commit_graph_filename(const char *obj_dir);
 
+/*
+ * Given a commit struct, try to fill the commit struct info, including:
+ *  1. tree object
+ *  2. date
+ *  3. parents.
+ *
+ * Returns 1 if and only if the commit was found in the packed graph.
+ *
+ * See parse_commit_buffer() for the fallback after this call.
+ */
+int parse_commit_in_graph(struct commit *item);
+
 struct commit_graph {
 	int graph_fd;
 
diff --git a/commit.c b/commit.c
index 00c99c7272..3e39c86abf 100644
--- a/commit.c
+++ b/commit.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "tag.h"
 #include "commit.h"
+#include "commit-graph.h"
 #include "pkt-line.h"
 #include "utf8.h"
 #include "diff.h"
@@ -383,6 +384,8 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
 		return -1;
 	if (item->object.parsed)
 		return 0;
+	if (parse_commit_in_graph(item))
+		return 0;
 	buffer = read_sha1_file(item->object.oid.hash, &type, &size);
 	if (!buffer)
 		return quiet_on_missing ? -1 :
diff --git a/commit.h b/commit.h
index 0fb8271665..e57ae4b583 100644
--- a/commit.h
+++ b/commit.h
@@ -9,6 +9,8 @@
 #include "string-list.h"
 #include "pretty.h"
 
+#define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF
+
 struct commit_list {
 	struct commit *item;
 	struct commit_list *next;
@@ -21,6 +23,7 @@ struct commit {
 	timestamp_t date;
 	struct commit_list *parents;
 	struct tree *tree;
+	uint32_t graph_pos;
 };
 
 extern int save_commit_buffer;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 03b75882a0..7bcc1b2874 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -7,6 +7,7 @@ test_expect_success 'setup full repo' '
 	mkdir full &&
 	cd "$TRASH_DIRECTORY/full" &&
 	git init &&
+	git config core.commitGraph true &&
 	objdir=".git/objects"
 '
 
@@ -26,6 +27,29 @@ test_expect_success 'create commits and repack' '
 	git repack
 '
 
+graph_git_two_modes() {
+	git -c core.commitGraph=true $1 >output &&
+	git -c core.commitGraph=false $1 >expect &&
+	test_cmp output expect
+}
+
+graph_git_behavior() {
+	MSG=$1
+	DIR=$2
+	BRANCH=$3
+	COMPARE=$4
+	test_expect_success "check normal git operations: $MSG" '
+		cd "$TRASH_DIRECTORY/$DIR" &&
+		graph_git_two_modes "log --oneline $BRANCH" &&
+		graph_git_two_modes "log --topo-order $BRANCH" &&
+		graph_git_two_modes "log --graph $COMPARE..$BRANCH" &&
+		graph_git_two_modes "branch -vv" &&
+		graph_git_two_modes "merge-base -a $BRANCH $COMPARE"
+	'
+}
+
+graph_git_behavior 'no graph' full commits/3 commits/1
+
 graph_read_expect() {
 	OPTIONAL=""
 	NUM_CHUNKS=3
@@ -50,6 +74,8 @@ test_expect_success 'write graph' '
 	graph_read_expect "3"
 '
 
+graph_git_behavior 'graph exists' full commits/3 commits/1
+
 test_expect_success 'Add more commits' '
         cd "$TRASH_DIRECTORY/full" &&
 	git reset --hard commits/1 &&
@@ -86,7 +112,6 @@ test_expect_success 'Add more commits' '
 # |___/____/
 # 1
 
-
 test_expect_success 'write graph with merges' '
         cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
@@ -94,6 +119,10 @@ test_expect_success 'write graph with merges' '
 	graph_read_expect "10" "large_edges"
 '
 
+graph_git_behavior 'merge 1 vs 2' full merge/1 merge/2
+graph_git_behavior 'merge 1 vs 3' full merge/1 merge/3
+graph_git_behavior 'merge 2 vs 3' full merge/2 merge/3
+
 test_expect_success 'Add one more commit' '
         cd "$TRASH_DIRECTORY/full" &&
 	test_commit 8 &&
@@ -115,6 +144,9 @@ test_expect_success 'Add one more commit' '
 # |___/____/
 # 1
 
+graph_git_behavior 'mixed mode, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'mixed mode, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'write graph with new commit' '
         cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
@@ -122,6 +154,9 @@ test_expect_success 'write graph with new commit' '
 	graph_read_expect "11" "large_edges"
 '
 
+graph_git_behavior 'full graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'full graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'write graph with nothing new' '
         cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
@@ -129,13 +164,20 @@ test_expect_success 'write graph with nothing new' '
 	graph_read_expect "11" "large_edges"
 '
 
+graph_git_behavior 'cleared graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'cleared graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
         cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
 	cd bare &&
+	git config core.commitGraph true &&
 	baredir="./objects"
 '
 
+graph_git_behavior 'bare repo, commit 8 vs merge 1' bare commits/8 merge/1
+graph_git_behavior 'bare repo, commit 8 vs merge 2' bare commits/8 merge/2
+
 test_expect_success 'write graph in bare repo' '
         cd "$TRASH_DIRECTORY/bare" &&
 	git commit-graph write &&
@@ -143,5 +185,8 @@ test_expect_success 'write graph in bare repo' '
 	graph_read_expect "11" "large_edges"
 '
 
+graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare commits/8 merge/1
+graph_git_behavior 'bare repo with graph, commit 8 vs merge 2' bare commits/8 merge/2
+
 test_done
 
-- 
2.14.1
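
The decoding done by fill_commit_in_graph() above can be sketched in
isolation. This is a minimal standalone example, not the patch's code. It
assumes each commit data entry is a 20-byte tree id, two 4-byte parent
positions, and 8 more bytes, of which the low 2 bits of the first word and
all of the second word form a 34-bit commit date. The constant values
(TOY_PARENT_NONE and the high-bit flags) are not shown in this patch and
are assumptions here. The names toy_commit_date and toy_parents are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_LEN 20                      /* assumption: SHA-1 sized ids */
#define TOY_PARENT_NONE 0x70000000u          /* assumed value */
#define TOY_OCTOPUS_EDGES_NEEDED 0x80000000u /* assumed value */
#define TOY_LAST_EDGE 0x80000000u            /* assumed value */
#define TOY_EDGE_MASK 0x7fffffffu            /* assumed value */

/* Read a big-endian 32-bit value without relying on alignment. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the 34-bit commit date; the other bits of the word are ignored. */
static uint64_t toy_commit_date(const unsigned char *entry)
{
	uint64_t high = read_be32(entry + TOY_HASH_LEN + 8) & 0x3;
	uint64_t low = read_be32(entry + TOY_HASH_LEN + 12);

	return (high << 32) | low;
}

/*
 * Collect the parent positions of the commit described by 'entry' into
 * 'out' (capacity 'max'). 'large_edges' is the overflow list used when a
 * commit has more than two parents. Returns the number of parents, or -1
 * if 'out' is too small.
 */
static int toy_parents(const unsigned char *entry,
		       const unsigned char *large_edges,
		       uint32_t *out, int max)
{
	const unsigned char *p;
	uint32_t edge;
	int n = 0;

	edge = read_be32(entry + TOY_HASH_LEN);
	if (edge == TOY_PARENT_NONE)
		return 0;
	if (n >= max)
		return -1;
	out[n++] = edge;

	edge = read_be32(entry + TOY_HASH_LEN + 4);
	if (edge == TOY_PARENT_NONE)
		return n;
	if (!(edge & TOY_OCTOPUS_EDGES_NEEDED)) {
		if (n >= max)
			return -1;
		out[n++] = edge;
		return n;
	}

	/* The second slot is an index into the overflow list instead. */
	assert(large_edges != NULL);
	p = large_edges + 4 * (size_t)(edge & TOY_EDGE_MASK);
	do {
		edge = read_be32(p);
		if (n >= max)
			return -1;
		out[n++] = edge & TOY_EDGE_MASK;
		p += 4;
	} while (!(edge & TOY_LAST_EDGE));

	return n;
}
```

Returning positions rather than commit structs keeps the sketch free of
Git's object lookup. In the patch, each position is turned into a struct
commit by insert_parent_or_die().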



* [PATCH v6 12/14] commit-graph: read only from specific pack-indexes
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
                     ` (10 preceding siblings ...)
  2018-03-14 19:27   ` [PATCH v6 11/14] commit: integrate commit graph with commit parsing Derrick Stolee
@ 2018-03-14 19:27   ` Derrick Stolee
  2018-03-15 22:50     ` SZEDER Gábor
  2018-03-14 19:27   ` [PATCH v6 13/14] commit-graph: build graph from starting commits Derrick Stolee
                     ` (5 subsequent siblings)
  17 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-03-14 19:27 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, sbeller, szeder.dev, ramsay, git, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git-commit-graph to inspect the objects only in a certain list
of pack-indexes within the given pack directory. This allows updating
the commit graph iteratively.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt | 11 ++++++++++-
 builtin/commit-graph.c             | 33 ++++++++++++++++++++++++++++++---
 commit-graph.c                     | 26 ++++++++++++++++++++++++--
 commit-graph.h                     |  4 +++-
 packfile.c                         |  4 ++--
 packfile.h                         |  2 ++
 t/t5318-commit-graph.sh            | 10 ++++++++++
 7 files changed, 81 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 51cb038f3d..b945510f0f 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -32,7 +32,9 @@ COMMANDS
 'write'::
 
 Write a commit graph file based on the commits found in packfiles.
-Includes all commits from the existing commit graph file.
++
+With the `--stdin-packs` option, generate the new commit graph by
+walking objects only in the specified packfiles.
 
 'read'::
 
@@ -49,6 +51,13 @@ EXAMPLES
 $ git commit-graph write
 ------------------------------------------------
 
+* Write a graph file, extending the current graph file using commits
+in <pack-index>.
++
+------------------------------------------------
+$ echo <pack-index> | git commit-graph write --stdin-packs
+------------------------------------------------
+
 * Read basic information from the commit-graph file.
 +
 ------------------------------------------------
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 0e164becff..eebca57e6f 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs]"),
 	NULL
 };
 
@@ -18,12 +18,13 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs]"),
 	NULL
 };
 
 static struct opts_commit_graph {
 	const char *obj_dir;
+	int stdin_packs;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -76,10 +77,18 @@ static int graph_read(int argc, const char **argv)
 
 static int graph_write(int argc, const char **argv)
 {
+	const char **pack_indexes = NULL;
+	int packs_nr = 0;
+	const char **lines = NULL;
+	int lines_nr = 0;
+	int lines_alloc = 0;
+
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
 			N_("dir"),
 			N_("The object directory to store the graph")),
+		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
+			N_("scan packfiles listed by stdin for commits")),
 		OPT_END(),
 	};
 
@@ -90,7 +99,25 @@ static int graph_write(int argc, const char **argv)
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
 
-	write_commit_graph(opts.obj_dir);
+	if (opts.stdin_packs) {
+		struct strbuf buf = STRBUF_INIT;
+		lines_nr = 0;
+		lines_alloc = 128;
+		ALLOC_ARRAY(lines, lines_alloc);
+
+		while (strbuf_getline(&buf, stdin) != EOF) {
+			ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
+			lines[lines_nr++] = strbuf_detach(&buf, NULL);
+		}
+
+		pack_indexes = lines;
+		packs_nr = lines_nr;
+	}
+
+	write_commit_graph(opts.obj_dir,
+			   pack_indexes,
+			   packs_nr);
+
 	return 0;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index 98e2b89b94..f0d7585ddb 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -529,7 +529,9 @@ static void close_reachable(struct packed_oid_list *oids)
 	}
 }
 
-void write_commit_graph(const char *obj_dir)
+void write_commit_graph(const char *obj_dir,
+			const char **pack_indexes,
+			int nr_packs)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -551,7 +553,27 @@ void write_commit_graph(const char *obj_dir)
 		oids.alloc = 1024;
 	ALLOC_ARRAY(oids.list, oids.alloc);
 
-	for_each_packed_object(add_packed_commits, &oids, 0);
+	if (pack_indexes) {
+		struct strbuf packname = STRBUF_INIT;
+		int dirlen;
+		strbuf_addf(&packname, "%s/pack/", obj_dir);
+		dirlen = packname.len;
+		for (i = 0; i < nr_packs; i++) {
+			struct packed_git *p;
+			strbuf_setlen(&packname, dirlen);
+			strbuf_addstr(&packname, pack_indexes[i]);
+			p = add_packed_git(packname.buf, packname.len, 1);
+			if (!p)
+				die("error adding pack %s", packname.buf);
+			if (open_pack_index(p))
+				die("error opening index for %s", packname.buf);
+			for_each_object_in_pack(p, add_packed_commits, &oids);
+			close_pack(p);
+		}
+		strbuf_release(&packname);
+	} else
+		for_each_packed_object(add_packed_commits, &oids, 0);
+
 	close_reachable(&oids);
 
 	QSORT(oids.list, oids.nr, commit_compare);
diff --git a/commit-graph.h b/commit-graph.h
index b223b9b078..65fe77075c 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -36,7 +36,9 @@ struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
-void write_commit_graph(const char *obj_dir);
+void write_commit_graph(const char *obj_dir,
+			const char **pack_indexes,
+			int nr_packs);
 
 #endif
 
diff --git a/packfile.c b/packfile.c
index 7c1a2519fc..b1d33b646a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -304,7 +304,7 @@ void close_pack_index(struct packed_git *p)
 	}
 }
 
-static void close_pack(struct packed_git *p)
+void close_pack(struct packed_git *p)
 {
 	close_pack_windows(p);
 	close_pack_fd(p);
@@ -1850,7 +1850,7 @@ int has_pack_index(const unsigned char *sha1)
 	return 1;
 }
 
-static int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data)
+int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data)
 {
 	uint32_t i;
 	int r = 0;
diff --git a/packfile.h b/packfile.h
index a7fca598d6..b341f2bf5e 100644
--- a/packfile.h
+++ b/packfile.h
@@ -63,6 +63,7 @@ extern void close_pack_index(struct packed_git *);
 
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *);
 extern void close_pack_windows(struct packed_git *);
+extern void close_pack(struct packed_git *);
 extern void close_all_packs(void);
 extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
@@ -140,6 +141,7 @@ typedef int each_packed_object_fn(const struct object_id *oid,
 				  struct packed_git *pack,
 				  uint32_t pos,
 				  void *data);
+extern int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn, void *data);
 extern int for_each_packed_object(each_packed_object_fn, void *, unsigned flags);
 
 /*
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 7bcc1b2874..5ab8b6975e 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -167,6 +167,16 @@ test_expect_success 'write graph with nothing new' '
 graph_git_behavior 'cleared graph, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'cleared graph, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph from latest pack with closure' '
+	cd "$TRASH_DIRECTORY/full" &&
+	cat new-idx | git commit-graph write --stdin-packs &&
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "9" "large_edges"
+'
+
+graph_git_behavior 'graph from pack, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'graph from pack, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
         cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
-- 
2.14.1



* [PATCH v6 13/14] commit-graph: build graph from starting commits
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
                     ` (11 preceding siblings ...)
  2018-03-14 19:27   ` [PATCH v6 12/14] commit-graph: read only from specific pack-indexes Derrick Stolee
@ 2018-03-14 19:27   ` Derrick Stolee
  2018-03-14 19:27   ` [PATCH v6 14/14] commit-graph: implement "--additive" option Derrick Stolee
                     ` (4 subsequent siblings)
  17 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-03-14 19:27 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, sbeller, szeder.dev, ramsay, git, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git-commit-graph to read commits from stdin when the
--stdin-commits flag is specified. Commits reachable from these
commits are added to the graph. This is a much faster way to construct
the graph than inspecting all packed objects, but is restricted to
known tips.

For the Linux repository, 700,000+ commits were added to the graph
file starting from 'master' in 7-9 seconds, depending on the number
of packfiles in the repo (1, 24, or 120).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
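Note for readers (not part of the commit message): the --stdin-commits
input handling amounts to "parse each line as a hex OID, skip anything
that does not parse, and collect the rest as starting points". Below is
a minimal standalone sketch of that step in plain C. It does not use
git's own helpers: struct oid, struct oid_list, parse_hex_line() and
collect_starting_oids() are hypothetical names made up for this
illustration, and unlike the parse_oid_hex() call in the patch the
sketch has no "end" out-parameter and insists on exactly 40 hex digits
per line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define RAWSZ 20            /* SHA-1 length in bytes (assumption) */
#define HEXSZ (2 * RAWSZ)   /* the same length in hex digits */

/* Hypothetical stand-ins for git's object_id and packed_oid_list. */
struct oid { unsigned char hash[RAWSZ]; };

struct oid_list {
	struct oid *list;
	size_t nr, alloc;
};

static int hexval(char c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/* Returns 0 on success, -1 if 'line' is NULL or not exactly HEXSZ hex digits. */
static int parse_hex_line(const char *line, struct oid *out)
{
	size_t i;

	if (!line || strlen(line) != HEXSZ)
		return -1;
	for (i = 0; i < RAWSZ; i++) {
		int hi = hexval(line[2 * i]);
		int lo = hexval(line[2 * i + 1]);
		if (hi < 0 || lo < 0)
			return -1;
		out->hash[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}

/* Append every well-formed line to 'oids'; malformed lines are skipped. */
static void collect_starting_oids(struct oid_list *oids,
				  const char **lines, size_t nr_lines)
{
	size_t i;

	assert(oids);
	for (i = 0; i < nr_lines; i++) {
		struct oid oid;

		if (parse_hex_line(lines[i], &oid))
			continue;
		if (oids->nr == oids->alloc) {
			/* grow geometrically, in the spirit of ALLOC_GROW */
			oids->alloc = oids->alloc ? 2 * oids->alloc : 16;
			oids->list = realloc(oids->list,
					     oids->alloc * sizeof(*oids->list));
			if (!oids->list)
				abort();
		}
		oids->list[oids->nr++] = oid;
	}
}
```

In the patch itself each parsed OID is additionally passed through
lookup_commit_reference_gently(), so tags are peeled to commits, and
the resulting list is then handed to close_reachable().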
 Documentation/git-commit-graph.txt | 14 +++++++++++++-
 builtin/commit-graph.c             | 27 +++++++++++++++++++++------
 commit-graph.c                     | 27 +++++++++++++++++++++++++--
 commit-graph.h                     |  4 +++-
 t/t5318-commit-graph.sh            | 13 +++++++++++++
 5 files changed, 75 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index b945510f0f..0710a68f2d 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -34,7 +34,13 @@ COMMANDS
 Write a commit graph file based on the commits found in packfiles.
 +
 With the `--stdin-packs` option, generate the new commit graph by
-walking objects only in the specified packfiles.
+walking objects only in the specified packfiles. (Cannot be combined
+with --stdin-commits.)
++
+With the `--stdin-commits` option, generate the new commit graph by
+walking commits starting at the commits specified in stdin as a list
+of OIDs in hex, one OID per line. (Cannot be combined with
+--stdin-packs.)
 
 'read'::
 
@@ -58,6 +64,12 @@ $ git commit-graph write
 $ echo <pack-index> | git commit-graph write --stdin-packs
 ------------------------------------------------
 
+* Write a graph file containing all reachable commits.
++
+------------------------------------------------
+$ git show-ref -s | git commit-graph write --stdin-commits
+------------------------------------------------
+
 * Read basic information from the commit-graph file.
 +
 ------------------------------------------------
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index eebca57e6f..1c7b7e72b0 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
@@ -18,13 +18,14 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
 static struct opts_commit_graph {
 	const char *obj_dir;
 	int stdin_packs;
+	int stdin_commits;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -79,6 +80,8 @@ static int graph_write(int argc, const char **argv)
 {
 	const char **pack_indexes = NULL;
 	int packs_nr = 0;
+	const char **commit_hex = NULL;
+	int commits_nr = 0;
 	const char **lines = NULL;
 	int lines_nr = 0;
 	int lines_alloc = 0;
@@ -89,6 +92,8 @@ static int graph_write(int argc, const char **argv)
 			N_("The object directory to store the graph")),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
 			N_("scan packfiles listed by stdin for commits")),
+		OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
+			N_("start walk at commits listed by stdin")),
 		OPT_END(),
 	};
 
@@ -96,10 +101,12 @@ static int graph_write(int argc, const char **argv)
 			     builtin_commit_graph_write_options,
 			     builtin_commit_graph_write_usage, 0);
 
+	if (opts.stdin_packs && opts.stdin_commits)
+		die(_("cannot use both --stdin-commits and --stdin-packs"));
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
 
-	if (opts.stdin_packs) {
+	if (opts.stdin_packs || opts.stdin_commits) {
 		struct strbuf buf = STRBUF_INIT;
 		lines_nr = 0;
 		lines_alloc = 128;
@@ -110,13 +117,21 @@ static int graph_write(int argc, const char **argv)
 			lines[lines_nr++] = strbuf_detach(&buf, NULL);
 		}
 
-		pack_indexes = lines;
-		packs_nr = lines_nr;
+		if (opts.stdin_packs) {
+			pack_indexes = lines;
+			packs_nr = lines_nr;
+		}
+		if (opts.stdin_commits) {
+			commit_hex = lines;
+			commits_nr = lines_nr;
+		}
 	}
 
 	write_commit_graph(opts.obj_dir,
 			   pack_indexes,
-			   packs_nr);
+			   packs_nr,
+			   commit_hex,
+			   commits_nr);
 
 	return 0;
 }
diff --git a/commit-graph.c b/commit-graph.c
index f0d7585ddb..9f1ba9bff6 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -531,7 +531,9 @@ static void close_reachable(struct packed_oid_list *oids)
 
 void write_commit_graph(const char *obj_dir,
 			const char **pack_indexes,
-			int nr_packs)
+			int nr_packs,
+			const char **commit_hex,
+			int nr_commits)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -571,7 +573,28 @@ void write_commit_graph(const char *obj_dir,
 			close_pack(p);
 		}
 		strbuf_release(&packname);
-	} else
+	}
+
+	if (commit_hex) {
+		for (i = 0; i < nr_commits; i++) {
+			const char *end;
+			struct object_id oid;
+			struct commit *result;
+
+			if (!commit_hex[i] || parse_oid_hex(commit_hex[i], &oid, &end))
+				continue;
+
+			result = lookup_commit_reference_gently(&oid, 1);
+
+			if (result) {
+				ALLOC_GROW(oids.list, oids.nr + 1, oids.alloc);
+				oidcpy(&oids.list[oids.nr], &(result->object.oid));
+				oids.nr++;
+			}
+		}
+	}
+
+	if (!pack_indexes && !commit_hex)
 		for_each_packed_object(add_packed_commits, &oids, 0);
 
 	close_reachable(&oids);
diff --git a/commit-graph.h b/commit-graph.h
index 65fe77075c..4c70281e70 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -38,7 +38,9 @@ struct commit_graph *load_commit_graph_one(const char *graph_file);
 
 void write_commit_graph(const char *obj_dir,
 			const char **pack_indexes,
-			int nr_packs);
+			int nr_packs,
+			const char **commit_hex,
+			int nr_commits);
 
 #endif
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 5ab8b6975e..15b50b6282 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -177,6 +177,19 @@ test_expect_success 'build graph from latest pack with closure' '
 graph_git_behavior 'graph from pack, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'graph from pack, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph from commits with closure' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git tag -a -m "merge" tag/merge merge/2 &&
+	git rev-parse tag/merge >commits-in &&
+	git rev-parse merge/1 >>commits-in &&
+	cat commits-in | git commit-graph write --stdin-commits &&
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "6"
+'
+
+graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'graph from commits, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
         cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
-- 
2.14.1



* [PATCH v6 14/14] commit-graph: implement "--additive" option
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
                     ` (12 preceding siblings ...)
  2018-03-14 19:27   ` [PATCH v6 13/14] commit-graph: build graph from starting commits Derrick Stolee
@ 2018-03-14 19:27   ` Derrick Stolee
  2018-03-14 20:10   ` [PATCH v6 00/14] Serialized Git Commit Graph Ramsay Jones
                     ` (3 subsequent siblings)
  17 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-03-14 19:27 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, sbeller, szeder.dev, ramsay, git, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git-commit-graph to add all commits from the existing
commit-graph file to the file about to be written. This should be
used when adding new commits without performing garbage collection.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
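Note for readers (not part of the commit message): the additive mode
boils down to seeding the OID list from the existing file before any
new commits are appended. The diff below reads the existing file's OID
lookup chunk as a flat array of hash_len-byte entries. Here is a
minimal standalone sketch of that idea. All names in it (struct oid,
struct oid_list, seed_from_lookup(), sort_and_dedupe()) are
hypothetical, HASH_LEN is assumed to be 20, and the in-place dedupe is
a simplification of my own: the series sorts the list with QSORT, but
this sketch should not be read as how it then handles duplicates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define HASH_LEN 20 /* assumption: SHA-1 sized hashes */

struct oid { unsigned char hash[HASH_LEN]; };

struct oid_list {
	struct oid *list;
	size_t nr, alloc;
};

static void oid_list_append(struct oid_list *oids, const unsigned char *hash)
{
	assert(oids && hash);
	if (oids->nr == oids->alloc) {
		oids->alloc = oids->alloc ? 2 * oids->alloc : 1024;
		oids->list = realloc(oids->list,
				     oids->alloc * sizeof(*oids->list));
		if (!oids->list)
			abort();
	}
	memcpy(oids->list[oids->nr++].hash, hash, HASH_LEN);
}

/* Copy the 'num' entries of a flat lookup table with stride HASH_LEN. */
static void seed_from_lookup(struct oid_list *oids,
			     const unsigned char *lookup, size_t num)
{
	size_t i;

	for (i = 0; i < num; i++)
		oid_list_append(oids, lookup + HASH_LEN * i);
}

static int oid_cmp(const void *a, const void *b)
{
	return memcmp(a, b, HASH_LEN);
}

/* Sort, then drop duplicates in place; returns the number of distinct OIDs. */
static size_t sort_and_dedupe(struct oid_list *oids)
{
	size_t i, out = 0;

	if (!oids->nr)
		return 0;
	qsort(oids->list, oids->nr, sizeof(*oids->list), oid_cmp);
	for (i = 1; i < oids->nr; i++)
		if (memcmp(oids->list[out].hash, oids->list[i].hash, HASH_LEN))
			oids->list[++out] = oids->list[i];
	oids->nr = out + 1;
	return oids->nr;
}
```

Seeding first and sorting afterwards means a commit that is both in the
old file and reachable from the new tips simply shows up twice in the
unsorted list, which the sort-and-dedupe pass then collapses.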
 Documentation/git-commit-graph.txt | 10 ++++++++++
 builtin/commit-graph.c             | 10 +++++++---
 commit-graph.c                     | 17 ++++++++++++++++-
 commit-graph.h                     |  3 ++-
 t/t5318-commit-graph.sh            | 10 ++++++++++
 5 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 0710a68f2d..ccf5e203ce 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -41,6 +41,9 @@ With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
 of OIDs in hex, one OID per line. (Cannot be combined with
 --stdin-packs.)
++
+With the `--additive` option, include all commits that are present
+in the existing commit-graph file.
 
 'read'::
 
@@ -70,6 +73,13 @@ $ echo <pack-index> | git commit-graph write --stdin-packs
 $ git show-ref -s | git commit-graph write --stdin-commits
 ------------------------------------------------
 
+* Write a graph file containing all commits in the current
+commit-graph file along with those reachable from HEAD.
++
+------------------------------------------------
+$ git rev-parse HEAD | git commit-graph write --stdin-commits --additive
+------------------------------------------------
+
 * Read basic information from the commit-graph file.
 +
 ------------------------------------------------
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 1c7b7e72b0..d26a6d6de3 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--additive] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
@@ -18,7 +18,7 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--additive] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
@@ -26,6 +26,7 @@ static struct opts_commit_graph {
 	const char *obj_dir;
 	int stdin_packs;
 	int stdin_commits;
+	int additive;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -94,6 +95,8 @@ static int graph_write(int argc, const char **argv)
 			N_("scan packfiles listed by stdin for commits")),
 		OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
 			N_("start walk at commits listed by stdin")),
+		OPT_BOOL(0, "additive", &opts.additive,
+			N_("include all commits already in the commit-graph file")),
 		OPT_END(),
 	};
 
@@ -131,7 +134,8 @@ static int graph_write(int argc, const char **argv)
 			   pack_indexes,
 			   packs_nr,
 			   commit_hex,
-			   commits_nr);
+			   commits_nr,
+			   opts.additive);
 
 	return 0;
 }
diff --git a/commit-graph.c b/commit-graph.c
index 9f1ba9bff6..6348bab82b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -533,7 +533,8 @@ void write_commit_graph(const char *obj_dir,
 			const char **pack_indexes,
 			int nr_packs,
 			const char **commit_hex,
-			int nr_commits)
+			int nr_commits,
+			int additive)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -551,10 +552,24 @@ void write_commit_graph(const char *obj_dir,
 	oids.nr = 0;
 	oids.alloc = approximate_object_count() / 4;
 
+	if (additive) {
+		prepare_commit_graph_one(obj_dir);
+		if (commit_graph)
+			oids.alloc += commit_graph->num_commits;
+	}
+
 	if (oids.alloc < 1024)
 		oids.alloc = 1024;
 	ALLOC_ARRAY(oids.list, oids.alloc);
 
+	if (additive && commit_graph) {
+		for (i = 0; i < commit_graph->num_commits; i++) {
+			const unsigned char *hash = commit_graph->chunk_oid_lookup +
+				commit_graph->hash_len * i;
+			hashcpy(oids.list[oids.nr++].hash, hash);
+		}
+	}
+
 	if (pack_indexes) {
 		struct strbuf packname = STRBUF_INIT;
 		int dirlen;
diff --git a/commit-graph.h b/commit-graph.h
index 4c70281e70..c10e436413 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -40,7 +40,8 @@ void write_commit_graph(const char *obj_dir,
 			const char **pack_indexes,
 			int nr_packs,
 			const char **commit_hex,
-			int nr_commits);
+			int nr_commits,
+			int additive);
 
 #endif
 
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 15b50b6282..e3f1351e39 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -190,6 +190,16 @@ test_expect_success 'build graph from commits with closure' '
 graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'graph from commits, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph from commits additively' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git rev-parse merge/3 | git commit-graph write --stdin-commits --additive &&
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "10" "large_edges"
+'
+
+graph_git_behavior 'additive graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'additive graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
         cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
-- 
2.14.1



* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
                     ` (13 preceding siblings ...)
  2018-03-14 19:27   ` [PATCH v6 14/14] commit-graph: implement "--additive" option Derrick Stolee
@ 2018-03-14 20:10   ` Ramsay Jones
  2018-03-14 20:43   ` Junio C Hamano
                     ` (2 subsequent siblings)
  17 siblings, 0 replies; 110+ messages in thread
From: Ramsay Jones @ 2018-03-14 20:10 UTC (permalink / raw)
  To: Derrick Stolee, git
  Cc: gitster, peff, sbeller, szeder.dev, git, Derrick Stolee



On 14/03/18 19:27, Derrick Stolee wrote:
> This v6 includes feedback around csum-file.c and the rename of hashclose()
> to finalize_hashfile(). These are the first two commits of the series, so
> they could be pulled out independently.
> 
> The only other change since v5 is that I re-ran the performance numbers
> in "commit: integrate commit graph with commit parsing".

I haven't looked at v6 (I will wait for it to hit pu), but v5 is
still causing sparse to complain.

The diff given below (on top of current pu @9e418c7c9) fixes it
for me. (Using a plain integer as a NULL pointer, in
builtin/commit-graph.c, and the 'commit_graph' symbol should be
file-local, in commit-graph.c).

Thanks!

ATB,
Ramsay Jones

-- >8 --
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 62ac26e44..855df66bd 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -31,7 +31,7 @@ static struct opts_commit_graph {
 
 static int graph_read(int argc, const char **argv)
 {
-	struct commit_graph *graph = 0;
+	struct commit_graph *graph = NULL;
 	char *graph_name;
 
 	static struct option builtin_commit_graph_read_options[] = {
diff --git a/commit-graph.c b/commit-graph.c
index 631edac4c..7b45fe85d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -182,7 +182,7 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 }
 
 /* global storage */
-struct commit_graph *commit_graph = NULL;
+static struct commit_graph *commit_graph = NULL;
 
 static void prepare_commit_graph_one(const char *obj_dir)
 {



* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
                     ` (14 preceding siblings ...)
  2018-03-14 20:10   ` [PATCH v6 00/14] Serialized Git Commit Graph Ramsay Jones
@ 2018-03-14 20:43   ` Junio C Hamano
  2018-03-15 17:23     ` Johannes Schindelin
  2018-03-16 16:28     ` Lars Schneider
  2018-03-16 15:06   ` Ævar Arnfjörð Bjarmason
  2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
  17 siblings, 2 replies; 110+ messages in thread
From: Junio C Hamano @ 2018-03-14 20:43 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, peff, sbeller, szeder.dev, ramsay, git, Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> This v6 includes feedback around csum-file.c and the rename of hashclose()
> to finalize_hashfile(). These are the first two commits of the series, so
> they could be pulled out independently.
>
> The only other change since v5 is that I re-ran the performance numbers
> in "commit: integrate commit graph with commit parsing".

Thanks.

> Hopefully this version is ready to merge. I have several follow-up topics
> in mind to submit soon after, including:

A few patches add trailing blank lines and other whitespace
breakages, which will stop my "git merge" later to 'next' and down,
as I have a pre-commit hook to catch them.

Here is the output from my "git am -s" session.

Applying: csum-file: rename hashclose() to finalize_hashfile()
Applying: csum-file: refactor finalize_hashfile() method
.git/rebase-apply/patch:109: new blank line at EOF.
+
warning: 1 line adds whitespace errors.
Applying: commit-graph: add format document
.git/rebase-apply/patch:175: new blank line at EOF.
+
warning: 1 line adds whitespace errors.
Applying: graph: add commit graph design document
.git/rebase-apply/patch:42: new blank line at EOF.
+
.git/rebase-apply/patch:109: new blank line at EOF.
+
warning: 2 lines add whitespace errors.
Applying: commit-graph: create git-commit-graph builtin
.git/rebase-apply/patch:323: space before tab in indent.
 	fd = hold_lock_file_for_update(&lk, graph_name, 0);
.git/rebase-apply/patch:334: space before tab in indent.
 		fd = hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
.git/rebase-apply/patch:385: new blank line at EOF.
+
.git/rebase-apply/patch:398: new blank line at EOF.
+
warning: 2 lines applied after fixing whitespace errors.
Applying: commit-graph: implement write_commit_graph()
.git/rebase-apply/patch:138: indent with spaces.
        cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:144: indent with spaces.
        cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:154: indent with spaces.
        cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:160: indent with spaces.
        cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:197: indent with spaces.
        cd "$TRASH_DIRECTORY/full" &&
warning: squelched 6 whitespace errors
warning: 10 lines applied after fixing whitespace errors.
Applying: commit-graph: implement 'git-commit-graph write'
Test number t5318 already taken
.git/rebase-apply/patch:346: indent with spaces.
        cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:356: indent with spaces.
        cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:366: indent with spaces.
        cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:374: indent with spaces.
        cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:384: indent with spaces.
        cd "$TRASH_DIRECTORY/bare" &&
warning: 5 lines add whitespace errors.
Applying: commit-graph: implement git commit-graph read
Applying: commit-graph: add core.commitGraph setting
Applying: commit-graph: close under reachability
.git/rebase-apply/patch:302: indent with spaces.
        cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:310: indent with spaces.
        cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:321: indent with spaces.
        cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:331: indent with spaces.
        cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:341: indent with spaces.
        cd "$TRASH_DIRECTORY/full" &&
warning: squelched 2 whitespace errors
warning: 7 lines add whitespace errors.
Applying: commit: integrate commit graph with commit parsing
.git/rebase-apply/patch:224: indent with spaces.
        cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:227: trailing whitespace.
	graph_read_expect "9" "large_edges" 
.git/rebase-apply/patch:234: indent with spaces.
        cd "$TRASH_DIRECTORY" &&
warning: 2 lines applied after fixing whitespace errors.
Applying: commit-graph: read only from specific pack-indexes
.git/rebase-apply/patch:196: indent with spaces.
        cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:209: indent with spaces.
        cd "$TRASH_DIRECTORY" &&
warning: 1 line applied after fixing whitespace errors.
Applying: commit-graph: build graph from starting commits
.git/rebase-apply/patch:148: indent with spaces.
        cd "$TRASH_DIRECTORY/full" &&
.git/rebase-apply/patch:158: indent with spaces.
        cd "$TRASH_DIRECTORY" &&
warning: 1 line applied after fixing whitespace errors.
Applying: commit-graph: implement "--additive" option



* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-14 20:43   ` Junio C Hamano
@ 2018-03-15 17:23     ` Johannes Schindelin
  2018-03-15 18:41       ` Junio C Hamano
  2018-03-16 16:28     ` Lars Schneider
  1 sibling, 1 reply; 110+ messages in thread
From: Johannes Schindelin @ 2018-03-15 17:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, git, peff, sbeller, szeder.dev, ramsay, git,
	Derrick Stolee

Hi Junio,

On Wed, 14 Mar 2018, Junio C Hamano wrote:

> A few patches add trailing blank lines and other whitespace
> breakages, which will stop my "git merge" later to 'next' and down,
> as I have a pre-commit hook to catch them.

I wonder how you cope with the intentional "whitespace breakage" caused by
a TAB after HS in my recreate-merges patch series...

> Here is the output from my "git am -s" session.
> 
> Applying: csum-file: rename hashclose() to finalize_hashfile()
> Applying: csum-file: refactor finalize_hashfile() method
> .git/rebase-apply/patch:109: new blank line at EOF.

Stolee, you definitely want to inspect those changes (`git log --check`
was introduced to show you whitespace problems). If all of those
whitespace issues are unintentional, you can fix them using `git rebase
--whitespace=fix` in the most efficient way.
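To make the three per-line categories in the quoted "git am" output
concrete ("trailing whitespace", "space before tab in indent", "indent
with spaces"), here is a rough standalone model in C. It is a
simplified sketch, not git's actual whitespace checker: check_line(),
the enum names and the fixed TAB_WIDTH of 8 are assumptions made for
the illustration, and the "new blank line at EOF" case is not modelled
because it needs whole-file context.

```c
#include <stddef.h>
#include <string.h>

enum ws_error {
	WS_TRAILING           = 1 << 0, /* SP or TAB at end of line */
	WS_SPACE_BEFORE_TAB   = 1 << 1, /* SP directly before TAB in the indent */
	WS_INDENT_WITH_SPACES = 1 << 2  /* TAB_WIDTH or more consecutive SPs in the indent */
};

#define TAB_WIDTH 8

/* Returns a bitmask of enum ws_error values for one line (no newline). */
static unsigned check_line(const char *line)
{
	unsigned errs = 0;
	size_t len = strlen(line);
	size_t i, run = 0;

	if (len && (line[len - 1] == ' ' || line[len - 1] == '\t'))
		errs |= WS_TRAILING;

	/* Walk the leading indent only. */
	for (i = 0; i < len && (line[i] == ' ' || line[i] == '\t'); i++) {
		if (line[i] == ' ') {
			/* line[i + 1] is at worst the terminating NUL */
			if (line[i + 1] == '\t')
				errs |= WS_SPACE_BEFORE_TAB;
			if (++run >= TAB_WIDTH)
				errs |= WS_INDENT_WITH_SPACES;
		} else {
			run = 0;
		}
	}
	return errs;
}
```

Feeding it the offending lines from the log above, for example the
eight-space `cd "$TRASH_DIRECTORY/full" &&` line, should flag the same
category that "git am" printed for them.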

Ciao,
Dscho


* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-15 17:23     ` Johannes Schindelin
@ 2018-03-15 18:41       ` Junio C Hamano
  2018-03-15 21:51         ` Ramsay Jones
  2018-03-16 11:50         ` Johannes Schindelin
  0 siblings, 2 replies; 110+ messages in thread
From: Junio C Hamano @ 2018-03-15 18:41 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Derrick Stolee, git, peff, sbeller, szeder.dev, ramsay, git,
	Derrick Stolee

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Stolee, you definitely want to inspect those changes (`git log --check`
> was introduced to show you whitespace problems). If all of those
> whitespace issues are unintentional, you can fix them using `git rebase
> --whitespace=fix` in the most efficient way.

Another way that may be easier (depending on the way Derrick works)
is to fetch from me and start working from there, as if they were
the last set of commits that were sent to the list.  "git log
--first-parent --oneline master..pu" would show where the tip of the
topic is.



* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-15 18:41       ` Junio C Hamano
@ 2018-03-15 21:51         ` Ramsay Jones
  2018-03-16 11:50         ` Johannes Schindelin
  1 sibling, 0 replies; 110+ messages in thread
From: Ramsay Jones @ 2018-03-15 21:51 UTC (permalink / raw)
  To: Junio C Hamano, Johannes Schindelin
  Cc: Derrick Stolee, git, peff, sbeller, szeder.dev, git,
	Derrick Stolee



On 15/03/18 18:41, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
>> Stolee, you definitely want to inspect those changes (`git log --check`
>> was introduced to show you whitespace problems). If all of those
>> whitespace issues are unintentional, you can fix them using `git rebase
>> --whitespace=fix` in the most efficient way.
> 
> Another way that may be easier (depending on the way Derrick works)
> is to fetch from me and start working from there, as if they were
> the last set of commits that were sent to the list.  "git log
> --first-parent --oneline master..pu" would show where the tip of the
> topic is.

BTW, thanks for adding the 'SQUASH??? sparse fixes' on top of that
branch - sparse is now quiet on the 'pu' branch. (The same can't
be said of static-check.pl, but that is a different issue. ;-) ).

Thanks!

ATB,
Ramsay Jones



* Re: [PATCH v6 12/14] commit-graph: read only from specific pack-indexes
  2018-03-14 19:27   ` [PATCH v6 12/14] commit-graph: read only from specific pack-indexes Derrick Stolee
@ 2018-03-15 22:50     ` SZEDER Gábor
  2018-03-19 13:13       ` Derrick Stolee
  0 siblings, 1 reply; 110+ messages in thread
From: SZEDER Gábor @ 2018-03-15 22:50 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Git mailing list, Junio C Hamano, Jeff King, Stefan Beller,
	Ramsay Jones, git, Derrick Stolee

On Wed, Mar 14, 2018 at 8:27 PM, Derrick Stolee <stolee@gmail.com> wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
>
> Teach git-commit-graph to inspect the objects only in a certain list
> of pack-indexes within the given pack directory. This allows updating
> the commit graph iteratively.

This commit message, and indeed the code itself talk about pack
indexes ...

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-commit-graph.txt | 11 ++++++++++-
>  builtin/commit-graph.c             | 33 ++++++++++++++++++++++++++++++---
>  commit-graph.c                     | 26 ++++++++++++++++++++++++--
>  commit-graph.h                     |  4 +++-
>  packfile.c                         |  4 ++--
>  packfile.h                         |  2 ++
>  t/t5318-commit-graph.sh            | 10 ++++++++++
>  7 files changed, 81 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> index 51cb038f3d..b945510f0f 100644
> --- a/Documentation/git-commit-graph.txt
> +++ b/Documentation/git-commit-graph.txt
> @@ -32,7 +32,9 @@ COMMANDS
>  'write'::
>
>  Write a commit graph file based on the commits found in packfiles.
> -Includes all commits from the existing commit graph file.
> ++
> +With the `--stdin-packs` option, generate the new commit graph by
> +walking objects only in the specified packfiles.

... but this piece of documentation ...

> +               OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
> +                       N_("scan packfiles listed by stdin for commits")),

... and this help text, and even the name of the option talk about
packfiles.


* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-15 18:41       ` Junio C Hamano
  2018-03-15 21:51         ` Ramsay Jones
@ 2018-03-16 11:50         ` Johannes Schindelin
  2018-03-16 17:27           ` Junio C Hamano
  1 sibling, 1 reply; 110+ messages in thread
From: Johannes Schindelin @ 2018-03-16 11:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, git, peff, sbeller, szeder.dev, ramsay, git,
	Derrick Stolee

Hi Junio,

On Thu, 15 Mar 2018, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > Stolee, you definitely want to inspect those changes (`git log --check`
> > was introduced to show you whitespace problems). If all of those
> > whitespace issues are unintentional, you can fix them using `git rebase
> > --whitespace=fix` in the most efficient way.
> 
> Another way that may be easier (depending on the way Derrick works)
> is to fetch from me and start working from there, as if they were
> the last set of commits that were sent to the list.  "git log
> --first-parent --oneline master..pu" would show where the tip of the
> topic is.

That is not really easier. We had that discussion before. Stolee would
have to remove your Signed-off-by: lines *manually*.

I understand that it is a trade-off between time you have to spend and
that others have to spend, and since you do not scale, that trade-off has
to be in your favor.

My hope is that we will eventually collaborate more effectively using Git
itself, then those trade-offs will become a lot less involved because the
overall cost will be a lot smaller.

Ciao,
Dscho


* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
                     ` (15 preceding siblings ...)
  2018-03-14 20:43   ` Junio C Hamano
@ 2018-03-16 15:06   ` Ævar Arnfjörð Bjarmason
  2018-03-16 16:38     ` SZEDER Gábor
  2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
  17 siblings, 1 reply; 110+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-03-16 15:06 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, peff, sbeller, szeder.dev, ramsay, git,
	Derrick Stolee


On Wed, Mar 14 2018, Derrick Stolee jotted:

> Hopefully this version is ready to merge. I have several follow-up topics
> in mind to submit soon after, including:

I've been doing some preliminary testing of this internally, all good so
far, on a relatively small repo (~100k commits) I was using for testing:

    - git -c core.commitGraph=true -C <repo> rev-list --all:
        -      /mnt/ext4_graph => min:273    mean:279    max:389    -- (273 274 275 276 277 279 282 282 345 389)
        -            /mnt/ext4 => min:1087   mean:1123   max:1175   -- (1087 1092 1092 1104 1117 1123 1126 1136 1143 1175)

This is on a fresh clone with one giant pack and where the commit graph
data was generated afterwards with "git commit-graph write" for the
*_graph repo, so it contains all the commits.

So less than 25% of the mean time it spent before. Nice. Those are times
in milliseconds over 10 runs, for this particular one I didn't get much
of an improvement in --graph, but still ~10%:

    - git -c core.commitGraph=true -C <repo> log --oneline --graph:
        -      /mnt/ext4_graph => min:1420   mean:1449   max:1586   -- (1420 1423 1428 1434 1449 1449 1490 1548 1567 1586)
        -            /mnt/ext4 => min:1547   mean:1616   max:2136   -- (1547 1557 1581 1585 1598 1616 1621 1740 1964 2136)
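
(The percentages quoted above can be checked from the mean values. A
minimal sketch, assuming a hypothetical `ratio` helper that is not part
of Git or of the benchmark tooling used here:)

```shell
# ratio A B: print A as a percentage of B (hypothetical helper).
ratio() {
	awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f%%\n", 100 * a / b }'
}

ratio 279 1123    # rev-list --all: mean with graph vs. without
ratio 1449 1616   # log --oneline --graph: mean with graph vs. without
```

The first call gives the "less than 25%" figure for rev-list; the second
shows that log --graph with the commit graph takes roughly 90% of the
time it takes without, i.e. the ~10% improvement.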

I noticed that it takes a *long* time to generate the graph, on a bigger
repo I have it takes 20 minutes, and this is a repo where repack -A -d
itself takes 5-8 minutes, probably on the upper end of that with the
bitmap, but once you do that it's relatively snappy with --stdin-commits
--additive when I feed it the new commits.

I don't have any need really to make this run in 10m instead of 20m,
just something I found interesting, i.e. how it compares to the repack
itself.


* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-14 20:43   ` Junio C Hamano
  2018-03-15 17:23     ` Johannes Schindelin
@ 2018-03-16 16:28     ` Lars Schneider
  2018-03-19 13:10       ` Derrick Stolee
  1 sibling, 1 reply; 110+ messages in thread
From: Lars Schneider @ 2018-03-16 16:28 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Git List, Jeff King, Stefan Beller, SZEDER Gábor,
	Ramsay Jones, git, Derrick Stolee, Junio C Hamano


> On 14 Mar 2018, at 21:43, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Derrick Stolee <stolee@gmail.com> writes:
> 
>> This v6 includes feedback around csum-file.c and the rename of hashclose()
>> to finalize_hashfile(). These are the first two commits of the series, so
>> they could be pulled out independently.
>> 
>> The only other change since v5 is that I re-ran the performance numbers
>> in "commit: integrate commit graph with commit parsing".
> 
> Thanks.
> 
>> Hopefully this version is ready to merge. I have several follow-up topics
>> in mind to submit soon after, including:
> 
> A few patches add trailing blank lines and other whitespace
> breakages, which will stop my "git merge" later to 'next' and down,
> as I have a pre-commit hook to catch them.

@stolee: 

I run "git --no-pager diff --check $BASE_HASH...$HEAD_HASH" to detect
these kinds of things. I run this as part of my "prepare patch" [1] script
which is inspired by a similar script originally written by Dscho.

Do you think it would make sense to mention (or even
recommend) such a script in your awesome GfW CONTRIBUTING.md?


- Lars


[1] https://github.com/larsxschneider/git-list-helper/blob/master/prepare-patch.sh#L71




* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-16 15:06   ` Ævar Arnfjörð Bjarmason
@ 2018-03-16 16:38     ` SZEDER Gábor
  2018-03-16 18:33       ` Junio C Hamano
  0 siblings, 1 reply; 110+ messages in thread
From: SZEDER Gábor @ 2018-03-16 16:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Derrick Stolee, Git mailing list, Junio C Hamano, Jeff King,
	Stefan Beller, Ramsay Jones, git, Derrick Stolee

On Fri, Mar 16, 2018 at 4:06 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:

> I noticed that it takes a *long* time to generate the graph, on a bigger
> repo I have it takes 20 minutes, and this is a repo where repack -A -d
> itself takes 5-8 minutes, probably on the upper end of that with the
> bitmap, but once you do that it's relatively snappy with --stdin-commits
> --additive when I feed it the new commits.
>
> I don't have any need really to make this run in 10m instead of 20m,
> just something I found interesting, i.e. how it compares to the repack
> itself.

You should forget '--stdin-packs' and use '--stdin-commits' to generate
the initial graph, it's much faster even without '--additive'[1].  See

  https://public-inbox.org/git/CAM0VKj=wmkBNH=psCRztXFrC13RiG1EaSw89Q6LJaNsdJDEFHg@mail.gmail.com/

I still think that the default behaviour for 'git commit-graph write'
should simply walk history from all refs instead of enumerating all
objects in all packfiles.


[1] - Please excuse the bikeshed: '--additive' is such a strange
      sounding option name, at least for me.  '--append', perhaps?


* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-16 11:50         ` Johannes Schindelin
@ 2018-03-16 17:27           ` Junio C Hamano
  2018-03-19 11:41             ` Johannes Schindelin
  0 siblings, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2018-03-16 17:27 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Derrick Stolee, git, peff, sbeller, szeder.dev, ramsay, git,
	Derrick Stolee

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > Stolee, you definitely want to inspect those changes (`git log --check`
>> > was introduced to show you whitespace problems). If all of those
>> > whitespace issues are unintentional, you can fix them using `git rebase
>> > --whitespace=fix` in the most efficient way.
>> 
>> Another way that may be easier (depending on the way Derrick works)
>> is to fetch from me and start working from there, as if they were
>> the last set of commits that were sent to the list.  "git log
>> --first-parent --oneline master..pu" would show where the tip of the
>> topic is.
>
> That is not really easier. We had that discussion before. Stolee would
> have to remove your Signed-off-by: lines *manually*.

In return, all the whitespace fixes (and other fixes if any) I did
on my end can be reused for free by the submitter, instead of having to
redo it *manually*.

If a reroll of the series does not touch one specific commit, that
commit can be left as-is; I do not see a need to remove anybody's
sign-off or add yet another of your own, if the last two sign-offs
are from you and your upstream maintainer, if you did not change
anything in what you got from the latter.  This depends on what
tool is used to work on refinement, but with "rebase -i", you'd
leave "pick" as "pick" and not "edit" or "reword" and it would do
the right thing.

If you did refine, you get an editor when you record that
refinement, so it is just a few key strokes, either "dd" or \C-k, to
do that removal *manually*.  So I am not sure why you are making a
mountain out of this molehill.

If you do want to remove the last two sign-offs (i.e. the penultimate one
by the author done during the initial submission, plus the last one
by me), well, "rebase -i" is open source.  We can add features to
the tool to help everybody collaborate better.  Extending changes
like planned addition of --signoff by Phillip, it is not all that
far-fetched to add a mechanism that notices project-specific
trailer rewrite rules in-tree and uses that in between each step to
rewrite the trailer block of the commit message, for example, and
the rule

> I understand that it is a trade-off between time you have to spend and
> that others have to spend, and since you do not scale, that trade-off has
> to be in your favor.

That tradeoff may exist, but it does not weigh in the picture above
at all.

Perhaps it is better to try to actually think of a way to work
together better, instead of just whining.


* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-16 16:38     ` SZEDER Gábor
@ 2018-03-16 18:33       ` Junio C Hamano
  2018-03-16 19:48         ` SZEDER Gábor
  2018-03-16 20:49         ` Jeff King
  0 siblings, 2 replies; 110+ messages in thread
From: Junio C Hamano @ 2018-03-16 18:33 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Git mailing list, Jeff King, Stefan Beller, Ramsay Jones, git,
	Derrick Stolee

SZEDER Gábor <szeder.dev@gmail.com> writes:

> You should forget '--stdin-packs' and use '--stdin-commits' to generate
> the initial graph, it's much faster even without '--additive'[1].  See
>
>   https://public-inbox.org/git/CAM0VKj=wmkBNH=psCRztXFrC13RiG1EaSw89Q6LJaNsdJDEFHg@mail.gmail.com/
>
> I still think that the default behaviour for 'git commit-graph write'
> should simply walk history from all refs instead of enumerating all
> objects in all packfiles.

Somehow I missed that one.  Thanks for the link to it.

It is not so surprising that history walking runs rings around
enumerating objects in packfiles, if packfiles are built well.

A well-built packfile tends to have newer objects in base form and
has deltas that go in the backward direction (older objects are
represented as deltas against newer ones).  This helps walking from
the tips of the history quite a bit, because your delta base cache
will tend to have the base object (i.e. objects in the newer part of
the history you just walked) that will be required to access the
"next" older part of the history more often than not.

Trying to read the objects in the pack in their object name order
would essentially mean reading them in a cryptographically random
order.  Half the time you will end up wanting to access an object
that is near the tip of a very deep delta chain even before you've
accessed any of the base objects in the delta chain.

> [1] - Please excuse the bikeshed: '--additive' is such a strange
>       sounding option name, at least for me.  '--append', perhaps?

Yeah, I think "fetch --append" is probably a precedent.


* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-16 18:33       ` Junio C Hamano
@ 2018-03-16 19:48         ` SZEDER Gábor
  2018-03-16 20:06           ` Jeff King
  2018-03-16 20:49         ` Jeff King
  1 sibling, 1 reply; 110+ messages in thread
From: SZEDER Gábor @ 2018-03-16 19:48 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Derrick Stolee,
	Git mailing list, Jeff King, Stefan Beller, Ramsay Jones, git,
	Derrick Stolee

On Fri, Mar 16, 2018 at 7:33 PM, Junio C Hamano <gitster@pobox.com> wrote:
> SZEDER Gábor <szeder.dev@gmail.com> writes:
>
>> You should forget '--stdin-packs' and use '--stdin-commits' to generate
>> the initial graph, it's much faster even without '--additive'[1].  See
>>
>>   https://public-inbox.org/git/CAM0VKj=wmkBNH=psCRztXFrC13RiG1EaSw89Q6LJaNsdJDEFHg@mail.gmail.com/
>>
>> I still think that the default behaviour for 'git commit-graph write'
>> should simply walk history from all refs instead of enumerating all
>> objects in all packfiles.
>
> Somehow I missed that one.  Thanks for the link to it.
>
> It is not so surprising that history walking runs rings around
> enumerating objects in packfiles, if packfiles are built well.
>
> A well-built packfile tends to have newer objects in base form and
> has deltas that go in the backward direction (older objects are
> represented as deltas against newer ones).  This helps walking from
> the tips of the history quite a bit, because your delta base cache
> will tend to have the base object (i.e. objects in the newer part of
> the history you just walked) that will be required to access the
> "next" older part of the history more often than not.
>
> Trying to read the objects in the pack in their object name order
> would essentially mean reading them in a cryptographically random
> order.  Half the time you will end up wanting to access an object
> that is near the tip of a very deep delta chain even before you've
> accessed any of the base objects in the delta chain.

I came up with a different explanation back then: we are only interested
in commit objects when creating the commit graph, and only a small-ish
fraction of all objects are commit objects, so the "enumerate objects in
packfiles" approach has to look at a lot more objects:

  # in my git fork
  $ git rev-list --all --objects |cut -d' ' -f1 |\
    git cat-file --batch-check='%(objecttype) %(objectsize)' >type-size
  $ grep -c ^commit type-size
  53754
  $ wc -l type-size
  244723 type-size

I.e. only about 20% of all objects are commit objects.

Furthermore, in order to look at an object it has to be zlib inflated
first, and since commit objects tend to be much smaller than trees and
especially blobs, there are a lot less bytes to inflate:

  $ grep ^commit type-size |cut -d' ' -f2 |avg
  34395730 / 53754 = 639
  $ cat type-size |cut -d' ' -f2 |avg
  3866685744 / 244723 = 15800

So a simple revision walk inflates less than 1% of the bytes that the
"enumerate objects in packfiles" approach has to inflate.

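(Note that `avg` in the commands above is not a standard utility. A
minimal sketch of such a helper, assuming it reads one integer per line
on stdin and prints "sum / count = mean" with the mean truncated to an
integer, which matches the output shown; the actual script used may
differ:)

```shell
# Hypothetical stand-in for the `avg` helper: sum the numbers on stdin
# and print "sum / count = truncated mean".
avg() {
	# %.0f rather than %d for the sum, so that totals above 2^31
	# print correctly with awk implementations that clamp %d.
	awk '{ sum += $1; n++ }
	     END { if (n) printf "%.0f / %d = %.0f\n", sum, n, int(sum / n) }'
}
```

With this definition, `cut -d' ' -f2 <type-size | avg` reports the total
and the average object size in bytes.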

>> [1] - Please excuse the bikeshed: '--additive' is such a strange
>>       sounding option name, at least for me.  '--append', perhaps?
>
> Yeah, I think "fetch --append" is probably a precedent.


* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-16 19:48         ` SZEDER Gábor
@ 2018-03-16 20:06           ` Jeff King
  2018-03-16 20:19             ` Jeff King
  0 siblings, 1 reply; 110+ messages in thread
From: Jeff King @ 2018-03-16 20:06 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Derrick Stolee, Git mailing list, Stefan Beller, Ramsay Jones,
	git, Derrick Stolee

On Fri, Mar 16, 2018 at 08:48:49PM +0100, SZEDER Gábor wrote:

> I came up with a different explanation back then: we are only interested
> in commit objects when creating the commit graph, and only a small-ish
> fraction of all objects are commit objects, so the "enumerate objects in
> packfiles" approach has to look at a lot more objects:
> 
>   # in my git fork
>   $ git rev-list --all --objects |cut -d' ' -f1 |\
>     git cat-file --batch-check='%(objecttype) %(objectsize)' >type-size
>   $ grep -c ^commit type-size
>   53754
>   $ wc -l type-size
>   244723 type-size
> 
> I.e. only about 20% of all objects are commit objects.
> 
> Furthermore, in order to look at an object it has to be zlib inflated
> first, and since commit objects tend to be much smaller than trees and
> especially blobs, there are a lot less bytes to inflate:
> 
>   $ grep ^commit type-size |cut -d' ' -f2 |avg
>   34395730 / 53754 = 639
>   $ cat type-size |cut -d' ' -f2 |avg
>   3866685744 / 244723 = 15800
> 
> So a simple revision walk inflates less than 1% of the bytes that the
> "enumerate objects in packfiles" approach has to inflate.

I don't think this is quite accurate. It's true that we have to
_consider_ every object, but Git is smart enough not to inflate each one
to find its type. For loose objects we just inflate the header. For
packed objects, we either pick the type directly out of the packfile
header (for a non-delta) or can walk the delta chain (without actually
looking at the data bytes!) until we hit the base.

So starting from scratch:

  git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectname)' |
  grep ^commit |
  cut -d' ' -f2 |
  git cat-file --batch

is in the same ballpark for most repos as:

  git rev-list --all |
  git cat-file --batch

though in my timings the traversal is a little bit faster (and I'd
expect that to remain the case when doing it all in a single process,
since the traversal only follows commit links, whereas processing the
object list has to do the type lookup for each object before deciding
whether to inflate it).

I'm not sure, though, if that edge would remain for incremental updates.
For instance, after we take in some new objects via "fetch", the
traversal strategy would want to do something like:

  git rev-list $new_tips --not --all |
  git cat-file --batch

whose performance will depend on the refs _currently_ in the repository,
as we load them as UNINTERESTING tips for the walk. Whereas doing:

  git show-index <.git/objects/pack/the-one-new-packfile.idx |
  cut -d' ' -f2 |
  git cat-file --batch-check='%(objecttype) %(objectname)' |
  grep ^commit |
  cut -d' ' -f2 |
  git cat-file --batch

always scales exactly with the size of the new objects (obviously that's
kind of baroque and this would all be done internally, but I'm trying to
demonstrate the algorithmic complexity). I'm not sure what the plan
would be if we explode loose objects, though. ;)

-Peff


* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-16 20:06           ` Jeff King
@ 2018-03-16 20:19             ` Jeff King
  2018-03-19 12:55               ` Derrick Stolee
  0 siblings, 1 reply; 110+ messages in thread
From: Jeff King @ 2018-03-16 20:19 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Derrick Stolee, Git mailing list, Stefan Beller, Ramsay Jones,
	git, Derrick Stolee

On Fri, Mar 16, 2018 at 04:06:39PM -0400, Jeff King wrote:

> > Furthermore, in order to look at an object it has to be zlib inflated
> > first, and since commit objects tend to be much smaller than trees and
> > especially blobs, there are a lot less bytes to inflate:
> > 
> >   $ grep ^commit type-size |cut -d' ' -f2 |avg
> >   34395730 / 53754 = 639
> >   $ cat type-size |cut -d' ' -f2 |avg
> >   3866685744 / 244723 = 15800
> > 
> > So a simple revision walk inflates less than 1% of the bytes that the
> > "enumerate objects in packfiles" approach has to inflate.
> 
> I don't think this is quite accurate. It's true that we have to
> _consider_ every object, but Git is smart enough not to inflate each one
> to find its type. For loose objects we just inflate the header. For
> packed objects, we either pick the type directly out of the packfile
> header (for a non-delta) or can walk the delta chain (without actually
> looking at the data bytes!) until we hit the base.

Hmm, so that's a big part of the problem with this patch series. It
actually _does_ unpack every object with --stdin-packs to get the type,
which is just silly. With the patch below, my time for "commit-graph
write --stdin-packs" on linux.git goes from over 5 minutes (I got bored
and killed it) to 17 seconds.

diff --git a/commit-graph.c b/commit-graph.c
index 6348bab82b..cf1da2e8c1 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -491,11 +491,12 @@ static int add_packed_commits(const struct object_id *oid,
 {
 	struct packed_oid_list *list = (struct packed_oid_list*)data;
 	enum object_type type;
-	unsigned long size;
-	void *inner_data;
 	off_t offset = nth_packed_object_offset(pack, pos);
-	inner_data = unpack_entry(pack, offset, &type, &size);
-	FREE_AND_NULL(inner_data);
+	struct object_info oi = OBJECT_INFO_INIT;
+
+	oi.typep = &type;
+	if (packed_object_info(pack, offset, &oi) < 0)
+		die("unable to get type of object %s", oid_to_hex(oid));
 
 	if (type != OBJ_COMMIT)
 		return 0;

-Peff


* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-16 18:33       ` Junio C Hamano
  2018-03-16 19:48         ` SZEDER Gábor
@ 2018-03-16 20:49         ` Jeff King
  1 sibling, 0 replies; 110+ messages in thread
From: Jeff King @ 2018-03-16 20:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: SZEDER Gábor, Ævar Arnfjörð Bjarmason,
	Derrick Stolee, Git mailing list, Stefan Beller, Ramsay Jones,
	git, Derrick Stolee

On Fri, Mar 16, 2018 at 11:33:55AM -0700, Junio C Hamano wrote:

> It is not so surprising that history walking runs rings around
> enumerating objects in packfiles, if packfiles are built well.
> 
> A well-built packfile tends to have newer objects in base form and
> has deltas that go in the backward direction (older objects are
> represented as deltas against newer ones).  This helps walking from
> the tips of the history quite a bit, because your delta base cache
> will tend to have the base object (i.e. objects in the newer part of
> the history you just walked) that will be required to access the
> "next" older part of the history more often than not.
> 
> Trying to read the objects in the pack in their object name order
> would essentially mean reading them in a cryptographically random
> order.  Half the time you will end up wanting to access an object
> that is near the tip of a very deep delta chain even before you've
> accessed any of the base objects in the delta chain.

I coincidentally was doing some experiments in this area a few weeks
ago, and found a few things:

  1. The ordering makes a _huge_ difference for accessing trees and
     blobs.

  2. Pack order (not pack-idx order) is actually the best order, since
     it tends to follow the delta patterns (it's close to traversal
     order, but packs delta families more tightly).

  3. None of this really matters for commits, since we almost never
     store them as deltas anyway.

Here are a few experiments people can do themselves to demonstrate (my
numbers here are all from linux.git, which is sort of a worst-case
for bad ordering because its size stresses the default delta cache):

  [every object in sha1 order: slow]
  $ time git cat-file --batch-all-objects --batch >/dev/null
  real	8m44.041s
  user	8m31.359s
  sys	0m12.262s

  [every object from a traversal: faster, but --objects traversals are
   actually CPU heavy due to all of the hash lookups for each tree. Note
   not just wall-clock time but the CPU since it's split across two
   processes]
  $ time git rev-list --objects --all |
         cut -d' ' -f1 |
	 git cat-file --batch >/dev/null
  real	1m2.667s
  user	0m58.537s
  sys	0m32.392s

  [every object in pack order: fastest. This is due to skipping the
   traversal overhead, and should use our delta cache quite efficiently.
   I'm assuming a single pack and no loose objects here, but the
   performance should generalize since accessing the "big" pack
   dominates]
  $ time git show-index <$(ls .git/objects/pack/*.idx) |
         sort -n |
         cut -d' ' -f2 |
	 git cat-file --batch >/dev/null
  real	0m51.718s
  user	0m50.963s
  sys	0m7.068s

  [just commits, sha1 order: not horrible]
  $ time git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectname)' |
         grep ^commit |
	 cut -d' ' -f2 |
	 git cat-file --batch >/dev/null
  real	0m8.115s
  user	0m14.033s
  sys	0m1.170s

  [just commits, pack order: slightly worse due to the extra piping, but
   obviously that could be done more quickly internally]
  $ time git show-index <$(ls .git/objects/pack/*.idx) |
         sort -n |
         cut -d' ' -f2 |
	 git cat-file --batch-check='%(objecttype) %(objectname)' |
         grep ^commit |
	 cut -d' ' -f2 |
	 git cat-file --batch >/dev/null
  real	0m21.670s
  user	0m24.867s
  sys	0m9.600s

  [and the reason is that hardly any commits get deltas]
  $ git cat-file --batch-all-objects --batch-check='%(objecttype) %(deltabase)' |
    grep ^commit >commits
  $ wc -l commits
  692596
  $ grep -v '0000000000000000000000000000000000000000' commits | wc -l
  18856
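
  (The two counts above can be turned into a fraction directly. A
  minimal sketch, assuming a hypothetical `delta_fraction` helper that
  reads the "type deltabase" lines produced by the --batch-check format
  above; a delta base made up entirely of zeros means "not a delta":)

```shell
# Hypothetical helper: count the lines whose second field (the delta
# base) is not the all-zeros object name, and print the share.
delta_fraction() {
	awk '$2 !~ /^0+$/ { d++ }
	     END { if (NR) printf "%d of %d (%.1f%%)\n", d, NR, 100 * d / NR }'
}
```

  Running `delta_fraction <commits` on the file generated above works
  out to about 2.7% of commits being stored as deltas (18856 of 692596).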

For the purposes of this patch series, I don't think the order matters
much, since we're only dealing with commits. For doing --batch-check, I
think the sha1 ordering given by "cat-file --batch-all-objects" is
convenient, and doesn't have a big impact on performance. But it's
_awful_ for --batch. I think we may want to add a sorting option to just
return the objects in the original packfile order.

-Peff


* Re: [PATCH v6 07/14] commit-graph: implement 'git-commit-graph write'
  2018-03-14 19:27   ` [PATCH v6 07/14] commit-graph: implement 'git-commit-graph write' Derrick Stolee
@ 2018-03-18 13:25     ` Ævar Arnfjörð Bjarmason
  2018-03-19 13:12       ` Derrick Stolee
  0 siblings, 1 reply; 110+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-03-18 13:25 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, peff, sbeller, szeder.dev, ramsay, git,
	Derrick Stolee


On Wed, Mar 14 2018, Derrick Stolee jotted:

> +'git commit-graph write' <options> [--object-dir <dir>]
> +
> +
> +DESCRIPTION
> +-----------
> +
> +Manage the serialized commit graph file.
> +
> +
> +OPTIONS
> +-------
> +--object-dir::
> +	Use given directory for the location of packfiles and commit graph
> +	file. The commit graph file is expected to be at <dir>/info/commit-graph
> +	and the packfiles are expected to be in <dir>/pack.

Maybe this was covered in a previous round, this series is a little hard
to follow since each version isn't In-Reply-To the version before it,
but why is this option needed, i.e. why would you do:

    git commit-graph write --object-dir=/some/path/.git/objects

As opposed to just piggy-backing on what we already have with either of:

    git --git-dir=/some/path/.git commit-graph write
    git -C /some/path commit-graph write

Is there some use-case where you have *just* the objects dir and not the
rest of the .git folder?


* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-16 17:27           ` Junio C Hamano
@ 2018-03-19 11:41             ` Johannes Schindelin
  0 siblings, 0 replies; 110+ messages in thread
From: Johannes Schindelin @ 2018-03-19 11:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, git, peff, sbeller, szeder.dev, ramsay, git,
	Derrick Stolee

Hi Junio,

On Fri, 16 Mar 2018, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > I understand that it is a trade-off between time you have to spend and
> > that others have to spend, and since you do not scale, that trade-off
> > has to be in your favor.
> 
> That tradeoff may exist, but it does not weigh in the picture above
> at all.

It does, however. You frequently do not even tell the original contributor
what changes you made while applying the patches. I know because I have
been surprised by some of those changes, long after you merged them into
`master`.

And quite honestly, my time is valuable, too. So you should stop assuming
that I, for one (and probably other contributors, too) compare carefully
what differences exist between the local topic branch and what you chose
to make of the patches. I cannot make you stop suggesting that, but I can
tell you right away that I won't do that unless I *have* to. It is an
inefficient use of my time. I wish you would also realize that it
invariably leads to your having to "touch up" iteration after iteration
because your touch-ups had not been picked up.

Having said that, I can live with the status quo. I have a track record of
being able to.

Ciao,
Dscho


* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-16 20:19             ` Jeff King
@ 2018-03-19 12:55               ` Derrick Stolee
  2018-03-20  1:17                 ` Derrick Stolee
  0 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-03-19 12:55 UTC (permalink / raw)
  To: Jeff King, SZEDER Gábor
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Git mailing list, Stefan Beller, Ramsay Jones, git,
	Derrick Stolee



On 3/16/2018 4:19 PM, Jeff King wrote:
> On Fri, Mar 16, 2018 at 04:06:39PM -0400, Jeff King wrote:
>
>>> Furthermore, in order to look at an object it has to be zlib inflated
>>> first, and since commit objects tend to be much smaller than trees and
>>> especially blobs, there are a lot less bytes to inflate:
>>>
>>>    $ grep ^commit type-size |cut -d' ' -f2 |avg
>>>    34395730 / 53754 = 639
>>>    $ cat type-size |cut -d' ' -f2 |avg
>>>    3866685744 / 244723 = 15800
>>>
>>> So a simple revision walk inflates less than 1% of the bytes that the
>>> "enumerate objects packfiles" approach has to inflate.
>> I don't think this is quite accurate. It's true that we have to
>> _consider_ every object, but Git is smart enough not to inflate each one
>> to find its type. For loose objects we just inflate the header. For
>> packed objects, we either pick the type directly out of the packfile
>> header (for a non-delta) or can walk the delta chain (without actually
>> looking at the data bytes!) until we hit the base.
> Hmm, so that's a big part of the problem with this patch series. It
> actually _does_ unpack every object with --stdin-packs to get the type,
> which is just silly. With the patch below, my time for "commit-graph
> write --stdin-packs" on linux.git goes from over 5 minutes (I got bored
> and killed it) to 17 seconds.
>
> diff --git a/commit-graph.c b/commit-graph.c
> index 6348bab82b..cf1da2e8c1 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -491,11 +491,12 @@ static int add_packed_commits(const struct object_id *oid,
>   {
>   	struct packed_oid_list *list = (struct packed_oid_list*)data;
>   	enum object_type type;
> -	unsigned long size;
> -	void *inner_data;
>   	off_t offset = nth_packed_object_offset(pack, pos);
> -	inner_data = unpack_entry(pack, offset, &type, &size);
> -	FREE_AND_NULL(inner_data);
> +	struct object_info oi = OBJECT_INFO_INIT;
> +
> +	oi.typep = &type;
> +	if (packed_object_info(pack, offset, &oi) < 0)
> +		die("unable to get type of object %s", oid_to_hex(oid));
>   
>   	if (type != OBJ_COMMIT)
>   		return 0;
>
> -Peff

Thanks for this! Fixing this performance problem is very important to 
me, as we will use the "--stdin-packs" mechanism in the GVFS scenario 
(we will walk all commits in the prefetch packs full of commits and 
trees instead of relying on refs). This speedup is very valuable!

Thanks,
-Stolee
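
For readers following along, here is a minimal sketch of why the type is
cheap to read: each packfile entry starts with a small variable-length
header that carries the object type and inflated size, ahead of the zlib
data. The toy_* names are hypothetical and this is not the
packed_object_info() implementation; delta entries (types 6 and 7) would
still need the delta-chain walk Peff describes to learn the base type.

```c
#include <assert.h>
#include <stddef.h>

/* Type codes stored in a pack entry header (hypothetical enum names). */
enum toy_type {
	TOY_COMMIT = 1,
	TOY_TREE = 2,
	TOY_BLOB = 3,
	TOY_TAG = 4,
	TOY_OFS_DELTA = 6,
	TOY_REF_DELTA = 7
};

/*
 * Decode the entry header at buf[0..len). The first byte holds a
 * continuation bit, a 3-bit type and the low 4 bits of the size; each
 * following byte adds 7 more size bits. No compressed data is read.
 * Returns the number of header bytes used, or 0 on truncated input.
 */
static size_t toy_parse_entry_header(const unsigned char *buf, size_t len,
				     int *type, unsigned long *size)
{
	size_t used = 0;
	unsigned shift = 4;
	unsigned char c;

	assert(buf && type && size);
	if (!len)
		return 0;
	c = buf[used++];
	*type = (c >> 4) & 7;
	*size = c & 15;
	while (c & 0x80) {
		if (used >= len || shift >= sizeof(*size) * 8)
			return 0;
		c = buf[used++];
		*size += (unsigned long)(c & 0x7f) << shift;
		shift += 7;
	}
	return used;
}
```

With this layout, the two bytes 0x9f 0x27 decode to a commit whose
inflated size is 639 bytes, and nothing after the header is touched.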


* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-16 16:28     ` Lars Schneider
@ 2018-03-19 13:10       ` Derrick Stolee
  0 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-03-19 13:10 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Git List, Jeff King, Stefan Beller, SZEDER Gábor,
	Ramsay Jones, git, Derrick Stolee, Junio C Hamano,
	Johannes Schindelin

On 3/16/2018 12:28 PM, Lars Schneider wrote:
>
>> On 14 Mar 2018, at 21:43, Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Derrick Stolee <stolee@gmail.com> writes:
>>
>>
>>> Hopefully this version is ready to merge. I have several follow-up topics
>>> in mind to submit soon after, including:
>> A few patches add trailing blank lines and other whitespace
>> breakages, which will stop my "git merge" later to 'next' and down,
>> as I have a pre-commit hook to catch them.
> @stolee:
>
> I run "git --no-pager diff --check $BASE_HASH...$HEAD_HASH" to detect
> these kinds of things. I run this as part of my "prepare patch" [1] script
> which is inspired by a similar script originally written by Dscho.
>
> Do you think it would make sense to mention (or even
> recommend) such a script in your awesome GfW CONTRIBUTING.md?
>
>
> - Lars
>
>
> [1] https://github.com/larsxschneider/git-list-helper/blob/master/prepare-patch.sh#L71
>

Thanks for the suggestions. Somehow I got extra whitespace doing 
copy/paste in vim and I never re-opened that file in my normal editor 
(VS Code with an extension that shows trailing whitespace).

On 3/15/2018 1:23 PM, Johannes Schindelin wrote:
> git log --check`
> was introduced to show you whitespace problems). If all of those
> whitespace issues are unintentional, you can fix them using `git rebase
> --whitespace=fix` in the most efficient way.

Thanks for both of the suggestions. The `rebase` check was already in 
the document, so I put the checks immediately above that line. PR is 
available now [1].

Thanks,
-Stolee

[1] https://github.com/git-for-windows/git/pull/1567


* Re: [PATCH v6 07/14] commit-graph: implement 'git-commit-graph write'
  2018-03-18 13:25     ` Ævar Arnfjörð Bjarmason
@ 2018-03-19 13:12       ` Derrick Stolee
  2018-03-19 14:36         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-03-19 13:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, gitster, peff, sbeller, szeder.dev, ramsay, git,
	Derrick Stolee

On 3/18/2018 9:25 AM, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Mar 14 2018, Derrick Stolee jotted:
>
>> +'git commit-graph write' <options> [--object-dir <dir>]
>> +
>> +
>> +DESCRIPTION
>> +-----------
>> +
>> +Manage the serialized commit graph file.
>> +
>> +
>> +OPTIONS
>> +-------
>> +--object-dir::
>> +	Use given directory for the location of packfiles and commit graph
>> +	file. The commit graph file is expected to be at <dir>/info/commit-graph
>> +	and the packfiles are expected to be in <dir>/pack.
> Maybe this was covered in a previous round, this series is a little hard
> to follow since each version isn't In-Reply-To the version before it,
> but why is this option needed, i.e. why would you do:
>
>      git commit-graph write --object-dir=/some/path/.git/objects
>
> As opposed to just piggy-backing on what we already have with both of:
>
>      git --git-dir=/some/path/.git commit-graph write
>      git -C /some/path commit-graph write
>
> Is there some use-case where you have *just* the objects dir and not the
> rest of the .git folder?

Yes, such as an alternate. If I remember correctly, alternates only need 
the objects directory.

In the GVFS case, we place prefetch packfiles in an alternate so there 
is only one copy of the "remote objects" per drive. The commit graph 
will be stored in that alternate.

Thanks,
-Stolee


* Re: [PATCH v6 12/14] commit-graph: read only from specific pack-indexes
  2018-03-15 22:50     ` SZEDER Gábor
@ 2018-03-19 13:13       ` Derrick Stolee
  0 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-03-19 13:13 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Git mailing list, Junio C Hamano, Jeff King, Stefan Beller,
	Ramsay Jones, git, Derrick Stolee

On 3/15/2018 6:50 PM, SZEDER Gábor wrote:
> On Wed, Mar 14, 2018 at 8:27 PM, Derrick Stolee <stolee@gmail.com> wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Teach git-commit-graph to inspect the objects only in a certain list
>> of pack-indexes within the given pack directory. This allows updating
>> the commit graph iteratively.
> This commit message, and indeed the code itself talk about pack
> indexes ...
>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>   Documentation/git-commit-graph.txt | 11 ++++++++++-
>>   builtin/commit-graph.c             | 33 ++++++++++++++++++++++++++++++---
>>   commit-graph.c                     | 26 ++++++++++++++++++++++++--
>>   commit-graph.h                     |  4 +++-
>>   packfile.c                         |  4 ++--
>>   packfile.h                         |  2 ++
>>   t/t5318-commit-graph.sh            | 10 ++++++++++
>>   7 files changed, 81 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
>> index 51cb038f3d..b945510f0f 100644
>> --- a/Documentation/git-commit-graph.txt
>> +++ b/Documentation/git-commit-graph.txt
>> @@ -32,7 +32,9 @@ COMMANDS
>>   'write'::
>>
>>   Write a commit graph file based on the commits found in packfiles.
>> -Includes all commits from the existing commit graph file.
>> ++
>> +With the `--stdin-packs` option, generate the new commit graph by
>> +walking objects only in the specified packfiles.
> ... but this piece of documentation ...
>
>> +               OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
>> +                       N_("scan packfiles listed by stdin for commits")),
> ... and this help text, and even the name of the option talk about
> packfiles.

Thanks! I'll fix that.

-Stolee


* Re: [PATCH v6 07/14] commit-graph: implement 'git-commit-graph write'
  2018-03-19 13:12       ` Derrick Stolee
@ 2018-03-19 14:36         ` Ævar Arnfjörð Bjarmason
  2018-03-19 18:27           ` Derrick Stolee
  0 siblings, 1 reply; 110+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-03-19 14:36 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, peff, sbeller, szeder.dev, ramsay, git,
	Derrick Stolee


On Mon, Mar 19 2018, Derrick Stolee jotted:

> On 3/18/2018 9:25 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Wed, Mar 14 2018, Derrick Stolee jotted:
>>
>>> +'git commit-graph write' <options> [--object-dir <dir>]
>>> +
>>> +
>>> +DESCRIPTION
>>> +-----------
>>> +
>>> +Manage the serialized commit graph file.
>>> +
>>> +
>>> +OPTIONS
>>> +-------
>>> +--object-dir::
>>> +	Use given directory for the location of packfiles and commit graph
>>> +	file. The commit graph file is expected to be at <dir>/info/commit-graph
>>> +	and the packfiles are expected to be in <dir>/pack.
>> Maybe this was covered in a previous round, this series is a little hard
>> to follow since each version isn't In-Reply-To the version before it,
>> but why is this option needed, i.e. why would you do:
>>
>>      git commit-graph write --object-dir=/some/path/.git/objects
>>
>> As opposed to just piggy-backing on what we already have with both of:
>>
>>      git --git-dir=/some/path/.git commit-graph write
>>      git -C /some/path commit-graph write
>>
>> Is there some use-case where you have *just* the objects dir and not the
>> rest of the .git folder?
>
> Yes, such as an alternate. If I remember correctly, alternates only
> need the objects directory.
>
> In the GVFS case, we place prefetch packfiles in an alternate so there
> is only one copy of the "remote objects" per drive. The commit graph
> will be stored in that alternate.

Makes sense, but we should really document this as being such an unusual
option, i.e. instead say something like.

    Use given directory for the location of packfiles and commit graph
    file. Usually you'd use the `--git-dir` or `-C` arguments to `git`
    itself. This option is here to support obscure use-cases where we
    have a stand-alone object directory. The commit graph file is
    expected to be at <dir>/info/commit-graph and the packfiles are
    expected to be in <dir>/pack.


* Re: [PATCH v6 07/14] commit-graph: implement 'git-commit-graph write'
  2018-03-19 14:36         ` Ævar Arnfjörð Bjarmason
@ 2018-03-19 18:27           ` Derrick Stolee
  2018-03-19 18:48             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-03-19 18:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, gitster, peff, sbeller, szeder.dev, ramsay, git,
	Derrick Stolee

On 3/19/2018 10:36 AM, Ævar Arnfjörð Bjarmason wrote:
> On Mon, Mar 19 2018, Derrick Stolee jotted:
>
>> On 3/18/2018 9:25 AM, Ævar Arnfjörð Bjarmason wrote:
>>> On Wed, Mar 14 2018, Derrick Stolee jotted:
>>>
>>>> +'git commit-graph write' <options> [--object-dir <dir>]
>>>> +
>>>> +
>>>> +DESCRIPTION
>>>> +-----------
>>>> +
>>>> +Manage the serialized commit graph file.
>>>> +
>>>> +
>>>> +OPTIONS
>>>> +-------
>>>> +--object-dir::
>>>> +	Use given directory for the location of packfiles and commit graph
>>>> +	file. The commit graph file is expected to be at <dir>/info/commit-graph
>>>> +	and the packfiles are expected to be in <dir>/pack.
>>> Maybe this was covered in a previous round, this series is a little hard
>>> to follow since each version isn't In-Reply-To the version before it,
>>> but why is this option needed, i.e. why would you do:
>>>
>>>       git commit-graph write --object-dir=/some/path/.git/objects
>>>
>>> As opposed to just piggy-backing on what we already have with both of:
>>>
>>>       git --git-dir=/some/path/.git commit-graph write
>>>       git -C /some/path commit-graph write
>>>
>>> Is there some use-case where you have *just* the objects dir and not the
>>> rest of the .git folder?
>> Yes, such as an alternate. If I remember correctly, alternates only
>> need the objects directory.
>>
>> In the GVFS case, we place prefetch packfiles in an alternate so there
>> is only one copy of the "remote objects" per drive. The commit graph
>> will be stored in that alternate.
> Makes sense, but we should really document this as being such an unusual
> option, i.e. instead say something like.
>
>      Use given directory for the location of packfiles and commit graph
>      file. Usually you'd use the `--git-dir` or `-C` arguments to `git`
>      itself. This option is here to support obscure use-cases where we
>      have a stand-alone object directory. The commit graph file is
>      expected to be at <dir>/info/commit-graph and the packfiles are
>      expected to be in <dir>/pack.

A slight change to your recommendation:


OPTIONS
-------
--object-dir::
         Use given directory for the location of packfiles and commit graph
         file. This parameter exists to specify the location of an alternate
         that only has the objects directory, not a full .git directory. The
         commit graph file is expected to be at <dir>/info/commit-graph and
         the packfiles are expected to be in <dir>/pack.



* Re: [PATCH v6 07/14] commit-graph: implement 'git-commit-graph write'
  2018-03-19 18:27           ` Derrick Stolee
@ 2018-03-19 18:48             ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 110+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-03-19 18:48 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, peff, sbeller, szeder.dev, ramsay, git,
	Derrick Stolee


On Mon, Mar 19 2018, Derrick Stolee jotted:

> On 3/19/2018 10:36 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Mon, Mar 19 2018, Derrick Stolee jotted:
>>
>>> On 3/18/2018 9:25 AM, Ævar Arnfjörð Bjarmason wrote:
>>>> On Wed, Mar 14 2018, Derrick Stolee jotted:
>>>>
>>>>> +'git commit-graph write' <options> [--object-dir <dir>]
>>>>> +
>>>>> +
>>>>> +DESCRIPTION
>>>>> +-----------
>>>>> +
>>>>> +Manage the serialized commit graph file.
>>>>> +
>>>>> +
>>>>> +OPTIONS
>>>>> +-------
>>>>> +--object-dir::
>>>>> +	Use given directory for the location of packfiles and commit graph
>>>>> +	file. The commit graph file is expected to be at <dir>/info/commit-graph
>>>>> +	and the packfiles are expected to be in <dir>/pack.
>>>> Maybe this was covered in a previous round, this series is a little hard
>>>> to follow since each version isn't In-Reply-To the version before it,
>>>> but why is this option needed, i.e. why would you do:
>>>>
>>>>       git commit-graph write --object-dir=/some/path/.git/objects
>>>>
>>>> As opposed to just piggy-backing on what we already have with both of:
>>>>
>>>>       git --git-dir=/some/path/.git commit-graph write
>>>>       git -C /some/path commit-graph write
>>>>
>>>> Is there some use-case where you have *just* the objects dir and not the
>>>> rest of the .git folder?
>>> Yes, such as an alternate. If I remember correctly, alternates only
>>> need the objects directory.
>>>
>>> In the GVFS case, we place prefetch packfiles in an alternate so there
>>> is only one copy of the "remote objects" per drive. The commit graph
>>> will be stored in that alternate.
>> Makes sense, but we should really document this as being such an unusual
>> option, i.e. instead say something like.
>>
>>      Use given directory for the location of packfiles and commit graph
>>      file. Usually you'd use the `--git-dir` or `-C` arguments to `git`
>>      itself. This option is here to support obscure use-cases where we
>>      have a stand-alone object directory. The commit graph file is
>>      expected to be at <dir>/info/commit-graph and the packfiles are
>>      expected to be in <dir>/pack.
>
> A slight change to your recommendation:
>
>
> OPTIONS
> -------
> --object-dir::
>  Use given directory for the location of packfiles and commit graph
>  file. This parameter exists to specify the location of an alternate
>  that only has the objects directory, not a full .git directory. The
>  commit graph file is expected to be at <dir>/info/commit-graph and
>  the packfiles are expected to be in <dir>/pack.

Sounds good. Although I think we should add

    For full .git directories use the `--git-dir` or `-C` arguments to
    git itself.

I.e. when documenting an unusual option it makes sense to have docs in
the form "this is a bit odd, usually you'd use XYZ", rather than just
"this is a bit odd".


* Re: [PATCH v6 00/14] Serialized Git Commit Graph
  2018-03-19 12:55               ` Derrick Stolee
@ 2018-03-20  1:17                 ` Derrick Stolee
  0 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-03-20  1:17 UTC (permalink / raw)
  To: Jeff King, SZEDER Gábor
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Git mailing list, Stefan Beller, Ramsay Jones, git,
	Derrick Stolee

On 3/19/2018 8:55 AM, Derrick Stolee wrote:
>
> Thanks for this! Fixing this performance problem is very important to 
> me, as we will use the "--stdin-packs" mechanism in the GVFS scenario 
> (we will walk all commits in the prefetch packs full of commits and 
> trees instead of relying on refs). This speedup is very valuable!
>
> Thanks,
> -Stolee

Also, for those interested in this series, I plan to do a rebase onto 
2.17.0, when available, as my re-roll. I pushed my responses to the 
current feedback at the GitHub PR for the series [1].

If you are planning to provide more feedback to the series, then please 
let me know and I'll delay my re-roll so you have a chance to review.

Thanks,
-Stolee

[1] https://github.com/derrickstolee/git/pull/2


* [PATCH v7 00/14] Serialized Git Commit Graph
  2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
                     ` (16 preceding siblings ...)
  2018-03-16 15:06   ` Ævar Arnfjörð Bjarmason
@ 2018-04-02 20:34   ` Derrick Stolee
  2018-04-02 20:34     ` [PATCH v7 01/14] csum-file: rename hashclose() to finalize_hashfile() Derrick Stolee
                       ` (14 more replies)
  17 siblings, 15 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-02 20:34 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, szeder.dev, ramsay, git, peff, Derrick Stolee

This patch has only a few changes since v6:

* Fixed whitespace issues using 'git rebase --whitespace=fix'

* The --stdin-packs docs now refer to "pack-indexes" instead of "packs"

* Modified description of --object-dir option to warn that its use is rare

* Replaced '--additive' with '--append'

* In "commit-graph: close under reachability" I greatly simplified
  the check that every reachable commit is included. While running
  tests I noticed that the revision walk machinery could not keep up
  with a very large queue created when combined with the '--append'
  option that added all commits from the existing file as starting
  points for the walk. The new algorithm simply appends missing commits
  to the end of the list, which are then iterated to ensure their
  parents are in the list.
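
A minimal sketch of that append-and-iterate idea, in standalone C. The
toy_commit type, the integer commit ids and close_reachable() are
hypothetical stand-ins for illustration, not the code in commit-graph.c:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARENTS 2
#define NO_PARENT (-1)

/* Hypothetical toy commit: parents are indices into a commit table. */
struct toy_commit {
	int parents[MAX_PARENTS];
};

/*
 * Close list[0..nr) under reachability. `list` and `in_list` must each
 * have room for `total` entries, and in_list[c] must already be 1 for
 * every commit c present in the list. Missing parents are appended to
 * the end, so the same loop visits them later. Returns the new length.
 */
static size_t close_reachable(const struct toy_commit *table, size_t total,
			      int *list, size_t nr, unsigned char *in_list)
{
	size_t i;
	int p;

	for (i = 0; i < nr; i++) {
		const struct toy_commit *c = &table[list[i]];

		for (p = 0; p < MAX_PARENTS; p++) {
			int parent = c->parents[p];

			if (parent == NO_PARENT || in_list[parent])
				continue;
			assert((size_t)parent < total && nr < total);
			in_list[parent] = 1;
			list[nr++] = parent;
		}
	}
	return nr;
}
```

Each commit is appended at most once, so the work is linear in the
number of commits and parent edges, with no separate walk queue.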

I have a few patch series prepared that provide further performance
improvements following this patch.

-- >8 --

This patch contains a way to serialize the commit graph.

The current implementation defines a new file format to store the graph
structure (parent relationships) and basic commit metadata (commit date,
root tree OID) in order to prevent parsing raw commits while performing
basic graph walks. For example, we do not need to parse the full commit
when performing these walks:

* 'git log --topo-order -1000' walks all reachable commits to avoid
  incorrect topological orders, but only needs the commit message for
  the top 1000 commits.

* 'git merge-base <A> <B>' may walk many commits to find the correct
  boundary between the commits reachable from A and those reachable
  from B. No commit messages are needed.

* 'git branch -vv' checks ahead/behind status for all local branches
  compared to their upstream remote branches. This is essentially as
  hard as computing merge bases for each.

The current patch speeds up these calculations by adding a check to
parse_commit_gently(): if a graph file exists, it is used to provide
the required metadata to the struct commit.
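
The shape of that check can be sketched as "look in the graph first,
fall back to parsing the object". Everything below is a hypothetical
stand-in (toy_* types, a sorted in-memory array instead of the graph
file, a stub instead of the real commit parser), meant only to show the
control flow:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical graph row; rows are sorted by oid. */
struct toy_graph_entry {
	const char *oid;	/* hex object id */
	long date;		/* commit date */
};

struct toy_commit {
	const char *oid;
	long date;
	int parsed;
	int from_graph;		/* 1 if metadata came from the graph */
};

static int cmp_oid(const void *key, const void *entry)
{
	return strcmp((const char *)key,
		      ((const struct toy_graph_entry *)entry)->oid);
}

/* Stub for the slow path: inflating and parsing the raw commit. */
static long toy_parse_buffer_date(const char *oid)
{
	(void)oid;
	return 0;
}

static int toy_parse_commit(struct toy_commit *c,
			    const struct toy_graph_entry *graph, size_t nr)
{
	const struct toy_graph_entry *e = NULL;

	assert(c && c->oid);
	if (c->parsed)
		return 0;
	/* Fast path: binary search the sorted graph rows. */
	if (graph && nr)
		e = bsearch(c->oid, graph, nr, sizeof(*graph), cmp_oid);
	if (e) {
		c->date = e->date;
		c->from_graph = 1;
	} else {
		c->date = toy_parse_buffer_date(c->oid);
		c->from_graph = 0;
	}
	c->parsed = 1;
	return 0;
}
```

Callers do not need to know which path filled in the commit, which is
why a single check at parse time can benefit every commit walk.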

The file format has room to store generation numbers, which will be
provided as a patch after this framework is merged. Generation numbers
are referenced by the design document but not implemented in order to
make the current patch focus on the graph construction process. Once
that is stable, it will be easier to add generation numbers and make
graph walks aware of generation numbers one-by-one.

By loading commits from the graph instead of parsing commit buffers, we
save a lot of time on long commit walks. Here are some performance
results for a copy of the Linux repository where 'master' has 678,653
reachable commits and is behind 'origin/master' by 59,929 commits.

| Command                          | Before | After  | Rel % |
|----------------------------------|--------|--------|-------|
| log --oneline --topo-order -1000 |  8.31s |  0.94s | -88%  |
| branch -vv                       |  1.02s |  0.14s | -86%  |
| rev-list --all                   |  5.89s |  1.07s | -81%  |
| rev-list --all --objects         | 66.15s | 58.45s | -11%  |

To test this yourself, run the following on your repo:

  git config core.commitGraph true
  git show-ref -s | git commit-graph write --stdin-commits

The second command writes a commit graph file containing every commit
reachable from your refs. Now, all git commands that walk commits will
check your graph first before consulting the ODB. You can run your own
performance comparisons by toggling the 'core.commitGraph' setting.

[1] https://github.com/derrickstolee/git/pull/2
    A GitHub pull request containing the latest version of this patch.

Derrick Stolee (14):
  csum-file: rename hashclose() to finalize_hashfile()
  csum-file: refactor finalize_hashfile() method
  commit-graph: add format document
  graph: add commit graph design document
  commit-graph: create git-commit-graph builtin
  commit-graph: implement write_commit_graph()
  commit-graph: implement git-commit-graph write
  commit-graph: implement git commit-graph read
  commit-graph: add core.commitGraph setting
  commit-graph: close under reachability
  commit: integrate commit graph with commit parsing
  commit-graph: read only from specific pack-indexes
  commit-graph: build graph from starting commits
  commit-graph: implement "--additive" option

 .gitignore                                    |   1 +
 Documentation/config.txt                      |   4 +
 Documentation/git-commit-graph.txt            |  94 +++
 .../technical/commit-graph-format.txt         |  97 +++
 Documentation/technical/commit-graph.txt      | 163 ++++
 Makefile                                      |   2 +
 alloc.c                                       |   1 +
 builtin.h                                     |   1 +
 builtin/commit-graph.c                        | 171 ++++
 builtin/index-pack.c                          |   2 +-
 builtin/pack-objects.c                        |   6 +-
 bulk-checkin.c                                |   4 +-
 cache.h                                       |   1 +
 command-list.txt                              |   1 +
 commit-graph.c                                | 738 ++++++++++++++++++
 commit-graph.h                                |  46 ++
 commit.c                                      |   3 +
 commit.h                                      |   3 +
 config.c                                      |   5 +
 contrib/completion/git-completion.bash        |   2 +
 csum-file.c                                   |  10 +-
 csum-file.h                                   |   9 +-
 environment.c                                 |   1 +
 fast-import.c                                 |   2 +-
 git.c                                         |   1 +
 pack-bitmap-write.c                           |   2 +-
 pack-write.c                                  |   5 +-
 packfile.c                                    |   4 +-
 packfile.h                                    |   2 +
 t/t5318-commit-graph.sh                       | 224 ++++++
 30 files changed, 1584 insertions(+), 21 deletions(-)
 create mode 100644 Documentation/git-commit-graph.txt
 create mode 100644 Documentation/technical/commit-graph-format.txt
 create mode 100644 Documentation/technical/commit-graph.txt
 create mode 100644 builtin/commit-graph.c
 create mode 100644 commit-graph.c
 create mode 100644 commit-graph.h
 create mode 100755 t/t5318-commit-graph.sh


base-commit: 468165c1d8a442994a825f3684528361727cd8c0
-- 
2.17.0.14.gba1221a8ce



* [PATCH v7 01/14] csum-file: rename hashclose() to finalize_hashfile()
  2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
@ 2018-04-02 20:34     ` Derrick Stolee
  2018-04-02 20:34     ` [PATCH v7 02/14] csum-file: refactor finalize_hashfile() method Derrick Stolee
                       ` (13 subsequent siblings)
  14 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-02 20:34 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, szeder.dev, ramsay, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The hashclose() method behaves very differently depending on the flags
parameter. In particular, the file descriptor is not always closed.

Perform a simple rename of "hashclose()" to "finalize_hashfile()" in
preparation for functional changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/index-pack.c   | 2 +-
 builtin/pack-objects.c | 6 +++---
 bulk-checkin.c         | 4 ++--
 csum-file.c            | 2 +-
 csum-file.h            | 4 ++--
 fast-import.c          | 2 +-
 pack-bitmap-write.c    | 2 +-
 pack-write.c           | 4 ++--
 8 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index bda84a92ef..8bcf280e0b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1270,7 +1270,7 @@ static void conclude_pack(int fix_thin_pack, const char *curr_pack, unsigned cha
 			    nr_objects - nr_objects_initial);
 		stop_progress_msg(&progress, msg.buf);
 		strbuf_release(&msg);
-		hashclose(f, tail_hash, 0);
+		finalize_hashfile(f, tail_hash, 0);
 		hashcpy(read_hash, pack_hash);
 		fixup_pack_header_footer(output_fd, pack_hash,
 					 curr_pack, nr_objects,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e9d3cfb9e3..ab3e80ee49 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -837,11 +837,11 @@ static void write_pack_file(void)
 		 * If so, rewrite it like in fast-import
 		 */
 		if (pack_to_stdout) {
-			hashclose(f, oid.hash, CSUM_CLOSE);
+			finalize_hashfile(f, oid.hash, CSUM_CLOSE);
 		} else if (nr_written == nr_remaining) {
-			hashclose(f, oid.hash, CSUM_FSYNC);
+			finalize_hashfile(f, oid.hash, CSUM_FSYNC);
 		} else {
-			int fd = hashclose(f, oid.hash, 0);
+			int fd = finalize_hashfile(f, oid.hash, 0);
 			fixup_pack_header_footer(fd, oid.hash, pack_tmp_name,
 						 nr_written, oid.hash, offset);
 			close(fd);
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 9d87eac07b..227cc9f3b1 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -35,9 +35,9 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
 		unlink(state->pack_tmp_name);
 		goto clear_exit;
 	} else if (state->nr_written == 1) {
-		hashclose(state->f, oid.hash, CSUM_FSYNC);
+		finalize_hashfile(state->f, oid.hash, CSUM_FSYNC);
 	} else {
-		int fd = hashclose(state->f, oid.hash, 0);
+		int fd = finalize_hashfile(state->f, oid.hash, 0);
 		fixup_pack_header_footer(fd, oid.hash, state->pack_tmp_name,
 					 state->nr_written, oid.hash,
 					 state->offset);
diff --git a/csum-file.c b/csum-file.c
index 5eda7fb6af..e6c95a6915 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -53,7 +53,7 @@ void hashflush(struct hashfile *f)
 	}
 }
 
-int hashclose(struct hashfile *f, unsigned char *result, unsigned int flags)
+int finalize_hashfile(struct hashfile *f, unsigned char *result, unsigned int flags)
 {
 	int fd;
 
diff --git a/csum-file.h b/csum-file.h
index 992e5c0141..9ba87f0a6c 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -26,14 +26,14 @@ struct hashfile_checkpoint {
 extern void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *);
 extern int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *);
 
-/* hashclose flags */
+/* finalize_hashfile flags */
 #define CSUM_CLOSE	1
 #define CSUM_FSYNC	2
 
 extern struct hashfile *hashfd(int fd, const char *name);
 extern struct hashfile *hashfd_check(const char *name);
 extern struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp);
-extern int hashclose(struct hashfile *, unsigned char *, unsigned int);
+extern int finalize_hashfile(struct hashfile *, unsigned char *, unsigned int);
 extern void hashwrite(struct hashfile *, const void *, unsigned int);
 extern void hashflush(struct hashfile *f);
 extern void crc32_begin(struct hashfile *);
diff --git a/fast-import.c b/fast-import.c
index b5db5d20b1..6d96f55d9d 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1016,7 +1016,7 @@ static void end_packfile(void)
 		struct tag *t;
 
 		close_pack_windows(pack_data);
-		hashclose(pack_file, cur_pack_oid.hash, 0);
+		finalize_hashfile(pack_file, cur_pack_oid.hash, 0);
 		fixup_pack_header_footer(pack_data->pack_fd, pack_data->sha1,
 				    pack_data->pack_name, object_count,
 				    cur_pack_oid.hash, pack_size);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index e01f992884..662b44f97d 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -535,7 +535,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	if (options & BITMAP_OPT_HASH_CACHE)
 		write_hash_cache(f, index, index_nr);
 
-	hashclose(f, NULL, CSUM_FSYNC);
+	finalize_hashfile(f, NULL, CSUM_FSYNC);
 
 	if (adjust_shared_perm(tmp_file.buf))
 		die_errno("unable to make temporary bitmap file readable");
diff --git a/pack-write.c b/pack-write.c
index d775c7406d..044f427392 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -170,8 +170,8 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 	}
 
 	hashwrite(f, sha1, the_hash_algo->rawsz);
-	hashclose(f, NULL, ((opts->flags & WRITE_IDX_VERIFY)
-			    ? CSUM_CLOSE : CSUM_FSYNC));
+	finalize_hashfile(f, NULL, ((opts->flags & WRITE_IDX_VERIFY)
+				    ? CSUM_CLOSE : CSUM_FSYNC));
 	return index_name;
 }
 
-- 
2.17.0.14.gba1221a8ce



* [PATCH v7 02/14] csum-file: refactor finalize_hashfile() method
  2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
  2018-04-02 20:34     ` [PATCH v7 01/14] csum-file: rename hashclose() to finalize_hashfile() Derrick Stolee
@ 2018-04-02 20:34     ` Derrick Stolee
  2018-04-07 22:59       ` Jakub Narebski
  2018-04-02 20:34     ` [PATCH v7 03/14] commit-graph: add format document Derrick Stolee
                       ` (12 subsequent siblings)
  14 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-04-02 20:34 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, szeder.dev, ramsay, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

If we want to use a hashfile on the temporary file for a lockfile, then
we need finalize_hashfile() to fully write the trailing hash but also keep
the file descriptor open.

Do this by adding a new CSUM_HASH_IN_STREAM flag along with a functional
change that checks this flag before writing the checksum to the stream.
This differs from the previous behavior, where the checksum was written
whenever either CSUM_CLOSE or CSUM_FSYNC was provided.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/pack-objects.c | 4 ++--
 bulk-checkin.c         | 2 +-
 csum-file.c            | 8 ++++----
 csum-file.h            | 5 +++--
 pack-bitmap-write.c    | 2 +-
 pack-write.c           | 5 +++--
 6 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ab3e80ee49..b09bbf4f4c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -837,9 +837,9 @@ static void write_pack_file(void)
 		 * If so, rewrite it like in fast-import
 		 */
 		if (pack_to_stdout) {
-			finalize_hashfile(f, oid.hash, CSUM_CLOSE);
+			finalize_hashfile(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_CLOSE);
 		} else if (nr_written == nr_remaining) {
-			finalize_hashfile(f, oid.hash, CSUM_FSYNC);
+			finalize_hashfile(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
 		} else {
 			int fd = finalize_hashfile(f, oid.hash, 0);
 			fixup_pack_header_footer(fd, oid.hash, pack_tmp_name,
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 227cc9f3b1..70b14fdf41 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -35,7 +35,7 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
 		unlink(state->pack_tmp_name);
 		goto clear_exit;
 	} else if (state->nr_written == 1) {
-		finalize_hashfile(state->f, oid.hash, CSUM_FSYNC);
+		finalize_hashfile(state->f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
 	} else {
 		int fd = finalize_hashfile(state->f, oid.hash, 0);
 		fixup_pack_header_footer(fd, oid.hash, state->pack_tmp_name,
diff --git a/csum-file.c b/csum-file.c
index e6c95a6915..53ce37f7ca 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -61,11 +61,11 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result, unsigned int fl
 	the_hash_algo->final_fn(f->buffer, &f->ctx);
 	if (result)
 		hashcpy(result, f->buffer);
-	if (flags & (CSUM_CLOSE | CSUM_FSYNC)) {
-		/* write checksum and close fd */
+	if (flags & CSUM_HASH_IN_STREAM)
 		flush(f, f->buffer, the_hash_algo->rawsz);
-		if (flags & CSUM_FSYNC)
-			fsync_or_die(f->fd, f->name);
+	if (flags & CSUM_FSYNC)
+		fsync_or_die(f->fd, f->name);
+	if (flags & CSUM_CLOSE) {
 		if (close(f->fd))
 			die_errno("%s: sha1 file error on close", f->name);
 		fd = 0;
diff --git a/csum-file.h b/csum-file.h
index 9ba87f0a6c..c5a2e335e7 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -27,8 +27,9 @@ extern void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *)
 extern int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *);
 
 /* finalize_hashfile flags */
-#define CSUM_CLOSE	1
-#define CSUM_FSYNC	2
+#define CSUM_CLOSE		1
+#define CSUM_FSYNC		2
+#define CSUM_HASH_IN_STREAM	4
 
 extern struct hashfile *hashfd(int fd, const char *name);
 extern struct hashfile *hashfd_check(const char *name);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 662b44f97d..db4c832428 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -535,7 +535,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	if (options & BITMAP_OPT_HASH_CACHE)
 		write_hash_cache(f, index, index_nr);
 
-	finalize_hashfile(f, NULL, CSUM_FSYNC);
+	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
 
 	if (adjust_shared_perm(tmp_file.buf))
 		die_errno("unable to make temporary bitmap file readable");
diff --git a/pack-write.c b/pack-write.c
index 044f427392..a9d46bc03f 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -170,8 +170,9 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 	}
 
 	hashwrite(f, sha1, the_hash_algo->rawsz);
-	finalize_hashfile(f, NULL, ((opts->flags & WRITE_IDX_VERIFY)
-				    ? CSUM_CLOSE : CSUM_FSYNC));
+	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_CLOSE |
+				    ((opts->flags & WRITE_IDX_VERIFY)
+				    ? 0 : CSUM_FSYNC));
 	return index_name;
 }
 
-- 
2.17.0.14.gba1221a8ce



* [PATCH v7 03/14] commit-graph: add format document
  2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
  2018-04-02 20:34     ` [PATCH v7 01/14] csum-file: rename hashclose() to finalize_hashfile() Derrick Stolee
  2018-04-02 20:34     ` [PATCH v7 02/14] csum-file: refactor finalize_hashfile() method Derrick Stolee
@ 2018-04-02 20:34     ` Derrick Stolee
  2018-04-07 23:49       ` Jakub Narebski
  2018-04-02 20:34     ` [PATCH v7 04/14] graph: add commit graph design document Derrick Stolee
                       ` (11 subsequent siblings)
  14 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-04-02 20:34 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, szeder.dev, ramsay, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Add document specifying the binary format for commit graphs. This
format allows for:

* New versions.
* New hash functions and hash lengths.
* Optional extensions.

Basic header information is followed by a binary table of contents
into "chunks" that include:

* An ordered list of commit object IDs.
* A 256-entry fanout into that list of OIDs.
* A list of metadata for the commits.
* A list of "large edges" to enable octopus merges.

The format automatically includes two parent positions for every
commit. This favors speed over space, since using only one position
per commit would cause an extra level of indirection for every merge
commit. (Octopus merges suffer from this indirection, but they are
very rare.)

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .../technical/commit-graph-format.txt         | 97 +++++++++++++++++++
 1 file changed, 97 insertions(+)
 create mode 100644 Documentation/technical/commit-graph-format.txt

diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
new file mode 100644
index 0000000000..ad6af8105c
--- /dev/null
+++ b/Documentation/technical/commit-graph-format.txt
@@ -0,0 +1,97 @@
+Git commit graph format
+=======================
+
+The Git commit graph stores a list of commit OIDs and some associated
+metadata, including:
+
+- The generation number of the commit. Commits with no parents have
+  generation number 1; commits with parents have generation number
+  one more than the maximum generation number of their parents. We
+  reserve zero as a special value that marks a generation number as
+  invalid or "not computed".
+
+- The root tree OID.
+
+- The commit date.
+
+- The parents of the commit, stored using positional references within
+  the graph file.
+
+These positional references are stored as unsigned 32-bit integers
+corresponding to the array position within the list of commit OIDs. We
+use the most-significant bit for special purposes, so we can store at most
+(1 << 31) - 1 (around 2 billion) commits.
+
+== Commit graph files have the following format:
+
+In order to allow extensions that add extra data to the graph, we organize
+the body into "chunks" and provide a binary lookup table at the beginning
+of the body. The header includes certain values, such as number of chunks
+and hash type.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+  4-byte signature:
+      The signature is: {'C', 'G', 'P', 'H'}
+
+  1-byte version number:
+      Currently, the only valid version is 1.
+
+  1-byte Hash Version (1 = SHA-1)
+      We infer the hash length (H) from this value.
+
+  1-byte number (C) of "chunks"
+
+  1-byte (reserved for later use)
+     Current clients should ignore this value.
+
+CHUNK LOOKUP:
+
+  (C + 1) * 12 bytes listing the table of contents for the chunks:
+      First 4 bytes describe the chunk id. Value 0 is a terminating label.
+      Other 8 bytes provide the byte-offset in current file for chunk to
+      start. (Chunks are ordered contiguously in the file, so you can infer
+      the length using the next chunk position if necessary.) Each chunk
+      ID appears at most once.
+
+  The remaining data in the body is described one chunk at a time, and
+  these chunks may be given in any order. Chunks are required unless
+  otherwise specified.
+
+CHUNK DATA:
+
+  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
+      The ith entry, F[i], stores the number of OIDs with first
+      byte at most i. Thus F[255] stores the total
+      number of commits (N).
+
+  OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
+      The OIDs for all commits in the graph, sorted in ascending order.
+
+  Commit Data (ID: {'C', 'D', 'A', 'T' }) (N * (H + 16) bytes)
+    * The first H bytes are for the OID of the root tree.
+    * The next 8 bytes are for the positions of the first two parents
+      of the ith commit. Stores value 0x70000000 if no parent in that
+      position. If there are more than two parents, the second value
+      has its most-significant bit on and the other bits store an array
+      position into the Large Edge List chunk.
+    * The next 8 bytes store the generation number of the commit and
+      the commit time in seconds since EPOCH. The generation number
+      uses the higher 30 bits of the first 4 bytes, while the commit
+      time uses the 32 bits of the second 4 bytes, along with the lowest
+      2 bits of the first 4 bytes, which store the 33rd and 34th bits of
+      the commit time.
+
+  Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
+      This list of 4-byte values stores the second through nth parents for
+      all octopus merges. The second parent value in the commit data stores
+      an array position within this list along with the most-significant bit
+      on. Starting at that array position, iterate through this list of commit
+      positions for the parents until reaching a value with the most-significant
+      bit on. The other bits correspond to the position of the last parent.
+
+TRAILER:
+
+	H-byte HASH-checksum of all of the above.
-- 
2.17.0.14.gba1221a8ce



* [PATCH v7 04/14] graph: add commit graph design document
  2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
                       ` (2 preceding siblings ...)
  2018-04-02 20:34     ` [PATCH v7 03/14] commit-graph: add format document Derrick Stolee
@ 2018-04-02 20:34     ` Derrick Stolee
  2018-04-08 11:06       ` Jakub Narebski
  2018-04-02 20:34     ` [PATCH v7 05/14] commit-graph: create git-commit-graph builtin Derrick Stolee
                       ` (10 subsequent siblings)
  14 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-04-02 20:34 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, szeder.dev, ramsay, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Add Documentation/technical/commit-graph.txt with details of the planned
commit graph feature, including future plans.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/commit-graph.txt | 163 +++++++++++++++++++++++
 1 file changed, 163 insertions(+)
 create mode 100644 Documentation/technical/commit-graph.txt

diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
new file mode 100644
index 0000000000..0550c6d0dc
--- /dev/null
+++ b/Documentation/technical/commit-graph.txt
@@ -0,0 +1,163 @@
+Git Commit Graph Design Notes
+=============================
+
+Git walks the commit graph for many reasons, including:
+
+1. Listing and filtering commit history.
+2. Computing merge bases.
+
+These operations can become slow as the commit count grows. The merge
+base calculation shows up in many user-facing commands, such as 'merge-base'
+or 'status' and can take minutes to compute depending on history shape.
+
+There are two main costs here:
+
+1. Decompressing and parsing commits.
+2. Walking the entire graph to satisfy topological order constraints.
+
+The commit graph file is a supplemental data structure that accelerates
+commit graph walks. If a user downgrades or disables the 'core.commitGraph'
+config setting, then the existing ODB is sufficient. The file is stored
+as "commit-graph" either in the .git/objects/info directory or in the info
+directory of an alternate.
+
+The commit graph file stores the commit graph structure along with some
+extra metadata to speed up graph walks. By listing commit OIDs in
+lexicographic order, we can identify an integer position for each commit
+and refer to the parents of a commit using those integer positions. We use
+binary search to find initial commits and then use the integer positions
+for fast lookups during the walk.
+
+A consumer may load the following info for a commit from the graph:
+
+1. The commit OID.
+2. The list of parents, along with their integer position.
+3. The commit date.
+4. The root tree OID.
+5. The generation number (see definition below).
+
+Values 1-4 satisfy the requirements of parse_commit_gently().
+
+Define the "generation number" of a commit recursively as follows:
+
+ * A commit with no parents (a root commit) has generation number one.
+
+ * A commit with at least one parent has generation number one more than
+   the largest generation number among its parents.
+
+Equivalently, the generation number of a commit A is one more than the
+length of a longest path from A to a root commit. The recursive definition
+is easier to use for computation and for observing the following property:
+
+    If A and B are distinct commits with generation numbers N and M,
+    respectively, and N <= M, then A cannot reach B. That is, we know
+    without searching that B is not an ancestor of A because it is at
+    least as far from a root commit as A.
+
+    Conversely, when checking if A is an ancestor of B, then we only need
+    to walk commits until all commits on the walk boundary have generation
+    number at most N. If we walk commits using a priority queue seeded by
+    generation numbers, then we always expand the boundary commit with highest
+    generation number and can easily detect the stopping condition.
+
+This property can be used to significantly reduce the time it takes to
+walk commits and determine topological relationships. Without generation
+numbers, the general heuristic is the following:
+
+    If A and B are commits with commit time X and Y, respectively, and
+    X < Y, then A _probably_ cannot reach B.
+
+This heuristic is currently used whenever the computation is allowed to
+violate topological relationships due to clock skew (such as "git log"
+with default order), but is not used when the topological order is
+required (such as merge base calculations, "git log --graph").
+
+In practice, we expect some commits to be created recently and not stored
+in the commit graph. We can treat these commits as having "infinite"
+generation number and walk until reaching commits with known generation
+number.
+
+Design Details
+--------------
+
+- The commit graph file is stored in a file named 'commit-graph' in the
+  .git/objects/info directory. This could be stored in the info directory
+  of an alternate.
+
+- The core.commitGraph config setting must be on to consume graph files.
+
+- The file format includes parameters for the object ID hash function,
+  so a future change of hash algorithm does not require a change in format.
+
+Future Work
+-----------
+
+- The commit graph feature currently does not honor commit grafts. This can
+  be remedied by duplicating or refactoring the current graft logic.
+
+- The 'commit-graph' subcommand does not have a "verify" mode that is
+  necessary for integration with fsck.
+
+- The file format includes room for precomputed generation numbers. These
+  are not currently computed, so all generation numbers will be marked as
+  0 (or "uncomputed"). A later patch will include this calculation.
+
+- After computing and storing generation numbers, we must make graph
+  walks aware of generation numbers to gain the performance benefits they
+  enable. This will mostly be accomplished by swapping a commit-date-ordered
+  priority queue with one ordered by generation number. The following
+  operations are important candidates:
+
+    - paint_down_to_common()
+    - 'log --topo-order'
+
+- Currently, parse_commit_gently() requires filling in the root tree
+  object for a commit. This passes through lookup_tree() and consequently
+  lookup_object(). Also, it calls lookup_commit() when loading the parents.
+  These method calls check the ODB for object existence, even if the
+  consumer does not need the content. For example, we do not need the
+  tree contents when computing merge bases. Now that commit parsing is
+  removed from the computation time, these lookup operations are the
+  slowest operations keeping graph walks from being fast. Consider
+  loading these objects without verifying their existence in the ODB and
+  only loading them fully when consumers need them. Consider a method
+  such as "ensure_tree_loaded(commit)" that fully loads a tree before
+  using commit->tree.
+
+- The current design uses the 'commit-graph' subcommand to generate the graph.
+  When this feature stabilizes enough to recommend to most users, we should
+  add automatic graph writes to common operations that create many commits.
+  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
+  commands.
+
+- A server could provide a commit graph file as part of the network protocol
+  to avoid extra calculations by clients. This feature is only of benefit if
+  the user is willing to trust the file, because verifying the file is correct
+  is as hard as computing it from scratch.
+
+Related Links
+-------------
+[0] https://bugs.chromium.org/p/git/issues/detail?id=8
+    Chromium work item for: Serialized Commit Graph
+
+[1] https://public-inbox.org/git/20110713070517.GC18566@sigill.intra.peff.net/
+    An abandoned patch that introduced generation numbers.
+
+[2] https://public-inbox.org/git/20170908033403.q7e6dj7benasrjes@sigill.intra.peff.net/
+    Discussion about generation numbers on commits and how they interact
+    with fsck.
+
+[3] https://public-inbox.org/git/20170908034739.4op3w4f2ma5s65ku@sigill.intra.peff.net/
+    More discussion about generation numbers and not storing them inside
+    commit objects. A valuable quote:
+
+    "I think we should be moving more in the direction of keeping
+     repo-local caches for optimizations. Reachability bitmaps have been
+     a big performance win. I think we should be doing the same with our
+     properties of commits. Not just generation numbers, but making it
+     cheap to access the graph structure without zlib-inflating whole
+     commit objects (i.e., packv4 or something like the "metapacks" I
+     proposed a few years ago)."
+
+[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u
+    A patch to remove the ahead-behind calculation from 'status'.
-- 
2.17.0.14.gba1221a8ce



* [PATCH v7 05/14] commit-graph: create git-commit-graph builtin
  2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
                       ` (3 preceding siblings ...)
  2018-04-02 20:34     ` [PATCH v7 04/14] graph: add commit graph design document Derrick Stolee
@ 2018-04-02 20:34     ` Derrick Stolee
  2018-04-02 20:34     ` [PATCH v7 06/14] commit-graph: implement write_commit_graph() Derrick Stolee
                       ` (9 subsequent siblings)
  14 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-02 20:34 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, szeder.dev, ramsay, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git the 'commit-graph' builtin that will be used for writing and
reading packed graph files. The current implementation is mostly
empty, except for an '--object-dir' option.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .gitignore                             |  1 +
 Documentation/git-commit-graph.txt     | 10 +++++++
 Makefile                               |  1 +
 builtin.h                              |  1 +
 builtin/commit-graph.c                 | 36 ++++++++++++++++++++++++++
 command-list.txt                       |  1 +
 contrib/completion/git-completion.bash |  2 ++
 git.c                                  |  1 +
 8 files changed, 53 insertions(+)
 create mode 100644 Documentation/git-commit-graph.txt
 create mode 100644 builtin/commit-graph.c

diff --git a/.gitignore b/.gitignore
index 833ef3b0b7..e82f90184d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -34,6 +34,7 @@
 /git-clone
 /git-column
 /git-commit
+/git-commit-graph
 /git-commit-tree
 /git-config
 /git-count-objects
diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
new file mode 100644
index 0000000000..f3b34622a8
--- /dev/null
+++ b/Documentation/git-commit-graph.txt
@@ -0,0 +1,10 @@
+git-commit-graph(1)
+===================
+
+NAME
+----
+git-commit-graph - Write and verify Git commit graph files
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index a1d8775adb..a59b62bed1 100644
--- a/Makefile
+++ b/Makefile
@@ -952,6 +952,7 @@ BUILTIN_OBJS += builtin/clone.o
 BUILTIN_OBJS += builtin/column.o
 BUILTIN_OBJS += builtin/commit-tree.o
 BUILTIN_OBJS += builtin/commit.o
+BUILTIN_OBJS += builtin/commit-graph.o
 BUILTIN_OBJS += builtin/config.o
 BUILTIN_OBJS += builtin/count-objects.o
 BUILTIN_OBJS += builtin/credential.o
diff --git a/builtin.h b/builtin.h
index 42378f3aa4..079855b6d4 100644
--- a/builtin.h
+++ b/builtin.h
@@ -149,6 +149,7 @@ extern int cmd_clone(int argc, const char **argv, const char *prefix);
 extern int cmd_clean(int argc, const char **argv, const char *prefix);
 extern int cmd_column(int argc, const char **argv, const char *prefix);
 extern int cmd_commit(int argc, const char **argv, const char *prefix);
+extern int cmd_commit_graph(int argc, const char **argv, const char *prefix);
 extern int cmd_commit_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_config(int argc, const char **argv, const char *prefix);
 extern int cmd_count_objects(int argc, const char **argv, const char *prefix);
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
new file mode 100644
index 0000000000..b466ecd781
--- /dev/null
+++ b/builtin/commit-graph.c
@@ -0,0 +1,36 @@
+#include "builtin.h"
+#include "config.h"
+#include "parse-options.h"
+
+static char const * const builtin_commit_graph_usage[] = {
+	N_("git commit-graph [--object-dir <objdir>]"),
+	NULL
+};
+
+static struct opts_commit_graph {
+	const char *obj_dir;
+} opts;
+
+
+int cmd_commit_graph(int argc, const char **argv, const char *prefix)
+{
+	static struct option builtin_commit_graph_options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			N_("dir"),
+			N_("The object directory to store the graph")),
+		OPT_END(),
+	};
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(builtin_commit_graph_usage,
+				   builtin_commit_graph_options);
+
+	git_config(git_default_config, NULL);
+	argc = parse_options(argc, argv, prefix,
+			     builtin_commit_graph_options,
+			     builtin_commit_graph_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+
+	usage_with_options(builtin_commit_graph_usage,
+			   builtin_commit_graph_options);
+}
diff --git a/command-list.txt b/command-list.txt
index a1fad28fd8..835c5890be 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -34,6 +34,7 @@ git-clean                               mainporcelain
 git-clone                               mainporcelain           init
 git-column                              purehelpers
 git-commit                              mainporcelain           history
+git-commit-graph                        plumbingmanipulators
 git-commit-tree                         plumbingmanipulators
 git-config                              ancillarymanipulators
 git-count-objects                       ancillaryinterrogators
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index b09c8a2362..6726daaf69 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -878,6 +878,7 @@ __git_list_porcelain_commands ()
 		check-ref-format) : plumbing;;
 		checkout-index)   : plumbing;;
 		column)           : internal helper;;
+		commit-graph)     : plumbing;;
 		commit-tree)      : plumbing;;
 		count-objects)    : infrequent;;
 		credential)       : credentials;;
@@ -2350,6 +2351,7 @@ _git_config ()
 		core.bigFileThreshold
 		core.checkStat
 		core.commentChar
+		core.commitGraph
 		core.compression
 		core.createObject
 		core.deltaBaseCacheLimit
diff --git a/git.c b/git.c
index ceaa58ef40..2808c51de9 100644
--- a/git.c
+++ b/git.c
@@ -388,6 +388,7 @@ static struct cmd_struct commands[] = {
 	{ "clone", cmd_clone },
 	{ "column", cmd_column, RUN_SETUP_GENTLY },
 	{ "commit", cmd_commit, RUN_SETUP | NEED_WORK_TREE },
+	{ "commit-graph", cmd_commit_graph, RUN_SETUP },
 	{ "commit-tree", cmd_commit_tree, RUN_SETUP },
 	{ "config", cmd_config, RUN_SETUP_GENTLY | DELAY_PAGER_CONFIG },
 	{ "count-objects", cmd_count_objects, RUN_SETUP },
-- 
2.17.0.14.gba1221a8ce



* [PATCH v7 06/14] commit-graph: implement write_commit_graph()
  2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
                       ` (4 preceding siblings ...)
  2018-04-02 20:34     ` [PATCH v7 05/14] commit-graph: create git-commit-graph builtin Derrick Stolee
@ 2018-04-02 20:34     ` Derrick Stolee
  2018-04-02 20:34     ` [PATCH v7 07/14] commit-graph: implement git-commit-graph write Derrick Stolee
                       ` (8 subsequent siblings)
  14 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-02 20:34 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, szeder.dev, ramsay, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach Git to write a commit graph file by checking all packed objects
to see if they are commits, then store the file in the given object
directory.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Makefile       |   1 +
 commit-graph.c | 359 +++++++++++++++++++++++++++++++++++++++++++++++++
 commit-graph.h |   6 +
 3 files changed, 366 insertions(+)
 create mode 100644 commit-graph.c
 create mode 100644 commit-graph.h

diff --git a/Makefile b/Makefile
index a59b62bed1..26a23257e9 100644
--- a/Makefile
+++ b/Makefile
@@ -777,6 +777,7 @@ LIB_OBJS += color.o
 LIB_OBJS += column.o
 LIB_OBJS += combine-diff.o
 LIB_OBJS += commit.o
+LIB_OBJS += commit-graph.o
 LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += config.o
diff --git a/commit-graph.c b/commit-graph.c
new file mode 100644
index 0000000000..f3f7c4f189
--- /dev/null
+++ b/commit-graph.c
@@ -0,0 +1,359 @@
+#include "cache.h"
+#include "config.h"
+#include "git-compat-util.h"
+#include "lockfile.h"
+#include "pack.h"
+#include "packfile.h"
+#include "commit.h"
+#include "object.h"
+#include "revision.h"
+#include "sha1-lookup.h"
+#include "commit-graph.h"
+
+#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
+#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
+#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
+#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
+#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
+
+#define GRAPH_DATA_WIDTH 36
+
+#define GRAPH_VERSION_1 0x1
+#define GRAPH_VERSION GRAPH_VERSION_1
+
+#define GRAPH_OID_VERSION_SHA1 1
+#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
+#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
+#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
+
+#define GRAPH_OCTOPUS_EDGES_NEEDED 0x80000000
+#define GRAPH_PARENT_MISSING 0x7fffffff
+#define GRAPH_EDGE_LAST_MASK 0x7fffffff
+#define GRAPH_PARENT_NONE 0x70000000
+
+#define GRAPH_LAST_EDGE 0x80000000
+
+#define GRAPH_FANOUT_SIZE (4 * 256)
+#define GRAPH_CHUNKLOOKUP_WIDTH 12
+#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
+			GRAPH_OID_LEN + 8)
+
+
+static char *get_commit_graph_filename(const char *obj_dir)
+{
+	return xstrfmt("%s/info/commit-graph", obj_dir);
+}
+
+static void write_graph_chunk_fanout(struct hashfile *f,
+				     struct commit **commits,
+				     int nr_commits)
+{
+	int i, count = 0;
+	struct commit **list = commits;
+
+	/*
+	 * Write the first-level table (the list is sorted,
+	 * but we use a 256-entry lookup to be able to avoid
+	 * having to do eight extra binary search iterations).
+	 */
+	for (i = 0; i < 256; i++) {
+		while (count < nr_commits) {
+			if ((*list)->object.oid.hash[0] != i)
+				break;
+			count++;
+			list++;
+		}
+
+		hashwrite_be32(f, count);
+	}
+}
+
+static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
+				   struct commit **commits, int nr_commits)
+{
+	struct commit **list = commits;
+	int count;
+	for (count = 0; count < nr_commits; count++, list++)
+		hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
+}
+
+static const unsigned char *commit_to_sha1(size_t index, void *table)
+{
+	struct commit **commits = table;
+	return commits[index]->object.oid.hash;
+}
+
+static void write_graph_chunk_data(struct hashfile *f, int hash_len,
+				   struct commit **commits, int nr_commits)
+{
+	struct commit **list = commits;
+	struct commit **last = commits + nr_commits;
+	uint32_t num_extra_edges = 0;
+
+	while (list < last) {
+		struct commit_list *parent;
+		int edge_value;
+		uint32_t packedDate[2];
+
+		parse_commit(*list);
+		hashwrite(f, (*list)->tree->object.oid.hash, hash_len);
+
+		parent = (*list)->parents;
+
+		if (!parent)
+			edge_value = GRAPH_PARENT_NONE;
+		else {
+			edge_value = sha1_pos(parent->item->object.oid.hash,
+					      commits,
+					      nr_commits,
+					      commit_to_sha1);
+
+			if (edge_value < 0)
+				edge_value = GRAPH_PARENT_MISSING;
+		}
+
+		hashwrite_be32(f, edge_value);
+
+		if (parent)
+			parent = parent->next;
+
+		if (!parent)
+			edge_value = GRAPH_PARENT_NONE;
+		else if (parent->next)
+			edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | num_extra_edges;
+		else {
+			edge_value = sha1_pos(parent->item->object.oid.hash,
+					      commits,
+					      nr_commits,
+					      commit_to_sha1);
+			if (edge_value < 0)
+				edge_value = GRAPH_PARENT_MISSING;
+		}
+
+		hashwrite_be32(f, edge_value);
+
+		if (edge_value & GRAPH_OCTOPUS_EDGES_NEEDED) {
+			do {
+				num_extra_edges++;
+				parent = parent->next;
+			} while (parent);
+		}
+
+		if (sizeof((*list)->date) > 4)
+			packedDate[0] = htonl(((*list)->date >> 32) & 0x3);
+		else
+			packedDate[0] = 0;
+
+		packedDate[1] = htonl((*list)->date);
+		hashwrite(f, packedDate, 8);
+
+		list++;
+	}
+}
+
+static void write_graph_chunk_large_edges(struct hashfile *f,
+					  struct commit **commits,
+					  int nr_commits)
+{
+	struct commit **list = commits;
+	struct commit **last = commits + nr_commits;
+	struct commit_list *parent;
+
+	while (list < last) {
+		int num_parents = 0;
+		for (parent = (*list)->parents; num_parents < 3 && parent;
+		     parent = parent->next)
+			num_parents++;
+
+		if (num_parents <= 2) {
+			list++;
+			continue;
+		}
+
+		/* Since num_parents > 2, this initializer is safe. */
+		for (parent = (*list)->parents->next; parent; parent = parent->next) {
+			int edge_value = sha1_pos(parent->item->object.oid.hash,
+						  commits,
+						  nr_commits,
+						  commit_to_sha1);
+
+			if (edge_value < 0)
+				edge_value = GRAPH_PARENT_MISSING;
+			else if (!parent->next)
+				edge_value |= GRAPH_LAST_EDGE;
+
+			hashwrite_be32(f, edge_value);
+		}
+
+		list++;
+	}
+}
+
+static int commit_compare(const void *_a, const void *_b)
+{
+	const struct object_id *a = (const struct object_id *)_a;
+	const struct object_id *b = (const struct object_id *)_b;
+	return oidcmp(a, b);
+}
+
+struct packed_commit_list {
+	struct commit **list;
+	int nr;
+	int alloc;
+};
+
+struct packed_oid_list {
+	struct object_id *list;
+	int nr;
+	int alloc;
+};
+
+static int add_packed_commits(const struct object_id *oid,
+			      struct packed_git *pack,
+			      uint32_t pos,
+			      void *data)
+{
+	struct packed_oid_list *list = (struct packed_oid_list*)data;
+	enum object_type type;
+	off_t offset = nth_packed_object_offset(pack, pos);
+	struct object_info oi = OBJECT_INFO_INIT;
+
+	oi.typep = &type;
+	if (packed_object_info(pack, offset, &oi) < 0)
+		die("unable to get type of object %s", oid_to_hex(oid));
+
+	if (type != OBJ_COMMIT)
+		return 0;
+
+	ALLOC_GROW(list->list, list->nr + 1, list->alloc);
+	oidcpy(&(list->list[list->nr]), oid);
+	list->nr++;
+
+	return 0;
+}
+
+void write_commit_graph(const char *obj_dir)
+{
+	struct packed_oid_list oids;
+	struct packed_commit_list commits;
+	struct hashfile *f;
+	uint32_t i, count_distinct = 0;
+	char *graph_name;
+	int fd;
+	struct lock_file lk = LOCK_INIT;
+	uint32_t chunk_ids[5];
+	uint64_t chunk_offsets[5];
+	int num_chunks;
+	int num_extra_edges;
+	struct commit_list *parent;
+
+	oids.nr = 0;
+	oids.alloc = approximate_object_count() / 4;
+
+	if (oids.alloc < 1024)
+		oids.alloc = 1024;
+	ALLOC_ARRAY(oids.list, oids.alloc);
+
+	for_each_packed_object(add_packed_commits, &oids, 0);
+
+	QSORT(oids.list, oids.nr, commit_compare);
+
+	count_distinct = 1;
+	for (i = 1; i < oids.nr; i++) {
+		if (oidcmp(&oids.list[i-1], &oids.list[i]))
+			count_distinct++;
+	}
+
+	if (count_distinct >= GRAPH_PARENT_MISSING)
+		die(_("the commit graph format cannot write %u commits"), count_distinct);
+
+	commits.nr = 0;
+	commits.alloc = count_distinct;
+	ALLOC_ARRAY(commits.list, commits.alloc);
+
+	num_extra_edges = 0;
+	for (i = 0; i < oids.nr; i++) {
+		int num_parents = 0;
+		if (i > 0 && !oidcmp(&oids.list[i-1], &oids.list[i]))
+			continue;
+
+		commits.list[commits.nr] = lookup_commit(&oids.list[i]);
+		parse_commit(commits.list[commits.nr]);
+
+		for (parent = commits.list[commits.nr]->parents;
+		     parent; parent = parent->next)
+			num_parents++;
+
+		if (num_parents > 2)
+			num_extra_edges += num_parents - 1;
+
+		commits.nr++;
+	}
+	num_chunks = num_extra_edges ? 4 : 3;
+
+	if (commits.nr >= GRAPH_PARENT_MISSING)
+		die(_("too many commits to write graph"));
+
+	graph_name = get_commit_graph_filename(obj_dir);
+	fd = hold_lock_file_for_update(&lk, graph_name, 0);
+
+	if (fd < 0) {
+		struct strbuf folder = STRBUF_INIT;
+		strbuf_addstr(&folder, graph_name);
+		strbuf_setlen(&folder, strrchr(folder.buf, '/') - folder.buf);
+
+		if (mkdir(folder.buf, 0777) < 0)
+			die_errno(_("cannot mkdir %s"), folder.buf);
+		strbuf_release(&folder);
+
+		fd = hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
+
+		if (fd < 0)
+			die_errno("unable to create '%s'", graph_name);
+	}
+
+	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
+
+	hashwrite_be32(f, GRAPH_SIGNATURE);
+
+	hashwrite_u8(f, GRAPH_VERSION);
+	hashwrite_u8(f, GRAPH_OID_VERSION);
+	hashwrite_u8(f, num_chunks);
+	hashwrite_u8(f, 0); /* unused padding byte */
+
+	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
+	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
+	chunk_ids[2] = GRAPH_CHUNKID_DATA;
+	if (num_extra_edges)
+		chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES;
+	else
+		chunk_ids[3] = 0;
+	chunk_ids[4] = 0;
+
+	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
+	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
+	chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
+	chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
+	chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
+
+	for (i = 0; i <= num_chunks; i++) {
+		uint32_t chunk_write[3];
+
+		chunk_write[0] = htonl(chunk_ids[i]);
+		chunk_write[1] = htonl(chunk_offsets[i] >> 32);
+		chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff);
+		hashwrite(f, chunk_write, 12);
+	}
+
+	write_graph_chunk_fanout(f, commits.list, commits.nr);
+	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
+	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+	write_graph_chunk_large_edges(f, commits.list, commits.nr);
+
+	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
+	commit_lock_file(&lk);
+
+	free(oids.list);
+	oids.alloc = 0;
+	oids.nr = 0;
+}
diff --git a/commit-graph.h b/commit-graph.h
new file mode 100644
index 0000000000..16fea993ab
--- /dev/null
+++ b/commit-graph.h
@@ -0,0 +1,6 @@
+#ifndef COMMIT_GRAPH_H
+#define COMMIT_GRAPH_H
+
+void write_commit_graph(const char *obj_dir);
+
+#endif
-- 
2.17.0.14.gba1221a8ce



* [PATCH v7 07/14] commit-graph: implement git-commit-graph write
  2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
                       ` (5 preceding siblings ...)
  2018-04-02 20:34     ` [PATCH v7 06/14] commit-graph: implement write_commit_graph() Derrick Stolee
@ 2018-04-02 20:34     ` Derrick Stolee
  2018-04-08 11:59       ` Jakub Narebski
  2018-04-02 20:34     ` [PATCH v7 08/14] commit-graph: implement git commit-graph read Derrick Stolee
                       ` (7 subsequent siblings)
  14 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-04-02 20:34 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, szeder.dev, ramsay, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git-commit-graph to write graph files. Create a new test script to
verify that this command succeeds.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  41 ++++++++++
 builtin/commit-graph.c             |  33 ++++++++
 t/t5318-commit-graph.sh            | 124 +++++++++++++++++++++++++++++
 3 files changed, 198 insertions(+)
 create mode 100755 t/t5318-commit-graph.sh

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index f3b34622a8..47996e8f89 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -5,6 +5,47 @@ NAME
 ----
 git-commit-graph - Write and verify Git commit graph files
 
+
+SYNOPSIS
+--------
+[verse]
+'git commit-graph write' <options> [--object-dir <dir>]
+
+
+DESCRIPTION
+-----------
+
+Manage the serialized commit graph file.
+
+
+OPTIONS
+-------
+--object-dir::
+	Use given directory for the location of packfiles and commit graph
+	file. This parameter exists to specify the location of an alternate
+	that only has the objects directory, not a full .git directory. The
+	commit graph file is expected to be at <dir>/info/commit-graph and
+	the packfiles are expected to be in <dir>/pack.
+
+
+COMMANDS
+--------
+'write'::
+
+Write a commit graph file based on the commits found in packfiles.
+Includes all commits from the existing commit graph file.
+
+
+EXAMPLES
+--------
+
+* Write a commit graph file for the packed commits in your local .git folder.
++
+------------------------------------------------
+$ git commit-graph write
+------------------------------------------------
+
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index b466ecd781..26b6360289 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -1,9 +1,18 @@
 #include "builtin.h"
 #include "config.h"
+#include "dir.h"
+#include "lockfile.h"
 #include "parse-options.h"
+#include "commit-graph.h"
 
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
+	N_("git commit-graph write [--object-dir <objdir>]"),
+	NULL
+};
+
+static const char * const builtin_commit_graph_write_usage[] = {
+	N_("git commit-graph write [--object-dir <objdir>]"),
 	NULL
 };
 
@@ -11,6 +20,25 @@ static struct opts_commit_graph {
 	const char *obj_dir;
 } opts;
 
+static int graph_write(int argc, const char **argv)
+{
+	static struct option builtin_commit_graph_write_options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			N_("dir"),
+			N_("The object directory to store the graph")),
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL,
+			     builtin_commit_graph_write_options,
+			     builtin_commit_graph_write_usage, 0);
+
+	if (!opts.obj_dir)
+		opts.obj_dir = get_object_directory();
+
+	write_commit_graph(opts.obj_dir);
+	return 0;
+}
 
 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 {
@@ -31,6 +59,11 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 			     builtin_commit_graph_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
+	if (argc > 0) {
+		if (!strcmp(argv[0], "write"))
+			return graph_write(argc, argv);
+	}
+
 	usage_with_options(builtin_commit_graph_usage,
 			   builtin_commit_graph_options);
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
new file mode 100755
index 0000000000..d7b635bd68
--- /dev/null
+++ b/t/t5318-commit-graph.sh
@@ -0,0 +1,124 @@
+#!/bin/sh
+
+test_description='commit graph'
+. ./test-lib.sh
+
+test_expect_success 'setup full repo' '
+	mkdir full &&
+	cd "$TRASH_DIRECTORY/full" &&
+	git init &&
+	objdir=".git/objects"
+'
+
+test_expect_success 'write graph with no packs' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write --object-dir . &&
+	test_path_is_file info/commit-graph
+'
+
+test_expect_success 'create commits and repack' '
+	cd "$TRASH_DIRECTORY/full" &&
+	for i in $(test_seq 3)
+	do
+		test_commit $i &&
+		git branch commits/$i
+	done &&
+	git repack
+'
+
+test_expect_success 'write graph' '
+	cd "$TRASH_DIRECTORY/full" &&
+	graph1=$(git commit-graph write) &&
+	test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'Add more commits' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git reset --hard commits/1 &&
+	for i in $(test_seq 4 5)
+	do
+		test_commit $i &&
+		git branch commits/$i
+	done &&
+	git reset --hard commits/2 &&
+	for i in $(test_seq 6 7)
+	do
+		test_commit $i &&
+		git branch commits/$i
+	done &&
+	git reset --hard commits/2 &&
+	git merge commits/4 &&
+	git branch merge/1 &&
+	git reset --hard commits/4 &&
+	git merge commits/6 &&
+	git branch merge/2 &&
+	git reset --hard commits/3 &&
+	git merge commits/5 commits/7 &&
+	git branch merge/3 &&
+	git repack
+'
+
+# Current graph structure:
+#
+#   __M3___
+#  /   |   \
+# 3 M1 5 M2 7
+# |/  \|/  \|
+# 2    4    6
+# |___/____/
+# 1
+
+
+test_expect_success 'write graph with merges' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write &&
+	test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'Add one more commit' '
+	cd "$TRASH_DIRECTORY/full" &&
+	test_commit 8 &&
+	git branch commits/8 &&
+	ls $objdir/pack | grep idx >existing-idx &&
+	git repack &&
+	ls $objdir/pack | grep idx | grep -v --file=existing-idx >new-idx
+'
+
+# Current graph structure:
+#
+#      8
+#      |
+#   __M3___
+#  /   |   \
+# 3 M1 5 M2 7
+# |/  \|/  \|
+# 2    4    6
+# |___/____/
+# 1
+
+test_expect_success 'write graph with new commit' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write &&
+	test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'write graph with nothing new' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write &&
+	test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'setup bare repo' '
+	cd "$TRASH_DIRECTORY" &&
+	git clone --bare --no-local full bare &&
+	cd bare &&
+	baredir="./objects"
+'
+
+test_expect_success 'write graph in bare repo' '
+	cd "$TRASH_DIRECTORY/bare" &&
+	git commit-graph write &&
+	test_path_is_file $baredir/info/commit-graph
+'
+
+test_done
-- 
2.17.0.14.gba1221a8ce



* [PATCH v7 08/14] commit-graph: implement git commit-graph read
  2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
                       ` (6 preceding siblings ...)
  2018-04-02 20:34     ` [PATCH v7 07/14] commit-graph: implement git-commit-graph write Derrick Stolee
@ 2018-04-02 20:34     ` Derrick Stolee
  2018-04-02 21:33       ` Junio C Hamano
  2018-04-08 12:59       ` Jakub Narebski
  2018-04-02 20:34     ` [PATCH v7 09/14] commit-graph: add core.commitGraph setting Derrick Stolee
                       ` (6 subsequent siblings)
  14 siblings, 2 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-02 20:34 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, szeder.dev, ramsay, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git-commit-graph to read commit graph files and summarize their contents.

Use the read subcommand to verify the contents of a commit graph file in the
tests.
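
As a rough illustration of what the read subcommand reports, the sketch
below decodes the 8-byte file header that load_commit_graph_one() checks:
a 4-byte big-endian signature followed by one byte each for the graph
version, hash version, chunk count, and padding. The names graph_header,
read_be32, parse_graph_header, and format_graph_header are made up for
this example and are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the header fields; not a struct from the patch. */
struct graph_header {
	uint32_t signature;
	unsigned char version;
	unsigned char hash_version;
	unsigned char num_chunks;
	unsigned char padding;
};

/* Read a 32-bit big-endian value from four bytes. */
uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if fewer than 8 bytes are available. */
int parse_graph_header(const unsigned char *data, size_t len,
		       struct graph_header *out)
{
	if (len < 8)
		return -1;

	out->signature = read_be32(data);
	out->version = data[4];
	out->hash_version = data[5];
	out->num_chunks = data[6];
	out->padding = data[7];
	return 0;
}

/* Format the fields in the same shape as the "header:" line of the read output. */
int format_graph_header(const struct graph_header *h, char *buf, size_t size)
{
	return snprintf(buf, size, "header: %08x %d %d %d %d",
			(unsigned int)h->signature, h->version,
			h->hash_version, h->num_chunks, h->padding);
}
```

For the bytes 43 47 50 48 01 01 04 00 this produces
"header: 43475048 1 1 4 0", which is the shape graph_read_expect compares
against when the large_edges chunk is present.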

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  12 +++
 builtin/commit-graph.c             |  56 ++++++++++++
 commit-graph.c                     | 137 ++++++++++++++++++++++++++++-
 commit-graph.h                     |  23 +++++
 t/t5318-commit-graph.sh            |  32 +++++--
 5 files changed, 254 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 47996e8f89..8aad8303f5 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -9,6 +9,7 @@ git-commit-graph - Write and verify Git commit graph files
 SYNOPSIS
 --------
 [verse]
+'git commit-graph read' [--object-dir <dir>]
 'git commit-graph write' <options> [--object-dir <dir>]
 
 
@@ -35,6 +36,11 @@ COMMANDS
 Write a commit graph file based on the commits found in packfiles.
 Includes all commits from the existing commit graph file.
 
+'read'::
+
+Read the commit-graph file and output basic details about it.
+Used for debugging purposes.
+
 
 EXAMPLES
 --------
@@ -45,6 +51,12 @@ EXAMPLES
 $ git commit-graph write
 ------------------------------------------------
 
+* Read basic information from the commit-graph file.
++
+------------------------------------------------
+$ git commit-graph read
+------------------------------------------------
+
 
 GIT
 ---
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 26b6360289..e3f67401fb 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -7,10 +7,16 @@
 
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
+	N_("git commit-graph read [--object-dir <objdir>]"),
 	N_("git commit-graph write [--object-dir <objdir>]"),
 	NULL
 };
 
+static const char * const builtin_commit_graph_read_usage[] = {
+	N_("git commit-graph read [--object-dir <objdir>]"),
+	NULL
+};
+
 static const char * const builtin_commit_graph_write_usage[] = {
 	N_("git commit-graph write [--object-dir <objdir>]"),
 	NULL
@@ -20,6 +26,54 @@ static struct opts_commit_graph {
 	const char *obj_dir;
 } opts;
 
+static int graph_read(int argc, const char **argv)
+{
+	struct commit_graph *graph = NULL;
+	char *graph_name;
+
+	static struct option builtin_commit_graph_read_options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			N_("dir"),
+			N_("The object directory to store the graph")),
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL,
+			     builtin_commit_graph_read_options,
+			     builtin_commit_graph_read_usage, 0);
+
+	if (!opts.obj_dir)
+		opts.obj_dir = get_object_directory();
+
+	graph_name = get_commit_graph_filename(opts.obj_dir);
+	graph = load_commit_graph_one(graph_name);
+
+	if (!graph)
+		die("graph file %s does not exist", graph_name);
+	FREE_AND_NULL(graph_name);
+
+	printf("header: %08x %d %d %d %d\n",
+		ntohl(*(uint32_t*)graph->data),
+		*(unsigned char*)(graph->data + 4),
+		*(unsigned char*)(graph->data + 5),
+		*(unsigned char*)(graph->data + 6),
+		*(unsigned char*)(graph->data + 7));
+	printf("num_commits: %u\n", graph->num_commits);
+	printf("chunks:");
+
+	if (graph->chunk_oid_fanout)
+		printf(" oid_fanout");
+	if (graph->chunk_oid_lookup)
+		printf(" oid_lookup");
+	if (graph->chunk_commit_data)
+		printf(" commit_metadata");
+	if (graph->chunk_large_edges)
+		printf(" large_edges");
+	printf("\n");
+
+	return 0;
+}
+
 static int graph_write(int argc, const char **argv)
 {
 	static struct option builtin_commit_graph_write_options[] = {
@@ -60,6 +114,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
 	if (argc > 0) {
+		if (!strcmp(argv[0], "read"))
+			return graph_read(argc, argv);
 		if (!strcmp(argv[0], "write"))
 			return graph_write(argc, argv);
 	}
diff --git a/commit-graph.c b/commit-graph.c
index f3f7c4f189..b1bd3a892d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -39,11 +39,146 @@
 			GRAPH_OID_LEN + 8)
 
 
-static char *get_commit_graph_filename(const char *obj_dir)
+char *get_commit_graph_filename(const char *obj_dir)
 {
 	return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static struct commit_graph *alloc_commit_graph(void)
+{
+	struct commit_graph *g = xcalloc(1, sizeof(*g));
+	g->graph_fd = -1;
+
+	return g;
+}
+
+struct commit_graph *load_commit_graph_one(const char *graph_file)
+{
+	void *graph_map;
+	const unsigned char *data, *chunk_lookup;
+	size_t graph_size;
+	struct stat st;
+	uint32_t i;
+	struct commit_graph *graph;
+	int fd = git_open(graph_file);
+	uint64_t last_chunk_offset;
+	uint32_t last_chunk_id;
+	uint32_t graph_signature;
+	unsigned char graph_version, hash_version;
+
+	if (fd < 0)
+		return NULL;
+	if (fstat(fd, &st)) {
+		close(fd);
+		return NULL;
+	}
+	graph_size = xsize_t(st.st_size);
+
+	if (graph_size < GRAPH_MIN_SIZE) {
+		close(fd);
+		die("graph file %s is too small", graph_file);
+	}
+	graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	data = (const unsigned char *)graph_map;
+
+	graph_signature = get_be32(data);
+	if (graph_signature != GRAPH_SIGNATURE) {
+		error("graph signature %X does not match signature %X",
+		      graph_signature, GRAPH_SIGNATURE);
+		goto cleanup_fail;
+	}
+
+	graph_version = *(unsigned char*)(data + 4);
+	if (graph_version != GRAPH_VERSION) {
+		error("graph version %X does not match version %X",
+		      graph_version, GRAPH_VERSION);
+		goto cleanup_fail;
+	}
+
+	hash_version = *(unsigned char*)(data + 5);
+	if (hash_version != GRAPH_OID_VERSION) {
+		error("hash version %X does not match version %X",
+		      hash_version, GRAPH_OID_VERSION);
+		goto cleanup_fail;
+	}
+
+	graph = alloc_commit_graph();
+
+	graph->hash_len = GRAPH_OID_LEN;
+	graph->num_chunks = *(unsigned char*)(data + 6);
+	graph->graph_fd = fd;
+	graph->data = graph_map;
+	graph->data_len = graph_size;
+
+	last_chunk_id = 0;
+	last_chunk_offset = 8;
+	chunk_lookup = data + 8;
+	for (i = 0; i < graph->num_chunks; i++) {
+		uint32_t chunk_id = get_be32(chunk_lookup + 0);
+		uint64_t chunk_offset = get_be64(chunk_lookup + 4);
+		int chunk_repeated = 0;
+
+		chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
+
+		if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
+			error("improper chunk offset %08x%08x", (uint32_t)(chunk_offset >> 32),
+			      (uint32_t)chunk_offset);
+			goto cleanup_fail;
+		}
+
+		switch (chunk_id) {
+		case GRAPH_CHUNKID_OIDFANOUT:
+			if (graph->chunk_oid_fanout)
+				chunk_repeated = 1;
+			else
+				graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset);
+			break;
+
+		case GRAPH_CHUNKID_OIDLOOKUP:
+			if (graph->chunk_oid_lookup)
+				chunk_repeated = 1;
+			else
+				graph->chunk_oid_lookup = data + chunk_offset;
+			break;
+
+		case GRAPH_CHUNKID_DATA:
+			if (graph->chunk_commit_data)
+				chunk_repeated = 1;
+			else
+				graph->chunk_commit_data = data + chunk_offset;
+			break;
+
+		case GRAPH_CHUNKID_LARGEEDGES:
+			if (graph->chunk_large_edges)
+				chunk_repeated = 1;
+			else
+				graph->chunk_large_edges = data + chunk_offset;
+			break;
+		}
+
+		if (chunk_repeated) {
+			error("chunk id %08x appears multiple times", chunk_id);
+			goto cleanup_fail;
+		}
+
+		if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)
+		{
+			graph->num_commits = (chunk_offset - last_chunk_offset)
+					     / graph->hash_len;
+		}
+
+		last_chunk_id = chunk_id;
+		last_chunk_offset = chunk_offset;
+	}
+
+	return graph;
+
+cleanup_fail:
+	munmap(graph_map, graph_size);
+	close(fd);
+	exit(1);
+}
+
 static void write_graph_chunk_fanout(struct hashfile *f,
 				     struct commit **commits,
 				     int nr_commits)
diff --git a/commit-graph.h b/commit-graph.h
index 16fea993ab..2528478f06 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -1,6 +1,29 @@
 #ifndef COMMIT_GRAPH_H
 #define COMMIT_GRAPH_H
 
+#include "git-compat-util.h"
+
+char *get_commit_graph_filename(const char *obj_dir);
+
+struct commit_graph {
+	int graph_fd;
+
+	const unsigned char *data;
+	size_t data_len;
+
+	unsigned char hash_len;
+	unsigned char num_chunks;
+	uint32_t num_commits;
+	struct object_id oid;
+
+	const uint32_t *chunk_oid_fanout;
+	const unsigned char *chunk_oid_lookup;
+	const unsigned char *chunk_commit_data;
+	const unsigned char *chunk_large_edges;
+};
+
+struct commit_graph *load_commit_graph_one(const char *graph_file);
+
 void write_commit_graph(const char *obj_dir);
 
 #endif
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index d7b635bd68..2f44f91193 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -26,10 +26,28 @@ test_expect_success 'create commits and repack' '
 	git repack
 '
 
+graph_read_expect() {
+	OPTIONAL=""
+	NUM_CHUNKS=3
+	if test -n "$2"
+	then
+		OPTIONAL=" $2"
+		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
+	fi
+	cat >expect <<- EOF
+	header: 43475048 1 1 $NUM_CHUNKS 0
+	num_commits: $1
+	chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
+	EOF
+	git commit-graph read >output &&
+	test_cmp expect output
+}
+
 test_expect_success 'write graph' '
 	cd "$TRASH_DIRECTORY/full" &&
 	graph1=$(git commit-graph write) &&
-	test_path_is_file $objdir/info/commit-graph
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "3"
 '
 
 test_expect_success 'Add more commits' '
@@ -72,7 +90,8 @@ test_expect_success 'Add more commits' '
 test_expect_success 'write graph with merges' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
-	test_path_is_file $objdir/info/commit-graph
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "10" "large_edges"
 '
 
 test_expect_success 'Add one more commit' '
@@ -99,13 +118,15 @@ test_expect_success 'Add one more commit' '
 test_expect_success 'write graph with new commit' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
-	test_path_is_file $objdir/info/commit-graph
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "11" "large_edges"
 '
 
 test_expect_success 'write graph with nothing new' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
-	test_path_is_file $objdir/info/commit-graph
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "11" "large_edges"
 '
 
 test_expect_success 'setup bare repo' '
@@ -118,7 +139,8 @@ test_expect_success 'setup bare repo' '
 test_expect_success 'write graph in bare repo' '
 	cd "$TRASH_DIRECTORY/bare" &&
 	git commit-graph write &&
-	test_path_is_file $baredir/info/commit-graph
+	test_path_is_file $baredir/info/commit-graph &&
+	graph_read_expect "11" "large_edges"
 '
 
 test_done
-- 
2.17.0.14.gba1221a8ce



* [PATCH v7 09/14] commit-graph: add core.commitGraph setting
  2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
                       ` (7 preceding siblings ...)
  2018-04-02 20:34     ` [PATCH v7 08/14] commit-graph: implement git commit-graph read Derrick Stolee
@ 2018-04-02 20:34     ` Derrick Stolee
  2018-04-08 13:39       ` Jakub Narebski
  2018-04-02 20:34     ` [PATCH v7 10/14] commit-graph: close under reachability Derrick Stolee
                       ` (5 subsequent siblings)
  14 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-04-02 20:34 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, szeder.dev, ramsay, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The commit graph feature is controlled by the new core.commitGraph config
setting. This defaults to 0, so the feature is opt-in.

The intent is that setting core.commitGraph=0 always stops Git from
checking for or parsing commit graph files.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config.txt | 4 ++++
 cache.h                  | 1 +
 config.c                 | 5 +++++
 environment.c            | 1 +
 4 files changed, 11 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 4e0cff87f6..e5c7013fb0 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -898,6 +898,10 @@ core.notesRef::
 This setting defaults to "refs/notes/commits", and it can be overridden by
 the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
 
+core.commitGraph::
+	Enable git commit graph feature. Allows reading from the
+	commit-graph file.
+
 core.sparseCheckout::
 	Enable "sparse checkout" feature. See section "Sparse checkout" in
 	linkgit:git-read-tree[1] for more information.
diff --git a/cache.h b/cache.h
index a61b2d3f0d..8bdbcbbbf7 100644
--- a/cache.h
+++ b/cache.h
@@ -805,6 +805,7 @@ extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
 extern int core_preload_index;
+extern int core_commit_graph;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index b0c20e6cb8..25ee4a676c 100644
--- a/config.c
+++ b/config.c
@@ -1226,6 +1226,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.commitgraph")) {
+		core_commit_graph = git_config_bool(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.sparsecheckout")) {
 		core_apply_sparse_checkout = git_config_bool(var, value);
 		return 0;
diff --git a/environment.c b/environment.c
index d6dd64662c..8853e2f0dd 100644
--- a/environment.c
+++ b/environment.c
@@ -62,6 +62,7 @@ enum push_default_type push_default = PUSH_DEFAULT_UNSPECIFIED;
 enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
 char *notes_ref_name;
 int grafts_replace_parents = 1;
+int core_commit_graph;
 int core_apply_sparse_checkout;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
-- 
2.17.0.14.gba1221a8ce



* [PATCH v7 10/14] commit-graph: close under reachability
  2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
                       ` (8 preceding siblings ...)
  2018-04-02 20:34     ` [PATCH v7 09/14] commit-graph: add core.commitGraph setting Derrick Stolee
@ 2018-04-02 20:34     ` Derrick Stolee
  2018-04-02 20:34     ` [PATCH v7 11/14] commit: integrate commit graph with commit parsing Derrick Stolee
                       ` (4 subsequent siblings)
  14 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-02 20:34 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, szeder.dev, ramsay, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach write_commit_graph() to walk all parents from the commits
discovered in packfiles. This prevents gaps caused by loose objects or
previously-missed packfiles.

Also automatically add commits from the existing graph file, if it
exists.
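
The parent walk can be pictured with a small standalone model. This is
only a sketch under simplifying assumptions: commits are array indexes
instead of object ids, the list has a fixed capacity instead of growing
with ALLOC_GROW, and the names toy_commit, MAX_PARENTS, and
close_reachable_toy are invented for the example. The "seen" field plays
the role that a temporary object flag plays in the patch.

```c
#include <stddef.h>

#define MAX_PARENTS 4	/* arbitrary limit for this toy model */

struct toy_commit {
	int parents[MAX_PARENTS];	/* indexes into the commit array */
	int nr_parents;
	int seen;			/* temporary mark, cleared before returning */
};

/*
 * "list" holds "nr" commit indexes on entry and has room for "cap".
 * Append every ancestor that is not already listed. Returns the new
 * count, or -1 if the closure does not fit in "cap" entries.
 */
int close_reachable_toy(struct toy_commit *all, int *list, int nr, int cap)
{
	int i, j, result;

	/* Mark the starting commits so they are not added twice. */
	for (i = 0; i < nr; i++)
		all[list[i]].seen = 1;

	/* The list grows while it is scanned, so new entries get visited too. */
	result = 0;
	for (i = 0; i < nr && !result; i++) {
		struct toy_commit *c = &all[list[i]];

		for (j = 0; j < c->nr_parents; j++) {
			int p = c->parents[j];

			if (all[p].seen)
				continue;
			if (nr == cap) {
				result = -1;
				break;
			}
			list[nr++] = p;
			all[p].seen = 1;
		}
	}

	/* Clear the temporary marks on everything that was listed. */
	for (i = 0; i < nr; i++)
		all[list[i]].seen = 0;

	return result ? result : nr;
}
```

Starting from a single merge commit, one pass over the growing list is
enough to pull in every ancestor, because each newly appended commit is
itself scanned later in the same loop.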

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index b1bd3a892d..ea29c5c2d8 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -367,6 +367,50 @@ static int add_packed_commits(const struct object_id *oid,
 	return 0;
 }
 
+static void add_missing_parents(struct packed_oid_list *oids, struct commit *commit)
+{
+	struct commit_list *parent;
+	for (parent = commit->parents; parent; parent = parent->next) {
+		if (!(parent->item->object.flags & UNINTERESTING)) {
+			ALLOC_GROW(oids->list, oids->nr + 1, oids->alloc);
+			oidcpy(&oids->list[oids->nr], &(parent->item->object.oid));
+			oids->nr++;
+			parent->item->object.flags |= UNINTERESTING;
+		}
+	}
+}
+
+static void close_reachable(struct packed_oid_list *oids)
+{
+	int i;
+	struct commit *commit;
+
+	for (i = 0; i < oids->nr; i++) {
+		commit = lookup_commit(&oids->list[i]);
+		if (commit)
+			commit->object.flags |= UNINTERESTING;
+	}
+
+	/*
+	 * As this loop runs, oids->nr may grow, but not more
+	 * than the number of missing commits in the reachable
+	 * closure.
+	 */
+	for (i = 0; i < oids->nr; i++) {
+		commit = lookup_commit(&oids->list[i]);
+
+		if (commit && !parse_commit(commit))
+			add_missing_parents(oids, commit);
+	}
+
+	for (i = 0; i < oids->nr; i++) {
+		commit = lookup_commit(&oids->list[i]);
+
+		if (commit)
+			commit->object.flags &= ~UNINTERESTING;
+	}
+}
+
 void write_commit_graph(const char *obj_dir)
 {
 	struct packed_oid_list oids;
@@ -390,6 +434,7 @@ void write_commit_graph(const char *obj_dir)
 	ALLOC_ARRAY(oids.list, oids.alloc);
 
 	for_each_packed_object(add_packed_commits, &oids, 0);
+	close_reachable(&oids);
 
 	QSORT(oids.list, oids.nr, commit_compare);
 
-- 
2.17.0.14.gba1221a8ce



* [PATCH v7 11/14] commit: integrate commit graph with commit parsing
  2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
                       ` (9 preceding siblings ...)
  2018-04-02 20:34     ` [PATCH v7 10/14] commit-graph: close under reachability Derrick Stolee
@ 2018-04-02 20:34     ` Derrick Stolee
  2018-04-02 20:34     ` [PATCH v7 12/14] commit-graph: read only from specific pack-indexes Derrick Stolee
                       ` (3 subsequent siblings)
  14 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-02 20:34 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, szeder.dev, ramsay, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach Git to inspect a commit graph file to supply the contents of a
struct commit when calling parse_commit_gently(). This implementation
satisfies all post-conditions on the struct commit, including loading
parents, the root tree, and the commit date.

If core.commitGraph is false, then do not check graph files.

In test script t5318-commit-graph.sh, add output-matching conditions on
read-only graph operations.

By loading commits from the graph instead of parsing commit buffers, we
save a lot of time on long commit walks. Here are some performance
results for a copy of the Linux repository where 'master' has 678,653
reachable commits and is behind 'origin/master' by 59,929 commits.

| Command                          | Before | After  | Rel % |
|----------------------------------|--------|--------|-------|
| log --oneline --topo-order -1000 |  8.31s |  0.94s | -88%  |
| branch -vv                       |  1.02s |  0.14s | -86%  |
| rev-list --all                   |  5.89s |  1.07s | -81%  |
| rev-list --all --objects         | 66.15s | 58.45s | -11%  |
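
For readers checking the table, the Rel % column is consistent with the
relative change (After - Before) / Before, expressed as a whole percent
and truncated toward zero rather than rounded. That reading is inferred
from the numbers, not stated above, and the helper name rel_percent is
invented for this sketch.

```c
/*
 * Relative change from "before" to "after" as a whole percent,
 * truncated toward zero (assumed convention, inferred from the table).
 */
int rel_percent(double before, double after)
{
	return (int)((after - before) / before * 100.0);
}
```

With the first row, rel_percent(8.31, 0.94) evaluates to -88, since the
exact ratio is about -88.7%.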

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 alloc.c                 |   1 +
 commit-graph.c          | 141 +++++++++++++++++++++++++++++++++++++++-
 commit-graph.h          |  12 ++++
 commit.c                |   3 +
 commit.h                |   3 +
 t/t5318-commit-graph.sh |  47 +++++++++++++-
 6 files changed, 205 insertions(+), 2 deletions(-)

diff --git a/alloc.c b/alloc.c
index 12afadfacd..cf4f8b61e1 100644
--- a/alloc.c
+++ b/alloc.c
@@ -93,6 +93,7 @@ void *alloc_commit_node(void)
 	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
 	c->index = alloc_commit_index();
+	c->graph_pos = COMMIT_NOT_FROM_GRAPH;
 	return c;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index ea29c5c2d8..983454785e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -38,7 +38,6 @@
 #define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
 			GRAPH_OID_LEN + 8)
 
-
 char *get_commit_graph_filename(const char *obj_dir)
 {
 	return xstrfmt("%s/info/commit-graph", obj_dir);
@@ -179,6 +178,145 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 	exit(1);
 }
 
+/* global storage */
+struct commit_graph *commit_graph = NULL;
+
+static void prepare_commit_graph_one(const char *obj_dir)
+{
+	char *graph_name;
+
+	if (commit_graph)
+		return;
+
+	graph_name = get_commit_graph_filename(obj_dir);
+	commit_graph = load_commit_graph_one(graph_name);
+
+	FREE_AND_NULL(graph_name);
+}
+
+static int prepare_commit_graph_run_once = 0;
+static void prepare_commit_graph(void)
+{
+	struct alternate_object_database *alt;
+	char *obj_dir;
+
+	if (prepare_commit_graph_run_once)
+		return;
+	prepare_commit_graph_run_once = 1;
+
+	obj_dir = get_object_directory();
+	prepare_commit_graph_one(obj_dir);
+	prepare_alt_odb();
+	for (alt = alt_odb_list; !commit_graph && alt; alt = alt->next)
+		prepare_commit_graph_one(alt->path);
+}
+
+static void close_commit_graph(void)
+{
+	if (!commit_graph)
+		return;
+
+	if (commit_graph->graph_fd >= 0) {
+		munmap((void *)commit_graph->data, commit_graph->data_len);
+		commit_graph->data = NULL;
+		close(commit_graph->graph_fd);
+	}
+
+	FREE_AND_NULL(commit_graph);
+}
+
+static int bsearch_graph(struct commit_graph *g, struct object_id *oid, uint32_t *pos)
+{
+	return bsearch_hash(oid->hash, g->chunk_oid_fanout,
+			    g->chunk_oid_lookup, g->hash_len, pos);
+}
+
+static struct commit_list **insert_parent_or_die(struct commit_graph *g,
+						 uint64_t pos,
+						 struct commit_list **pptr)
+{
+	struct commit *c;
+	struct object_id oid;
+	hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
+	c = lookup_commit(&oid);
+	if (!c)
+		die("could not find commit %s", oid_to_hex(&oid));
+	c->graph_pos = pos;
+	return &commit_list_insert(c, pptr)->next;
+}
+
+static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t pos)
+{
+	struct object_id oid;
+	uint32_t edge_value;
+	uint32_t *parent_data_ptr;
+	uint64_t date_low, date_high;
+	struct commit_list **pptr;
+	const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len + 16) * pos;
+
+	item->object.parsed = 1;
+	item->graph_pos = pos;
+
+	hashcpy(oid.hash, commit_data);
+	item->tree = lookup_tree(&oid);
+
+	date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;
+	date_low = get_be32(commit_data + g->hash_len + 12);
+	item->date = (timestamp_t)((date_high << 32) | date_low);
+
+	pptr = &item->parents;
+
+	edge_value = get_be32(commit_data + g->hash_len);
+	if (edge_value == GRAPH_PARENT_NONE)
+		return 1;
+	pptr = insert_parent_or_die(g, edge_value, pptr);
+
+	edge_value = get_be32(commit_data + g->hash_len + 4);
+	if (edge_value == GRAPH_PARENT_NONE)
+		return 1;
+	if (!(edge_value & GRAPH_OCTOPUS_EDGES_NEEDED)) {
+		pptr = insert_parent_or_die(g, edge_value, pptr);
+		return 1;
+	}
+
+	parent_data_ptr = (uint32_t*)(g->chunk_large_edges +
+			  4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK));
+	do {
+		edge_value = get_be32(parent_data_ptr);
+		pptr = insert_parent_or_die(g,
+					    edge_value & GRAPH_EDGE_LAST_MASK,
+					    pptr);
+		parent_data_ptr++;
+	} while (!(edge_value & GRAPH_LAST_EDGE));
+
+	return 1;
+}
+
+int parse_commit_in_graph(struct commit *item)
+{
+	if (!core_commit_graph)
+		return 0;
+	if (item->object.parsed)
+		return 1;
+
+	prepare_commit_graph();
+	if (commit_graph) {
+		uint32_t pos;
+		int found;
+		if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
+			pos = item->graph_pos;
+			found = 1;
+		} else {
+			found = bsearch_graph(commit_graph, &(item->object.oid), &pos);
+		}
+
+		if (found)
+			return fill_commit_in_graph(item, commit_graph, pos);
+	}
+
+	return 0;
+}
+
 static void write_graph_chunk_fanout(struct hashfile *f,
 				     struct commit **commits,
 				     int nr_commits)
@@ -530,6 +668,7 @@ void write_commit_graph(const char *obj_dir)
 	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
 	write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
+	close_commit_graph();
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	commit_lock_file(&lk);
 
diff --git a/commit-graph.h b/commit-graph.h
index 2528478f06..73b28beed1 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -5,6 +5,18 @@
 
 char *get_commit_graph_filename(const char *obj_dir);
 
+/*
+ * Given a commit struct, try to fill the commit struct info, including:
+ *  1. tree object
+ *  2. date
+ *  3. parents.
+ *
+ * Returns 1 if and only if the commit was found in the packed graph.
+ *
+ * See parse_commit_buffer() for the fallback after this call.
+ */
+int parse_commit_in_graph(struct commit *item);
+
 struct commit_graph {
 	int graph_fd;
 
diff --git a/commit.c b/commit.c
index 00c99c7272..3e39c86abf 100644
--- a/commit.c
+++ b/commit.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "tag.h"
 #include "commit.h"
+#include "commit-graph.h"
 #include "pkt-line.h"
 #include "utf8.h"
 #include "diff.h"
@@ -383,6 +384,8 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
 		return -1;
 	if (item->object.parsed)
 		return 0;
+	if (parse_commit_in_graph(item))
+		return 0;
 	buffer = read_sha1_file(item->object.oid.hash, &type, &size);
 	if (!buffer)
 		return quiet_on_missing ? -1 :
diff --git a/commit.h b/commit.h
index 0fb8271665..e57ae4b583 100644
--- a/commit.h
+++ b/commit.h
@@ -9,6 +9,8 @@
 #include "string-list.h"
 #include "pretty.h"
 
+#define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF
+
 struct commit_list {
 	struct commit *item;
 	struct commit_list *next;
@@ -21,6 +23,7 @@ struct commit {
 	timestamp_t date;
 	struct commit_list *parents;
 	struct tree *tree;
+	uint32_t graph_pos;
 };
 
 extern int save_commit_buffer;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 2f44f91193..51de9cc455 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -7,6 +7,7 @@ test_expect_success 'setup full repo' '
 	mkdir full &&
 	cd "$TRASH_DIRECTORY/full" &&
 	git init &&
+	git config core.commitGraph true &&
 	objdir=".git/objects"
 '
 
@@ -26,6 +27,29 @@ test_expect_success 'create commits and repack' '
 	git repack
 '
 
+graph_git_two_modes() {
+	git -c core.commitGraph=true $1 >output
+	git -c core.commitGraph=false $1 >expect
+	test_cmp output expect
+}
+
+graph_git_behavior() {
+	MSG=$1
+	DIR=$2
+	BRANCH=$3
+	COMPARE=$4
+	test_expect_success "check normal git operations: $MSG" '
+		cd "$TRASH_DIRECTORY/$DIR" &&
+		graph_git_two_modes "log --oneline $BRANCH" &&
+		graph_git_two_modes "log --topo-order $BRANCH" &&
+		graph_git_two_modes "log --graph $COMPARE..$BRANCH" &&
+		graph_git_two_modes "branch -vv" &&
+		graph_git_two_modes "merge-base -a $BRANCH $COMPARE"
+	'
+}
+
+graph_git_behavior 'no graph' full commits/3 commits/1
+
 graph_read_expect() {
 	OPTIONAL=""
 	NUM_CHUNKS=3
@@ -50,6 +74,8 @@ test_expect_success 'write graph' '
 	graph_read_expect "3"
 '
 
+graph_git_behavior 'graph exists' full commits/3 commits/1
+
 test_expect_success 'Add more commits' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git reset --hard commits/1 &&
@@ -86,7 +112,6 @@ test_expect_success 'Add more commits' '
 # |___/____/
 # 1
 
-
 test_expect_success 'write graph with merges' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
@@ -94,6 +119,10 @@ test_expect_success 'write graph with merges' '
 	graph_read_expect "10" "large_edges"
 '
 
+graph_git_behavior 'merge 1 vs 2' full merge/1 merge/2
+graph_git_behavior 'merge 1 vs 3' full merge/1 merge/3
+graph_git_behavior 'merge 2 vs 3' full merge/2 merge/3
+
 test_expect_success 'Add one more commit' '
 	cd "$TRASH_DIRECTORY/full" &&
 	test_commit 8 &&
@@ -115,6 +144,9 @@ test_expect_success 'Add one more commit' '
 # |___/____/
 # 1
 
+graph_git_behavior 'mixed mode, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'mixed mode, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'write graph with new commit' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
@@ -122,6 +154,9 @@ test_expect_success 'write graph with new commit' '
 	graph_read_expect "11" "large_edges"
 '
 
+graph_git_behavior 'full graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'full graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'write graph with nothing new' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
@@ -129,13 +164,20 @@ test_expect_success 'write graph with nothing new' '
 	graph_read_expect "11" "large_edges"
 '
 
+graph_git_behavior 'cleared graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'cleared graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
 	cd bare &&
+	git config core.commitGraph true &&
 	baredir="./objects"
 '
 
+graph_git_behavior 'bare repo, commit 8 vs merge 1' bare commits/8 merge/1
+graph_git_behavior 'bare repo, commit 8 vs merge 2' bare commits/8 merge/2
+
 test_expect_success 'write graph in bare repo' '
 	cd "$TRASH_DIRECTORY/bare" &&
 	git commit-graph write &&
@@ -143,4 +185,7 @@ test_expect_success 'write graph in bare repo' '
 	graph_read_expect "11" "large_edges"
 '
 
+graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare commits/8 merge/1
+graph_git_behavior 'bare repo with graph, commit 8 vs merge 2' bare commits/8 merge/2
+
 test_done
-- 
2.17.0.14.gba1221a8ce
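
The field decoding in fill_commit_in_graph() above can be tried in isolation. The following is a minimal standalone sketch, not code from the series: every sketch_* name is hypothetical, and the two constant values are assumptions about how the series defines GRAPH_PARENT_NONE and GRAPH_OCTOPUS_EDGES_NEEDED, which this patch does not show. It reads the 34-bit commit date and classifies the two inline parent slots of one commit-data record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values; the patch above uses these names but does not define them. */
#define SKETCH_PARENT_NONE 0x70000000u
#define SKETCH_OCTOPUS_EDGES_NEEDED 0x80000000u

/* Read a 32-bit big-endian integer, as get_be32() does in the patch. */
static uint32_t sketch_get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * The record is: tree hash, two 4-byte parent slots, then 8 bytes whose
 * lowest 34 bits hold the commit date. Only the low 2 bits of the first
 * of those two words belong to the date, hence the 0x3 mask.
 */
static uint64_t sketch_commit_date(const unsigned char *commit_data,
				   size_t hash_len)
{
	uint64_t date_high, date_low;

	assert(commit_data != NULL);
	date_high = sketch_get_be32(commit_data + hash_len + 8) & 0x3;
	date_low = sketch_get_be32(commit_data + hash_len + 12);
	return (date_high << 32) | date_low;
}

/*
 * Returns 0, 1 or 2 when all parents fit in the two inline slots, and -1
 * when the second slot points into the large-edges chunk instead.
 */
static int sketch_inline_parent_count(const unsigned char *commit_data,
				      size_t hash_len)
{
	uint32_t first, second;

	assert(commit_data != NULL);
	first = sketch_get_be32(commit_data + hash_len);
	if (first == SKETCH_PARENT_NONE)
		return 0;
	second = sketch_get_be32(commit_data + hash_len + 4);
	if (second == SKETCH_PARENT_NONE)
		return 1;
	if (!(second & SKETCH_OCTOPUS_EDGES_NEEDED))
		return 2;
	return -1;
}
```

With a 20-byte hash, a caller would pass a pointer to the start of a 36-byte record and 20 as hash_len. The -1 case corresponds to the do/while loop over chunk_large_edges in the patch.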



* [PATCH v7 12/14] commit-graph: read only from specific pack-indexes
  2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
                       ` (10 preceding siblings ...)
  2018-04-02 20:34     ` [PATCH v7 11/14] commit: integrate commit graph with commit parsing Derrick Stolee
@ 2018-04-02 20:34     ` Derrick Stolee
  2018-04-02 20:34     ` [PATCH v7 13/14] commit-graph: build graph from starting commits Derrick Stolee
                       ` (2 subsequent siblings)
  14 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-02 20:34 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, szeder.dev, ramsay, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git-commit-graph to inspect the objects only in a certain list
of pack-indexes within the given pack directory. This allows updating
the commit graph iteratively.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt | 11 +++++++++-
 builtin/commit-graph.c             | 33 +++++++++++++++++++++++++++---
 commit-graph.c                     | 26 +++++++++++++++++++++--
 commit-graph.h                     |  4 +++-
 packfile.c                         |  4 ++--
 packfile.h                         |  2 ++
 t/t5318-commit-graph.sh            | 10 +++++++++
 7 files changed, 81 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 8aad8303f5..8143cc3f07 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -34,7 +34,9 @@ COMMANDS
 'write'::
 
 Write a commit graph file based on the commits found in packfiles.
-Includes all commits from the existing commit graph file.
++
+With the `--stdin-packs` option, generate the new commit graph by
+walking objects only in the specified pack-indexes.
 
 'read'::
 
@@ -51,6 +53,13 @@ EXAMPLES
 $ git commit-graph write
 ------------------------------------------------
 
+* Write a graph file, extending the current graph file using commits
+  in <pack-index>.
++
+------------------------------------------------
+$ echo <pack-index> | git commit-graph write --stdin-packs
+------------------------------------------------
+
 * Read basic information from the commit-graph file.
 +
 ------------------------------------------------
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index e3f67401fb..9f83c872e9 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs]"),
 	NULL
 };
 
@@ -18,12 +18,13 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs]"),
 	NULL
 };
 
 static struct opts_commit_graph {
 	const char *obj_dir;
+	int stdin_packs;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -76,10 +77,18 @@ static int graph_read(int argc, const char **argv)
 
 static int graph_write(int argc, const char **argv)
 {
+	const char **pack_indexes = NULL;
+	int packs_nr = 0;
+	const char **lines = NULL;
+	int lines_nr = 0;
+	int lines_alloc = 0;
+
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
 			N_("dir"),
 			N_("The object directory to store the graph")),
+		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
+			N_("scan pack-indexes listed by stdin for commits")),
 		OPT_END(),
 	};
 
@@ -90,7 +99,25 @@ static int graph_write(int argc, const char **argv)
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
 
-	write_commit_graph(opts.obj_dir);
+	if (opts.stdin_packs) {
+		struct strbuf buf = STRBUF_INIT;
+		lines_nr = 0;
+		lines_alloc = 128;
+		ALLOC_ARRAY(lines, lines_alloc);
+
+		while (strbuf_getline(&buf, stdin) != EOF) {
+			ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
+			lines[lines_nr++] = strbuf_detach(&buf, NULL);
+		}
+
+		pack_indexes = lines;
+		packs_nr = lines_nr;
+	}
+
+	write_commit_graph(opts.obj_dir,
+			   pack_indexes,
+			   packs_nr);
+
 	return 0;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index 983454785e..fa19b83a8e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -549,7 +549,9 @@ static void close_reachable(struct packed_oid_list *oids)
 	}
 }
 
-void write_commit_graph(const char *obj_dir)
+void write_commit_graph(const char *obj_dir,
+			const char **pack_indexes,
+			int nr_packs)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -571,7 +573,27 @@ void write_commit_graph(const char *obj_dir)
 		oids.alloc = 1024;
 	ALLOC_ARRAY(oids.list, oids.alloc);
 
-	for_each_packed_object(add_packed_commits, &oids, 0);
+	if (pack_indexes) {
+		struct strbuf packname = STRBUF_INIT;
+		int dirlen;
+		strbuf_addf(&packname, "%s/pack/", obj_dir);
+		dirlen = packname.len;
+		for (i = 0; i < nr_packs; i++) {
+			struct packed_git *p;
+			strbuf_setlen(&packname, dirlen);
+			strbuf_addstr(&packname, pack_indexes[i]);
+			p = add_packed_git(packname.buf, packname.len, 1);
+			if (!p)
+				die("error adding pack %s", packname.buf);
+			if (open_pack_index(p))
+				die("error opening index for %s", packname.buf);
+			for_each_object_in_pack(p, add_packed_commits, &oids);
+			close_pack(p);
+		}
+		strbuf_release(&packname);
+	} else
+		for_each_packed_object(add_packed_commits, &oids, 0);
+
 	close_reachable(&oids);
 
 	QSORT(oids.list, oids.nr, commit_compare);
diff --git a/commit-graph.h b/commit-graph.h
index 73b28beed1..f065f0866f 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -36,6 +36,8 @@ struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
-void write_commit_graph(const char *obj_dir);
+void write_commit_graph(const char *obj_dir,
+			const char **pack_indexes,
+			int nr_packs);
 
 #endif
diff --git a/packfile.c b/packfile.c
index 7c1a2519fc..b1d33b646a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -304,7 +304,7 @@ void close_pack_index(struct packed_git *p)
 	}
 }
 
-static void close_pack(struct packed_git *p)
+void close_pack(struct packed_git *p)
 {
 	close_pack_windows(p);
 	close_pack_fd(p);
@@ -1850,7 +1850,7 @@ int has_pack_index(const unsigned char *sha1)
 	return 1;
 }
 
-static int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data)
+int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data)
 {
 	uint32_t i;
 	int r = 0;
diff --git a/packfile.h b/packfile.h
index a7fca598d6..b341f2bf5e 100644
--- a/packfile.h
+++ b/packfile.h
@@ -63,6 +63,7 @@ extern void close_pack_index(struct packed_git *);
 
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *);
 extern void close_pack_windows(struct packed_git *);
+extern void close_pack(struct packed_git *);
 extern void close_all_packs(void);
 extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
@@ -140,6 +141,7 @@ typedef int each_packed_object_fn(const struct object_id *oid,
 				  struct packed_git *pack,
 				  uint32_t pos,
 				  void *data);
+extern int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn, void *data);
 extern int for_each_packed_object(each_packed_object_fn, void *, unsigned flags);
 
 /*
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 51de9cc455..3bb44d0c09 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -167,6 +167,16 @@ test_expect_success 'write graph with nothing new' '
 graph_git_behavior 'cleared graph, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'cleared graph, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph from latest pack with closure' '
+	cd "$TRASH_DIRECTORY/full" &&
+	cat new-idx | git commit-graph write --stdin-packs &&
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "9" "large_edges"
+'
+
+graph_git_behavior 'graph from pack, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'graph from pack, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
-- 
2.17.0.14.gba1221a8ce
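
The pack loop in write_commit_graph() above builds each path by keeping the "<obj_dir>/pack/" prefix and resetting the buffer to that length before appending the next name. A minimal standalone sketch of the same idea follows, assuming a fixed-size buffer in place of strbuf; struct sketch_path and both functions are hypothetical names, not part of Git.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fixed-size stand-in for the strbuf used in the patch. */
struct sketch_path {
	char buf[4096];
	size_t dirlen; /* length of the "<obj_dir>/pack/" prefix */
};

/* Write the directory prefix once and remember where it ends. */
static int sketch_path_init(struct sketch_path *p, const char *obj_dir)
{
	int n;

	assert(p != NULL && obj_dir != NULL);
	n = snprintf(p->buf, sizeof(p->buf), "%s/pack/", obj_dir);
	if (n < 0 || (size_t)n >= sizeof(p->buf))
		return -1;
	p->dirlen = (size_t)n;
	return 0;
}

/*
 * Overwrite whatever follows the prefix with a new file name, which is
 * what strbuf_setlen() followed by strbuf_addstr() achieves in the patch.
 */
static int sketch_path_set_name(struct sketch_path *p, const char *name)
{
	size_t len;

	assert(p != NULL && name != NULL);
	len = strlen(name);
	if (len >= sizeof(p->buf) - p->dirlen)
		return -1;
	memcpy(p->buf + p->dirlen, name, len + 1); /* copies the NUL too */
	return 0;
}
```

A caller would run sketch_path_init() once, then sketch_path_set_name() for each line read from stdin, so the prefix is formatted only once however many pack-indexes are listed.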



* [PATCH v7 13/14] commit-graph: build graph from starting commits
  2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
                       ` (11 preceding siblings ...)
  2018-04-02 20:34     ` [PATCH v7 12/14] commit-graph: read only from specific pack-indexes Derrick Stolee
@ 2018-04-02 20:34     ` Derrick Stolee
  2018-04-08 13:50       ` Jakub Narebski
  2018-04-02 20:34     ` [PATCH v7 14/14] commit-graph: implement "--additive" option Derrick Stolee
  2018-04-10 12:55     ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
  14 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-04-02 20:34 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, szeder.dev, ramsay, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git-commit-graph to read commits from stdin when the
--stdin-commits flag is specified. Commits reachable from these
commits are added to the graph. This is a much faster way to construct
the graph than inspecting all packed objects, but is restricted to
known tips.

For the Linux repository, 700,000+ commits were added to the graph
file starting from 'master' in 7-9 seconds, depending on the number
of packfiles in the repo (1, 24, or 120).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt | 14 +++++++++++++-
 builtin/commit-graph.c             | 27 +++++++++++++++++++++------
 commit-graph.c                     | 27 +++++++++++++++++++++++++--
 commit-graph.h                     |  4 +++-
 t/t5318-commit-graph.sh            | 13 +++++++++++++
 5 files changed, 75 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 8143cc3f07..442ac243e6 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -36,7 +36,13 @@ COMMANDS
 Write a commit graph file based on the commits found in packfiles.
 +
 With the `--stdin-packs` option, generate the new commit graph by
-walking objects only in the specified pack-indexes.
+walking objects only in the specified pack-indexes. (Cannot be combined
+with --stdin-commits.)
++
+With the `--stdin-commits` option, generate the new commit graph by
+walking commits starting at the commits specified in stdin as a list
+of OIDs in hex, one OID per line. (Cannot be combined with
+--stdin-packs.)
 
 'read'::
 
@@ -60,6 +66,12 @@ $ git commit-graph write
 $ echo <pack-index> | git commit-graph write --stdin-packs
 ------------------------------------------------
 
+* Write a graph file containing all reachable commits.
++
+------------------------------------------------
+$ git show-ref -s | git commit-graph write --stdin-commits
+------------------------------------------------
+
 * Read basic information from the commit-graph file.
 +
 ------------------------------------------------
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 9f83c872e9..f5fc717b8f 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
@@ -18,13 +18,14 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
 static struct opts_commit_graph {
 	const char *obj_dir;
 	int stdin_packs;
+	int stdin_commits;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -79,6 +80,8 @@ static int graph_write(int argc, const char **argv)
 {
 	const char **pack_indexes = NULL;
 	int packs_nr = 0;
+	const char **commit_hex = NULL;
+	int commits_nr = 0;
 	const char **lines = NULL;
 	int lines_nr = 0;
 	int lines_alloc = 0;
@@ -89,6 +92,8 @@ static int graph_write(int argc, const char **argv)
 			N_("The object directory to store the graph")),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
 			N_("scan pack-indexes listed by stdin for commits")),
+		OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
+			N_("start walk at commits listed by stdin")),
 		OPT_END(),
 	};
 
@@ -96,10 +101,12 @@ static int graph_write(int argc, const char **argv)
 			     builtin_commit_graph_write_options,
 			     builtin_commit_graph_write_usage, 0);
 
+	if (opts.stdin_packs && opts.stdin_commits)
+		die(_("cannot use both --stdin-commits and --stdin-packs"));
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
 
-	if (opts.stdin_packs) {
+	if (opts.stdin_packs || opts.stdin_commits) {
 		struct strbuf buf = STRBUF_INIT;
 		lines_nr = 0;
 		lines_alloc = 128;
@@ -110,13 +117,21 @@ static int graph_write(int argc, const char **argv)
 			lines[lines_nr++] = strbuf_detach(&buf, NULL);
 		}
 
-		pack_indexes = lines;
-		packs_nr = lines_nr;
+		if (opts.stdin_packs) {
+			pack_indexes = lines;
+			packs_nr = lines_nr;
+		}
+		if (opts.stdin_commits) {
+			commit_hex = lines;
+			commits_nr = lines_nr;
+		}
 	}
 
 	write_commit_graph(opts.obj_dir,
 			   pack_indexes,
-			   packs_nr);
+			   packs_nr,
+			   commit_hex,
+			   commits_nr);
 
 	return 0;
 }
diff --git a/commit-graph.c b/commit-graph.c
index fa19b83a8e..253bc2213a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -551,7 +551,9 @@ static void close_reachable(struct packed_oid_list *oids)
 
 void write_commit_graph(const char *obj_dir,
 			const char **pack_indexes,
-			int nr_packs)
+			int nr_packs,
+			const char **commit_hex,
+			int nr_commits)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -591,7 +593,28 @@ void write_commit_graph(const char *obj_dir,
 			close_pack(p);
 		}
 		strbuf_release(&packname);
-	} else
+	}
+
+	if (commit_hex) {
+		for (i = 0; i < nr_commits; i++) {
+			const char *end;
+			struct object_id oid;
+			struct commit *result;
+
+			if (commit_hex[i] && parse_oid_hex(commit_hex[i], &oid, &end))
+				continue;
+
+			result = lookup_commit_reference_gently(&oid, 1);
+
+			if (result) {
+				ALLOC_GROW(oids.list, oids.nr + 1, oids.alloc);
+				oidcpy(&oids.list[oids.nr], &(result->object.oid));
+				oids.nr++;
+			}
+		}
+	}
+
+	if (!pack_indexes && !commit_hex)
 		for_each_packed_object(add_packed_commits, &oids, 0);
 
 	close_reachable(&oids);
diff --git a/commit-graph.h b/commit-graph.h
index f065f0866f..fd035101b2 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -38,6 +38,8 @@ struct commit_graph *load_commit_graph_one(const char *graph_file);
 
 void write_commit_graph(const char *obj_dir,
 			const char **pack_indexes,
-			int nr_packs);
+			int nr_packs,
+			const char **commit_hex,
+			int nr_commits);
 
 #endif
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 3bb44d0c09..c28cfb5d7f 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -177,6 +177,19 @@ test_expect_success 'build graph from latest pack with closure' '
 graph_git_behavior 'graph from pack, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'graph from pack, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph from commits with closure' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git tag -a -m "merge" tag/merge merge/2 &&
+	git rev-parse tag/merge >commits-in &&
+	git rev-parse merge/1 >>commits-in &&
+	cat commits-in | git commit-graph write --stdin-commits &&
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "6"
+'
+
+graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'graph from commits, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
-- 
2.17.0.14.gba1221a8ce
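
The --stdin-commits loop above takes one hex OID per line and silently skips lines that do not parse. A minimal standalone sketch of that filtering follows, assuming 20-byte hashes; the sketch_* helpers are hypothetical stand-ins and are not Git's parse_oid_hex(). Unlike the patch, this sketch also skips NULL entries, which is a choice made here for safety.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_RAWSZ 20 /* assumed raw hash size (SHA-1) */

/* Value of one hex digit, or -1 for anything else (including NUL). */
static int sketch_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Parse the first 40 hex characters of a line; 0 on success, -1 otherwise. */
static int sketch_parse_oid_hex(const char *hex, unsigned char *out)
{
	size_t i;

	assert(out != NULL);
	if (!hex)
		return -1;
	for (i = 0; i < SKETCH_RAWSZ; i++) {
		int hi = sketch_hexval(hex[2 * i]);
		int lo;

		if (hi < 0)
			return -1; /* also stops safely at a short line */
		lo = sketch_hexval(hex[2 * i + 1]);
		if (lo < 0)
			return -1;
		out[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}

/* Copy every parseable OID into out[] and return how many were kept. */
static size_t sketch_collect_oids(const char **lines, size_t nr,
				  unsigned char (*out)[SKETCH_RAWSZ])
{
	size_t i, kept = 0;

	for (i = 0; i < nr; i++)
		if (!sketch_parse_oid_hex(lines[i], out[kept]))
			kept++;
	return kept;
}
```

The patch additionally resolves each OID with lookup_commit_reference_gently(), so an annotated tag is peeled to its commit. That step needs the object database and is left out of this sketch.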



* [PATCH v7 14/14] commit-graph: implement "--additive" option
  2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
                       ` (12 preceding siblings ...)
  2018-04-02 20:34     ` [PATCH v7 13/14] commit-graph: build graph from starting commits Derrick Stolee
@ 2018-04-02 20:34     ` Derrick Stolee
  2018-04-05  8:27       ` SZEDER Gábor
  2018-04-10 12:55     ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
  14 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-04-02 20:34 UTC (permalink / raw)
  To: git; +Cc: gitster, sbeller, szeder.dev, ramsay, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git-commit-graph to add all commits from the existing
commit-graph file to the file about to be written. This should be
used when adding new commits without performing garbage collection.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt | 10 ++++++++++
 builtin/commit-graph.c             | 10 +++++++---
 commit-graph.c                     | 17 ++++++++++++++++-
 commit-graph.h                     |  3 ++-
 t/t5318-commit-graph.sh            | 10 ++++++++++
 5 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 442ac243e6..4c97b555cc 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -43,6 +43,9 @@ With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
 of OIDs in hex, one OID per line. (Cannot be combined with
 --stdin-packs.)
++
+With the `--append` option, include all commits that are present in the
+existing commit-graph file.
 
 'read'::
 
@@ -72,6 +75,13 @@ $ echo <pack-index> | git commit-graph write --stdin-packs
 $ git show-ref -s | git commit-graph write --stdin-commits
 ------------------------------------------------
 
+* Write a graph file containing all commits in the current
+  commit-graph file along with those reachable from HEAD.
++
+------------------------------------------------
+$ git rev-parse HEAD | git commit-graph write --stdin-commits --append
+------------------------------------------------
+
 * Read basic information from the commit-graph file.
 +
 ------------------------------------------------
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index f5fc717b8f..41c4f76caf 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
@@ -18,7 +18,7 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
@@ -26,6 +26,7 @@ static struct opts_commit_graph {
 	const char *obj_dir;
 	int stdin_packs;
 	int stdin_commits;
+	int append;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -94,6 +95,8 @@ static int graph_write(int argc, const char **argv)
 			N_("scan pack-indexes listed by stdin for commits")),
 		OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
 			N_("start walk at commits listed by stdin")),
+		OPT_BOOL(0, "append", &opts.append,
+			N_("include all commits already in the commit-graph file")),
 		OPT_END(),
 	};
 
@@ -131,7 +134,8 @@ static int graph_write(int argc, const char **argv)
 			   pack_indexes,
 			   packs_nr,
 			   commit_hex,
-			   commits_nr);
+			   commits_nr,
+			   opts.append);
 
 	return 0;
 }
diff --git a/commit-graph.c b/commit-graph.c
index 253bc2213a..1fc63d541b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -553,7 +553,8 @@ void write_commit_graph(const char *obj_dir,
 			const char **pack_indexes,
 			int nr_packs,
 			const char **commit_hex,
-			int nr_commits)
+			int nr_commits,
+			int append)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -571,10 +572,24 @@ void write_commit_graph(const char *obj_dir,
 	oids.nr = 0;
 	oids.alloc = approximate_object_count() / 4;
 
+	if (append) {
+		prepare_commit_graph_one(obj_dir);
+		if (commit_graph)
+			oids.alloc += commit_graph->num_commits;
+	}
+
 	if (oids.alloc < 1024)
 		oids.alloc = 1024;
 	ALLOC_ARRAY(oids.list, oids.alloc);
 
+	if (append && commit_graph) {
+		for (i = 0; i < commit_graph->num_commits; i++) {
+			const unsigned char *hash = commit_graph->chunk_oid_lookup +
+				commit_graph->hash_len * i;
+			hashcpy(oids.list[oids.nr++].hash, hash);
+		}
+	}
+
 	if (pack_indexes) {
 		struct strbuf packname = STRBUF_INIT;
 		int dirlen;
diff --git a/commit-graph.h b/commit-graph.h
index fd035101b2..e1d8580c98 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -40,6 +40,7 @@ void write_commit_graph(const char *obj_dir,
 			const char **pack_indexes,
 			int nr_packs,
 			const char **commit_hex,
-			int nr_commits);
+			int nr_commits,
+			int append);
 
 #endif
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index c28cfb5d7f..a380419b65 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -190,6 +190,16 @@ test_expect_success 'build graph from commits with closure' '
 graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'graph from commits, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph from commits with append' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git rev-parse merge/3 | git commit-graph write --stdin-commits --append &&
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "10" "large_edges"
+'
+
+graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
-- 
2.17.0.14.gba1221a8ce



* Re: [PATCH v7 08/14] commit-graph: implement git commit-graph read
  2018-04-02 20:34     ` [PATCH v7 08/14] commit-graph: implement git commit-graph read Derrick Stolee
@ 2018-04-02 21:33       ` Junio C Hamano
  2018-04-03 11:49         ` Derrick Stolee
  2018-04-08 12:59       ` Jakub Narebski
  1 sibling, 1 reply; 110+ messages in thread
From: Junio C Hamano @ 2018-04-02 21:33 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, sbeller, szeder.dev, ramsay, git, peff, Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
> ...
> +static int graph_read(int argc, const char **argv)
> +{
> +	struct commit_graph *graph = 0;

The previous round said NULL above, not 0, and NULL is the better
way to spell it, I would think.


* Re: [PATCH v7 08/14] commit-graph: implement git commit-graph read
  2018-04-02 21:33       ` Junio C Hamano
@ 2018-04-03 11:49         ` Derrick Stolee
  0 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-03 11:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, sbeller, szeder.dev, ramsay, git, peff, Derrick Stolee

On 4/2/2018 5:33 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>> From: Derrick Stolee <dstolee@microsoft.com>
>> ...
>> +static int graph_read(int argc, const char **argv)
>> +{
>> +	struct commit_graph *graph = 0;
> The previous round said NULL above, not 0, and NULL is the better
> way to spell it, I would think.

Sorry about that. Hopefully it is easy to squash.


* Re: [PATCH v7 14/14] commit-graph: implement "--additive" option
  2018-04-02 20:34     ` [PATCH v7 14/14] commit-graph: implement "--additive" option Derrick Stolee
@ 2018-04-05  8:27       ` SZEDER Gábor
  0 siblings, 0 replies; 110+ messages in thread
From: SZEDER Gábor @ 2018-04-05  8:27 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Git mailing list, Junio C Hamano, Stefan Beller, Ramsay Jones,
	Jeff Hostetler, Jeff King, Derrick Stolee

On Mon, Apr 2, 2018 at 10:34 PM, Derrick Stolee <stolee@gmail.com> wrote:

> +With the `--append` option, include all commits that are present in the

> +$ git rev-parse HEAD | git commit-graph write --stdin-commits --append

> +       N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),

> +       N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),

> +               OPT_BOOL(0, "append", &opts.append,

> +       git rev-parse merge/3 | git commit-graph write --stdin-commits --append &&

I see '--append' everywhere in the code and docs, good.
Please update the subject line as well, it still says '--additive'.


* Re: [PATCH v7 02/14] csum-file: refactor finalize_hashfile() method
  2018-04-02 20:34     ` [PATCH v7 02/14] csum-file: refactor finalize_hashfile() method Derrick Stolee
@ 2018-04-07 22:59       ` Jakub Narebski
  0 siblings, 0 replies; 110+ messages in thread
From: Jakub Narebski @ 2018-04-07 22:59 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, sbeller, szeder.dev, ramsay, git, peff,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> If we want to use a hashfile on the temporary file for a lockfile, then
> we need finalize_hashfile() to fully write the trailing hash but also keep
> the file descriptor open.
>
> Do this by adding a new CSUM_HASH_IN_STREAM flag along with a functional
> change that checks this flag before writing the checksum to the stream.
> This differs from previous behavior since it would be written if either
> CSUM_CLOSE or CSUM_FSYNC is provided.

I'm sorry, but I don't understand from this description what this flag
does and what it is meant to do.
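
Going by the diff further down, my best guess at the intent is the
following sketch. The ACT_* names and both helper functions are made up
for illustration; only the CSUM_* values come from the patch. It just
reports which actions a given flag combination would trigger before and
after the change.

```c
/* Flag values as in the quoted csum-file.h hunk. */
#define CSUM_CLOSE		1
#define CSUM_FSYNC		2
#define CSUM_HASH_IN_STREAM	4

/* Hypothetical action bits, used only to describe what would happen. */
#define ACT_WRITE_HASH	1
#define ACT_FSYNC	2
#define ACT_CLOSE	4

/* Old flow: either CLOSE or FSYNC wrote the hash and closed the fd. */
unsigned int old_actions(unsigned int flags)
{
	unsigned int act = 0;

	if (flags & (CSUM_CLOSE | CSUM_FSYNC)) {
		act |= ACT_WRITE_HASH | ACT_CLOSE;
		if (flags & CSUM_FSYNC)
			act |= ACT_FSYNC;
	}
	return act;
}

/* New flow: each flag controls exactly one step. */
unsigned int new_actions(unsigned int flags)
{
	unsigned int act = 0;

	if (flags & CSUM_HASH_IN_STREAM)
		act |= ACT_WRITE_HASH;
	if (flags & CSUM_FSYNC)
		act |= ACT_FSYNC;
	if (flags & CSUM_CLOSE)
		act |= ACT_CLOSE;
	return act;
}
```

If that reading is right, a lockfile user would pass
CSUM_HASH_IN_STREAM | CSUM_FSYNC to get the trailing hash written and
synced while the descriptor stays open.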

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---

[...]
> diff --git a/csum-file.h b/csum-file.h
> index 9ba87f0a6c..c5a2e335e7 100644
> --- a/csum-file.h
> +++ b/csum-file.h
> @@ -27,8 +27,9 @@ extern void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *)
>  extern int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *);
>  
>  /* finalize_hashfile flags */
> -#define CSUM_CLOSE	1
> -#define CSUM_FSYNC	2
> +#define CSUM_CLOSE		1
> +#define CSUM_FSYNC		2
> +#define CSUM_HASH_IN_STREAM	4

Especially since it is not commented or described here, and the name is
not descriptive enough for me.

[...]
> diff --git a/csum-file.c b/csum-file.c
> index e6c95a6915..53ce37f7ca 100644
> --- a/csum-file.c
> +++ b/csum-file.c
> @@ -61,11 +61,11 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result, unsigned int fl
>  	the_hash_algo->final_fn(f->buffer, &f->ctx);
>  	if (result)
>  		hashcpy(result, f->buffer);
> -	if (flags & (CSUM_CLOSE | CSUM_FSYNC)) {
> -		/* write checksum and close fd */
> +	if (flags & CSUM_HASH_IN_STREAM)
>  		flush(f, f->buffer, the_hash_algo->rawsz);

Wouldn't CSUM_FLUSH be a better name for this flag?

> -		if (flags & CSUM_FSYNC)
> -			fsync_or_die(f->fd, f->name);
> +	if (flags & CSUM_FSYNC)
> +		fsync_or_die(f->fd, f->name);
> +	if (flags & CSUM_CLOSE) {
>  		if (close(f->fd))
>  			die_errno("%s: sha1 file error on close", f->name);
>  		fd = 0;

--
Jakub Narębski


* Re: [PATCH v7 03/14] commit-graph: add format document
  2018-04-02 20:34     ` [PATCH v7 03/14] commit-graph: add format document Derrick Stolee
@ 2018-04-07 23:49       ` Jakub Narebski
  0 siblings, 0 replies; 110+ messages in thread
From: Jakub Narebski @ 2018-04-07 23:49 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, sbeller, szeder.dev, ramsay, git, peff,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> diff --git a/Documentation/technical/commit-graph-format.txt
> b/Documentation/technical/commit-graph-format.txt
> new file mode 100644
> index 0000000000..ad6af8105c
> --- /dev/null
> +++ b/Documentation/technical/commit-graph-format.txt
> @@ -0,0 +1,97 @@
> +Git commit graph format
> +=======================
> +
> +The Git commit graph stores a list of commit OIDs and some associated
> +metadata, including:
> +
> +- The generation number of the commit. Commits with no parents have
> +  generation number 1; commits with parents have generation number
> +  one more than the maximum generation number of its parents. We
> +  reserve zero as special, and can be used to mark a generation
> +  number invalid or as "not computed".

Actually we will be reserving "all bits 1" as special, though I don't
think it is worth mentioning here, and at this time.

> +
> +- The root tree OID.
> +
> +- The commit date.
> +
> +- The parents of the commit, stored using positional references within
> +  the graph file.
> +
> +These positional references are stored as unsigned 32-bit integers
> +corresponding to the array position withing the list of commit OIDs. We
> +use the most-significant bit for special purposes, so we can store at most
> +(1 << 31) - 1 (around 2 billion) commits.
> +
> +== Commit graph files have the following format:
> +
> +In order to allow extensions that add extra data to the graph, we organize
> +the body into "chunks" and provide a binary lookup table at the beginning
> +of the body. The header includes certain values, such as number of chunks
> +and hash type.
> +
> +All 4-byte numbers are in network order.
> +
> +HEADER:
> +
> +  4-byte signature:
> +      The signature is: {'C', 'G', 'P', 'H'}

The mnemonics: CGPH = Commit GraPH

> +
> +  1-byte version number:
> +      Currently, the only valid version is 1.
> +
> +  1-byte Hash Version (1 = SHA-1)
> +      We infer the hash length (H) from this value.
> +
> +  1-byte number (C) of "chunks"
> +
> +  1-byte (reserved for later use)
> +     Current clients should ignore this value.

All right, so this reserved byte makes the header word-aligned (be it
32-bit or 64-bit).

> +
> +CHUNK LOOKUP:
> +
> +  (C + 1) * 12 bytes listing the table of contents for the chunks:
> +      First 4 bytes describe the chunk id. Value 0 is a terminating label.

As I understand it, this is the value 0 as a 4-byte integer, that is,
four zero bytes.  This may or may not need clarification.
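
To make the layout concrete, here is a minimal sketch of reading the
table of contents as I understand it from the text: each 12-byte entry
is a 4-byte ID followed by an 8-byte offset, and an all-zero ID ends the
table. The struct and function names are invented for this example, and
I am assuming the 8-byte offset is big-endian like the 4-byte numbers.

```c
#include <stdint.h>
#include <stddef.h>

#define CHUNK_ENTRY_SIZE 12

/* Hypothetical in-memory form of one table-of-contents entry. */
struct chunk_entry {
	uint32_t id;     /* e.g. 'O','I','D','F' packed big-endian */
	uint64_t offset; /* byte offset of the chunk within the file */
};

/* Read a 4-byte integer stored in network (big-endian) order. */
uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read an 8-byte big-endian integer (assumed byte order, see above). */
uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/*
 * Parse up to max entries from a chunk lookup table.
 * Returns the number of real chunks seen before the terminating
 * entry (ID 0), or -1 if no terminator appears within max entries.
 */
int parse_chunk_table(const unsigned char *table, size_t max,
		      struct chunk_entry *out)
{
	size_t i;

	for (i = 0; i < max; i++) {
		const unsigned char *p = table + i * CHUNK_ENTRY_SIZE;
		uint32_t id = read_be32(p);

		if (!id)
			return (int)i;
		out[i].id = id;
		out[i].offset = read_be64(p + 4);
	}
	return -1;
}
```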

> +      Other 8 bytes provide the byte-offset in current file for chunk to
> +      start. (Chunks are ordered contiguously in the file, so you can infer
> +      the length using the next chunk position if necessary.) Each chunk
> +      ID appears at most once.
> +
> +  The remaining data in the body is described one chunk at a time, and
> +  these chunks may be given in any order. Chunks are required unless
> +  otherwise specified.

Perhaps there should be a list here of all required chunks, and a list
of all optional chunks.

> +
> +CHUNK DATA:
> +
> +  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)

The mnemonics: OIDF = OID Fanout

> +      The ith entry, F[i], stores the number of OIDs with first
> +      byte at most i. Thus F[255] stores the total
> +      number of commits (N).

So in other words this is a cumulative histogram?  Just making sure that
I understand it correctly.
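
If so, building and querying it would look roughly like the sketch
below. The function names are mine, not from the patch, and the query
assumes the OID list is sorted as the OID Lookup chunk requires.

```c
#include <stdint.h>

/*
 * Build the 256-entry fanout from the first byte of each OID.
 * fanout[i] ends up holding the number of OIDs whose first byte is <= i,
 * so fanout[255] is the total number of commits.
 */
void build_fanout(const unsigned char *first_bytes, uint32_t nr,
		  uint32_t fanout[256])
{
	uint32_t i;
	int b;

	for (b = 0; b < 256; b++)
		fanout[b] = 0;
	for (i = 0; i < nr; i++)
		fanout[first_bytes[i]]++;
	/* Turn per-byte counts into running totals. */
	for (b = 1; b < 256; b++)
		fanout[b] += fanout[b - 1];
}

/*
 * In a sorted OID list, positions [*lo, *hi) are exactly the OIDs
 * whose first byte equals first_byte.
 */
void fanout_range(const uint32_t fanout[256], unsigned char first_byte,
		  uint32_t *lo, uint32_t *hi)
{
	*lo = first_byte ? fanout[first_byte - 1] : 0;
	*hi = fanout[first_byte];
}
```

The point of the running totals is that a lookup only has to binary
search the narrow [lo, hi) window instead of all N entries.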

> +
> +  OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)

The mnemonics: OIDL = OID Lookup

> +      The OIDs for all commits in the graph, sorted in ascending order.
> +
> +  Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)

The mnemonics: CGET = ???

It is not CDAT, CGDT, DATA, etc.

> +    * The first H bytes are for the OID of the root tree.
> +    * The next 8 bytes are for the positions of the first two parents
> +      of the ith commit. Stores value 0xffffffff if no parent in that
> +      position. If there are more than two parents, the second value
> +      has its most-significant bit on and the other bits store an array
> +      position into the Large Edge List chunk.

Possibly better:

  +      position into the Large Edge List chunk (EDGE chunk).

> +    * The next 8 bytes store the generation number of the commit and
> +      the commit time in seconds since EPOCH.

The commit time is committer date (without timezone info), that is the
date that the commit object was created, isn't it?

>                                                 The generation number
> +      uses the higher 30 bits of the first 4 bytes, while the commit
> +      time uses the 32 bits of the second 4 bytes, along with the lowest
> +      2 bits of the lowest byte, storing the 33rd and 34th bit of the
> +      commit time.
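
Spelled out as code, my reading of this packing is the sketch below.
Here word0 and word1 stand for the two 4-byte values after conversion
from network order, and all three function names are made up for
illustration.

```c
#include <stdint.h>

/* Generation number: the upper 30 bits of the first 4-byte word. */
uint32_t decode_generation(uint32_t word0)
{
	return word0 >> 2;
}

/*
 * Commit time: the low 2 bits of the first word supply bits 33 and 34,
 * and the second word supplies the low 32 bits.
 */
uint64_t decode_commit_time(uint32_t word0, uint32_t word1)
{
	return ((uint64_t)(word0 & 0x3) << 32) | word1;
}

/* Inverse of the decoders, for a generation < 2^30 and a time < 2^34. */
void encode_gen_and_time(uint32_t generation, uint64_t commit_time,
			 uint32_t *word0, uint32_t *word1)
{
	*word0 = (generation << 2) | (uint32_t)((commit_time >> 32) & 0x3);
	*word1 = (uint32_t)(commit_time & 0xffffffffu);
}
```
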
> +
> +  Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]

Well, it is optional in the sense that it may not be here if the project
doesn't have any octopus merges.  It is not optional in the sense that
Git can ignore this chunk if it doesn't know its type.


Sidenote: PNG has critical and ancillary chunks [1]. Critical chunks
contain information that is necessary to read the file. If a decoder
encounters a critical chunk it does not recognize, it must abort reading
the file or supply the user with an appropriate warning.  If the first
letter of chunk name is uppercase, the chunk is critical; if not, the
chunk is ancillary.

[1]: https://en.wikipedia.org/wiki/Portable_Network_Graphics#%22Chunks%22_within_the_file

> +      This list of 4-byte values store the second through nth parents for
> +      all octopus merges. The second parent value in the commit data stores
> +      an array position within this list along with the most-significant bit
> +      on. Starting at that array position, iterate through this list of commit
> +      positions for the parents until reaching a value with the most-significant
> +      bit on. The other bits correspond to the position of the last parent.

The second sentence in the above paragraph is not entirely clear to me.
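
Here is how I would expect a reader to walk the parents, as a sketch
under my reading of the text. The constant and function names are
hypothetical; the 0xffffffff "no parent" value is the one given in the
quoted Commit Data description, and the edge list is assumed to be
well-formed (every run ends in a value with the high bit set).

```c
#include <stdint.h>
#include <stddef.h>

#define PARENT_NONE 0xffffffffu /* "no parent in that position" */
#define HIGH_BIT    0x80000000u /* most-significant bit */

/*
 * Collect parent positions for one commit into out[] (capacity max).
 * p1 and p2 are the two 4-byte parent values from the Commit Data chunk;
 * edges is the Large Edge List viewed as an array of 4-byte values.
 * Returns the number of parents stored, never more than max.
 */
size_t collect_parents(uint32_t p1, uint32_t p2, const uint32_t *edges,
		       uint32_t *out, size_t max)
{
	size_t n = 0;
	uint32_t i, value;

	if (p1 == PARENT_NONE)
		return 0;
	if (n < max)
		out[n++] = p1;

	/* Must be checked first: PARENT_NONE also has the high bit set. */
	if (p2 == PARENT_NONE)
		return n;
	if (!(p2 & HIGH_BIT)) {
		/* Ordinary two-parent merge. */
		if (n < max)
			out[n++] = p2;
		return n;
	}

	/* Octopus merge: the low 31 bits index into the edge list. */
	i = p2 & ~HIGH_BIT;
	do {
		value = edges[i++];
		if (n < max)
			out[n++] = value & ~HIGH_BIT;
	} while (!(value & HIGH_BIT));
	return n;
}
```

So the high bit plays two roles: in the second parent slot it means
"look in the edge list", and inside the edge list it marks the last
parent of the current commit.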

> +
> +TRAILER:
> +
> +	H-byte HASH-checksum of all of the above.

Best,
-- 
Jakub Narębski


* Re: [PATCH v7 04/14] graph: add commit graph design document
  2018-04-02 20:34     ` [PATCH v7 04/14] graph: add commit graph design document Derrick Stolee
@ 2018-04-08 11:06       ` Jakub Narebski
  0 siblings, 0 replies; 110+ messages in thread
From: Jakub Narebski @ 2018-04-08 11:06 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, sbeller, szeder.dev, ramsay, git, peff,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> Add Documentation/technical/commit-graph.txt with details of the planned
> commit graph feature, including future plans.

That's in my opinion a very good idea.  It would help anyone trying to
add to and extend this feature.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/technical/commit-graph.txt | 163 +++++++++++++++++++++++
>  1 file changed, 163 insertions(+)
>  create mode 100644 Documentation/technical/commit-graph.txt
>
> diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
> new file mode 100644
> index 0000000000..0550c6d0dc
> --- /dev/null
> +++ b/Documentation/technical/commit-graph.txt
> @@ -0,0 +1,163 @@
> +Git Commit Graph Design Notes
> +=============================
> +
> +Git walks the commit graph for many reasons, including:
> +
> +1. Listing and filtering commit history.
> +2. Computing merge bases.
> +
> +These operations can become slow as the commit count grows. The merge
> +base calculation shows up in many user-facing commands, such as 'merge-base'
> +or 'status' and can take minutes to compute depending on history shape.
> +
> +There are two main costs here:
> +
> +1. Decompressing and parsing commits.
> +2. Walking the entire graph to satisfy topological order constraints.
> +
> +The commit graph file is a supplemental data structure that accelerates
> +commit graph walks. If a user downgrades or disables the 'core.commitGraph'
> +config setting, then the existing ODB is sufficient.

This is a good explanation of why we want to have this, and why in this
form.  I really like the "Related links" with summary.

>                                                       The file is stored
> +as "commit-graph" either in the .git/objects/info directory or in the info
> +directory of an alternate.

That is a good thing.

Do I understand correctly that Git would use the first "commit-graph"
file that it encounters, or would it merge information from all
alternates (perhaps including itself)?  What if the repository is a
subtree-merge of different repositories, with each listed separately as
an alternate?

> +
> +The commit graph file stores the commit graph structure along with some
> +extra metadata to speed up graph walks. By listing commit OIDs in lexi-
> +cographic order, we can identify an integer position for each commit and
> +refer to the parents of a commit using those integer positions. We use
> +binary search to find initial commits and then use the integer positions
> +for fast lookups during the walk.

It might be worth emphasizing here that fast access using integer
positions for commits in the chunk is possible only if the chunk uses a
fixed-width format (each commit taking the same amount of space -- which
for example is not true for a packfile).
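
A small sketch of what I mean, with invented function names: a binary
search over the sorted fixed-width OID table yields the integer
position, and that position then maps to a byte offset in the Commit
Data chunk by plain multiplication, using the (H + 16)-byte record size
from the format document.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/*
 * Binary search a sorted table of nr fixed-width OIDs (hash_len bytes each).
 * Returns the integer position of needle, or -1 if it is not present.
 */
long find_oid_pos(const unsigned char *oids, uint32_t nr, size_t hash_len,
		  const unsigned char *needle)
{
	uint32_t lo = 0, hi = nr;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(needle, oids + (size_t)mid * hash_len,
				 hash_len);

		if (!cmp)
			return (long)mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}

/*
 * With fixed-width records a position maps straight to a byte offset:
 * each Commit Data record takes hash_len + 16 bytes.
 */
uint64_t commit_data_offset(uint64_t chunk_start, size_t hash_len,
			    uint32_t pos)
{
	return chunk_start + (uint64_t)pos * (hash_len + 16);
}
```

After the initial search, following a parent costs only the
multiplication, since parents are stored as positions and not as OIDs.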

> +
> +A consumer may load the following info for a commit from the graph:
> +
> +1. The commit OID.
> +2. The list of parents, along with their integer position.
> +3. The commit date.
> +4. The root tree OID.
> +5. The generation number (see definition below).
> +
> +Values 1-4 satisfy the requirements of parse_commit_gently().

Good to have it here.  It is nice to know why 1-4 are needed to be in
the "commit-graph" structure.

Do I understand correctly that this is what makes it easy to port Git
to start using "commit-graph" information, because it is a single
point of entry?

> +
> +Define the "generation number" of a commit recursively as follows:
> +
> + * A commit with no parents (a root commit) has generation number one.
> +
> + * A commit with at least one parent has generation number one more than
> +   the largest generation number among its parents.
> +
> +Equivalently, the generation number of a commit A is one more than the
> +length of a longest path from A to a root commit. The recursive definition
> +is easier to use for computation and observing the following property:
> +
> +    If A and B are commits with generation numbers N and M, respectively,
> +    and N <= M, then A cannot reach B. That is, we know without searching
> +    that B is not an ancestor of A because it is further from a root commit
> +    than A.

Because generation numbers from the "commit-graph" may not cover all
commits (recent commits can have no generation number information), I
think it would be a good idea to state what happens then.

Because of how generation numbers are defined, if commit A has
generation number provided, then all ancestors also have generation
number information.  Thus:

       If A is a commit with generation number N, and B is a commit
       without generation number information, then A cannot reach B.
       That is, we know without searching that B is not an ancestor of A
       because if it was, it would have generation number information.

Additionally (but that might be not worth adding here):

       If A is a commit without generation number information, and B is
       a commit with generation number M, then we walk till we get to
       commits with generation numbers, at which points the problem
       reduces to the first case.


Which I think (but it might be worth spelling out explicitly) is what
using "infinity" number for unknown generation number gives
automatically.
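
As a sketch of that claim (not the patch's code): the constant names and
the all-ones value for "infinity" are my assumptions, as is the
conservative handling of the reserved zero. It also relies on the graph
file being closed under ancestry, as argued above, and on A and B being
distinct commits.

```c
#include <stdint.h>

#define GENERATION_INFINITY 0xffffffffu /* assumed: commit not in the file */
#define GENERATION_ZERO     0u          /* in the file, but not computed */

/*
 * Returns 1 if we can say without walking that A cannot reach B
 * (A and B distinct), 0 if a walk is still needed.
 */
int cannot_reach(uint32_t gen_a, uint32_t gen_b)
{
	/* No usable number for A: it might reach anything. */
	if (gen_a == GENERATION_INFINITY || gen_a == GENERATION_ZERO)
		return 0;
	/* B is in the file but uncomputed: nothing to compare against. */
	if (gen_b == GENERATION_ZERO)
		return 0;
	/*
	 * gen(A) is finite here. If B is unknown (infinity) or at least
	 * as far from a root as A, B cannot be an ancestor of A.
	 */
	return gen_a <= gen_b;
}
```

Note that the single comparison covers both the "N <= M" rule and the
"B has no generation number" case, while two unknown commits still
require a walk.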

> +
> +    Conversely, when checking if A is an ancestor of B, then we only need
> +    to walk commits until all commits on the walk boundary have generation
> +    number at most N. If we walk commits using a priority queue seeded by
> +    generation numbers, then we always expand the boundary commit with highest
> +    generation number and can easily detect the stopping condition.

Well, that is a bit information dense for me.  Let me unpack, and check
if I understand it correctly.

We have a priority queue, ordered by generation number:

* If it is a min-order queue, where we can find the commit on the walk
  boundary with the lowest generation number, we can quickly discard those
  that cannot reach B because they have a generation number smaller than
  M=gen(B).

  If I understand it correctly, with this we would get the shortest path,
  if it exists.

* If it is a max-order queue, we can quickly find out whether there is
  anything in the queue that can reach B: if the maximum generation number
  in the priority queue is smaller than M=gen(B), we can discard the whole
  queue and decide that B cannot be reached.

Which is it?
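
Reading the quoted text again, "always expand the boundary commit with
highest generation number" suggests the max-order variant. A toy sketch
of that reading follows; the struct, the fixed-size arrays and the
function name are all invented for illustration, and a linear scan
stands in for a real priority queue.

```c
#include <stdint.h>

#define MAX_COMMITS 16
#define MAX_PARENTS 4

/* Hypothetical toy commit: generation number plus parent positions. */
struct toy_commit {
	uint32_t generation;
	int nr_parents;
	int parents[MAX_PARENTS];
};

/*
 * Is commit `target` reachable from commit `start`?
 * The boundary is expanded highest-generation first; once the best
 * candidate has a generation below gen(target), nothing left on the
 * boundary can reach target and the walk stops early.
 * Assumes nr <= MAX_COMMITS and that every generation is computed.
 */
int toy_can_reach(const struct toy_commit *g, int nr, int start, int target)
{
	int on_boundary[MAX_COMMITS] = { 0 };
	int seen[MAX_COMMITS] = { 0 };
	int i, p;

	on_boundary[start] = seen[start] = 1;
	for (;;) {
		int best = -1;

		/* Linear scan standing in for a max-priority queue. */
		for (i = 0; i < nr; i++)
			if (on_boundary[i] &&
			    (best < 0 || g[i].generation > g[best].generation))
				best = i;

		if (best < 0 || g[best].generation < g[target].generation)
			return 0;
		if (best == target)
			return 1;

		on_boundary[best] = 0;
		for (p = 0; p < g[best].nr_parents; p++) {
			int parent = g[best].parents[p];

			if (!seen[parent]) {
				seen[parent] = 1;
				on_boundary[parent] = 1;
			}
		}
	}
}
```

With a max-order queue the stopping test only has to look at the head
of the queue, which is what makes the condition cheap to detect.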


Sidenote: what makes the above possible is that generation numbers
(which are integers) are totally ordered: we have gen(A) < gen(B),
gen(A) = gen(B), or gen(B) < gen(A).  This is not the case for the FELINE
index (which is, similarly to generation number, a negative-cut filter),
nor for the min-post spanning-tree intervals index (which is a
positive-cut filter).


To gather information about [some of] reachability helpers in one place,
let me put it here:

 - If A can reach B (if B is ancestor of A) and A != B,
   then gen(B) < gen(A)

 - If A can reach B (if B is ancestor of A),
   then fel(A) ≼ fel(B),
   where fel(A) = (x_A, y_A), fel(B) = (x_B, y_B),
   and ((x_A, y_A) ≼ (x_B, y_B)) <=> (x_A <= x_B and y_A <= y_B)
   for NW [weak] dominance drawing

 + If interval(B) ⊆ interval(A),
   then A can reach B via spanning-tree path (B is ancestor of A)

> +
> +This property can be used to significantly reduce the time it takes to
> +walk commits and determine topological relationships. Without generation
> +numbers, the general heuristic is the following:
> +
> +    If A and B are commits with commit time X and Y, respectively, and
> +    X < Y, then A _probably_ cannot reach B.
> +
> +This heuristic is currently used whenever the computation is allowed to
> +violate topological relationships due to clock skew (such as "git log"
> +with default order), but is not used when the topological order is
> +required (such as merge base calculations, "git log --graph").

Doesn't Git allow for some clock skew, that is, doesn't it have some
slop value for comparing timestamps?  Just for completeness.

This of course is not needed for generation numbers.

> +
> +In practice, we expect some commits to be created recently and not stored
> +in the commit graph. We can treat these commits as having "infinite"
> +generation number and walk until reaching commits with known generation
> +number.

Ah, so here is some of the information on what to do if commit A does
not have generation number information, or commit B does not have
generation number information, or both do not have it.

> +
> +Design Details
> +--------------
> +
> +- The commit graph file is stored in a file named 'commit-graph' in the
> +  .git/objects/info directory. This could be stored in the info directory
> +  of an alternate.

Note that this repeats some of information from above, which is not
necessarily a bad thing, but we need to take care to keep the two in
sync if we change either one.

> +
> +- The core.commitGraph config setting must be on to consume graph files.

This is similar to how it is done for bitmaps, with pack.useBitmaps.  We
would probably have a separate setting for automatic creation and updating
of "commit-graph" files, wouldn't we?

> +
> +- The file format includes parameters for the object ID hash function,
> +  so a future change of hash algorithm does not require a change in format.

That's a good thing.  As I understand it, the format uses the ID hash for
addressing and for the integrity check (the latter could in principle be
replaced by another fast-to-calculate and reliable checksum, such as
CRC).

If I understand correctly, you can port a "commit-graph" file from one ID
hash to another without recalculating it, can't you?  The only thing
needed is the old-to-new mapping for commits and top trees.

> +
> +Future Work
> +-----------
> +
> +- The commit graph feature currently does not honor commit grafts. This can
> +  be remedied by duplicating or refactoring the current graft logic.

From what I remember there are actually three mechanisms that can affect
the way Git views history:
 * the [deprecated] grafts file
 * shallow clone, which is kind of grafts file
 * the git-replace feature (if it replaces commits with other commits
   with different parents)

All of those could change.  Also, one can request for Git to not honor
replacements, and cloning does not by default transfer replacement info.

We also need to clarify what "does not honor" means for _use_ of commit
graph feature (not only for _generation_ of it).  Does it mean that it
is disabled if any of those [rare] features are in use?


I think in the future we could add VAL4 / VALF chunk that would specify
for what contents of grafts, shallow and replaces the commit graph was
constructed.  If the value does not match, or if the repository has some
history-view-changing feature enabled that is not included in VAL4
chunk, then Git cannot use information in this file.  What are your
thoughts about this?

> +
> +- The 'commit-graph' subcommand does not have a "verify" mode that is
> +  necessary for integration with fsck.

This is still in the future, isn't it?

> +
> +- The file format includes room for precomputed generation numbers. These
> +  are not currently computed, so all generation numbers will be marked as
> +  0 (or "uncomputed"). A later patch will include this calculation.
> +
> +- After computing and storing generation numbers, we must make graph
> +  walks aware of generation numbers to gain the performance benefits they
> +  enable. This will mostly be accomplished by swapping a commit-date-ordered
> +  priority queue with one ordered by generation number. The following
> +  operations are important candidates:
> +
> +    - paint_down_to_common()
> +    - 'log --topo-order'
> +
> +- Currently, parse_commit_gently() requires filling in the root tree
> +  object for a commit. This passes through lookup_tree() and consequently
> +  lookup_object(). Also, it calls lookup_commit() when loading the parents.
> +  These method calls check the ODB for object existence, even if the
> +  consumer does not need the content. For example, we do not need the
> +  tree contents when computing merge bases. Now that commit parsing is
> +  removed from the computation time, these lookup operations are the
> +  slowest operations keeping graph walks from being fast. Consider
> +  loading these objects without verifying their existence in the ODB and
> +  only loading them fully when consumers need them. Consider a method
> +  such as "ensure_tree_loaded(commit)" that fully loads a tree before
> +  using commit->tree.
> +
> +- The current design uses the 'commit-graph' subcommand to generate the graph.
> +  When this feature stabilizes enough to recommend to most users, we should
> +  add automatic graph writes to common operations that create many commits.
> +  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
> +  commands.

All of the above were sent as subsequent patch series, weren't they?

> +
> +- A server could provide a commit graph file as part of the network protocol
> +  to avoid extra calculations by clients. This feature is only of benefit if
> +  the user is willing to trust the file, because verifying the file is correct
> +  is as hard as computing it from scratch.

VSTS / GVFS has it, doesn't it?

> +
> +Related Links
> +-------------

I really like this, and especially the summary of each entry.

> +[0] https://bugs.chromium.org/p/git/issues/detail?id=8
> +    Chromium work item for: Serialized Commit Graph
> +
> +[1] https://public-inbox.org/git/20110713070517.GC18566@sigill.intra.peff.net/
> +    An abandoned patch that introduced generation numbers.
> +
> +[2] https://public-inbox.org/git/20170908033403.q7e6dj7benasrjes@sigill.intra.peff.net/
> +    Discussion about generation numbers on commits and how they interact
> +    with fsck.
> +
> +[3] https://public-inbox.org/git/20170908034739.4op3w4f2ma5s65ku@sigill.intra.peff.net/
> +    More discussion about generation numbers and not storing them inside
> +    commit objects. A valuable quote:
> +
> +    "I think we should be moving more in the direction of keeping
> +     repo-local caches for optimizations. Reachability bitmaps have been
> +     a big performance win. I think we should be doing the same with our
> +     properties of commits. Not just generation numbers, but making it
> +     cheap to access the graph structure without zlib-inflating whole
> +     commit objects (i.e., packv4 or something like the "metapacks" I
> +     proposed a few years ago)."
> +
> +[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u
> +    A patch to remove the ahead-behind calculation from 'status'.

Thank you for all the work on this,
-- 
Jakub Narębski


* Re: [PATCH v7 07/14] commit-graph: implement git-commit-graph write
  2018-04-02 20:34     ` [PATCH v7 07/14] commit-graph: implement git-commit-graph write Derrick Stolee
@ 2018-04-08 11:59       ` Jakub Narebski
  0 siblings, 0 replies; 110+ messages in thread
From: Jakub Narebski @ 2018-04-08 11:59 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, sbeller, szeder.dev, ramsay, git, peff,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> +# Current graph structure:
> +#
> +#   __M3___
> +#  /   |   \
> +# 3 M1 5 M2 7
> +# |/  \|/  \|
> +# 2    4    6
> +# |___/____/
> +# 1

Good, so we are testing the EDGE chunk, because the commit graph has an
octopus merge in it (with more than two parents).

Do we need to test multiple roots, and/or independent (orphan) branches
cases?

Best,
-- 
Jakub Narębski


* Re: [PATCH v7 08/14] commit-graph: implement git commit-graph read
  2018-04-02 20:34     ` [PATCH v7 08/14] commit-graph: implement git commit-graph read Derrick Stolee
  2018-04-02 21:33       ` Junio C Hamano
@ 2018-04-08 12:59       ` Jakub Narebski
  1 sibling, 0 replies; 110+ messages in thread
From: Jakub Narebski @ 2018-04-08 12:59 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, sbeller, szeder.dev, ramsay, git, peff,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

[...]
>  EXAMPLES
>  --------
> @@ -45,6 +51,12 @@ EXAMPLES
>  $ git commit-graph write
>  ------------------------------------------------
>  
> +* Read basic information from the commit-graph file.
> ++
> +------------------------------------------------
> +$ git commit-graph read
> +------------------------------------------------

It would be better to have example output of this command here, perhaps
together with an ASCII-art diagram of the commit graph.

[...]
> +	if (!graph)
> +		die("graph file %s does not exist", graph_name);
> +	FREE_AND_NULL(graph_name);

Shouldn't the error message be marked up for translation, too?

Regards,
-- 
Jakub Narębski


* Re: [PATCH v7 09/14] commit-graph: add core.commitGraph setting
  2018-04-02 20:34     ` [PATCH v7 09/14] commit-graph: add core.commitGraph setting Derrick Stolee
@ 2018-04-08 13:39       ` Jakub Narebski
  0 siblings, 0 replies; 110+ messages in thread
From: Jakub Narebski @ 2018-04-08 13:39 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, sbeller, szeder.dev, ramsay, git, peff,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The commit graph feature is controlled by the new core.commitGraph config
> setting. This defaults to 0, so the feature is opt-in.

Nice.  That's how the bitmaps feature was introduced, I think.  I guess
that in the future reading will become opt-out, the same as it currently
is for bitmaps (the pack.useBitmaps setting).

>
> The intention of core.commitGraph is that a user can always stop checking
> for or parsing commit graph files if core.commitGraph=0.

Shouldn't it be "false", not "0"?

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/config.txt | 4 ++++
>  cache.h                  | 1 +
>  config.c                 | 5 +++++
>  environment.c            | 1 +
>  4 files changed, 11 insertions(+)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 4e0cff87f6..e5c7013fb0 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -898,6 +898,10 @@ core.notesRef::
>  This setting defaults to "refs/notes/commits", and it can be overridden by
>  the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
>  
> +core.commitGraph::
> +	Enable git commit graph feature. Allows reading from the
> +	commit-graph file.
> +

Good.  It would be nice to have "Defaults to false." and possibly also
a reference to the "git-commit-graph(1)" manpage for more details, though.

>  core.sparseCheckout::
>  	Enable "sparse checkout" feature. See section "Sparse checkout" in
>  	linkgit:git-read-tree[1] for more information.
[...]

-- 
Jakub Narębski

* Re: [PATCH v7 13/14] commit-graph: build graph from starting commits
  2018-04-02 20:34     ` [PATCH v7 13/14] commit-graph: build graph from starting commits Derrick Stolee
@ 2018-04-08 13:50       ` Jakub Narebski
  0 siblings, 0 replies; 110+ messages in thread
From: Jakub Narebski @ 2018-04-08 13:50 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, sbeller, szeder.dev, ramsay, git, peff,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> @@ -96,10 +101,12 @@ static int graph_write(int argc, const char **argv)
>  			     builtin_commit_graph_write_options,
>  			     builtin_commit_graph_write_usage, 0);
>  
> +	if (opts.stdin_packs && opts.stdin_commits)
> +		die(_("cannot use both --stdin-commits and --stdin-packs"));

Here the error message is marked for translation, which is not the case
for the other patches in the series.

-- 
Jakub Narębski

* [PATCH v8 00/14] Serialized Git Commit Graph
  2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
                       ` (13 preceding siblings ...)
  2018-04-02 20:34     ` [PATCH v7 14/14] commit-graph: implement "--additive" option Derrick Stolee
@ 2018-04-10 12:55     ` Derrick Stolee
  2018-04-10 12:55       ` [PATCH v8 01/14] csum-file: rename hashclose() to finalize_hashfile() Derrick Stolee
                         ` (13 more replies)
  14 siblings, 14 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-10 12:55 UTC (permalink / raw)
  To: git; +Cc: gitster, ramsay, sbeller, szeder.dev, git, peff, Derrick Stolee

This version covers the review that I missed when rerolling v7. The
diff from the previous version is below. In addition, PATCH 14/14 was
reworded to use "--append" properly.

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 41c4f76caf..37420ae0fd 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -31,7 +31,7 @@ static struct opts_commit_graph {

 static int graph_read(int argc, const char **argv)
 {
-       struct commit_graph *graph = 0;
+       struct commit_graph *graph = NULL;
        char *graph_name;

        static struct option builtin_commit_graph_read_options[] = {
diff --git a/commit-graph.c b/commit-graph.c
index 1fc63d541b..3ff8c84c0e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -179,7 +179,7 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 }

 /* global storage */
-struct commit_graph *commit_graph = NULL;
+static struct commit_graph *commit_graph = NULL;

 static void prepare_commit_graph_one(const char *obj_dir)
 {

-- >8 -- 

This patch contains a way to serialize the commit graph.

The current implementation defines a new file format to store the graph
structure (parent relationships) and basic commit metadata (commit date,
root tree OID) in order to prevent parsing raw commits while performing
basic graph walks. For example, we do not need to parse the full commit
when performing these walks:

* 'git log --topo-order -1000' walks all reachable commits to avoid
  incorrect topological orders, but only needs the commit message for
  the top 1000 commits.

* 'git merge-base <A> <B>' may walk many commits to find the correct
  boundary between the commits reachable from A and those reachable
  from B. No commit messages are needed.

* 'git branch -vv' checks ahead/behind status for all local branches
  compared to their upstream remote branches. This is essentially as
  hard as computing merge bases for each.

The current patch speeds up these calculations by adding a check to
parse_commit_gently() that looks for a graph file and, when one exists,
uses it to provide the required metadata to the struct commit.
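
For readers who want a feel for the lookup such a check relies on, here
is a minimal sketch based on the OID Fanout and OID Lookup chunks
described in the format document of this series. It is not the code from
the patches: the names struct graph_view and graph_find() are
hypothetical, the fanout is assumed to be already converted to host byte
order, and the hash length is fixed to SHA-1's 20 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 20 /* SHA-1; the "H" of the format document */

/* Hypothetical in-memory view of two chunks; not Git's struct commit_graph. */
struct graph_view {
	const uint32_t *fanout;    /* 256 entries, assumed host byte order */
	const unsigned char *oids; /* N * HASH_LEN bytes, sorted ascending */
};

/*
 * Return the position of `oid` in the graph, or -1 if it is absent and the
 * caller would have to fall back to parsing the commit from the ODB.
 */
static long graph_find(const struct graph_view *g, const unsigned char *oid)
{
	/* F[i] counts OIDs with first byte <= i, so byte b spans [F[b-1], F[b]). */
	uint32_t lo = oid[0] ? g->fanout[oid[0] - 1] : 0;
	uint32_t hi = g->fanout[oid[0]];

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(oid, g->oids + (size_t)mid * HASH_LEN, HASH_LEN);

		if (!cmp)
			return (long)mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}
```

The returned position is the same integer that parent references use, so
once the starting commit is found no further hash lookups are needed to
follow its edges.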

The file format has room to store generation numbers, which will be
provided as a patch after this framework is merged. Generation numbers
are referenced by the design document but not implemented in order to
make the current patch focus on the graph construction process. Once
that is stable, it will be easier to add generation numbers and make
graph walks aware of generation numbers one-by-one.
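
To make the definition concrete, here is a small sketch of how generation
numbers could be computed from the recursive definition in the design
document (roots get 1, every other commit gets one more than the largest
generation among its parents). This is only an illustration under stated
assumptions, not the planned implementation: struct toy_commit,
compute_generation() and cannot_reach() are hypothetical names, and the
toy commits carry at most two parents.

```c
#include <stdint.h>

#define NO_PARENT 0xffffffffu /* same sentinel the format uses for "no parent" */

/* Hypothetical toy commit: at most two parents, referenced by array position. */
struct toy_commit {
	uint32_t parent[2];
	uint32_t generation; /* 0 means "not computed", as in the format */
};

/* Fill in and return the generation number of the commit at `pos`. */
static uint32_t compute_generation(struct toy_commit *c, uint32_t pos)
{
	uint32_t max_parent = 0;
	int i;

	if (c[pos].generation)
		return c[pos].generation; /* already computed */

	for (i = 0; i < 2; i++) {
		uint32_t p = c[pos].parent[i];

		if (p != NO_PARENT) {
			uint32_t g = compute_generation(c, p);

			if (g > max_parent)
				max_parent = g;
		}
	}
	/* A root has no parents, so max_parent stays 0 and the result is 1. */
	c[pos].generation = max_parent + 1;
	return c[pos].generation;
}

/*
 * Cheap negative answer: a commit cannot reach a different commit whose
 * generation number is not strictly lower than its own.
 */
static int cannot_reach(const struct toy_commit *c, uint32_t a, uint32_t b)
{
	return a != b && c[a].generation <= c[b].generation;
}
```

The recursion keeps the sketch short. On a history as deep as Linux a
real implementation would presumably use an explicit stack instead.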

By loading commits from the graph instead of parsing commit buffers, we
save a lot of time on long commit walks. Here are some performance
results for a copy of the Linux repository where 'master' has 678,653
reachable commits and is behind 'origin/master' by 59,929 commits.

| Command                          | Before | After  | Rel % |
|----------------------------------|--------|--------|-------|
| log --oneline --topo-order -1000 |  8.31s |  0.94s | -88%  |
| branch -vv                       |  1.02s |  0.14s | -86%  |
| rev-list --all                   |  5.89s |  1.07s | -81%  |
| rev-list --all --objects         | 66.15s | 58.45s | -11%  |

To test this yourself, run the following on your repo:

  git config core.commitGraph true
  git show-ref -s | git commit-graph write --stdin-commits

The second command writes a commit graph file containing every commit
reachable from your refs. Now, all git commands that walk commits will
check your graph first before consulting the ODB. You can run your own
performance comparisons by toggling the 'core.commitGraph' setting.

[1] https://github.com/derrickstolee/git/pull/2
    A GitHub pull request containing the latest version of this patch.

Derrick Stolee (14):
  csum-file: rename hashclose() to finalize_hashfile()
  csum-file: refactor finalize_hashfile() method
  commit-graph: add format document
  graph: add commit graph design document
  commit-graph: create git-commit-graph builtin
  commit-graph: implement write_commit_graph()
  commit-graph: implement git-commit-graph write
  commit-graph: implement git commit-graph read
  commit-graph: add core.commitGraph setting
  commit-graph: close under reachability
  commit: integrate commit graph with commit parsing
  commit-graph: read only from specific pack-indexes
  commit-graph: build graph from starting commits
  commit-graph: implement "--append" option

 .gitignore                                    |   1 +
 Documentation/config.txt                      |   4 +
 Documentation/git-commit-graph.txt            |  94 +++
 .../technical/commit-graph-format.txt         |  97 +++
 Documentation/technical/commit-graph.txt      | 163 ++++
 Makefile                                      |   2 +
 alloc.c                                       |   1 +
 builtin.h                                     |   1 +
 builtin/commit-graph.c                        | 171 ++++
 builtin/index-pack.c                          |   2 +-
 builtin/pack-objects.c                        |   6 +-
 bulk-checkin.c                                |   4 +-
 cache.h                                       |   1 +
 command-list.txt                              |   1 +
 commit-graph.c                                | 738 ++++++++++++++++++
 commit-graph.h                                |  46 ++
 commit.c                                      |   3 +
 commit.h                                      |   3 +
 config.c                                      |   5 +
 contrib/completion/git-completion.bash        |   2 +
 csum-file.c                                   |  10 +-
 csum-file.h                                   |   9 +-
 environment.c                                 |   1 +
 fast-import.c                                 |   2 +-
 git.c                                         |   1 +
 pack-bitmap-write.c                           |   2 +-
 pack-write.c                                  |   5 +-
 packfile.c                                    |   4 +-
 packfile.h                                    |   2 +
 t/t5318-commit-graph.sh                       | 224 ++++++
 30 files changed, 1584 insertions(+), 21 deletions(-)
 create mode 100644 Documentation/git-commit-graph.txt
 create mode 100644 Documentation/technical/commit-graph-format.txt
 create mode 100644 Documentation/technical/commit-graph.txt
 create mode 100644 builtin/commit-graph.c
 create mode 100644 commit-graph.c
 create mode 100644 commit-graph.h
 create mode 100755 t/t5318-commit-graph.sh


base-commit: 468165c1d8a442994a825f3684528361727cd8c0
-- 
2.17.0


* [PATCH v8 01/14] csum-file: rename hashclose() to finalize_hashfile()
  2018-04-10 12:55     ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
@ 2018-04-10 12:55       ` Derrick Stolee
  2018-04-10 12:55       ` [PATCH v8 02/14] csum-file: refactor finalize_hashfile() method Derrick Stolee
                         ` (12 subsequent siblings)
  13 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-10 12:55 UTC (permalink / raw)
  To: git; +Cc: gitster, ramsay, sbeller, szeder.dev, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The hashclose() method behaves very differently depending on the flags
parameter. In particular, the file descriptor is not always closed.

Perform a simple rename of "hashclose()" to "finalize_hashfile()" in
preparation for functional changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/index-pack.c   | 2 +-
 builtin/pack-objects.c | 6 +++---
 bulk-checkin.c         | 4 ++--
 csum-file.c            | 2 +-
 csum-file.h            | 4 ++--
 fast-import.c          | 2 +-
 pack-bitmap-write.c    | 2 +-
 pack-write.c           | 4 ++--
 8 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index bda84a92ef..8bcf280e0b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1270,7 +1270,7 @@ static void conclude_pack(int fix_thin_pack, const char *curr_pack, unsigned cha
 			    nr_objects - nr_objects_initial);
 		stop_progress_msg(&progress, msg.buf);
 		strbuf_release(&msg);
-		hashclose(f, tail_hash, 0);
+		finalize_hashfile(f, tail_hash, 0);
 		hashcpy(read_hash, pack_hash);
 		fixup_pack_header_footer(output_fd, pack_hash,
 					 curr_pack, nr_objects,
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e9d3cfb9e3..ab3e80ee49 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -837,11 +837,11 @@ static void write_pack_file(void)
 		 * If so, rewrite it like in fast-import
 		 */
 		if (pack_to_stdout) {
-			hashclose(f, oid.hash, CSUM_CLOSE);
+			finalize_hashfile(f, oid.hash, CSUM_CLOSE);
 		} else if (nr_written == nr_remaining) {
-			hashclose(f, oid.hash, CSUM_FSYNC);
+			finalize_hashfile(f, oid.hash, CSUM_FSYNC);
 		} else {
-			int fd = hashclose(f, oid.hash, 0);
+			int fd = finalize_hashfile(f, oid.hash, 0);
 			fixup_pack_header_footer(fd, oid.hash, pack_tmp_name,
 						 nr_written, oid.hash, offset);
 			close(fd);
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 9d87eac07b..227cc9f3b1 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -35,9 +35,9 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
 		unlink(state->pack_tmp_name);
 		goto clear_exit;
 	} else if (state->nr_written == 1) {
-		hashclose(state->f, oid.hash, CSUM_FSYNC);
+		finalize_hashfile(state->f, oid.hash, CSUM_FSYNC);
 	} else {
-		int fd = hashclose(state->f, oid.hash, 0);
+		int fd = finalize_hashfile(state->f, oid.hash, 0);
 		fixup_pack_header_footer(fd, oid.hash, state->pack_tmp_name,
 					 state->nr_written, oid.hash,
 					 state->offset);
diff --git a/csum-file.c b/csum-file.c
index 5eda7fb6af..e6c95a6915 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -53,7 +53,7 @@ void hashflush(struct hashfile *f)
 	}
 }
 
-int hashclose(struct hashfile *f, unsigned char *result, unsigned int flags)
+int finalize_hashfile(struct hashfile *f, unsigned char *result, unsigned int flags)
 {
 	int fd;
 
diff --git a/csum-file.h b/csum-file.h
index 992e5c0141..9ba87f0a6c 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -26,14 +26,14 @@ struct hashfile_checkpoint {
 extern void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *);
 extern int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *);
 
-/* hashclose flags */
+/* finalize_hashfile flags */
 #define CSUM_CLOSE	1
 #define CSUM_FSYNC	2
 
 extern struct hashfile *hashfd(int fd, const char *name);
 extern struct hashfile *hashfd_check(const char *name);
 extern struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp);
-extern int hashclose(struct hashfile *, unsigned char *, unsigned int);
+extern int finalize_hashfile(struct hashfile *, unsigned char *, unsigned int);
 extern void hashwrite(struct hashfile *, const void *, unsigned int);
 extern void hashflush(struct hashfile *f);
 extern void crc32_begin(struct hashfile *);
diff --git a/fast-import.c b/fast-import.c
index b5db5d20b1..6d96f55d9d 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1016,7 +1016,7 @@ static void end_packfile(void)
 		struct tag *t;
 
 		close_pack_windows(pack_data);
-		hashclose(pack_file, cur_pack_oid.hash, 0);
+		finalize_hashfile(pack_file, cur_pack_oid.hash, 0);
 		fixup_pack_header_footer(pack_data->pack_fd, pack_data->sha1,
 				    pack_data->pack_name, object_count,
 				    cur_pack_oid.hash, pack_size);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index e01f992884..662b44f97d 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -535,7 +535,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	if (options & BITMAP_OPT_HASH_CACHE)
 		write_hash_cache(f, index, index_nr);
 
-	hashclose(f, NULL, CSUM_FSYNC);
+	finalize_hashfile(f, NULL, CSUM_FSYNC);
 
 	if (adjust_shared_perm(tmp_file.buf))
 		die_errno("unable to make temporary bitmap file readable");
diff --git a/pack-write.c b/pack-write.c
index d775c7406d..044f427392 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -170,8 +170,8 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 	}
 
 	hashwrite(f, sha1, the_hash_algo->rawsz);
-	hashclose(f, NULL, ((opts->flags & WRITE_IDX_VERIFY)
-			    ? CSUM_CLOSE : CSUM_FSYNC));
+	finalize_hashfile(f, NULL, ((opts->flags & WRITE_IDX_VERIFY)
+				    ? CSUM_CLOSE : CSUM_FSYNC));
 	return index_name;
 }
 
-- 
2.17.0


* [PATCH v8 02/14] csum-file: refactor finalize_hashfile() method
  2018-04-10 12:55     ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
  2018-04-10 12:55       ` [PATCH v8 01/14] csum-file: rename hashclose() to finalize_hashfile() Derrick Stolee
@ 2018-04-10 12:55       ` Derrick Stolee
  2018-04-10 12:55       ` [PATCH v8 03/14] commit-graph: add format document Derrick Stolee
                         ` (11 subsequent siblings)
  13 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-10 12:55 UTC (permalink / raw)
  To: git; +Cc: gitster, ramsay, sbeller, szeder.dev, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

If we want to use a hashfile on the temporary file for a lockfile, then
we need finalize_hashfile() to fully write the trailing hash but also keep
the file descriptor open.

Do this by adding a new CSUM_HASH_IN_STREAM flag along with a functional
change that checks this flag before writing the checksum to the stream.
This differs from the previous behavior, where the checksum was written
whenever either CSUM_CLOSE or CSUM_FSYNC was provided.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/pack-objects.c | 4 ++--
 bulk-checkin.c         | 2 +-
 csum-file.c            | 8 ++++----
 csum-file.h            | 5 +++--
 pack-bitmap-write.c    | 2 +-
 pack-write.c           | 5 +++--
 6 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ab3e80ee49..b09bbf4f4c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -837,9 +837,9 @@ static void write_pack_file(void)
 		 * If so, rewrite it like in fast-import
 		 */
 		if (pack_to_stdout) {
-			finalize_hashfile(f, oid.hash, CSUM_CLOSE);
+			finalize_hashfile(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_CLOSE);
 		} else if (nr_written == nr_remaining) {
-			finalize_hashfile(f, oid.hash, CSUM_FSYNC);
+			finalize_hashfile(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
 		} else {
 			int fd = finalize_hashfile(f, oid.hash, 0);
 			fixup_pack_header_footer(fd, oid.hash, pack_tmp_name,
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 227cc9f3b1..70b14fdf41 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -35,7 +35,7 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state)
 		unlink(state->pack_tmp_name);
 		goto clear_exit;
 	} else if (state->nr_written == 1) {
-		finalize_hashfile(state->f, oid.hash, CSUM_FSYNC);
+		finalize_hashfile(state->f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
 	} else {
 		int fd = finalize_hashfile(state->f, oid.hash, 0);
 		fixup_pack_header_footer(fd, oid.hash, state->pack_tmp_name,
diff --git a/csum-file.c b/csum-file.c
index e6c95a6915..53ce37f7ca 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -61,11 +61,11 @@ int finalize_hashfile(struct hashfile *f, unsigned char *result, unsigned int fl
 	the_hash_algo->final_fn(f->buffer, &f->ctx);
 	if (result)
 		hashcpy(result, f->buffer);
-	if (flags & (CSUM_CLOSE | CSUM_FSYNC)) {
-		/* write checksum and close fd */
+	if (flags & CSUM_HASH_IN_STREAM)
 		flush(f, f->buffer, the_hash_algo->rawsz);
-		if (flags & CSUM_FSYNC)
-			fsync_or_die(f->fd, f->name);
+	if (flags & CSUM_FSYNC)
+		fsync_or_die(f->fd, f->name);
+	if (flags & CSUM_CLOSE) {
 		if (close(f->fd))
 			die_errno("%s: sha1 file error on close", f->name);
 		fd = 0;
diff --git a/csum-file.h b/csum-file.h
index 9ba87f0a6c..c5a2e335e7 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -27,8 +27,9 @@ extern void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *)
 extern int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *);
 
 /* finalize_hashfile flags */
-#define CSUM_CLOSE	1
-#define CSUM_FSYNC	2
+#define CSUM_CLOSE		1
+#define CSUM_FSYNC		2
+#define CSUM_HASH_IN_STREAM	4
 
 extern struct hashfile *hashfd(int fd, const char *name);
 extern struct hashfile *hashfd_check(const char *name);
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 662b44f97d..db4c832428 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -535,7 +535,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 	if (options & BITMAP_OPT_HASH_CACHE)
 		write_hash_cache(f, index, index_nr);
 
-	finalize_hashfile(f, NULL, CSUM_FSYNC);
+	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE);
 
 	if (adjust_shared_perm(tmp_file.buf))
 		die_errno("unable to make temporary bitmap file readable");
diff --git a/pack-write.c b/pack-write.c
index 044f427392..a9d46bc03f 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -170,8 +170,9 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec
 	}
 
 	hashwrite(f, sha1, the_hash_algo->rawsz);
-	finalize_hashfile(f, NULL, ((opts->flags & WRITE_IDX_VERIFY)
-				    ? CSUM_CLOSE : CSUM_FSYNC));
+	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_CLOSE |
+				    ((opts->flags & WRITE_IDX_VERIFY)
+				    ? 0 : CSUM_FSYNC));
 	return index_name;
 }
 
-- 
2.17.0
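
As a reading aid for the patch above, the following sketch models which
actions finalize_hashfile() takes for a given flag combination, before
and after the change. The flag values are copied from the csum-file.h
hunk. The ACT_* bits and the two helper functions are hypothetical and
exist only in this sketch. It performs no I/O.

```c
/* Flag values copied from the csum-file.h hunk above. */
#define CSUM_CLOSE          1
#define CSUM_FSYNC          2
#define CSUM_HASH_IN_STREAM 4

/* Hypothetical action bits, used only by this sketch. */
enum {
	ACT_WRITE_HASH = 1 << 0, /* trailing checksum flushed to the stream */
	ACT_FSYNC      = 1 << 1, /* fsync_or_die() would be called */
	ACT_CLOSE      = 1 << 2  /* file descriptor would be closed */
};

/* Before the patch: either flag implied "write the hash and close". */
static unsigned old_actions(unsigned flags)
{
	unsigned act = 0;

	if (flags & (CSUM_CLOSE | CSUM_FSYNC)) {
		act |= ACT_WRITE_HASH | ACT_CLOSE;
		if (flags & CSUM_FSYNC)
			act |= ACT_FSYNC;
	}
	return act;
}

/* After the patch: each flag controls exactly one action. */
static unsigned new_actions(unsigned flags)
{
	unsigned act = 0;

	if (flags & CSUM_HASH_IN_STREAM)
		act |= ACT_WRITE_HASH;
	if (flags & CSUM_FSYNC)
		act |= ACT_FSYNC;
	if (flags & CSUM_CLOSE)
		act |= ACT_CLOSE;
	return act;
}
```

Under this model, old CSUM_FSYNC corresponds to new
CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE, which is the rewrite the
call sites receive, and CSUM_HASH_IN_STREAM without CSUM_CLOSE gives the
"write the hash but keep the descriptor open" combination that a
lockfile-based writer needs.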


* [PATCH v8 03/14] commit-graph: add format document
  2018-04-10 12:55     ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
  2018-04-10 12:55       ` [PATCH v8 01/14] csum-file: rename hashclose() to finalize_hashfile() Derrick Stolee
  2018-04-10 12:55       ` [PATCH v8 02/14] csum-file: refactor finalize_hashfile() method Derrick Stolee
@ 2018-04-10 12:55       ` Derrick Stolee
  2018-04-10 19:10         ` Stefan Beller
  2018-04-11 20:58         ` Jakub Narebski
  2018-04-10 12:55       ` [PATCH v8 04/14] graph: add commit graph design document Derrick Stolee
                         ` (10 subsequent siblings)
  13 siblings, 2 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-10 12:55 UTC (permalink / raw)
  To: git; +Cc: gitster, ramsay, sbeller, szeder.dev, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Add document specifying the binary format for commit graphs. This
format allows for:

* New versions.
* New hash functions and hash lengths.
* Optional extensions.

Basic header information is followed by a binary table of contents
into "chunks" that include:

* An ordered list of commit object IDs.
* A 256-entry fanout into that list of OIDs.
* A list of metadata for the commits.
* A list of "large edges" to enable octopus merges.

The format automatically includes two parent positions for every
commit. This favors speed over space, since using only one position
per commit would cause an extra level of indirection for every merge
commit. (Octopus merges suffer from this indirection, but they are
very rare.)
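
The extra indirection for octopus merges can be sketched as follows.
This is an illustration of the encoding described in the format document
below, not code from the series: decode_parents(), read_be32() and the
GRAPH_* macro names are hypothetical, and the sketch does no bounds
checking against the chunk size, which a real reader would need.

```c
#include <stddef.h>
#include <stdint.h>

#define GRAPH_NO_PARENT 0xffffffffu
#define GRAPH_MSB       0x80000000u

/* Read a 4-byte network-order (big-endian) integer. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode the parent positions of one commit.
 * `edges` points at the 8 parent bytes of its Commit Data row;
 * `large` is the start of the Large Edge List chunk (NULL if absent).
 * Stores up to `max` positions in `out` and returns how many were stored.
 */
static size_t decode_parents(const unsigned char *edges,
			     const unsigned char *large,
			     uint32_t *out, size_t max)
{
	uint32_t first = read_be32(edges);
	uint32_t second = read_be32(edges + 4);
	size_t n = 0;

	if (first == GRAPH_NO_PARENT)
		return 0; /* root commit */
	if (n < max)
		out[n++] = first;

	if (second == GRAPH_NO_PARENT)
		return n; /* exactly one parent */
	if (!(second & GRAPH_MSB)) {
		if (n < max)
			out[n++] = second; /* ordinary two-parent merge */
		return n;
	}

	/* Octopus merge: the low 31 bits index into the Large Edge List. */
	if (!large)
		return n;
	large += (size_t)(second & ~GRAPH_MSB) * 4;
	while (n < max) {
		uint32_t v = read_be32(large);

		out[n++] = v & ~GRAPH_MSB;
		if (v & GRAPH_MSB)
			break; /* the last parent is flagged with the MSB */
		large += 4;
	}
	return n;
}
```

Note that 0xffffffff also has its most-significant bit set, so the "no
parent" test has to come before the octopus test.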

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .../technical/commit-graph-format.txt         | 97 +++++++++++++++++++
 1 file changed, 97 insertions(+)
 create mode 100644 Documentation/technical/commit-graph-format.txt

diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt
new file mode 100644
index 0000000000..ad6af8105c
--- /dev/null
+++ b/Documentation/technical/commit-graph-format.txt
@@ -0,0 +1,97 @@
+Git commit graph format
+=======================
+
+The Git commit graph stores a list of commit OIDs and some associated
+metadata, including:
+
+- The generation number of the commit. Commits with no parents have
+  generation number 1; commits with parents have generation number
+  one more than the maximum generation number of their parents. We
+  reserve zero as a special value that marks a generation number as
+  invalid or "not computed".
+
+- The root tree OID.
+
+- The commit date.
+
+- The parents of the commit, stored using positional references within
+  the graph file.
+
+These positional references are stored as unsigned 32-bit integers
+corresponding to the array position within the list of commit OIDs. We
+use the most-significant bit for special purposes, so we can store at most
+(1 << 31) - 1 (around 2 billion) commits.
+
+== Commit graph files have the following format:
+
+In order to allow extensions that add extra data to the graph, we organize
+the body into "chunks" and provide a binary lookup table at the beginning
+of the body. The header includes certain values, such as number of chunks
+and hash type.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+  4-byte signature:
+      The signature is: {'C', 'G', 'P', 'H'}
+
+  1-byte version number:
+      Currently, the only valid version is 1.
+
+  1-byte Hash Version (1 = SHA-1)
+      We infer the hash length (H) from this value.
+
+  1-byte number (C) of "chunks"
+
+  1-byte (reserved for later use)
+     Current clients should ignore this value.
+
+CHUNK LOOKUP:
+
+  (C + 1) * 12 bytes listing the table of contents for the chunks:
+      The first 4 bytes describe the chunk ID. Value 0 is a terminating
+      label. The other 8 bytes provide the byte offset in the current file
+      at which the chunk starts. (Chunks are ordered contiguously in the
+      file, so you can infer the length using the next chunk position if
+      necessary.) Each chunk ID appears at most once.
+
+  The remaining data in the body is described one chunk at a time, and
+  these chunks may be given in any order. Chunks are required unless
+  otherwise specified.
+
+CHUNK DATA:
+
+  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
+      The ith entry, F[i], stores the number of OIDs with first
+      byte at most i. Thus F[255] stores the total
+      number of commits (N).
+
+  OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
+      The OIDs for all commits in the graph, sorted in ascending order.
+
+  Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
+    * The first H bytes are for the OID of the root tree.
+    * The next 8 bytes are for the positions of the first two parents
+      of the ith commit. Stores value 0xffffffff if no parent in that
+      position. If there are more than two parents, the second value
+      has its most-significant bit on and the other bits store an array
+      position into the Large Edge List chunk.
+    * The next 8 bytes store the generation number of the commit and
+      the commit time in seconds since EPOCH. The generation number
+      uses the higher 30 bits of the first 4 bytes, while the commit
+      time uses the 32 bits of the second 4 bytes, along with the lowest
+      2 bits of the first 4 bytes, which store the 33rd and 34th bits of
+      the commit time.
+
+  Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional]
+      This list of 4-byte values stores the second through nth parents for
+      all octopus merges. The second parent value in the commit data stores
+      an array position within this list along with the most-significant bit
+      on. Starting at that array position, iterate through this list of commit
+      positions for the parents until reaching a value with the most-significant
+      bit on. The other bits correspond to the position of the last parent.
+
+TRAILER:
+
+	H-byte HASH-checksum of all of the above.
-- 
2.17.0
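
The packing of the last 8 bytes of a Commit Data row is easy to misread,
so here is a small decoding sketch. It assumes that "the lowest 2 bits"
refers to the low two bits of the first 4-byte word, which then hold
bits 33 and 34 of the commit time. That reading is an interpretation of
the text above, and load_be32(), struct gen_and_date and
decode_gen_and_date() are hypothetical names used only here.

```c
#include <stdint.h>

/* Read a 4-byte network-order (big-endian) value. */
static uint32_t load_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical result type for this sketch. */
struct gen_and_date {
	uint32_t generation; /* 30 bits; 0 means "not computed" */
	uint64_t date;       /* 34 bits, seconds since the epoch */
};

/* `p` points at the 8 bytes that follow the two parent positions. */
static struct gen_and_date decode_gen_and_date(const unsigned char *p)
{
	uint32_t hi = load_be32(p);     /* generation << 2 | date bits 33-34 */
	uint32_t lo = load_be32(p + 4); /* low 32 bits of the commit date */
	struct gen_and_date r;

	r.generation = hi >> 2;
	r.date = ((uint64_t)(hi & 0x3) << 32) | lo;
	return r;
}
```

With 34 bits the commit time can represent dates well past 2038, and 30
bits leave room for about a billion generations.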


* [PATCH v8 04/14] graph: add commit graph design document
  2018-04-10 12:55     ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
                         ` (2 preceding siblings ...)
  2018-04-10 12:55       ` [PATCH v8 03/14] commit-graph: add format document Derrick Stolee
@ 2018-04-10 12:55       ` Derrick Stolee
  2018-04-15 22:48         ` Jakub Narebski
  2018-04-10 12:55       ` [PATCH v8 05/14] commit-graph: create git-commit-graph builtin Derrick Stolee
                         ` (9 subsequent siblings)
  13 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-04-10 12:55 UTC (permalink / raw)
  To: git; +Cc: gitster, ramsay, sbeller, szeder.dev, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Add Documentation/technical/commit-graph.txt with details of the planned
commit graph feature, including future plans.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/commit-graph.txt | 163 +++++++++++++++++++++++
 1 file changed, 163 insertions(+)
 create mode 100644 Documentation/technical/commit-graph.txt

diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt
new file mode 100644
index 0000000000..0550c6d0dc
--- /dev/null
+++ b/Documentation/technical/commit-graph.txt
@@ -0,0 +1,163 @@
+Git Commit Graph Design Notes
+=============================
+
+Git walks the commit graph for many reasons, including:
+
+1. Listing and filtering commit history.
+2. Computing merge bases.
+
+These operations can become slow as the commit count grows. The merge
+base calculation shows up in many user-facing commands, such as 'merge-base'
+or 'status' and can take minutes to compute depending on history shape.
+
+There are two main costs here:
+
+1. Decompressing and parsing commits.
+2. Walking the entire graph to satisfy topological order constraints.
+
+The commit graph file is a supplemental data structure that accelerates
+commit graph walks. If a user downgrades or disables the 'core.commitGraph'
+config setting, then the existing ODB is sufficient. The file is stored
+as "commit-graph" either in the .git/objects/info directory or in the info
+directory of an alternate.
+
+The commit graph file stores the commit graph structure along with some
+extra metadata to speed up graph walks. By listing commit OIDs in
+lexicographic order, we can identify an integer position for each commit and
+refer to the parents of a commit using those integer positions. We use
+binary search to find initial commits and then use the integer positions
+for fast lookups during the walk.
+
+A consumer may load the following info for a commit from the graph:
+
+1. The commit OID.
+2. The list of parents, along with their integer position.
+3. The commit date.
+4. The root tree OID.
+5. The generation number (see definition below).
+
+Values 1-4 satisfy the requirements of parse_commit_gently().
+
+Define the "generation number" of a commit recursively as follows:
+
+ * A commit with no parents (a root commit) has generation number one.
+
+ * A commit with at least one parent has generation number one more than
+   the largest generation number among its parents.
+
+Equivalently, the generation number of a commit A is one more than the
+length of a longest path from A to a root commit. The recursive definition
+is easier to use for computation and observing the following property:
+
+    If A and B are commits with generation numbers N and M, respectively,
+    and N <= M, then A cannot reach B. That is, we know without searching
+    that B is not an ancestor of A because it is further from a root commit
+    than A.
+
+    Conversely, when checking if A is an ancestor of B, then we only need
+    to walk commits until all commits on the walk boundary have generation
+    number at most N. If we walk commits using a priority queue seeded by
+    generation numbers, then we always expand the boundary commit with highest
+    generation number and can easily detect the stopping condition.
+
+This property can be used to significantly reduce the time it takes to
+walk commits and determine topological relationships. Without generation
+numbers, the general heuristic is the following:
+
+    If A and B are commits with commit time X and Y, respectively, and
+    X < Y, then A _probably_ cannot reach B.
+
+This heuristic is currently used whenever the computation is allowed to
+violate topological relationships due to clock skew (such as "git log"
+with default order), but is not used when the topological order is
+required (such as merge base calculations, "git log --graph").
+
+In practice, we expect some commits to be created recently and not stored
+in the commit graph. We can treat these commits as having "infinite"
+generation number and walk until reaching commits with known generation
+number.
+
+Design Details
+--------------
+
+- The commit graph file is stored in a file named 'commit-graph' in the
+  .git/objects/info directory. This could be stored in the info directory
+  of an alternate.
+
+- The core.commitGraph config setting must be on to consume graph files.
+
+- The file format includes parameters for the object ID hash function,
+  so a future change of hash algorithm does not require a change in format.
+
+Future Work
+-----------
+
+- The commit graph feature currently does not honor commit grafts. This can
+  be remedied by duplicating or refactoring the current graft logic.
+
+- The 'commit-graph' subcommand does not have a "verify" mode that is
+  necessary for integration with fsck.
+
+- The file format includes room for precomputed generation numbers. These
+  are not currently computed, so all generation numbers will be marked as
+  0 (or "uncomputed"). A later patch will include this calculation.
+
+- After computing and storing generation numbers, we must make graph
+  walks aware of generation numbers to gain the performance benefits they
+  enable. This will mostly be accomplished by swapping a commit-date-ordered
+  priority queue with one ordered by generation number. The following
+  operations are important candidates:
+
+    - paint_down_to_common()
+    - 'log --topo-order'
+
+- Currently, parse_commit_gently() requires filling in the root tree
+  object for a commit. This passes through lookup_tree() and consequently
+  lookup_object(). Also, it calls lookup_commit() when loading the parents.
+  These method calls check the ODB for object existence, even if the
+  consumer does not need the content. For example, we do not need the
+  tree contents when computing merge bases. Now that commit parsing is
+  removed from the computation time, these lookup operations are the
+  slowest operations keeping graph walks from being fast. Consider
+  loading these objects without verifying their existence in the ODB and
+  only loading them fully when consumers need them. Consider a method
+  such as "ensure_tree_loaded(commit)" that fully loads a tree before
+  using commit->tree.
+
+- The current design uses the 'commit-graph' subcommand to generate the graph.
+  When this feature stabilizes enough to recommend to most users, we should
+  add automatic graph writes to common operations that create many commits.
+  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
+  commands.
+
+- A server could provide a commit graph file as part of the network protocol
+  to avoid extra calculations by clients. This feature is only of benefit if
+  the user is willing to trust the file, because verifying the file is correct
+  is as hard as computing it from scratch.
+
+Related Links
+-------------
+[0] https://bugs.chromium.org/p/git/issues/detail?id=8
+    Chromium work item for: Serialized Commit Graph
+
+[1] https://public-inbox.org/git/20110713070517.GC18566@sigill.intra.peff.net/
+    An abandoned patch that introduced generation numbers.
+
+[2] https://public-inbox.org/git/20170908033403.q7e6dj7benasrjes@sigill.intra.peff.net/
+    Discussion about generation numbers on commits and how they interact
+    with fsck.
+
+[3] https://public-inbox.org/git/20170908034739.4op3w4f2ma5s65ku@sigill.intra.peff.net/
+    More discussion about generation numbers and not storing them inside
+    commit objects. A valuable quote:
+
+    "I think we should be moving more in the direction of keeping
+     repo-local caches for optimizations. Reachability bitmaps have been
+     a big performance win. I think we should be doing the same with our
+     properties of commits. Not just generation numbers, but making it
+     cheap to access the graph structure without zlib-inflating whole
+     commit objects (i.e., packv4 or something like the "metapacks" I
+     proposed a few years ago)."
+
+[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u
+    A patch to remove the ahead-behind calculation from 'status'.
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 110+ messages in thread
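
The "infinite" generation number idea in the design document above can
be pictured with a small stand-alone sketch. It assumes the usual
definition (a commit's generation number is one more than the largest
generation number among its parents), so a commit can never reach a
different commit whose generation number is equal or higher. Everything
named here (struct toy_commit, GENERATION_INFINITY, toy_can_reach()) is
hypothetical and invented for illustration; none of it is part of this
series, which only reserves room for generation numbers in the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for commits that are not in the graph file. */
#define GENERATION_INFINITY UINT32_MAX

struct toy_commit {
	uint32_t generation;	/* 1 + max over parents, or INFINITY */
	size_t nr_parents;
	struct toy_commit **parents;
	int seen;		/* set once the commit has been visited */
};

/* Return 1 if 'to' is reachable from 'from' by following parents. */
int toy_can_reach(struct toy_commit *from, struct toy_commit *to)
{
	size_t i;

	assert(from && to);
	if (from == to)
		return 1;
	if (from->seen)
		return 0;
	from->seen = 1;

	/*
	 * Stopping condition: when both generation numbers are known
	 * and 'from' is not strictly above 'to', no walk below 'from'
	 * can find 'to'. Commits marked "infinite" are always walked.
	 */
	if (from->generation != GENERATION_INFINITY &&
	    to->generation != GENERATION_INFINITY &&
	    from->generation <= to->generation)
		return 0;

	for (i = 0; i < from->nr_parents; i++)
		if (toy_can_reach(from->parents[i], to))
			return 1;
	return 0;
}
```

The 'seen' flags are left set by a query, so a caller of this sketch
would clear them before asking about a different target.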

* [PATCH v8 05/14] commit-graph: create git-commit-graph builtin
  2018-04-10 12:55     ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
                         ` (3 preceding siblings ...)
  2018-04-10 12:55       ` [PATCH v8 04/14] graph: add commit graph design document Derrick Stolee
@ 2018-04-10 12:55       ` Derrick Stolee
  2018-04-10 12:56       ` [PATCH v8 06/14] commit-graph: implement write_commit_graph() Derrick Stolee
                         ` (8 subsequent siblings)
  13 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-10 12:55 UTC (permalink / raw)
  To: git; +Cc: gitster, ramsay, sbeller, szeder.dev, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git the 'commit-graph' builtin that will be used for writing and
reading packed graph files. The current implementation is mostly
empty, except for an '--object-dir' option.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 .gitignore                             |  1 +
 Documentation/git-commit-graph.txt     | 10 +++++++
 Makefile                               |  1 +
 builtin.h                              |  1 +
 builtin/commit-graph.c                 | 36 ++++++++++++++++++++++++++
 command-list.txt                       |  1 +
 contrib/completion/git-completion.bash |  2 ++
 git.c                                  |  1 +
 8 files changed, 53 insertions(+)
 create mode 100644 Documentation/git-commit-graph.txt
 create mode 100644 builtin/commit-graph.c

diff --git a/.gitignore b/.gitignore
index 833ef3b0b7..e82f90184d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -34,6 +34,7 @@
 /git-clone
 /git-column
 /git-commit
+/git-commit-graph
 /git-commit-tree
 /git-config
 /git-count-objects
diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
new file mode 100644
index 0000000000..f3b34622a8
--- /dev/null
+++ b/Documentation/git-commit-graph.txt
@@ -0,0 +1,10 @@
+git-commit-graph(1)
+===================
+
+NAME
+----
+git-commit-graph - Write and verify Git commit graph files
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index a1d8775adb..a59b62bed1 100644
--- a/Makefile
+++ b/Makefile
@@ -952,6 +952,7 @@ BUILTIN_OBJS += builtin/clone.o
 BUILTIN_OBJS += builtin/column.o
 BUILTIN_OBJS += builtin/commit-tree.o
 BUILTIN_OBJS += builtin/commit.o
+BUILTIN_OBJS += builtin/commit-graph.o
 BUILTIN_OBJS += builtin/config.o
 BUILTIN_OBJS += builtin/count-objects.o
 BUILTIN_OBJS += builtin/credential.o
diff --git a/builtin.h b/builtin.h
index 42378f3aa4..079855b6d4 100644
--- a/builtin.h
+++ b/builtin.h
@@ -149,6 +149,7 @@ extern int cmd_clone(int argc, const char **argv, const char *prefix);
 extern int cmd_clean(int argc, const char **argv, const char *prefix);
 extern int cmd_column(int argc, const char **argv, const char *prefix);
 extern int cmd_commit(int argc, const char **argv, const char *prefix);
+extern int cmd_commit_graph(int argc, const char **argv, const char *prefix);
 extern int cmd_commit_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_config(int argc, const char **argv, const char *prefix);
 extern int cmd_count_objects(int argc, const char **argv, const char *prefix);
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
new file mode 100644
index 0000000000..b466ecd781
--- /dev/null
+++ b/builtin/commit-graph.c
@@ -0,0 +1,36 @@
+#include "builtin.h"
+#include "config.h"
+#include "parse-options.h"
+
+static char const * const builtin_commit_graph_usage[] = {
+	N_("git commit-graph [--object-dir <objdir>]"),
+	NULL
+};
+
+static struct opts_commit_graph {
+	const char *obj_dir;
+} opts;
+
+
+int cmd_commit_graph(int argc, const char **argv, const char *prefix)
+{
+	static struct option builtin_commit_graph_options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			N_("dir"),
+			N_("The object directory to store the graph")),
+		OPT_END(),
+	};
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(builtin_commit_graph_usage,
+				   builtin_commit_graph_options);
+
+	git_config(git_default_config, NULL);
+	argc = parse_options(argc, argv, prefix,
+			     builtin_commit_graph_options,
+			     builtin_commit_graph_usage,
+			     PARSE_OPT_STOP_AT_NON_OPTION);
+
+	usage_with_options(builtin_commit_graph_usage,
+			   builtin_commit_graph_options);
+}
diff --git a/command-list.txt b/command-list.txt
index a1fad28fd8..835c5890be 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -34,6 +34,7 @@ git-clean                               mainporcelain
 git-clone                               mainporcelain           init
 git-column                              purehelpers
 git-commit                              mainporcelain           history
+git-commit-graph                        plumbingmanipulators
 git-commit-tree                         plumbingmanipulators
 git-config                              ancillarymanipulators
 git-count-objects                       ancillaryinterrogators
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index b09c8a2362..6726daaf69 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -878,6 +878,7 @@ __git_list_porcelain_commands ()
 		check-ref-format) : plumbing;;
 		checkout-index)   : plumbing;;
 		column)           : internal helper;;
+		commit-graph)     : plumbing;;
 		commit-tree)      : plumbing;;
 		count-objects)    : infrequent;;
 		credential)       : credentials;;
@@ -2350,6 +2351,7 @@ _git_config ()
 		core.bigFileThreshold
 		core.checkStat
 		core.commentChar
+		core.commitGraph
 		core.compression
 		core.createObject
 		core.deltaBaseCacheLimit
diff --git a/git.c b/git.c
index ceaa58ef40..2808c51de9 100644
--- a/git.c
+++ b/git.c
@@ -388,6 +388,7 @@ static struct cmd_struct commands[] = {
 	{ "clone", cmd_clone },
 	{ "column", cmd_column, RUN_SETUP_GENTLY },
 	{ "commit", cmd_commit, RUN_SETUP | NEED_WORK_TREE },
+	{ "commit-graph", cmd_commit_graph, RUN_SETUP },
 	{ "commit-tree", cmd_commit_tree, RUN_SETUP },
 	{ "config", cmd_config, RUN_SETUP_GENTLY | DELAY_PAGER_CONFIG },
 	{ "count-objects", cmd_count_objects, RUN_SETUP },
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH v8 06/14] commit-graph: implement write_commit_graph()
  2018-04-10 12:55     ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
                         ` (4 preceding siblings ...)
  2018-04-10 12:55       ` [PATCH v8 05/14] commit-graph: create git-commit-graph builtin Derrick Stolee
@ 2018-04-10 12:56       ` Derrick Stolee
  2018-04-10 12:56       ` [PATCH v8 07/14] commit-graph: implement git-commit-graph write Derrick Stolee
                         ` (7 subsequent siblings)
  13 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-10 12:56 UTC (permalink / raw)
  To: git; +Cc: gitster, ramsay, sbeller, szeder.dev, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach Git to write a commit graph file by checking all packed objects
to see if they are commits, then store the file in the given object
directory.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
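A note on the fanout chunk, in case it helps review: entry b of the
256-entry table holds the number of commits whose first OID byte is at
most b, so two adjacent entries bound the binary search for any OID.
The stand-alone sketch below mirrors the loop in
write_graph_chunk_fanout() but works on a plain array of first bytes.
The names toy_build_fanout() and toy_fanout_range() are hypothetical
and exist only for this note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill fanout[b] with the number of entries whose first byte is <= b.
 * 'first_bytes' must be sorted ascending, as the OID list is.
 */
void toy_build_fanout(const unsigned char *first_bytes, uint32_t nr,
		      uint32_t fanout[256])
{
	uint32_t count = 0;
	int b;

	for (b = 0; b < 256; b++) {
		while (count < nr && first_bytes[count] == b)
			count++;
		fanout[b] = count;
	}
}

/* Compute the half-open range [*lo, *hi) of entries starting with 'b'. */
void toy_fanout_range(const uint32_t fanout[256], unsigned char b,
		      uint32_t *lo, uint32_t *hi)
{
	assert(lo && hi);
	*lo = b ? fanout[b - 1] : 0;
	*hi = fanout[b];
}
```

A reader would then run its binary search only over [lo, hi) of the
OID lookup chunk; an empty range means the OID is not in the file.
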
 Makefile       |   1 +
 commit-graph.c | 359 +++++++++++++++++++++++++++++++++++++++++++++++++
 commit-graph.h |   6 +
 3 files changed, 366 insertions(+)
 create mode 100644 commit-graph.c
 create mode 100644 commit-graph.h
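
The chunk lookup table that write_commit_graph() emits is fully
determined by the number of chunks, the number of commits, and the
number of extra edges. The sketch below recomputes the same offsets
for the SHA-1 case. The constants repeat the values used in the patch
under TOY_ names, toy_chunk_offsets() is a hypothetical helper, and the
explicit 64-bit casts are a choice of this sketch rather than something
the patch does.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_HEADER_SIZE 8		/* signature + four one-byte fields */
#define TOY_CHUNKLOOKUP_WIDTH 12	/* 4-byte id + 8-byte offset */
#define TOY_FANOUT_SIZE (4 * 256)
#define TOY_OID_LEN 20			/* SHA-1 */
#define TOY_DATA_WIDTH (TOY_OID_LEN + 16)

/*
 * offsets[0..3] are the starts of the fanout, OID lookup, commit data
 * and large edge chunks; offsets[4] is the end of the last chunk.
 */
void toy_chunk_offsets(uint32_t num_chunks, uint32_t nr_commits,
		       uint32_t num_extra_edges, uint64_t offsets[5])
{
	assert(num_chunks == 3 || num_chunks == 4);

	offsets[0] = TOY_HEADER_SIZE +
		     (uint64_t)(num_chunks + 1) * TOY_CHUNKLOOKUP_WIDTH;
	offsets[1] = offsets[0] + TOY_FANOUT_SIZE;
	offsets[2] = offsets[1] + (uint64_t)TOY_OID_LEN * nr_commits;
	offsets[3] = offsets[2] + (uint64_t)TOY_DATA_WIDTH * nr_commits;
	offsets[4] = offsets[3] + 4 * (uint64_t)num_extra_edges;
}
```

With three chunks the lookup table has four rows (the last one is the
terminating entry), which is why the first chunk starts at
8 + (num_chunks + 1) * 12 bytes.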

diff --git a/Makefile b/Makefile
index a59b62bed1..26a23257e9 100644
--- a/Makefile
+++ b/Makefile
@@ -777,6 +777,7 @@ LIB_OBJS += color.o
 LIB_OBJS += column.o
 LIB_OBJS += combine-diff.o
 LIB_OBJS += commit.o
+LIB_OBJS += commit-graph.o
 LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += config.o
diff --git a/commit-graph.c b/commit-graph.c
new file mode 100644
index 0000000000..f3f7c4f189
--- /dev/null
+++ b/commit-graph.c
@@ -0,0 +1,359 @@
+#include "cache.h"
+#include "config.h"
+#include "git-compat-util.h"
+#include "lockfile.h"
+#include "pack.h"
+#include "packfile.h"
+#include "commit.h"
+#include "object.h"
+#include "revision.h"
+#include "sha1-lookup.h"
+#include "commit-graph.h"
+
+#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
+#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
+#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
+#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
+#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
+
+#define GRAPH_DATA_WIDTH 36
+
+#define GRAPH_VERSION_1 0x1
+#define GRAPH_VERSION GRAPH_VERSION_1
+
+#define GRAPH_OID_VERSION_SHA1 1
+#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
+#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
+#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
+
+#define GRAPH_OCTOPUS_EDGES_NEEDED 0x80000000
+#define GRAPH_PARENT_MISSING 0x7fffffff
+#define GRAPH_EDGE_LAST_MASK 0x7fffffff
+#define GRAPH_PARENT_NONE 0x70000000
+
+#define GRAPH_LAST_EDGE 0x80000000
+
+#define GRAPH_FANOUT_SIZE (4 * 256)
+#define GRAPH_CHUNKLOOKUP_WIDTH 12
+#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
+			GRAPH_OID_LEN + 8)
+
+
+static char *get_commit_graph_filename(const char *obj_dir)
+{
+	return xstrfmt("%s/info/commit-graph", obj_dir);
+}
+
+static void write_graph_chunk_fanout(struct hashfile *f,
+				     struct commit **commits,
+				     int nr_commits)
+{
+	int i, count = 0;
+	struct commit **list = commits;
+
+	/*
+	 * Write the first-level table (the list is sorted,
+	 * but we use a 256-entry lookup to be able to avoid
+	 * having to do eight extra binary search iterations).
+	 */
+	for (i = 0; i < 256; i++) {
+		while (count < nr_commits) {
+			if ((*list)->object.oid.hash[0] != i)
+				break;
+			count++;
+			list++;
+		}
+
+		hashwrite_be32(f, count);
+	}
+}
+
+static void write_graph_chunk_oids(struct hashfile *f, int hash_len,
+				   struct commit **commits, int nr_commits)
+{
+	struct commit **list = commits;
+	int count;
+	for (count = 0; count < nr_commits; count++, list++)
+		hashwrite(f, (*list)->object.oid.hash, (int)hash_len);
+}
+
+static const unsigned char *commit_to_sha1(size_t index, void *table)
+{
+	struct commit **commits = table;
+	return commits[index]->object.oid.hash;
+}
+
+static void write_graph_chunk_data(struct hashfile *f, int hash_len,
+				   struct commit **commits, int nr_commits)
+{
+	struct commit **list = commits;
+	struct commit **last = commits + nr_commits;
+	uint32_t num_extra_edges = 0;
+
+	while (list < last) {
+		struct commit_list *parent;
+		int edge_value;
+		uint32_t packedDate[2];
+
+		parse_commit(*list);
+		hashwrite(f, (*list)->tree->object.oid.hash, hash_len);
+
+		parent = (*list)->parents;
+
+		if (!parent)
+			edge_value = GRAPH_PARENT_NONE;
+		else {
+			edge_value = sha1_pos(parent->item->object.oid.hash,
+					      commits,
+					      nr_commits,
+					      commit_to_sha1);
+
+			if (edge_value < 0)
+				edge_value = GRAPH_PARENT_MISSING;
+		}
+
+		hashwrite_be32(f, edge_value);
+
+		if (parent)
+			parent = parent->next;
+
+		if (!parent)
+			edge_value = GRAPH_PARENT_NONE;
+		else if (parent->next)
+			edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | num_extra_edges;
+		else {
+			edge_value = sha1_pos(parent->item->object.oid.hash,
+					      commits,
+					      nr_commits,
+					      commit_to_sha1);
+			if (edge_value < 0)
+				edge_value = GRAPH_PARENT_MISSING;
+		}
+
+		hashwrite_be32(f, edge_value);
+
+		if (edge_value & GRAPH_OCTOPUS_EDGES_NEEDED) {
+			do {
+				num_extra_edges++;
+				parent = parent->next;
+			} while (parent);
+		}
+
+		if (sizeof((*list)->date) > 4)
+			packedDate[0] = htonl(((*list)->date >> 32) & 0x3);
+		else
+			packedDate[0] = 0;
+
+		packedDate[1] = htonl((*list)->date);
+		hashwrite(f, packedDate, 8);
+
+		list++;
+	}
+}
+
+static void write_graph_chunk_large_edges(struct hashfile *f,
+					  struct commit **commits,
+					  int nr_commits)
+{
+	struct commit **list = commits;
+	struct commit **last = commits + nr_commits;
+	struct commit_list *parent;
+
+	while (list < last) {
+		int num_parents = 0;
+		for (parent = (*list)->parents; num_parents < 3 && parent;
+		     parent = parent->next)
+			num_parents++;
+
+		if (num_parents <= 2) {
+			list++;
+			continue;
+		}
+
+		/* Since num_parents > 2, this initializer is safe. */
+		for (parent = (*list)->parents->next; parent; parent = parent->next) {
+			int edge_value = sha1_pos(parent->item->object.oid.hash,
+						  commits,
+						  nr_commits,
+						  commit_to_sha1);
+
+			if (edge_value < 0)
+				edge_value = GRAPH_PARENT_MISSING;
+			else if (!parent->next)
+				edge_value |= GRAPH_LAST_EDGE;
+
+			hashwrite_be32(f, edge_value);
+		}
+
+		list++;
+	}
+}
+
+static int commit_compare(const void *_a, const void *_b)
+{
+	const struct object_id *a = (const struct object_id *)_a;
+	const struct object_id *b = (const struct object_id *)_b;
+	return oidcmp(a, b);
+}
+
+struct packed_commit_list {
+	struct commit **list;
+	int nr;
+	int alloc;
+};
+
+struct packed_oid_list {
+	struct object_id *list;
+	int nr;
+	int alloc;
+};
+
+static int add_packed_commits(const struct object_id *oid,
+			      struct packed_git *pack,
+			      uint32_t pos,
+			      void *data)
+{
+	struct packed_oid_list *list = (struct packed_oid_list*)data;
+	enum object_type type;
+	off_t offset = nth_packed_object_offset(pack, pos);
+	struct object_info oi = OBJECT_INFO_INIT;
+
+	oi.typep = &type;
+	if (packed_object_info(pack, offset, &oi) < 0)
+		die("unable to get type of object %s", oid_to_hex(oid));
+
+	if (type != OBJ_COMMIT)
+		return 0;
+
+	ALLOC_GROW(list->list, list->nr + 1, list->alloc);
+	oidcpy(&(list->list[list->nr]), oid);
+	list->nr++;
+
+	return 0;
+}
+
+void write_commit_graph(const char *obj_dir)
+{
+	struct packed_oid_list oids;
+	struct packed_commit_list commits;
+	struct hashfile *f;
+	uint32_t i, count_distinct = 0;
+	char *graph_name;
+	int fd;
+	struct lock_file lk = LOCK_INIT;
+	uint32_t chunk_ids[5];
+	uint64_t chunk_offsets[5];
+	int num_chunks;
+	int num_extra_edges;
+	struct commit_list *parent;
+
+	oids.nr = 0;
+	oids.alloc = approximate_object_count() / 4;
+
+	if (oids.alloc < 1024)
+		oids.alloc = 1024;
+	ALLOC_ARRAY(oids.list, oids.alloc);
+
+	for_each_packed_object(add_packed_commits, &oids, 0);
+
+	QSORT(oids.list, oids.nr, commit_compare);
+
+	count_distinct = 1;
+	for (i = 1; i < oids.nr; i++) {
+		if (oidcmp(&oids.list[i-1], &oids.list[i]))
+			count_distinct++;
+	}
+
+	if (count_distinct >= GRAPH_PARENT_MISSING)
+		die(_("the commit graph format cannot write %u commits"), count_distinct);
+
+	commits.nr = 0;
+	commits.alloc = count_distinct;
+	ALLOC_ARRAY(commits.list, commits.alloc);
+
+	num_extra_edges = 0;
+	for (i = 0; i < oids.nr; i++) {
+		int num_parents = 0;
+		if (i > 0 && !oidcmp(&oids.list[i-1], &oids.list[i]))
+			continue;
+
+		commits.list[commits.nr] = lookup_commit(&oids.list[i]);
+		parse_commit(commits.list[commits.nr]);
+
+		for (parent = commits.list[commits.nr]->parents;
+		     parent; parent = parent->next)
+			num_parents++;
+
+		if (num_parents > 2)
+			num_extra_edges += num_parents - 1;
+
+		commits.nr++;
+	}
+	num_chunks = num_extra_edges ? 4 : 3;
+
+	if (commits.nr >= GRAPH_PARENT_MISSING)
+		die(_("too many commits to write graph"));
+
+	graph_name = get_commit_graph_filename(obj_dir);
+	fd = hold_lock_file_for_update(&lk, graph_name, 0);
+
+	if (fd < 0) {
+		struct strbuf folder = STRBUF_INIT;
+		strbuf_addstr(&folder, graph_name);
+		strbuf_setlen(&folder, strrchr(folder.buf, '/') - folder.buf);
+
+		if (mkdir(folder.buf, 0777) < 0)
+			die_errno(_("cannot mkdir %s"), folder.buf);
+		strbuf_release(&folder);
+
+		fd = hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR);
+
+		if (fd < 0)
+			die_errno("unable to create '%s'", graph_name);
+	}
+
+	f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
+
+	hashwrite_be32(f, GRAPH_SIGNATURE);
+
+	hashwrite_u8(f, GRAPH_VERSION);
+	hashwrite_u8(f, GRAPH_OID_VERSION);
+	hashwrite_u8(f, num_chunks);
+	hashwrite_u8(f, 0); /* unused padding byte */
+
+	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
+	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
+	chunk_ids[2] = GRAPH_CHUNKID_DATA;
+	if (num_extra_edges)
+		chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES;
+	else
+		chunk_ids[3] = 0;
+	chunk_ids[4] = 0;
+
+	chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
+	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
+	chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
+	chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
+	chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
+
+	for (i = 0; i <= num_chunks; i++) {
+		uint32_t chunk_write[3];
+
+		chunk_write[0] = htonl(chunk_ids[i]);
+		chunk_write[1] = htonl(chunk_offsets[i] >> 32);
+		chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff);
+		hashwrite(f, chunk_write, 12);
+	}
+
+	write_graph_chunk_fanout(f, commits.list, commits.nr);
+	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
+	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+	write_graph_chunk_large_edges(f, commits.list, commits.nr);
+
+	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
+	commit_lock_file(&lk);
+
+	free(oids.list);
+	oids.alloc = 0;
+	oids.nr = 0;
+}
diff --git a/commit-graph.h b/commit-graph.h
new file mode 100644
index 0000000000..16fea993ab
--- /dev/null
+++ b/commit-graph.h
@@ -0,0 +1,6 @@
+#ifndef COMMIT_GRAPH_H
+#define COMMIT_GRAPH_H
+
+void write_commit_graph(const char *obj_dir);
+
+#endif
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 110+ messages in thread
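
For readers following the commit data chunk in the patch above: each
record ends with eight bytes that carry the commit date, with bits 32
and 33 in the low two bits of the first big-endian word and the low 32
bits in the second word. The round trip can be sketched as below. The
helpers toy_put_be32(), toy_get_be32(), toy_pack_date() and
toy_unpack_date() are hypothetical names for this note; the patch
itself uses htonl() and hashwrite().

```c
#include <assert.h>
#include <stdint.h>

void toy_put_be32(unsigned char *out, uint32_t v)
{
	out[0] = (unsigned char)(v >> 24);
	out[1] = (unsigned char)(v >> 16);
	out[2] = (unsigned char)(v >> 8);
	out[3] = (unsigned char)v;
}

uint32_t toy_get_be32(const unsigned char *in)
{
	return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	       ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Store the low 34 bits of 'date' as two big-endian 32-bit words. */
void toy_pack_date(uint64_t date, unsigned char out[8])
{
	assert(out);
	toy_put_be32(out, (uint32_t)((date >> 32) & 0x3));
	toy_put_be32(out + 4, (uint32_t)(date & 0xffffffff));
}

/* Rebuild the 34-bit date, ignoring the upper bits of the first word. */
uint64_t toy_unpack_date(const unsigned char in[8])
{
	uint64_t hi = toy_get_be32(in) & 0x3;
	uint64_t lo = toy_get_be32(in + 4);

	return (hi << 32) | lo;
}
```

Dates that need more than 34 bits do not survive the round trip in
this sketch; the bits above bit 33 are dropped when packing.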

* [PATCH v8 07/14] commit-graph: implement git-commit-graph write
  2018-04-10 12:55     ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
                         ` (5 preceding siblings ...)
  2018-04-10 12:56       ` [PATCH v8 06/14] commit-graph: implement write_commit_graph() Derrick Stolee
@ 2018-04-10 12:56       ` Derrick Stolee
  2018-04-10 12:56       ` [PATCH v8 08/14] commit-graph: implement git commit-graph read Derrick Stolee
                         ` (6 subsequent siblings)
  13 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-10 12:56 UTC (permalink / raw)
  To: git; +Cc: gitster, ramsay, sbeller, szeder.dev, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git-commit-graph to write graph files. Create a new test script to
verify that this command succeeds.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt |  41 ++++++++++
 builtin/commit-graph.c             |  33 ++++++++
 t/t5318-commit-graph.sh            | 124 +++++++++++++++++++++++++++++
 3 files changed, 198 insertions(+)
 create mode 100755 t/t5318-commit-graph.sh

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index f3b34622a8..47996e8f89 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -5,6 +5,47 @@ NAME
 ----
 git-commit-graph - Write and verify Git commit graph files
 
+
+SYNOPSIS
+--------
+[verse]
+'git commit-graph write' <options> [--object-dir <dir>]
+
+
+DESCRIPTION
+-----------
+
+Manage the serialized commit graph file.
+
+
+OPTIONS
+-------
+--object-dir::
+	Use given directory for the location of packfiles and commit graph
+	file. This parameter exists to specify the location of an alternate
+	that only has the objects directory, not a full .git directory. The
+	commit graph file is expected to be at <dir>/info/commit-graph and
+	the packfiles are expected to be in <dir>/pack.
+
+
+COMMANDS
+--------
+'write'::
+
+Write a commit graph file based on the commits found in packfiles.
+Includes all commits from the existing commit graph file.
+
+
+EXAMPLES
+--------
+
+* Write a commit graph file for the packed commits in your local .git folder.
++
+------------------------------------------------
+$ git commit-graph write
+------------------------------------------------
+
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index b466ecd781..26b6360289 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -1,9 +1,18 @@
 #include "builtin.h"
 #include "config.h"
+#include "dir.h"
+#include "lockfile.h"
 #include "parse-options.h"
+#include "commit-graph.h"
 
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
+	N_("git commit-graph write [--object-dir <objdir>]"),
+	NULL
+};
+
+static const char * const builtin_commit_graph_write_usage[] = {
+	N_("git commit-graph write [--object-dir <objdir>]"),
 	NULL
 };
 
@@ -11,6 +20,25 @@ static struct opts_commit_graph {
 	const char *obj_dir;
 } opts;
 
+static int graph_write(int argc, const char **argv)
+{
+	static struct option builtin_commit_graph_write_options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			N_("dir"),
+			N_("The object directory to store the graph")),
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL,
+			     builtin_commit_graph_write_options,
+			     builtin_commit_graph_write_usage, 0);
+
+	if (!opts.obj_dir)
+		opts.obj_dir = get_object_directory();
+
+	write_commit_graph(opts.obj_dir);
+	return 0;
+}
 
 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 {
@@ -31,6 +59,11 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 			     builtin_commit_graph_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
+	if (argc > 0) {
+		if (!strcmp(argv[0], "write"))
+			return graph_write(argc, argv);
+	}
+
 	usage_with_options(builtin_commit_graph_usage,
 			   builtin_commit_graph_options);
 }
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
new file mode 100755
index 0000000000..d7b635bd68
--- /dev/null
+++ b/t/t5318-commit-graph.sh
@@ -0,0 +1,124 @@
+#!/bin/sh
+
+test_description='commit graph'
+. ./test-lib.sh
+
+test_expect_success 'setup full repo' '
+	mkdir full &&
+	cd "$TRASH_DIRECTORY/full" &&
+	git init &&
+	objdir=".git/objects"
+'
+
+test_expect_success 'write graph with no packs' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write --object-dir . &&
+	test_path_is_file info/commit-graph
+'
+
+test_expect_success 'create commits and repack' '
+	cd "$TRASH_DIRECTORY/full" &&
+	for i in $(test_seq 3)
+	do
+		test_commit $i &&
+		git branch commits/$i
+	done &&
+	git repack
+'
+
+test_expect_success 'write graph' '
+	cd "$TRASH_DIRECTORY/full" &&
+	graph1=$(git commit-graph write) &&
+	test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'Add more commits' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git reset --hard commits/1 &&
+	for i in $(test_seq 4 5)
+	do
+		test_commit $i &&
+		git branch commits/$i
+	done &&
+	git reset --hard commits/2 &&
+	for i in $(test_seq 6 7)
+	do
+		test_commit $i &&
+		git branch commits/$i
+	done &&
+	git reset --hard commits/2 &&
+	git merge commits/4 &&
+	git branch merge/1 &&
+	git reset --hard commits/4 &&
+	git merge commits/6 &&
+	git branch merge/2 &&
+	git reset --hard commits/3 &&
+	git merge commits/5 commits/7 &&
+	git branch merge/3 &&
+	git repack
+'
+
+# Current graph structure:
+#
+#   __M3___
+#  /   |   \
+# 3 M1 5 M2 7
+# |/  \|/  \|
+# 2    4    6
+# |___/____/
+# 1
+
+
+test_expect_success 'write graph with merges' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write &&
+	test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'Add one more commit' '
+	cd "$TRASH_DIRECTORY/full" &&
+	test_commit 8 &&
+	git branch commits/8 &&
+	ls $objdir/pack | grep idx >existing-idx &&
+	git repack &&
+	ls $objdir/pack | grep idx | grep -v --file=existing-idx >new-idx
+'
+
+# Current graph structure:
+#
+#      8
+#      |
+#   __M3___
+#  /   |   \
+# 3 M1 5 M2 7
+# |/  \|/  \|
+# 2    4    6
+# |___/____/
+# 1
+
+test_expect_success 'write graph with new commit' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write &&
+	test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'write graph with nothing new' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git commit-graph write &&
+	test_path_is_file $objdir/info/commit-graph
+'
+
+test_expect_success 'setup bare repo' '
+	cd "$TRASH_DIRECTORY" &&
+	git clone --bare --no-local full bare &&
+	cd bare &&
+	baredir="./objects"
+'
+
+test_expect_success 'write graph in bare repo' '
+	cd "$TRASH_DIRECTORY/bare" &&
+	git commit-graph write &&
+	test_path_is_file $baredir/info/commit-graph
+'
+
+test_done
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 110+ messages in thread

* [PATCH v8 08/14] commit-graph: implement git commit-graph read
  2018-04-10 12:55     ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
                         ` (6 preceding siblings ...)
  2018-04-10 12:56       ` [PATCH v8 07/14] commit-graph: implement git-commit-graph write Derrick Stolee
@ 2018-04-10 12:56       ` Derrick Stolee
  2018-04-14 22:15         ` Jakub Narebski
  2018-04-10 12:56       ` [PATCH v8 09/14] commit-graph: add core.commitGraph setting Derrick Stolee
                         ` (5 subsequent siblings)
  13 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-04-10 12:56 UTC (permalink / raw)
  To: git; +Cc: gitster, ramsay, sbeller, szeder.dev, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git-commit-graph to read commit graph files and summarize their contents.

Use the read subcommand to verify the contents of a commit graph file in the
tests.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
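For reference while reading the "header:" line that the read
subcommand prints: the file starts with a four-byte big-endian
signature ("CGPH"), followed by one byte each for the graph version,
the hash version, the number of chunks, and an unused byte. A
stand-alone decoder could look like the sketch below. It is not the
load_commit_graph_one() from this patch; struct toy_graph_header and
toy_parse_header() are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_GRAPH_SIGNATURE 0x43475048u /* "CGPH" */

struct toy_graph_header {
	uint32_t signature;
	unsigned char version;
	unsigned char hash_version;
	unsigned char num_chunks;
	unsigned char reserved;
};

/*
 * Decode the first eight bytes of a graph file. Return 0 on success,
 * -1 if the buffer is too short or the signature does not match.
 */
int toy_parse_header(const unsigned char *data, size_t len,
		     struct toy_graph_header *h)
{
	assert(data && h);
	if (len < 8)
		return -1;

	h->signature = ((uint32_t)data[0] << 24) |
		       ((uint32_t)data[1] << 16) |
		       ((uint32_t)data[2] << 8) |
		       (uint32_t)data[3];
	if (h->signature != TOY_GRAPH_SIGNATURE)
		return -1;

	h->version = data[4];
	h->hash_version = data[5];
	h->num_chunks = data[6];
	h->reserved = data[7];
	return 0;
}
```

Reading the signature byte by byte avoids depending on the host byte
order or on the alignment of the mapped file.
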
 Documentation/git-commit-graph.txt |  12 +++
 builtin/commit-graph.c             |  56 ++++++++++++
 commit-graph.c                     | 137 ++++++++++++++++++++++++++++-
 commit-graph.h                     |  23 +++++
 t/t5318-commit-graph.sh            |  32 +++++--
 5 files changed, 254 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 47996e8f89..8aad8303f5 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -9,6 +9,7 @@ git-commit-graph - Write and verify Git commit graph files
 SYNOPSIS
 --------
 [verse]
+'git commit-graph read' [--object-dir <dir>]
 'git commit-graph write' <options> [--object-dir <dir>]
 
 
@@ -35,6 +36,11 @@ COMMANDS
 Write a commit graph file based on the commits found in packfiles.
 Includes all commits from the existing commit graph file.
 
+'read'::
+
+Read the commit-graph file and output basic details about its
+contents. Used for debugging purposes.
+
 
 EXAMPLES
 --------
@@ -45,6 +51,12 @@ EXAMPLES
 $ git commit-graph write
 ------------------------------------------------
 
+* Read basic information from the commit-graph file.
++
+------------------------------------------------
+$ git commit-graph read
+------------------------------------------------
+
 
 GIT
 ---
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 26b6360289..efd39331d7 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -7,10 +7,16 @@
 
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
+	N_("git commit-graph read [--object-dir <objdir>]"),
 	N_("git commit-graph write [--object-dir <objdir>]"),
 	NULL
 };
 
+static const char * const builtin_commit_graph_read_usage[] = {
+	N_("git commit-graph read [--object-dir <objdir>]"),
+	NULL
+};
+
 static const char * const builtin_commit_graph_write_usage[] = {
 	N_("git commit-graph write [--object-dir <objdir>]"),
 	NULL
@@ -20,6 +26,54 @@ static struct opts_commit_graph {
 	const char *obj_dir;
 } opts;
 
+static int graph_read(int argc, const char **argv)
+{
+	struct commit_graph *graph = NULL;
+	char *graph_name;
+
+	static struct option builtin_commit_graph_read_options[] = {
+		OPT_STRING(0, "object-dir", &opts.obj_dir,
+			N_("dir"),
+			N_("The object directory to store the graph")),
+		OPT_END(),
+	};
+
+	argc = parse_options(argc, argv, NULL,
+			     builtin_commit_graph_read_options,
+			     builtin_commit_graph_read_usage, 0);
+
+	if (!opts.obj_dir)
+		opts.obj_dir = get_object_directory();
+
+	graph_name = get_commit_graph_filename(opts.obj_dir);
+	graph = load_commit_graph_one(graph_name);
+
+	if (!graph)
+		die("graph file %s does not exist", graph_name);
+	FREE_AND_NULL(graph_name);
+
+	printf("header: %08x %d %d %d %d\n",
+		ntohl(*(uint32_t*)graph->data),
+		*(unsigned char*)(graph->data + 4),
+		*(unsigned char*)(graph->data + 5),
+		*(unsigned char*)(graph->data + 6),
+		*(unsigned char*)(graph->data + 7));
+	printf("num_commits: %u\n", graph->num_commits);
+	printf("chunks:");
+
+	if (graph->chunk_oid_fanout)
+		printf(" oid_fanout");
+	if (graph->chunk_oid_lookup)
+		printf(" oid_lookup");
+	if (graph->chunk_commit_data)
+		printf(" commit_metadata");
+	if (graph->chunk_large_edges)
+		printf(" large_edges");
+	printf("\n");
+
+	return 0;
+}
+
 static int graph_write(int argc, const char **argv)
 {
 	static struct option builtin_commit_graph_write_options[] = {
@@ -60,6 +114,8 @@ int cmd_commit_graph(int argc, const char **argv, const char *prefix)
 			     PARSE_OPT_STOP_AT_NON_OPTION);
 
 	if (argc > 0) {
+		if (!strcmp(argv[0], "read"))
+			return graph_read(argc, argv);
 		if (!strcmp(argv[0], "write"))
 			return graph_write(argc, argv);
 	}
diff --git a/commit-graph.c b/commit-graph.c
index f3f7c4f189..b1bd3a892d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -39,11 +39,146 @@
 			GRAPH_OID_LEN + 8)
 
 
-static char *get_commit_graph_filename(const char *obj_dir)
+char *get_commit_graph_filename(const char *obj_dir)
 {
 	return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static struct commit_graph *alloc_commit_graph(void)
+{
+	struct commit_graph *g = xcalloc(1, sizeof(*g));
+	g->graph_fd = -1;
+
+	return g;
+}
+
+struct commit_graph *load_commit_graph_one(const char *graph_file)
+{
+	void *graph_map;
+	const unsigned char *data, *chunk_lookup;
+	size_t graph_size;
+	struct stat st;
+	uint32_t i;
+	struct commit_graph *graph;
+	int fd = git_open(graph_file);
+	uint64_t last_chunk_offset;
+	uint32_t last_chunk_id;
+	uint32_t graph_signature;
+	unsigned char graph_version, hash_version;
+
+	if (fd < 0)
+		return NULL;
+	if (fstat(fd, &st)) {
+		close(fd);
+		return NULL;
+	}
+	graph_size = xsize_t(st.st_size);
+
+	if (graph_size < GRAPH_MIN_SIZE) {
+		close(fd);
+		die("graph file %s is too small", graph_file);
+	}
+	graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	data = (const unsigned char *)graph_map;
+
+	graph_signature = get_be32(data);
+	if (graph_signature != GRAPH_SIGNATURE) {
+		error("graph signature %X does not match signature %X",
+		      graph_signature, GRAPH_SIGNATURE);
+		goto cleanup_fail;
+	}
+
+	graph_version = *(unsigned char*)(data + 4);
+	if (graph_version != GRAPH_VERSION) {
+		error("graph version %X does not match version %X",
+		      graph_version, GRAPH_VERSION);
+		goto cleanup_fail;
+	}
+
+	hash_version = *(unsigned char*)(data + 5);
+	if (hash_version != GRAPH_OID_VERSION) {
+		error("hash version %X does not match version %X",
+		      hash_version, GRAPH_OID_VERSION);
+		goto cleanup_fail;
+	}
+
+	graph = alloc_commit_graph();
+
+	graph->hash_len = GRAPH_OID_LEN;
+	graph->num_chunks = *(unsigned char*)(data + 6);
+	graph->graph_fd = fd;
+	graph->data = graph_map;
+	graph->data_len = graph_size;
+
+	last_chunk_id = 0;
+	last_chunk_offset = 8;
+	chunk_lookup = data + 8;
+	for (i = 0; i < graph->num_chunks; i++) {
+		uint32_t chunk_id = get_be32(chunk_lookup + 0);
+		uint64_t chunk_offset = get_be64(chunk_lookup + 4);
+		int chunk_repeated = 0;
+
+		chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
+
+		if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
+			error("improper chunk offset %08x%08x", (uint32_t)(chunk_offset >> 32),
+			      (uint32_t)chunk_offset);
+			goto cleanup_fail;
+		}
+
+		switch (chunk_id) {
+		case GRAPH_CHUNKID_OIDFANOUT:
+			if (graph->chunk_oid_fanout)
+				chunk_repeated = 1;
+			else
+				graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset);
+			break;
+
+		case GRAPH_CHUNKID_OIDLOOKUP:
+			if (graph->chunk_oid_lookup)
+				chunk_repeated = 1;
+			else
+				graph->chunk_oid_lookup = data + chunk_offset;
+			break;
+
+		case GRAPH_CHUNKID_DATA:
+			if (graph->chunk_commit_data)
+				chunk_repeated = 1;
+			else
+				graph->chunk_commit_data = data + chunk_offset;
+			break;
+
+		case GRAPH_CHUNKID_LARGEEDGES:
+			if (graph->chunk_large_edges)
+				chunk_repeated = 1;
+			else
+				graph->chunk_large_edges = data + chunk_offset;
+			break;
+		}
+
+		if (chunk_repeated) {
+			error("chunk id %08x appears multiple times", chunk_id);
+			goto cleanup_fail;
+		}
+
+		if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)
+		{
+			graph->num_commits = (chunk_offset - last_chunk_offset)
+					     / graph->hash_len;
+		}
+
+		last_chunk_id = chunk_id;
+		last_chunk_offset = chunk_offset;
+	}
+
+	return graph;
+
+cleanup_fail:
+	munmap(graph_map, graph_size);
+	close(fd);
+	exit(1);
+}
+
 static void write_graph_chunk_fanout(struct hashfile *f,
 				     struct commit **commits,
 				     int nr_commits)
diff --git a/commit-graph.h b/commit-graph.h
index 16fea993ab..2528478f06 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -1,6 +1,29 @@
 #ifndef COMMIT_GRAPH_H
 #define COMMIT_GRAPH_H
 
+#include "git-compat-util.h"
+
+char *get_commit_graph_filename(const char *obj_dir);
+
+struct commit_graph {
+	int graph_fd;
+
+	const unsigned char *data;
+	size_t data_len;
+
+	unsigned char hash_len;
+	unsigned char num_chunks;
+	uint32_t num_commits;
+	struct object_id oid;
+
+	const uint32_t *chunk_oid_fanout;
+	const unsigned char *chunk_oid_lookup;
+	const unsigned char *chunk_commit_data;
+	const unsigned char *chunk_large_edges;
+};
+
+struct commit_graph *load_commit_graph_one(const char *graph_file);
+
 void write_commit_graph(const char *obj_dir);
 
 #endif
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index d7b635bd68..2f44f91193 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -26,10 +26,28 @@ test_expect_success 'create commits and repack' '
 	git repack
 '
 
+graph_read_expect() {
+	OPTIONAL=""
+	NUM_CHUNKS=3
+	if test -n "$2"
+	then
+		OPTIONAL=" $2"
+		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
+	fi
+	cat >expect <<- EOF
+	header: 43475048 1 1 $NUM_CHUNKS 0
+	num_commits: $1
+	chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
+	EOF
+	git commit-graph read >output &&
+	test_cmp expect output
+}
+
 test_expect_success 'write graph' '
 	cd "$TRASH_DIRECTORY/full" &&
 	graph1=$(git commit-graph write) &&
-	test_path_is_file $objdir/info/commit-graph
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "3"
 '
 
 test_expect_success 'Add more commits' '
@@ -72,7 +90,8 @@ test_expect_success 'Add more commits' '
 test_expect_success 'write graph with merges' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
-	test_path_is_file $objdir/info/commit-graph
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "10" "large_edges"
 '
 
 test_expect_success 'Add one more commit' '
@@ -99,13 +118,15 @@ test_expect_success 'Add one more commit' '
 test_expect_success 'write graph with new commit' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
-	test_path_is_file $objdir/info/commit-graph
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "11" "large_edges"
 '
 
 test_expect_success 'write graph with nothing new' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
-	test_path_is_file $objdir/info/commit-graph
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "11" "large_edges"
 '
 
 test_expect_success 'setup bare repo' '
@@ -118,7 +139,8 @@ test_expect_success 'setup bare repo' '
 test_expect_success 'write graph in bare repo' '
 	cd "$TRASH_DIRECTORY/bare" &&
 	git commit-graph write &&
-	test_path_is_file $baredir/info/commit-graph
+	test_path_is_file $baredir/info/commit-graph &&
+	graph_read_expect "11" "large_edges"
 '
 
 test_done
-- 
2.17.0


* [PATCH v8 09/14] commit-graph: add core.commitGraph setting
  2018-04-10 12:55     ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
                         ` (7 preceding siblings ...)
  2018-04-10 12:56       ` [PATCH v8 08/14] commit-graph: implement git commit-graph read Derrick Stolee
@ 2018-04-10 12:56       ` Derrick Stolee
  2018-04-14 18:33         ` Jakub Narebski
  2018-04-10 12:56       ` [PATCH v8 10/14] commit-graph: close under reachability Derrick Stolee
                         ` (4 subsequent siblings)
  13 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-04-10 12:56 UTC (permalink / raw)
  To: git; +Cc: gitster, ramsay, sbeller, szeder.dev, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The commit graph feature is controlled by the new core.commitGraph config
setting. It defaults to false, so the feature is opt-in.

The intent is that setting core.commitGraph=false always stops Git from
checking for or parsing commit graph files.
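
To illustrate the gating described above, here is a minimal standalone
sketch. It is not the code in this patch: parse_bool_sketch(),
config_cb_sketch() and may_use_graph_sketch() are hypothetical names,
and the boolean parser accepts far fewer spellings than Git's real
git_config_bool().

```c
#include <assert.h>
#include <string.h>
#include <strings.h>

/* Hypothetical stand-in for the global flag; zero-initialized, so opt-in. */
static int core_commit_graph;

/* Simplified boolean parser (assumption: only a few spellings handled). */
static int parse_bool_sketch(const char *value)
{
	if (!value)
		return 1; /* a key with no value counts as true */
	if (!strcasecmp(value, "true") || !strcasecmp(value, "yes") ||
	    !strcasecmp(value, "on") || !strcmp(value, "1"))
		return 1;
	return 0;
}

/* Config callback: the key is compared in lowercase, as in the patch. */
static int config_cb_sketch(const char *var, const char *value)
{
	assert(var);
	if (!strcmp(var, "core.commitgraph"))
		core_commit_graph = parse_bool_sketch(value);
	return 0;
}

/* Returns 1 if the graph may be consulted, 0 if callers must fall back. */
static int may_use_graph_sketch(void)
{
	return core_commit_graph != 0;
}
```

Because the flag starts at zero, a reader that checks
may_use_graph_sketch() first never touches a graph file unless the
user opted in.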

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config.txt | 4 ++++
 cache.h                  | 1 +
 config.c                 | 5 +++++
 environment.c            | 1 +
 4 files changed, 11 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 4e0cff87f6..e5c7013fb0 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -898,6 +898,10 @@ core.notesRef::
 This setting defaults to "refs/notes/commits", and it can be overridden by
 the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
 
+core.commitGraph::
+	Enable git commit graph feature. Allows reading from the
+	commit-graph file.
+
 core.sparseCheckout::
 	Enable "sparse checkout" feature. See section "Sparse checkout" in
 	linkgit:git-read-tree[1] for more information.
diff --git a/cache.h b/cache.h
index a61b2d3f0d..8bdbcbbbf7 100644
--- a/cache.h
+++ b/cache.h
@@ -805,6 +805,7 @@ extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
 extern int core_preload_index;
+extern int core_commit_graph;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index b0c20e6cb8..25ee4a676c 100644
--- a/config.c
+++ b/config.c
@@ -1226,6 +1226,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.commitgraph")) {
+		core_commit_graph = git_config_bool(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.sparsecheckout")) {
 		core_apply_sparse_checkout = git_config_bool(var, value);
 		return 0;
diff --git a/environment.c b/environment.c
index d6dd64662c..8853e2f0dd 100644
--- a/environment.c
+++ b/environment.c
@@ -62,6 +62,7 @@ enum push_default_type push_default = PUSH_DEFAULT_UNSPECIFIED;
 enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
 char *notes_ref_name;
 int grafts_replace_parents = 1;
+int core_commit_graph;
 int core_apply_sparse_checkout;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
-- 
2.17.0


* [PATCH v8 10/14] commit-graph: close under reachability
  2018-04-10 12:55     ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
                         ` (8 preceding siblings ...)
  2018-04-10 12:56       ` [PATCH v8 09/14] commit-graph: add core.commitGraph setting Derrick Stolee
@ 2018-04-10 12:56       ` Derrick Stolee
  2018-04-10 12:56       ` [PATCH v8 11/14] commit: integrate commit graph with commit parsing Derrick Stolee
                         ` (3 subsequent siblings)
  13 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-10 12:56 UTC (permalink / raw)
  To: git; +Cc: gitster, ramsay, sbeller, szeder.dev, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach write_commit_graph() to walk all parents from the commits
discovered in packfiles. This prevents gaps in the graph caused by
loose objects or previously-missed packfiles.

Also automatically add commits from the existing graph file, if it
exists.
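
The closure walk can be sketched on a toy graph as follows. This is
only an illustration under stated assumptions: struct toy_commit,
NO_PARENT and close_reachable_sketch() are hypothetical, commits are
plain array indexes instead of object ids, and the "seen" field stands
in for the UNINTERESTING flag.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARENTS 2
#define NO_PARENT -1

/* Hypothetical toy commit: parents are indexes into a commit table. */
struct toy_commit {
	int parents[MAX_PARENTS];
	int seen; /* plays the role of the UNINTERESTING flag */
};

/*
 * Extend list[0..nr) so that it contains every ancestor of its initial
 * members. The caller must provide room for table_nr entries in list.
 * Returns the new length.
 */
static size_t close_reachable_sketch(struct toy_commit *table, size_t table_nr,
				     int *list, size_t nr)
{
	size_t i;
	int p;

	for (i = 0; i < nr; i++)
		table[list[i]].seen = 1;

	/* nr may grow while we iterate, but never beyond table_nr */
	for (i = 0; i < nr; i++) {
		for (p = 0; p < MAX_PARENTS; p++) {
			int parent = table[list[i]].parents[p];
			if (parent == NO_PARENT || table[parent].seen)
				continue;
			assert(nr < table_nr);
			table[parent].seen = 1;
			list[nr++] = parent;
		}
	}

	/* clear the marks so later walks start clean */
	for (i = 0; i < nr; i++)
		table[list[i]].seen = 0;

	return nr;
}
```

Appending to the same list that is being scanned means each commit is
visited once, so the work is bounded by the size of the closure.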

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-graph.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index b1bd3a892d..ea29c5c2d8 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -367,6 +367,50 @@ static int add_packed_commits(const struct object_id *oid,
 	return 0;
 }
 
+static void add_missing_parents(struct packed_oid_list *oids, struct commit *commit)
+{
+	struct commit_list *parent;
+	for (parent = commit->parents; parent; parent = parent->next) {
+		if (!(parent->item->object.flags & UNINTERESTING)) {
+			ALLOC_GROW(oids->list, oids->nr + 1, oids->alloc);
+			oidcpy(&oids->list[oids->nr], &(parent->item->object.oid));
+			oids->nr++;
+			parent->item->object.flags |= UNINTERESTING;
+		}
+	}
+}
+
+static void close_reachable(struct packed_oid_list *oids)
+{
+	int i;
+	struct commit *commit;
+
+	for (i = 0; i < oids->nr; i++) {
+		commit = lookup_commit(&oids->list[i]);
+		if (commit)
+			commit->object.flags |= UNINTERESTING;
+	}
+
+	/*
+	 * As this loop runs, oids->nr may grow, but not more
+	 * than the number of missing commits in the reachable
+	 * closure.
+	 */
+	for (i = 0; i < oids->nr; i++) {
+		commit = lookup_commit(&oids->list[i]);
+
+		if (commit && !parse_commit(commit))
+			add_missing_parents(oids, commit);
+	}
+
+	for (i = 0; i < oids->nr; i++) {
+		commit = lookup_commit(&oids->list[i]);
+
+		if (commit)
+			commit->object.flags &= ~UNINTERESTING;
+	}
+}
+
 void write_commit_graph(const char *obj_dir)
 {
 	struct packed_oid_list oids;
@@ -390,6 +434,7 @@ void write_commit_graph(const char *obj_dir)
 	ALLOC_ARRAY(oids.list, oids.alloc);
 
 	for_each_packed_object(add_packed_commits, &oids, 0);
+	close_reachable(&oids);
 
 	QSORT(oids.list, oids.nr, commit_compare);
 
-- 
2.17.0


* [PATCH v8 11/14] commit: integrate commit graph with commit parsing
  2018-04-10 12:55     ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
                         ` (9 preceding siblings ...)
  2018-04-10 12:56       ` [PATCH v8 10/14] commit-graph: close under reachability Derrick Stolee
@ 2018-04-10 12:56       ` Derrick Stolee
  2018-04-10 12:56       ` [PATCH v8 12/14] commit-graph: read only from specific pack-indexes Derrick Stolee
                         ` (2 subsequent siblings)
  13 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-10 12:56 UTC (permalink / raw)
  To: git; +Cc: gitster, ramsay, sbeller, szeder.dev, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach Git to inspect a commit graph file to supply the contents of a
struct commit when calling parse_commit_gently(). This implementation
satisfies all post-conditions on the struct commit, including loading
parents, the root tree, and the commit date.
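
As a rough illustration of what gets read for each commit, the sketch
below decodes one fixed-width record: the tree id, two parent slots,
and a date split across two 32-bit words. The names are hypothetical
and the constant values are assumptions, since the real constants are
defined outside this patch; a 20-byte hash is assumed as well.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HASH_LEN 20              /* assumption: 20-byte object ids */
#define SKETCH_PARENT_NONE 0x70000000u  /* assumption: defined elsewhere */
#define SKETCH_OCTOPUS_FLAG 0x80000000u /* assumption: defined elsewhere */

struct decoded_commit {
	uint32_t parent1;
	uint32_t parent2;
	int has_octopus; /* second slot points into the large-edge chunk */
	uint64_t date;
};

/* Read a 32-bit big-endian integer from an unaligned buffer. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode one record of SKETCH_HASH_LEN + 16 bytes. */
static void decode_commit_record(const unsigned char *rec,
				 struct decoded_commit *out)
{
	uint64_t date_high, date_low;

	assert(rec && out);
	/* bytes [0, SKETCH_HASH_LEN) hold the root tree id; skipped here */
	out->parent1 = read_be32(rec + SKETCH_HASH_LEN);
	out->parent2 = read_be32(rec + SKETCH_HASH_LEN + 4);
	out->has_octopus = (out->parent2 & SKETCH_OCTOPUS_FLAG) != 0;

	/* the two low bits of the first word are bits 32-33 of the date */
	date_high = read_be32(rec + SKETCH_HASH_LEN + 8) & 0x3;
	date_low = read_be32(rec + SKETCH_HASH_LEN + 12);
	out->date = (date_high << 32) | date_low;
}
```

Since every record has the same width, the record for position pos
starts at (SKETCH_HASH_LEN + 16) * pos, which is what makes lookups by
graph position cheap.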

If core.commitGraph is false, then do not check graph files.

In test script t5318-commit-graph.sh, add output-matching conditions on
read-only graph operations.

By loading commits from the graph instead of parsing commit buffers, we
save a lot of time on long commit walks. Here are some performance
results for a copy of the Linux repository where 'master' has 678,653
reachable commits and is behind 'origin/master' by 59,929 commits.

| Command                          | Before | After  | Rel % |
|----------------------------------|--------|--------|-------|
| log --oneline --topo-order -1000 |  8.31s |  0.94s | -88%  |
| branch -vv                       |  1.02s |  0.14s | -86%  |
| rev-list --all                   |  5.89s |  1.07s | -81%  |
| rev-list --all --objects         | 66.15s | 58.45s | -11%  |

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 alloc.c                 |   1 +
 commit-graph.c          | 141 +++++++++++++++++++++++++++++++++++++++-
 commit-graph.h          |  12 ++++
 commit.c                |   3 +
 commit.h                |   3 +
 t/t5318-commit-graph.sh |  47 +++++++++++++-
 6 files changed, 205 insertions(+), 2 deletions(-)

diff --git a/alloc.c b/alloc.c
index 12afadfacd..cf4f8b61e1 100644
--- a/alloc.c
+++ b/alloc.c
@@ -93,6 +93,7 @@ void *alloc_commit_node(void)
 	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
 	c->index = alloc_commit_index();
+	c->graph_pos = COMMIT_NOT_FROM_GRAPH;
 	return c;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index ea29c5c2d8..f745186e7f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -38,7 +38,6 @@
 #define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \
 			GRAPH_OID_LEN + 8)
 
-
 char *get_commit_graph_filename(const char *obj_dir)
 {
 	return xstrfmt("%s/info/commit-graph", obj_dir);
@@ -179,6 +178,145 @@ struct commit_graph *load_commit_graph_one(const char *graph_file)
 	exit(1);
 }
 
+/* global storage */
+static struct commit_graph *commit_graph = NULL;
+
+static void prepare_commit_graph_one(const char *obj_dir)
+{
+	char *graph_name;
+
+	if (commit_graph)
+		return;
+
+	graph_name = get_commit_graph_filename(obj_dir);
+	commit_graph = load_commit_graph_one(graph_name);
+
+	FREE_AND_NULL(graph_name);
+}
+
+static int prepare_commit_graph_run_once = 0;
+static void prepare_commit_graph(void)
+{
+	struct alternate_object_database *alt;
+	char *obj_dir;
+
+	if (prepare_commit_graph_run_once)
+		return;
+	prepare_commit_graph_run_once = 1;
+
+	obj_dir = get_object_directory();
+	prepare_commit_graph_one(obj_dir);
+	prepare_alt_odb();
+	for (alt = alt_odb_list; !commit_graph && alt; alt = alt->next)
+		prepare_commit_graph_one(alt->path);
+}
+
+static void close_commit_graph(void)
+{
+	if (!commit_graph)
+		return;
+
+	if (commit_graph->graph_fd >= 0) {
+		munmap((void *)commit_graph->data, commit_graph->data_len);
+		commit_graph->data = NULL;
+		close(commit_graph->graph_fd);
+	}
+
+	FREE_AND_NULL(commit_graph);
+}
+
+static int bsearch_graph(struct commit_graph *g, struct object_id *oid, uint32_t *pos)
+{
+	return bsearch_hash(oid->hash, g->chunk_oid_fanout,
+			    g->chunk_oid_lookup, g->hash_len, pos);
+}
+
+static struct commit_list **insert_parent_or_die(struct commit_graph *g,
+						 uint64_t pos,
+						 struct commit_list **pptr)
+{
+	struct commit *c;
+	struct object_id oid;
+	hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos);
+	c = lookup_commit(&oid);
+	if (!c)
+		die("could not find commit %s", oid_to_hex(&oid));
+	c->graph_pos = pos;
+	return &commit_list_insert(c, pptr)->next;
+}
+
+static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t pos)
+{
+	struct object_id oid;
+	uint32_t edge_value;
+	uint32_t *parent_data_ptr;
+	uint64_t date_low, date_high;
+	struct commit_list **pptr;
+	const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len + 16) * pos;
+
+	item->object.parsed = 1;
+	item->graph_pos = pos;
+
+	hashcpy(oid.hash, commit_data);
+	item->tree = lookup_tree(&oid);
+
+	date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;
+	date_low = get_be32(commit_data + g->hash_len + 12);
+	item->date = (timestamp_t)((date_high << 32) | date_low);
+
+	pptr = &item->parents;
+
+	edge_value = get_be32(commit_data + g->hash_len);
+	if (edge_value == GRAPH_PARENT_NONE)
+		return 1;
+	pptr = insert_parent_or_die(g, edge_value, pptr);
+
+	edge_value = get_be32(commit_data + g->hash_len + 4);
+	if (edge_value == GRAPH_PARENT_NONE)
+		return 1;
+	if (!(edge_value & GRAPH_OCTOPUS_EDGES_NEEDED)) {
+		pptr = insert_parent_or_die(g, edge_value, pptr);
+		return 1;
+	}
+
+	parent_data_ptr = (uint32_t*)(g->chunk_large_edges +
+			  4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK));
+	do {
+		edge_value = get_be32(parent_data_ptr);
+		pptr = insert_parent_or_die(g,
+					    edge_value & GRAPH_EDGE_LAST_MASK,
+					    pptr);
+		parent_data_ptr++;
+	} while (!(edge_value & GRAPH_LAST_EDGE));
+
+	return 1;
+}
+
+int parse_commit_in_graph(struct commit *item)
+{
+	if (!core_commit_graph)
+		return 0;
+	if (item->object.parsed)
+		return 1;
+
+	prepare_commit_graph();
+	if (commit_graph) {
+		uint32_t pos;
+		int found;
+		if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
+			pos = item->graph_pos;
+			found = 1;
+		} else {
+			found = bsearch_graph(commit_graph, &(item->object.oid), &pos);
+		}
+
+		if (found)
+			return fill_commit_in_graph(item, commit_graph, pos);
+	}
+
+	return 0;
+}
+
 static void write_graph_chunk_fanout(struct hashfile *f,
 				     struct commit **commits,
 				     int nr_commits)
@@ -530,6 +668,7 @@ void write_commit_graph(const char *obj_dir)
 	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
 	write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
+	close_commit_graph();
 	finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
 	commit_lock_file(&lk);
 
diff --git a/commit-graph.h b/commit-graph.h
index 2528478f06..73b28beed1 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -5,6 +5,18 @@
 
 char *get_commit_graph_filename(const char *obj_dir);
 
+/*
+ * Given a commit struct, try to fill the commit struct info, including:
+ *  1. tree object
+ *  2. date
+ *  3. parents.
+ *
+ * Returns 1 if and only if the commit was found in the packed graph.
+ *
+ * See parse_commit_buffer() for the fallback after this call.
+ */
+int parse_commit_in_graph(struct commit *item);
+
 struct commit_graph {
 	int graph_fd;
 
diff --git a/commit.c b/commit.c
index 00c99c7272..3e39c86abf 100644
--- a/commit.c
+++ b/commit.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "tag.h"
 #include "commit.h"
+#include "commit-graph.h"
 #include "pkt-line.h"
 #include "utf8.h"
 #include "diff.h"
@@ -383,6 +384,8 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
 		return -1;
 	if (item->object.parsed)
 		return 0;
+	if (parse_commit_in_graph(item))
+		return 0;
 	buffer = read_sha1_file(item->object.oid.hash, &type, &size);
 	if (!buffer)
 		return quiet_on_missing ? -1 :
diff --git a/commit.h b/commit.h
index 0fb8271665..e57ae4b583 100644
--- a/commit.h
+++ b/commit.h
@@ -9,6 +9,8 @@
 #include "string-list.h"
 #include "pretty.h"
 
+#define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF
+
 struct commit_list {
 	struct commit *item;
 	struct commit_list *next;
@@ -21,6 +23,7 @@ struct commit {
 	timestamp_t date;
 	struct commit_list *parents;
 	struct tree *tree;
+	uint32_t graph_pos;
 };
 
 extern int save_commit_buffer;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 2f44f91193..51de9cc455 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -7,6 +7,7 @@ test_expect_success 'setup full repo' '
 	mkdir full &&
 	cd "$TRASH_DIRECTORY/full" &&
 	git init &&
+	git config core.commitGraph true &&
 	objdir=".git/objects"
 '
 
@@ -26,6 +27,29 @@ test_expect_success 'create commits and repack' '
 	git repack
 '
 
+graph_git_two_modes() {
+	git -c core.commitGraph=true $1 >output &&
+	git -c core.commitGraph=false $1 >expect &&
+	test_cmp output expect
+}
+
+graph_git_behavior() {
+	MSG=$1
+	DIR=$2
+	BRANCH=$3
+	COMPARE=$4
+	test_expect_success "check normal git operations: $MSG" '
+		cd "$TRASH_DIRECTORY/$DIR" &&
+		graph_git_two_modes "log --oneline $BRANCH" &&
+		graph_git_two_modes "log --topo-order $BRANCH" &&
+		graph_git_two_modes "log --graph $COMPARE..$BRANCH" &&
+		graph_git_two_modes "branch -vv" &&
+		graph_git_two_modes "merge-base -a $BRANCH $COMPARE"
+	'
+}
+
+graph_git_behavior 'no graph' full commits/3 commits/1
+
 graph_read_expect() {
 	OPTIONAL=""
 	NUM_CHUNKS=3
@@ -50,6 +74,8 @@ test_expect_success 'write graph' '
 	graph_read_expect "3"
 '
 
+graph_git_behavior 'graph exists' full commits/3 commits/1
+
 test_expect_success 'Add more commits' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git reset --hard commits/1 &&
@@ -86,7 +112,6 @@ test_expect_success 'Add more commits' '
 # |___/____/
 # 1
 
-
 test_expect_success 'write graph with merges' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
@@ -94,6 +119,10 @@ test_expect_success 'write graph with merges' '
 	graph_read_expect "10" "large_edges"
 '
 
+graph_git_behavior 'merge 1 vs 2' full merge/1 merge/2
+graph_git_behavior 'merge 1 vs 3' full merge/1 merge/3
+graph_git_behavior 'merge 2 vs 3' full merge/2 merge/3
+
 test_expect_success 'Add one more commit' '
 	cd "$TRASH_DIRECTORY/full" &&
 	test_commit 8 &&
@@ -115,6 +144,9 @@ test_expect_success 'Add one more commit' '
 # |___/____/
 # 1
 
+graph_git_behavior 'mixed mode, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'mixed mode, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'write graph with new commit' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
@@ -122,6 +154,9 @@ test_expect_success 'write graph with new commit' '
 	graph_read_expect "11" "large_edges"
 '
 
+graph_git_behavior 'full graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'full graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'write graph with nothing new' '
 	cd "$TRASH_DIRECTORY/full" &&
 	git commit-graph write &&
@@ -129,13 +164,20 @@ test_expect_success 'write graph with nothing new' '
 	graph_read_expect "11" "large_edges"
 '
 
+graph_git_behavior 'cleared graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'cleared graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
 	cd bare &&
+	git config core.commitGraph true &&
 	baredir="./objects"
 '
 
+graph_git_behavior 'bare repo, commit 8 vs merge 1' bare commits/8 merge/1
+graph_git_behavior 'bare repo, commit 8 vs merge 2' bare commits/8 merge/2
+
 test_expect_success 'write graph in bare repo' '
 	cd "$TRASH_DIRECTORY/bare" &&
 	git commit-graph write &&
@@ -143,4 +185,7 @@ test_expect_success 'write graph in bare repo' '
 	graph_read_expect "11" "large_edges"
 '
 
+graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare commits/8 merge/1
+graph_git_behavior 'bare repo with graph, commit 8 vs merge 2' bare commits/8 merge/2
+
 test_done
-- 
2.17.0


* [PATCH v8 12/14] commit-graph: read only from specific pack-indexes
  2018-04-10 12:55     ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
                         ` (10 preceding siblings ...)
  2018-04-10 12:56       ` [PATCH v8 11/14] commit: integrate commit graph with commit parsing Derrick Stolee
@ 2018-04-10 12:56       ` Derrick Stolee
  2018-04-10 12:56       ` [PATCH v8 13/14] commit-graph: build graph from starting commits Derrick Stolee
  2018-04-10 12:56       ` [PATCH v8 14/14] commit-graph: implement "--append" option Derrick Stolee
  13 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-10 12:56 UTC (permalink / raw)
  To: git; +Cc: gitster, ramsay, sbeller, szeder.dev, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git-commit-graph to inspect the objects only in a certain list
of pack-indexes within the given pack directory. This allows updating
the commit graph iteratively.
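
Collecting the pack-index names from stdin amounts to reading lines
into a growable array. The sketch below shows that step with plain
libc instead of the strbuf and ALLOC_GROW helpers the patch uses;
read_pack_names_sketch() is a hypothetical name, and lines longer than
the fixed buffer would be split, which a real implementation should
avoid.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Read newline-separated names from 'in' into a freshly allocated array
 * and store the count in *nr_out. The caller frees each entry and the
 * array. Returns NULL with *nr_out == 0 when nothing was read; on an
 * allocation failure it stops early and returns what was read so far.
 */
static char **read_pack_names_sketch(FILE *in, size_t *nr_out)
{
	char buf[4096];
	char **lines = NULL;
	size_t nr = 0, alloc = 0;

	assert(in && nr_out);
	while (fgets(buf, sizeof(buf), in)) {
		size_t len = strlen(buf);
		char *copy;

		/* strip the trailing newline, if any */
		if (len && buf[len - 1] == '\n')
			buf[--len] = '\0';

		/* grow the array geometrically when it is full */
		if (nr == alloc) {
			size_t new_alloc = alloc ? 2 * alloc : 16;
			char **grown = realloc(lines, new_alloc * sizeof(*lines));
			if (!grown)
				break;
			lines = grown;
			alloc = new_alloc;
		}

		copy = malloc(len + 1);
		if (!copy)
			break;
		memcpy(copy, buf, len + 1);
		lines[nr++] = copy;
	}
	*nr_out = nr;
	return lines;
}
```

The resulting array and count correspond to the pack_indexes and
nr_packs arguments that write_commit_graph() takes in this patch.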

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt | 11 +++++++++-
 builtin/commit-graph.c             | 33 +++++++++++++++++++++++++++---
 commit-graph.c                     | 26 +++++++++++++++++++++--
 commit-graph.h                     |  4 +++-
 packfile.c                         |  4 ++--
 packfile.h                         |  2 ++
 t/t5318-commit-graph.sh            | 10 +++++++++
 7 files changed, 81 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 8aad8303f5..8143cc3f07 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -34,7 +34,9 @@ COMMANDS
 'write'::
 
 Write a commit graph file based on the commits found in packfiles.
-Includes all commits from the existing commit graph file.
++
+With the `--stdin-packs` option, generate the new commit graph by
+walking objects only in the specified pack-indexes.
 
 'read'::
 
@@ -51,6 +53,13 @@ EXAMPLES
 $ git commit-graph write
 ------------------------------------------------
 
+* Write a graph file, extending the current graph file using commits
+  in <pack-index>.
++
+------------------------------------------------
+$ echo <pack-index> | git commit-graph write --stdin-packs
+------------------------------------------------
+
 * Read basic information from the commit-graph file.
 +
 ------------------------------------------------
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index efd39331d7..5c70199003 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs]"),
 	NULL
 };
 
@@ -18,12 +18,13 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs]"),
 	NULL
 };
 
 static struct opts_commit_graph {
 	const char *obj_dir;
+	int stdin_packs;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -76,10 +77,18 @@ static int graph_read(int argc, const char **argv)
 
 static int graph_write(int argc, const char **argv)
 {
+	const char **pack_indexes = NULL;
+	int packs_nr = 0;
+	const char **lines = NULL;
+	int lines_nr = 0;
+	int lines_alloc = 0;
+
 	static struct option builtin_commit_graph_write_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
 			N_("dir"),
 			N_("The object directory to store the graph")),
+		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
+			N_("scan pack-indexes listed by stdin for commits")),
 		OPT_END(),
 	};
 
@@ -90,7 +99,25 @@ static int graph_write(int argc, const char **argv)
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
 
-	write_commit_graph(opts.obj_dir);
+	if (opts.stdin_packs) {
+		struct strbuf buf = STRBUF_INIT;
+		lines_nr = 0;
+		lines_alloc = 128;
+		ALLOC_ARRAY(lines, lines_alloc);
+
+		while (strbuf_getline(&buf, stdin) != EOF) {
+			ALLOC_GROW(lines, lines_nr + 1, lines_alloc);
+			lines[lines_nr++] = strbuf_detach(&buf, NULL);
+		}
+
+		pack_indexes = lines;
+		packs_nr = lines_nr;
+	}
+
+	write_commit_graph(opts.obj_dir,
+			   pack_indexes,
+			   packs_nr);
+
 	return 0;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index f745186e7f..70472840a3 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -549,7 +549,9 @@ static void close_reachable(struct packed_oid_list *oids)
 	}
 }
 
-void write_commit_graph(const char *obj_dir)
+void write_commit_graph(const char *obj_dir,
+			const char **pack_indexes,
+			int nr_packs)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -571,7 +573,27 @@ void write_commit_graph(const char *obj_dir)
 		oids.alloc = 1024;
 	ALLOC_ARRAY(oids.list, oids.alloc);
 
-	for_each_packed_object(add_packed_commits, &oids, 0);
+	if (pack_indexes) {
+		struct strbuf packname = STRBUF_INIT;
+		int dirlen;
+		strbuf_addf(&packname, "%s/pack/", obj_dir);
+		dirlen = packname.len;
+		for (i = 0; i < nr_packs; i++) {
+			struct packed_git *p;
+			strbuf_setlen(&packname, dirlen);
+			strbuf_addstr(&packname, pack_indexes[i]);
+			p = add_packed_git(packname.buf, packname.len, 1);
+			if (!p)
+				die("error adding pack %s", packname.buf);
+			if (open_pack_index(p))
+				die("error opening index for %s", packname.buf);
+			for_each_object_in_pack(p, add_packed_commits, &oids);
+			close_pack(p);
+		}
+		strbuf_release(&packname);
+	} else
+		for_each_packed_object(add_packed_commits, &oids, 0);
+
 	close_reachable(&oids);
 
 	QSORT(oids.list, oids.nr, commit_compare);
diff --git a/commit-graph.h b/commit-graph.h
index 73b28beed1..f065f0866f 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -36,6 +36,8 @@ struct commit_graph {
 
 struct commit_graph *load_commit_graph_one(const char *graph_file);
 
-void write_commit_graph(const char *obj_dir);
+void write_commit_graph(const char *obj_dir,
+			const char **pack_indexes,
+			int nr_packs);
 
 #endif
diff --git a/packfile.c b/packfile.c
index 7c1a2519fc..b1d33b646a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -304,7 +304,7 @@ void close_pack_index(struct packed_git *p)
 	}
 }
 
-static void close_pack(struct packed_git *p)
+void close_pack(struct packed_git *p)
 {
 	close_pack_windows(p);
 	close_pack_fd(p);
@@ -1850,7 +1850,7 @@ int has_pack_index(const unsigned char *sha1)
 	return 1;
 }
 
-static int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data)
+int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data)
 {
 	uint32_t i;
 	int r = 0;
diff --git a/packfile.h b/packfile.h
index a7fca598d6..b341f2bf5e 100644
--- a/packfile.h
+++ b/packfile.h
@@ -63,6 +63,7 @@ extern void close_pack_index(struct packed_git *);
 
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *);
 extern void close_pack_windows(struct packed_git *);
+extern void close_pack(struct packed_git *);
 extern void close_all_packs(void);
 extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
@@ -140,6 +141,7 @@ typedef int each_packed_object_fn(const struct object_id *oid,
 				  struct packed_git *pack,
 				  uint32_t pos,
 				  void *data);
+extern int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn, void *data);
 extern int for_each_packed_object(each_packed_object_fn, void *, unsigned flags);
 
 /*
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 51de9cc455..3bb44d0c09 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -167,6 +167,16 @@ test_expect_success 'write graph with nothing new' '
 graph_git_behavior 'cleared graph, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'cleared graph, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph from latest pack with closure' '
+	cd "$TRASH_DIRECTORY/full" &&
+	cat new-idx | git commit-graph write --stdin-packs &&
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "9" "large_edges"
+'
+
+graph_git_behavior 'graph from pack, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'graph from pack, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
-- 
2.17.0



* [PATCH v8 13/14] commit-graph: build graph from starting commits
  2018-04-10 12:55     ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
                         ` (11 preceding siblings ...)
  2018-04-10 12:56       ` [PATCH v8 12/14] commit-graph: read only from specific pack-indexes Derrick Stolee
@ 2018-04-10 12:56       ` Derrick Stolee
  2018-04-10 12:56       ` [PATCH v8 14/14] commit-graph: implement "--append" option Derrick Stolee
  13 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-10 12:56 UTC (permalink / raw)
  To: git; +Cc: gitster, ramsay, sbeller, szeder.dev, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git-commit-graph to read commits from stdin when the
--stdin-commits flag is specified. Commits reachable from these
commits are added to the graph. This is a much faster way to construct
the graph than inspecting all packed objects, but is restricted to
known tips.

For the Linux repository, 700,000+ commits were added to the graph
file starting from 'master' in 7-9 seconds, depending on the number
of packfiles in the repo (1, 24, or 120).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt | 14 +++++++++++++-
 builtin/commit-graph.c             | 27 +++++++++++++++++++++------
 commit-graph.c                     | 27 +++++++++++++++++++++++++--
 commit-graph.h                     |  4 +++-
 t/t5318-commit-graph.sh            | 13 +++++++++++++
 5 files changed, 75 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 8143cc3f07..442ac243e6 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -36,7 +36,13 @@ COMMANDS
 Write a commit graph file based on the commits found in packfiles.
 +
 With the `--stdin-packs` option, generate the new commit graph by
-walking objects only in the specified pack-indexes.
+walking objects only in the specified pack-indexes. (Cannot be combined
+with --stdin-commits.)
++
+With the `--stdin-commits` option, generate the new commit graph by
+walking commits starting at the commits specified in stdin as a list
+of OIDs in hex, one OID per line. (Cannot be combined with
+--stdin-packs.)
 
 'read'::
 
@@ -60,6 +66,12 @@ $ git commit-graph write
 $ echo <pack-index> | git commit-graph write --stdin-packs
 ------------------------------------------------
 
+* Write a graph file containing all reachable commits.
++
+------------------------------------------------
+$ git show-ref -s | git commit-graph write --stdin-commits
+------------------------------------------------
+
 * Read basic information from the commit-graph file.
 +
 ------------------------------------------------
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 5c70199003..b5c0b08905 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
@@ -18,13 +18,14 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
 static struct opts_commit_graph {
 	const char *obj_dir;
 	int stdin_packs;
+	int stdin_commits;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -79,6 +80,8 @@ static int graph_write(int argc, const char **argv)
 {
 	const char **pack_indexes = NULL;
 	int packs_nr = 0;
+	const char **commit_hex = NULL;
+	int commits_nr = 0;
 	const char **lines = NULL;
 	int lines_nr = 0;
 	int lines_alloc = 0;
@@ -89,6 +92,8 @@ static int graph_write(int argc, const char **argv)
 			N_("The object directory to store the graph")),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
 			N_("scan pack-indexes listed by stdin for commits")),
+		OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
+			N_("start walk at commits listed by stdin")),
 		OPT_END(),
 	};
 
@@ -96,10 +101,12 @@ static int graph_write(int argc, const char **argv)
 			     builtin_commit_graph_write_options,
 			     builtin_commit_graph_write_usage, 0);
 
+	if (opts.stdin_packs && opts.stdin_commits)
+		die(_("cannot use both --stdin-commits and --stdin-packs"));
 	if (!opts.obj_dir)
 		opts.obj_dir = get_object_directory();
 
-	if (opts.stdin_packs) {
+	if (opts.stdin_packs || opts.stdin_commits) {
 		struct strbuf buf = STRBUF_INIT;
 		lines_nr = 0;
 		lines_alloc = 128;
@@ -110,13 +117,21 @@ static int graph_write(int argc, const char **argv)
 			lines[lines_nr++] = strbuf_detach(&buf, NULL);
 		}
 
-		pack_indexes = lines;
-		packs_nr = lines_nr;
+		if (opts.stdin_packs) {
+			pack_indexes = lines;
+			packs_nr = lines_nr;
+		}
+		if (opts.stdin_commits) {
+			commit_hex = lines;
+			commits_nr = lines_nr;
+		}
 	}
 
 	write_commit_graph(opts.obj_dir,
 			   pack_indexes,
-			   packs_nr);
+			   packs_nr,
+			   commit_hex,
+			   commits_nr);
 
 	return 0;
 }
diff --git a/commit-graph.c b/commit-graph.c
index 70472840a3..a59d1e387b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -551,7 +551,9 @@ static void close_reachable(struct packed_oid_list *oids)
 
 void write_commit_graph(const char *obj_dir,
 			const char **pack_indexes,
-			int nr_packs)
+			int nr_packs,
+			const char **commit_hex,
+			int nr_commits)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -591,7 +593,28 @@ void write_commit_graph(const char *obj_dir,
 			close_pack(p);
 		}
 		strbuf_release(&packname);
-	} else
+	}
+
+	if (commit_hex) {
+		for (i = 0; i < nr_commits; i++) {
+			const char *end;
+			struct object_id oid;
+			struct commit *result;
+
+			if (commit_hex[i] && parse_oid_hex(commit_hex[i], &oid, &end))
+				continue;
+
+			result = lookup_commit_reference_gently(&oid, 1);
+
+			if (result) {
+				ALLOC_GROW(oids.list, oids.nr + 1, oids.alloc);
+				oidcpy(&oids.list[oids.nr], &(result->object.oid));
+				oids.nr++;
+			}
+		}
+	}
+
+	if (!pack_indexes && !commit_hex)
 		for_each_packed_object(add_packed_commits, &oids, 0);
 
 	close_reachable(&oids);
diff --git a/commit-graph.h b/commit-graph.h
index f065f0866f..fd035101b2 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -38,6 +38,8 @@ struct commit_graph *load_commit_graph_one(const char *graph_file);
 
 void write_commit_graph(const char *obj_dir,
 			const char **pack_indexes,
-			int nr_packs);
+			int nr_packs,
+			const char **commit_hex,
+			int nr_commits);
 
 #endif
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 3bb44d0c09..c28cfb5d7f 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -177,6 +177,19 @@ test_expect_success 'build graph from latest pack with closure' '
 graph_git_behavior 'graph from pack, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'graph from pack, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph from commits with closure' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git tag -a -m "merge" tag/merge merge/2 &&
+	git rev-parse tag/merge >commits-in &&
+	git rev-parse merge/1 >>commits-in &&
+	cat commits-in | git commit-graph write --stdin-commits &&
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "6"
+'
+
+graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'graph from commits, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
-- 
2.17.0



* [PATCH v8 14/14] commit-graph: implement "--append" option
  2018-04-10 12:55     ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
                         ` (12 preceding siblings ...)
  2018-04-10 12:56       ` [PATCH v8 13/14] commit-graph: build graph from starting commits Derrick Stolee
@ 2018-04-10 12:56       ` Derrick Stolee
  13 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-10 12:56 UTC (permalink / raw)
  To: git; +Cc: gitster, ramsay, sbeller, szeder.dev, git, peff, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Teach git-commit-graph to add all commits from the existing
commit-graph file to the file about to be written. This should be
used when adding new commits without performing garbage collection.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-commit-graph.txt | 10 ++++++++++
 builtin/commit-graph.c             | 10 +++++++---
 commit-graph.c                     | 17 ++++++++++++++++-
 commit-graph.h                     |  3 ++-
 t/t5318-commit-graph.sh            | 10 ++++++++++
 5 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
index 442ac243e6..4c97b555cc 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -43,6 +43,9 @@ With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
 of OIDs in hex, one OID per line. (Cannot be combined with
 --stdin-packs.)
++
+With the `--append` option, include all commits that are present in the
+existing commit-graph file.
 
 'read'::
 
@@ -72,6 +75,13 @@ $ echo <pack-index> | git commit-graph write --stdin-packs
 $ git show-ref -s | git commit-graph write --stdin-commits
 ------------------------------------------------
 
+* Write a graph file containing all commits in the current
+  commit-graph file along with those reachable from HEAD.
++
+------------------------------------------------
+$ git rev-parse HEAD | git commit-graph write --stdin-commits --append
+------------------------------------------------
+
 * Read basic information from the commit-graph file.
 +
 ------------------------------------------------
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index b5c0b08905..37420ae0fd 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -8,7 +8,7 @@
 static char const * const builtin_commit_graph_usage[] = {
 	N_("git commit-graph [--object-dir <objdir>]"),
 	N_("git commit-graph read [--object-dir <objdir>]"),
-	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
@@ -18,7 +18,7 @@ static const char * const builtin_commit_graph_read_usage[] = {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-	N_("git commit-graph write [--object-dir <objdir>] [--stdin-packs|--stdin-commits]"),
+	N_("git commit-graph write [--object-dir <objdir>] [--append] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
@@ -26,6 +26,7 @@ static struct opts_commit_graph {
 	const char *obj_dir;
 	int stdin_packs;
 	int stdin_commits;
+	int append;
 } opts;
 
 static int graph_read(int argc, const char **argv)
@@ -94,6 +95,8 @@ static int graph_write(int argc, const char **argv)
 			N_("scan pack-indexes listed by stdin for commits")),
 		OPT_BOOL(0, "stdin-commits", &opts.stdin_commits,
 			N_("start walk at commits listed by stdin")),
+		OPT_BOOL(0, "append", &opts.append,
+			N_("include all commits already in the commit-graph file")),
 		OPT_END(),
 	};
 
@@ -131,7 +134,8 @@ static int graph_write(int argc, const char **argv)
 			   pack_indexes,
 			   packs_nr,
 			   commit_hex,
-			   commits_nr);
+			   commits_nr,
+			   opts.append);
 
 	return 0;
 }
diff --git a/commit-graph.c b/commit-graph.c
index a59d1e387b..3ff8c84c0e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -553,7 +553,8 @@ void write_commit_graph(const char *obj_dir,
 			const char **pack_indexes,
 			int nr_packs,
 			const char **commit_hex,
-			int nr_commits)
+			int nr_commits,
+			int append)
 {
 	struct packed_oid_list oids;
 	struct packed_commit_list commits;
@@ -571,10 +572,24 @@ void write_commit_graph(const char *obj_dir,
 	oids.nr = 0;
 	oids.alloc = approximate_object_count() / 4;
 
+	if (append) {
+		prepare_commit_graph_one(obj_dir);
+		if (commit_graph)
+			oids.alloc += commit_graph->num_commits;
+	}
+
 	if (oids.alloc < 1024)
 		oids.alloc = 1024;
 	ALLOC_ARRAY(oids.list, oids.alloc);
 
+	if (append && commit_graph) {
+		for (i = 0; i < commit_graph->num_commits; i++) {
+			const unsigned char *hash = commit_graph->chunk_oid_lookup +
+				commit_graph->hash_len * i;
+			hashcpy(oids.list[oids.nr++].hash, hash);
+		}
+	}
+
 	if (pack_indexes) {
 		struct strbuf packname = STRBUF_INIT;
 		int dirlen;
diff --git a/commit-graph.h b/commit-graph.h
index fd035101b2..e1d8580c98 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -40,6 +40,7 @@ void write_commit_graph(const char *obj_dir,
 			const char **pack_indexes,
 			int nr_packs,
 			const char **commit_hex,
-			int nr_commits);
+			int nr_commits,
+			int append);
 
 #endif
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index c28cfb5d7f..a380419b65 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -190,6 +190,16 @@ test_expect_success 'build graph from commits with closure' '
 graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commits/8 merge/1
 graph_git_behavior 'graph from commits, commit 8 vs merge 2' full commits/8 merge/2
 
+test_expect_success 'build graph from commits with append' '
+	cd "$TRASH_DIRECTORY/full" &&
+	git rev-parse merge/3 | git commit-graph write --stdin-commits --append &&
+	test_path_is_file $objdir/info/commit-graph &&
+	graph_read_expect "10" "large_edges"
+'
+
+graph_git_behavior 'append graph, commit 8 vs merge 1' full commits/8 merge/1
+graph_git_behavior 'append graph, commit 8 vs merge 2' full commits/8 merge/2
+
 test_expect_success 'setup bare repo' '
 	cd "$TRASH_DIRECTORY" &&
 	git clone --bare --no-local full bare &&
-- 
2.17.0



* Re: [PATCH v8 03/14] commit-graph: add format document
  2018-04-10 12:55       ` [PATCH v8 03/14] commit-graph: add format document Derrick Stolee
@ 2018-04-10 19:10         ` Stefan Beller
  2018-04-10 19:18           ` Derrick Stolee
  2018-04-11 20:58         ` Jakub Narebski
  1 sibling, 1 reply; 110+ messages in thread
From: Stefan Beller @ 2018-04-10 19:10 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Junio C Hamano, Ramsay Jones, SZEDER Gábor,
	Jeff Hostetler, Jeff King, Derrick Stolee

Hi Derrick,

On Tue, Apr 10, 2018 at 5:55 AM, Derrick Stolee <stolee@gmail.com> wrote:

> +  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
> +      The ith entry, F[i], stores the number of OIDs with first
> +      byte at most i. Thus F[255] stores the total
> +      number of commits (N).

I was about to give this series one last read, not expecting any questions
to come up (this series has had a lot of feedback already!),
but one just did.

What were your design considerations for the fanout table?
Did you include it as the pack index has one or did you come up with
them from first principles?
Have you measured the performance impact of the fanout table
(maybe even depending on the size of the fanout) ?

context:
https://public-inbox.org/git/CAJo=hJsto1ik=GTC8c3+2_jBuUqcAPL0UWp-1uoYYMpgbLB+qg@mail.gmail.com/
(side note: searching the web for fanout makes it seem
as if it is git-lingo, apparently the term is not widely used)

I don't think we want to restart the design discussion,
I am just curious.

Thanks,
Stefan


* Re: [PATCH v8 03/14] commit-graph: add format document
  2018-04-10 19:10         ` Stefan Beller
@ 2018-04-10 19:18           ` Derrick Stolee
  0 siblings, 0 replies; 110+ messages in thread
From: Derrick Stolee @ 2018-04-10 19:18 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Junio C Hamano, Ramsay Jones, SZEDER Gábor,
	Jeff Hostetler, Jeff King, Derrick Stolee

On 4/10/2018 3:10 PM, Stefan Beller wrote:
> Hi Derrick,
>
> On Tue, Apr 10, 2018 at 5:55 AM, Derrick Stolee <stolee@gmail.com> wrote:
>
>> +  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
>> +      The ith entry, F[i], stores the number of OIDs with first
>> +      byte at most i. Thus F[255] stores the total
>> +      number of commits (N).
> I was about to give this series one last read, not expecting any questions
> to come up (this series has had a lot of feedback already!),
> but one just did.
>
> What were your design considerations for the fanout table?
> Did you include it as the pack index has one or did you come up with
> them from first principles?
> Have you measured the performance impact of the fanout table
> (maybe even depending on the size of the fanout) ?
>
> context:
> https://public-inbox.org/git/CAJo=hJsto1ik=GTC8c3+2_jBuUqcAPL0UWp-1uoYYMpgbLB+qg@mail.gmail.com/
> (side note: searching the web for fanout makes it seem
> as if it is git-lingo, apparently the term is not widely used)
>
> I don't think we want to restart the design discussion,
> I am just curious.

I knew that I wanted some form of fanout table, and the 256-entry 
one was already used for IDX files (and in my MIDX RFC). With the recent 
addition of "packfile: refactor hash search with fanout table" [1] it is 
probably best to keep the 256-entry table to avoid duplicating code.

As for speed, we have the notion of 'graph_pos', which gives random 
access into the commit-graph once a commit is loaded as a parent of a 
commit from the commit-graph file. We therefore spend time in the 
binary search only for commits that do not exist in the commit-graph 
file and for those that are first found in the file. As a result, 
profiler runs on long commit-graph walks do not show any measurable 
time spent in 'bsearch_graph()'.
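
To illustrate how a 256-entry fanout narrows the binary search, here is
a minimal sketch. It is not git's bsearch_graph(): the helper name
fanout_lookup(), the fixed HASH_LEN of 20, and the assumption that the
fanout values have already been converted from network byte order to
host order are all made up for this example.

```c
#include <stdint.h>
#include <string.h>

#define HASH_LEN 20 /* SHA-1 length, assumed for this sketch */

/*
 * Hypothetical helper, not git's bsearch_graph(): find `oid` in a sorted
 * table of HASH_LEN-byte OIDs using a 256-entry fanout table, where
 * fanout[i] is the number of OIDs whose first byte is at most i.
 * Returns 1 and stores the position in *pos when found, 0 otherwise.
 */
static int fanout_lookup(const uint32_t fanout[256],
			 const unsigned char *oid_table,
			 const unsigned char *oid,
			 uint32_t *pos)
{
	/* Only OIDs that share the first byte can match. */
	uint32_t lo = oid[0] ? fanout[oid[0] - 1] : 0;
	uint32_t hi = fanout[oid[0]];

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(oid, oid_table + (size_t)mid * HASH_LEN,
				 HASH_LEN);

		if (!cmp) {
			*pos = mid;
			return 1;
		}
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return 0;
}
```

With N commits spread evenly over the first byte, the search starts on
a range of roughly N/256 entries instead of N, which saves about eight
comparison steps per lookup.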

Thanks,
-Stolee

[1] 
https://github.com/gitster/git/commit/b4e00f7306a160639f047b3421985e8f3d0c6fb1


* Re: [PATCH v8 03/14] commit-graph: add format document
  2018-04-10 12:55       ` [PATCH v8 03/14] commit-graph: add format document Derrick Stolee
  2018-04-10 19:10         ` Stefan Beller
@ 2018-04-11 20:58         ` Jakub Narebski
  2018-04-12 11:28           ` Derrick Stolee
  1 sibling, 1 reply; 110+ messages in thread
From: Jakub Narebski @ 2018-04-11 20:58 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, ramsay, sbeller, szeder.dev, git, peff,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> +CHUNK DATA:
> +
> +  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
> +      The ith entry, F[i], stores the number of OIDs with first
> +      byte at most i. Thus F[255] stores the total
> +      number of commits (N).
> +
> +  OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
> +      The OIDs for all commits in the graph, sorted in ascending order.
> +
> +  Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)

I think it is a typo, and it should be CDAT, not CGET
(CDAT seems to me to stand for Commit DATa):

  +  Commit Data (ID: {'C', 'D', 'A', 'T' }) (N * (H + 16) bytes)

This is what you use in the actual implementation, in PATCH v8 06/14

DS> +#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
DS> +#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
DS> +#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
DS> +#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
DS> +#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */

-- 
Jakub Narębski


* Re: [PATCH v8 03/14] commit-graph: add format document
  2018-04-11 20:58         ` Jakub Narebski
@ 2018-04-12 11:28           ` Derrick Stolee
  2018-04-13 22:07             ` Jakub Narebski
  0 siblings, 1 reply; 110+ messages in thread
From: Derrick Stolee @ 2018-04-12 11:28 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: git, gitster, ramsay, sbeller, szeder.dev, git, peff,
	Derrick Stolee

On 4/11/2018 4:58 PM, Jakub Narebski wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>> +CHUNK DATA:
>> +
>> +  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
>> +      The ith entry, F[i], stores the number of OIDs with first
>> +      byte at most i. Thus F[255] stores the total
>> +      number of commits (N).
>> +
>> +  OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
>> +      The OIDs for all commits in the graph, sorted in ascending order.
>> +
>> +  Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
> I think it is a typo, and it should be CDAT, not CGET
> (CDAT seems to me to stand for Commit DATa):
>
>    +  Commit Data (ID: {'C', 'D', 'A', 'T' }) (N * (H + 16) bytes)
>
> This is what you use in the actual implementation, in PATCH v8 06/14
>
> DS> +#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
> DS> +#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
> DS> +#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
> DS> +#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
> DS> +#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
>

Documentation bugs are hard to diagnose. Thanks for finding this. I 
double checked that the hex int "0x43444154" matches "CDAT".

Here is a diff to make it match.

diff --git a/Documentation/technical/commit-graph-format.txt 
b/Documentation/technical/commit-graph-format.txt
index ad6af8105c..af03501834 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -70,7 +70,7 @@ CHUNK DATA:
    OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
        The OIDs for all commits in the graph, sorted in ascending order.

-  Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
+  Commit Data (ID: {'C', 'D', 'A', 'T' }) (N * (H + 16) bytes)
      * The first H bytes are for the OID of the root tree.
      * The next 8 bytes are for the positions of the first two parents
        of the ith commit. Stores value 0xffffffff if no parent in that



* Re: [PATCH v8 03/14] commit-graph: add format document
  2018-04-12 11:28           ` Derrick Stolee
@ 2018-04-13 22:07             ` Jakub Narebski
  0 siblings, 0 replies; 110+ messages in thread
From: Jakub Narebski @ 2018-04-13 22:07 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, ramsay, sbeller, szeder.dev, git, peff,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:
> On 4/11/2018 4:58 PM, Jakub Narebski wrote:
>> Derrick Stolee <stolee@gmail.com> writes:
>>
>>> +CHUNK DATA:
>>> +
>>> +  OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
>>> +      The ith entry, F[i], stores the number of OIDs with first
>>> +      byte at most i. Thus F[255] stores the total
>>> +      number of commits (N).
>>> +
>>> +  OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
>>> +      The OIDs for all commits in the graph, sorted in ascending order.
>>> +
>>> +  Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
>> I think it is a typo, and it should be CDAT, not CGET
>> (CDAT seems to me to stand for Commit DATa):
>>
>>    +  Commit Data (ID: {'C', 'D', 'A', 'T' }) (N * (H + 16) bytes)
>>
>> This is what you use in the actual implementation, in PATCH v8 06/14
>>
>> DS> +#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
>> DS> +#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
>> DS> +#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
>> DS> +#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
>> DS> +#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
>>
>
> Documentation bugs are hard to diagnose. Thanks for finding this. I
> double checked that the hex int "0x43444154" matches "CDAT".

Another possible way of checking the correctness would be to run
`hexdump -C` or equivalent on the generated commit-graph file.  The file
and chunk headers should be visible in its output.

> Here is a diff to make it match.
>
> diff --git a/Documentation/technical/commit-graph-format.txt
> b/Documentation/technical/commit-graph-format.txt
> index ad6af8105c..af03501834 100644
> --- a/Documentation/technical/commit-graph-format.txt
> +++ b/Documentation/technical/commit-graph-format.txt
> @@ -70,7 +70,7 @@ CHUNK DATA:
>    OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
>        The OIDs for all commits in the graph, sorted in ascending order.
>
> -  Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
> +  Commit Data (ID: {'C', 'D', 'A', 'T' }) (N * (H + 16) bytes)
>      * The first H bytes are for the OID of the root tree.
>      * The next 8 bytes are for the positions of the first two parents
>        of the ith commit. Stores value 0xffffffff if no parent in that


* Re: [PATCH v8 09/14] commit-graph: add core.commitGraph setting
  2018-04-10 12:56       ` [PATCH v8 09/14] commit-graph: add core.commitGraph setting Derrick Stolee
@ 2018-04-14 18:33         ` Jakub Narebski
  0 siblings, 0 replies; 110+ messages in thread
From: Jakub Narebski @ 2018-04-14 18:33 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, ramsay, sbeller, szeder.dev, git, peff,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The commit graph feature is controlled by the new core.commitGraph config
> setting. This defaults to 0, so the feature is opt-in.
>
> The intention of core.commitGraph is that a user can always stop checking
> for or parsing commit graph files if core.commitGraph=0.

This is a bool-valued setting, so the commit message should talk about
'true' and 'false', not '1' or '0'.
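
For context, here is a simplified sketch of how a boolean config value
is interpreted. It is not git's git_config_bool(); the function name
parse_bool_sketch() is made up, and unit suffixes on integers are
ignored. It shows that '1' and '0' are accepted, while 'true' and
'false' are the natural spelling.

```c
#include <stdlib.h>
#include <strings.h>

/*
 * Simplified, hypothetical stand-in for git's boolean config parsing.
 * Returns 1 for true, 0 for false, -1 if the value is not recognized.
 */
static int parse_bool_sketch(const char *value)
{
	char *end;
	long n;

	if (!value)
		return 1; /* key given without "= value" means true */
	if (!*value)
		return 0; /* "key =" with an empty value means false */
	if (!strcasecmp(value, "true") || !strcasecmp(value, "yes") ||
	    !strcasecmp(value, "on"))
		return 1;
	if (!strcasecmp(value, "false") || !strcasecmp(value, "no") ||
	    !strcasecmp(value, "off"))
		return 0;

	/* Plain integers: zero is false, anything else is true. */
	n = strtol(value, &end, 10);
	if (end != value && !*end)
		return n != 0;
	return -1;
}
```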

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/config.txt | 4 ++++
>  cache.h                  | 1 +
>  config.c                 | 5 +++++
>  environment.c            | 1 +
>  4 files changed, 11 insertions(+)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 4e0cff87f6..e5c7013fb0 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -898,6 +898,10 @@ core.notesRef::
>  This setting defaults to "refs/notes/commits", and it can be overridden by
>  the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
>  
> +core.commitGraph::
> +	Enable git commit graph feature. Allows reading from the
> +	commit-graph file.
> +

This is a very minimal description of this config variable.  In my
opinion it lacks two things:

1. A reference to the documentation where one can read more, for
   example the linkgit:git-commit-graph[1] manpage, like in the
   description of the core.sparseCheckout feature below.

2. Information about restrictions, for example something like the
   following:

  "The feature does not work for shallow clones, nor when
  `git-replace` or a grafts file is used."

   Perhaps even with references to each commit-graph disabling feature.

When the feature matures and is on by default (well, the
reading part at least), it would probably acquire wording similar to the
one for the `pack.useBitmaps` config option[1], wouldn't it?

[1]: https://git-scm.com/docs/git-config#git-config-packuseBitmaps

>  core.sparseCheckout::
>  	Enable "sparse checkout" feature. See section "Sparse checkout" in
>  	linkgit:git-read-tree[1] for more information.
> diff --git a/cache.h b/cache.h
> index a61b2d3f0d..8bdbcbbbf7 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -805,6 +805,7 @@ extern char *git_replace_ref_base;
>  
>  extern int fsync_object_files;
>  extern int core_preload_index;
> +extern int core_commit_graph;
>  extern int core_apply_sparse_checkout;
>  extern int precomposed_unicode;
>  extern int protect_hfs;
> diff --git a/config.c b/config.c
> index b0c20e6cb8..25ee4a676c 100644
> --- a/config.c
> +++ b/config.c
> @@ -1226,6 +1226,11 @@ static int git_default_core_config(const char *var, const char *value)
>  		return 0;
>  	}
>  
> +	if (!strcmp(var, "core.commitgraph")) {
> +		core_commit_graph = git_config_bool(var, value);
> +		return 0;
> +	}
> +
>  	if (!strcmp(var, "core.sparsecheckout")) {
>  		core_apply_sparse_checkout = git_config_bool(var, value);
>  		return 0;
> diff --git a/environment.c b/environment.c
> index d6dd64662c..8853e2f0dd 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -62,6 +62,7 @@ enum push_default_type push_default = PUSH_DEFAULT_UNSPECIFIED;
>  enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
>  char *notes_ref_name;
>  int grafts_replace_parents = 1;
> +int core_commit_graph;
>  int core_apply_sparse_checkout;
>  int merge_log_config = -1;
>  int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */

So this is just config variable handling.  Nicely short patch to
review; the code part looks good to me.

-- 
Jakub Narębski


* Re: [PATCH v8 08/14] commit-graph: implement git commit-graph read
  2018-04-10 12:56       ` [PATCH v8 08/14] commit-graph: implement git commit-graph read Derrick Stolee
@ 2018-04-14 22:15         ` Jakub Narebski
  2018-04-15  3:26           ` Eric Sunshine
  0 siblings, 1 reply; 110+ messages in thread
From: Jakub Narebski @ 2018-04-14 22:15 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, ramsay, sbeller, szeder.dev, git, peff,
	Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
> Subject: [PATCH v8 08/14] commit-graph: implement git commit-graph read

Minor nit: this is the one commit subject in the series that uses
"git commit-graph" instead of "git-commit-graph" in the
description.

Also, perhaps this (and similarly titled commits in this series) would
read better with quotes, that is as:

  commit-graph: implement "git commit-graph read"

Though that might be a matter of personal taste.

>
> Teach git-commit-graph to read commit graph files and summarize their contents.
>
> Use the read subcommand to verify the contents of a commit graph file in the
> tests.

Better would be, in my opinion

  Use the 'read' subcommand

or

  Use the "read" subcommand

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-commit-graph.txt |  12 +++
>  builtin/commit-graph.c             |  56 ++++++++++++
>  commit-graph.c                     | 137 ++++++++++++++++++++++++++++-
>  commit-graph.h                     |  23 +++++
>  t/t5318-commit-graph.sh            |  32 +++++--
>  5 files changed, 254 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt
> index 47996e8f89..8aad8303f5 100644
> --- a/Documentation/git-commit-graph.txt
> +++ b/Documentation/git-commit-graph.txt
> @@ -9,6 +9,7 @@ git-commit-graph - Write and verify Git commit graph files
>  SYNOPSIS
>  --------
>  [verse]
> +'git commit-graph read' [--object-dir <dir>]
>  'git commit-graph write' <options> [--object-dir <dir>]

Why do you need this '[--object-dir <dir>]' parameter?  Anyway, because
Git supports the GIT_OBJECT_DIRECTORY environment variable, I would
expect '--object-dir' to be a parameter to the 'git' wrapper/command, like
'--git-dir' is, not to the 'git commit-graph' command, or even only to
selected individual subcommands of it.

>  
>  
> @@ -35,6 +36,11 @@ COMMANDS
>  Write a commit graph file based on the commits found in packfiles.
>  Includes all commits from the existing commit graph file.
>  
> +'read'::
> +
> +Read a graph file given by the commit-graph file

The above part of the sentence reads very strangely, like a tautology.

>                                                   and output basic
> +details about the graph file. Used for debugging purposes.

I would say that it is 'used' for testing, and is 'useful' (or 'can be
used') for debugging purposes.

> +
>  
>  EXAMPLES
>  --------
> @@ -45,6 +51,12 @@ EXAMPLES
>  $ git commit-graph write
>  ------------------------------------------------
>  
> +* Read basic information from the commit-graph file.
> ++
> +------------------------------------------------
> +$ git commit-graph read
> +------------------------------------------------

I would personally prefer to have example output together with example
calling convention.

> +
>  
>  GIT
>  ---
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 26b6360289..efd39331d7 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -7,10 +7,16 @@
>  
>  static char const * const builtin_commit_graph_usage[] = {
>  	N_("git commit-graph [--object-dir <objdir>]"),
> +	N_("git commit-graph read [--object-dir <objdir>]"),
>  	N_("git commit-graph write [--object-dir <objdir>]"),
>  	NULL
>  };
>  
> +static const char * const builtin_commit_graph_read_usage[] = {
> +	N_("git commit-graph read [--object-dir <objdir>]"),
> +	NULL
> +};
> +
>  static const char * const builtin_commit_graph_write_usage[] = {
>  	N_("git commit-graph write [--object-dir <objdir>]"),
>  	NULL
> @@ -20,6 +26,54 @@ static struct opts_commit_graph {
>  	const char *obj_dir;
>  } opts;
>  
> +static int graph_read(int argc, const char **argv)
> +{
> +	struct commit_graph *graph = NULL;
> +	char *graph_name;
> +
> +	static struct option builtin_commit_graph_read_options[] = {
> +		OPT_STRING(0, "object-dir", &opts.obj_dir,
> +			N_("dir"),
> +			N_("The object directory to store the graph")),

Actually it is not the object directory to store the graph, but it is
the object directory to read the commit-graph file from.

> +		OPT_END(),
> +	};
> +
> +	argc = parse_options(argc, argv, NULL,
> +			     builtin_commit_graph_read_options,
> +			     builtin_commit_graph_read_usage, 0);
> +
> +	if (!opts.obj_dir)
> +		opts.obj_dir = get_object_directory();
> +
> +	graph_name = get_commit_graph_filename(opts.obj_dir);
> +	graph = load_commit_graph_one(graph_name);
> +
> +	if (!graph)
> +		die("graph file %s does not exist", graph_name);

It might be better to use single quotes around '%s'; this is an absolute
pathname (if I understand it correctly), and it may contain spaces.

> +	FREE_AND_NULL(graph_name);
> +
> +	printf("header: %08x %d %d %d %d\n",

Wouldn't it be better to print the signature characters (FourCC-like), that
is 'CGPH'?  And maybe name each part of the header?

  +	printf("header: %c%c%c%c ver=%d hash=%d chunks=%d reserved=%d\n",

Would it make using the command in tests harder, maybe?
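
A minimal sketch of the named-header idea, written as standalone C rather
than against git's own helpers. The function names fourcc_to_str() and
format_graph_header() are hypothetical and not part of the patch; the
field labels follow the printf suggestion above.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not in the patch): render a 4-byte big-endian
 * signature or chunk id as characters. Non-printable bytes become '?'.
 * 'out' must have room for 5 bytes including the terminating NUL.
 */
static void fourcc_to_str(uint32_t id, char out[5])
{
	int i;

	for (i = 0; i < 4; i++) {
		unsigned char c = (id >> (24 - 8 * i)) & 0xff;
		out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '?';
	}
	out[4] = '\0';
}

/*
 * Hypothetical helper: format the 8-byte file header found at 'data'
 * into 'buf', naming each field instead of printing bare numbers.
 */
static int format_graph_header(const unsigned char *data, char *buf, size_t len)
{
	char sig[5];
	uint32_t id = ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
		      ((uint32_t)data[2] << 8) | (uint32_t)data[3];

	fourcc_to_str(id, sig);
	return snprintf(buf, len,
			"header: %s ver=%d hash=%d chunks=%d reserved=%d",
			sig, data[4], data[5], data[6], data[7]);
}
```

The same fourcc_to_str() could also be used when reporting a chunk id in
an error message. The test helper would then have to expect "CGPH"
instead of "43475048".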

> +		ntohl(*(uint32_t*)graph->data),
> +		*(unsigned char*)(graph->data + 4),
> +		*(unsigned char*)(graph->data + 5),
> +		*(unsigned char*)(graph->data + 6),
> +		*(unsigned char*)(graph->data + 7));
> +	printf("num_commits: %u\n", graph->num_commits);

All right.

> +	printf("chunks:");
> +
> +	if (graph->chunk_oid_fanout)
> +		printf(" oid_fanout");
> +	if (graph->chunk_oid_lookup)
> +		printf(" oid_lookup");
> +	if (graph->chunk_commit_data)
> +		printf(" commit_metadata");
> +	if (graph->chunk_large_edges)
> +		printf(" large_edges");
> +	printf("\n");

This means that there is no support for unknown chunks (perhaps created
by a newer version of Git that does not exist yet), including unknown
optional chunks.  But I guess that is acceptable at this stage.

Note that for unknown chunks you would only be able to print their
signatures, because we do not know their full names.

> +
> +	return 0;
> +}
> +

No unmap, no closing of the file descriptor; I guess we can rely on the
operating system doing this cleanup for us on exit.


[...]
> +static struct commit_graph *alloc_commit_graph(void)
> +{
> +	struct commit_graph *g = xcalloc(1, sizeof(*g));

All right, that is the standard idiom used by git code.

> +	g->graph_fd = -1;
> +
> +	return g;
> +}

Would we need some safe way of deallocating graph data?  Who owns
graph_fd, and is responsible for closing the file (well, apart from the
system when the program exits - but what about libgit2 then)?

> +
> +struct commit_graph *load_commit_graph_one(const char *graph_file)
> +{
> +	void *graph_map;
> +	const unsigned char *data, *chunk_lookup;
> +	size_t graph_size;
> +	struct stat st;
> +	uint32_t i;
> +	struct commit_graph *graph;
> +	int fd = git_open(graph_file);
> +	uint64_t last_chunk_offset;
> +	uint32_t last_chunk_id;
> +	uint32_t graph_signature;
> +	unsigned char graph_version, hash_version;
> +
> +	if (fd < 0)
> +		return NULL;
> +	if (fstat(fd, &st)) {
> +		close(fd);
> +		return NULL;
> +	}
> +	graph_size = xsize_t(st.st_size);
> +
> +	if (graph_size < GRAPH_MIN_SIZE) {
> +		close(fd);
> +		die("graph file %s is too small", graph_file);

Should we print its expected minimal size, too?
Shouldn't error messages be marked for localization?

> +	}
> +	graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
> +	data = (const unsigned char *)graph_map;

All right, speed is important, so let's (x)mmap the file.

> +
> +	graph_signature = get_be32(data);
> +	if (graph_signature != GRAPH_SIGNATURE) {
> +		error("graph signature %X does not match signature %X",
> +		      graph_signature, GRAPH_SIGNATURE);
> +		goto cleanup_fail;
> +	}

All right, we check the signature of the file.

> +
> +	graph_version = *(unsigned char*)(data + 4);

I wonder if those numbers should not be replaced by preprocessor
constants.  I guess it wouldn't actually improve readability.

> +	if (graph_version != GRAPH_VERSION) {
> +		error("graph version %X does not match version %X",
> +		      graph_version, GRAPH_VERSION);
> +		goto cleanup_fail;
> +	}

Does this mean that the command is not forward-compatible, in that it
would fail on "commit-graph" files created by a newer version of Git and
then accessed with an older version?

> +
> +	hash_version = *(unsigned char*)(data + 5);
> +	if (hash_version != GRAPH_OID_VERSION) {
> +		error("hash version %X does not match version %X",
> +		      hash_version, GRAPH_OID_VERSION);
> +		goto cleanup_fail;
> +	}

All right, there is no support for NewHash yet, so there is nothing to
do but fail.

> +
> +	graph = alloc_commit_graph();
> +
> +	graph->hash_len = GRAPH_OID_LEN;
> +	graph->num_chunks = *(unsigned char*)(data + 6);
> +	graph->graph_fd = fd;
> +	graph->data = graph_map;
> +	graph->data_len = graph_size;
> +
> +	last_chunk_id = 0;
> +	last_chunk_offset = 8;
> +	chunk_lookup = data + 8;
> +	for (i = 0; i < graph->num_chunks; i++) {
> +		uint32_t chunk_id = get_be32(chunk_lookup + 0);
> +		uint64_t chunk_offset = get_be64(chunk_lookup + 4);
> +		int chunk_repeated = 0;
> +
> +		chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;

All right, here we use preprocessor constant (I would guess: 4 + 8).

> +
> +		if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {

All right, there must be room for the final H-byte HASH-checksum of all
the contents.

> +			error("improper chunk offset %08x%08x", (uint32_t)(chunk_offset >> 32),

And by "improper" you mean "too large" here.

Why the strange formatting of uint64_t / off64_t values?  Is it
for compatibility reasons?
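
For comparison, a small standalone sketch of the two ways to print a
64-bit offset in hex: the split into two 32-bit halves used in the patch,
and the C99 PRIx64 macro from <inttypes.h>. The function names are
hypothetical, and whether git's portability rules allow PRIx64 here is
not something this sketch settles.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Split form, as in the patch: needs no 64-bit printf conversion. */
static int format_offset_split(uint64_t off, char *buf, size_t len)
{
	return snprintf(buf, len, "%08x%08x",
			(unsigned int)(uint32_t)(off >> 32),
			(unsigned int)(uint32_t)off);
}

/* C99 form: relies on the platform providing PRIx64. */
static int format_offset_prix64(uint64_t off, char *buf, size_t len)
{
	return snprintf(buf, len, "%016" PRIx64, off);
}
```

Both produce the same 16 hex digits; the split form only assumes that
"%x" works for unsigned int.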

> +			      (uint32_t)chunk_offset);
> +			goto cleanup_fail;
> +		}
> +
> +		switch (chunk_id) {
> +		case GRAPH_CHUNKID_OIDFANOUT:
> +			if (graph->chunk_oid_fanout)
> +				chunk_repeated = 1;
> +			else
> +				graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset);

All right, this is the only currently defined chunk whose elements are of
a simple type; the type is always the same, and we know what it is.
Not so for the rest of the chunks: either the element is a composite
type, or the size of the element can change in the future (like hash size).

Sidenote: for verification one would probably have to check that:
  - the size of oid_fanout chunk is 256 * 4 bytes
  - that 0 <= F[0] <= F[1] <= ... <= F[255] = num_commits
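
The two fanout checks above could be sketched roughly as follows. This is
standalone C, not code from the series; fanout_is_valid() and read_be32()
are hypothetical names, and the chunk size is assumed to have been
computed by the caller from adjacent chunk offsets.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian value from 'p'. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Return 1 if the fanout chunk is exactly 256 * 4 bytes, its entries
 * never decrease, and the last entry equals num_commits; 0 otherwise.
 */
static int fanout_is_valid(const unsigned char *fanout, size_t chunk_size,
			   uint32_t num_commits)
{
	uint32_t prev = 0;
	int i;

	if (chunk_size != 256 * 4)
		return 0;
	for (i = 0; i < 256; i++) {
		uint32_t cur = read_be32(fanout + 4 * i);
		if (cur < prev)
			return 0;
		prev = cur;
	}
	return prev == num_commits;
}
```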

> +			break;
> +
> +		case GRAPH_CHUNKID_OIDLOOKUP:
> +			if (graph->chunk_oid_lookup)
> +				chunk_repeated = 1;
> +			else
> +				graph->chunk_oid_lookup = data + chunk_offset;
> +			break;

Sidenote: for verification one would probably have to check that:
 - the size of oid_lookup is N * H bytes, where N = num_commits
 - the OIDs are sorted in ascending lexicographical order
 - that each object with a given OID exists, and is a commit object

Though the problem is that, with this way of reading, we may not yet
know num_commits at this point.
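
The ordering check is the cheap part of this and could look something
like the sketch below. It is standalone C with a hypothetical function
name; it treats duplicates as an error too (strictly ascending), which is
an assumption on top of what is written above, and it does not touch the
object database at all.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return 1 if the 'nr' hashes of 'hash_len' bytes each, stored back to
 * back at 'lookup', are in strictly ascending bytewise order.
 */
static int oid_lookup_is_sorted(const unsigned char *lookup, uint32_t nr,
				size_t hash_len)
{
	uint32_t i;

	for (i = 1; i < nr; i++) {
		const unsigned char *prev = lookup + (size_t)(i - 1) * hash_len;
		const unsigned char *cur = lookup + (size_t)i * hash_len;

		if (memcmp(prev, cur, hash_len) >= 0)
			return 0;
	}
	return 1;
}
```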

> +
> +		case GRAPH_CHUNKID_DATA:
> +			if (graph->chunk_commit_data)
> +				chunk_repeated = 1;
> +			else
> +				graph->chunk_commit_data = data + chunk_offset;
> +			break;

Sidenote: for verification one would probably have to check that:
 - the size of commit_data is N * (H + 16) bytes, where N = num_commits
 - that data in here agrees with data from the ODB

> +
> +		case GRAPH_CHUNKID_LARGEEDGES:
> +			if (graph->chunk_large_edges)
> +				chunk_repeated = 1;
> +			else
> +				graph->chunk_large_edges = data + chunk_offset;
> +			break;
> +		}

Sidenote: verification of this would be even more involved.

> +
> +		if (chunk_repeated) {
> +			error("chunk id %08x appears multiple times", chunk_id);

Wouldn't it be better to print signature, and not raw chunk_id in hex?

> +			goto cleanup_fail;
> +		}

All right, we fail on first repeated non-repeatable chunk.

> +
> +		if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)
> +		{
> +			graph->num_commits = (chunk_offset - last_chunk_offset)
> +					     / graph->hash_len;
> +		}

All right, looks good to me.

Sidenote: one should probably verify that (chunk_offset - last_chunk_offset)
here is evenly divisible by hash_len.
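
A sketch of that check, again as standalone C with a hypothetical helper
name (count_oids()); it also rejects offsets that go backwards and counts
that do not fit in 32 bits, which goes slightly beyond the divisibility
test itself.

```c
#include <stdint.h>

/*
 * Compute the number of hashes stored between two chunk offsets.
 * Returns 0 and sets *num_commits on success, -1 if the span is not a
 * whole number of hashes or is otherwise implausible.
 */
static int count_oids(uint64_t chunk_start, uint64_t chunk_end,
		      unsigned int hash_len, uint32_t *num_commits)
{
	uint64_t size;

	if (!hash_len || chunk_end < chunk_start)
		return -1;
	size = chunk_end - chunk_start;
	if (size % hash_len)
		return -1;
	if (size / hash_len > UINT32_MAX)
		return -1;
	*num_commits = (uint32_t)(size / hash_len);
	return 0;
}
```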

> +
> +		last_chunk_id = chunk_id;
> +		last_chunk_offset = chunk_offset;
> +	}

Sidenote: the verification should check that final checksum is correct.

> +
> +	return graph;
> +
> +cleanup_fail:
> +	munmap(graph_map, graph_size);
> +	close(fd);
> +	exit(1);
> +}
> +
>  static void write_graph_chunk_fanout(struct hashfile *f,
>  				     struct commit **commits,
>  				     int nr_commits)
> diff --git a/commit-graph.h b/commit-graph.h
> index 16fea993ab..2528478f06 100644
> --- a/commit-graph.h
> +++ b/commit-graph.h
> @@ -1,6 +1,29 @@
>  #ifndef COMMIT_GRAPH_H
>  #define COMMIT_GRAPH_H
>  
> +#include "git-compat-util.h"
> +
> +char *get_commit_graph_filename(const char *obj_dir);
> +
> +struct commit_graph {
> +	int graph_fd;
> +
> +	const unsigned char *data;
> +	size_t data_len;

All right, this is "raw data".

> +
> +	unsigned char hash_len;
> +	unsigned char num_chunks;
> +	uint32_t num_commits;

All right.

> +	struct object_id oid;

What is this for?

> +
> +	const uint32_t *chunk_oid_fanout;
> +	const unsigned char *chunk_oid_lookup;
> +	const unsigned char *chunk_commit_data;
> +	const unsigned char *chunk_large_edges;

All right, individual chunks (or NULL if a chunk does not exist - for
optional ones).

> +};
> +
> +struct commit_graph *load_commit_graph_one(const char *graph_file);
> +
>  void write_commit_graph(const char *obj_dir);
>  
>  #endif
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index d7b635bd68..2f44f91193 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -26,10 +26,28 @@ test_expect_success 'create commits and repack' '
>  	git repack
>  '
>  
> +graph_read_expect() {

All right, I see that we have an unstated convention of not documenting
local shell functions in tests.

This should have space before parentheses, like this:

  +graph_read_expect () {

                    ^
                    \-- here

> +	OPTIONAL=""
> +	NUM_CHUNKS=3
> +	if test ! -z $2
> +	then
> +		OPTIONAL=" $2"
> +		NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
> +	fi

I don't know if it is possible to do the above in a portable shell
without using external 'wc' command.  Also, isn't $(( ... )) bashism?

Perhaps better solution would be to pass each expected extra chunk as
separate parameter, and simply compose OPTIONAL from those subsequent
parameters: we know that the separator is space.

Also, currently this is overengineered a bit... or just
forward-thinking, as we will have at most a single-word 2nd parameter,
namely "large_edges".

> +	cat >expect <<- EOF
> +	header: 43475048 1 1 $NUM_CHUNKS 0
> +	num_commits: $1
> +	chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL
> +	EOF
> +	git commit-graph read >output &&
> +	test_cmp expect output
> +}
> +
>  test_expect_success 'write graph' '
>  	cd "$TRASH_DIRECTORY/full" &&
>  	graph1=$(git commit-graph write) &&

Why do you use command substitution here?  'graph1' variable is not used
anywhere I can see, and in all other examples below you simply run
"git commit-graph write" without command substitution.

> -	test_path_is_file $objdir/info/commit-graph
> +	test_path_is_file $objdir/info/commit-graph &&
> +	graph_read_expect "3"
>  '
>  
>  test_expect_success 'Add more commits' '
> @@ -72,7 +90,8 @@ test_expect_success 'Add more commits' '
>  test_expect_success 'write graph with merges' '
>  	cd "$TRASH_DIRECTORY/full" &&
>  	git commit-graph write &&
> -	test_path_is_file $objdir/info/commit-graph
> +	test_path_is_file $objdir/info/commit-graph &&
> +	graph_read_expect "10" "large_edges"
>  '
>  
>  test_expect_success 'Add one more commit' '
[...]

Thank you for your patient work on this feature,
-- 
Jakub Narębski


* Re: [PATCH v8 08/14] commit-graph: implement git commit-graph read
  2018-04-14 22:15         ` Jakub Narebski
@ 2018-04-15  3:26           ` Eric Sunshine
  0 siblings, 0 replies; 110+ messages in thread
From: Eric Sunshine @ 2018-04-15  3:26 UTC (permalink / raw)
  To: Jakub Narebski
  Cc: Derrick Stolee, Git List, Junio C Hamano, Ramsay Jones,
	Stefan Beller, SZEDER Gábor, Jeff Hostetler, Jeff King,
	Derrick Stolee

On Sat, Apr 14, 2018 at 6:15 PM, Jakub Narebski <jnareb@gmail.com> wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>> +             NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
>
> I don't know if it is possible to do the above in a portable shell
> without using external 'wc' command.  Also, isn't $(( ... )) bashism?

$((...)) is POSIX and used heavily in existing Git test scripts.


* Re: [PATCH v8 04/14] graph: add commit graph design document
  2018-04-10 12:55       ` [PATCH v8 04/14] graph: add commit graph design document Derrick Stolee
@ 2018-04-15 22:48         ` Jakub Narebski
  0 siblings, 0 replies; 110+ messages in thread
From: Jakub Narebski @ 2018-04-15 22:48 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Junio C Hamano, Ramsay Jones, Stefan Beller,
	SZEDER Gábor, Jeff Hostetler, Jeff King, Derrick Stolee

Derrick Stolee <stolee@gmail.com> writes:

> +Future Work
> +-----------
> +
> +- The commit graph feature currently does not honor commit grafts. This can
> +  be remedied by duplicating or refactoring the current graft logic.

The problem in my opinion lies in a different direction, namely that
commit grafts can change, changing the view of the history.  If we want
the commit-graph file to follow the user-visible view of the history of the
project, it needs to respect the current version of commit grafts - but what
if commit grafts changed since the commit-graph file was generated?

Actually, there are currently three ways to affect the view of the
history:

a. legacy commit grafts mechanism; it was first, but it is not safe,
   cannot be transferred on fetch / push, and is now deprecated.

b. shallow clones, which are kind of specialized and limited grafts;
   they used to limit available functionality, but restrictions are
   being lifted (or perhaps even got lifted)

c. git-replace mechanism, where we can create an "overlay" of any
   object; it is intended to be, among other things, a replacement for
   commit grafts; safe, transferable, can be turned off with "git
   --no-replace-objects <command>"

All those can change; some more likely than others.  The problem is if
they change between writing commit-graph file (respecting old view of
the history) and reading it (where we expect to see the new view).

a. grafts file can change: lines can be added, removed or changed

b. shallow clones can be deepened or shortened, or even made
   not shallow

c. new replacements can be added, old removed, and existing edited


There are, as far as I can see, two ways of handling the issue of Git
features that can change the view of the project's history, namely:

 * Disable commit-graph reading when any of these features are used, and
   always write full graph info.

   This may not matter much for shallow clones, where commit count
   should be small anyway, but when using git-replace to stitch together
   current repository with historical one, commit-graph would be
   certainly useful.  Also, git-replace does not need to change history.

   On the other hand I think it is the easier solution.

Or

 * Detect somehow that the view of the history changed, and invalidate
   commit-graph (perhaps even automatically regenerate it).

   For shallow clone changes I think one can still use the old
   commit-graph file to generate the new one.  For other cases, the
   metadata is simple to modify, but indices such as generation number
   would need to be at least partially calculated anew.

Happily, you don't need to do this now.  It can be done in a later series;
on the other hand this would be required before the switch can be turned
from "default off" to "default on" for the commit-graph feature (configured
with core.commitGraph).

So please keep up the good work with sending nicely digestible patch
series.

> +
> +- The 'commit-graph' subcommand does not have a "verify" mode that is
> +  necessary for integration with fsck.

The "read" mode has the beginnings of "verify", or at least "fsck",
doesn't it?

[...]
> +- The current design uses the 'commit-graph' subcommand to generate the graph.
> +  When this feature stabilizes enough to recommend to most users, we should
> +  add automatic graph writes to common operations that create many commits.
> +  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
> +  commands.

Automatic is good ("git gc --auto").

But that needs handling of view-changing features such as commit grafts,
doesn't it?

> +
> +- A server could provide a commit graph file as part of the network protocol
> +  to avoid extra calculations by clients. This feature is only of benefit if
> +  the user is willing to trust the file, because verifying the file is correct
> +  is as hard as computing it from scratch.

Should the server send a different commit-graph file / info depending on
whether the client fetches from the refs/replace/* namespace?

> +
> +Related Links
> +-------------

Thank you for providing them (together with summary).

> +[0] https://bugs.chromium.org/p/git/issues/detail?id=8
> +    Chromium work item for: Serialized Commit Graph
> +
> +[1] https://public-inbox.org/git/20110713070517.GC18566@sigill.intra.peff.net/
> +    An abandoned patch that introduced generation numbers.
> +
> +[2] https://public-inbox.org/git/20170908033403.q7e6dj7benasrjes@sigill.intra.peff.net/
> +    Discussion about generation numbers on commits and how they interact
> +    with fsck.
> +
> +[3] https://public-inbox.org/git/20170908034739.4op3w4f2ma5s65ku@sigill.intra.peff.net/
> +    More discussion about generation numbers and not storing them inside
> +    commit objects. A valuable quote:
> +
> +    "I think we should be moving more in the direction of keeping
> +     repo-local caches for optimizations. Reachability bitmaps have been
> +     a big performance win. I think we should be doing the same with our
> +     properties of commits. Not just generation numbers, but making it
> +     cheap to access the graph structure without zlib-inflating whole
> +     commit objects (i.e., packv4 or something like the "metapacks" I
> +     proposed a few years ago)."
> +
> +[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u
> +    A patch to remove the ahead-behind calculation from 'status'.


end of thread, other threads:[~2018-04-15 22:48 UTC | newest]

Thread overview: 110+ messages
2018-02-27  2:32 [PATCH v5 00/13] Serialized Git Commit Graph Derrick Stolee
2018-02-27  2:32 ` [PATCH v5 01/13] commit-graph: add format document Derrick Stolee
2018-02-27  2:32 ` [PATCH v5 02/13] graph: add commit graph design document Derrick Stolee
2018-02-27  2:32 ` [PATCH v5 03/13] commit-graph: create git-commit-graph builtin Derrick Stolee
2018-02-27  2:32 ` [PATCH v5 04/13] csum-file: add CSUM_KEEP_OPEN flag Derrick Stolee
2018-03-12 13:55   ` Derrick Stolee
2018-03-13 21:42     ` Junio C Hamano
2018-03-14  2:26       ` Derrick Stolee
2018-03-14 17:00         ` Junio C Hamano
2018-02-27  2:32 ` [PATCH v5 05/13] commit-graph: implement write_commit_graph() Derrick Stolee
2018-02-27  2:33 ` [PATCH v5 06/13] commit-graph: implement 'git-commit-graph write' Derrick Stolee
2018-02-27  2:33 ` [PATCH v5 07/13] commit-graph: implement git commit-graph read Derrick Stolee
2018-02-27  2:33 ` [PATCH v5 08/13] commit-graph: add core.commitGraph setting Derrick Stolee
2018-02-27  2:33 ` [PATCH v5 09/13] commit-graph: close under reachability Derrick Stolee
2018-02-27  2:33 ` [PATCH v5 10/13] commit: integrate commit graph with commit parsing Derrick Stolee
2018-02-27  2:33 ` [PATCH v5 11/13] commit-graph: read only from specific pack-indexes Derrick Stolee
2018-02-27 20:15   ` Stefan Beller
2018-02-27  2:33 ` [PATCH v5 12/13] commit-graph: build graph from starting commits Derrick Stolee
2018-02-27  2:33 ` [PATCH v5 13/13] commit-graph: implement "--additive" option Derrick Stolee
2018-02-27 18:50 ` [PATCH v5 00/13] Serialized Git Commit Graph Stefan Beller
2018-03-14 19:27 ` [PATCH v6 00/14] " Derrick Stolee
2018-03-14 19:27   ` [PATCH v6 01/14] csum-file: rename hashclose() to finalize_hashfile() Derrick Stolee
2018-03-14 19:27   ` [PATCH v6 02/14] csum-file: refactor finalize_hashfile() method Derrick Stolee
2018-03-14 19:27   ` [PATCH v6 03/14] commit-graph: add format document Derrick Stolee
2018-03-14 19:27   ` [PATCH v6 04/14] graph: add commit graph design document Derrick Stolee
2018-03-14 19:27   ` [PATCH v6 05/14] commit-graph: create git-commit-graph builtin Derrick Stolee
2018-03-14 19:27   ` [PATCH v6 06/14] commit-graph: implement write_commit_graph() Derrick Stolee
2018-03-14 19:27   ` [PATCH v6 07/14] commit-graph: implement 'git-commit-graph write' Derrick Stolee
2018-03-18 13:25     ` Ævar Arnfjörð Bjarmason
2018-03-19 13:12       ` Derrick Stolee
2018-03-19 14:36         ` Ævar Arnfjörð Bjarmason
2018-03-19 18:27           ` Derrick Stolee
2018-03-19 18:48             ` Ævar Arnfjörð Bjarmason
2018-03-14 19:27   ` [PATCH v6 08/14] commit-graph: implement git commit-graph read Derrick Stolee
2018-03-14 19:27   ` [PATCH v6 09/14] commit-graph: add core.commitGraph setting Derrick Stolee
2018-03-14 19:27   ` [PATCH v6 10/14] commit-graph: close under reachability Derrick Stolee
2018-03-14 19:27   ` [PATCH v6 11/14] commit: integrate commit graph with commit parsing Derrick Stolee
2018-03-14 19:27   ` [PATCH v6 12/14] commit-graph: read only from specific pack-indexes Derrick Stolee
2018-03-15 22:50     ` SZEDER Gábor
2018-03-19 13:13       ` Derrick Stolee
2018-03-14 19:27   ` [PATCH v6 13/14] commit-graph: build graph from starting commits Derrick Stolee
2018-03-14 19:27   ` [PATCH v6 14/14] commit-graph: implement "--additive" option Derrick Stolee
2018-03-14 20:10   ` [PATCH v6 00/14] Serialized Git Commit Graph Ramsay Jones
2018-03-14 20:43   ` Junio C Hamano
2018-03-15 17:23     ` Johannes Schindelin
2018-03-15 18:41       ` Junio C Hamano
2018-03-15 21:51         ` Ramsay Jones
2018-03-16 11:50         ` Johannes Schindelin
2018-03-16 17:27           ` Junio C Hamano
2018-03-19 11:41             ` Johannes Schindelin
2018-03-16 16:28     ` Lars Schneider
2018-03-19 13:10       ` Derrick Stolee
2018-03-16 15:06   ` Ævar Arnfjörð Bjarmason
2018-03-16 16:38     ` SZEDER Gábor
2018-03-16 18:33       ` Junio C Hamano
2018-03-16 19:48         ` SZEDER Gábor
2018-03-16 20:06           ` Jeff King
2018-03-16 20:19             ` Jeff King
2018-03-19 12:55               ` Derrick Stolee
2018-03-20  1:17                 ` Derrick Stolee
2018-03-16 20:49         ` Jeff King
2018-04-02 20:34   ` [PATCH v7 " Derrick Stolee
2018-04-02 20:34     ` [PATCH v7 01/14] csum-file: rename hashclose() to finalize_hashfile() Derrick Stolee
2018-04-02 20:34     ` [PATCH v7 02/14] csum-file: refactor finalize_hashfile() method Derrick Stolee
2018-04-07 22:59       ` Jakub Narebski
2018-04-02 20:34     ` [PATCH v7 03/14] commit-graph: add format document Derrick Stolee
2018-04-07 23:49       ` Jakub Narebski
2018-04-02 20:34     ` [PATCH v7 04/14] graph: add commit graph design document Derrick Stolee
2018-04-08 11:06       ` Jakub Narebski
2018-04-02 20:34     ` [PATCH v7 05/14] commit-graph: create git-commit-graph builtin Derrick Stolee
2018-04-02 20:34     ` [PATCH v7 06/14] commit-graph: implement write_commit_graph() Derrick Stolee
2018-04-02 20:34     ` [PATCH v7 07/14] commit-graph: implement git-commit-graph write Derrick Stolee
2018-04-08 11:59       ` Jakub Narebski
2018-04-02 20:34     ` [PATCH v7 08/14] commit-graph: implement git commit-graph read Derrick Stolee
2018-04-02 21:33       ` Junio C Hamano
2018-04-03 11:49         ` Derrick Stolee
2018-04-08 12:59       ` Jakub Narebski
2018-04-02 20:34     ` [PATCH v7 09/14] commit-graph: add core.commitGraph setting Derrick Stolee
2018-04-08 13:39       ` Jakub Narebski
2018-04-02 20:34     ` [PATCH v7 10/14] commit-graph: close under reachability Derrick Stolee
2018-04-02 20:34     ` [PATCH v7 11/14] commit: integrate commit graph with commit parsing Derrick Stolee
2018-04-02 20:34     ` [PATCH v7 12/14] commit-graph: read only from specific pack-indexes Derrick Stolee
2018-04-02 20:34     ` [PATCH v7 13/14] commit-graph: build graph from starting commits Derrick Stolee
2018-04-08 13:50       ` Jakub Narebski
2018-04-02 20:34     ` [PATCH v7 14/14] commit-graph: implement "--additive" option Derrick Stolee
2018-04-05  8:27       ` SZEDER Gábor
2018-04-10 12:55     ` [PATCH v8 00/14] Serialized Git Commit Graph Derrick Stolee
2018-04-10 12:55       ` [PATCH v8 01/14] csum-file: rename hashclose() to finalize_hashfile() Derrick Stolee
2018-04-10 12:55       ` [PATCH v8 02/14] csum-file: refactor finalize_hashfile() method Derrick Stolee
2018-04-10 12:55       ` [PATCH v8 03/14] commit-graph: add format document Derrick Stolee
2018-04-10 19:10         ` Stefan Beller
2018-04-10 19:18           ` Derrick Stolee
2018-04-11 20:58         ` Jakub Narebski
2018-04-12 11:28           ` Derrick Stolee
2018-04-13 22:07             ` Jakub Narebski
2018-04-10 12:55       ` [PATCH v8 04/14] graph: add commit graph design document Derrick Stolee
2018-04-15 22:48         ` Jakub Narebski
2018-04-10 12:55       ` [PATCH v8 05/14] commit-graph: create git-commit-graph builtin Derrick Stolee
2018-04-10 12:56       ` [PATCH v8 06/14] commit-graph: implement write_commit_graph() Derrick Stolee
2018-04-10 12:56       ` [PATCH v8 07/14] commit-graph: implement git-commit-graph write Derrick Stolee
2018-04-10 12:56       ` [PATCH v8 08/14] commit-graph: implement git commit-graph read Derrick Stolee
2018-04-14 22:15         ` Jakub Narebski
2018-04-15  3:26           ` Eric Sunshine
2018-04-10 12:56       ` [PATCH v8 09/14] commit-graph: add core.commitGraph setting Derrick Stolee
2018-04-14 18:33         ` Jakub Narebski
2018-04-10 12:56       ` [PATCH v8 10/14] commit-graph: close under reachability Derrick Stolee
2018-04-10 12:56       ` [PATCH v8 11/14] commit: integrate commit graph with commit parsing Derrick Stolee
2018-04-10 12:56       ` [PATCH v8 12/14] commit-graph: read only from specific pack-indexes Derrick Stolee
2018-04-10 12:56       ` [PATCH v8 13/14] commit-graph: build graph from starting commits Derrick Stolee
2018-04-10 12:56       ` [PATCH v8 14/14] commit-graph: implement "--append" option Derrick Stolee
