git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 00/14] Serialized Commit Graph
@ 2018-01-25 14:02 Derrick Stolee
  2018-01-25 14:02 ` [PATCH 01/14] graph: add packed graph design document Derrick Stolee
                   ` (14 more replies)
  0 siblings, 15 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-25 14:02 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, git, sbeller, dstolee

As promised [1], this patch contains a way to serialize the commit graph.
The current implementation defines a new file format to store the graph
structure (parent relationships) and basic commit metadata (commit date,
root tree OID) in order to prevent parsing raw commits while performing
basic graph walks. For example, we do not need to parse the full commit
when performing these walks:

* 'git log --topo-order -1000' walks all reachable commits to avoid
  incorrect topological orders, but only needs the commit message for
  the top 1000 commits.

* 'git merge-base <A> <B>' may walk many commits to find the correct
  boundary between the commits reachable from A and those reachable
  from B. No commit messages are needed.

* 'git branch -vv' checks ahead/behind status for all local branches
  compared to their upstream remote branches. This is essentially as
  hard as computing merge bases for each.

The current patch speeds up these calculations by injecting a check in
parse_commit_gently() to check if there is a graph file and using that
to provide the required metadata to the struct commit.

The file format has room to store generation numbers, which will be
provided as a patch after this framework is merged. Generation numbers
are referenced by the design document but not implemented in order to
make the current patch focus on the graph construction process. Once
that is stable, it will be easier to add generation numbers and make
graph walks aware of generation numbers one-by-one.

Here are some performance results for a copy of the Linux repository
where 'master' has 704,766 reachable commits and is behind 'origin/master'
by 19,610 commits.

| Command                          | Before | After  | Rel % |
|----------------------------------|--------|--------|-------|
| log --oneline --topo-order -1000 |  5.9s  |  0.7s  | -88%  |
| branch -vv                       |  0.42s |  0.27s | -35%  |
| rev-list --all                   |  6.4s  |  1.0s  | -84%  |
| rev-list --all --objects         | 32.6s  | 27.6s  | -15%  |

To test this yourself, run the following on your repo:

	git config core.graph true
	git show-ref -s | git graph --write --update-head

The second command writes a commit graph file containing every commit
reachable from your refs. Now, all git commands that walk commits will
check your graph first before consulting the ODB. You can run your own
performance comparisions by toggling the 'core.graph' setting.

[1] https://public-inbox.org/git/d154319e-bb9e-b300-7c37-27b1dcd2a2ce@jeffhostetler.com/
    Re: What's cooking in git.git (Jan 2018, #03; Tue, 23)

[2] https://github.com/derrickstolee/git/pull/2
    A GitHub pull request containing the latest version of this patch.

P.S. I'm sending this patch from my gmail address to avoid Outlook
munging the URLs included in the design document.

Derrick Stolee (14):
  graph: add packed graph design document
  packed-graph: add core.graph setting
  packed-graph: create git-graph builtin
  packed-graph: add format document
  packed-graph: implement construct_graph()
  packed-graph: implement git-graph --write
  packed-graph: implement git-graph --read
  graph: implement git-graph --update-head
  packed-graph: implement git-graph --clear
  packed-graph: teach git-graph --delete-expired
  commit: integrate packed graph with commit parsing
  packed-graph: read only from specific pack-indexes
  packed-graph: close under reachability
  packed-graph: teach git-graph to read commits

 Documentation/config.txt                 |   3 +
 Documentation/git-graph.txt              | 102 ++++
 Documentation/technical/graph-format.txt |  88 ++++
 Documentation/technical/packed-graph.txt | 185 +++++++
 Makefile                                 |   2 +
 alloc.c                                  |   1 +
 builtin.h                                |   1 +
 builtin/graph.c                          | 231 +++++++++
 cache.h                                  |   1 +
 command-list.txt                         |   1 +
 commit.c                                 |  20 +-
 commit.h                                 |   2 +
 config.c                                 |   5 +
 environment.c                            |   1 +
 git.c                                    |   1 +
 log-tree.c                               |   3 +-
 packed-graph.c                           | 840 +++++++++++++++++++++++++++++++
 packed-graph.h                           |  65 +++
 packfile.c                               |   4 +-
 packfile.h                               |   2 +
 t/t5319-graph.sh                         | 271 ++++++++++
 21 files changed, 1822 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/git-graph.txt
 create mode 100644 Documentation/technical/graph-format.txt
 create mode 100644 Documentation/technical/packed-graph.txt
 create mode 100644 builtin/graph.c
 create mode 100644 packed-graph.c
 create mode 100644 packed-graph.h
 create mode 100755 t/t5319-graph.sh

-- 
2.16.0


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 01/14] graph: add packed graph design document
  2018-01-25 14:02 [PATCH 00/14] Serialized Commit Graph Derrick Stolee
@ 2018-01-25 14:02 ` Derrick Stolee
  2018-01-25 20:04   ` Stefan Beller
                     ` (2 more replies)
  2018-01-25 14:02 ` [PATCH 02/14] packed-graph: add core.graph setting Derrick Stolee
                   ` (13 subsequent siblings)
  14 siblings, 3 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-25 14:02 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, git, sbeller, dstolee

Add Documentation/technical/packed-graph.txt with details of the planned
packed graph feature, including future plans.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/packed-graph.txt | 185 +++++++++++++++++++++++++++++++
 1 file changed, 185 insertions(+)
 create mode 100644 Documentation/technical/packed-graph.txt

diff --git a/Documentation/technical/packed-graph.txt b/Documentation/technical/packed-graph.txt
new file mode 100644
index 0000000000..fcc0c83874
--- /dev/null
+++ b/Documentation/technical/packed-graph.txt
@@ -0,0 +1,185 @@
+Git Packed Graph Design Notes
+=============================
+
+Git walks the commit graph for many reasons, including:
+
+1. Listing and filtering commit history.
+2. Computing merge bases.
+
+These operations can become slow as the commit count grows above 100K.
+The merge base calculation shows up in many user-facing commands, such
+as 'status' and 'fetch' and can take minutes to compute depending on
+data shape. There are two main costs here:
+
+1. Decompressing and parsing commits.
+2. Walking the entire graph to avoid topological order mistakes.
+
+The packed graph is a file that stores the commit graph structure along
+with some extra metadata to speed up graph walks. This format allows a
+consumer to load the following info for a commit:
+
+1. The commit OID.
+2. The list of parents.
+3. The commit date.
+4. The root tree OID.
+5. An integer ID for fast lookups in the graph.
+6. The generation number (see definition below).
+
+Values 1-4 satisfy the requirements of parse_commit_gently().
+
+By providing an integer ID we can avoid lookups in the graph as we walk
+commits. Specifically, we need to provide the integer ID of the parent
+commits so we navigate directly to their information on request.
+
+Define the "generation number" of a commit recursively as follows:
+ * A commit with no parents (a root commit) has generation number 1.
+ * A commit with at least one parent has generation number 1 more than
+   the largest generation number among its parents.
+Equivalently, the generation number is one more than the length of a
+longest path from the commit to a root commit. The recursive definition
+is easier to use for computation and the following property:
+
+    If A and B are commits with generation numbers N and M, respectively,
+    and N <= M, then A cannot reach B. That is, we know without searching
+    that B is not an ancestor of A because it is further from a root commit
+    than A.
+
+    Conversely, when checking if A is an ancestor of B, then we only need
+    to walk commits until all commits on the walk boundary have generation
+    number at most N. If we walk commits using a priority queue seeded by
+    generation numbers, then we always expand the boundary commit with highest
+    generation number and can easily detect the stopping condition.
+
+This property can be used to significantly reduce the time it takes to
+walk commits and determine topological relationships. Without generation
+numbers, the general heuristic is the following:
+
+    If A and B are commits with commit time X and Y, respectively, and
+    X < Y, then A _probably_ cannot reach B.
+
+This heuristic is currently used whenever the computation can make
+mistakes with topological orders (such as "git log" with default order),
+but is not used when the topological order is required (such as merge
+base calculations, "git log --graph").
+
+Design Details
+--------------
+
+- A graph file is stored in a file named 'graph-<oid>.graph' in the pack
+  directory. This could be stored in an alternate.
+
+- The most-recent graph file OID is stored in a 'graph-head' file for
+  immediate access and storing backup graphs. This could be stored in an
+  alternate, and refers to a 'graph-<oid>.graph' file in the same pack
+  directory.
+
+- The core.graph config setting must be on to create or consume graph files.
+
+- The graph file is only a supplemental structure. If a user downgrades
+  or disables the 'core.graph' config setting, then the existing ODB is
+  sufficient.
+
+- The file format includes parameters for the object id length
+  and hash algorithm, so a future change of hash algorithm does
+  not require a change in format.
+
+Current Limitations
+-------------------
+
+- Only one graph file is used at one time. This allows the integer ID to
+  seek into the single graph file. It is possible to extend the model
+  for multiple graph files, but that is currently not part of the design.
+
+- .graph files are managed only by the 'graph' builtin. These are not
+  updated automatically during clone or fetch.
+
+- There is no '--verify' option for the 'graph' builtin to verify the
+  contents of the graph file.
+
+- The graph only considers commits existing in packfiles and does not
+  walk to fill in reachable commits. [Small]
+
+- When rewriting the graph, we do not check for a commit still existing
+  in the ODB, so garbage collection may remove commits
+
+- Generation numbers are not computed in the current version. The file
+  format supports storing them, along with a mechanism to upgrade from
+  a file without generation numbers to one that uses them.
+
+Future Work
+-----------
+
+- The file format includes room for precomputed generation numbers. These
+  are not currently computed, so all generation numbers will be marked as
+  0 (or "uncomputed"). A later patch will include this calculation.
+
+- The current implementation of the 'graph' builtin walks all packed objects
+  to find a complete list of commits in packfiles. If some commits are
+  stored as loose objects, then these do not appear in the graph. This is
+  handled gracefully by the file format, but it would cause incorrect
+  generation number calculations. We should implement the construct_graph()
+  method in a way that walks all commits reachable from some starting set
+  and then can use complete information for generation numbers. (Some
+  care must be taken around shallow clones.)
+
+- The graph is not currently integrated with grafts.
+
+- After computing and storing generation numbers, we must make graph
+  walks aware of generation numbers to gain performance benefits. This
+  will mostly be accomplished by swapping a commit-date-ordered priority
+  queue with one ordered by generation number. The following operations
+  are important candidates:
+
+    - paint_down_to_common()
+    - 'log --topo-order'
+
+- The graph currently only adds commits to a previously existing graph.
+  When writing a new graph, we could check that the ODB still contains
+  the commits and choose to remove the commits that are deleted from the
+  ODB. For performance reasons, this check should remain optional.
+
+- Currently, parse_commit_gently() requires filling in the root tree
+  object for a commit. This passes through lookup_tree() and consequently
+  lookup_object(). Also, it calls lookup_commit() when loading the parents.
+  These method calls check the ODB for object existence, even if the
+  consumer does not need the content. For example, we do not need the
+  tree contents when computing merge bases. Now that commit parsing is
+  removed from the computation time, these lookup operations are the
+  slowest operations keeping graph walks from being fast. Consider
+  loading these objects without verifying their existence in the ODB and
+  only loading them fully when consumers need them. Consider a method
+  such as "ensure_tree_loaded(commit)" that fully loads a tree before
+  using commit->tree.
+
+- The current design uses the 'graph' builtin to generate the graph. When
+  this feature stabilizes enough to recommend to most users, we should
+  add automatic graph writes to common operations that create many commits.
+  For example, one coulde compute a graph on 'clone' and 'fetch' commands.
+
+Related Links
+-------------
+
+[0] https://bugs.chromium.org/p/git/issues/detail?id=8
+    Chromium work item for: Serialized Commit Graph
+
+[1] https://public-inbox.org/git/20110713070517.GC18566@sigill.intra.peff.net/
+    An abandoned patch that introduced generation numbers.
+
+[2] https://public-inbox.org/git/20170908033403.q7e6dj7benasrjes@sigill.intra.peff.net/
+    Discussion about generation numbers on commits and how they interact
+    with fsck.
+
+[3] https://public-inbox.org/git/20170907094718.b6kuzp2uhvkmwcso@sigill.intra.peff.net/t/#m7a2ea7b355aeda962e6b86404bcbadc648abfbba
+    More discussion about generation numbers and not storing them inside
+    commit objects. A valuable quote:
+
+    "I think we should be moving more in the direction of keeping
+     repo-local caches for optimizations. Reachability bitmaps have been
+     a big performance win. I think we should be doing the same with our
+     properties of commits. Not just generation numbers, but making it
+     cheap to access the graph structure without zlib-inflating whole
+     commit objects (i.e., packv4 or something like the "metapacks" I
+     proposed a few years ago)."
+
+[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u
+    A patch to remove the ahead-behind calculation from 'status'.
-- 
2.16.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 02/14] packed-graph: add core.graph setting
  2018-01-25 14:02 [PATCH 00/14] Serialized Commit Graph Derrick Stolee
  2018-01-25 14:02 ` [PATCH 01/14] graph: add packed graph design document Derrick Stolee
@ 2018-01-25 14:02 ` Derrick Stolee
  2018-01-25 20:17   ` Stefan Beller
  2018-01-25 21:43   ` Junio C Hamano
  2018-01-25 14:02 ` [PATCH 03/14] packed-graph: create git-graph builtin Derrick Stolee
                   ` (12 subsequent siblings)
  14 siblings, 2 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-25 14:02 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, git, sbeller, dstolee

The packed graph feature is controlled by the new core.graph config
setting. This defaults to 0, so the feature is opt-in.

The intention of core.graph is that a user can always stop checking
for or parsing packed graph files if core.graph=0.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/config.txt | 3 +++
 cache.h                  | 1 +
 config.c                 | 5 +++++
 environment.c            | 1 +
 4 files changed, 10 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0e25b2c92b..e7b98fa14f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -898,6 +898,9 @@ core.notesRef::
 This setting defaults to "refs/notes/commits", and it can be overridden by
 the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
 
+core.graph::
+	Enable git commit graph feature. Allows writing and reading from .graph files.
+
 core.sparseCheckout::
 	Enable "sparse checkout" feature. See section "Sparse checkout" in
 	linkgit:git-read-tree[1] for more information.
diff --git a/cache.h b/cache.h
index d8b975a571..655a81ac90 100644
--- a/cache.h
+++ b/cache.h
@@ -825,6 +825,7 @@ extern char *git_replace_ref_base;
 extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_apply_sparse_checkout;
+extern int core_graph;
 extern int precomposed_unicode;
 extern int protect_hfs;
 extern int protect_ntfs;
diff --git a/config.c b/config.c
index e617c2018d..fee90912d8 100644
--- a/config.c
+++ b/config.c
@@ -1223,6 +1223,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.graph")) {
+		core_graph = git_config_bool(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.sparsecheckout")) {
 		core_apply_sparse_checkout = git_config_bool(var, value);
 		return 0;
diff --git a/environment.c b/environment.c
index 63ac38a46f..0c56a3d869 100644
--- a/environment.c
+++ b/environment.c
@@ -61,6 +61,7 @@ enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
 char *notes_ref_name;
 int grafts_replace_parents = 1;
 int core_apply_sparse_checkout;
+int core_graph;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
-- 
2.16.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 03/14] packed-graph: create git-graph builtin
  2018-01-25 14:02 [PATCH 00/14] Serialized Commit Graph Derrick Stolee
  2018-01-25 14:02 ` [PATCH 01/14] graph: add packed graph design document Derrick Stolee
  2018-01-25 14:02 ` [PATCH 02/14] packed-graph: add core.graph setting Derrick Stolee
@ 2018-01-25 14:02 ` Derrick Stolee
  2018-01-25 21:45   ` Stefan Beller
  2018-01-25 23:01   ` Junio C Hamano
  2018-01-25 14:02 ` [PATCH 04/14] packed-graph: add format document Derrick Stolee
                   ` (11 subsequent siblings)
  14 siblings, 2 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-25 14:02 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, git, sbeller, dstolee

Teach Git the 'graph' builtin that will be used for writing and
reading packed graph files. The current implementation is mostly
empty, except for a check that the core.graph setting is enabled
and a '--pack-dir' option.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-graph.txt |  7 +++++++
 Makefile                    |  1 +
 builtin.h                   |  1 +
 builtin/graph.c             | 36 ++++++++++++++++++++++++++++++++++++
 command-list.txt            |  1 +
 git.c                       |  1 +
 6 files changed, 47 insertions(+)
 create mode 100644 Documentation/git-graph.txt
 create mode 100644 builtin/graph.c

diff --git a/Documentation/git-graph.txt b/Documentation/git-graph.txt
new file mode 100644
index 0000000000..de5a3c07e6
--- /dev/null
+++ b/Documentation/git-graph.txt
@@ -0,0 +1,7 @@
+git-graph(1)
+============
+
+NAME
+----
+git-graph - Write and verify Git commit graphs (.graph files)
+
diff --git a/Makefile b/Makefile
index 1a9b23b679..d8b0d0457a 100644
--- a/Makefile
+++ b/Makefile
@@ -965,6 +965,7 @@ BUILTIN_OBJS += builtin/for-each-ref.o
 BUILTIN_OBJS += builtin/fsck.o
 BUILTIN_OBJS += builtin/gc.o
 BUILTIN_OBJS += builtin/get-tar-commit-id.o
+BUILTIN_OBJS += builtin/graph.o
 BUILTIN_OBJS += builtin/grep.o
 BUILTIN_OBJS += builtin/hash-object.o
 BUILTIN_OBJS += builtin/help.o
diff --git a/builtin.h b/builtin.h
index 42378f3aa4..ae7e816908 100644
--- a/builtin.h
+++ b/builtin.h
@@ -168,6 +168,7 @@ extern int cmd_format_patch(int argc, const char **argv, const char *prefix);
 extern int cmd_fsck(int argc, const char **argv, const char *prefix);
 extern int cmd_gc(int argc, const char **argv, const char *prefix);
 extern int cmd_get_tar_commit_id(int argc, const char **argv, const char *prefix);
+extern int cmd_graph(int argc, const char **argv, const char *prefix);
 extern int cmd_grep(int argc, const char **argv, const char *prefix);
 extern int cmd_hash_object(int argc, const char **argv, const char *prefix);
 extern int cmd_help(int argc, const char **argv, const char *prefix);
diff --git a/builtin/graph.c b/builtin/graph.c
new file mode 100644
index 0000000000..a902dc8646
--- /dev/null
+++ b/builtin/graph.c
@@ -0,0 +1,36 @@
+#include "builtin.h"
+#include "cache.h"
+#include "config.h"
+#include "dir.h"
+#include "git-compat-util.h"
+#include "lockfile.h"
+#include "packfile.h"
+#include "parse-options.h"
+
+static char const * const builtin_graph_usage[] ={
+	N_("git graph [--pack-dir <packdir>]"),
+	NULL
+};
+
+static struct opts_graph {
+	const char *pack_dir;
+} opts;
+
+int cmd_graph(int argc, const char **argv, const char *prefix)
+{
+	static struct option builtin_graph_options[] = {
+		{ OPTION_STRING, 'p', "pack-dir", &opts.pack_dir,
+			N_("dir"),
+			N_("The pack directory to store the graph") },
+		OPT_END(),
+	};
+
+	if (!core_graph)
+		die("core.graph is false");
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(builtin_graph_usage, builtin_graph_options);
+
+	return 0;
+}
+
diff --git a/command-list.txt b/command-list.txt
index a1fad28fd8..d9c17cb9f8 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -61,6 +61,7 @@ git-format-patch                        mainporcelain
 git-fsck                                ancillaryinterrogators
 git-gc                                  mainporcelain
 git-get-tar-commit-id                   ancillaryinterrogators
+git-graph                               plumbingmanipulators
 git-grep                                mainporcelain           info
 git-gui                                 mainporcelain
 git-hash-object                         plumbingmanipulators
diff --git a/git.c b/git.c
index c870b9719c..29f8b6e7dd 100644
--- a/git.c
+++ b/git.c
@@ -408,6 +408,7 @@ static struct cmd_struct commands[] = {
 	{ "fsck-objects", cmd_fsck, RUN_SETUP },
 	{ "gc", cmd_gc, RUN_SETUP },
 	{ "get-tar-commit-id", cmd_get_tar_commit_id },
+	{ "graph", cmd_graph, RUN_SETUP_GENTLY },
 	{ "grep", cmd_grep, RUN_SETUP_GENTLY },
 	{ "hash-object", cmd_hash_object },
 	{ "help", cmd_help },
-- 
2.16.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 04/14] packed-graph: add format document
  2018-01-25 14:02 [PATCH 00/14] Serialized Commit Graph Derrick Stolee
                   ` (2 preceding siblings ...)
  2018-01-25 14:02 ` [PATCH 03/14] packed-graph: create git-graph builtin Derrick Stolee
@ 2018-01-25 14:02 ` Derrick Stolee
  2018-01-25 22:06   ` Junio C Hamano
  2018-01-25 22:07   ` Stefan Beller
  2018-01-25 14:02 ` [PATCH 05/14] packed-graph: implement construct_graph() Derrick Stolee
                   ` (10 subsequent siblings)
  14 siblings, 2 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-25 14:02 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, git, sbeller, dstolee

Add document specifying the binary format for packed graphs. This
format allows for:

* New versions.
* New hash functions and hash lengths.
* Optional extensions.

Basic header information is followed by a binary table of contents
into "chunks" that include:

* An ordered list of commit object IDs.
* A 256-entry fanout into that list of OIDs.
* A list of metadata for the commits.
* A list of "large edges" to enable octopus merges.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/technical/graph-format.txt | 88 ++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)
 create mode 100644 Documentation/technical/graph-format.txt

diff --git a/Documentation/technical/graph-format.txt b/Documentation/technical/graph-format.txt
new file mode 100644
index 0000000000..a15e1036d7
--- /dev/null
+++ b/Documentation/technical/graph-format.txt
@@ -0,0 +1,88 @@
+Git commit graph format
+=======================
+
+The Git commit graph stores a list of commit OIDs and some associated
+metadata, including:
+
+- The generation number of the commit. Commits with no parents have
+  generation number 1; commits with parents have generation number
+  one more than the maximum generation number of its parents. We
+  reserve zero as special, and can be used to mark a generation
+  number invalid or as "not computed".
+
+- The root tree OID.
+
+- The commit date.
+
+- The parents of the commit, stored using positional references within
+  the graph file.
+
+== graph-*.graph files have the following format:
+
+In order to allow extensions that add extra data to the graph, we organize
+the body into "chunks" and provide a binary lookup table at the beginning
+of the body. The header includes certain values, such as number of chunks,
+hash lengths and types.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+	4-byte signature:
+	    The signature is: {'C', 'G', 'P', 'H'}
+
+	1-byte version number:
+	    Currently, the only valid version is 1.
+
+	1-byte Object Id Version (1 = SHA-1)
+
+	1-byte Object Id Length (H)
+
+	1-byte number (C) of "chunks"
+
+CHUNK LOOKUP:
+
+	(C + 1) * 12 bytes listing the table of contents for the chunks:
+	    First 4 bytes describe chunk id. Value 0 is a terminating label.
+	    Other 8 bytes provide offset in current file for chunk to start.
+	    (Chunks are ordered contiguously in the file, so you can infer
+		the length using the next chunk position if necessary.)
+
+	The remaining data in the body is described one chunk at a time, and
+	these chunks may be given in any order. Chunks are required unless
+	otherwise specified.
+
+CHUNK DATA:
+
+	OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
+	    The ith entry, F[i], stores the number of OIDs with first
+	    byte at most i. Thus F[255] stores the total
+	    number of commits (N).
+
+	OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
+	    The OIDs for all commits in the graph.
+
+	Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
+	    * The first H bytes are for the OID of the root tree.
+	    * The next 8 bytes are for the int-ids of the first two parents of
+	      the ith commit. Stores value 0xffffffff if no parent in that position.
+	      If there are more than two parents, the second value has its most-
+	      significant bit on and the other bits store an offset into the Large
+	      Edge List chunk.
+	    * The next 8 bytes store the generation number of the commit and the
+	      commit time in seconds since EPOCH. The generation number uses the
+	      higher 30 bits of the first 4 bytes, while the commit time uses the
+	      32 bits of the second 4 bytes, along with the lowest 2 bits of the
+	      lowest byte, storing the 33rd and 34th bit of the commit time.
+
+	[Optional] Large Edge List (ID: {'E', 'D', 'G', 'E'})
+	    This list of 4-byte values store the second through nth parents for
+	    all octoput merges. The second parent value in the commit data is a
+	    negative number pointing into this list. Then iterate through this
+	    list starting at that position until reaching a value with the most-
+	    significant bit on. The other bits correspond to the int-id of the
+	    last parent.
+
+TRAILER:
+
+	H-byte HASH-checksum of all of the above.
-- 
2.16.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 05/14] packed-graph: implement construct_graph()
  2018-01-25 14:02 [PATCH 00/14] Serialized Commit Graph Derrick Stolee
                   ` (3 preceding siblings ...)
  2018-01-25 14:02 ` [PATCH 04/14] packed-graph: add format document Derrick Stolee
@ 2018-01-25 14:02 ` Derrick Stolee
  2018-01-25 23:21   ` Stefan Beller
  2018-01-26 20:55   ` Junio C Hamano
  2018-01-25 14:02 ` [PATCH 06/14] packed-graph: implement git-graph --write Derrick Stolee
                   ` (9 subsequent siblings)
  14 siblings, 2 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-25 14:02 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, git, sbeller, dstolee

Teach Git to write a packed graph file by checking all packed objects
to see if they are commits, then store the file in the given pack
directory.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Makefile       |   1 +
 packed-graph.c | 375 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 packed-graph.h |  20 +++
 3 files changed, 396 insertions(+)
 create mode 100644 packed-graph.c
 create mode 100644 packed-graph.h

diff --git a/Makefile b/Makefile
index d8b0d0457a..59439e13a1 100644
--- a/Makefile
+++ b/Makefile
@@ -841,6 +841,7 @@ LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
 LIB_OBJS += oidmap.o
 LIB_OBJS += oidset.o
+LIB_OBJS += packed-graph.o
 LIB_OBJS += packfile.o
 LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-bitmap-write.o
diff --git a/packed-graph.c b/packed-graph.c
new file mode 100644
index 0000000000..9be9811667
--- /dev/null
+++ b/packed-graph.c
@@ -0,0 +1,375 @@
+#include "cache.h"
+#include "config.h"
+#include "git-compat-util.h"
+#include "pack.h"
+#include "packfile.h"
+#include "commit.h"
+#include "object.h"
+#include "packed-graph.h"
+
+#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
+#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
+#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
+#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
+#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
+
+#define GRAPH_DATA_WIDTH 36
+
+#define GRAPH_VERSION_1 0x1
+#define GRAPH_VERSION GRAPH_VERSION_1
+
+#define GRAPH_OID_VERSION_SHA1 1
+#define GRAPH_OID_LEN_SHA1 20
+#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
+#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
+
+#define GRAPH_LARGE_EDGES_NEEDED 0x80000000
+#define GRAPH_PARENT_MISSING 0x7fffffff
+#define GRAPH_EDGE_LAST_MASK 0x7fffffff
+#define GRAPH_PARENT_NONE 0x70000000
+
+#define GRAPH_LAST_EDGE 0x80000000
+
+char* get_graph_filename_oid(const char *pack_dir,
+				  struct object_id *oid)
+{
+	size_t len;
+	struct strbuf head_path = STRBUF_INIT;
+	strbuf_addstr(&head_path, pack_dir);
+	strbuf_addstr(&head_path, "/graph-");
+	strbuf_addstr(&head_path, oid_to_hex(oid));
+	strbuf_addstr(&head_path, ".graph");
+
+	return strbuf_detach(&head_path, &len);
+}
+
+static void write_graph_chunk_fanout(
+	struct sha1file *f,
+	struct commit **commits, int nr_commits)
+{
+	uint32_t i, count = 0;
+	struct commit **list = commits;
+	struct commit **last = commits + nr_commits;
+
+	/*
+	* Write the first-level table (the list is sorted,
+	* but we use a 256-entry lookup to be able to avoid
+	* having to do eight extra binary search iterations).
+	*/
+	for (i = 0; i < 256; i++) {
+		uint32_t swap_count;
+
+		while (list < last) {
+			if ((*list)->object.oid.hash[0] != i)
+				break;
+			count++;
+			list++;
+		}
+
+		swap_count = htonl(count);
+		sha1write(f, &swap_count, 4);
+	}
+}
+
+static void write_graph_chunk_oids(
+	struct sha1file *f, int hash_len,
+	struct commit **commits, int nr_commits)
+{
+	struct commit **list = commits;
+	uint32_t i;
+	for (i = 0; i < nr_commits; i++) {
+		sha1write(f, (*list)->object.oid.hash, (int)hash_len);
+		list++;
+	}
+}
+
+static int commit_pos(struct commit **commits, int nr_commits, const struct object_id *oid, uint32_t *pos)
+{
+	uint32_t first = 0, last = nr_commits;
+
+	while (first < last) {
+		uint32_t mid = first + (last - first) / 2;
+		struct object_id *current;
+		int cmp;
+
+		current = &(commits[mid]->object.oid);
+		cmp = oidcmp(oid, current);
+		if (!cmp) {
+			*pos = mid;
+			return 1;
+		}
+		if (cmp > 0) {
+			first = mid + 1;
+			continue;
+		}
+		last = mid;
+	}
+
+	*pos = first;
+	return 0;
+}
+
+static void write_graph_chunk_data(
+	struct sha1file *f, int hash_len,
+	struct commit **commits, int nr_commits)
+{
+	struct commit **list = commits;
+	struct commit **last = commits + nr_commits;
+	uint32_t num_large_edges = 0;
+
+	while (list < last) {
+		struct commit_list *parent;
+		uint32_t intId, swapIntId;
+		uint32_t packedDate[2];
+
+		parse_commit(*list);
+		sha1write(f, (*list)->tree->object.oid.hash, hash_len);
+
+		parent = (*list)->parents;
+
+		if (!parent)
+			swapIntId = htonl(GRAPH_PARENT_NONE);
+		else if (commit_pos(commits, nr_commits, &(parent->item->object.oid), &intId))
+			swapIntId = htonl(intId);
+		else
+			swapIntId = htonl(GRAPH_PARENT_MISSING);
+
+		sha1write(f, &swapIntId, 4);
+
+		if (parent)
+			parent = parent->next;
+
+		if (!parent)
+			swapIntId = htonl(GRAPH_PARENT_NONE);
+		else if (parent->next)
+			swapIntId = htonl(GRAPH_LARGE_EDGES_NEEDED | num_large_edges);
+		else if (commit_pos(commits, nr_commits,  &(parent->item->object.oid), &intId))
+			swapIntId = htonl(intId);
+		else
+			swapIntId = htonl(GRAPH_PARENT_MISSING);
+
+		sha1write(f, &swapIntId, 4);
+
+		if (parent && parent->next) {
+			do {
+				num_large_edges++;
+				parent = parent->next;
+			} while (parent);
+		}
+
+		packedDate[0] = htonl((*list)->date >> 32);
+		packedDate[1] = htonl((*list)->date);
+		sha1write(f, packedDate, 8);
+
+		list++;
+	}
+}
+
+static void write_graph_chunk_large_edges(
+	struct sha1file *f,
+	struct commit **commits, int nr_commits)
+{
+	struct commit **list = commits;
+	struct commit **last = commits + nr_commits;
+	struct commit_list *parent;
+
+	while (list < last) {
+		int num_parents = 0;
+		for (parent = (*list)->parents; num_parents < 3 && parent; parent = parent->next)
+			num_parents++;
+
+		if (num_parents <= 2) {
+			list++;
+			continue;
+		}
+
+		for (parent = (*list)->parents; parent; parent = parent->next)
+		{
+			if (parent != (*list)->parents)
+			{
+				uint32_t intId, swapIntId;
+				uint32_t lastEdge = 0;
+
+				if (!parent->next)
+					lastEdge |= GRAPH_LAST_EDGE;
+
+				if (commit_pos(commits, nr_commits,  &(parent->item->object.oid), &intId))
+					swapIntId = htonl(intId | lastEdge);
+				else
+					swapIntId = htonl(GRAPH_PARENT_MISSING | lastEdge);
+
+				sha1write(f, &swapIntId, 4);
+			}
+		}
+
+		list++;
+	}
+}
+
+static int commit_compare(const void *_a, const void *_b)
+{
+	struct object_id *a = *(struct object_id **)_a;
+	struct object_id *b = *(struct object_id **)_b;
+	return oidcmp(a, b);
+}
+
+struct packed_commit_list {
+	struct commit **list;
+	int num;
+	int size;
+};
+
+struct packed_oid_list {
+	struct object_id **list;
+	int num;
+	int size;
+};
+
+static int if_packed_commit_add_to_list(const struct object_id *oid,
+					struct packed_git *pack,
+					uint32_t pos,
+					void *data)
+{
+	struct packed_oid_list *list = (struct packed_oid_list*)data;
+	enum object_type type;
+	unsigned long size;
+	void *inner_data;
+	off_t offset = nth_packed_object_offset(pack, pos);
+	inner_data = unpack_entry(pack, offset, &type, &size);
+
+	if (inner_data)
+		free(inner_data);
+
+	if (type != OBJ_COMMIT)
+		return 0;
+
+	ALLOC_GROW(list->list, list->num + 1, list->size);
+	list->list[list->num] = (struct object_id *)malloc(sizeof(struct object_id));
+	oidcpy(list->list[list->num], oid);
+	(list->num)++;
+
+	return 0;
+}
+
+struct object_id *construct_graph(const char *pack_dir)
+{
+	// Find a list of oids, adding the pointer to a list.
+	struct packed_oid_list oids;
+	struct packed_commit_list commits;
+	struct packed_graph_header hdr;
+	struct sha1file *f;
+	int i, count_distinct = 0;
+	struct strbuf tmp_file = STRBUF_INIT;
+	unsigned char final_hash[GIT_MAX_RAWSZ];
+	char *graph_name;
+	int fd;
+	uint32_t chunk_ids[5];
+	uint64_t chunk_offsets[5];
+	int num_long_edges;
+	struct object_id *f_oid;
+	char *fname;
+	struct commit_list *parent;
+
+	if (!core_graph)
+		return 0;
+
+	oids.num = 0;
+	oids.size = 1024;
+	ALLOC_ARRAY(oids.list, oids.size);
+	for_each_packed_object(if_packed_commit_add_to_list, &oids, 0);
+	QSORT(oids.list, oids.num, commit_compare);
+
+	count_distinct = 1;
+	for (i = 1; i < oids.num; i++)
+	{
+		if (oidcmp(oids.list[i-1], oids.list[i]))
+			count_distinct++;
+	}
+
+	commits.num = 0;
+	commits.size = count_distinct;
+	ALLOC_ARRAY(commits.list, commits.size);
+
+	num_long_edges = 0;
+	for (i = 0; i < oids.num; i++)
+	{
+		int num_parents = 0;
+		if (i > 0 && !oidcmp(oids.list[i-1], oids.list[i]))
+			continue;
+
+		commits.list[commits.num] = lookup_commit(oids.list[i]);
+		parse_commit(commits.list[commits.num]);
+
+		for (parent = commits.list[commits.num]->parents; parent; parent = parent->next) {
+			num_parents++;
+		}
+
+		if (num_parents > 2)
+			num_long_edges += num_parents - 1;
+
+		commits.num++;
+	}
+
+	// Create header (including chunk lengths?!?)
+	strbuf_addstr(&tmp_file, pack_dir);
+	strbuf_addstr(&tmp_file, "/tmp_graph_XXXXXX");
+
+	fd = git_mkstemp_mode(tmp_file.buf, 0444);
+	if (fd < 0)
+		die_errno("unable to create '%s'", tmp_file.buf);
+
+	graph_name = strbuf_detach(&tmp_file, NULL);
+	f = sha1fd(fd, graph_name);
+
+	/* fill header info */
+	hdr.graph_signature = htonl(GRAPH_SIGNATURE);
+	hdr.graph_version = GRAPH_VERSION;
+	hdr.hash_version = GRAPH_OID_VERSION;
+	hdr.hash_len = GRAPH_OID_LEN;
+	hdr.num_chunks = 4;
+
+	/* write header to file */
+	assert(sizeof(hdr) == 8);
+	sha1write(f, &hdr, sizeof(hdr));
+
+	chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
+	chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
+	chunk_ids[2] = GRAPH_CHUNKID_DATA;
+	chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES;
+	chunk_ids[4] = 0;
+
+	chunk_offsets[0] = sizeof(hdr) + 12 * 5; // Skip header and chunk list
+	chunk_offsets[1] = chunk_offsets[0] + 256 * 4; // fanout table size
+	chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.num; // lookup size
+	chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.num; // data size
+	chunk_offsets[4] = chunk_offsets[3] + 4 * num_long_edges;
+
+	for (i = 0; i <= hdr.num_chunks; i++) {
+		uint32_t chunk_write[3];
+
+		chunk_write[0] = htonl(chunk_ids[i]);
+		chunk_write[1] = htonl(chunk_offsets[i] >> 32);
+		chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff);
+		sha1write(f, chunk_write, 12);
+	}
+
+	write_graph_chunk_fanout(f, commits.list, commits.num);
+	write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.num);
+	write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.num);
+	write_graph_chunk_large_edges(f, commits.list, commits.num);
+
+	sha1close(f, final_hash, CSUM_CLOSE | CSUM_FSYNC);
+
+	f_oid = (struct object_id *)malloc(sizeof(struct object_id));
+	memcpy(f_oid->hash, final_hash, GIT_MAX_RAWSZ);
+	fname = get_graph_filename_oid(pack_dir, f_oid);
+
+	if (rename(graph_name, fname))
+		die("failed to rename %s to %s", graph_name, fname);
+
+	free(oids.list);
+	oids.size = 0;
+	oids.num = 0;
+
+	return f_oid;
+}
diff --git a/packed-graph.h b/packed-graph.h
new file mode 100644
index 0000000000..d4e10fb612
--- /dev/null
+++ b/packed-graph.h
@@ -0,0 +1,20 @@
+#ifndef PACKED_GRAPH_H
+#define PACKED_GRAPH_H
+
+#include "git-compat-util.h"
+#include "commit.h"
+
+extern char* get_graph_filename_oid(const char *pack_dir,
+				    struct object_id *oid);
+
+struct packed_graph_header {
+	uint32_t graph_signature;
+	unsigned char graph_version;
+	unsigned char hash_version;
+	unsigned char hash_len;
+	unsigned char num_chunks;
+};
+
+extern struct object_id *construct_graph(const char *pack_dir);
+
+#endif
-- 
2.16.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 06/14] packed-graph: implement git-graph --write
  2018-01-25 14:02 [PATCH 00/14] Serialized Commit Graph Derrick Stolee
                   ` (4 preceding siblings ...)
  2018-01-25 14:02 ` [PATCH 05/14] packed-graph: implement construct_graph() Derrick Stolee
@ 2018-01-25 14:02 ` Derrick Stolee
  2018-01-25 23:28   ` Stefan Beller
  2018-01-25 14:02 ` [PATCH 07/14] packed-graph: implement git-graph --read Derrick Stolee
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: Derrick Stolee @ 2018-01-25 14:02 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, git, sbeller, dstolee

Teach git-graph to write graph files. Create new test script to verify
this command succeeds without failure.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-graph.txt | 26 ++++++++++++++
 builtin/graph.c             | 37 ++++++++++++++++++--
 t/t5319-graph.sh            | 83 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 143 insertions(+), 3 deletions(-)
 create mode 100755 t/t5319-graph.sh

diff --git a/Documentation/git-graph.txt b/Documentation/git-graph.txt
index de5a3c07e6..be6bc38814 100644
--- a/Documentation/git-graph.txt
+++ b/Documentation/git-graph.txt
@@ -5,3 +5,29 @@ NAME
 ----
 git-graph - Write and verify Git commit graphs (.graph files)
 
+
+SYNOPSIS
+--------
+[verse]
+'git graph' --write <options> [--pack-dir <pack_dir>]
+
+EXAMPLES
+--------
+
+* Write a graph file for the packed commits in your local .git folder.
++
+------------------------------------------------
+$ git midx --write
+------------------------------------------------
+
+CONFIGURATION
+-------------
+
+core.graph::
+	The graph command will fail if core.graph is false.
+	Also, the written graph files will be ignored by other commands
+	unless core.graph is true.
+
+GIT
+---
+Part of the linkgit:git[1] suite
\ No newline at end of file
diff --git a/builtin/graph.c b/builtin/graph.c
index a902dc8646..09f5552338 100644
--- a/builtin/graph.c
+++ b/builtin/graph.c
@@ -6,31 +6,62 @@
 #include "lockfile.h"
 #include "packfile.h"
 #include "parse-options.h"
+#include "packed-graph.h"
 
 static char const * const builtin_graph_usage[] ={
 	N_("git graph [--pack-dir <packdir>]"),
+	N_("git graph --write [--pack-dir <packdir>]"),
 	NULL
 };
 
 static struct opts_graph {
 	const char *pack_dir;
+	int write;
 } opts;
 
+static int graph_write(void)
+{
+	struct object_id *graph_id = construct_graph(opts.pack_dir);
+
+	if (graph_id)
+		printf("%s\n", oid_to_hex(graph_id));
+
+	free(graph_id);
+	return 0;
+}
+
 int cmd_graph(int argc, const char **argv, const char *prefix)
 {
 	static struct option builtin_graph_options[] = {
 		{ OPTION_STRING, 'p', "pack-dir", &opts.pack_dir,
 			N_("dir"),
 			N_("The pack directory to store the graph") },
+		OPT_BOOL('w', "write", &opts.write,
+			N_("write graph file")),
 		OPT_END(),
 	};
 
-	if (!core_graph)
-		die("core.graph is false");
-
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(builtin_graph_usage, builtin_graph_options);
 
+	git_config(git_default_config, NULL);
+	if (!core_graph)
+		die("git-graph requires core.graph=true.");
+
+	argc = parse_options(argc, argv, prefix,
+			     builtin_graph_options,
+			     builtin_graph_usage, 0);
+
+	if (!opts.pack_dir) {
+		struct strbuf path = STRBUF_INIT;
+		strbuf_addstr(&path, get_object_directory());
+		strbuf_addstr(&path, "/pack");
+		opts.pack_dir = strbuf_detach(&path, NULL);
+	}
+
+	if (opts.write)
+		return graph_write();
+
 	return 0;
 }
 
diff --git a/t/t5319-graph.sh b/t/t5319-graph.sh
new file mode 100755
index 0000000000..52e979dfd3
--- /dev/null
+++ b/t/t5319-graph.sh
@@ -0,0 +1,83 @@
+#!/bin/sh
+
+test_description='packed graph'
+. ./test-lib.sh
+
+test_expect_success \
+    'setup full repo' \
+    'rm -rf .git &&
+     mkdir full &&
+     cd full &&
+     git init &&
+     git config core.graph true &&
+     git config pack.threads 1 &&
+     packdir=".git/objects/pack"'
+
+test_expect_success \
+    'write graph with no packs' \
+    'git graph --write --pack-dir .'
+
+test_expect_success \
+    'create commits and repack' \
+    'for i in $(test_seq 5)
+     do
+        echo $i >$i.txt &&
+        git add $i.txt &&
+        git commit -m "commit $i" &&
+        git branch commits/$i
+     done &&
+     git repack'
+
+test_expect_success \
+    'write graph' \
+    'graph1=$(git graph --write) &&
+     test_path_is_file ${packdir}/graph-${graph1}.graph'
+
+test_expect_success \
+    'Add more commits' \
+    'git reset --hard commits/3 &&
+     for i in $(test_seq 6 10)
+     do
+        echo $i >$i.txt &&
+        git add $i.txt &&
+        git commit -m "commit $i" &&
+        git branch commits/$i
+     done &&
+     git reset --hard commits/7 &&
+     for i in $(test_seq 11 15)
+     do
+        echo $i >$i.txt &&
+        git add $i.txt &&
+        git commit -m "commit $i" &&
+        git branch commits/$i
+     done &&
+     git reset --hard commits/7 &&
+     git merge commits/4 &&
+     git branch merge/1 &&
+     git reset --hard commits/8 &&
+     git merge commits/11 &&
+     git branch merge/2 &&
+     git reset --hard commits/9 &&
+     git merge commits/5 commits/13 &&
+     git repack'
+
+test_expect_success \
+    'write graph with merges' \
+    'graph2=$(git graph --write) &&
+     test_path_is_file ${packdir}/graph-${graph2}.graph'
+
+test_expect_success \
+    'setup bare repo' \
+    'cd .. &&
+     git clone --bare full bare &&
+     cd bare &&
+     git config core.graph true &&
+     git config pack.threads 1 &&
+     baredir="objects/pack"'
+
+test_expect_success \
+    'write graph in bare repo' \
+    'graphbare=$(git graph --write) &&
+     test_path_is_file ${baredir}/graph-${graphbare}.graph'
+
+test_done
-- 
2.16.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 07/14] packed-graph: implement git-graph --read
  2018-01-25 14:02 [PATCH 00/14] Serialized Commit Graph Derrick Stolee
                   ` (5 preceding siblings ...)
  2018-01-25 14:02 ` [PATCH 06/14] packed-graph: implement git-graph --write Derrick Stolee
@ 2018-01-25 14:02 ` Derrick Stolee
  2018-01-25 14:02 ` [PATCH 08/14] graph: implement git-graph --update-head Derrick Stolee
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-25 14:02 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, git, sbeller, dstolee

Teach git-graph to read packed graph files and summarize their contents.

Use the --read option to verify the contents of a graph file in the
graph tests.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-graph.txt |   7 +++
 builtin/graph.c             |  54 ++++++++++++++++
 packed-graph.c              | 147 +++++++++++++++++++++++++++++++++++++++++++-
 packed-graph.h              |  25 ++++++++
 t/t5319-graph.sh            |  50 +++++++++------
 5 files changed, 260 insertions(+), 23 deletions(-)

diff --git a/Documentation/git-graph.txt b/Documentation/git-graph.txt
index be6bc38814..0939c3f1be 100644
--- a/Documentation/git-graph.txt
+++ b/Documentation/git-graph.txt
@@ -10,6 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git graph' --write <options> [--pack-dir <pack_dir>]
+'git graph' --read <options> [--pack-dir <pack_dir>]
 
 EXAMPLES
 --------
@@ -20,6 +21,12 @@ EXAMPLES
 $ git midx --write
 ------------------------------------------------
 
+* Read basic information from a graph file.
++
+------------------------------------------------
+$ git midx --read --graph-id=<oid>
+------------------------------------------------
+
 CONFIGURATION
 -------------
 
diff --git a/builtin/graph.c b/builtin/graph.c
index 09f5552338..bc66722924 100644
--- a/builtin/graph.c
+++ b/builtin/graph.c
@@ -10,15 +10,58 @@
 
 static char const * const builtin_graph_usage[] ={
 	N_("git graph [--pack-dir <packdir>]"),
+	N_("git graph --read [--graph-id=<oid>]"),
 	N_("git graph --write [--pack-dir <packdir>]"),
 	NULL
 };
 
 static struct opts_graph {
 	const char *pack_dir;
+	int read;
+	const char *graph_id;
 	int write;
 } opts;
 
+static int graph_read(void)
+{
+	struct object_id graph_oid;
+	struct packed_graph *graph = 0;
+	const char *graph_file;
+
+	if (opts.graph_id && strlen(opts.graph_id) == GIT_MAX_HEXSZ)
+		get_oid_hex(opts.graph_id, &graph_oid);
+	else
+		die("no graph id specified");
+
+	graph_file = get_graph_filename_oid(opts.pack_dir, &graph_oid);
+	graph = load_packed_graph_one(graph_file, opts.pack_dir);
+
+	if (!graph)
+		die("graph file %s does not exist.\n", graph_file);
+
+	printf("header: %08x %02x %02x %02x %02x\n",
+		ntohl(graph->hdr->graph_signature),
+		graph->hdr->graph_version,
+		graph->hdr->hash_version,
+		graph->hdr->hash_len,
+		graph->hdr->num_chunks);
+	printf("num_commits: %u\n", graph->num_commits);
+	printf("chunks:");
+
+	if (graph->chunk_oid_fanout)
+		printf(" oid_fanout");
+	if (graph->chunk_oid_lookup)
+		printf(" oid_lookup");
+	if (graph->chunk_commit_data)
+		printf(" commit_metadata");
+	if (graph->chunk_large_edges)
+		printf(" large_edges");
+	printf("\n");
+
+	printf("pack_dir: %s\n", graph->pack_dir);
+	return 0;
+}
+
 static int graph_write(void)
 {
 	struct object_id *graph_id = construct_graph(opts.pack_dir);
@@ -36,8 +79,14 @@ int cmd_graph(int argc, const char **argv, const char *prefix)
 		{ OPTION_STRING, 'p', "pack-dir", &opts.pack_dir,
 			N_("dir"),
 			N_("The pack directory to store the graph") },
+		OPT_BOOL('r', "read", &opts.read,
+			N_("read graph file")),
 		OPT_BOOL('w', "write", &opts.write,
 			N_("write graph file")),
+		{ OPTION_STRING, 'M', "graph-id", &opts.graph_id,
+			N_("oid"),
+			N_("An OID for a specific graph file in the pack-dir."),
+			PARSE_OPT_OPTARG, NULL, (intptr_t) "" },
 		OPT_END(),
 	};
 
@@ -52,6 +101,9 @@ int cmd_graph(int argc, const char **argv, const char *prefix)
 			     builtin_graph_options,
 			     builtin_graph_usage, 0);
 
+	if (opts.write + opts.read > 1)
+		usage_with_options(builtin_graph_usage, builtin_graph_options);
+
 	if (!opts.pack_dir) {
 		struct strbuf path = STRBUF_INIT;
 		strbuf_addstr(&path, get_object_directory());
@@ -59,6 +111,8 @@ int cmd_graph(int argc, const char **argv, const char *prefix)
 		opts.pack_dir = strbuf_detach(&path, NULL);
 	}
 
+	if (opts.read)
+		return graph_read();
 	if (opts.write)
 		return graph_write();
 
diff --git a/packed-graph.c b/packed-graph.c
index 9be9811667..eaa656becb 100644
--- a/packed-graph.c
+++ b/packed-graph.c
@@ -30,6 +30,11 @@
 
 #define GRAPH_LAST_EDGE 0x80000000
 
+#define GRAPH_FANOUT_SIZE (4*256)
+#define GRAPH_CHUNKLOOKUP_SIZE (5 * 12)
+#define GRAPH_MIN_SIZE (GRAPH_CHUNKLOOKUP_SIZE + GRAPH_FANOUT_SIZE + \
+			GRAPH_OID_LEN + sizeof(struct packed_graph_header))
+
 char* get_graph_filename_oid(const char *pack_dir,
 				  struct object_id *oid)
 {
@@ -43,6 +48,142 @@ char* get_graph_filename_oid(const char *pack_dir,
 	return strbuf_detach(&head_path, &len);
 }
 
+static struct packed_graph *alloc_packed_graph(int extra)
+{
+	struct packed_graph *g = xmalloc(st_add(sizeof(*g), extra));
+	memset(g, 0, sizeof(*g));
+	g->graph_fd = -1;
+
+	return g;
+}
+
+int close_graph(struct packed_graph *g)
+{
+	if (g->graph_fd < 0)
+		return 0;
+
+	munmap((void *)g->data, g->data_len);
+	g->data = 0;
+
+	close(g->graph_fd);
+	g->graph_fd = -1;
+
+	return 1;
+}
+
+static void free_packed_graph(struct packed_graph **g)
+{
+	if (!g || !*g)
+		return;
+
+	close_graph(*g);
+
+	free(*g);
+	*g = NULL;
+}
+
+struct packed_graph *load_packed_graph_one(const char *graph_file, const char *pack_dir)
+{
+	void *graph_map;
+	const unsigned char *data;
+	struct packed_graph_header *hdr;
+	size_t graph_size;
+	struct stat st;
+	uint32_t i;
+	struct packed_graph *graph;
+	int fd = git_open(graph_file);
+	uint64_t last_chunk_offset;
+	uint32_t last_chunk_id;
+
+	if (fd < 0)
+		return 0;
+	if (fstat(fd, &st)) {
+		close(fd);
+		return 0;
+	}
+	graph_size = xsize_t(st.st_size);
+
+	if (graph_size < GRAPH_MIN_SIZE) {
+		close(fd);
+		die("graph file %s is too small", graph_file);
+	}
+	graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	data = (const unsigned char *)graph_map;
+
+	hdr = graph_map;
+	if (ntohl(hdr->graph_signature) != GRAPH_SIGNATURE) {
+		uint32_t signature = ntohl(hdr->graph_signature);
+		munmap(graph_map, graph_size);
+		close(fd);
+		die("graph signature %X does not match signature %X",
+			signature, GRAPH_SIGNATURE);
+	}
+	if (hdr->graph_version != GRAPH_VERSION) {
+		unsigned char version = hdr->graph_version;
+		munmap(graph_map, graph_size);
+		close(fd);
+		die("graph version %X does not match version %X",
+			version, GRAPH_VERSION);
+	}
+
+	graph = alloc_packed_graph(strlen(pack_dir) + 1);
+
+	graph->hdr = hdr;
+	graph->graph_fd = fd;
+	graph->data = graph_map;
+	graph->data_len = graph_size;
+
+	last_chunk_id = 0;
+	last_chunk_offset = (uint64_t)sizeof(*hdr);
+	for (i = 0; i < hdr->num_chunks; i++) {
+		uint32_t chunk_id = ntohl(*(uint32_t*)(data + sizeof(*hdr) + 12 * i));
+		uint64_t chunk_offset1 = ntohl(*(uint32_t*)(data + sizeof(*hdr) + 12 * i + 4));
+		uint32_t chunk_offset2 = ntohl(*(uint32_t*)(data + sizeof(*hdr) + 12 * i + 8));
+		uint64_t chunk_offset = (chunk_offset1 << 32) | chunk_offset2;
+
+		if (chunk_offset > graph_size - GIT_MAX_RAWSZ)
+			die("improper chunk offset %08x%08x", (uint32_t)(chunk_offset >> 32),
+			    (uint32_t)chunk_offset);
+
+		switch (chunk_id) {
+			case GRAPH_CHUNKID_OIDFANOUT:
+				graph->chunk_oid_fanout = data + chunk_offset;
+				break;
+
+			case GRAPH_CHUNKID_OIDLOOKUP:
+				graph->chunk_oid_lookup = data + chunk_offset;
+				break;
+
+			case GRAPH_CHUNKID_DATA:
+				graph->chunk_commit_data = data + chunk_offset;
+				break;
+
+			case GRAPH_CHUNKID_LARGEEDGES:
+				graph->chunk_large_edges = data + chunk_offset;
+				break;
+
+			case 0:
+				break;
+
+			default:
+				free_packed_graph(&graph);
+				die("unrecognized graph chunk id: %08x", chunk_id);
+		}
+
+		if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP)
+		{
+			graph->num_commits = (chunk_offset - last_chunk_offset)
+					     / hdr->hash_len;
+		}
+
+		last_chunk_id = chunk_id;
+		last_chunk_offset = chunk_offset;
+	}
+
+	strcpy(graph->pack_dir, pack_dir);
+	return graph;
+}
+
 static void write_graph_chunk_fanout(
 	struct sha1file *f,
 	struct commit **commits, int nr_commits)
@@ -338,8 +479,8 @@ struct object_id *construct_graph(const char *pack_dir)
 	chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES;
 	chunk_ids[4] = 0;
 
-	chunk_offsets[0] = sizeof(hdr) + 12 * 5; // Skip header and chunk list
-	chunk_offsets[1] = chunk_offsets[0] + 256 * 4; // fanout table size
+	chunk_offsets[0] = sizeof(hdr) + GRAPH_CHUNKLOOKUP_SIZE;
+	chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
 	chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.num; // lookup size
 	chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.num; // data size
 	chunk_offsets[4] = chunk_offsets[3] + 4 * num_long_edges;
@@ -361,7 +502,7 @@ struct object_id *construct_graph(const char *pack_dir)
 	sha1close(f, final_hash, CSUM_CLOSE | CSUM_FSYNC);
 
 	f_oid = (struct object_id *)malloc(sizeof(struct object_id));
-	memcpy(f_oid->hash, final_hash, GIT_MAX_RAWSZ);
+	hashcpy(f_oid->hash, final_hash);
 	fname = get_graph_filename_oid(pack_dir, f_oid);
 
 	if (rename(graph_name, fname))
diff --git a/packed-graph.h b/packed-graph.h
index d4e10fb612..1a7eaa2a46 100644
--- a/packed-graph.h
+++ b/packed-graph.h
@@ -15,6 +15,31 @@ struct packed_graph_header {
 	unsigned char num_chunks;
 };
 
+extern struct packed_graph {
+	int graph_fd;
+
+	const unsigned char *data;
+	size_t data_len;
+
+	const struct packed_graph_header *hdr;
+
+	struct object_id oid;
+
+	uint32_t num_commits;
+
+	const unsigned char *chunk_oid_fanout;
+	const unsigned char *chunk_oid_lookup;
+	const unsigned char *chunk_commit_data;
+	const unsigned char *chunk_large_edges;
+
+	/* something like ".git/objects/pack" */
+	char pack_dir[FLEX_ARRAY]; /* more */
+} *packed_graph;
+
+extern int close_graph(struct packed_graph *g);
+
+extern struct packed_graph *load_packed_graph_one(const char *graph_file, const char *pack_dir);
+
 extern struct object_id *construct_graph(const char *pack_dir);
 
 #endif
diff --git a/t/t5319-graph.sh b/t/t5319-graph.sh
index 52e979dfd3..4975f65dee 100755
--- a/t/t5319-graph.sh
+++ b/t/t5319-graph.sh
@@ -3,8 +3,7 @@
 test_description='packed graph'
 . ./test-lib.sh
 
-test_expect_success \
-    'setup full repo' \
+test_expect_success 'setup full repo' \
     'rm -rf .git &&
      mkdir full &&
      cd full &&
@@ -13,12 +12,10 @@ test_expect_success \
      git config pack.threads 1 &&
      packdir=".git/objects/pack"'
 
-test_expect_success \
-    'write graph with no packs' \
+test_expect_success 'write graph with no packs' \
     'git graph --write --pack-dir .'
 
-test_expect_success \
-    'create commits and repack' \
+test_expect_success 'create commits and repack' \
     'for i in $(test_seq 5)
      do
         echo $i >$i.txt &&
@@ -28,13 +25,23 @@ test_expect_success \
      done &&
      git repack'
 
-test_expect_success \
-    'write graph' \
+_graph_read_expect() {
+    cat >expect <<- EOF
+header: 43475048 01 01 14 04
+num_commits: $1
+chunks: oid_fanout oid_lookup commit_metadata large_edges
+pack_dir: $2
+EOF
+}
+
+test_expect_success 'write graph' \
     'graph1=$(git graph --write) &&
-     test_path_is_file ${packdir}/graph-${graph1}.graph'
+     test_path_is_file ${packdir}/graph-${graph1}.graph &&
+     git graph --read --graph-id=${graph1} >output &&
+     _graph_read_expect "5" "${packdir}" &&
+     cmp expect output'
 
-test_expect_success \
-    'Add more commits' \
+test_expect_success 'Add more commits' \
     'git reset --hard commits/3 &&
      for i in $(test_seq 6 10)
      do
@@ -61,23 +68,26 @@ test_expect_success \
      git merge commits/5 commits/13 &&
      git repack'
 
-test_expect_success \
-    'write graph with merges' \
+test_expect_success 'write graph with merges' \
     'graph2=$(git graph --write) &&
-     test_path_is_file ${packdir}/graph-${graph2}.graph'
+     test_path_is_file ${packdir}/graph-${graph2}.graph &&
+     git graph --read --graph-id=${graph2} >output &&
+     _graph_read_expect "18" "${packdir}" &&
+     cmp expect output'
 
-test_expect_success \
-    'setup bare repo' \
+test_expect_success 'setup bare repo' \
     'cd .. &&
      git clone --bare full bare &&
      cd bare &&
      git config core.graph true &&
      git config pack.threads 1 &&
-     baredir="objects/pack"'
+     baredir="./objects/pack"'
 
-test_expect_success \
-    'write graph in bare repo' \
+test_expect_success 'write graph in bare repo' \
     'graphbare=$(git graph --write) &&
-     test_path_is_file ${baredir}/graph-${graphbare}.graph'
+     test_path_is_file ${baredir}/graph-${graphbare}.graph &&
+     git graph --read --graph-id=${graphbare} >output &&
+     _graph_read_expect "18" "${baredir}" &&
+     cmp expect output'
 
 test_done
-- 
2.16.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 08/14] graph: implement git-graph --update-head
  2018-01-25 14:02 [PATCH 00/14] Serialized Commit Graph Derrick Stolee
                   ` (6 preceding siblings ...)
  2018-01-25 14:02 ` [PATCH 07/14] packed-graph: implement git-graph --read Derrick Stolee
@ 2018-01-25 14:02 ` Derrick Stolee
  2018-01-25 14:02 ` [PATCH 09/14] packed-graph: implement git-graph --clear Derrick Stolee
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-25 14:02 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, git, sbeller, dstolee

It is possible to have multiple packed graph files in a pack directory,
but only one is important at a time. Use a 'graph_head' file to point
to the important file. Teach git-graph to write 'graph_head' upon
writing a new packed graph file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-graph.txt | 38 ++++++++++++++++++++++++++++++++++++--
 builtin/graph.c             | 38 +++++++++++++++++++++++++++++++++++---
 packed-graph.c              | 25 +++++++++++++++++++++++++
 packed-graph.h              |  1 +
 t/t5319-graph.sh            | 12 ++++++++++--
 5 files changed, 107 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-graph.txt b/Documentation/git-graph.txt
index 0939c3f1be..ac20aa67a9 100644
--- a/Documentation/git-graph.txt
+++ b/Documentation/git-graph.txt
@@ -12,19 +12,53 @@ SYNOPSIS
 'git graph' --write <options> [--pack-dir <pack_dir>]
 'git graph' --read <options> [--pack-dir <pack_dir>]
 
+OPTIONS
+-------
+--pack-dir::
+	Use given directory for the location of packfiles, graph-head,
+	and graph files.
+
+--read::
+	Read a graph file given by the graph-head file and output basic
+	details about the graph file. (Cannot be combined with --write.)
+
+--graph-id::
+	When used with --read, consider the graph file graph-<oid>.graph.
+
+--write::
+	Write a new graph file to the pack directory. (Cannot be combined
+	with --read.)
+
+--update-head::
+	When used with --write, update the graph-head file to point to
+	the written graph file.
+
 EXAMPLES
 --------
 
+* Output the OID of the graph file pointed to by <dir>/graph-head.
++
+------------------------------------------------
+$ git graph --pack-dir=<dir>
+------------------------------------------------
+
 * Write a graph file for the packed commits in your local .git folder.
 +
 ------------------------------------------------
-$ git midx --write
+$ git graph --write
+------------------------------------------------
+
+* Write a graph file for the packed commits in your local .git folder,
+* and update graph-head.
++
+------------------------------------------------
+$ git graph --write --update-head
 ------------------------------------------------
 
 * Read basic information from a graph file.
 +
 ------------------------------------------------
-$ git midx --read --graph-id=<oid>
+$ git graph --read --graph-id=<oid>
 ------------------------------------------------
 
 CONFIGURATION
diff --git a/builtin/graph.c b/builtin/graph.c
index bc66722924..0760d99f43 100644
--- a/builtin/graph.c
+++ b/builtin/graph.c
@@ -11,7 +11,7 @@
 static char const * const builtin_graph_usage[] ={
 	N_("git graph [--pack-dir <packdir>]"),
 	N_("git graph --read [--graph-id=<oid>]"),
-	N_("git graph --write [--pack-dir <packdir>]"),
+	N_("git graph --write [--pack-dir <packdir>] [--update-head]"),
 	NULL
 };
 
@@ -20,6 +20,9 @@ static struct opts_graph {
 	int read;
 	const char *graph_id;
 	int write;
+	int update_head;
+	int has_existing;
+	struct object_id old_graph_oid;
 } opts;
 
 static int graph_read(void)
@@ -30,8 +33,8 @@ static int graph_read(void)
 
 	if (opts.graph_id && strlen(opts.graph_id) == GIT_MAX_HEXSZ)
 		get_oid_hex(opts.graph_id, &graph_oid);
-	else
-		die("no graph id specified");
+	else if (!get_graph_head_oid(opts.pack_dir, &graph_oid))
+		die("no graph-head exists.");
 
 	graph_file = get_graph_filename_oid(opts.pack_dir, &graph_oid);
 	graph = load_packed_graph_one(graph_file, opts.pack_dir);
@@ -62,10 +65,33 @@ static int graph_read(void)
 	return 0;
 }
 
+static void update_head_file(const char *pack_dir, const struct object_id *graph_id)
+{
+	struct strbuf head_path = STRBUF_INIT;
+	int fd;
+	struct lock_file lk = LOCK_INIT;
+
+	strbuf_addstr(&head_path, pack_dir);
+	strbuf_addstr(&head_path, "/");
+	strbuf_addstr(&head_path, "graph-head");
+
+	fd = hold_lock_file_for_update(&lk, head_path.buf, LOCK_DIE_ON_ERROR);
+	strbuf_release(&head_path);
+
+	if (fd < 0)
+		die_errno("unable to open graph-head");
+
+	write_in_full(fd, oid_to_hex(graph_id), GIT_MAX_HEXSZ);
+	commit_lock_file(&lk);
+}
+
 static int graph_write(void)
 {
 	struct object_id *graph_id = construct_graph(opts.pack_dir);
 
+	if (opts.update_head)
+		update_head_file(opts.pack_dir, graph_id);
+
 	if (graph_id)
 		printf("%s\n", oid_to_hex(graph_id));
 
@@ -83,6 +109,8 @@ int cmd_graph(int argc, const char **argv, const char *prefix)
 			N_("read graph file")),
 		OPT_BOOL('w', "write", &opts.write,
 			N_("write graph file")),
+		OPT_BOOL('u', "update-head", &opts.update_head,
+			N_("update graph-head to written graph file")),
 		{ OPTION_STRING, 'M', "graph-id", &opts.graph_id,
 			N_("oid"),
 			N_("An OID for a specific graph file in the pack-dir."),
@@ -111,11 +139,15 @@ int cmd_graph(int argc, const char **argv, const char *prefix)
 		opts.pack_dir = strbuf_detach(&path, NULL);
 	}
 
+	opts.has_existing = !!get_graph_head_oid(opts.pack_dir, &opts.old_graph_oid);
+
 	if (opts.read)
 		return graph_read();
 	if (opts.write)
 		return graph_write();
 
+	if (opts.has_existing)
+		printf("%s\n", oid_to_hex(&opts.old_graph_oid));
 	return 0;
 }
 
diff --git a/packed-graph.c b/packed-graph.c
index eaa656becb..5723f163ae 100644
--- a/packed-graph.c
+++ b/packed-graph.c
@@ -35,6 +35,31 @@
 #define GRAPH_MIN_SIZE (GRAPH_CHUNKLOOKUP_SIZE + GRAPH_FANOUT_SIZE + \
 			GRAPH_OID_LEN + sizeof(struct packed_graph_header))
 
+struct object_id *get_graph_head_oid(const char *pack_dir, struct object_id *oid)
+{
+	struct strbuf head_filename = STRBUF_INIT;
+	char oid_hex[GIT_MAX_HEXSZ + 1];
+	FILE *f;
+
+	strbuf_addstr(&head_filename, pack_dir);
+	strbuf_addstr(&head_filename, "/graph-head");
+
+	f = fopen(head_filename.buf, "r");
+	strbuf_release(&head_filename);
+
+	if (!f)
+		return 0;
+
+	if (!fgets(oid_hex, sizeof(oid_hex), f))
+		die("failed to read graph-head");
+
+	fclose(f);
+
+	if (get_oid_hex(oid_hex, oid))
+		return 0;
+	return oid;
+}
+
 char* get_graph_filename_oid(const char *pack_dir,
 				  struct object_id *oid)
 {
diff --git a/packed-graph.h b/packed-graph.h
index 1a7eaa2a46..ad561863c8 100644
--- a/packed-graph.h
+++ b/packed-graph.h
@@ -4,6 +4,7 @@
 #include "git-compat-util.h"
 #include "commit.h"
 
+extern struct object_id *get_graph_head_oid(const char *pack_dir, struct object_id *oid);
 extern char* get_graph_filename_oid(const char *pack_dir,
 				    struct object_id *oid);
 
diff --git a/t/t5319-graph.sh b/t/t5319-graph.sh
index 4975f65dee..3919a3ad73 100755
--- a/t/t5319-graph.sh
+++ b/t/t5319-graph.sh
@@ -13,7 +13,8 @@ test_expect_success 'setup full repo' \
      packdir=".git/objects/pack"'
 
 test_expect_success 'write graph with no packs' \
-    'git graph --write --pack-dir .'
+    'git graph --write --pack-dir . &&
+     test_path_is_missing graph-head'
 
 test_expect_success 'create commits and repack' \
     'for i in $(test_seq 5)
@@ -37,6 +38,7 @@ EOF
 test_expect_success 'write graph' \
     'graph1=$(git graph --write) &&
      test_path_is_file ${packdir}/graph-${graph1}.graph &&
+     test_path_is_missing ${packdir}/graph-head &&
      git graph --read --graph-id=${graph1} >output &&
      _graph_read_expect "5" "${packdir}" &&
      cmp expect output'
@@ -69,8 +71,11 @@ test_expect_success 'Add more commits' \
      git repack'
 
 test_expect_success 'write graph with merges' \
-    'graph2=$(git graph --write) &&
+    'graph2=$(git graph --write --update-head) &&
      test_path_is_file ${packdir}/graph-${graph2}.graph &&
+     test_path_is_file ${packdir}/graph-head &&
+     echo ${graph2} >expect &&
+     cmp -n 40 expect ${packdir}/graph-head &&
      git graph --read --graph-id=${graph2} >output &&
      _graph_read_expect "18" "${packdir}" &&
      cmp expect output'
@@ -86,6 +91,9 @@ test_expect_success 'setup bare repo' \
 test_expect_success 'write graph in bare repo' \
     'graphbare=$(git graph --write) &&
      test_path_is_file ${baredir}/graph-${graphbare}.graph &&
+     test_path_is_file ${baredir}/graph-head &&
+     echo ${graphbare} >expect &&
+     cmp -n 40 expect ${baredir}/graph-head &&
      git graph --read --graph-id=${graphbare} >output &&
      _graph_read_expect "18" "${baredir}" &&
      cmp expect output'
-- 
2.16.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 09/14] packed-graph: implement git-graph --clear
  2018-01-25 14:02 [PATCH 00/14] Serialized Commit Graph Derrick Stolee
                   ` (7 preceding siblings ...)
  2018-01-25 14:02 ` [PATCH 08/14] graph: implement git-graph --update-head Derrick Stolee
@ 2018-01-25 14:02 ` Derrick Stolee
  2018-01-25 23:35   ` Stefan Beller
  2018-01-25 14:02 ` [PATCH 10/14] packed-graph: teach git-graph --delete-expired Derrick Stolee
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: Derrick Stolee @ 2018-01-25 14:02 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, git, sbeller, dstolee

Teach Git to delete the current 'graph_head' file and the packed graph
it references. This is a good safety valve if somehow the file is
corrupted and needs to be recalculated. Since the packed graph is a
summary of contents already in the ODB, it can be regenerated.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-graph.txt | 16 ++++++++++++++--
 builtin/graph.c             | 31 ++++++++++++++++++++++++++++++-
 t/t5319-graph.sh            |  7 ++++++-
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-graph.txt b/Documentation/git-graph.txt
index ac20aa67a9..f690699570 100644
--- a/Documentation/git-graph.txt
+++ b/Documentation/git-graph.txt
@@ -11,6 +11,7 @@ SYNOPSIS
 [verse]
 'git graph' --write <options> [--pack-dir <pack_dir>]
 'git graph' --read <options> [--pack-dir <pack_dir>]
+'git graph' --clear [--pack-dir <pack_dir>]
 
 OPTIONS
 -------
@@ -18,16 +19,21 @@ OPTIONS
 	Use given directory for the location of packfiles, graph-head,
 	and graph files.
 
+--clear::
+	Delete the graph-head file and the graph file it references.
+	(Cannot be combined with --read or --write.)
+
 --read::
 	Read a graph file given by the graph-head file and output basic
-	details about the graph file. (Cannot be combined with --write.)
+	details about the graph file. (Cannot be combined with --clear
+	or --write.)
 
 --graph-id::
 	When used with --read, consider the graph file graph-<oid>.graph.
 
 --write::
 	Write a new graph file to the pack directory. (Cannot be combined
-	with --read.)
+	with --clear or --read.)
 
 --update-head::
 	When used with --write, update the graph-head file to point to
@@ -61,6 +67,12 @@ $ git graph --write --update-head
 $ git graph --read --graph-id=<oid>
 ------------------------------------------------
 
+* Delete <dir>/graph-head and the file it references.
++
+------------------------------------------------
+$ git graph --clear --pack-dir=<dir>
+------------------------------------------------
+
 CONFIGURATION
 -------------
 
diff --git a/builtin/graph.c b/builtin/graph.c
index 0760d99f43..ac15febc46 100644
--- a/builtin/graph.c
+++ b/builtin/graph.c
@@ -10,6 +10,7 @@
 
 static char const * const builtin_graph_usage[] ={
 	N_("git graph [--pack-dir <packdir>]"),
+	N_("git graph --clear [--pack-dir <packdir>]"),
 	N_("git graph --read [--graph-id=<oid>]"),
 	N_("git graph --write [--pack-dir <packdir>] [--update-head]"),
 	NULL
@@ -17,6 +18,7 @@ static char const * const builtin_graph_usage[] ={
 
 static struct opts_graph {
 	const char *pack_dir;
+	int clear;
 	int read;
 	const char *graph_id;
 	int write;
@@ -25,6 +27,29 @@ static struct opts_graph {
 	struct object_id old_graph_oid;
 } opts;
 
+static int graph_clear(void)
+{
+	struct strbuf head_path = STRBUF_INIT;
+	char *old_path;
+
+	if (!opts.has_existing)
+		return 0;
+
+	strbuf_addstr(&head_path, opts.pack_dir);
+	strbuf_addstr(&head_path, "/");
+	strbuf_addstr(&head_path, "graph-head");
+	if (remove_path(head_path.buf))
+		die("failed to remove path %s", head_path.buf);
+	strbuf_release(&head_path);
+
+	old_path = get_graph_filename_oid(opts.pack_dir, &opts.old_graph_oid);
+	if (remove_path(old_path))
+		die("failed to remove path %s", old_path);
+	free(old_path);
+
+	return 0;
+}
+
 static int graph_read(void)
 {
 	struct object_id graph_oid;
@@ -105,6 +130,8 @@ int cmd_graph(int argc, const char **argv, const char *prefix)
 		{ OPTION_STRING, 'p', "pack-dir", &opts.pack_dir,
 			N_("dir"),
 			N_("The pack directory to store the graph") },
+		OPT_BOOL('c', "clear", &opts.clear,
+			N_("clear graph file and graph-head")),
 		OPT_BOOL('r', "read", &opts.read,
 			N_("read graph file")),
 		OPT_BOOL('w', "write", &opts.write,
@@ -129,7 +156,7 @@ int cmd_graph(int argc, const char **argv, const char *prefix)
 			     builtin_graph_options,
 			     builtin_graph_usage, 0);
 
-	if (opts.write + opts.read > 1)
+	if (opts.write + opts.read + opts.clear > 1)
 		usage_with_options(builtin_graph_usage, builtin_graph_options);
 
 	if (!opts.pack_dir) {
@@ -141,6 +168,8 @@ int cmd_graph(int argc, const char **argv, const char *prefix)
 
 	opts.has_existing = !!get_graph_head_oid(opts.pack_dir, &opts.old_graph_oid);
 
+	if (opts.clear)
+		return graph_clear();
 	if (opts.read)
 		return graph_read();
 	if (opts.write)
diff --git a/t/t5319-graph.sh b/t/t5319-graph.sh
index 3919a3ad73..311fb9dd67 100755
--- a/t/t5319-graph.sh
+++ b/t/t5319-graph.sh
@@ -80,6 +80,11 @@ test_expect_success 'write graph with merges' \
      _graph_read_expect "18" "${packdir}" &&
      cmp expect output'
 
+test_expect_success 'clear graph' \
+    'git graph --clear &&
+     test_path_is_missing ${packdir}/graph-${graph2}.graph &&
+     test_path_is_missing ${packdir}/graph-head'
+
 test_expect_success 'setup bare repo' \
     'cd .. &&
      git clone --bare full bare &&
@@ -89,7 +94,7 @@ test_expect_success 'setup bare repo' \
      baredir="./objects/pack"'
 
 test_expect_success 'write graph in bare repo' \
-    'graphbare=$(git graph --write) &&
+    'graphbare=$(git graph --write --update-head) &&
      test_path_is_file ${baredir}/graph-${graphbare}.graph &&
      test_path_is_file ${baredir}/graph-head &&
      echo ${graphbare} >expect &&
-- 
2.16.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 10/14] packed-graph: teach git-graph --delete-expired
  2018-01-25 14:02 [PATCH 00/14] Serialized Commit Graph Derrick Stolee
                   ` (8 preceding siblings ...)
  2018-01-25 14:02 ` [PATCH 09/14] packed-graph: implement git-graph --clear Derrick Stolee
@ 2018-01-25 14:02 ` Derrick Stolee
  2018-01-25 14:02 ` [PATCH 11/14] commit: integrate packed graph with commit parsing Derrick Stolee
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-25 14:02 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, git, sbeller, dstolee

Teach git-graph to delete the graph previously referenced by 'graph_head'
when writing a new graph file and updating 'graph_head'. This prevents
data creep by storing a list of useless graphs. Be careful to not delete
the graph if the file did not change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-graph.txt |  8 ++++++--
 builtin/graph.c             | 14 +++++++++++++-
 t/t5319-graph.sh            | 37 +++++++++++++++++++++++++++++++++++--
 3 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-graph.txt b/Documentation/git-graph.txt
index f690699570..f4f1048d28 100644
--- a/Documentation/git-graph.txt
+++ b/Documentation/git-graph.txt
@@ -39,6 +39,10 @@ OPTIONS
 	When used with --write, update the graph-head file to point to
 	the written graph file.
 
+--delete-expired::
+	When used with --write and --update-head, delete the graph file
+	previously referenced by graph-head.
+
 EXAMPLES
 --------
 
@@ -55,10 +59,10 @@ $ git graph --write
 ------------------------------------------------
 
 * Write a graph file for the packed commits in your local .git folder,
-* and update graph-head.
+* update graph-head, and delete the old graph-<oid>.graph file.
 +
 ------------------------------------------------
-$ git graph --write --update-head
+$ git graph --write --update-head --delete-expired
 ------------------------------------------------
 
 * Read basic information from a graph file.
diff --git a/builtin/graph.c b/builtin/graph.c
index ac15febc46..adf779b601 100644
--- a/builtin/graph.c
+++ b/builtin/graph.c
@@ -12,7 +12,7 @@ static char const * const builtin_graph_usage[] ={
 	N_("git graph [--pack-dir <packdir>]"),
 	N_("git graph --clear [--pack-dir <packdir>]"),
 	N_("git graph --read [--graph-id=<oid>]"),
-	N_("git graph --write [--pack-dir <packdir>] [--update-head]"),
+	N_("git graph --write [--pack-dir <packdir>] [--update-head] [--delete-expired]"),
 	NULL
 };
 
@@ -23,6 +23,7 @@ static struct opts_graph {
 	const char *graph_id;
 	int write;
 	int update_head;
+	int delete_expired;
 	int has_existing;
 	struct object_id old_graph_oid;
 } opts;
@@ -120,6 +121,15 @@ static int graph_write(void)
 	if (graph_id)
 		printf("%s\n", oid_to_hex(graph_id));
 
+	if (opts.delete_expired && opts.update_head && opts.has_existing &&
+	    oidcmp(graph_id, &opts.old_graph_oid)) {
+		char *old_path = get_graph_filename_oid(opts.pack_dir, &opts.old_graph_oid);
+		if (remove_path(old_path))
+			die("failed to remove path %s", old_path);
+
+		free(old_path);
+	}
+
 	free(graph_id);
 	return 0;
 }
@@ -138,6 +148,8 @@ int cmd_graph(int argc, const char **argv, const char *prefix)
 			N_("write graph file")),
 		OPT_BOOL('u', "update-head", &opts.update_head,
 			N_("update graph-head to written graph file")),
+		OPT_BOOL('d', "delete-expired", &opts.delete_expired,
+			N_("delete expired head graph file")),
 		{ OPTION_STRING, 'M', "graph-id", &opts.graph_id,
 			N_("oid"),
 			N_("An OID for a specific graph file in the pack-dir."),
diff --git a/t/t5319-graph.sh b/t/t5319-graph.sh
index 311fb9dd67..a70c7bbb02 100755
--- a/t/t5319-graph.sh
+++ b/t/t5319-graph.sh
@@ -80,9 +80,42 @@ test_expect_success 'write graph with merges' \
      _graph_read_expect "18" "${packdir}" &&
      cmp expect output'
 
+test_expect_success 'Add more commits' \
+    'git reset --hard commits/3 &&
+     for i in $(test_seq 16 20)
+     do
+        git commit --allow-empty -m "commit $i" &&
+        git branch commits/$i
+     done &&
+     git repack'
+
+test_expect_success 'write graph with merges' \
+    'graph3=$(git graph --write --update-head --delete-expired) &&
+     test_path_is_file ${packdir}/graph-${graph3}.graph &&
+     test_path_is_missing ${packdir}/graph-${graph2}.graph &&
+     test_path_is_file ${packdir}/graph-${graph1}.graph &&
+     test_path_is_file ${packdir}/graph-head &&
+     echo ${graph3} >expect &&
+     cmp -n 40 expect ${packdir}/graph-head &&
+     git graph --read --graph-id=${graph3} >output &&
+     _graph_read_expect "23" "${packdir}" &&
+     cmp expect output'
+
+test_expect_success 'write graph with nothing new' \
+    'graph4=$(git graph --write --update-head --delete-expired) &&
+     test_path_is_file ${packdir}/graph-${graph4}.graph &&
+     test_path_is_file ${packdir}/graph-${graph1}.graph &&
+     test_path_is_file ${packdir}/graph-head &&
+     echo ${graph4} >expect &&
+     cmp -n 40 expect ${packdir}/graph-head &&
+     git graph --read --graph-id=${graph4} >output &&
+     _graph_read_expect "23" "${packdir}" &&
+     cmp expect output'
+
 test_expect_success 'clear graph' \
     'git graph --clear &&
-     test_path_is_missing ${packdir}/graph-${graph2}.graph &&
+     test_path_is_missing ${packdir}/graph-${graph3}.graph &&
+     test_path_is_file ${packdir}/graph-${graph1}.graph &&
      test_path_is_missing ${packdir}/graph-head'
 
 test_expect_success 'setup bare repo' \
@@ -100,7 +133,7 @@ test_expect_success 'write graph in bare repo' \
      echo ${graphbare} >expect &&
      cmp -n 40 expect ${baredir}/graph-head &&
      git graph --read --graph-id=${graphbare} >output &&
-     _graph_read_expect "18" "${baredir}" &&
+     _graph_read_expect "23" "${baredir}" &&
      cmp expect output'
 
 test_done
-- 
2.16.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 11/14] commit: integrate packed graph with commit parsing
  2018-01-25 14:02 [PATCH 00/14] Serialized Commit Graph Derrick Stolee
                   ` (9 preceding siblings ...)
  2018-01-25 14:02 ` [PATCH 10/14] packed-graph: teach git-graph --delete-expired Derrick Stolee
@ 2018-01-25 14:02 ` Derrick Stolee
  2018-01-26 19:38   ` Stefan Beller
  2018-01-25 14:02 ` [PATCH 12/14] packed-graph: read only from specific pack-indexes Derrick Stolee
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: Derrick Stolee @ 2018-01-25 14:02 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, git, sbeller, dstolee

Teach Git to inspect a packed graph to supply the contents of a
struct commit when calling parse_commit_gently(). This implementation
satisfies all post-conditions on the struct commit, including loading
parents, the root tree, and the commit date. The only loosely-expected
condition is that the commit buffer is loaded into the cache. This
was checked in log-tree.c:show_log(), but the "return;" on failure
produced unexpected results (i.e. the message line was never terminated).
The new behavior of loading the buffer when needed prevents the
unexpected behavior.

If core.graph is false, then do not load the graph and behave as usual.

In test script t5319-graph.sh, add output-matching conditions on read-
only graph operations.

By loading commits from the graph instead of parsing commit buffers, we
save a lot of time on long commits walks. Here are some performance
results for a copy of the Linux repository where 'master' has 704,766
reachable commits and is behind 'origin/master' by 19,610 commits.

| Command                          | Before | After  | Rel % |
|----------------------------------|--------|--------|-------|
| log --oneline --topo-order -1000 |  5.9s  |  0.7s  | -88%  |
| branch -vv                       |  0.42s |  0.27s | -35%  |
| rev-list --all                   |  6.4s  |  1.0s  | -84%  |
| rev-list --all --objects         | 32.6s  | 27.6s  | -15%  |

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 alloc.c          |   1 +
 commit.c         |  20 ++++-
 commit.h         |   2 +
 log-tree.c       |   3 +-
 packed-graph.c   | 242 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 packed-graph.h   |  18 +++++
 t/t5319-graph.sh | 114 ++++++++++++++++++++++++--
 7 files changed, 387 insertions(+), 13 deletions(-)

diff --git a/alloc.c b/alloc.c
index 12afadfacd..4a4dcfa2b7 100644
--- a/alloc.c
+++ b/alloc.c
@@ -93,6 +93,7 @@ void *alloc_commit_node(void)
 	struct commit *c = alloc_node(&commit_state, sizeof(struct commit));
 	c->object.type = OBJ_COMMIT;
 	c->index = alloc_commit_index();
+	c->graphId = 0xFFFFFFFF;
 	return c;
 }
 
diff --git a/commit.c b/commit.c
index cab8d4455b..253c102808 100644
--- a/commit.c
+++ b/commit.c
@@ -12,6 +12,7 @@
 #include "prio-queue.h"
 #include "sha1-lookup.h"
 #include "wt-status.h"
+#include "packed-graph.h"
 
 static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
 
@@ -374,7 +375,7 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 	return 0;
 }
 
-int parse_commit_gently(struct commit *item, int quiet_on_missing)
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int check_packed)
 {
 	enum object_type type;
 	void *buffer;
@@ -383,19 +384,27 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
 
 	if (!item)
 		return -1;
+
+	// If we already parsed, but got it from the graph, then keep going!
 	if (item->object.parsed)
 		return 0;
+
+	if (check_packed && parse_packed_commit(item))
+		return 0;
+
 	buffer = read_sha1_file(item->object.oid.hash, &type, &size);
 	if (!buffer)
 		return quiet_on_missing ? -1 :
 			error("Could not read %s",
-			     oid_to_hex(&item->object.oid));
+			oid_to_hex(&item->object.oid));
 	if (type != OBJ_COMMIT) {
 		free(buffer);
 		return error("Object %s not a commit",
-			     oid_to_hex(&item->object.oid));
+			oid_to_hex(&item->object.oid));
 	}
+
 	ret = parse_commit_buffer(item, buffer, size);
+
 	if (save_commit_buffer && !ret) {
 		set_commit_buffer(item, buffer, size);
 		return 0;
@@ -404,6 +413,11 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
 	return ret;
 }
 
+int parse_commit_gently(struct commit *item, int quiet_on_missing)
+{
+	return parse_commit_internal(item, quiet_on_missing, 1);
+}
+
 void parse_commit_or_die(struct commit *item)
 {
 	if (parse_commit(item))
diff --git a/commit.h b/commit.h
index 8c68ca1a5a..02f5f2a182 100644
--- a/commit.h
+++ b/commit.h
@@ -21,6 +21,7 @@ struct commit {
 	timestamp_t date;
 	struct commit_list *parents;
 	struct tree *tree;
+	uint32_t graphId;
 };
 
 extern int save_commit_buffer;
@@ -60,6 +61,7 @@ struct commit *lookup_commit_reference_by_name(const char *name);
 struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref_name);
 
 int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long size);
+extern int parse_commit_internal(struct commit *item, int quiet_on_missing, int check_packed);
 int parse_commit_gently(struct commit *item, int quiet_on_missing);
 static inline int parse_commit(struct commit *item)
 {
diff --git a/log-tree.c b/log-tree.c
index fca29d4799..156aed4541 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -659,8 +659,7 @@ void show_log(struct rev_info *opt)
 		show_mergetag(opt, commit);
 	}
 
-	if (!get_cached_commit_buffer(commit, NULL))
-		return;
+	get_commit_buffer(commit, NULL);
 
 	if (opt->show_notes) {
 		int raw;
diff --git a/packed-graph.c b/packed-graph.c
index 5723f163ae..343b231973 100644
--- a/packed-graph.c
+++ b/packed-graph.c
@@ -34,6 +34,8 @@
 #define GRAPH_CHUNKLOOKUP_SIZE (5 * 12)
 #define GRAPH_MIN_SIZE (GRAPH_CHUNKLOOKUP_SIZE + GRAPH_FANOUT_SIZE + \
 			GRAPH_OID_LEN + sizeof(struct packed_graph_header))
+/* global storage */
+struct packed_graph *packed_graph = 0;
 
 struct object_id *get_graph_head_oid(const char *pack_dir, struct object_id *oid)
 {
@@ -209,6 +211,225 @@ struct packed_graph *load_packed_graph_one(const char *graph_file, const char *p
 	return graph;
 }
 
+static void prepare_packed_graph_one(const char *obj_dir)
+{
+	char *graph_file;
+	struct object_id oid;
+	struct strbuf pack_dir = STRBUF_INIT;
+	strbuf_addstr(&pack_dir, obj_dir);
+	strbuf_add(&pack_dir, "/pack", 5);
+
+	if (!get_graph_head_oid(pack_dir.buf, &oid))
+		return;
+
+	graph_file = get_graph_filename_oid(pack_dir.buf, &oid);
+
+	packed_graph = load_packed_graph_one(graph_file, pack_dir.buf);
+	strbuf_release(&pack_dir);
+}
+
+static int prepare_packed_graph_run_once = 0;
+void prepare_packed_graph(void)
+{
+	struct alternate_object_database *alt;
+	char *obj_dir;
+
+	if (prepare_packed_graph_run_once)
+		return;
+	prepare_packed_graph_run_once = 1;
+
+	obj_dir = get_object_directory();
+	prepare_packed_graph_one(obj_dir);
+	prepare_alt_odb();
+	for (alt = alt_odb_list; !packed_graph && alt; alt = alt->next)
+		prepare_packed_graph_one(alt->path);
+}
+
+static int bsearch_graph(struct packed_graph *g, struct object_id *oid, uint32_t *pos)
+{
+	uint32_t last, first = 0;
+
+	if (oid->hash[0])
+		first = ntohl(*(uint32_t*)(g->chunk_oid_fanout + 4 * (oid->hash[0] - 1)));
+	last = ntohl(*(uint32_t*)(g->chunk_oid_fanout + 4 * oid->hash[0]));
+
+	while (first < last) {
+		uint32_t mid = first + (last - first) / 2;
+		const unsigned char *current;
+		int cmp;
+
+		current = g->chunk_oid_lookup + g->hdr->hash_len * mid;
+		cmp = hashcmp(oid->hash, current);
+		if (!cmp) {
+			*pos = mid;
+			return 1;
+		}
+		if (cmp > 0) {
+			first = mid + 1;
+			continue;
+		}
+		last = mid;
+	}
+
+	*pos = first;
+	return 0;
+}
+
+struct object_id *get_nth_commit_oid(struct packed_graph *g,
+					    uint32_t n,
+					    struct object_id *oid)
+{
+	hashcpy(oid->hash, g->chunk_oid_lookup + g->hdr->hash_len * n);
+	return oid;
+}
+
+static int full_parse_commit(struct commit *item, struct packed_graph *g,
+			     uint32_t pos, const unsigned char *commit_data)
+{
+	struct object_id oid;
+	struct commit *new_parent;
+	uint32_t new_parent_pos;
+	uint32_t *parent_data_ptr;
+	uint64_t date_low, date_high;
+	struct commit_list **pptr;
+
+	item->object.parsed = 1;
+	item->graphId = pos;
+
+	hashcpy(oid.hash, commit_data);
+	item->tree = lookup_tree(&oid);
+
+	date_high = ntohl(*(uint32_t*)(commit_data + g->hdr->hash_len + 8));
+	date_low = ntohl(*(uint32_t*)(commit_data + g->hdr->hash_len + 12));
+	item->date = (timestamp_t)((date_high << 32) | date_low);
+
+	pptr = &item->parents;
+
+	// First Parent
+	new_parent_pos = ntohl(*(uint32_t*)(commit_data + g->hdr->hash_len));
+	if (new_parent_pos == GRAPH_PARENT_NONE)
+		return 1;
+	get_nth_commit_oid(g, new_parent_pos, &oid);
+	new_parent = lookup_commit(&oid);
+	if (new_parent) {
+		new_parent->graphId = new_parent_pos;
+		pptr = &commit_list_insert(new_parent, pptr)->next;
+	} else {
+		die("could not find commit %s", oid_to_hex(&oid));
+	}
+
+	// Second Parent
+	new_parent_pos = ntohl(*(uint32_t*)(commit_data + g->hdr->hash_len + 4));
+	if (new_parent_pos == GRAPH_PARENT_NONE)
+		return 1;
+	if (!(new_parent_pos & GRAPH_LARGE_EDGES_NEEDED)) {
+		get_nth_commit_oid(g, new_parent_pos, &oid);
+		new_parent = lookup_commit(&oid);
+		if (new_parent) {
+			new_parent->graphId = new_parent_pos;
+			pptr = &commit_list_insert(new_parent, pptr)->next;
+		} else
+			die("could not find commit %s", oid_to_hex(&oid));
+		return 1;
+	}
+
+	parent_data_ptr = (uint32_t*)(g->chunk_large_edges + 4 * (new_parent_pos ^ GRAPH_LARGE_EDGES_NEEDED));
+	do {
+		new_parent_pos = ntohl(*parent_data_ptr);
+
+		get_nth_commit_oid(g, new_parent_pos & GRAPH_EDGE_LAST_MASK, &oid);
+		new_parent = lookup_commit(&oid);
+		if (new_parent) {
+			new_parent->graphId = new_parent_pos & GRAPH_EDGE_LAST_MASK;
+			pptr = &commit_list_insert(new_parent, pptr)->next;
+		} else
+			die("could not find commit %s", oid_to_hex(&oid));
+		parent_data_ptr++;
+	} while (!(new_parent_pos & GRAPH_LAST_EDGE));
+
+	return 1;
+}
+
+/**
+ * Fill 'item' to contain all information that would be parsed by parse_commit_buffer.
+ */
+static int fill_packed_commit(struct commit *item, struct packed_graph *g, uint32_t pos)
+{
+	uint32_t new_parent_pos;
+	uint32_t *parent_data_ptr;
+	const unsigned char *commit_data = g->chunk_commit_data + (g->hdr->hash_len + 16) * pos;
+
+	// Determine if we have ALL of our edges, otherwise we will return as unparsed.
+
+	// Check constant-width parents first.
+	new_parent_pos = ntohl(*(uint32_t*)(commit_data + g->hdr->hash_len));
+
+	if (new_parent_pos == GRAPH_PARENT_MISSING)
+		return 0;
+
+	if (new_parent_pos == GRAPH_PARENT_NONE)
+		return full_parse_commit(item, g, pos, commit_data);
+
+	new_parent_pos = ntohl(*(uint32_t*)(commit_data + g->hdr->hash_len + 4));
+
+	if (new_parent_pos == GRAPH_PARENT_MISSING)
+		return 0;
+	if (!(new_parent_pos & GRAPH_LARGE_EDGES_NEEDED))
+		return full_parse_commit(item, g, pos, commit_data);
+
+	new_parent_pos = new_parent_pos ^ GRAPH_LARGE_EDGES_NEEDED;
+
+	if (new_parent_pos == GRAPH_PARENT_MISSING)
+		return 0;
+
+	parent_data_ptr = (uint32_t*)(g->chunk_large_edges + 4 * new_parent_pos);
+	do {
+		new_parent_pos = ntohl(*parent_data_ptr);
+
+		if ((new_parent_pos & GRAPH_EDGE_LAST_MASK) == GRAPH_PARENT_MISSING)
+			return 0;
+
+		parent_data_ptr++;
+	} while (!(new_parent_pos & GRAPH_LAST_EDGE));
+
+	return full_parse_commit(item, g, pos, commit_data);
+}
+
+/**
+ * Given a commit struct, try to fill the commit struct info, including:
+ *  1. tree object
+ *  2. date
+ *  3. parents.
+ *
+ * Returns 1 iff the commit was found in the packed graph.
+ *
+ * See parse_commit_buffer() for the fallback after this call.
+ */
+int parse_packed_commit(struct commit *item)
+{
+	if (!core_graph)
+		return 0;
+	if (item->object.parsed)
+		return 1;
+
+	prepare_packed_graph();
+	if (packed_graph) {
+		uint32_t pos;
+		int found;
+		if (item->graphId != 0xFFFFFFFF) {
+			pos = item->graphId;
+			found = 1;
+		} else {
+			found = bsearch_graph(packed_graph, &(item->object.oid), &pos);
+		}
+
+		if (found)
+			return fill_packed_commit(item, packed_graph, pos);
+	}
+
+	return parse_commit_internal(item, 0, 0);
+}
+
 static void write_graph_chunk_fanout(
 	struct sha1file *f,
 	struct commit **commits, int nr_commits)
@@ -439,9 +660,24 @@ struct object_id *construct_graph(const char *pack_dir)
 	if (!core_graph)
 		return 0;
 
+	prepare_packed_graph();
+
 	oids.num = 0;
 	oids.size = 1024;
+
+	if (packed_graph && oids.size < packed_graph->num_commits)
+		oids.size = packed_graph->num_commits;
+
 	ALLOC_ARRAY(oids.list, oids.size);
+
+	if (packed_graph) {
+		for (i = 0; i < packed_graph->num_commits; i++) {
+			oids.list[i] = malloc(sizeof(struct object_id));
+			get_nth_commit_oid(packed_graph, i, oids.list[i]);
+		}
+		oids.num = packed_graph->num_commits;
+	}
+
 	for_each_packed_object(if_packed_commit_add_to_list, &oids, 0);
 	QSORT(oids.list, oids.num, commit_compare);
 
@@ -530,6 +766,11 @@ struct object_id *construct_graph(const char *pack_dir)
 	hashcpy(f_oid->hash, final_hash);
 	fname = get_graph_filename_oid(pack_dir, f_oid);
 
+	if (packed_graph) {
+		close_graph(packed_graph);
+		FREE_AND_NULL(packed_graph);
+	}
+
 	if (rename(graph_name, fname))
 		die("failed to rename %s to %s", graph_name, fname);
 
@@ -539,3 +780,4 @@ struct object_id *construct_graph(const char *pack_dir)
 
 	return f_oid;
 }
+
diff --git a/packed-graph.h b/packed-graph.h
index ad561863c8..6010ce3e53 100644
--- a/packed-graph.h
+++ b/packed-graph.h
@@ -4,6 +4,18 @@
 #include "git-compat-util.h"
 #include "commit.h"
 
+/**
+ * Given a commit struct, try to fill the commit struct info, including:
+ *  1. tree object
+ *  2. date
+ *  3. parents.
+ *
+ * Returns 1 iff the commit was found in the packed graph.
+ *
+ * See parse_commit_buffer() for the fallback after this call.
+ */
+extern int parse_packed_commit(struct commit *item);
+
 extern struct object_id *get_graph_head_oid(const char *pack_dir, struct object_id *oid);
 extern char* get_graph_filename_oid(const char *pack_dir,
 				    struct object_id *oid);
@@ -40,7 +52,13 @@ extern struct packed_graph {
 extern int close_graph(struct packed_graph *g);
 
 extern struct packed_graph *load_packed_graph_one(const char *graph_file, const char *pack_dir);
+extern void prepare_packed_graph(void);
+
+extern struct object_id *get_nth_commit_oid(struct packed_graph *g,
+					    uint32_t n,
+					    struct object_id *oid);
 
 extern struct object_id *construct_graph(const char *pack_dir);
+extern int close_graph(struct packed_graph *g);
 
 #endif
diff --git a/t/t5319-graph.sh b/t/t5319-graph.sh
index a70c7bbb02..4552795179 100755
--- a/t/t5319-graph.sh
+++ b/t/t5319-graph.sh
@@ -26,6 +26,23 @@ test_expect_success 'create commits and repack' \
      done &&
      git repack'
 
+_graph_git_two_modes() {
+    git -c core.graph=true $1 >output
+    git -c core.graph=false $1 >expect
+    cmp output expect
+}
+
+_graph_git_behavior() {
+    test_expect_success 'check normal git operations' \
+        '_graph_git_two_modes "log --oneline master" &&
+         _graph_git_two_modes "log --topo-order master" &&
+         _graph_git_two_modes "log --graph $1 $2" &&
+         _graph_git_two_modes "branch -vv" &&
+         _graph_git_two_modes "merge-base -a $1 $2"'
+}
+
+_graph_git_behavior commits/3 commits/4
+
 _graph_read_expect() {
     cat >expect <<- EOF
 header: 43475048 01 01 14 04
@@ -43,6 +60,8 @@ test_expect_success 'write graph' \
      _graph_read_expect "5" "${packdir}" &&
      cmp expect output'
 
+_graph_git_behavior commits/3 commits/4
+
 test_expect_success 'Add more commits' \
     'git reset --hard commits/3 &&
      for i in $(test_seq 6 10)
@@ -52,7 +71,7 @@ test_expect_success 'Add more commits' \
         git commit -m "commit $i" &&
         git branch commits/$i
      done &&
-     git reset --hard commits/7 &&
+     git reset --hard commits/3 &&
      for i in $(test_seq 11 15)
      do
         echo $i >$i.txt &&
@@ -61,15 +80,40 @@ test_expect_success 'Add more commits' \
         git branch commits/$i
      done &&
      git reset --hard commits/7 &&
-     git merge commits/4 &&
+     git merge commits/11 &&
      git branch merge/1 &&
      git reset --hard commits/8 &&
-     git merge commits/11 &&
+     git merge commits/12 &&
      git branch merge/2 &&
-     git reset --hard commits/9 &&
-     git merge commits/5 commits/13 &&
+     git reset --hard commits/5 &&
+     git merge commits/10 commits/15 &&
+     git branch merge/3 &&
      git repack'
 
+# Current graph structure:
+#
+#      M3
+#     / |\_____
+#    / 10      15
+#   /   |      |
+#  /    9 M2   14
+# |     |/  \ \|
+# |     8 M1 | 13
+# |     |/ | \_|
+# 5     7  |   12
+# |     |   \__|
+# 4     6      11
+# |____/______/
+# 3
+# |
+# 2
+# |
+# 1
+
+_graph_git_behavior merge/1 merge/2
+_graph_git_behavior merge/1 merge/3
+_graph_git_behavior merge/2 merge/3
+
 test_expect_success 'write graph with merges' \
     'graph2=$(git graph --write --update-head) &&
      test_path_is_file ${packdir}/graph-${graph2}.graph &&
@@ -80,15 +124,54 @@ test_expect_success 'write graph with merges' \
      _graph_read_expect "18" "${packdir}" &&
      cmp expect output'
 
+_graph_git_behavior merge/1 merge/2
+_graph_git_behavior merge/1 merge/3
+_graph_git_behavior merge/2 merge/3
+
 test_expect_success 'Add more commits' \
-    'git reset --hard commits/3 &&
-     for i in $(test_seq 16 20)
+    'for i in $(test_seq 16 20)
      do
-        git commit --allow-empty -m "commit $i" &&
+        echo $i >$i.txt &&
+        git add $i.txt &&
+        git commit -m "commit $i" &&
         git branch commits/$i
      done &&
      git repack'
 
+# Current graph structure:
+#
+#      20
+#       |
+#      19
+#       |
+#      18
+#       |
+#      17
+#       |
+#      16
+#       |
+#      M3
+#     / |\_____
+#    / 10      15
+#   /   |      |
+#  /    9 M2   14
+# |     |/  \ \|
+# |     8 M1 | 13
+# |     |/ | \_|
+# 5     7  |   12
+# |     |   \__|
+# 4     6      11
+# |____/______/
+# 3
+# |
+# 2
+# |
+# 1
+
+# Test behavior while in mixed mode
+_graph_git_behavior commits/20 merge/1
+_graph_git_behavior commits/20 merge/2
+
 test_expect_success 'write graph with merges' \
     'graph3=$(git graph --write --update-head --delete-expired) &&
      test_path_is_file ${packdir}/graph-${graph3}.graph &&
@@ -101,6 +184,9 @@ test_expect_success 'write graph with merges' \
      _graph_read_expect "23" "${packdir}" &&
      cmp expect output'
 
+_graph_git_behavior commits/20 merge/1
+_graph_git_behavior commits/20 merge/2
+
 test_expect_success 'write graph with nothing new' \
     'graph4=$(git graph --write --update-head --delete-expired) &&
      test_path_is_file ${packdir}/graph-${graph4}.graph &&
@@ -112,12 +198,18 @@ test_expect_success 'write graph with nothing new' \
      _graph_read_expect "23" "${packdir}" &&
      cmp expect output'
 
+_graph_git_behavior commits/20 merge/1
+_graph_git_behavior commits/20 merge/2
+
 test_expect_success 'clear graph' \
     'git graph --clear &&
      test_path_is_missing ${packdir}/graph-${graph3}.graph &&
      test_path_is_file ${packdir}/graph-${graph1}.graph &&
      test_path_is_missing ${packdir}/graph-head'
 
+_graph_git_behavior commits/20 merge/1
+_graph_git_behavior commits/20 merge/2
+
 test_expect_success 'setup bare repo' \
     'cd .. &&
      git clone --bare full bare &&
@@ -126,6 +218,9 @@ test_expect_success 'setup bare repo' \
      git config pack.threads 1 &&
      baredir="./objects/pack"'
 
+_graph_git_behavior commits/20 merge/1
+_graph_git_behavior commits/20 merge/2
+
 test_expect_success 'write graph in bare repo' \
     'graphbare=$(git graph --write --update-head) &&
      test_path_is_file ${baredir}/graph-${graphbare}.graph &&
@@ -136,4 +231,7 @@ test_expect_success 'write graph in bare repo' \
      _graph_read_expect "23" "${baredir}" &&
      cmp expect output'
 
+_graph_git_behavior commits/20 merge/1
+_graph_git_behavior commits/20 merge/2
+
 test_done
-- 
2.16.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 12/14] packed-graph: read only from specific pack-indexes
  2018-01-25 14:02 [PATCH 00/14] Serialized Commit Graph Derrick Stolee
                   ` (10 preceding siblings ...)
  2018-01-25 14:02 ` [PATCH 11/14] commit: integrate packed graph with commit parsing Derrick Stolee
@ 2018-01-25 14:02 ` Derrick Stolee
  2018-01-25 14:02 ` [PATCH 13/14] packed-graph: close under reachability Derrick Stolee
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-25 14:02 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, git, sbeller, dstolee

Teach git-graph to inspect the objects only in a certain list of
pack-indexes within the given pack directory. This allows updating
the graph iteratively, since we add all commits stored in a previous
packed graph.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-graph.txt | 12 ++++++++++++
 builtin/graph.c             | 26 +++++++++++++++++++++++---
 packed-graph.c              | 27 +++++++++++++++++++++++----
 packed-graph.h              |  2 +-
 packfile.c                  |  4 ++--
 packfile.h                  |  2 ++
 t/t5319-graph.sh            | 10 ++++++----
 7 files changed, 69 insertions(+), 14 deletions(-)

diff --git a/Documentation/git-graph.txt b/Documentation/git-graph.txt
index f4f1048d28..b68a61ddea 100644
--- a/Documentation/git-graph.txt
+++ b/Documentation/git-graph.txt
@@ -43,6 +43,11 @@ OPTIONS
 	When used with --write and --update-head, delete the graph file
 	previously referenced by graph-head.
 
+--stdin-packs::
+	When used with --write, generate the new graph by walking objects
+	only in the specified packfiles and any commits in the
+	existing graph-head.
+
 EXAMPLES
 --------
 
@@ -65,6 +70,13 @@ $ git graph --write
 $ git graph --write --update-head --delete-expired
 ------------------------------------------------
 
+* Write a graph file, extending the current graph file using commits
+* in <packfile>, update graph-head, and delete the old graph-<oid>.graph file.
++
+------------------------------------------------
+$ echo <packfile> | git graph --write --update-head --delete-expired --stdin-packs
+------------------------------------------------
+
 * Read basic information from a graph file.
 +
 ------------------------------------------------
diff --git a/builtin/graph.c b/builtin/graph.c
index adf779b601..3cace3a18c 100644
--- a/builtin/graph.c
+++ b/builtin/graph.c
@@ -12,7 +12,7 @@ static char const * const builtin_graph_usage[] ={
 	N_("git graph [--pack-dir <packdir>]"),
 	N_("git graph --clear [--pack-dir <packdir>]"),
 	N_("git graph --read [--graph-id=<oid>]"),
-	N_("git graph --write [--pack-dir <packdir>] [--update-head] [--delete-expired]"),
+	N_("git graph --write [--pack-dir <packdir>] [--update-head] [--delete-expired] [--stdin-packs]"),
 	NULL
 };
 
@@ -24,6 +24,7 @@ static struct opts_graph {
 	int write;
 	int update_head;
 	int delete_expired;
+	int stdin_packs;
 	int has_existing;
 	struct object_id old_graph_oid;
 } opts;
@@ -113,7 +114,24 @@ static void update_head_file(const char *pack_dir, const struct object_id *graph
 
 static int graph_write(void)
 {
-	struct object_id *graph_id = construct_graph(opts.pack_dir);
+	struct object_id *graph_id;
+	char **pack_indexes = NULL;
+	int num_packs = 0;
+	int size_packs = 0;
+
+	if (opts.stdin_packs) {
+		struct strbuf buf = STRBUF_INIT;
+		size_packs = 128;
+		ALLOC_ARRAY(pack_indexes, size_packs);
+
+		while (strbuf_getline(&buf, stdin) != EOF) {
+			ALLOC_GROW(pack_indexes, num_packs + 1, size_packs);
+			pack_indexes[num_packs++] = buf.buf;
+			strbuf_detach(&buf, NULL);
+		}
+	}
+
+	graph_id = construct_graph(opts.pack_dir, pack_indexes, num_packs);
 
 	if (opts.update_head)
 		update_head_file(opts.pack_dir, graph_id);
@@ -150,7 +168,9 @@ int cmd_graph(int argc, const char **argv, const char *prefix)
 			N_("update graph-head to written graph file")),
 		OPT_BOOL('d', "delete-expired", &opts.delete_expired,
 			N_("delete expired head graph file")),
-		{ OPTION_STRING, 'M', "graph-id", &opts.graph_id,
+		OPT_BOOL('s', "stdin-packs", &opts.stdin_packs,
+			N_("only scan packfiles listed by stdin")),
+		{ OPTION_STRING, 'G', "graph-id", &opts.graph_id,
 			N_("oid"),
 			N_("An OID for a specific graph file in the pack-dir."),
 			PARSE_OPT_OPTARG, NULL, (intptr_t) "" },
diff --git a/packed-graph.c b/packed-graph.c
index 343b231973..0dc68a077e 100644
--- a/packed-graph.c
+++ b/packed-graph.c
@@ -401,7 +401,7 @@ static int fill_packed_commit(struct commit *item, struct packed_graph *g, uint3
  *  2. date
  *  3. parents.
  *
- * Returns 1 iff the commit was found in the packed graph.
+ * Returns 1 if and only if the commit was found in the packed graph.
  *
  * See parse_commit_buffer() for the fallback after this call.
  */
@@ -427,7 +427,7 @@ int parse_packed_commit(struct commit *item)
 			return fill_packed_commit(item, packed_graph, pos);
 	}
 
-	return parse_commit_internal(item, 0, 0);
+	return 0;
 }
 
 static void write_graph_chunk_fanout(
@@ -638,7 +638,7 @@ static int if_packed_commit_add_to_list(const struct object_id *oid,
 	return 0;
 }
 
-struct object_id *construct_graph(const char *pack_dir)
+struct object_id *construct_graph(const char *pack_dir, char **pack_indexes, int nr_packs)
 {
 	// Find a list of oids, adding the pointer to a list.
 	struct packed_oid_list oids;
@@ -678,7 +678,26 @@ struct object_id *construct_graph(const char *pack_dir)
 		oids.num = packed_graph->num_commits;
 	}
 
-	for_each_packed_object(if_packed_commit_add_to_list, &oids, 0);
+	if (pack_indexes) {
+		int pack_dir_len = strlen(pack_dir) + 1;
+		struct strbuf packname = STRBUF_INIT;
+		strbuf_add(&packname, pack_dir, pack_dir_len - 1);
+		strbuf_addch(&packname, '/');
+		for (i = 0; i < nr_packs; i++) {
+			struct packed_git *p;
+			strbuf_setlen(&packname, pack_dir_len);
+			strbuf_addstr(&packname, pack_indexes[i]);
+			p = add_packed_git(packname.buf, packname.len, 1);
+			if (!p)
+				die("error adding pack %s", packname.buf);
+			if (open_pack_index(p))
+				die("error opening index for %s", packname.buf);
+			for_each_object_in_pack(p, if_packed_commit_add_to_list, &oids);
+			close_pack(p);
+		}
+	} else {
+		for_each_packed_object(if_packed_commit_add_to_list, &oids, 0);
+	}
 	QSORT(oids.list, oids.num, commit_compare);
 
 	count_distinct = 1;
diff --git a/packed-graph.h b/packed-graph.h
index 6010ce3e53..97ce1e2652 100644
--- a/packed-graph.h
+++ b/packed-graph.h
@@ -58,7 +58,7 @@ extern struct object_id *get_nth_commit_oid(struct packed_graph *g,
 					    uint32_t n,
 					    struct object_id *oid);
 
-extern struct object_id *construct_graph(const char *pack_dir);
+extern struct object_id *construct_graph(const char *pack_dir, char **pack_indexes, int nr_packs);
 extern int close_graph(struct packed_graph *g);
 
 #endif
diff --git a/packfile.c b/packfile.c
index 4a5fe7ab18..48133bd669 100644
--- a/packfile.c
+++ b/packfile.c
@@ -299,7 +299,7 @@ void close_pack_index(struct packed_git *p)
 	}
 }
 
-static void close_pack(struct packed_git *p)
+void close_pack(struct packed_git *p)
 {
 	close_pack_windows(p);
 	close_pack_fd(p);
@@ -1860,7 +1860,7 @@ int has_pack_index(const unsigned char *sha1)
 	return 1;
 }
 
-static int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data)
+int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data)
 {
 	uint32_t i;
 	int r = 0;
diff --git a/packfile.h b/packfile.h
index 0cdeb54dcd..cde868feb6 100644
--- a/packfile.h
+++ b/packfile.h
@@ -61,6 +61,7 @@ extern void close_pack_index(struct packed_git *);
 
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *);
 extern void close_pack_windows(struct packed_git *);
+extern void close_pack(struct packed_git *);
 extern void close_all_packs(void);
 extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
@@ -133,6 +134,7 @@ typedef int each_packed_object_fn(const struct object_id *oid,
 				  struct packed_git *pack,
 				  uint32_t pos,
 				  void *data);
+extern int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data);
 extern int for_each_packed_object(each_packed_object_fn, void *, unsigned flags);
 
 #endif
diff --git a/t/t5319-graph.sh b/t/t5319-graph.sh
index 4552795179..969150cd21 100755
--- a/t/t5319-graph.sh
+++ b/t/t5319-graph.sh
@@ -34,8 +34,8 @@ _graph_git_two_modes() {
 
 _graph_git_behavior() {
     test_expect_success 'check normal git operations' \
-        '_graph_git_two_modes "log --oneline master" &&
-         _graph_git_two_modes "log --topo-order master" &&
+        '_graph_git_two_modes "log --oneline $1" &&
+         _graph_git_two_modes "log --topo-order $1" &&
          _graph_git_two_modes "log --graph $1 $2" &&
          _graph_git_two_modes "branch -vv" &&
          _graph_git_two_modes "merge-base -a $1 $2"'
@@ -136,7 +136,9 @@ test_expect_success 'Add more commits' \
         git commit -m "commit $i" &&
         git branch commits/$i
      done &&
-     git repack'
+     ls ${packdir} | grep idx >existing-idx &&
+     git repack &&
+     ls ${packdir} | grep idx | grep -v --file=existing-idx >new-idx'
 
 # Current graph structure:
 #
@@ -173,7 +175,7 @@ _graph_git_behavior commits/20 merge/1
 _graph_git_behavior commits/20 merge/2
 
 test_expect_success 'write graph with merges' \
-    'graph3=$(git graph --write --update-head --delete-expired) &&
+    'graph3=$(cat new-idx | git graph --write --update-head --delete-expired --stdin-packs) &&
      test_path_is_file ${packdir}/graph-${graph3}.graph &&
      test_path_is_missing ${packdir}/graph-${graph2}.graph &&
      test_path_is_file ${packdir}/graph-${graph1}.graph &&
-- 
2.16.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 13/14] packed-graph: close under reachability
  2018-01-25 14:02 [PATCH 00/14] Serialized Commit Graph Derrick Stolee
                   ` (11 preceding siblings ...)
  2018-01-25 14:02 ` [PATCH 12/14] packed-graph: read only from specific pack-indexes Derrick Stolee
@ 2018-01-25 14:02 ` Derrick Stolee
  2018-01-25 14:02 ` [PATCH 14/14] packed-graph: teach git-graph to read commits Derrick Stolee
  2018-01-25 15:46 ` [PATCH 00/14] Serialized Commit Graph Ævar Arnfjörð Bjarmason
  14 siblings, 0 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-25 14:02 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, git, sbeller, dstolee

Teach construct_graph() to walk all parents from the commits discovered in
packfiles. This prevents gaps given by loose objects or previously-missed
packfiles.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 packed-graph.c   | 26 ++++++++++++++++++++++++++
 t/t5319-graph.sh | 14 ++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/packed-graph.c b/packed-graph.c
index 0dc68a077e..c93515f18e 100644
--- a/packed-graph.c
+++ b/packed-graph.c
@@ -5,6 +5,7 @@
 #include "packfile.h"
 #include "commit.h"
 #include "object.h"
+#include "revision.h"
 #include "packed-graph.h"
 
 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
@@ -638,6 +639,29 @@ static int if_packed_commit_add_to_list(const struct object_id *oid,
 	return 0;
 }
 
+static void close_reachable(struct packed_oid_list *oids)
+{
+	int i;
+	struct rev_info revs;
+	struct commit *commit;
+	init_revisions(&revs, NULL);
+
+	for (i = 0; i < oids->num; i++) {
+		commit = lookup_commit(oids->list[i]);
+		if (commit && !parse_commit(commit))
+			revs.commits = commit_list_insert(commit, &revs.commits);
+	}
+
+	if (prepare_revision_walk(&revs))
+		die(_("revision walk setup failed"));
+
+	while ((commit = get_revision(&revs)) != NULL) {
+		ALLOC_GROW(oids->list, oids->num + 1, oids->size);
+		oids->list[oids->num] = &(commit->object.oid);
+		(oids->num)++;
+	}
+}
+
 struct object_id *construct_graph(const char *pack_dir, char **pack_indexes, int nr_packs)
 {
 	// Find a list of oids, adding the pointer to a list.
@@ -698,6 +722,8 @@ struct object_id *construct_graph(const char *pack_dir, char **pack_indexes, int
 	} else {
 		for_each_packed_object(if_packed_commit_add_to_list, &oids, 0);
 	}
+
+	close_reachable(&oids);
 	QSORT(oids.list, oids.num, commit_compare);
 
 	count_distinct = 1;
diff --git a/t/t5319-graph.sh b/t/t5319-graph.sh
index 969150cd21..8bf5a0c993 100755
--- a/t/t5319-graph.sh
+++ b/t/t5319-graph.sh
@@ -212,6 +212,20 @@ test_expect_success 'clear graph' \
 _graph_git_behavior commits/20 merge/1
 _graph_git_behavior commits/20 merge/2
 
+test_expect_success 'build graph from latest pack with closure' \
+    'graph5=$(cat new-idx | git graph --write --update-head --stdin-packs) &&
+     test_path_is_file ${packdir}/graph-${graph5}.graph &&
+     test_path_is_file ${packdir}/graph-${graph1}.graph &&
+     test_path_is_file ${packdir}/graph-head &&
+     echo ${graph5} >expect &&
+     cmp -n 40 expect ${packdir}/graph-head &&
+     git graph --read --graph-id=${graph5} >output &&
+     _graph_read_expect "21" "${packdir}" &&
+     cmp expect output'
+
+_graph_git_behavior commits/20 merge/1
+_graph_git_behavior commits/20 merge/2
+
 test_expect_success 'setup bare repo' \
     'cd .. &&
      git clone --bare full bare &&
-- 
2.16.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH 14/14] packed-graph: teach git-graph to read commits
  2018-01-25 14:02 [PATCH 00/14] Serialized Commit Graph Derrick Stolee
                   ` (12 preceding siblings ...)
  2018-01-25 14:02 ` [PATCH 13/14] packed-graph: close under reachability Derrick Stolee
@ 2018-01-25 14:02 ` Derrick Stolee
  2018-01-25 15:46 ` [PATCH 00/14] Serialized Commit Graph Ævar Arnfjörð Bjarmason
  14 siblings, 0 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-25 14:02 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, git, sbeller, dstolee

Teach git-graph to read commits from stdin when the --stdin-commits
flag is specified. Commits reachable from these commits are added to
the graph. This is a much faster way to construct the graph than
inspecting all packed objects, but is restricted to known tips.

For the Linux repository, 700,000+ commits were added to the graph
file starting from 'master' in 7-9 seconds, depending on the number
of packfiles in the repo (1, 24, or 120).

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/graph.c  | 33 +++++++++++++++++++++++++--------
 packed-graph.c   | 18 +++++++++++++++---
 packed-graph.h   |  3 ++-
 t/t5319-graph.sh | 18 ++++++++++++++++++
 4 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/builtin/graph.c b/builtin/graph.c
index 3cace3a18c..708889677b 100644
--- a/builtin/graph.c
+++ b/builtin/graph.c
@@ -12,7 +12,7 @@ static char const * const builtin_graph_usage[] ={
 	N_("git graph [--pack-dir <packdir>]"),
 	N_("git graph --clear [--pack-dir <packdir>]"),
 	N_("git graph --read [--graph-id=<oid>]"),
-	N_("git graph --write [--pack-dir <packdir>] [--update-head] [--delete-expired] [--stdin-packs]"),
+	N_("git graph --write [--pack-dir <packdir>] [--update-head] [--delete-expired] [--stdin-packs|--stdin-commits]"),
 	NULL
 };
 
@@ -25,6 +25,7 @@ static struct opts_graph {
 	int update_head;
 	int delete_expired;
 	int stdin_packs;
+	int stdin_commits;
 	int has_existing;
 	struct object_id old_graph_oid;
 } opts;
@@ -116,22 +117,36 @@ static int graph_write(void)
 {
 	struct object_id *graph_id;
 	char **pack_indexes = NULL;
+	char **commits = NULL;
 	int num_packs = 0;
-	int size_packs = 0;
+	int num_commits = 0;
+	char **lines = NULL;
+	int num_lines = 0;
+	int size_lines = 0;
 
-	if (opts.stdin_packs) {
+	if (opts.stdin_packs || opts.stdin_commits) {
 		struct strbuf buf = STRBUF_INIT;
-		size_packs = 128;
-		ALLOC_ARRAY(pack_indexes, size_packs);
+		size_lines = 128;
+		ALLOC_ARRAY(lines, size_lines);
 
 		while (strbuf_getline(&buf, stdin) != EOF) {
-			ALLOC_GROW(pack_indexes, num_packs + 1, size_packs);
-			pack_indexes[num_packs++] = buf.buf;
+			ALLOC_GROW(lines, num_lines + 1, size_lines);
+			lines[num_lines++] = buf.buf;
 			strbuf_detach(&buf, NULL);
 		}
+
+		if (opts.stdin_packs) {
+			pack_indexes = lines;
+			num_packs = num_lines;
+		}
+		if (opts.stdin_commits) {
+			commits = lines;
+			num_commits = num_lines;
+		}
 	}
 
-	graph_id = construct_graph(opts.pack_dir, pack_indexes, num_packs);
+	graph_id = construct_graph(opts.pack_dir, pack_indexes, num_packs,
+				   commits, num_commits);
 
 	if (opts.update_head)
 		update_head_file(opts.pack_dir, graph_id);
@@ -170,6 +185,8 @@ int cmd_graph(int argc, const char **argv, const char *prefix)
 			N_("delete expired head graph file")),
 		OPT_BOOL('s', "stdin-packs", &opts.stdin_packs,
 			N_("only scan packfiles listed by stdin")),
+		OPT_BOOL('C', "stdin-commits", &opts.stdin_commits,
+			N_("start walk at commits listed by stdin")),
 		{ OPTION_STRING, 'G', "graph-id", &opts.graph_id,
 			N_("oid"),
 			N_("An OID for a specific graph file in the pack-dir."),
diff --git a/packed-graph.c b/packed-graph.c
index c93515f18e..94e1a97000 100644
--- a/packed-graph.c
+++ b/packed-graph.c
@@ -662,7 +662,8 @@ static void close_reachable(struct packed_oid_list *oids)
 	}
 }
 
-struct object_id *construct_graph(const char *pack_dir, char **pack_indexes, int nr_packs)
+struct object_id *construct_graph(const char *pack_dir, char **pack_indexes, int nr_packs,
+				  char **commit_hex, int nr_commits)
 {
 	// Find a list of oids, adding the pointer to a list.
 	struct packed_oid_list oids;
@@ -719,10 +720,21 @@ struct object_id *construct_graph(const char *pack_dir, char **pack_indexes, int
 			for_each_object_in_pack(p, if_packed_commit_add_to_list, &oids);
 			close_pack(p);
 		}
-	} else {
-		for_each_packed_object(if_packed_commit_add_to_list, &oids, 0);
 	}
 
+	if (commit_hex) {
+		for (i = 0; i < nr_commits; i++) {
+			const char *end;
+			ALLOC_GROW(oids.list, oids.num + 1, oids.size);
+			oids.list[oids.num] = malloc(sizeof(struct object_id));
+			parse_oid_hex(commit_hex[i], oids.list[oids.num], &end);
+			oids.num++;
+		}
+	}
+
+	if (!pack_indexes && !commit_hex)
+		for_each_packed_object(if_packed_commit_add_to_list, &oids, 0);
+
 	close_reachable(&oids);
 	QSORT(oids.list, oids.num, commit_compare);
 
diff --git a/packed-graph.h b/packed-graph.h
index 97ce1e2652..9c766411be 100644
--- a/packed-graph.h
+++ b/packed-graph.h
@@ -58,7 +58,8 @@ extern struct object_id *get_nth_commit_oid(struct packed_graph *g,
 					    uint32_t n,
 					    struct object_id *oid);
 
-extern struct object_id *construct_graph(const char *pack_dir, char **pack_indexes, int nr_packs);
+extern struct object_id *construct_graph(const char *pack_dir, char **pack_indexes, int nr_packs,
+					 char **commits, int nr_commits);
 extern int close_graph(struct packed_graph *g);
 
 #endif
diff --git a/t/t5319-graph.sh b/t/t5319-graph.sh
index 8bf5a0c993..b12d2477ba 100755
--- a/t/t5319-graph.sh
+++ b/t/t5319-graph.sh
@@ -226,6 +226,24 @@ test_expect_success 'build graph from latest pack with closure' \
 _graph_git_behavior commits/20 merge/1
 _graph_git_behavior commits/20 merge/2
 
+test_expect_success 'build graph from commits with closure' \
+    'git rev-parse commits/20 >commits-in &&
+     git rev-parse merge/1 >>commits-in &&
+     git rev-parse merge/2 >>commits-in &&
+     graph6=$(cat commits-in | git graph --write --update-head --delete-expired --stdin-commits) &&
+     test_path_is_file ${packdir}/graph-${graph6}.graph &&
+     test_path_is_missing ${packdir}/graph-${graph5}.graph &&
+     test_path_is_file ${packdir}/graph-${graph1}.graph &&
+     test_path_is_file ${packdir}/graph-head &&
+     echo ${graph6} >expect &&
+     cmp -n 40 expect ${packdir}/graph-head &&
+     git graph --read --graph-id=${graph6} >output &&
+     _graph_read_expect "23" "${packdir}" &&
+     cmp expect output'
+
+_graph_git_behavior commits/20 merge/1
+_graph_git_behavior commits/20 merge/2
+
 test_expect_success 'setup bare repo' \
     'cd .. &&
      git clone --bare full bare &&
-- 
2.16.0


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/14] Serialized Commit Graph
  2018-01-25 14:02 [PATCH 00/14] Serialized Commit Graph Derrick Stolee
                   ` (13 preceding siblings ...)
  2018-01-25 14:02 ` [PATCH 14/14] packed-graph: teach git-graph to read commits Derrick Stolee
@ 2018-01-25 15:46 ` Ævar Arnfjörð Bjarmason
  2018-01-25 16:09   ` Derrick Stolee
  14 siblings, 1 reply; 49+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-25 15:46 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, gitster, peff, git, sbeller, dstolee


On Thu, Jan 25 2018, Derrick Stolee jotted:

> * 'git log --topo-order -1000' walks all reachable commits to avoid
>   incorrect topological orders, but only needs the commit message for
>   the top 1000 commits.
>
> * 'git merge-base <A> <B>' may walk many commits to find the correct
>   boundary between the commits reachable from A and those reachable
>   from B. No commit messages are needed.
>
> * 'git branch -vv' checks ahead/behind status for all local branches
>   compared to their upstream remote branches. This is essentially as
>   hard as computing merge bases for each.

This is great, spotted / questions so far:

* git graph --blah says you need to enable the config, should say
  "unknown option --blah <help>". I.e. overzelous config guard.

* On a big repo (git show-ref -s | ~/g/git/git-graph --write
  --update-head) is as of writing this still hanging for me, but strace
  shows it's brk()-ing. Presumably just still busy, a progress bar would
  be very nice.

* Shouldn't there be a pack.useGraph option so this gets auto-updated on
  repack? I understand this series is a WIP, so that's more a "is that
  the UI" than "it needs now".

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/14] Serialized Commit Graph
  2018-01-25 15:46 ` [PATCH 00/14] Serialized Commit Graph Ævar Arnfjörð Bjarmason
@ 2018-01-25 16:09   ` Derrick Stolee
  2018-01-25 23:06     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 49+ messages in thread
From: Derrick Stolee @ 2018-01-25 16:09 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, gitster, peff, git, sbeller, dstolee

On 1/25/2018 10:46 AM, Ævar Arnfjörð Bjarmason wrote:
> On Thu, Jan 25 2018, Derrick Stolee jotted:
>
>> * 'git log --topo-order -1000' walks all reachable commits to avoid
>>    incorrect topological orders, but only needs the commit message for
>>    the top 1000 commits.
>>
>> * 'git merge-base <A> <B>' may walk many commits to find the correct
>>    boundary between the commits reachable from A and those reachable
>>    from B. No commit messages are needed.
>>
>> * 'git branch -vv' checks ahead/behind status for all local branches
>>    compared to their upstream remote branches. This is essentially as
>>    hard as computing merge bases for each.
> This is great, spotted / questions so far:
>
> * git graph --blah says you need to enable the config, should say
>    "unknown option --blah <help>". I.e. overzelous config guard.

This is a good point.

> * On a big repo (git show-ref -s | ~/g/git/git-graph --write
>    --update-head) is as of writing this still hanging for me, but strace
>    shows it's brk()-ing. Presumably just still busy, a progress bar would
>    be very nice.

Oops! This is my mistake. The correct command should be:

     git show-ref -s | git graph --write --update-head --stdin-commits

Without "--stdin-commits" the command will walk all packed objects
to look for commits and then build the graph. That's why it's taking
so long. That method takes several minutes on the Linux repo, but with
--stdin-commits it should take as long as "git log >/dev/null".

> * Shouldn't there be a pack.useGraph option so this gets auto-updated on
>    repack? I understand this series is a WIP, so that's more a "is that
>    the UI" than "it needs now".

This will definitely be part of a follow-up patch.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/14] graph: add packed graph design document
  2018-01-25 14:02 ` [PATCH 01/14] graph: add packed graph design document Derrick Stolee
@ 2018-01-25 20:04   ` Stefan Beller
  2018-01-26 12:49     ` Derrick Stolee
  2018-01-25 21:14   ` Junio C Hamano
  2018-01-26 14:13   ` Duy Nguyen
  2 siblings, 1 reply; 49+ messages in thread
From: Stefan Beller @ 2018-01-25 20:04 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Junio C Hamano, Jeff King, Jeff Hostetler, Derrick Stolee

On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee <stolee@gmail.com> wrote:
> Add Documentation/technical/packed-graph.txt with details of the planned
> packed graph feature, including future plans.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/technical/packed-graph.txt | 185 +++++++++++++++++++++++++++++++
>  1 file changed, 185 insertions(+)
>  create mode 100644 Documentation/technical/packed-graph.txt
>
> diff --git a/Documentation/technical/packed-graph.txt b/Documentation/technical/packed-graph.txt
> new file mode 100644
> index 0000000000..fcc0c83874
> --- /dev/null
> +++ b/Documentation/technical/packed-graph.txt
> @@ -0,0 +1,185 @@
> +Git Packed Graph Design Notes
> +=============================
> +
> +Git walks the commit graph for many reasons, including:
> +
> +1. Listing and filtering commit history.
> +2. Computing merge bases.
> +
> +These operations can become slow as the commit count grows above 100K.

How did you come up with that specific number? (Is it platform dependent?)
I'd avoid a specific number to not derail the reader here into wondering
how this got measured.

> +The merge base calculation shows up in many user-facing commands, such
> +as 'status' and 'fetch' and can take minutes to compute depending on
> +data shape. There are two main costs here:

status needs the walk for the ahead/behind computation which is (1)?
(I forget how status would need to compute a merge-base)

fetch is a networked command, which traditionally in Git is understood as
"can be slow" because you might be in Australia, or the connection is slow
otherwise. So giving this as an example it is not obvious that the DAG walking
is the bottleneck. Maybe git-merge or "git show --remerge-diff" [1] are better
examples for walk-intensive commands?

[1] https://public-inbox.org/git/cover.1409860234.git.tr@thomasrast.ch/
    never landed, so maybe that is a bad example. But for me that command
    is more obviously dependent on cheap walking the DAG compared to fetch.
    So, take my comments with a grain of salt!

> +1. Decompressing and parsing commits.
> +2. Walking the entire graph to avoid topological order mistakes.
> +
> +The packed graph is a file that stores the commit graph structure along
> +with some extra metadata to speed up graph walks. This format allows a
> +consumer to load the following info for a commit:
> +
> +1. The commit OID.
> +2. The list of parents.
> +3. The commit date.
> +4. The root tree OID.
> +5. An integer ID for fast lookups in the graph.
> +6. The generation number (see definition below).
> +
> +Values 1-4 satisfy the requirements of parse_commit_gently().


This new format is specifically removing the cost of decompression and parsing
(1) completely, whereas (2) we still have to walk the entire graph for now as
the generation numbers are not fully used as of yet, but provided.

> +By providing an integer ID we can avoid lookups in the graph as we walk
> +commits. Specifically, we need to provide the integer ID of the parent
> +commits so we navigate directly to their information on request.

Does this mean we decrease the pressure on fast lookups in
packfiles/loose objects?

> +Define the "generation number" of a commit recursively as follows:
> + * A commit with no parents (a root commit) has generation number 1.
> + * A commit with at least one parent has generation number 1 more than
> +   the largest generation number among its parents.
> +Equivalently, the generation number is one more than the length of a
> +longest path from the commit to a root commit. The recursive definition
> +is easier to use for computation and the following property:
> +
> +    If A and B are commits with generation numbers N and M, respectively,
> +    and N <= M, then A cannot reach B. That is, we know without searching
> +    that B is not an ancestor of A because it is further from a root commit
> +    than A.
> +
> +    Conversely, when checking if A is an ancestor of B, then we only need
> +    to walk commits until all commits on the walk boundary have generation
> +    number at most N. If we walk commits using a priority queue seeded by
> +    generation numbers, then we always expand the boundary commit with highest
> +    generation number and can easily detect the stopping condition.

Thanks for including the definition and potential benefits!

> +
> +This property can be used to significantly reduce the time it takes to
> +walk commits and determine topological relationships. Without generation
> +numbers, the general heuristic is the following:
> +
> +    If A and B are commits with commit time X and Y, respectively, and
> +    X < Y, then A _probably_ cannot reach B.
> +
> +This heuristic is currently used whenever the computation can make
> +mistakes with topological orders (such as "git log" with default order),
> +but is not used when the topological order is required (such as merge
> +base calculations, "git log --graph").
> +
> +Design Details
> +--------------
> +
> +- A graph file is stored in a file named 'graph-<oid>.graph' in the pack
> +  directory.

(guessing)
where every commit up to <oid> is included in the file.

> This could be stored in an alternate.

So I can 'borrow' not just the objects itself, but also the file about object
properties from an alternate. Makes sense, though I wonder if it is
worth stating
this as the first bullet point in the design detail section?

> +- The most-recent graph file OID is stored in a 'graph-head' file for
> +  immediate access and storing backup graphs. This could be stored in an
> +  alternate, and refers to a 'graph-<oid>.graph' file in the same pack
> +  directory.

So we have two copies of the latest in both 'graph-head.graph'
as well as 'graph-<oid>.graph'? Would a link be ok? So instead
we might have a 'graph-head.link' file that just contains 'oid: <oid>' ?

But then again, we can get the oid from HEAD, and these graph files
can contain all information up to multiple tips (i.e. there is just one graph
file needed, even if you have 2 branches, neither of them to be
fast-forwardable to the other).

So maybe I do not understand the <oid> after all, yet.

> +- The core.graph config setting must be on to create or consume graph files.

We already have other local files that speed up things, such as idx files.
Searching our config options for "idx" I find 'pack.indexVersion', which
only controls the write side of idx. Reading idx seems automatic if they are
there, otherwise we fallback to non-idx.

By having the setting a boolean we cannot control the version yet, but we can
later upgrade the config to a boolean-or-int that specifies which graph
version is written out.

Why do we need this config option to control the reading side as well?
AFAICT the user can just delete all graphs if the graphs make problems and
then regenerate them if git is too slow?

As these files need to be manually generated as of this series, I'd further
argue that the reading should be unguarded by the config option.

> +- The graph file is only a supplemental structure. If a user downgrades
> +  or disables the 'core.graph' config setting, then the existing ODB is
> +  sufficient.

(Maybe mention idx files as a comparable design?)

> +- The file format includes parameters for the object id length
> +  and hash algorithm, so a future change of hash algorithm does
> +  not require a change in format.

Cool.

> +
> +Current Limitations
> +-------------------
> +
> +- Only one graph file is used at one time. This allows the integer ID to
> +  seek into the single graph file. It is possible to extend the model
> +  for multiple graph files, but that is currently not part of the design.
> +
> +- .graph files are managed only by the 'graph' builtin. These are not
> +  updated automatically during clone or fetch.
> +
> +- There is no '--verify' option for the 'graph' builtin to verify the
> +  contents of the graph file.

c.f. git index-pack --verify (which is not documented :/)

> +- The graph only considers commits existing in packfiles and does not
> +  walk to fill in reachable commits. [Small]

small?

> +- When rewriting the graph, we do not check for a commit still existing
> +  in the ODB, so garbage collection may remove commits
> +
> +- Generation numbers are not computed in the current version. The file
> +  format supports storing them, along with a mechanism to upgrade from
> +  a file without generation numbers to one that uses them.

Maybe move the definition and explanation of the generation numbers above to
the "Future Work" section?


> +
> +Future Work
> +-----------
> +
> +- The file format includes room for precomputed generation numbers. These
> +  are not currently computed, so all generation numbers will be marked as
> +  0 (or "uncomputed"). A later patch will include this calculation.
> +
> +- The current implementation of the 'graph' builtin walks all packed objects
> +  to find a complete list of commits in packfiles. If some commits are
> +  stored as loose objects, then these do not appear in the graph. This is
> +  handled gracefully by the file format, but it would cause incorrect
> +  generation number calculations. We should implement the construct_graph()
> +  method in a way that walks all commits reachable from some starting set
> +  and then can use complete information for generation numbers. (Some
> +  care must be taken around shallow clones.)
> +
> +- The graph is not currently integrated with grafts.

not integrated? Does that mean we error out on grafts, or gracefully fallback
to a sane thing?

> +- After computing and storing generation numbers, we must make graph
> +  walks aware of generation numbers to gain performance benefits. This
> +  will mostly be accomplished by swapping a commit-date-ordered priority
> +  queue with one ordered by generation number. The following operations
> +  are important candidates:
> +
> +    - paint_down_to_common()
> +    - 'log --topo-order'
> +
> +- The graph currently only adds commits to a previously existing graph.
> +  When writing a new graph, we could check that the ODB still contains
> +  the commits and choose to remove the commits that are deleted from the
> +  ODB. For performance reasons, this check should remain optional.
> +
> +- Currently, parse_commit_gently() requires filling in the root tree
> +  object for a commit. This passes through lookup_tree() and consequently
> +  lookup_object(). Also, it calls lookup_commit() when loading the parents.
> +  These method calls check the ODB for object existence, even if the
> +  consumer does not need the content. For example, we do not need the
> +  tree contents when computing merge bases. Now that commit parsing is
> +  removed from the computation time, these lookup operations are the
> +  slowest operations keeping graph walks from being fast. Consider
> +  loading these objects without verifying their existence in the ODB and
> +  only loading them fully when consumers need them. Consider a method
> +  such as "ensure_tree_loaded(commit)" that fully loads a tree before
> +  using commit->tree.
> +
> +- The current design uses the 'graph' builtin to generate the graph. When
> +  this feature stabilizes enough to recommend to most users, we should
> +  add automatic graph writes to common operations that create many commits.
> +  For example, one coulde compute a graph on 'clone' and 'fetch' commands.

typo "could"

Once we have incrementally-update-able graphs, we certainly want commit/merge
to also write out updates? (or batch them after a long rebase).

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/14] packed-graph: add core.graph setting
  2018-01-25 14:02 ` [PATCH 02/14] packed-graph: add core.graph setting Derrick Stolee
@ 2018-01-25 20:17   ` Stefan Beller
  2018-01-25 20:40     ` Derrick Stolee
  2018-01-25 21:43   ` Junio C Hamano
  1 sibling, 1 reply; 49+ messages in thread
From: Stefan Beller @ 2018-01-25 20:17 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Junio C Hamano, Jeff King, Jeff Hostetler, Derrick Stolee

On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee <stolee@gmail.com> wrote:
> The packed graph feature is controlled by the new core.graph config
> setting. This defaults to 0, so the feature is opt-in.
>
> The intention of core.graph is that a user can always stop checking
> for or parsing packed graph files if core.graph=0.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/config.txt | 3 +++
>  cache.h                  | 1 +
>  config.c                 | 5 +++++
>  environment.c            | 1 +
>  4 files changed, 10 insertions(+)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 0e25b2c92b..e7b98fa14f 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -898,6 +898,9 @@ core.notesRef::
>  This setting defaults to "refs/notes/commits", and it can be overridden by
>  the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
>
> +core.graph::
> +       Enable git commit graph feature. Allows writing and reading from .graph files.
> +
>  core.sparseCheckout::
>         Enable "sparse checkout" feature. See section "Sparse checkout" in
>         linkgit:git-read-tree[1] for more information.
> diff --git a/cache.h b/cache.h
> index d8b975a571..655a81ac90 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -825,6 +825,7 @@ extern char *git_replace_ref_base;
>  extern int fsync_object_files;
>  extern int core_preload_index;
>  extern int core_apply_sparse_checkout;
> +extern int core_graph;

Putting it here instead of say the_repository makes sense as you'd want
to use this feature globally. However you can still have the config
different per repository  (e.g. version number of the graph setting,
as one might be optimized for speed and the other for file size of
the .graph file or such).

So not sure if we'd rather want to put this into the repository struct.
But then again the other core settings aren't there either and this
feature sounds like it is repository specific only in the experimental
phase; later it is expected to be on everywhere?

>  extern int precomposed_unicode;
>  extern int protect_hfs;
>  extern int protect_ntfs;
> diff --git a/config.c b/config.c
> index e617c2018d..fee90912d8 100644
> --- a/config.c
> +++ b/config.c
> @@ -1223,6 +1223,11 @@ static int git_default_core_config(const char *var, const char *value)
>                 return 0;
>         }
>
> +       if (!strcmp(var, "core.graph")) {
> +               core_graph = git_config_bool(var, value);
> +               return 0;
> +       }
> +
>         if (!strcmp(var, "core.sparsecheckout")) {
>                 core_apply_sparse_checkout = git_config_bool(var, value);
>                 return 0;
> diff --git a/environment.c b/environment.c
> index 63ac38a46f..0c56a3d869 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -61,6 +61,7 @@ enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
>  char *notes_ref_name;
>  int grafts_replace_parents = 1;
>  int core_apply_sparse_checkout;
> +int core_graph;
>  int merge_log_config = -1;
>  int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
>  unsigned long pack_size_limit_cfg;
> --
> 2.16.0
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/14] packed-graph: add core.graph setting
  2018-01-25 20:17   ` Stefan Beller
@ 2018-01-25 20:40     ` Derrick Stolee
  0 siblings, 0 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-25 20:40 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Junio C Hamano, Jeff King, Jeff Hostetler, Derrick Stolee

On 1/25/2018 3:17 PM, Stefan Beller wrote:
> On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee <stolee@gmail.com> wrote:
>> The packed graph feature is controlled by the new core.graph config
>> setting. This defaults to 0, so the feature is opt-in.
>>
>> The intention of core.graph is that a user can always stop checking
>> for or parsing packed graph files if core.graph=0.
>>
>> @@ -825,6 +825,7 @@ extern char *git_replace_ref_base;
>>   extern int fsync_object_files;
>>   extern int core_preload_index;
>>   extern int core_apply_sparse_checkout;
>> +extern int core_graph;
> Putting it here instead of say the_repository makes sense as you'd want
> to use this feature globally. However you can still have the config
> different per repository  (e.g. version number of the graph setting,
> as one might be optimized for speed and the other for file size of
> the .graph file or such).
>
> So not sure if we'd rather want to put this into the repository struct.
> But then again the other core settings aren't there either and this
> feature sounds like it is repository specific only in the experimental
> phase; later it is expected to be on everywhere?

I do think that more things should go in the repository struct. 
Unfortunately, that is not the world we live in.

However, to make things clearer I'm following the pattern currently in 
master. You'll see the same with the global 'packed_graph' pointer, 
similar to 'packed_git'. I think these should be paired together until 
the repository absorbs them.

If other 'core_*' variables move to the repository, I'm happy to move 
core_graph.
If 'packed_git' moves to the repository, I'm happy to move 'packed_git'.

However, if there is significant interest in moving all new state to the 
repository, then I'll move these values there. Let's have that 
discussion here instead of spread around the rest of the patch.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/14] graph: add packed graph design document
  2018-01-25 14:02 ` [PATCH 01/14] graph: add packed graph design document Derrick Stolee
  2018-01-25 20:04   ` Stefan Beller
@ 2018-01-25 21:14   ` Junio C Hamano
  2018-01-26 13:06     ` Derrick Stolee
  2018-01-26 14:13   ` Duy Nguyen
  2 siblings, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2018-01-25 21:14 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, peff, git, sbeller, dstolee

Derrick Stolee <stolee@gmail.com> writes:

> Add Documentation/technical/packed-graph.txt with details of the planned
> packed graph feature, including future plans.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/technical/packed-graph.txt | 185 +++++++++++++++++++++++++++++++
>  1 file changed, 185 insertions(+)
>  create mode 100644 Documentation/technical/packed-graph.txt

I really wanted to like having this patch at the beginning, but
unfortunatelly I didn't see the actual file format description,
which was a bit disappointing.  An example of the things that I was
curious about was how the "integer ID" is used to access into the
file.  If we could somehow use "integer ID" as an index into an
array of fixed size elements, it would be ideal to gain "fast
lookups", but because of the "list of parents" thing, it needs some
trickery to do so, and that was among the things that I wanted to
see how much thought went into the design, for example.

> diff --git a/Documentation/technical/packed-graph.txt b/Documentation/technical/packed-graph.txt
> new file mode 100644
> index 0000000000..fcc0c83874
> --- /dev/null
> +++ b/Documentation/technical/packed-graph.txt
> @@ -0,0 +1,185 @@
> +Git Packed Graph Design Notes
> +=============================
> +
> +Git walks the commit graph for many reasons, including:
> +
> +1. Listing and filtering commit history.
> +2. Computing merge bases.
> +
> +These operations can become slow as the commit count grows above 100K.
> +The merge base calculation shows up in many user-facing commands, such
> +as 'status' and 'fetch' and can take minutes to compute depending on
> +data shape. There are two main costs here:

s/data shape/history shape/ may make it even clearer.

> +1. The commit OID.
> +2. The list of parents.
> +3. The commit date.
> +4. The root tree OID.
> +5. An integer ID for fast lookups in the graph.
> +6. The generation number (see definition below).
> +
> +Values 1-4 satisfy the requirements of parse_commit_gently().
> +
> +By providing an integer ID we can avoid lookups in the graph as we walk
> +commits. Specifically, we need to provide the integer ID of the parent
> +commits so we navigate directly to their information on request.

Commits created after a packed graph file is built may of course not
appear in a packed graph file, but that is OK because they never need
to be listed as parents of commits in the file.  So "list of parents"
can always refer to the parents using the "integer ID for fast lookup".

Makes sense.  Item 2. might want to say "The list of parents, using
the fast lookup integer ID (see 5.) as reference instead of OID",
though.

> +Define the "generation number" of a commit recursively as follows:
> + * A commit with no parents (a root commit) has generation number 1.
> + * A commit with at least one parent has generation number 1 more than
> +   the largest generation number among its parents.
> +Equivalently, the generation number is one more than the length of a
> +longest path from the commit to a root commit.

When a commit A can reach roots X and Y, and Y is further than X,
the distance between Y and A becomes A's generation number.  "One
more than the length of the path from the commit to the furthest
root commit it can reach", in other words.

> +The recursive definition
> +is easier to use for computation and the following property:
> +
> +    If A and B are commits with generation numbers N and M, respectively,
> +    and N <= M, then A cannot reach B. That is, we know without searching
> +    that B is not an ancestor of A because it is further from a root commit
> +    than A.
> +
> +    Conversely, when checking if A is an ancestor of B, then we only need
> +    to walk commits until all commits on the walk boundary have generation
> +    number at most N. If we walk commits using a priority queue seeded by
> +    generation numbers, then we always expand the boundary commit with highest
> +    generation number and can easily detect the stopping condition.

These are both true.  It would be nice if an optimized walker
algorithm can also deal with history with recent commits for which
we do not yet know the generation numbers (i.e. you first traverse
and assign generation numbers and record in packed graph, then
history grows but we haven't added the new commits to the packed
graph yet).

> +- A graph file is stored in a file named 'graph-<oid>.graph' in the pack
> +  directory. This could be stored in an alternate.

Is that <oid> really an object name?  The <hash> that appears in the
name of a packfile pack-<hash>.pack is *not* an <oid>, and I somehow
suspect that you are doing a similar "use hash of (some) contents to
make it uniquely identify the content", not "information about a
single object that is identified by the <oid>".

> +- The graph file is only a supplemental structure. If a user downgrades
> +  or disables the 'core.graph' config setting, then the existing ODB is
> +  sufficient.

OK, that is exactly what I meant to say in a few paragraphs above
that I wanted to see.

> +Current Limitations
> +-------------------
> +
> +- Only one graph file is used at one time. This allows the integer ID to
> +  seek into the single graph file. It is possible to extend the model
> +  for multiple graph files, but that is currently not part of the design.
> +
> +- .graph files are managed only by the 'graph' builtin. These are not
> +  updated automatically during clone or fetch.

In addition to "clone or fetch", I presume operations that locally
create commits do not automatically create them, right?

> +- After computing and storing generation numbers, we must make graph
> +  walks aware of generation numbers to gain performance benefits. This
> +  will mostly be accomplished by swapping a commit-date-ordered priority
> +  queue with one ordered by generation number. The following operations
> +  are important candidates:
> +
> +    - paint_down_to_common()
> +    - 'log --topo-order'

Do you mean that this round only writes "graph" without any actualy
consumers?  It is somewhat hard to assess the value of what is
stored in the new file without the consumers.

Anyway, thanks for starting this.  Let's read on.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/14] packed-graph: add core.graph setting
  2018-01-25 14:02 ` [PATCH 02/14] packed-graph: add core.graph setting Derrick Stolee
  2018-01-25 20:17   ` Stefan Beller
@ 2018-01-25 21:43   ` Junio C Hamano
  2018-01-26 13:08     ` Derrick Stolee
  1 sibling, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2018-01-25 21:43 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, peff, git, sbeller, dstolee

Derrick Stolee <stolee@gmail.com> writes:

> The packed graph feature is controlled by the new core.graph config
> setting. This defaults to 0, so the feature is opt-in.
>
> The intention of core.graph is that a user can always stop checking
> for or parsing packed graph files if core.graph=0.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/config.txt | 3 +++
>  cache.h                  | 1 +
>  config.c                 | 5 +++++
>  environment.c            | 1 +
>  4 files changed, 10 insertions(+)

Before you get too married to the name "graph", is it reasonable to
assume that the commit ancestry graph is the primary "graph" that
should come to users' minds when a simple word "graph" is used in
the context of discussing Git?  I suspect not.

Let's not just call this "core.graph" and "packed-graph", and in
addition give some adjective before "graph".




> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 0e25b2c92b..e7b98fa14f 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -898,6 +898,9 @@ core.notesRef::
>  This setting defaults to "refs/notes/commits", and it can be overridden by
>  the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
>  
> +core.graph::
> +	Enable git commit graph feature. Allows writing and reading from .graph files.
> +
>  core.sparseCheckout::
>  	Enable "sparse checkout" feature. See section "Sparse checkout" in
>  	linkgit:git-read-tree[1] for more information.
> diff --git a/cache.h b/cache.h
> index d8b975a571..655a81ac90 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -825,6 +825,7 @@ extern char *git_replace_ref_base;
>  extern int fsync_object_files;
>  extern int core_preload_index;
>  extern int core_apply_sparse_checkout;
> +extern int core_graph;
>  extern int precomposed_unicode;
>  extern int protect_hfs;
>  extern int protect_ntfs;
> diff --git a/config.c b/config.c
> index e617c2018d..fee90912d8 100644
> --- a/config.c
> +++ b/config.c
> @@ -1223,6 +1223,11 @@ static int git_default_core_config(const char *var, const char *value)
>  		return 0;
>  	}
>  
> +	if (!strcmp(var, "core.graph")) {
> +		core_graph = git_config_bool(var, value);
> +		return 0;
> +	}
> +
>  	if (!strcmp(var, "core.sparsecheckout")) {
>  		core_apply_sparse_checkout = git_config_bool(var, value);
>  		return 0;
> diff --git a/environment.c b/environment.c
> index 63ac38a46f..0c56a3d869 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -61,6 +61,7 @@ enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
>  char *notes_ref_name;
>  int grafts_replace_parents = 1;
>  int core_apply_sparse_checkout;
> +int core_graph;
>  int merge_log_config = -1;
>  int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
>  unsigned long pack_size_limit_cfg;

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 03/14] packed-graph: create git-graph builtin
  2018-01-25 14:02 ` [PATCH 03/14] packed-graph: create git-graph builtin Derrick Stolee
@ 2018-01-25 21:45   ` Stefan Beller
  2018-01-26 13:13     ` Derrick Stolee
  2018-01-25 23:01   ` Junio C Hamano
  1 sibling, 1 reply; 49+ messages in thread
From: Stefan Beller @ 2018-01-25 21:45 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Junio C Hamano, Jeff King, Jeff Hostetler, Derrick Stolee

On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee <stolee@gmail.com> wrote:
> Teach Git the 'graph' builtin that will be used for writing and
> reading packed graph files. The current implementation is mostly
> empty, except for a check that the core.graph setting is enabled
> and a '--pack-dir' option.

I wonder if this builtin should not respect the boolean core graph,
as this new builtin commands' whole existence
is to deal with these new files?

As you assume this builtin as a plumbing command, I would
expect it to pay less attention to config rather than more.


> @@ -408,6 +408,7 @@ static struct cmd_struct commands[] = {
>         { "fsck-objects", cmd_fsck, RUN_SETUP },
>         { "gc", cmd_gc, RUN_SETUP },
>         { "get-tar-commit-id", cmd_get_tar_commit_id },
> +       { "graph", cmd_graph, RUN_SETUP_GENTLY },

Why gently, though?

From reading the docs (and assumptions on further patches)
we'd want to abort if there is no .git dir to be found?

Or is a future patch having manual logic? (e.g. if pack-dir is
given, the command may be invoked from outside a git dir)

Stefan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 04/14] packed-graph: add format document
  2018-01-25 14:02 ` [PATCH 04/14] packed-graph: add format document Derrick Stolee
@ 2018-01-25 22:06   ` Junio C Hamano
  2018-01-25 22:18     ` Stefan Beller
  2018-01-25 22:07   ` Stefan Beller
  1 sibling, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2018-01-25 22:06 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, peff, git, sbeller, dstolee

Derrick Stolee <stolee@gmail.com> writes:

> Add document specifying the binary format for packed graphs. This
> format allows for:
>
> * New versions.
> * New hash functions and hash lengths.
> * Optional extensions.
>
> Basic header information is followed by a binary table of contents
> into "chunks" that include:
>
> * An ordered list of commit object IDs.
> * A 256-entry fanout into that list of OIDs.
> * A list of metadata for the commits.
> * A list of "large edges" to enable octopus merges.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/technical/graph-format.txt | 88 ++++++++++++++++++++++++++++++++
>  1 file changed, 88 insertions(+)
>  create mode 100644 Documentation/technical/graph-format.txt
>
> diff --git a/Documentation/technical/graph-format.txt b/Documentation/technical/graph-format.txt
> new file mode 100644
> index 0000000000..a15e1036d7
> --- /dev/null
> +++ b/Documentation/technical/graph-format.txt
> @@ -0,0 +1,88 @@
> +Git commit graph format
> +=======================

Good that this is not saying "graph format" but is explicit that it
is about "commit".  Do the same for the previous steps.  Especially,
builtin/graph.c that does not have much to do with graph.c is not a
good way forward ;-)

I do like the fact that later parents of octopus merges are moved
out of way to make the majority of records fixed length, but I am
not sure if the "up to two parents are recorded in line" is truly
the best arrangement.  Aren't majority of commits single-parent,
thereby wasting 4 bytes almost always?

Will 32-bit stay to be enough for everybody?  Wouldn't it make sense
to at least define them to be indices into arrays (i.e. scaled to
element size), not "offsets", to recover a few lost bits?

What's the point of storing object id length?  If you do not
understand the object ID scheme, knowing only the length would not
do you much good anyway, no?  And if you know the hashing scheme
specified by Object ID version, you already know the length, no?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 04/14] packed-graph: add format document
  2018-01-25 14:02 ` [PATCH 04/14] packed-graph: add format document Derrick Stolee
  2018-01-25 22:06   ` Junio C Hamano
@ 2018-01-25 22:07   ` Stefan Beller
  2018-01-26 13:25     ` Derrick Stolee
  1 sibling, 1 reply; 49+ messages in thread
From: Stefan Beller @ 2018-01-25 22:07 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Junio C Hamano, Jeff King, Jeff Hostetler, Derrick Stolee

On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee <stolee@gmail.com> wrote:
> Add document specifying the binary format for packed graphs. This
> format allows for:
>
> * New versions.
> * New hash functions and hash lengths.
> * Optional extensions.
>
> Basic header information is followed by a binary table of contents
> into "chunks" that include:
>
> * An ordered list of commit object IDs.
> * A 256-entry fanout into that list of OIDs.
> * A list of metadata for the commits.
> * A list of "large edges" to enable octopus merges.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/technical/graph-format.txt | 88 ++++++++++++++++++++++++++++++++

So this is different from Documentation/technical/packed-graph.txt,
which gives high level design and this gives the details on how
to set bits.

>  1 file changed, 88 insertions(+)
>  create mode 100644 Documentation/technical/graph-format.txt
>
> diff --git a/Documentation/technical/graph-format.txt b/Documentation/technical/graph-format.txt
> new file mode 100644
> index 0000000000..a15e1036d7
> --- /dev/null
> +++ b/Documentation/technical/graph-format.txt
> @@ -0,0 +1,88 @@
> +Git commit graph format
> +=======================
> +
> +The Git commit graph stores a list of commit OIDs and some associated
> +metadata, including:
> +
> +- The generation number of the commit. Commits with no parents have
> +  generation number 1; commits with parents have generation number
> +  one more than the maximum generation number of its parents. We
> +  reserve zero as special, and can be used to mark a generation
> +  number invalid or as "not computed".
> +
> +- The root tree OID.
> +
> +- The commit date.
> +
> +- The parents of the commit, stored using positional references within
> +  the graph file.
> +
> +== graph-*.graph files have the following format:
> +
> +In order to allow extensions that add extra data to the graph, we organize
> +the body into "chunks" and provide a binary lookup table at the beginning
> +of the body. The header includes certain values, such as number of chunks,
> +hash lengths and types.
> +
> +All 4-byte numbers are in network order.
> +
> +HEADER:
> +
> +       4-byte signature:
> +           The signature is: {'C', 'G', 'P', 'H'}
> +
> +       1-byte version number:
> +           Currently, the only valid version is 1.
> +
> +       1-byte Object Id Version (1 = SHA-1)
> +
> +       1-byte Object Id Length (H)

  This is 20 or 40 for sha1 ? (binary or text representation?)

> +       1-byte number (C) of "chunks"
> +
> +CHUNK LOOKUP:
> +
> +       (C + 1) * 12 bytes listing the table of contents for the chunks:
> +           First 4 bytes describe chunk id. Value 0 is a terminating label.
> +           Other 8 bytes provide offset in current file for chunk to start.

... offset [in bytes/words/4k blocks?] in ...


> +           (Chunks are ordered contiguously in the file, so you can infer
> +               the length using the next chunk position if necessary.)
> +
> +       The remaining data in the body is described one chunk at a time, and
> +       these chunks may be given in any order. Chunks are required unless
> +       otherwise specified.

> +
> +CHUNK DATA:
> +
> +       OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
> +           The ith entry, F[i], stores the number of OIDs with first
> +           byte at most i. Thus F[255] stores the total
> +           number of commits (N).

So F[0] > 0 for git.git for example.

Or another way: To lookup a 01xxx, I need to look at
entry(F[00] + 1 )...entry(F[01]).

Makes sense.

> +
> +       OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
> +           The OIDs for all commits in the graph.

... sorted ascending.


> +       Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
> +           * The first H bytes are for the OID of the root tree.
> +           * The next 8 bytes are for the int-ids of the first two parents of
> +             the ith commit. Stores value 0xffffffff if no parent in that position.
> +             If there are more than two parents, the second value has its most-
> +             significant bit on and the other bits store an offset into the Large
> +             Edge List chunk.

s/an offset into/position in/ ? (otherwise offset in bytes?)

> +           * The next 8 bytes store the generation number of the commit and the
> +             commit time in seconds since EPOCH. The generation number uses the
> +             higher 30 bits of the first 4 bytes, while the commit time uses the
> +             32 bits of the second 4 bytes, along with the lowest 2 bits of the
> +             lowest byte, storing the 33rd and 34th bit of the commit time.

This allows for a maximum generation number of
1.073.741.823 (2^30 -1) = 1 billion,
and a max time stamp of later than 2100.

Do you allow negative time stamps?


> +
> +       [Optional] Large Edge List (ID: {'E', 'D', 'G', 'E'})
> +           This list of 4-byte values store the second through nth parents for
> +           all octoput merges. The second parent value in the commit data is a

octopus

> +           negative number pointing into this list. Then iterate through this
> +           list starting at that position until reaching a value with the most-
> +           significant bit on. The other bits correspond to the int-id of the
> +           last parent.
> +
> +TRAILER:
> +
> +       H-byte HASH-checksum of all of the above.
> --
> 2.16.0
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 04/14] packed-graph: add format document
  2018-01-25 22:06   ` Junio C Hamano
@ 2018-01-25 22:18     ` Stefan Beller
  2018-01-25 22:29       ` Junio C Hamano
  0 siblings, 1 reply; 49+ messages in thread
From: Stefan Beller @ 2018-01-25 22:18 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, git, Jeff King, Jeff Hostetler, Derrick Stolee

On Thu, Jan 25, 2018 at 2:06 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>> Add document specifying the binary format for packed graphs. This
>> format allows for:
>>
>> * New versions.
>> * New hash functions and hash lengths.
>> * Optional extensions.
>>
>> Basic header information is followed by a binary table of contents
>> into "chunks" that include:
>>
>> * An ordered list of commit object IDs.
>> * A 256-entry fanout into that list of OIDs.
>> * A list of metadata for the commits.
>> * A list of "large edges" to enable octopus merges.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>  Documentation/technical/graph-format.txt | 88 ++++++++++++++++++++++++++++++++
>>  1 file changed, 88 insertions(+)
>>  create mode 100644 Documentation/technical/graph-format.txt
>>
>> diff --git a/Documentation/technical/graph-format.txt b/Documentation/technical/graph-format.txt
>> new file mode 100644
>> index 0000000000..a15e1036d7
>> --- /dev/null
>> +++ b/Documentation/technical/graph-format.txt
>> @@ -0,0 +1,88 @@
>> +Git commit graph format
>> +=======================
>
> Good that this is not saying "graph format" but is explicit that it
> is about "commit".  Do the same for the previous steps.  Especially,
> builtin/graph.c that does not have much to do with graph.c is not a
> good way forward ;-)
>
> I do like the fact that later parents of octopus merges are moved
> out of way to make the majority of records fixed length, but I am
> not sure if the "up to two parents are recorded in line" is truly
> the best arrangement.  Aren't majority of commits single-parent,
> thereby wasting 4 bytes almost always?

git.git has ~37k non-merge commits and ~12k merge commits,
(35 of them have 3 or more parents).

So 75% would waste the 4 bytes of the second parent.

However the first parent is still there, so any operation that only needs
the first parent (git bisect --first-parent?) would still be fast.
Not sure if that is common.

The downside of just having one parent or pointer into the edge list
would be to penalize 25% of the commit lookups with an indirection
compared to ~0% (the 35 octopus'). I'd rather want to optimize for
speed than disk size? (4 bytes for 37k is 145kB for git.git, which I
find is not a lot).

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 04/14] packed-graph: add format document
  2018-01-25 22:18     ` Stefan Beller
@ 2018-01-25 22:29       ` Junio C Hamano
  2018-01-26 13:22         ` Derrick Stolee
  0 siblings, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2018-01-25 22:29 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Derrick Stolee, git, Jeff King, Jeff Hostetler, Derrick Stolee

Stefan Beller <sbeller@google.com> writes:

> The downside of just having one parent or pointer into the edge list
> would be to penalize 25% of the commit lookups with an indirection
> compared to ~0% (the 35 octopus'). I'd rather want to optimize for
> speed than disk size? (4 bytes for 37k is 145kB for git.git, which I
> find is not a lot).

My comment is not about disk size but is about the size of working
set (or "size of array element").

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 03/14] packed-graph: create git-graph builtin
  2018-01-25 14:02 ` [PATCH 03/14] packed-graph: create git-graph builtin Derrick Stolee
  2018-01-25 21:45   ` Stefan Beller
@ 2018-01-25 23:01   ` Junio C Hamano
  2018-01-26 13:14     ` Derrick Stolee
  1 sibling, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2018-01-25 23:01 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, peff, git, sbeller, dstolee

Derrick Stolee <stolee@gmail.com> writes:

> Teach Git the 'graph' builtin that will be used for writing and
> reading packed graph files. The current implementation is mostly
> empty, except for a check that the core.graph setting is enabled
> and a '--pack-dir' option.

Just to set my expectation straight.

Is it fair to say that in the ideal endgame state, this will be like
"git pack-objects" in that end users won't have to know about it,
but would serve as a crucial building block that is invoked by other
front-end commands that are more familiar to end users (just like
pack-objects are used behind the scenes by repack, push, etc.)?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/14] Serialized Commit Graph
  2018-01-25 16:09   ` Derrick Stolee
@ 2018-01-25 23:06     ` Ævar Arnfjörð Bjarmason
  2018-01-26 12:15       ` Derrick Stolee
  0 siblings, 1 reply; 49+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-25 23:06 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, gitster, peff, git, sbeller, dstolee


On Thu, Jan 25 2018, Derrick Stolee jotted:

> On 1/25/2018 10:46 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Thu, Jan 25 2018, Derrick Stolee jotted:
>>
>>> * 'git log --topo-order -1000' walks all reachable commits to avoid
>>>    incorrect topological orders, but only needs the commit message for
>>>    the top 1000 commits.
>>>
>>> * 'git merge-base <A> <B>' may walk many commits to find the correct
>>>    boundary between the commits reachable from A and those reachable
>>>    from B. No commit messages are needed.
>>>
>>> * 'git branch -vv' checks ahead/behind status for all local branches
>>>    compared to their upstream remote branches. This is essentially as
>>>    hard as computing merge bases for each.
>> This is great, spotted / questions so far:
>>
>> * git graph --blah says you need to enable the config, should say
>>    "unknown option --blah <help>". I.e. overzelous config guard.
>
> This is a good point.
>
>> * On a big repo (git show-ref -s | ~/g/git/git-graph --write
>>    --update-head) is as of writing this still hanging for me, but strace
>>    shows it's brk()-ing. Presumably just still busy, a progress bar would
>>    be very nice.
>
> Oops! This is my mistake. The correct command should be:
>
>  git show-ref -s | git graph --write --update-head --stdin-commits
>
> Without "--stdin-commits" the command will walk all packed objects
> to look for commits and then build the graph. That's why it's taking
> so long. That method takes several minutes on the Linux repo, but with
> --stdin-commits it should take as long as "git log >/dev/null".

Thanks, it took around 15m to finish with the command I initially ran on
my test repo.

Then the `merge-base --is-ancestor` performance problem I was
complaining about in
https://public-inbox.org/git/87608bawoa.fsf@evledraar.gmail.com/ takes
around 1s with your series, 5s without it. Nice.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/14] packed-graph: implement construct_graph()
  2018-01-25 14:02 ` [PATCH 05/14] packed-graph: implement construct_graph() Derrick Stolee
@ 2018-01-25 23:21   ` Stefan Beller
  2018-01-26 20:47     ` Junio C Hamano
  2018-01-26 20:55   ` Junio C Hamano
  1 sibling, 1 reply; 49+ messages in thread
From: Stefan Beller @ 2018-01-25 23:21 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Junio C Hamano, Jeff King, Jeff Hostetler, Derrick Stolee

On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee <stolee@gmail.com> wrote:

> +struct object_id *construct_graph(const char *pack_dir)
> +{
> +       // Find a list of oids, adding the pointer to a list.

Comment style.
Here is where the bike shedding begins. ;)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/14] packed-graph: implement git-graph --write
  2018-01-25 14:02 ` [PATCH 06/14] packed-graph: implement git-graph --write Derrick Stolee
@ 2018-01-25 23:28   ` Stefan Beller
  2018-01-26 13:28     ` Derrick Stolee
  0 siblings, 1 reply; 49+ messages in thread
From: Stefan Beller @ 2018-01-25 23:28 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Junio C Hamano, Jeff King, Jeff Hostetler, Derrick Stolee

On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee <stolee@gmail.com> wrote:

> +------------------------------------------------
> +$ git midx --write

midx?

> +test_done

The tests basically tests that there is no segfault?
Makes sense.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 09/14] packed-graph: implement git-graph --clear
  2018-01-25 14:02 ` [PATCH 09/14] packed-graph: implement git-graph --clear Derrick Stolee
@ 2018-01-25 23:35   ` Stefan Beller
  0 siblings, 0 replies; 49+ messages in thread
From: Stefan Beller @ 2018-01-25 23:35 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Junio C Hamano, Jeff King, Jeff Hostetler, Derrick Stolee

On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee <stolee@gmail.com> wrote:
> Teach Git to delete the current 'graph_head' file and the packed graph
> it references. This is a good safety valve if somehow the file is
> corrupted and needs to be recalculated. Since the packed graph is a
> summary of contents already in the ODB, it can be regenerated.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>


>  static int graph_read(void)
>  {
>         struct object_id graph_oid;
> @@ -105,6 +130,8 @@ int cmd_graph(int argc, const char **argv, const char *prefix)
>                 { OPTION_STRING, 'p', "pack-dir", &opts.pack_dir,
>                         N_("dir"),
>                         N_("The pack directory to store the graph") },
> +               OPT_BOOL('c', "clear", &opts.clear,
> +                       N_("clear graph file and graph-head")),

Given the docs building up a large list of "Cannot be combined with",
maybe these OPT_BOOLS want to be OPT_CMDMODE ?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/14] Serialized Commit Graph
  2018-01-25 23:06     ` Ævar Arnfjörð Bjarmason
@ 2018-01-26 12:15       ` Derrick Stolee
  0 siblings, 0 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-26 12:15 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, gitster, peff, git, sbeller, dstolee

On 1/25/2018 6:06 PM, Ævar Arnfjörð Bjarmason wrote:
> On Thu, Jan 25 2018, Derrick Stolee jotted:
>> Oops! This is my mistake. The correct command should be:
>>
>>   git show-ref -s | git graph --write --update-head --stdin-commits
>>
>> Without "--stdin-commits" the command will walk all packed objects
>> to look for commits and then build the graph. That's why it's taking
>> so long. That method takes several minutes on the Linux repo, but with
>> --stdin-commits it should take as long as "git log >/dev/null".
> Thanks, it took around 15m to finish with the command I initially ran on
> my test repo.
>
> Then the `merge-base --is-ancestor` performance problem I was
> complaining about in
> https://public-inbox.org/git/87608bawoa.fsf@evledraar.gmail.com/ takes
> around 1s with your series, 5s without it. Nice.

Thanks for testing this! May I ask how many commits are in your repo? 
One way to find out is to run 'git graph --read' and it will tell you 
how many commits are in the serialized graph.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/14] graph: add packed graph design document
  2018-01-25 20:04   ` Stefan Beller
@ 2018-01-26 12:49     ` Derrick Stolee
  2018-01-26 18:17       ` Stefan Beller
  0 siblings, 1 reply; 49+ messages in thread
From: Derrick Stolee @ 2018-01-26 12:49 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Junio C Hamano, Jeff King, Jeff Hostetler, Derrick Stolee

On 1/25/2018 3:04 PM, Stefan Beller wrote:
> On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee <stolee@gmail.com> wrote:
>> Add Documentation/technical/packed-graph.txt with details of the planned
>> packed graph feature, including future plans.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>   Documentation/technical/packed-graph.txt | 185 +++++++++++++++++++++++++++++++
>>   1 file changed, 185 insertions(+)
>>   create mode 100644 Documentation/technical/packed-graph.txt
>>
>> diff --git a/Documentation/technical/packed-graph.txt b/Documentation/technical/packed-graph.txt
>> new file mode 100644
>> index 0000000000..fcc0c83874
>> --- /dev/null
>> +++ b/Documentation/technical/packed-graph.txt
>> @@ -0,0 +1,185 @@
>> +Git Packed Graph Design Notes
>> +=============================
>> +
>> +Git walks the commit graph for many reasons, including:
>> +
>> +1. Listing and filtering commit history.
>> +2. Computing merge bases.
>> +
>> +These operations can become slow as the commit count grows above 100K.
> How did you come up with that specific number? (Is it platform dependent?)
> I'd avoid a specific number to not derail the reader here into wondering
> how this got measured.

Using a specific number was a mistake. Git can walk ~100K commits per 
second by parsing commits, in my tests on my machine. I'll instead say 
"commit count grows."

>> +The merge base calculation shows up in many user-facing commands, such
>> +as 'status' and 'fetch' and can take minutes to compute depending on
>> +data shape. There are two main costs here:
> status needs the walk for the ahead/behind computation which is (1)?
> (I forget how status would need to compute a merge-base)

'status' computes the ahead/behind counts using paint_down_to_common(). 
This is a more robust method than simply computing merge bases, but the 
possible merge bases are found as a result.

> fetch is a networked command, which traditionally in Git is understood as
> "can be slow" because you might be in Australia, or the connection is slow
> otherwise. So giving this as an example it is not obvious that the DAG walking
> is the bottleneck. Maybe git-merge or "git show --remerge-diff" [1] are better
> examples for walk-intensive commands?
>
> [1] https://public-inbox.org/git/cover.1409860234.git.tr@thomasrast.ch/
>      never landed, so maybe that is a bad example. But for me that command
>      is more obviously dependent on cheap walking the DAG compared to fetch.
>      So, take my comments with a grain of salt!

Actually, a 'fetch' command does the same ahead/behind calculation as 
'status', and in GVFS repos we have seen that walk take 30s per branch 
when comparing local and remote copies a fast-moving branch. Yes, there 
are other (usually) more expensive things in 'fetch' so I'll drop that 
reference..

>> +1. Decompressing and parsing commits.
>> +2. Walking the entire graph to avoid topological order mistakes.
>> +
>> +The packed graph is a file that stores the commit graph structure along
>> +with some extra metadata to speed up graph walks. This format allows a
>> +consumer to load the following info for a commit:
>> +
>> +1. The commit OID.
>> +2. The list of parents.
>> +3. The commit date.
>> +4. The root tree OID.
>> +5. An integer ID for fast lookups in the graph.
>> +6. The generation number (see definition below).
>> +
>> +Values 1-4 satisfy the requirements of parse_commit_gently().
>
> This new format is specifically removing the cost of decompression and parsing
> (1) completely, whereas (2) we still have to walk the entire graph for now as
> the generation numbers are not fully used as of yet, but provided.

A major goal of this work is to provide a place to store computed 
generation numbers so we can not walk the entire graph. I mention this 
here because 'git log -<n>' is O(n) (due to commit-date heuristics that 
prevent walking the entire graph) while 'git log --topo-order -<n>' is 
O(T) where T is the total number of reachable commits.

>> +By providing an integer ID we can avoid lookups in the graph as we walk
>> +commits. Specifically, we need to provide the integer ID of the parent
>> +commits so we navigate directly to their information on request.
> Does this mean we decrease the pressure on fast lookups in
> packfiles/loose objects?

Yes, we do. In fact, when profiling 'git log --topo-order -1000', I 
noticed that 30-50% of the time (after this patch) is spent in 
lookup_tree(). If we can prevent checking the ODB for the existence of 
these trees until they are needed, we can get additional speedups. It is 
a bit wasteful that we are loading these trees even when we will never 
use them (such as computing merge bases).

>
>> +Define the "generation number" of a commit recursively as follows:
>> + * A commit with no parents (a root commit) has generation number 1.
>> + * A commit with at least one parent has generation number 1 more than
>> +   the largest generation number among its parents.
>> +Equivalently, the generation number is one more than the length of a
>> +longest path from the commit to a root commit. The recursive definition
>> +is easier to use for computation and the following property:
>> +
>> +    If A and B are commits with generation numbers N and M, respectively,
>> +    and N <= M, then A cannot reach B. That is, we know without searching
>> +    that B is not an ancestor of A because it is further from a root commit
>> +    than A.
>> +
>> +    Conversely, when checking if A is an ancestor of B, then we only need
>> +    to walk commits until all commits on the walk boundary have generation
>> +    number at most N. If we walk commits using a priority queue seeded by
>> +    generation numbers, then we always expand the boundary commit with highest
>> +    generation number and can easily detect the stopping condition.
> Thanks for including the definition and potential benefits!
>
>> +
>> +This property can be used to significantly reduce the time it takes to
>> +walk commits and determine topological relationships. Without generation
>> +numbers, the general heuristic is the following:
>> +
>> +    If A and B are commits with commit time X and Y, respectively, and
>> +    X < Y, then A _probably_ cannot reach B.
>> +
>> +This heuristic is currently used whenever the computation can make
>> +mistakes with topological orders (such as "git log" with default order),
>> +but is not used when the topological order is required (such as merge
>> +base calculations, "git log --graph").
>> +
>> +Design Details
>> +--------------
>> +
>> +- A graph file is stored in a file named 'graph-<oid>.graph' in the pack
>> +  directory.
> (guessing)
> where every commit up to <oid> is included in the file.

Sorry, the <oid> is the hash of the graph contents (up to its trailing 
bytes that contain <oid> in binary).

>> This could be stored in an alternate.
> So I can 'borrow' not just the objects itself, but also the file about object
> properties from an alternate. Makes sense, though I wonder if it is
> worth stating
> this as the first bullet point in the design detail section?

I could bring it up earlier. For GVFS, we plan to place the graph next 
to the packfiles in the "gitObjectCache" which is an alternate for all 
enlistements in the same drive.

>
>> +- The most-recent graph file OID is stored in a 'graph-head' file for
>> +  immediate access and storing backup graphs. This could be stored in an
>> +  alternate, and refers to a 'graph-<oid>.graph' file in the same pack
>> +  directory.
> So we have two copies of the latest in both 'graph-head.graph'
> as well as 'graph-<oid>.graph'? Would a link be ok? So instead
> we might have a 'graph-head.link' file that just contains 'oid: <oid>' ?
>
> But then again, we can get the oid from HEAD, and these graph files
> can contain all information up to multiple tips (i.e. there is just one graph
> file needed, even if you have 2 branches, neither of them to be
> fast-forwardable to the other).
>
> So maybe I do not understand the <oid> after all, yet.

Perhaps I should instead use <hash> instead of <oid>, since it is not an 
actual Git object but instead the hash of the graph contents. The 
'graph_head' file simply stores the <oid> as plain-text hex characters.

When updating the graph using 'git graph --write --update-head 
--delete-expired', we follow this pattern:

1. write new graph-<hash2>.graph file
2. write graph_head to contain <hash2>
3. delete old graph-<hash1>.graph file.

>> +- The core.graph config setting must be on to create or consume graph files.
> We already have other local files that speed up things, such as idx files.
> Searching our config options for "idx" I find 'pack.indexVersion', which
> only controls the write side of idx. Reading idx seems automatic if they are
> there, otherwise we fallback to non-idx.
>
> By having the setting a boolean we cannot control the version yet, but we can
> later upgrade the config to a boolean-or-int that specifies which graph
> version is written out.
>
> Why do we need this config option to control the reading side as well?
> AFAICT the user can just delete all graphs if the graphs make problems and
> then regenerate them if git is too slow?
>
> As these files need to be manually generated as of this series, I'd further
> argue that the reading should be unguarded by the config option.

I see this config option as temporary as the feature is hardened. In the 
(hopefully unlikely) case of a bug, I'd rather unblock a user by giving 
them a config setting to disable instead of telling them to downgrade. 
As the feature is more robust and covered by other test cases (such as 
when the graph is automatically generated during repack or after a 
fetch) then we can afford to drop this "kill switch".

>> +- The graph file is only a supplemental structure. If a user downgrades
>> +  or disables the 'core.graph' config setting, then the existing ODB is
>> +  sufficient.
> (Maybe mention idx files as a comparable design?)

I was actually thinking this is more comparable to the reachability 
bitmaps. The only difference is that the graph is not paired to a 
packfile. While dropping an IDX file is recoverable through 'git 
index-pack', the contents of that packfile are not available until you 
do so. If you run 'git graph --clear' and delete your graph file, all 
git commands will work as before.

>> +- The file format includes parameters for the object id length
>> +  and hash algorithm, so a future change of hash algorithm does
>> +  not require a change in format.
> Cool.
>
>> +
>> +Current Limitations
>> +-------------------
>> +
>> +- Only one graph file is used at one time. This allows the integer ID to
>> +  seek into the single graph file. It is possible to extend the model
>> +  for multiple graph files, but that is currently not part of the design.
>> +
>> +- .graph files are managed only by the 'graph' builtin. These are not
>> +  updated automatically during clone or fetch.
>> +
>> +- There is no '--verify' option for the 'graph' builtin to verify the
>> +  contents of the graph file.
> c.f. git index-pack --verify (which is not documented :/)
>
>> +- The graph only considers commits existing in packfiles and does not
>> +  walk to fill in reachable commits. [Small]
> small?

Sorry for this in two ways: I left this "small" note to myself, and then 
forgot to remove it from the limitations after I implemented it (see 
patch 13/14).

>> +- When rewriting the graph, we do not check for a commit still existing
>> +  in the ODB, so garbage collection may remove commits
>> +
>> +- Generation numbers are not computed in the current version. The file
>> +  format supports storing them, along with a mechanism to upgrade from
>> +  a file without generation numbers to one that uses them.
> Maybe move the definition and explanation of the generation numbers above to
> the "Future Work" section?

I'm hesitant to move them to "Future Work" because generation numbers 
are so important to the design of the feature. If this design could not 
provide generation numbers as an easy second patch, then the design is 
bad and should not be used.

>> +
>> +Future Work
>> +-----------
>> +
>> +- The file format includes room for precomputed generation numbers. These
>> +  are not currently computed, so all generation numbers will be marked as
>> +  0 (or "uncomputed"). A later patch will include this calculation.
>> +
>> +- The current implementation of the 'graph' builtin walks all packed objects
>> +  to find a complete list of commits in packfiles. If some commits are
>> +  stored as loose objects, then these do not appear in the graph. This is
>> +  handled gracefully by the file format, but it would cause incorrect
>> +  generation number calculations. We should implement the construct_graph()
>> +  method in a way that walks all commits reachable from some starting set
>> +  and then can use complete information for generation numbers. (Some
>> +  care must be taken around shallow clones.)
>> +
>> +- The graph is not currently integrated with grafts.
> not integrated? Does that mean we error out on grafts, or gracefully fallback
> to a sane thing?

I have no idea what would happen if you tried to use this commit graph 
feature with grafts enabled. I'm not familiar enough with the feature to 
know how it works and to plan for it.

The issue right now is that there is graft code in parse_commit_buffer() 
that I did not duplicate in parse_packed_commit() [see patch 11/14]. 
Since currently grafts and packed-grafts are opt-in features that need 
to be set by a user (i.e. you do not get them automatically by cloning) 
I thought it sufficient to say these features are incompatible.

>> +- After computing and storing generation numbers, we must make graph
>> +  walks aware of generation numbers to gain performance benefits. This
>> +  will mostly be accomplished by swapping a commit-date-ordered priority
>> +  queue with one ordered by generation number. The following operations
>> +  are important candidates:
>> +
>> +    - paint_down_to_common()
>> +    - 'log --topo-order'
>> +
>> +- The graph currently only adds commits to a previously existing graph.
>> +  When writing a new graph, we could check that the ODB still contains
>> +  the commits and choose to remove the commits that are deleted from the
>> +  ODB. For performance reasons, this check should remain optional.
>> +
>> +- Currently, parse_commit_gently() requires filling in the root tree
>> +  object for a commit. This passes through lookup_tree() and consequently
>> +  lookup_object(). Also, it calls lookup_commit() when loading the parents.
>> +  These method calls check the ODB for object existence, even if the
>> +  consumer does not need the content. For example, we do not need the
>> +  tree contents when computing merge bases. Now that commit parsing is
>> +  removed from the computation time, these lookup operations are the
>> +  slowest operations keeping graph walks from being fast. Consider
>> +  loading these objects without verifying their existence in the ODB and
>> +  only loading them fully when consumers need them. Consider a method
>> +  such as "ensure_tree_loaded(commit)" that fully loads a tree before
>> +  using commit->tree.
>> +
>> +- The current design uses the 'graph' builtin to generate the graph. When
>> +  this feature stabilizes enough to recommend to most users, we should
>> +  add automatic graph writes to common operations that create many commits.
>> +  For example, one coulde compute a graph on 'clone' and 'fetch' commands.
> typo "could"
>
> Once we have incrementally-update-able graphs, we certainly want commit/merge
> to also write out updates? (or batch them after a long rebase).

I think batching will always be a good idea. It is feasible to have some 
threshold where there are enough reachable commits that are not in the 
graph to trigger a write (incremental or not). My gut feel is that 
number is at least 100.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/14] graph: add packed graph design document
  2018-01-25 21:14   ` Junio C Hamano
@ 2018-01-26 13:06     ` Derrick Stolee
  0 siblings, 0 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-26 13:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, git, sbeller, dstolee

On 1/25/2018 4:14 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>> Add Documentation/technical/packed-graph.txt with details of the planned
>> packed graph feature, including future plans.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>   Documentation/technical/packed-graph.txt | 185 +++++++++++++++++++++++++++++++
>>   1 file changed, 185 insertions(+)
>>   create mode 100644 Documentation/technical/packed-graph.txt
> I really wanted to like having this patch at the beginning, but
> unfortunatelly I didn't see the actual file format description,
> which was a bit disappointing.  An example of the things that I was
> curious about was how the "integer ID" is used to access into the
> file.  If we could somehow use "integer ID" as an index into an
> array of fixed size elements, it would be ideal to gain "fast
> lookups", but because of the "list of parents" thing, it needs some
> trickery to do so, and that was among the things that I wanted to
> see how much thought went into the design, for example.

There is definitely a chicken-or-the-egg situation here. I'm happy to 
start with the format before the design document.

I can try to expand this "integer ID" concept, but you can see how I use 
it in the following method from patch 11/14:

+int parse_packed_commit(struct commit *item)
+{
+        if (!core_graph)
+                return 0;
+        if (item->object.parsed)
+                return 1;
+
+        prepare_packed_graph();
+        if (packed_graph) {
+                uint32_t pos;
+                int found;
+                if (item->graphId != 0xFFFFFFFF) {
+                        pos = item->graphId;
+                        found = 1;
+                } else {
+                        found = bsearch_graph(packed_graph, 
&(item->object.oid), &pos);
+                }
+
+                if (found)
+                        return fill_packed_commit(item, packed_graph, pos);
+        }
+
+        return 0;
+}

Note that if item->graphId has a "real" value (not 0xFFFFFFFF which in 
hindsight should be a macro) then we navigate directly to that position 
in the graph. Otherwise, we use binary search to query the graph's 
commit list to find the position (if the commit is packed).

>> diff --git a/Documentation/technical/packed-graph.txt b/Documentation/technical/packed-graph.txt
>> new file mode 100644
>> index 0000000000..fcc0c83874
>> --- /dev/null
>> +++ b/Documentation/technical/packed-graph.txt
>> @@ -0,0 +1,185 @@
>> +Git Packed Graph Design Notes
>> +=============================
>> +
>> +Git walks the commit graph for many reasons, including:
>> +
>> +1. Listing and filtering commit history.
>> +2. Computing merge bases.
>> +
>> +These operations can become slow as the commit count grows above 100K.
>> +The merge base calculation shows up in many user-facing commands, such
>> +as 'status' and 'fetch' and can take minutes to compute depending on
>> +data shape. There are two main costs here:
> s/data shape/history shape/ may make it even clearer.
>
>> +1. The commit OID.
>> +2. The list of parents.
>> +3. The commit date.
>> +4. The root tree OID.
>> +5. An integer ID for fast lookups in the graph.
>> +6. The generation number (see definition below).
>> +
>> +Values 1-4 satisfy the requirements of parse_commit_gently().
>> +
>> +By providing an integer ID we can avoid lookups in the graph as we walk
>> +commits. Specifically, we need to provide the integer ID of the parent
>> +commits so we navigate directly to their information on request.
> Commits created after a packed graph file is built may of course not
> appear in a packed graph file, but that is OK because they never need
> to be listed as parents of commits in the file.  So "list of parents"
> can always refer to the parents using the "integer ID for fast lookup".

One thing I need to test locally is what happens with boundary commits 
of a shallow clone. The commit's parents are not in the repo, so they 
will not be in the graph. I think that parse_commit_buffer() drops the 
parents, so the graph will treat them as root commits.

> Makes sense.  Item 2. might want to say "The list of parents, using
> the fast lookup integer ID (see 5.) as reference instead of OID",
> though.

That will be more specific, thanks.

>> +Define the "generation number" of a commit recursively as follows:
>> + * A commit with no parents (a root commit) has generation number 1.
>> + * A commit with at least one parent has generation number 1 more than
>> +   the largest generation number among its parents.
>> +Equivalently, the generation number is one more than the length of a
>> +longest path from the commit to a root commit.
> When a commit A can reach roots X and Y, and Y is further than X,
> the distance between Y and A becomes A's generation number.  "One
> more than the length of the path from the commit to the furthest
> root commit it can reach", in other words.

My "Equivalently,..." sentence is mangled. What I mean is:

     Equivalently, the generation number of A is one more than
     the length of a longest path from A to a root commit.

I intended to make a non-recursive definition, and the "one more than" 
here could easily be conflated with the "one more than" in the recursive 
definition. The reason for "one more than" is that root commits have a 
path of length zero but generation number one.

Reminder to self: use "one" instead of "1" in the recursive definition.

>> +The recursive definition
>> +is easier to use for computation and the following property:
>> +
>> +    If A and B are commits with generation numbers N and M, respectively,
>> +    and N <= M, then A cannot reach B. That is, we know without searching
>> +    that B is not an ancestor of A because it is further from a root commit
>> +    than A.
>> +
>> +    Conversely, when checking if A is an ancestor of B, then we only need
>> +    to walk commits until all commits on the walk boundary have generation
>> +    number at most N. If we walk commits using a priority queue seeded by
>> +    generation numbers, then we always expand the boundary commit with highest
>> +    generation number and can easily detect the stopping condition.
> These are both true.  It would be nice if an optimized walker
> algorithm can also deal with history with recent commits for which
> we do not yet know the generation numbers (i.e. you first traverse
> and assign generation numbers and record in packed graph, then
> history grows but we haven't added the new commits to the packed
> graph yet).

My forward-thinking intention is that we will perform walks using a 
priority queue whose priority first compares generation number (which 
will be 0xFFFFFFFF for commits not in the graph) and then compares by 
commit date on equal generation number. For walks that require 
topological constraints, we cannot stop the walk until all commits in 
the queue have a "real" generation number. This will allow our 
algorithms to work with a mix of packed commits and unpacked commits.

>
>> +- A graph file is stored in a file named 'graph-<oid>.graph' in the pack
>> +  directory. This could be stored in an alternate.
> Is that <oid> really an object name?  The <hash> that appears in the
> name of a packfile pack-<hash>.pack is *not* an <oid>, and I somehow
> suspect that you are doing a similar "use hash of (some) contents to
> make it uniquely identify the content", not "information about a
> single object that is identified by the <oid>".

This is definitely a mistake on my part and I'll switch this to use 
<hash> in the documents. I do use 'struct object_id' in the code for 
this value. Is that OK?

>
>> +- The graph file is only a supplemental structure. If a user downgrades
>> +  or disables the 'core.graph' config setting, then the existing ODB is
>> +  sufficient.
> OK, that is exactly what I meant to say in a few paragraphs above
> that I wanted to see.

I'll move this to the top!

>
>> +Current Limitations
>> +-------------------
>> +
>> +- Only one graph file is used at one time. This allows the integer ID to
>> +  seek into the single graph file. It is possible to extend the model
>> +  for multiple graph files, but that is currently not part of the design.
>> +
>> +- .graph files are managed only by the 'graph' builtin. These are not
>> +  updated automatically during clone or fetch.
> In addition to "clone or fetch", I presume operations that locally
> create commits do not automatically create them, right?

No. Writing a graph file is too expensive for one commit operation. 
Perhaps some threshold of new commits could trigger the graph calculation.

>
>> +- After computing and storing generation numbers, we must make graph
>> +  walks aware of generation numbers to gain performance benefits. This
>> +  will mostly be accomplished by swapping a commit-date-ordered priority
>> +  queue with one ordered by generation number. The following operations
>> +  are important candidates:
>> +
>> +    - paint_down_to_common()
>> +    - 'log --topo-order'
> Do you mean that this round only writes "graph" without any actualy
> consumers?  It is somewhat hard to assess the value of what is
> stored in the new file without the consumers.
>
> Anyway, thanks for starting this.  Let's read on.
>

See Patch 11/14 for consumption by EVERY commit-graph walk. This patch 
includes significant performance improvements for every walk by avoiding 
parsing, but to get the benefit of generation numbers we will need to 
change the algorithms that operate on the commit graph.

While that integration with algorithms may seem daunting, the good news 
is that we can make iterative changes to improve each algorithm one-by-one.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 02/14] packed-graph: add core.graph setting
  2018-01-25 21:43   ` Junio C Hamano
@ 2018-01-26 13:08     ` Derrick Stolee
  0 siblings, 0 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-26 13:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, git, sbeller, dstolee

On 1/25/2018 4:43 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>> The packed graph feature is controlled by the new core.graph config
>> setting. This defaults to 0, so the feature is opt-in.
>>
>> The intention of core.graph is that a user can always stop checking
>> for or parsing packed graph files if core.graph=0.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>   Documentation/config.txt | 3 +++
>>   cache.h                  | 1 +
>>   config.c                 | 5 +++++
>>   environment.c            | 1 +
>>   4 files changed, 10 insertions(+)
> Before you get too married to the name "graph", is it reasonable to
> assume that the commit ancestry graph is the primary "graph" that
> should come to users' minds when a simple word "graph" is used in
> the context of discussing Git?  I suspect not.
>
> Let's not just call this "core.graph" and "packed-graph", and in
> addition give some adjective before "graph".

I was too focused that I wanted the word "graph" but "graph.c" already 
existed in source root that I came up with "packed-graph.c" just to have 
a separate filename. Clearly, "commit-graph" should be used instead. In 
v2, I'll use "/commit-graph.c" and "/builtin/commit-graph.c".

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 03/14] packed-graph: create git-graph builtin
  2018-01-25 21:45   ` Stefan Beller
@ 2018-01-26 13:13     ` Derrick Stolee
  0 siblings, 0 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-26 13:13 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Junio C Hamano, Jeff King, Jeff Hostetler, Derrick Stolee

On 1/25/2018 4:45 PM, Stefan Beller wrote:
> On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee <stolee@gmail.com> wrote:
>> Teach Git the 'graph' builtin that will be used for writing and
>> reading packed graph files. The current implementation is mostly
>> empty, except for a check that the core.graph setting is enabled
>> and a '--pack-dir' option.
> I wonder if this builtin should not respect the boolean core graph,
> as this new builtin commands' whole existence
> is to deal with these new files?
>
> As you assume this builtin as a plumbing command, I would
> expect it to pay less attention to config rather than more.

My thought was to alert the caller "This graph isn't going to be good 
for anything!" and fail quickly before doing work. You do have a good 
point, and I think we can remove that condition here. When we integrate 
with other commands ('repack', 'fetch', 'clone') we will want a 
different setting that signals automatically writing the graph and we 
don't want those to fail because they are not aware of a second config 
setting.

>
>> @@ -408,6 +408,7 @@ static struct cmd_struct commands[] = {
>>          { "fsck-objects", cmd_fsck, RUN_SETUP },
>>          { "gc", cmd_gc, RUN_SETUP },
>>          { "get-tar-commit-id", cmd_get_tar_commit_id },
>> +       { "graph", cmd_graph, RUN_SETUP_GENTLY },
> Why gently, though?
>
>  From reading the docs (and assumptions on further patches)
> we'd want to abort if there is no .git dir to be found?
>
> Or is a future patch having manual logic? (e.g. if pack-dir is
> given, the command may be invoked from outside a git dir)

You are right. I inherited this from my MIDX patch which can operate on 
a list of IDX files without a .git folder. The commit graph operations 
need an ODB.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 03/14] packed-graph: create git-graph builtin
  2018-01-25 23:01   ` Junio C Hamano
@ 2018-01-26 13:14     ` Derrick Stolee
  2018-01-26 14:16       ` Duy Nguyen
  0 siblings, 1 reply; 49+ messages in thread
From: Derrick Stolee @ 2018-01-26 13:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, git, sbeller, dstolee

On 1/25/2018 6:01 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>> Teach Git the 'graph' builtin that will be used for writing and
>> reading packed graph files. The current implementation is mostly
>> empty, except for a check that the core.graph setting is enabled
>> and a '--pack-dir' option.
> Just to set my expectation straight.
>
> Is it fair to say that in the ideal endgame state, this will be like
> "git pack-objects" in that end users won't have to know about it,
> but would serve as a crucial building block that is invoked by other
> front-end commands that are more familiar to end users (just like
> pack-objects are used behind the scenes by repack, push, etc.)?

That is my hope. Leaving that integration for later, after this feature 
has proven itself.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 04/14] packed-graph: add format document
  2018-01-25 22:29       ` Junio C Hamano
@ 2018-01-26 13:22         ` Derrick Stolee
  0 siblings, 0 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-26 13:22 UTC (permalink / raw)
  To: Junio C Hamano, Stefan Beller
  Cc: git, Jeff King, Jeff Hostetler, Derrick Stolee

On 1/25/2018 5:06 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>> Add document specifying the binary format for packed graphs. This
>> format allows for:
>>
>> * New versions.
>> * New hash functions and hash lengths.
>> * Optional extensions.
>>
>> Basic header information is followed by a binary table of contents
>> into "chunks" that include:
>>
>> * An ordered list of commit object IDs.
>> * A 256-entry fanout into that list of OIDs.
>> * A list of metadata for the commits.
>> * A list of "large edges" to enable octopus merges.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>   Documentation/technical/graph-format.txt | 88 ++++++++++++++++++++++++++++++++
>>   1 file changed, 88 insertions(+)
>>   create mode 100644 Documentation/technical/graph-format.txt
>>
>> diff --git a/Documentation/technical/graph-format.txt b/Documentation/technical/graph-format.txt
>> new file mode 100644
>> index 0000000000..a15e1036d7
>> --- /dev/null
>> +++ b/Documentation/technical/graph-format.txt
>> @@ -0,0 +1,88 @@
>> +Git commit graph format
>> +=======================
> Good that this is not saying "graph format" but is explicit that it
> is about "commit".  Do the same for the previous steps.  Especially,
> builtin/graph.c that does not have much to do with graph.c is not a
> good way forward ;-)

:+1:

> I do like the fact that later parents of octopus merges are moved
> out of way to make the majority of records fixed length, but I am
> not sure if the "up to two parents are recorded in line" is truly
> the best arrangement.  Aren't majority of commits single-parent,
> thereby wasting 4 bytes almost always?
>
> Will 32-bit stay to be enough for everybody?  Wouldn't it make sense
> to at least define them to be indices into arrays (i.e. scaled to
> element size), not "offsets", to recover a few lost bits?

I incorrectly used the word "offset" when I mean "array position" for 
the edge values.

> What's the point of storing object id length?  If you do not
> understand the object ID scheme, knowing only the length would not
> do you much good anyway, no?  And if you know the hashing scheme
> specified by Object ID version, you already know the length, no?

I'll go read the OID transition document to learn more, but I didn't 
know if there were plans for things like "Use SHA3 but with different 
hash lengths depending on user requirements". One side benefit is that 
we immediately know the width of our commit and tree references within 
the commit graph file without needing to consult a table of hash 
definitions.


On 1/25/2018 5:18 PM, Stefan Beller wrote:
> git.git has ~37k non-merge commits and ~12k merge commits,
> (35 of them have 3 or more parents).
>
> So 75% would waste the 4 bytes of the second parent.
>
> However the first parent is still there, so any operation that only needs
> the first parent (git bisect --first-parent?) would still be fast.
> Not sure if that is common.

The current API boundary does not support this, as parse_commit_gently() 
is not aware of the --first-parent option. The benefits of injecting 
that information are probably not worth the complication.

On 1/25/2018 5:29 PM, Junio C Hamano wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> The downside of just having one parent or pointer into the edge list
>> would be to penalize 25% of the commit lookups with an indirection
>> compared to ~0% (the 35 octopus'). I'd rather want to optimize for
>> speed than disk size? (4 bytes for 37k is 145kB for git.git, which I
>> find is not a lot).
> My comment is not about disk size but is about the size of working
> set (or "size of array element").
I do want to optimize for speed over space, at least for two-parent 
commits. Hopefully my clarification about offset/array position 
clarifies Junio's concerns here.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 04/14] packed-graph: add format document
  2018-01-25 22:07   ` Stefan Beller
@ 2018-01-26 13:25     ` Derrick Stolee
  0 siblings, 0 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-26 13:25 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Junio C Hamano, Jeff King, Jeff Hostetler, Derrick Stolee

On 1/25/2018 5:07 PM, Stefan Beller wrote:
> On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee <stolee@gmail.com> wrote:
>> Add document specifying the binary format for packed graphs. This
>> format allows for:
>>
>> * New versions.
>> * New hash functions and hash lengths.
>> * Optional extensions.
>>
>> Basic header information is followed by a binary table of contents
>> into "chunks" that include:
>>
>> * An ordered list of commit object IDs.
>> * A 256-entry fanout into that list of OIDs.
>> * A list of metadata for the commits.
>> * A list of "large edges" to enable octopus merges.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>   Documentation/technical/graph-format.txt | 88 ++++++++++++++++++++++++++++++++
> So this is different from Documentation/technical/packed-graph.txt,
> which gives high level design and this gives the details on how
> to set bits.
>
>>   1 file changed, 88 insertions(+)
>>   create mode 100644 Documentation/technical/graph-format.txt
>>
>> diff --git a/Documentation/technical/graph-format.txt b/Documentation/technical/graph-format.txt
>> new file mode 100644
>> index 0000000000..a15e1036d7
>> --- /dev/null
>> +++ b/Documentation/technical/graph-format.txt
>> @@ -0,0 +1,88 @@
>> +Git commit graph format
>> +=======================
>> +
>> +The Git commit graph stores a list of commit OIDs and some associated
>> +metadata, including:
>> +
>> +- The generation number of the commit. Commits with no parents have
>> +  generation number 1; commits with parents have generation number
>> +  one more than the maximum generation number of its parents. We
>> +  reserve zero as special, and can be used to mark a generation
>> +  number invalid or as "not computed".
>> +
>> +- The root tree OID.
>> +
>> +- The commit date.
>> +
>> +- The parents of the commit, stored using positional references within
>> +  the graph file.
>> +
>> +== graph-*.graph files have the following format:
>> +
>> +In order to allow extensions that add extra data to the graph, we organize
>> +the body into "chunks" and provide a binary lookup table at the beginning
>> +of the body. The header includes certain values, such as number of chunks,
>> +hash lengths and types.
>> +
>> +All 4-byte numbers are in network order.
>> +
>> +HEADER:
>> +
>> +       4-byte signature:
>> +           The signature is: {'C', 'G', 'P', 'H'}
>> +
>> +       1-byte version number:
>> +           Currently, the only valid version is 1.
>> +
>> +       1-byte Object Id Version (1 = SHA-1)
>> +
>> +       1-byte Object Id Length (H)
>    This is 20 or 40 for sha1 ? (binary or text representation?)

20 for binary.

>
>> +       1-byte number (C) of "chunks"
>> +
>> +CHUNK LOOKUP:
>> +
>> +       (C + 1) * 12 bytes listing the table of contents for the chunks:
>> +           First 4 bytes describe chunk id. Value 0 is a terminating label.
>> +           Other 8 bytes provide offset in current file for chunk to start.
> ... offset [in bytes/words/4k blocks?] in ...

bytes.

>
>> +           (Chunks are ordered contiguously in the file, so you can infer
>> +               the length using the next chunk position if necessary.)
>> +
>> +       The remaining data in the body is described one chunk at a time, and
>> +       these chunks may be given in any order. Chunks are required unless
>> +       otherwise specified.
>> +
>> +CHUNK DATA:
>> +
>> +       OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
>> +           The ith entry, F[i], stores the number of OIDs with first
>> +           byte at most i. Thus F[255] stores the total
>> +           number of commits (N).
> So F[0] > 0 for git.git for example.
>
> Or another way: To lookup a 01xxx, I need to look at
> entry(F[00] + 1 )...entry(F[01]).
>
> Makes sense.
>
>> +
>> +       OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
>> +           The OIDs for all commits in the graph.
> ... sorted ascending.
>
>
>> +       Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
>> +           * The first H bytes are for the OID of the root tree.
>> +           * The next 8 bytes are for the int-ids of the first two parents of
>> +             the ith commit. Stores value 0xffffffff if no parent in that position.
>> +             If there are more than two parents, the second value has its most-
>> +             significant bit on and the other bits store an offset into the Large
>> +             Edge List chunk.
> s/an offset into/position in/ ? (otherwise offset in bytes?)
>
>> +           * The next 8 bytes store the generation number of the commit and the
>> +             commit time in seconds since EPOCH. The generation number uses the
>> +             higher 30 bits of the first 4 bytes, while the commit time uses the
>> +             32 bits of the second 4 bytes, along with the lowest 2 bits of the
>> +             lowest byte, storing the 33rd and 34th bit of the commit time.
> This allows for a maximum generation number of
> 1.073.741.823 (2^30 -1) = 1 billion,
> and a max time stamp of later than 2100.
>
> Do you allow negative time stamps?
>
>
>> +
>> +       [Optional] Large Edge List (ID: {'E', 'D', 'G', 'E'})
>> +           This list of 4-byte values store the second through nth parents for
>> +           all octoput merges. The second parent value in the commit data is a
> octopus
>
>> +           negative number pointing into this list. Then iterate through this
>> +           list starting at that position until reaching a value with the most-
>> +           significant bit on. The other bits correspond to the int-id of the
>> +           last parent.
>> +
>> +TRAILER:
>> +
>> +       H-byte HASH-checksum of all of the above.
>> --
>> 2.16.0
>>

Thanks for the detailed comments here!
-Stolee



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 06/14] packed-graph: implement git-graph --write
  2018-01-25 23:28   ` Stefan Beller
@ 2018-01-26 13:28     ` Derrick Stolee
  0 siblings, 0 replies; 49+ messages in thread
From: Derrick Stolee @ 2018-01-26 13:28 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Junio C Hamano, Jeff King, Jeff Hostetler, Derrick Stolee

On 1/25/2018 6:28 PM, Stefan Beller wrote:
> On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee <stolee@gmail.com> wrote:
>
>> +------------------------------------------------
>> +$ git midx --write
> midx?

Looks like I missed some replacements as I was building this. Now you 
see how I hope the feedback from this patch will inform the MIDX patch. ;)

>
>> +test_done
> The tests basically tests that there is no segfault?
> Makes sense.

Also checks that files are written based on the output hash. The next 
commits gives inspection ability.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/14] graph: add packed graph design document
  2018-01-25 14:02 ` [PATCH 01/14] graph: add packed graph design document Derrick Stolee
  2018-01-25 20:04   ` Stefan Beller
  2018-01-25 21:14   ` Junio C Hamano
@ 2018-01-26 14:13   ` Duy Nguyen
  2 siblings, 0 replies; 49+ messages in thread
From: Duy Nguyen @ 2018-01-26 14:13 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Git Mailing List, Junio C Hamano, Jeff King, Jeff Hostetler,
	Stefan Beller, dstolee

On Thu, Jan 25, 2018 at 9:02 PM, Derrick Stolee <stolee@gmail.com> wrote:
> +Git walks the commit graph for many reasons, including:
> +
> +1. Listing and filtering commit history.
> +2. Computing merge bases.
> +
> +These operations can become slow as the commit count grows above 100K.
> +The merge base calculation shows up in many user-facing commands, such
> +as 'status' and 'fetch' and can take minutes to compute depending on
> +data shape. There are two main costs here:
> +
> +1. Decompressing and parsing commits.
> +2. Walking the entire graph to avoid topological order mistakes.
> +
> +The packed graph is a file that stores the commit graph structure along
> +with some extra metadata to speed up graph walks. This format allows a
> +consumer to load the following info for a commit:
> +
> +1. The commit OID.
> +2. The list of parents.
> +3. The commit date.
> +4. The root tree OID.
> +5. An integer ID for fast lookups in the graph.
> +6. The generation number (see definition below).

I didn't look closely to compare, but perhaps you should check out
pack file format version 4 [1]. It tried to address the same thing but
it never got to the point where we could replace our current pack
format with it. At some point I wanted to push it even as a local
optimization (pack transfer is still in old format) but I never had
enough time or energy for it. How it stores commits though can
probably be reused.

[1] https://github.com/pclouds/git/commit/23cb8ae5bdd968c1a290ff8d0fd7cb6b4d572a43
-- 
Duy

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 03/14] packed-graph: create git-graph builtin
  2018-01-26 13:14     ` Derrick Stolee
@ 2018-01-26 14:16       ` Duy Nguyen
  0 siblings, 0 replies; 49+ messages in thread
From: Duy Nguyen @ 2018-01-26 14:16 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Junio C Hamano, Git Mailing List, Jeff King, Jeff Hostetler,
	Stefan Beller, dstolee

On Fri, Jan 26, 2018 at 8:14 PM, Derrick Stolee <stolee@gmail.com> wrote:
> On 1/25/2018 6:01 PM, Junio C Hamano wrote:
>>
>> Derrick Stolee <stolee@gmail.com> writes:
>>
>>> Teach Git the 'graph' builtin that will be used for writing and
>>> reading packed graph files. The current implementation is mostly
>>> empty, except for a check that the core.graph setting is enabled
>>> and a '--pack-dir' option.
>>
>> Just to set my expectation straight.
>>
>> Is it fair to say that in the ideal endgame state, this will be like
>> "git pack-objects" in that end users won't have to know about it,
>> but would serve as a crucial building block that is invoked by other
>> front-end commands that are more familiar to end users (just like
>> pack-objects are used behind the scenes by repack, push, etc.)?
>
>
> That is my hope. Leaving that integration for later, after this feature has
> proven itself.

Then I suggest you use a more specific command name (commit-graph?
packed-graph?) and leave the pretty name "graph" out, it could be used
for some UI-facing thing someday, maybe.
-- 
Duy

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 01/14] graph: add packed graph design document
  2018-01-26 12:49     ` Derrick Stolee
@ 2018-01-26 18:17       ` Stefan Beller
  0 siblings, 0 replies; 49+ messages in thread
From: Stefan Beller @ 2018-01-26 18:17 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Junio C Hamano, Jeff King, Jeff Hostetler, Derrick Stolee

>>> +- A graph file is stored in a file named 'graph-<oid>.graph' in the pack
>>> +  directory.
>>
>> (guessing)
>> where every commit up to <oid> is included in the file.
>
>
> Sorry, the <oid> is the hash of the graph contents (up to its trailing bytes
> that contain <oid> in binary).

>> So maybe I do not understand the <oid> after all, yet.
>
>
> Perhaps I should instead use <hash> instead of <oid>, since it is not an
> actual Git object but instead the hash of the graph contents. The
> 'graph_head' file simply stores the <oid> as plain-text hex characters.

Yes, please use hash. Oid (object id) means to me the identification of
a Git object, whereas here we only use the hash to
(1) check if that file is valid (consistent, no bit flips in that;
    --verify would do this)
(2) to have a different file name for each file such that we can have
    multiple graph files at once, e.g. build the next file in the background
    while using the old file to keep working.
(3) any other use case?

For (2) we could use `mkstemp` as well, but the property of (1) is nice
to have.

When using <hash> you probably mean sha1. When writing the hash
transition document, we discussed about uses of sha1 apart from
object naming, such as the use here or in the new pack file format
and we thought it would be ok to keep sha1 for the purpose of (1),
but as we may want to lose the sha1 code in the far future, it may be
easier to migrate that hash to the new hash function, too. So
it is easy to confuse that hash with OIDs because it is the same hash
function throughout time.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 11/14] commit: integrate packed graph with commit parsing
  2018-01-25 14:02 ` [PATCH 11/14] commit: integrate packed graph with commit parsing Derrick Stolee
@ 2018-01-26 19:38   ` Stefan Beller
  0 siblings, 0 replies; 49+ messages in thread
From: Stefan Beller @ 2018-01-26 19:38 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, Junio C Hamano, Jeff King, Jeff Hostetler, Derrick Stolee

On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee <stolee@gmail.com> wrote:
> Teach Git to inspect a packed graph to supply the contents of a
> struct commit when calling parse_commit_gently(). This implementation
> satisfies all post-conditions on the struct commit, including loading
> parents, the root tree, and the commit date. The only loosely-expected
> condition is that the commit buffer is loaded into the cache. This
> was checked in log-tree.c:show_log(), but the "return;" on failure
> produced unexpected results (i.e. the message line was never terminated).
> The new behavior of loading the buffer when needed prevents the
> unexpected behavior.
>
> If core.graph is false, then do not load the graph and behave as usual.
>
> In test script t5319-graph.sh, add output-matching conditions on read-
> only graph operations.
>
> By loading commits from the graph instead of parsing commit buffers, we
> save a lot of time on long commits walks. Here are some performance
> results for a copy of the Linux repository where 'master' has 704,766
> reachable commits and is behind 'origin/master' by 19,610 commits.
>
> | Command                          | Before | After  | Rel % |
> |----------------------------------|--------|--------|-------|
> | log --oneline --topo-order -1000 |  5.9s  |  0.7s  | -88%  |
> | branch -vv                       |  0.42s |  0.27s | -35%  |
> | rev-list --all                   |  6.4s  |  1.0s  | -84%  |
> | rev-list --all --objects         | 32.6s  | 27.6s  | -15%  |

This sounds impressive!

> @@ -383,19 +384,27 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing)
>
>         if (!item)
>                 return -1;
> +
> +       // If we already parsed, but got it from the graph, then keep going!

comment style.

>         if (item->object.parsed)
>                 return 0;
> +
> +       if (check_packed && parse_packed_commit(item))
> +               return 0;
> +
>         buffer = read_sha1_file(item->object.oid.hash, &type, &size);
>         if (!buffer)
>                 return quiet_on_missing ? -1 :
>                         error("Could not read %s",
> -                            oid_to_hex(&item->object.oid));
> +                       oid_to_hex(&item->object.oid));
>         if (type != OBJ_COMMIT) {
>                 free(buffer);
>                 return error("Object %s not a commit",
> -                            oid_to_hex(&item->object.oid));
> +                       oid_to_hex(&item->object.oid));
>         }
> +
>         ret = parse_commit_buffer(item, buffer, size);
> +

I guess the new lines are for readability?
Not sure if will play out nicely with merges in this area, though.
(I touch this area of the code as well in the not yet sent out series
adding the repository as an argument all over the place. Not your
problem, just me getting anxious)

> @@ -34,6 +34,8 @@
>  #define GRAPH_CHUNKLOOKUP_SIZE (5 * 12)
>  #define GRAPH_MIN_SIZE (GRAPH_CHUNKLOOKUP_SIZE + GRAPH_FANOUT_SIZE + \
>                         GRAPH_OID_LEN + sizeof(struct packed_graph_header))
> +/* global storage */
> +struct packed_graph *packed_graph = 0;
>
>  struct object_id *get_graph_head_oid(const char *pack_dir, struct object_id *oid)
>  {
> @@ -209,6 +211,225 @@ struct packed_graph *load_packed_graph_one(const char *graph_file, const char *p
>         return graph;
>  }
>
> +static void prepare_packed_graph_one(const char *obj_dir)
> +{
> +       char *graph_file;
> +       struct object_id oid;
> +       struct strbuf pack_dir = STRBUF_INIT;
> +       strbuf_addstr(&pack_dir, obj_dir);
> +       strbuf_add(&pack_dir, "/pack", 5);
> +
> +       if (!get_graph_head_oid(pack_dir.buf, &oid))
> +               return;
> +
> +       graph_file = get_graph_filename_oid(pack_dir.buf, &oid);
> +
> +       packed_graph = load_packed_graph_one(graph_file, pack_dir.buf);
> +       strbuf_release(&pack_dir);
> +}
> +
> +static int prepare_packed_graph_run_once = 0;

Okay. :(
Seeing new globals like these, gives me extra motivation to
get the object store series going.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/14] packed-graph: implement construct_graph()
  2018-01-25 23:21   ` Stefan Beller
@ 2018-01-26 20:47     ` Junio C Hamano
  0 siblings, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2018-01-26 20:47 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Derrick Stolee, git, Jeff King, Jeff Hostetler, Derrick Stolee

Stefan Beller <sbeller@google.com> writes:

> On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee <stolee@gmail.com> wrote:
>
>> +struct object_id *construct_graph(const char *pack_dir)
>> +{
>> +       // Find a list of oids, adding the pointer to a list.
>
> Comment style.
> Here is where the bike shedding begins. ;)

I also saw some violations of the usual formatting conventions, e.g.

	if (condition) {
		block

should not be

	if (condition)
	{
		block


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/14] packed-graph: implement construct_graph()
  2018-01-25 14:02 ` [PATCH 05/14] packed-graph: implement construct_graph() Derrick Stolee
  2018-01-25 23:21   ` Stefan Beller
@ 2018-01-26 20:55   ` Junio C Hamano
  2018-01-26 21:14     ` Andreas Schwab
  1 sibling, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2018-01-26 20:55 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, peff, git, sbeller, dstolee

Derrick Stolee <stolee@gmail.com> writes:

> +		packedDate[0] = htonl((*list)->date >> 32);
> +		packedDate[1] = htonl((*list)->date);

How forward-looking do we want to be?  The commit.date field happens
to be unsigned so there is no point masking the result of right
shifting it, but that would not stay to be the case once we start
supporting negative timestamps, for example.

Also, I recall that you plan to squeeze generation number in 30 bits
or so of one of these words.  Wouldn't it mean that higher order
bits of commit.date must be masked out anyway, even though we do not
have to worry about right shift propagating the sign-bit down?

Also, would >>32 be a problem if commit.date is an uint32 (and
shifting all its bits out to the right)?


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/14] packed-graph: implement construct_graph()
  2018-01-26 20:55   ` Junio C Hamano
@ 2018-01-26 21:14     ` Andreas Schwab
  2018-01-26 22:04       ` Junio C Hamano
  0 siblings, 1 reply; 49+ messages in thread
From: Andreas Schwab @ 2018-01-26 21:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, git, peff, git, sbeller, dstolee

On Jan 26 2018, Junio C Hamano <gitster@pobox.com> wrote:

> Also, would >>32 be a problem if commit.date is an uint32 (and
> shifting all its bits out to the right)?

It would be undefined.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 05/14] packed-graph: implement construct_graph()
  2018-01-26 21:14     ` Andreas Schwab
@ 2018-01-26 22:04       ` Junio C Hamano
  0 siblings, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2018-01-26 22:04 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Derrick Stolee, git, peff, git, sbeller, dstolee

Andreas Schwab <schwab@linux-m68k.org> writes:

> On Jan 26 2018, Junio C Hamano <gitster@pobox.com> wrote:
>
>> Also, would >>32 be a problem if commit.date is an uint32 (and
>> shifting all its bits out to the right)?
>
> It would be undefined.

Yup, exactly.  Thanks.

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2018-01-26 22:04 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-25 14:02 [PATCH 00/14] Serialized Commit Graph Derrick Stolee
2018-01-25 14:02 ` [PATCH 01/14] graph: add packed graph design document Derrick Stolee
2018-01-25 20:04   ` Stefan Beller
2018-01-26 12:49     ` Derrick Stolee
2018-01-26 18:17       ` Stefan Beller
2018-01-25 21:14   ` Junio C Hamano
2018-01-26 13:06     ` Derrick Stolee
2018-01-26 14:13   ` Duy Nguyen
2018-01-25 14:02 ` [PATCH 02/14] packed-graph: add core.graph setting Derrick Stolee
2018-01-25 20:17   ` Stefan Beller
2018-01-25 20:40     ` Derrick Stolee
2018-01-25 21:43   ` Junio C Hamano
2018-01-26 13:08     ` Derrick Stolee
2018-01-25 14:02 ` [PATCH 03/14] packed-graph: create git-graph builtin Derrick Stolee
2018-01-25 21:45   ` Stefan Beller
2018-01-26 13:13     ` Derrick Stolee
2018-01-25 23:01   ` Junio C Hamano
2018-01-26 13:14     ` Derrick Stolee
2018-01-26 14:16       ` Duy Nguyen
2018-01-25 14:02 ` [PATCH 04/14] packed-graph: add format document Derrick Stolee
2018-01-25 22:06   ` Junio C Hamano
2018-01-25 22:18     ` Stefan Beller
2018-01-25 22:29       ` Junio C Hamano
2018-01-26 13:22         ` Derrick Stolee
2018-01-25 22:07   ` Stefan Beller
2018-01-26 13:25     ` Derrick Stolee
2018-01-25 14:02 ` [PATCH 05/14] packed-graph: implement construct_graph() Derrick Stolee
2018-01-25 23:21   ` Stefan Beller
2018-01-26 20:47     ` Junio C Hamano
2018-01-26 20:55   ` Junio C Hamano
2018-01-26 21:14     ` Andreas Schwab
2018-01-26 22:04       ` Junio C Hamano
2018-01-25 14:02 ` [PATCH 06/14] packed-graph: implement git-graph --write Derrick Stolee
2018-01-25 23:28   ` Stefan Beller
2018-01-26 13:28     ` Derrick Stolee
2018-01-25 14:02 ` [PATCH 07/14] packed-graph: implement git-graph --read Derrick Stolee
2018-01-25 14:02 ` [PATCH 08/14] graph: implement git-graph --update-head Derrick Stolee
2018-01-25 14:02 ` [PATCH 09/14] packed-graph: implement git-graph --clear Derrick Stolee
2018-01-25 23:35   ` Stefan Beller
2018-01-25 14:02 ` [PATCH 10/14] packed-graph: teach git-graph --delete-expired Derrick Stolee
2018-01-25 14:02 ` [PATCH 11/14] commit: integrate packed graph with commit parsing Derrick Stolee
2018-01-26 19:38   ` Stefan Beller
2018-01-25 14:02 ` [PATCH 12/14] packed-graph: read only from specific pack-indexes Derrick Stolee
2018-01-25 14:02 ` [PATCH 13/14] packed-graph: close under reachability Derrick Stolee
2018-01-25 14:02 ` [PATCH 14/14] packed-graph: teach git-graph to read commits Derrick Stolee
2018-01-25 15:46 ` [PATCH 00/14] Serialized Commit Graph Ævar Arnfjörð Bjarmason
2018-01-25 16:09   ` Derrick Stolee
2018-01-25 23:06     ` Ævar Arnfjörð Bjarmason
2018-01-26 12:15       ` Derrick Stolee

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).