git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] documentation: add tutorial for revision walking
@ 2019-06-07  1:07 Emily Shaffer
  2019-06-07  1:07 ` [RFC PATCH 00/13] example implementation of revwalk tutorial Emily Shaffer
                   ` (4 more replies)
  0 siblings, 5 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-07  1:07 UTC
  To: git; +Cc: Emily Shaffer

Existing documentation on revision walks seems to be primarily intended
as a reference for those already familiar with the procedure. This
tutorial attempts to give an entry-level guide to a couple of bare-bones
revision walks so that new Git contributors can learn the concepts
without having to wade through options parsing or special casing.

The target audience is a Git contributor who is just getting started
with the concept of revision walking. The goal is to prepare this
contributor to be able to understand and modify existing commands which
perform revision walks more easily, although it will also prepare
contributors to create new commands which perform walks.

The tutorial covers a basic overview of the structs involved during a
revision walk, setting up a basic commit walk, setting up a basic
all-object walk, and adding some configuration changes to both walk
types. It intentionally does not cover how to create new commands or
search for options from the command line or gitconfigs.

There is an associated patchset at
https://github.com/nasamuffin/git/tree/revwalk that contains a reference
implementation of the code generated by this tutorial.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---

This one is longer than the MyFirstContribution one, thanks in advance
to anybody with the wherewithal to review this.

I'll also be mailing an RFC patchset In-Reply-To this message; the RFC
patchset should not be merged to Git, as I intend to host it in my own
mirror as an example. I hosted a similar example for the
MyFirstContribution tutorial; it's visible at
https://github.com/nasamuffin/git/tree/psuh. There might be a better
place to host these so I don't "own" them but I'm not sure what it is;
keeping them as a live branch somewhere struck me as an okay way to keep
them from getting stale.

Looking forward to hearing everyone's comments!
 - Emily

 Documentation/.gitignore         |   1 +
 Documentation/Makefile           |   1 +
 Documentation/MyFirstRevWalk.txt | 826 +++++++++++++++++++++++++++++++
 3 files changed, 828 insertions(+)
 create mode 100644 Documentation/MyFirstRevWalk.txt

diff --git a/Documentation/.gitignore b/Documentation/.gitignore
index 9022d48355..0e3df737c5 100644
--- a/Documentation/.gitignore
+++ b/Documentation/.gitignore
@@ -12,6 +12,7 @@ cmds-*.txt
 mergetools-*.txt
 manpage-base-url.xsl
 SubmittingPatches.txt
+MyFirstRevWalk.txt
 tmp-doc-diff/
 GIT-ASCIIDOCFLAGS
 /GIT-EXCLUDED-PROGRAMS
diff --git a/Documentation/Makefile b/Documentation/Makefile
index dbf5a0f276..d57b80962f 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -77,6 +77,7 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technica
 SP_ARTICLES += $(API_DOCS)
 
 TECH_DOCS += SubmittingPatches
+TECH_DOCS += MyFirstRevWalk
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
diff --git a/Documentation/MyFirstRevWalk.txt b/Documentation/MyFirstRevWalk.txt
new file mode 100644
index 0000000000..494c09d1fa
--- /dev/null
+++ b/Documentation/MyFirstRevWalk.txt
@@ -0,0 +1,826 @@
+My First Revision Walk
+======================
+
+== What's a Revision Walk?
+
+The revision walk is a key concept in Git - this is the process that underpins
+operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, the
+list of objects is found by walking parent relationships between objects. The
+revision walk can also be used to determine whether or not a given object is
+reachable from the current HEAD pointer.
+
+=== Related Reading
+
+- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
+  the revision walker in its various incarnations.
+- `Documentation/technical/api-revision-walking.txt`
+- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
+  gives a good overview of the types of objects in Git and what your revision
+  walk is really describing.
+
+== Setting Up
+
+Create a new branch from `master`.
+
+----
+git checkout -b revwalk origin/master
+----
+
+We'll put our fiddling into a new command. For fun, let's name it `git walken`.
+Open up a new file `builtin/walken.c` and set up the command handler:
+
+----
+/*
+ * "git walken"
+ *
+ * Part of the "My First Revision Walk" tutorial.
+ */
+
+#include <stdio.h>
+#include "builtin.h"
+
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+        printf(_("cmd_walken incoming...\n"));
+        return 0;
+}
+----
+
+Add usage text and `-h` handling, in order to pass the test suite. You'll need
+to `#include "parse-options.h"` for this:
+
+----
+static const char * const walken_usage[] = {
+	N_("git walken"),
+	NULL,
+};
+
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	struct option options[] = {
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
+
+	...
+}
+----
+
+Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:
+
+----
+extern int cmd_walken(int argc, const char **argv, const char *prefix);
+----
+
+Include the command in `git.c` in `commands[]` near the entry for `whatchanged`:
+
+----
+{ "walken", cmd_walken, RUN_SETUP },
+----
+
+Add it to the `Makefile` near the line for `builtin/worktree.o`:
+
+----
+BUILTIN_OBJS += builtin/walken.o
+----
+
+Build and test your command, making sure the `DEVELOPER` flag is set. Appending
+to `config.mak` preserves any settings you already have there:
+
+----
+echo DEVELOPER=1 >>config.mak
+make
+./bin-wrappers/git walken
+----
+
+NOTE: For a more exhaustive overview of the new command process, take a look at
+`Documentation/MyFirstContribution.txt`.
+
+NOTE: A reference implementation can be found at
+https://github.com/nasamuffin/git/tree/revwalk.
+
+=== `struct rev_cmdline_info`
+
+The definition of `struct rev_cmdline_info` can be found in `revision.h`.
+
+This struct is contained within the `rev_info` struct and is used to reflect
+parameters provided by the user over the CLI.
+
+`nr` represents the number of `rev_cmdline_entry` elements present in the array.
+
+`alloc` is used by the `ALLOC_GROW` macro. Check
+`Documentation/technical/api-allocation-growing.txt` - this variable is used to
+track the allocated size of the list.
+
+Per entry, we find:
+
+`item` is the object provided upon which to base the revision walk. Items in Git
+can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)
+
+`name` is the SHA-1 of the object - a 40-digit hex string you may be familiar
+with from using Git to organize your source in the past. See the tutorial
+mentioned above for a discussion of where the SHA-1 can come from.
+
+`whence` indicates some information about what to do with the parents of the
+specified object. We'll explore this flag more later on; take a look at
+`Documentation/revisions.txt` to get an idea of what could set the `whence`
+value.
+
+`flags` are used to hint the beginning of the revision walk and are the first
+block under the `#include`s in `revision.h`. The most likely ones to be set in
+the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
+can be used during the walk, as well.
+
+=== `struct rev_info`
+
+This one is quite a bit longer, and many fields are only used during the walk
+by `revision.c` - not configuration options. Most of the configurable flags in
+`struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a
+good idea to take some time and read through that document.
+
+== Basic Commit Walk
+
+First, let's see if we can replicate the output of `git log --oneline`. We'll
+refer back to the implementation frequently to discover norms when performing
+a revision walk of our own.
+
+We'll need all the commits, in order, which preceded our current commit. We will
+also need to know the name and subject.
+
+Ideally, we will also be able to find out which ones are currently at the tip of
+various branches.
+
+=== Setting Up
+
+Preparing for your revision walk has some distinct stages.
+
+1. Perform default setup for this mode, and others which may be invoked.
+2. Check configuration files for relevant settings.
+3. Set up the `rev_info` struct.
+4. Tweak the initialized `rev_info` to suit the current walk.
+5. Prepare the `rev_info` for the walk.
+6. Iterate over the objects, processing each one.
+
+==== Default Setups
+
+Before you begin to examine user configuration for your revision walk, it's
+common practice for you to initialize to default any switches that your command
+may have, as well as ask any other components you may invoke to initialize as
+well. `git log` does this in `init_log_defaults()`; in that case, one global
+`decoration_style` is initialized, as well as the grep and diff-UI components.
+
+For our first example within `git walken`, we don't intend to invoke any other
+components, and we don't have any configuration to do. However, we may want to
+add some later, so for now, we can add an empty placeholder.
+Create a new function in `builtin/walken.c`:
+
+----
+static void init_walken_defaults(void)
+{
+	/* We don't actually need the same components `git log` does; leave this
+	 * empty for now.
+	 */
+}
+----
+
+Make sure to add a line invoking it inside of `cmd_walken()`.
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	init_walken_defaults();
+}
+----
+
+==== Configuring From `.gitconfig`
+
+Next, we should have a look at any relevant configuration settings (i.e.,
+settings readable and settable from `git config`). This is done by providing a
+callback to `git_config()`; within that callback, you can also invoke the config
+handlers of any other components that need to see these options. Your callback
+will be invoked once for each configuration value which Git knows about
+(global, local, worktree, etc.).
+
+Similarly to the default values, we don't have anything to do here yet
+ourselves; however, we should call `git_default_config()` if we aren't calling
+any other existing config callbacks.
+
+TODO: Use the "modern" configset API
+
+Add a new function to `builtin/walken.c`:
+
+----
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+	/* For now, let's not bother with anything. */
+	return git_default_config(var, value, cb);
+}
+----
+
+Make sure to invoke `git_config()` with it in your `cmd_walken()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	...
+
+	git_config(git_walken_config, NULL);
+}
+----
+
+// TODO: Checking CLI options
+
+==== Setting Up `rev_info`
+
+Now that we've gathered external configuration and options, it's time to
+initialize the `rev_info` object which we will use to perform the walk. This is
+typically done by calling `repo_init_revisions()` with the repository you intend
+to target, as well as the prefix and your `rev_info` struct.
+
+Add the `struct rev_info` and the `repo_init_revisions()` call. Both are
+declared in `revision.h`, so `#include "revision.h"` as well:
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	/* This can go wherever you like in your declarations. */
+	struct rev_info rev;
+	...
+
+	/* This should go after the git_config() call. */
+	repo_init_revisions(the_repository, &rev, prefix);
+}
+----
+
+==== Tweaking `rev_info` For the Walk
+
+We're getting close, but we're still not quite ready to go. Now that `rev` is
+initialized, we can modify it to fit our needs. This is usually done within a
+helper for clarity, so let's add one:
+
+----
+static void final_rev_info_setup(int argc, const char **argv,
+		const char *prefix, struct rev_info *rev)
+{
+	/* We want to mimic the appearance of `git log --oneline`, so let's
+	 * force oneline format. */
+	get_commit_format("oneline", rev);
+
+	/* Start our revision walk at HEAD. */
+	add_head_to_pending(rev);
+}
+----
+
+[NOTE]
+====
+Instead of using the shorthand `add_head_to_pending()`, you could do
+something like this:
+----
+	struct setup_revision_opt opt;
+
+	memset(&opt, 0, sizeof(opt));
+	opt.def = "HEAD";
+	opt.revarg_opt = REVARG_COMMITTISH;
+	setup_revisions(argc, argv, rev, &opt);
+----
+Using a `setup_revision_opt` gives you finer control over your walk's starting
+point.
+====
+
+Then let's invoke `final_rev_info_setup()` after the call to
+`repo_init_revisions()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	...
+
+	final_rev_info_setup(argc, argv, prefix, &rev);
+}
+----
+
+Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
+now, this is all we need.
+
+==== Preparing `rev_info` For the Walk
+
+Now that `rev` is all initialized and configured, we've got one more setup step
+before we get rolling. We can do this in a helper, which will both prepare the
+`rev_info` for the walk, and perform the walk itself. Let's start the helper
+with the call to `prepare_revision_walk()`.
+
+----
+static int walken_commit_walk(struct rev_info *rev)
+{
+	/* prepare_revision_walk() gets the final steps ready for a revision
+	 * walk. We check the return value for errors. */
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+}
+----
+
+==== Performing the Walk!
+
+Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
+can also be used as an iterator; we move to the next item in the walk by using
+`get_revision()` repeatedly. Add the listed variable declarations at the top and
+the walk loop below the `prepare_revision_walk()` call within your
+`walken_commit_walk()`:
+
+----
+static int walken_commit_walk(struct rev_info *rev)
+{
+	struct commit *commit;
+	struct strbuf prettybuf = STRBUF_INIT;
+
+	...
+
+	/* get_revision() returns NULL once the walk is exhausted. */
+	while ((commit = get_revision(rev))) {
+		strbuf_reset(&prettybuf);
+		pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
+		puts(prettybuf.buf);
+	}
+
+	/* Free the buffer now that we are done printing. */
+	strbuf_release(&prettybuf);
+
+	return 0;
+}
+----
+
+Give it a shot.
+
+----
+$ make
+$ ./bin-wrappers/git walken
+----
+
+You should see all of the subject lines of all the commits in
+your tree's history, in order, ending with the initial commit, "Initial revision
+of "git", the information manager from hell". Congratulations! You've written
+your first revision walk. You can play with printing some additional fields
+from each commit if you're curious; have a look at the functions available in
+`commit.h`.
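+
+As one such experiment, here is a rough sketch (not part of the reference
+implementation) that prints each commit's full object name in front of its
+subject line. It assumes the `commit` and `prettybuf` variables from the loop
+above, and uses `oid_to_hex()` on the `oid` embedded in the commit's
+`struct object`:
+
+----
+	while ((commit = get_revision(rev))) {
+		strbuf_reset(&prettybuf);
+		pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
+		/* Full hex object name first, then the one-line summary. */
+		printf("%s %s\n", oid_to_hex(&commit->object.oid),
+		       prettybuf.buf);
+	}
+----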
+
+=== Adding a Filter
+
+Next, let's try to filter the commits we see based on their author. This is
+equivalent to running `git log --author=<pattern>`. We can add a filter by
+modifying `rev_info.grep_filter`, which is a `struct grep_opt`. 
+
+First some setup. Add `init_grep_defaults()` to `init_walken_defaults()` and add
+`grep_config()` to `git_walken_config()`:
+
+----
+static void init_walken_defaults(void)
+{
+	init_grep_defaults(the_repository);
+}
+
+...
+
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+	grep_config(var, value, cb);
+	return git_default_config(var, value, cb);
+}
+----
+
+Next, we can modify the `grep_filter`. This is done with convenience functions
+found in `grep.h`. For fun, we're filtering to only commits from folks using a
+gmail.com email address - a not-very-precise guess at who may be working on Git
+as a hobby. Since we're checking the author, which is a specific line in the
+header, we'll use the `append_header_grep_pattern()` helper. We can use
+the `enum grep_header_field` to indicate which part of the commit header we want
+to search.
+
+In `final_rev_info_setup()`, add your filter line:
+
+----
+static void final_rev_info_setup(int argc, const char **argv,
+		const char *prefix, struct rev_info *rev)
+{
+	...
+
+	append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
+		"gmail");
+	compile_grep_patterns(&rev->grep_filter);
+
+	...
+}
+----
+
+`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
+it won't work unless we compile it with `compile_grep_patterns()`.
+
+NOTE: If you are using `setup_revisions()` (for example, if you are passing a
+`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
+to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.
+
+NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
+wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
+`enum grep_pat_token` for us.
+
+=== Changing the Order
+
+There are a few ways that we can change the order of the commits during a
+revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
+sane orderings.
+
+Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
+`REV_SORT_BY_AUTHOR_DATE`. Add the following:
+
+----
+static void final_rev_info_setup(int argc, const char **argv,
+		const char *prefix, struct rev_info *rev)
+{
+	...
+
+	rev->topo_order = 1;
+	rev->sort_order = REV_SORT_BY_COMMIT_DATE;
+
+	...
+}
+----
+
+Let's output this into a file so we can easily diff it with the walk sorted by
+author date.
+
+----
+$ make
+$ ./bin-wrappers/git walken > commit-date.txt
+----
+
+Then, let's sort by author date and run it again.
+
+----
+static void final_rev_info_setup(int argc, const char **argv,
+		const char *prefix, struct rev_info *rev)
+{
+	...
+
+	rev->topo_order = 1;
+	rev->sort_order = REV_SORT_BY_AUTHOR_DATE;
+
+	...
+}
+----
+
+----
+$ make
+$ ./bin-wrappers/git walken > author-date.txt
+----
+
+Finally, compare the two. This is a little less helpful without object names or
+dates, but hopefully we get the idea.
+
+----
+$ diff -u commit-date.txt author-date.txt
+----
+
+This diff gives an indication of the latency between publishing a commit for
+review the first time and actually getting it merged into `master`.
+
+Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
+However, it needs to be applied after `add_head_to_pending()` is called. Find
+the line where you call `add_head_to_pending()` and set the `reverse` flag right
+after:
+
+----
+static void final_rev_info_setup(int argc, const char **argv,
+		const char *prefix, struct rev_info *rev)
+{
+	...
+
+	add_head_to_pending(rev);
+	rev->reverse = 1;
+
+	...
+}
+----
+
+Run your walk again and note the difference in order. (If you remove the grep
+pattern, you should see the last commit this call gives you as your current
+HEAD.)
+
+== Basic Object Walk
+
+So far we've been walking only commits. But Git has more types of objects than
+that! Let's see if we can walk _all_ objects, and find out some information
+about each one.
+
+We can base our work on an example. `git pack-objects` prepares all kinds of
+objects for packing into a bitmap or packfile. The work we are interested in
+resides in `builtin/pack-objects.c:get_object_list()`; examination of that
+function shows that the all-object walk is being performed by
+`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
+functions reside in `list-objects.c`; examining the source shows that, despite
+the name, these functions traverse all kinds of objects. Let's have a look at
+the arguments to `traverse_commit_list_filtered()`, which are a superset of the
+arguments to the unfiltered version.
+
+- `struct list_objects_filter_options *filter_options`: This is a struct which
+  stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
+- `struct rev_info *revs`: This is the `rev_info` used for the walk.
+- `show_commit_fn show_commit`: A callback which will be used to handle each
+  individual commit object.
+- `show_object_fn show_object`: A callback which will be used to handle each
+  non-commit object (so each blob, tree, or tag).
+- `void *show_data`: A context buffer which is passed in turn to `show_commit`
+  and `show_object`.
+- `struct oidset *omitted`: A set of object IDs which the provided filter
+  caused to be omitted.
+
+It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
+instead of needing us to call it repeatedly ourselves. Cool! Let's add the
+callbacks first.
+
+For the sake of this tutorial, we'll simply keep track of how many of each kind
+of object we find. At file scope in `builtin/walken.c` add the following
+tracking variables:
+
+----
+static int commit_count;
+static int tag_count;
+static int blob_count;
+static int tree_count;
+----
+
+Commits are handled by a different callback than other objects; let's do that
+one first:
+
+----
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+        commit_count++;
+}
+----
+
+Since we have the `struct commit` object, we can look at all the same parts that
+we looked at in our earlier commit-only walk. For the sake of this tutorial,
+though, we'll just increment the commit counter and move on.
+
+The callback for non-commits is a little different, as we'll need to check
+which kind of object we're dealing with:
+
+----
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+        switch (obj->type) {
+        case OBJ_TREE:
+                tree_count++;
+                break;
+        case OBJ_BLOB:
+                blob_count++;
+                break;
+        case OBJ_TAG:
+                tag_count++;
+                break;
+        case OBJ_COMMIT:
+                printf(_("Unexpectedly encountered a commit in "
+                         "walken_show_object!\n"));
+                commit_count++;
+                break;
+        default:
+                printf(_("Unexpected object type %s!\n"),
+                       type_name(obj->type));
+                break;
+        }
+}
+----
+
+To help assure us that we aren't double-counting commits, we'll include some
+complaining if a commit object is routed through our non-commit callback; we'll
+also complain if we see an invalid object type.
+
+Our main object walk implementation is substantially different from our commit
+walk implementation, so let's make a new function to perform the object walk. We
+can perform setup which is applicable to all objects here, too, to keep separate
+from setup which is applicable to commit-only walks.
+
+----
+static int walken_object_walk(struct rev_info *rev)
+{
+}
+----
+
+We'll start by enabling all types of objects in the `struct rev_info`, and
+asking to have our trees and blobs shown in commit order. We'll also exclude
+promisors as the walk becomes more complicated with those types of objects. When
+our settings are ready, we'll perform the normal revision walk setup and
+initialize our tracking variables.
+
+----
+static int walken_object_walk(struct rev_info *rev)
+{
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+	rev->exclude_promisor_objects = 1;
+
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+
+	commit_count = 0;
+	tag_count = 0;
+	blob_count = 0;
+	tree_count = 0;
+----
+
+Unless you cloned or fetched your repository earlier with a filter,
+`exclude_promisor_objects` is unlikely to make a difference, but we'll turn it
+on just to make sure our lives are simple.  We'll also turn on
+`tree_blobs_in_commit_order`, which means that we will walk a commit's tree and
+everything it points to immediately after we find each commit, as opposed to
+waiting for the end and walking through all trees after the commit history has
+been discovered.
+
+Let's start by calling just the unfiltered walk and reporting our counts.
+`traverse_commit_list()` is declared in `list-objects.h`, so `#include` that
+header, then complete your implementation of `walken_object_walk()`:
+
+----
+	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+
+	printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, "
+		 "and %d trees.\n"), commit_count, blob_count, tag_count,
+	       tree_count);
+
+	return 0;
+}
+----
+
+Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
+command line options is out of scope for this tutorial, so we'll just hardcode
+a branch we can change at compile time. Where you call `final_rev_info_setup()`
+and `walken_commit_walk()`, instead branch like so:
+
+----
+	if (1) {
+		add_head_to_pending(&rev);
+		walken_object_walk(&rev);
+	} else {
+		final_rev_info_setup(argc, argv, prefix, &rev);
+		walken_commit_walk(&rev);
+	}
+----
+
+NOTE: For simplicity, we've avoided all the filters and sorts we applied in
+`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
+want, you can certainly use the filters we added before by moving
+`final_rev_info_setup()` out of the conditional and removing the call to
+`add_head_to_pending()`.
+
+Now we can try to run our command! It should take noticeably longer than the
+commit walk, but an examination of the output will give you an idea why - for
+example:
+
+----
+Object walk completed. Found 55733 commits, 100274 blobs, 0 tags, and 104210 trees.
+----
+
+This makes sense. We have more trees than commits because the Git project has
+lots of subdirectories which can change, plus at least one tree per commit. We
+have no tags because we started on a commit (`HEAD`) and while tags can point to
+commits, commits can't point to tags.
+
+NOTE: You will have different counts when you run this yourself! The number of
+objects grows along with the Git project.
+
+=== Adding a Filter
+
+There are a handful of filters that we can apply to the object walk laid out in
+`Documentation/rev-list-options.txt`. These filters are typically useful for
+operations such as creating packfiles or performing a partial clone.
+They are defined in `list-objects-filter-options.h`. For the purposes of this
+tutorial we will use the "tree:1" filter, which causes the walk to omit all
+trees and blobs which are not directly referenced by commits reachable from the
+commit in `pending` when the walk begins. (In our case, that means we omit trees
+and blobs not directly referenced by HEAD or HEAD's history.)
+
+First, we'll need to `#include "list-objects-filter-options.h"`. Then, we can
+set up the `struct list_objects_filter_options` and `struct oidset` at the top
+of `walken_object_walk()`:
+
+----
+static int walken_object_walk(struct rev_info *rev)
+{
+	struct list_objects_filter_options filter_options = { 0 };
+	struct oidset omitted;
+	oidset_init(&omitted, 0);
+	...
+----
+
+Then, for the sake of simplicity, we'll add a simple build-time branch to use
+our filter or not. Replace the line calling `traverse_commit_list()` with the
+following, which will remind us which kind of walk we've just performed:
+
+----
+	if (1) {
+		/* Unfiltered: */
+		printf(_("Unfiltered object walk.\n"));
+		traverse_commit_list(rev, walken_show_commit,
+				walken_show_object, NULL);
+	} else {
+		printf(_("Filtered object walk with filterspec 'tree:1'.\n"));
+		/*
+		 * We can parse a tree depth of 1 to demonstrate the kind of
+		 * filtering that could occur, e.g., during a partial clone.
+		 */
+		parse_list_objects_filter(&filter_options, "tree:1");
+
+		traverse_commit_list_filtered(&filter_options, rev,
+			walken_show_commit, walken_show_object, NULL, &omitted);
+	}
+----
+
+`struct list_objects_filter_options` is usually built directly from a command
+line argument, so the module provides an easy way to build one from a string.
+Even though we aren't taking user input right now, we can still build one with
+a hardcoded string using `parse_list_objects_filter()`.
+
+After we run `traverse_commit_list_filtered()`, we can examine `omitted`, which
+is a set of all the objects the filter excluded from our walk. Since every
+omitted object is recorded, the performance of
+`traverse_commit_list_filtered()` with a non-null `omitted` argument is
+comparable to the performance of `traverse_commit_list()`. We won't do anything
+with `omitted` in this tutorial, but it's easy to iterate over it - check
+`oidset.h` for the declaration of the accessor methods for `oidset`.
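+
+As a rough sketch of such an iteration (not part of the reference
+implementation; the variable names `iter` and `oid` are chosen here purely for
+illustration), you could list the omitted objects after the filtered walk using
+the `oidset_iter` helpers declared in `oidset.h`:
+
+----
+	struct oidset_iter iter;
+	const struct object_id *oid;
+
+	...
+
+	/* Visit every object ID the filter caused the walk to skip. */
+	oidset_iter_init(&omitted, &iter);
+	while ((oid = oidset_iter_next(&iter)))
+		printf("omitted: %s\n", oid_to_hex(oid));
+
+	/* Free the memory held by the set once we are done with it. */
+	oidset_clear(&omitted);
+----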
+
+With the filter spec "tree:1", we are expecting to see _only_ the root tree for
+each commit; therefore, the tree object count should be less than or equal to
+the number of commits. (For an example of why that's true: a commit created by
+`git revert` of its parent points to the same tree object as its grandparent.)
+
+=== Changing the Order
+
+Finally, let's demonstrate that you can also reorder walks of all objects, not
+just walks of commits. First, we'll make our handlers chattier - modify
+`walken_show_commit()` and `walken_show_object()` to print each object as they
+go:
+
+----
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+	printf(_("commit: %s\n"), oid_to_hex(&cmt->object.oid));
+	commit_count++;
+}
+
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+	printf(_("%s: %s\n"), type_name(obj->type), oid_to_hex(&obj->oid));
+	...
+}
+----
+
+(Try to leave the counter increment logic in place in `walken_show_object()`.)
+
+With only that change, run again (but save yourself some scrollback):
+
+----
+$ ./bin-wrappers/git walken | head -n 10
+----
+
+Take a look at the top commit with `git show` and the OID you printed; it should
+be the same as the output of `git show HEAD`.
+
+Next, let's change a setting on our `struct rev_info` within
+`walken_object_walk()`. Find where you're changing the other settings on `rev`,
+such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add
+another setting at the bottom:
+
+----
+	...
+
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+	rev->exclude_promisor_objects = 1;
+	rev->reverse = 1;
+
+	...
+----
+
+Now, run again, but this time, let's grab the last handful of objects instead
+of the first handful:
+
+----
+$ make
+$ ./bin-wrappers/git walken | tail -n 10
+----
+
+The last commit object given should have the same OID as the one we saw at the
+top before, and running `git show <oid>` with that OID should give you again
+the same results as `git show HEAD`. Furthermore, if you run and examine the
+first ten lines again (with `head` instead of `tail` like we did before applying
+the `reverse` setting), you should see that now the first commit printed is the
+initial commit, `e83c5163`.
+
+== Wrapping Up
+
+Let's review. In this tutorial, we:
+
+- Built a commit walk from the ground up
+- Enabled a grep filter for that commit walk
+- Changed the sort order of that filtered commit walk
+- Built an object walk (tags, commits, trees, and blobs) from the ground up
+- Learned how to add a filter-spec to an object walk
+- Changed the display order of the filtered object walk
-- 
2.22.0.rc1.311.g5d7573a151-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH 00/13] example implementation of revwalk tutorial
  2019-06-07  1:07 [PATCH] documentation: add tutorial for revision walking Emily Shaffer
@ 2019-06-07  1:07 ` Emily Shaffer
  2019-06-07  1:07   ` [RFC PATCH 01/13] walken: add infrastructure for revwalk demo Emily Shaffer
                     ` (12 more replies)
  2019-06-07  6:21 ` [PATCH] documentation: add tutorial for revision walking Eric Sunshine
                   ` (3 subsequent siblings)
  4 siblings, 13 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-07  1:07 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

This patchset is NOT intended to be merged to the Git project!

This patchset shows what a contributor would produce by following the
MyFirstRevWalk tutorial.

I intend to push a feature branch with these patches to my own mirror of
Git on GitHub (github.com/nasamuffin/git/tree/revwalk). I'm sending them
to the list for review to check for consistency with the Git codebase,
so they aren't a bad example for new contributors.

Thanks for any reviews, all!
 - Emily

Emily Shaffer (13):
  walken: add infrastructure for revwalk demo
  walken: add usage to enable -h
  walken: add placeholder to initialize defaults
  walken: add handler to git_config
  walken: configure rev_info and prepare for walk
  walken: perform our basic revision walk
  walken: filter for authors from gmail address
  walken: demonstrate various topographical sorts
  walken: demonstrate reversing a revision walk list
  walken: add unfiltered object walk from HEAD
  walken: add filtered object walk
  walken: count omitted objects
  walken: reverse the object walk order

 Makefile         |   1 +
 builtin.h        |   1 +
 builtin/walken.c | 263 +++++++++++++++++++++++++++++++++++++++++++++++
 git.c            |   1 +
 4 files changed, 266 insertions(+)
 create mode 100644 builtin/walken.c

-- 
2.22.0.rc1.311.g5d7573a151-goog


^ permalink raw reply	[flat|nested] 102+ messages in thread

* [RFC PATCH 01/13] walken: add infrastructure for revwalk demo
  2019-06-07  1:07 ` [RFC PATCH 00/13] example implementation of revwalk tutorial Emily Shaffer
@ 2019-06-07  1:07   ` Emily Shaffer
  2019-06-07  1:08   ` [RFC PATCH 02/13] walken: add usage to enable -h Emily Shaffer
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-07  1:07 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Begin to add scaffolding for `git walken`, a toy command which we will
teach to perform a number of revision walks, in order to demonstrate the
mechanics of revision walking for developers new to the Git project.

This commit is the beginning of an educational series which corresponds
to the tutorial in Documentation/MyFirstRevWalk.txt.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 Makefile         |  1 +
 builtin.h        |  1 +
 builtin/walken.c | 14 ++++++++++++++
 3 files changed, 16 insertions(+)
 create mode 100644 builtin/walken.c

diff --git a/Makefile b/Makefile
index 8a7e235352..a25d46c7a3 100644
--- a/Makefile
+++ b/Makefile
@@ -1143,6 +1143,7 @@ BUILTIN_OBJS += builtin/var.o
 BUILTIN_OBJS += builtin/verify-commit.o
 BUILTIN_OBJS += builtin/verify-pack.o
 BUILTIN_OBJS += builtin/verify-tag.o
+BUILTIN_OBJS += builtin/walken.o
 BUILTIN_OBJS += builtin/worktree.o
 BUILTIN_OBJS += builtin/write-tree.o
 
diff --git a/builtin.h b/builtin.h
index ec7e0954c4..c919736c36 100644
--- a/builtin.h
+++ b/builtin.h
@@ -242,6 +242,7 @@ int cmd_var(int argc, const char **argv, const char *prefix);
 int cmd_verify_commit(int argc, const char **argv, const char *prefix);
 int cmd_verify_tag(int argc, const char **argv, const char *prefix);
 int cmd_version(int argc, const char **argv, const char *prefix);
+int cmd_walken(int argc, const char **argv, const char *prefix);
 int cmd_whatchanged(int argc, const char **argv, const char *prefix);
 int cmd_worktree(int argc, const char **argv, const char *prefix);
 int cmd_write_tree(int argc, const char **argv, const char *prefix);
diff --git a/builtin/walken.c b/builtin/walken.c
new file mode 100644
index 0000000000..bfeaa5188d
--- /dev/null
+++ b/builtin/walken.c
@@ -0,0 +1,14 @@
+/*
+ * "git walken"
+ *
+ * Part of the "My First Revision Walk" tutorial.
+ */
+
+#include <stdio.h>
+#include "builtin.h"
+
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	printf(_("cmd_walken incoming...\n"));
+	return 0;
+}
-- 
2.22.0.rc1.311.g5d7573a151-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH 02/13] walken: add usage to enable -h
  2019-06-07  1:07 ` [RFC PATCH 00/13] example implementation of revwalk tutorial Emily Shaffer
  2019-06-07  1:07   ` [RFC PATCH 01/13] walken: add infrastructure for revwalk demo Emily Shaffer
@ 2019-06-07  1:08   ` Emily Shaffer
  2019-06-07  1:08   ` [RFC PATCH 03/13] walken: add placeholder to initialize defaults Emily Shaffer
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-07  1:08 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

One requirement of the Git test suite is that all commands support '-h',
which is captured by parse_options(). In order to support this flag, add
a short usage text to walken.c and invoke parse_options().

With this change, we can now add cmd_walken to the builtins set and
expect tests to pass, so we'll do so - cmd_walken is now open for
business.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 builtin/walken.c | 12 ++++++++++++
 git.c            |  1 +
 2 files changed, 13 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index bfeaa5188d..5ae7c7d93f 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -6,9 +6,21 @@
 
 #include <stdio.h>
 #include "builtin.h"
+#include "parse-options.h"
+
+static const char * const walken_usage[] = {
+	N_("git walken"),
+	NULL,
+};
 
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
+	struct option options[] = {
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
+
 	printf(_("cmd_walken incoming...\n"));
 	return 0;
 }
diff --git a/git.c b/git.c
index 1bf9c94550..209c42836f 100644
--- a/git.c
+++ b/git.c
@@ -600,6 +600,7 @@ static struct cmd_struct commands[] = {
 	{ "verify-pack", cmd_verify_pack },
 	{ "verify-tag", cmd_verify_tag, RUN_SETUP },
 	{ "version", cmd_version },
+	{ "walken", cmd_walken, RUN_SETUP },
 	{ "whatchanged", cmd_whatchanged, RUN_SETUP },
 	{ "worktree", cmd_worktree, RUN_SETUP | NO_PARSEOPT },
 	{ "write-tree", cmd_write_tree, RUN_SETUP },
-- 
2.22.0.rc1.311.g5d7573a151-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH 03/13] walken: add placeholder to initialize defaults
  2019-06-07  1:07 ` [RFC PATCH 00/13] example implementation of revwalk tutorial Emily Shaffer
  2019-06-07  1:07   ` [RFC PATCH 01/13] walken: add infrastructure for revwalk demo Emily Shaffer
  2019-06-07  1:08   ` [RFC PATCH 02/13] walken: add usage to enable -h Emily Shaffer
@ 2019-06-07  1:08   ` Emily Shaffer
  2019-06-07  1:08   ` [RFC PATCH 04/13] walken: add handler to git_config Emily Shaffer
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-07  1:08 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Eventually, we will want a good place to initialize default variables
for use during our revision walk(s) in `git walken`. For now, there's
nothing to do here, but let's add the scaffolding so that it's easy to
tell where to put the setup later on.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 builtin/walken.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index 5ae7c7d93f..dcee906556 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -13,6 +13,18 @@ static const char * const walken_usage[] = {
 	NULL,
 };
 
+/*
+ * Within init_walken_defaults() we can call into other useful defaults to set
+ * in the global scope or on the_repository. It's okay to borrow from other
+ * functions which are doing something relatively similar to yours.
+ */
+static void init_walken_defaults(void)
+{
+	/* We don't actually need the same components `git log` does; leave this
+	 * empty for now.
+	 */
+}
+
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
 	struct option options[] = {
@@ -21,6 +33,7 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
 
 	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
 
+	init_walken_defaults();
 	printf(_("cmd_walken incoming...\n"));
 	return 0;
 }
-- 
2.22.0.rc1.311.g5d7573a151-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH 04/13] walken: add handler to git_config
  2019-06-07  1:07 ` [RFC PATCH 00/13] example implementation of revwalk tutorial Emily Shaffer
                     ` (2 preceding siblings ...)
  2019-06-07  1:08   ` [RFC PATCH 03/13] walken: add placeholder to initialize defaults Emily Shaffer
@ 2019-06-07  1:08   ` Emily Shaffer
  2019-06-07  1:08   ` [RFC PATCH 05/13] walken: configure rev_info and prepare for walk Emily Shaffer
                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-07  1:08 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

For now, we have no configuration options we want to set up for
ourselves, but in the future we may need to. At the very least, we
should invoke git_default_config() for each config option; we will do so
inside of a skeleton config callback so that we know where to add
configuration handling later on when we need it.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
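
For readers following along, here is a rough standalone sketch of the
callback pattern described above: handle the keys you own, then hand
everything else to a default handler. It does not use Git's real API.
`default_config()`, `for_each_config()`, `struct config_entry`, and the
`walken.verbose` key are all made up for illustration.

```c
#include <string.h>

/* Hypothetical setting owned by our toy command. */
static int walken_verbose;

/* Stand-in for git_default_config(): accepts whatever it is handed. */
static int default_config(const char *var, const char *value, void *cb)
{
	(void)var;
	(void)value;
	(void)cb;
	return 0;
}

/*
 * Called once per config entry. Handle the keys we care about, then
 * pass everything else along to the default handler.
 */
static int walken_config(const char *var, const char *value, void *cb)
{
	if (!strcmp(var, "walken.verbose")) {
		walken_verbose = value && !strcmp(value, "true");
		return 0;
	}
	return default_config(var, value, cb);
}

struct config_entry {
	const char *var;
	const char *value;
};

/* Stand-in for git_config(): feeds every entry to the callback. */
static int for_each_config(const struct config_entry *entries, size_t nr,
			   int (*fn)(const char *, const char *, void *),
			   void *cb)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (fn(entries[i].var, entries[i].value, cb) < 0)
			return -1;
	return 0;
}
```

Calling `for_each_config(entries, nr, walken_config, NULL)` plays the
role that `git_config(git_walken_config, NULL)` plays in the patch.
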
 builtin/walken.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index dcee906556..5d1666a5da 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -6,6 +6,7 @@
 
 #include <stdio.h>
 #include "builtin.h"
+#include "config.h"
 #include "parse-options.h"
 
 static const char * const walken_usage[] = {
@@ -25,6 +26,28 @@ static void init_walken_defaults(void)
 	 */
 }
 
+/*
+ * This method will be called back by git_config(). It is used to gather values
+ * from the configuration files available to Git.
+ *
+ * Each time git_config() finds a configuration file entry, it calls this
+ * callback. Then, this function should compare it to entries which concern us,
+ * and make settings changes as necessary.
+ *
+ * If we are called with a config setting we care about, we should use one of
+ * the helpers which exist in config.h to pull out the value for ourselves, e.g.
+ * git_config_string(...) or git_config_bool(...).
+ *
+ * If we don't match anything, we should pass it along to another stakeholder
+ * who may otherwise care - in log's case, grep, gpg, and diff-ui. For our case,
+ * we'll ignore everybody else.
+ */
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+	/* For now, let's not bother with anything. */
+	return git_default_config(var, value, cb);
+}
+
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
 	struct option options[] = {
@@ -34,6 +57,9 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
 
 	init_walken_defaults();
+
+	git_config(git_walken_config, NULL);
+
 	printf(_("cmd_walken incoming...\n"));
 	return 0;
 }
-- 
2.22.0.rc1.311.g5d7573a151-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH 05/13] walken: configure rev_info and prepare for walk
  2019-06-07  1:07 ` [RFC PATCH 00/13] example implementation of revwalk tutorial Emily Shaffer
                     ` (3 preceding siblings ...)
  2019-06-07  1:08   ` [RFC PATCH 04/13] walken: add handler to git_config Emily Shaffer
@ 2019-06-07  1:08   ` Emily Shaffer
  2019-06-07  1:08   ` [RFC PATCH 06/13] walken: perform our basic revision walk Emily Shaffer
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-07  1:08 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

`struct rev_info` is what's used by the walk itself.
`repo_init_revisions()` initializes the struct; then we need to set it
up for the walk we want to perform, which is done in
`final_rev_info_setup()`.

The most important step here is adding the first object we want to walk
to the pending array. Here, we take the easy road and use
`add_head_to_pending()`; there is also a way to do it with
`setup_revision_opt()` and `setup_revisions()` which we demonstrate but
do not use. If we were to forget this step, the walk would do nothing -
the pending queue would be checked, determined to be empty, and the walk
would terminate immediately.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
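
To make the "empty pending queue" point concrete, here is a toy model
of a walk driven by a pending list. It is only a sketch under my own
assumptions: `struct toy_commit`, `struct toy_walk`, and the `toy_*()`
helpers are invented names, each commit has at most one parent, and
none of this is Git's actual implementation.

```c
#include <stddef.h>

#define MAX_PENDING 8

/* Toy commit: at most one parent, to keep the sketch short. */
struct toy_commit {
	const char *name;
	struct toy_commit *parent;
};

/* Toy stand-in for the pending array inside struct rev_info. */
struct toy_walk {
	struct toy_commit *pending[MAX_PENDING];
	size_t nr;
};

/* Analogous in spirit to add_head_to_pending(): seed the walk. */
static void toy_add_pending(struct toy_walk *w, struct toy_commit *c)
{
	if (w->nr < MAX_PENDING)
		w->pending[w->nr++] = c;
}

/* Pop one commit, queue its parent, return it; NULL when nothing is left. */
static struct toy_commit *toy_next(struct toy_walk *w)
{
	struct toy_commit *c;

	if (!w->nr)
		return NULL;
	c = w->pending[--w->nr];
	if (c->parent)
		toy_add_pending(w, c->parent);
	return c;
}

/* Count how many commits a walk visits. */
static int toy_count(struct toy_walk *w)
{
	int n = 0;

	while (toy_next(w))
		n++;
	return n;
}
```

A walk that was never seeded returns NULL from the first `toy_next()`
call, so `toy_count()` reports zero; seeding it with one commit is
enough to reach every ancestor.
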
 builtin/walken.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index 5d1666a5da..c101db38c7 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -6,6 +6,7 @@
 
 #include <stdio.h>
 #include "builtin.h"
+#include "revision.h"
 #include "config.h"
 #include "parse-options.h"
 
@@ -26,6 +27,35 @@ static void init_walken_defaults(void)
 	 */
 }
 
+/*
+ * cmd_log calls a second set of init after the repo_init_revisions call. We'll
+ * mirror those settings in final_rev_info_setup().
+ */
+static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
+		struct rev_info *rev)
+{
+	struct setup_revision_opt opt;
+
+	/* setup_revision_opt is used to pass options to the setup_revisions()
+	 * call. It's got some special items for submodules and other types of
+	 * optimizations, but for now, we'll just point it to HEAD and call it
+	 * good. First we should make sure to reset it. TODO: This is useful for
+	 * more complicated revisions, but a decent shortcut for the first
+	 * pass is add_head_to_pending().
+	 */
+	memset(&opt, 0, sizeof(opt));
+	opt.def = "HEAD";
+	opt.revarg_opt = REVARG_COMMITTISH;
+	//setup_revisions(argc, argv, rev, &opt);
+
+	/* Let's force oneline format. */
+	get_commit_format("oneline", rev);
+	rev->verbose_header = 1;
+	
+	/* add the HEAD to pending so we can start */
+	add_head_to_pending(rev);
+}
+
 /*
  * This method will be called back by git_config(). It is used to gather values
  * from the configuration files available to Git.
@@ -54,12 +84,24 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
 		OPT_END()
 	};
 
+	struct rev_info rev;
+
 	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
 
 	init_walken_defaults();
 
 	git_config(git_walken_config, NULL);
 
+	/* Time to set up the walk. repo_init_revisions sets up rev_info with
+	 * the defaults, but then you need to make some configuration settings
+	 * to make it do what's special about your walk.
+	 */
+	repo_init_revisions(the_repository, &rev, prefix);
+
+	/* Before we do the walk, we need to set a starting point. It's not
+	 * coming from opt. */
+	final_rev_info_setup(argc, argv, prefix, &rev);
+
 	printf(_("cmd_walken incoming...\n"));
 	return 0;
 }
-- 
2.22.0.rc1.311.g5d7573a151-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH 06/13] walken: perform our basic revision walk
  2019-06-07  1:07 ` [RFC PATCH 00/13] example implementation of revwalk tutorial Emily Shaffer
                     ` (4 preceding siblings ...)
  2019-06-07  1:08   ` [RFC PATCH 05/13] walken: configure rev_info and prepare for walk Emily Shaffer
@ 2019-06-07  1:08   ` Emily Shaffer
  2019-06-07  1:08   ` [RFC PATCH 07/13] walken: filter for authors from gmail address Emily Shaffer
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-07  1:08 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Add the final steps needed and implement the walk loop itself. We add a
method walken_commit_walk() which performs the final setup for the walk
and then iterates over commits from get_revision().

This basic walk only prints the subject line of each commit in the
history. It is nearly equivalent to `git log --oneline`.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 builtin/walken.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index c101db38c7..9cf19a24ab 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -7,8 +7,11 @@
 #include <stdio.h>
 #include "builtin.h"
 #include "revision.h"
+#include "commit.h"
 #include "config.h"
 #include "parse-options.h"
+#include "pretty.h"
+#include "line-log.h"
 
 static const char * const walken_usage[] = {
 	N_("git walken"),
@@ -78,6 +81,39 @@ static int git_walken_config(const char *var, const char *value, void *cb)
 	return git_default_config(var, value, cb);
 }
 
+/*
+ * walken_commit_walk() is invoked by cmd_walken() after initialization. It
+ * does the commit walk only.
+ */
+static int walken_commit_walk(struct rev_info *rev)
+{
+	struct commit *commit;
+	struct strbuf prettybuf;
+
+	strbuf_init(&prettybuf, 0);
+
+
+	/* prepare_revision_walk() gets the final steps ready for a revision
+	 * walk. We check the return value for errors. */
+	if (prepare_revision_walk(rev)) {
+		die(_("revision walk setup failed"));
+	}
+
+	/* Now we can start the real commit walk. get_revision grabs the next
+	 * revision based on the contents of rev.
+	 */
+	rev->diffopt.close_file = 0;
+	while ((commit = get_revision(rev)) != NULL) {
+		if (commit == NULL)
+			continue;
+		strbuf_reset(&prettybuf);
+		pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
+		printf(_("%s\n"), prettybuf.buf);
+
+	}
+	return 0;
+}
+
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
 	struct option options[] = {
@@ -98,10 +134,15 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
 	 */
 	repo_init_revisions(the_repository, &rev, prefix);
 
+	/* We can set our traversal flags here. */
+	rev.always_show_header = 1;
+
 	/* Before we do the walk, we need to set a starting point. It's not
 	 * coming from opt. */
 	final_rev_info_setup(argc, argv, prefix, &rev);
 
+	walken_commit_walk(&rev);
+
 	printf(_("cmd_walken incoming...\n"));
 	return 0;
 }
-- 
2.22.0.rc1.311.g5d7573a151-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH 07/13] walken: filter for authors from gmail address
  2019-06-07  1:07 ` [RFC PATCH 00/13] example implementation of revwalk tutorial Emily Shaffer
                     ` (5 preceding siblings ...)
  2019-06-07  1:08   ` [RFC PATCH 06/13] walken: perform our basic revision walk Emily Shaffer
@ 2019-06-07  1:08   ` Emily Shaffer
  2019-06-07  1:08   ` [RFC PATCH 08/13] walken: demonstrate various topographical sorts Emily Shaffer
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-07  1:08 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

In order to demonstrate how to create grep filters for revision walks,
filter the walk performed by cmd_walken() to print only commits which
are authored by someone with a gmail address.

This commit demonstrates how to append a grep pattern to a
rev_info.grep_filter, to teach new contributors how to create their own
more generalized grep filters during revision walks.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 builtin/walken.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index 9cf19a24ab..6c0f4e7b7a 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -12,6 +12,7 @@
 #include "parse-options.h"
 #include "pretty.h"
 #include "line-log.h"
+#include "grep.h"
 
 static const char * const walken_usage[] = {
 	N_("git walken"),
@@ -25,9 +26,8 @@ static const char * const walken_usage[] = {
  */
 static void init_walken_defaults(void)
 {
-	/* We don't actually need the same components `git log` does; leave this
-	 * empty for now.
-	 */
+	/* Needed by our grep filter. */
+	init_grep_defaults(the_repository);
 }
 
 /*
@@ -51,6 +51,10 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix
 	opt.revarg_opt = REVARG_COMMITTISH;
 	//setup_revisions(argc, argv, rev, &opt);
 
+	/* Add a grep pattern to the author line in the header. */
+	append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR, "gmail");
+	compile_grep_patterns(&rev->grep_filter);
+
 	/* Let's force oneline format. */
 	get_commit_format("oneline", rev);
 	rev->verbose_header = 1;
@@ -77,7 +81,7 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix
  */
 static int git_walken_config(const char *var, const char *value, void *cb)
 {
-	/* For now, let's not bother with anything. */
+	grep_config(var, value, cb);
 	return git_default_config(var, value, cb);
 }
 
-- 
2.22.0.rc1.311.g5d7573a151-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH 08/13] walken: demonstrate various topographical sorts
  2019-06-07  1:07 ` [RFC PATCH 00/13] example implementation of revwalk tutorial Emily Shaffer
                     ` (6 preceding siblings ...)
  2019-06-07  1:08   ` [RFC PATCH 07/13] walken: filter for authors from gmail address Emily Shaffer
@ 2019-06-07  1:08   ` Emily Shaffer
  2019-06-07  1:08   ` [RFC PATCH 09/13] walken: demonstrate reversing a revision walk list Emily Shaffer
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-07  1:08 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Order the revision walk by author or commit dates, to demonstrate how to
apply topo_sort to a revision walk.

While following the tutorial, new contributors are guided to run a walk
with each sort and compare the results.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
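
A commit's author date and commit date can differ (after a rebase, for
example), which is why the two sort orders can print different results.
The sketch below only shows that effect with a plain `qsort()`, newest
first. It deliberately ignores the parent/child constraint that a real
topological sort enforces, and `struct toy_commit` and `toy_sort()` are
hypothetical names, not Git's.

```c
#include <stdlib.h>

struct toy_commit {
	int id;
	long author_date;  /* when the change was first written */
	long commit_date;  /* when the commit object was created */
};

/* Newest commit date first. */
static int by_commit_date(const void *a, const void *b)
{
	const struct toy_commit *x = a, *y = b;

	return (x->commit_date < y->commit_date) -
	       (x->commit_date > y->commit_date);
}

/* Newest author date first. */
static int by_author_date(const void *a, const void *b)
{
	const struct toy_commit *x = a, *y = b;

	return (x->author_date < y->author_date) -
	       (x->author_date > y->author_date);
}

/* Sort in place by the chosen timestamp. */
static void toy_sort(struct toy_commit *commits, size_t nr, int use_author)
{
	qsort(commits, nr, sizeof(*commits),
	      use_author ? by_author_date : by_commit_date);
}
```

With one commit written early but committed late and another written
and committed in between, the two calls to `toy_sort()` put them in
opposite orders.
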
 builtin/walken.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index 6c0f4e7b7a..716d31f04e 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -61,6 +61,11 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix
 	
 	/* add the HEAD to pending so we can start */
 	add_head_to_pending(rev);
+	
+	/* Let's play with the sort order. */
+	rev->topo_order = 1;
+	rev->sort_order = REV_SORT_BY_COMMIT_DATE;
+	/* rev->sort_order = REV_SORT_BY_AUTHOR_DATE; */
 }
 
 /*
-- 
2.22.0.rc1.311.g5d7573a151-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH 09/13] walken: demonstrate reversing a revision walk list
  2019-06-07  1:07 ` [RFC PATCH 00/13] example implementation of revwalk tutorial Emily Shaffer
                     ` (7 preceding siblings ...)
  2019-06-07  1:08   ` [RFC PATCH 08/13] walken: demonstrate various topographical sorts Emily Shaffer
@ 2019-06-07  1:08   ` Emily Shaffer
  2019-06-07  1:08   ` [RFC PATCH 10/13] walken: add unfiltered object walk from HEAD Emily Shaffer
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-07  1:08 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

The final installment in the tutorial about sorting revision walk
outputs. This commit reverses the commit list, so that we see newer
commits last (handy since we aren't using a pager).

It's important to note that rev->reverse needs to be set after
add_head_to_pending() or before setup_revisions(). (This is mentioned in
the accompanying tutorial.)

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
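
As a mental model for what reversing costs, the toy below collects the
whole walk first and only then flips it, so nothing can be shown until
the walk has finished. This is a sketch of the idea under my own
assumptions, not a description of revision.c; `struct toy_commit` and
`toy_walk_ids()` are invented for the example.

```c
#include <stddef.h>

struct toy_commit {
	int id;                     /* hypothetical identifier */
	struct toy_commit *parent;  /* NULL for the initial commit */
};

/*
 * Walk from `head` toward the root, filling `out` with commit ids.
 * With `reverse` unset, ids appear newest-first; with it set, the
 * collected list is flipped so the oldest commit comes first.
 * Returns the number of ids written (at most `cap`).
 */
static size_t toy_walk_ids(const struct toy_commit *head, int reverse,
			   int *out, size_t cap)
{
	size_t n = 0, i;

	for (; head && n < cap; head = head->parent)
		out[n++] = head->id;

	if (reverse) {
		for (i = 0; i < n / 2; i++) {
			int tmp = out[i];

			out[i] = out[n - 1 - i];
			out[n - 1 - i] = tmp;
		}
	}
	return n;
}
```
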
 builtin/walken.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index 716d31f04e..86c8d29c48 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -61,6 +61,9 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix
 	
 	/* add the HEAD to pending so we can start */
 	add_head_to_pending(rev);
+
+	/* Reverse the order */
+	rev->reverse = 1;
 	
 	/* Let's play with the sort order. */
 	rev->topo_order = 1;
-- 
2.22.0.rc1.311.g5d7573a151-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH 10/13] walken: add unfiltered object walk from HEAD
  2019-06-07  1:07 ` [RFC PATCH 00/13] example implementation of revwalk tutorial Emily Shaffer
                     ` (8 preceding siblings ...)
  2019-06-07  1:08   ` [RFC PATCH 09/13] walken: demonstrate reversing a revision walk list Emily Shaffer
@ 2019-06-07  1:08   ` Emily Shaffer
  2019-06-07  1:08   ` [RFC PATCH 11/13] walken: add filtered object walk Emily Shaffer
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-07  1:08 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Provide a demonstration of a revision walk which traverses all types of
object, not just commits. This type of revision walk is used for
operations such as creating packfiles and performing fetches or clones,
so it's useful to teach new developers how it works. For starters, only
demonstrate the unfiltered version, as this will make the tutorial
easier to follow.

This commit is part of a tutorial on revision walking.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
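
The shape of the two-callback traversal can be sketched without any Git
code: commits go to one handler and every other object type goes to the
other. Everything below (`enum toy_type`, `struct toy_object`,
`toy_traverse()`, and so on) is a made-up stand-in for illustration.

```c
#include <stddef.h>

enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

struct toy_object {
	enum toy_type type;
};

struct toy_counts {
	int commits, trees, blobs, tags;
};

typedef void (*toy_show_fn)(const struct toy_object *, void *);

/* Handler for commits only. */
static void count_commit(const struct toy_object *obj, void *data)
{
	struct toy_counts *c = data;

	(void)obj;
	c->commits++;
}

/* Handler for every non-commit object. */
static void count_object(const struct toy_object *obj, void *data)
{
	struct toy_counts *c = data;

	switch (obj->type) {
	case TOY_TREE:
		c->trees++;
		break;
	case TOY_BLOB:
		c->blobs++;
		break;
	case TOY_TAG:
		c->tags++;
		break;
	default:
		break; /* commits never reach this handler */
	}
}

/* Dispatch each object to the matching handler. */
static void toy_traverse(const struct toy_object *objs, size_t nr,
			 toy_show_fn show_commit, toy_show_fn show_object,
			 void *data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (objs[i].type == TOY_COMMIT)
			show_commit(&objs[i], data);
		else
			show_object(&objs[i], data);
	}
}
```

One design difference from the patch: the sketch threads its counters
through the `void *data` argument, whereas the patch keeps file-scope
counters and passes NULL as the last argument to
traverse_commit_list(). Either works for a demo.
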
 builtin/walken.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 77 insertions(+), 2 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index 86c8d29c48..408af6c841 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -12,6 +12,7 @@
 #include "parse-options.h"
 #include "pretty.h"
 #include "line-log.h"
+#include "list-objects.h"
 #include "grep.h"
 
 static const char * const walken_usage[] = {
@@ -19,6 +20,11 @@ static const char * const walken_usage[] = {
 	NULL,
 };
 
+static int commit_count;
+static int tag_count;
+static int blob_count;
+static int tree_count;
+
 /*
  * Within init_walken_defaults() we can call into other useful defaults to set
  * in the global scope or on the_repository. It's okay to borrow from other
@@ -93,6 +99,70 @@ static int git_walken_config(const char *var, const char *value, void *cb)
 	return git_default_config(var, value, cb);
 }
 
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+	commit_count++;
+}
+
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+	switch (obj->type) {
+	case OBJ_TREE:
+		tree_count++;
+		break;
+	case OBJ_BLOB:
+		blob_count++;
+		break;
+	case OBJ_TAG:
+		tag_count++;
+		break;
+	case OBJ_COMMIT:
+		printf(_("Unexpectedly encountered a commit in "
+			 "walken_show_object!\n"));
+		commit_count++;
+		break;
+	default:
+		printf(_("Unexpected object type %s!\n"),
+		       type_name(obj->type));
+		break;
+	}
+}
+
+/*
+ * walken_object_walk() is invoked by cmd_walken() after initialization. It does
+ * a walk of all object types.
+ */
+static int walken_object_walk(struct rev_info *rev)
+{
+	struct list_objects_filter_options filter_options = {};
+	struct oidset omitted;
+	oidset_init(&omitted, 0);
+
+	printf("walken_object_walk beginning...\n");
+
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+	rev->exclude_promisor_objects = 1;
+
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+
+	commit_count = 0;
+	tag_count = 0;
+	blob_count = 0;
+	tree_count = 0;
+
+	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+
+	printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, "
+	       "and %d trees.\n"), commit_count, blob_count, tag_count,
+	       tree_count);
+
+	return 0;
+}
+
 /*
  * walken_commit_walk() is invoked by cmd_walken() after initialization. It
  * does the commit walk only.
@@ -151,9 +221,14 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
 
 	/* Before we do the walk, we need to set a starting point. It's not
 	 * coming from opt. */
-	final_rev_info_setup(argc, argv, prefix, &rev);
 
-	walken_commit_walk(&rev);
+	if (1) {
+		add_head_to_pending(&rev);
+		walken_object_walk(&rev);
+	} else {
+		final_rev_info_setup(argc, argv, prefix, &rev);
+		walken_commit_walk(&rev);
+	}
 
 	printf(_("cmd_walken incoming...\n"));
 	return 0;
-- 
2.22.0.rc1.311.g5d7573a151-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH 11/13] walken: add filtered object walk
  2019-06-07  1:07 ` [RFC PATCH 00/13] example implementation of revwalk tutorial Emily Shaffer
                     ` (9 preceding siblings ...)
  2019-06-07  1:08   ` [RFC PATCH 10/13] walken: add unfiltered object walk from HEAD Emily Shaffer
@ 2019-06-07  1:08   ` Emily Shaffer
  2019-06-07 19:15     ` Jeff Hostetler
  2019-06-07  1:08   ` [RFC PATCH 12/13] walken: count omitted objects Emily Shaffer
  2019-06-07  1:08   ` [RFC PATCH 13/13] walken: reverse the object walk order Emily Shaffer
  12 siblings, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-06-07  1:08 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Demonstrate how filter specs can be used when performing a revision walk
of all object types. In this case, tree depth is used. Contributors who
are following the revision walking tutorial will be encouraged to run
the revision walk with and without the filter in order to compare the
number of objects seen in each case.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
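
To give a feel for what a tree-depth filter does to the object counts,
here is a toy walk over an in-memory tree. It assumes the root tree
sits at depth 0 and that anything at or beyond the depth limit is
omitted, which is my reading of the `tree:<depth>` filter-spec; please
check the rev-list documentation before relying on it. `struct
toy_entry`, `struct toy_result`, and `toy_filter_walk()` are
hypothetical.

```c
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Toy tree entry: a blob has no children, a tree may have some. */
struct toy_entry {
	const struct toy_entry *children[TOY_MAX_CHILDREN];
	size_t nr_children;
};

struct toy_result {
	int shown, omitted;
};

/*
 * Visit `e`, found at `depth` below the root tree (the root is depth 0).
 * Entries at depth >= max_depth are counted as omitted, and so is
 * everything beneath them.
 */
static void toy_filter_walk(const struct toy_entry *e, int depth,
			    int max_depth, struct toy_result *res)
{
	size_t i;

	if (depth >= max_depth)
		res->omitted++;
	else
		res->shown++;
	for (i = 0; i < e->nr_children; i++)
		toy_filter_walk(e->children[i], depth + 1, max_depth, res);
}
```

For a root tree holding one blob and one subtree with a blob of its
own, a limit of 1 shows only the root and omits the other three
entries, while a limit of 2 omits just the innermost blob.
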
 builtin/walken.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index 408af6c841..f2c98bcd6b 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -13,6 +13,7 @@
 #include "pretty.h"
 #include "line-log.h"
 #include "list-objects.h"
+#include "list-objects-filter-options.h"
 #include "grep.h"
 
 static const char * const walken_usage[] = {
@@ -154,7 +155,22 @@ static int walken_object_walk(struct rev_info *rev)
 	blob_count = 0;
 	tree_count = 0;
 
-	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+	if (1) {
+		/* Unfiltered: */
+		printf(_("Unfiltered object walk.\n"));
+		traverse_commit_list(rev, walken_show_commit,
+				walken_show_object, NULL);
+	} else {
+		printf(_("Filtered object walk with filterspec 'tree:1'.\n"));
+		/*
+		 * We can parse a tree depth of 1 to demonstrate the kind of
+		 * filtering that could occur e.g. during partial cloning.
+		 */
+		parse_list_objects_filter(&filter_options, "tree:1");
+
+		traverse_commit_list_filtered(&filter_options, rev,
+			walken_show_commit, walken_show_object, NULL, &omitted);
+	}
 
 	printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, "
 	       "and %d trees.\n"), commit_count, blob_count, tag_count,
-- 
2.22.0.rc1.311.g5d7573a151-goog



* [RFC PATCH 12/13] walken: count omitted objects
  2019-06-07  1:07 ` [RFC PATCH 00/13] example implementation of revwalk tutorial Emily Shaffer
                     ` (10 preceding siblings ...)
  2019-06-07  1:08   ` [RFC PATCH 11/13] walken: add filtered object walk Emily Shaffer
@ 2019-06-07  1:08   ` Emily Shaffer
  2019-06-07  1:08   ` [RFC PATCH 13/13] walken: reverse the object walk order Emily Shaffer
  12 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-07  1:08 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

It may be illuminating to see which objects were not included within a
given filter. This also demonstrates, since filter-spec "tree:1" is
used, that the 'omitted' list contains all objects which are omitted,
not just the first objects which were omitted - that is, it continues to
dereference omitted trees and commits.

This is part of a tutorial on performing revision walks.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 builtin/walken.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index f2c98bcd6b..d93725ee88 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -137,6 +137,9 @@ static int walken_object_walk(struct rev_info *rev)
 {
 	struct list_objects_filter_options filter_options = {};
 	struct oidset omitted;
+	struct oidset_iter oit;
+	struct object_id *oid = NULL;
+	int omitted_count = 0;
 	oidset_init(&omitted, 0);
 
 	printf("walken_object_walk beginning...\n");
@@ -172,9 +175,15 @@ static int walken_object_walk(struct rev_info *rev)
 			walken_show_commit, walken_show_object, NULL, &omitted);
 	}
 
+	/* Count the omitted objects. */
+	oidset_iter_init(&omitted, &oit);
+
+	while ((oid = oidset_iter_next(&oit)))
+		omitted_count++;
+
 	printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, "
-	       "and %d trees.\n"), commit_count, blob_count, tag_count,
-	       tree_count);
+	       "and %d trees; %d omitted objects.\n"), commit_count,
+	       blob_count, tag_count, tree_count, omitted_count);
 
 	return 0;
 }
-- 
2.22.0.rc1.311.g5d7573a151-goog



* [RFC PATCH 13/13] walken: reverse the object walk order
  2019-06-07  1:07 ` [RFC PATCH 00/13] example implementation of revwalk tutorial Emily Shaffer
                     ` (11 preceding siblings ...)
  2019-06-07  1:08   ` [RFC PATCH 12/13] walken: count omitted objects Emily Shaffer
@ 2019-06-07  1:08   ` Emily Shaffer
  12 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-07  1:08 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Demonstrate that just like commit walks, object walks can have their
order reversed. Additionally, add verbose logging of objects encountered
in order to let contributors prove to themselves that the walk has
actually been reversed. With this commit, `git walken` becomes extremely
chatty - it's recommended to pipe the output through `head` or `tail` or
to redirect it into a file.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 builtin/walken.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index d93725ee88..4bfee3a2d7 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -102,11 +102,13 @@ static int git_walken_config(const char *var, const char *value, void *cb)
 
 static void walken_show_commit(struct commit *cmt, void *buf)
 {
+	printf(_("commit: %s\n"), oid_to_hex(&cmt->object.oid));
 	commit_count++;
 }
 
 static void walken_show_object(struct object *obj, const char *str, void *buf)
 {
+	printf(_("%s: %s\n"), type_name(obj->type), oid_to_hex(&obj->oid));
 	switch (obj->type) {
 	case OBJ_TREE:
 		tree_count++;
@@ -149,6 +151,7 @@ static int walken_object_walk(struct rev_info *rev)
 	rev->tag_objects = 1;
 	rev->tree_blobs_in_commit_order = 1;
 	rev->exclude_promisor_objects = 1;
+	rev->reverse = 1;
 
 	if (prepare_revision_walk(rev))
 		die(_("revision walk setup failed"));
-- 
2.22.0.rc1.311.g5d7573a151-goog



* Re: [PATCH] documentation: add tutorial for revision walking
  2019-06-07  1:07 [PATCH] documentation: add tutorial for revision walking Emily Shaffer
  2019-06-07  1:07 ` [RFC PATCH 00/13] example implementation of revwalk tutorial Emily Shaffer
@ 2019-06-07  6:21 ` Eric Sunshine
  2019-06-10 21:26   ` Junio C Hamano
  2019-06-17 23:19   ` Emily Shaffer
  2019-06-10 20:25 ` Junio C Hamano
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 102+ messages in thread
From: Eric Sunshine @ 2019-06-07  6:21 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Git List

On Thu, Jun 6, 2019 at 9:08 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> [...]
> The tutorial covers a basic overview of the structs involved during
> revision walk, setting up a basic commit walk, setting up a basic
> all-object walk, and adding some configuration changes to both walk
> types. It intentionally does not cover how to create new commands or
> search for options from the command line or gitconfigs.
> [...]
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> diff --git a/Documentation/.gitignore b/Documentation/.gitignore
> @@ -12,6 +12,7 @@ cmds-*.txt
>  SubmittingPatches.txt
> +MyFirstRevWalk.txt

The new file itself is named Documentation/MyFirstRevWalk.txt, so why
add it to .gitignore?

> diff --git a/Documentation/MyFirstRevWalk.txt b/Documentation/MyFirstRevWalk.txt
> @@ -0,0 +1,826 @@
> +== What's a Revision Walk?
> +
> +The revision walk is a key concept in Git - this is the process that underpins
> +operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, the
> +list of objects is found by walking parent relationships between objects. The
> +revision walk can also be usedto determine whether or not a given object is

s/usedto/used to/

> +reachable from the current HEAD pointer.
> +
> +We'll put our fiddling into a new command. For fun, let's name it `git walken`.
> +Open up a new file `builtin/walken.c` and set up the command handler:
> +
> +----
> +/*
> + * "git walken"
> + *
> + * Part of the "My First Revision Walk" tutorial.
> + */
> +
> +#include <stdio.h>
> +#include "builtin.h"

Git source files must always include cache.h or git-compat-util.h (or,
for builtins, builtin.h) as the very first header since those headers
take care of differences which might crop up as problems with system
headers on various platforms. System headers are included after Git
headers. So, stdio.h should be included after builtin.h. In this case,
however, stdio.h will get pulled in by git-compat-util.h anyhow, so
you need not include it here.

> +Add usage text and `-h` handling, in order to pass the test suite:
> +
> +----
> +static const char * const walken_usage[] = {
> +       N_("git walken"),
> +       NULL,
> +}

Unless you plan on referencing this from functions other than
cmd_walken(), it need not be global.

> +int cmd_walken(int argc, const char **argv, const char *prefix)
> +{
> +       struct option options[] = {
> +               OPT_END()
> +       };
> +
> +       argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
> +
> +       ...

Perhaps comment out the "..." or remove it altogether to avoid having
the compiler barf when the below instructions tell the reader to build
the command.

> +}
> +
> +Also add the relevant line in builtin.h near `cmd_whatchanged()`:

s/builtin.h/`&`/

> +Build and test out your command, without forgetting to ensure the `DEVELOPER`
> +flag is set:
> +
> +----
> +echo DEVELOPER=1 >config.mak

This will blast existing content of 'config.mak' which could be
dangerous. It might be better to suggest >> instead.

> +`name` is the SHA-1 of the object - a 40-digit hex string you may be familiar
> +with from using Git to organize your source in the past. Check the tutorial
> +mentioned above towards the top for a discussion of where the SHA-1 can come
> +from.

With all the recent work to move away from SHA-1 and to support other
hash functions, perhaps just call this "object ID" rather than SHA-1,
and drop mention of it being exactly 40 digits. Instead, perhaps say
something like "...is the hexadecimal representation of the object
ID...".

> +== Basic Commit Walk
> +
> +First, let's see if we can replicate the output of `git log --oneline`. We'll
> +refer back to the implementation frequently to discover norms when performing
> +a revision walk of our own.
> +
> +We'll need all the commits, in order, which preceded our current commit. We will
> +also need to know the name and subject.

This paragraph confused me. I read it as these being prerequisites I
would somehow have to provide in order to write the code. Perhaps it
can be rephrased to state that this is what the code will be doing.
Maybe: "To do this, we will find all the commits, in order, which
precede the current commit, and extract from them the name and subject
[of the commit message]" or something.

> +=== Setting Up
> +
> +Preparing for your revision walk has some distinct stages.
> +
> +1. Perform default setup for this mode, and others which may be invoked.
> +2. Check configuration files for relevant settings.
> +3. Set up the rev_info struct.
> +4. Tweak the initialized rev_info to suit the current walk.
> +5. Prepare the rev_info for the walk.

s/rev_info/`&`/ in the above three lines.

> +==== Default Setups
> +
> +Before you begin to examine user configuration for your revision walk, it's
> +common practice for you to initialize to default any switches that your command
> +may have, as well as ask any other components you may invoke to initialize as
> +well. `git log` does this in `init_log_defaults()`; in that case, one global
> +`decoration_style` is initialized, as well as the grep and diff-UI components.
> +
> +For our purposes, within `git walken`, for the first example we do we don't

"we do we don't"?

> +intend to invoke anything, and we don't have any configuration to do. However,

"invoke anything" is pretty nebulous, as is the earlier "components
you may invoke". A newcomer is unlikely to know what this means, so
perhaps it needs an example (even if just a short parenthetical
comment).

> +we may want to add some later, so for now, we can add an empty placeholder.
> +Create a new function in `builtin/walken.c`:
> +
> +----
> +static void init_walken_defaults(void)
> +{
> +       /* We don't actually need the same components `git log` does; leave this
> +        * empty for now.
> +        */
> +}

/*
 * Git multi-line comments
 * are formatted like this.
 */

> +Add a new function to `builtin/walken.c`:
> +
> +----
> +static int git_walken_config(const char *var, const char *value, void *cb)
> +{
> +       /* For now, let's not bother with anything. */
> +       return git_default_config(var, value, cb);
> +}

Comment is somewhat confusing. Perhaps say instead "We don't currently
have custom configuration, so fall back to git_default_config()" or
something.

> +==== Setting Up `rev_info`
> +
> +Now that we've gathered external configuration and options, it's time to
> +initialize the `rev_info` object which we will use to perform the walk. This is
> +typically done by calling `repo_init_revisions()` with the repository you intend
> +to target, as well as the prefix and your `rev_info` struct.

Maybe: s/the prefix/the `&` argument of `cmd_walken`/

> +Add the `struct rev_info` and the `repo_init_revisions()` call:
> +----
> +int cmd_walken(int argc, const char **argv, const char *prefix)
> +{
> +       /* This can go wherever you like in your declarations.*/
> +       struct rev_info rev;
> +       ...

A less verbose way to indicate the same without using a /* comment */:

    ...
    struct rev_info rev;
    ...

> +       /* This should go after the git_config() call. */
> +       repo_init_revisions(the_repository, &rev, prefix);
> +}
> +----
> +static void final_rev_info_setup(struct rev_info *rev)
> +{
> +       /* We want to mimick the appearance of `git log --oneline`, so let's
> +        * force oneline format. */

s/mimick/mimic/

/*
 * Multi-line
 * comment.
 */

> +==== Preparing `rev_info` For the Walk
> +
> +Now that `rev` is all initialized and configured, we've got one more setup step
> +before we get rolling. We can do this in a helper, which will both prepare the
> +`rev_info` for the walk, and perform the walk itself. Let's start the helper
> +with the call to `prepare_revision_walk()`.
> +
> +----
> +static int walken_commit_walk(struct rev_info *rev)
> +{
> +       /* prepare_revision_walk() gets the final steps ready for a revision
> +        * walk. We check the return value for errors. */

Not at all sure what this comment is trying to say. Also, the second
sentence adds no value to what the code itself already says clearly by
actually checking the return value.

> +       if (prepare_revision_walk(rev))
> +               die(_("revision walk setup failed"));
> +}
> +==== Performing the Walk!
> +
> +Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
> +can also be used as an iterator; we move to the next item in the walk by using
> +`get_revision()` repeatedly. Add the listed variable declarations at the top and
> +the walk loop below the `prepare_revision_walk()` call within your
> +`walken_commit_walk()`:
> +
> +----
> +static int walken_commit_walk(struct rev_info *rev)
> +{
> +       struct commit *commit;
> +       struct strbuf prettybuf;
> +       strbuf_init(&prettybuf, 0);

More idiomatic:

    struct strbuf prettybuf = STRBUF_INIT;

> +       while ((commit = get_revision(rev)) != NULL) {
> +               if (commit == NULL)
> +                       continue;

Idiomatic Git code doesn't mention NULL explicitly in conditionals, so:

    while ((commit = get_revision(rev))) {
        if (!commit)
            continue;
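
A minimal, self-contained sketch of that loop idiom follows. It assumes nothing about Git itself: `toy_walk`, `toy_next()` and `toy_count()` are hypothetical stand-ins for `rev_info`, `get_revision()` and the walk loop, not Git APIs. Since the loop condition already stops on NULL, a further `if (!commit) continue;` inside the body can never fire, so the sketch leaves it out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a revision walk over a fixed array. */
struct toy_walk {
	const char **items;
	size_t nr;
	size_t pos;
};

/* Return the next item, or NULL once the walk is exhausted. */
static const char *toy_next(struct toy_walk *walk)
{
	assert(walk);
	if (walk->pos >= walk->nr)
		return NULL;
	return walk->items[walk->pos++];
}

/*
 * Count the remaining items. The assignment in the loop condition
 * is the only NULL check needed; no explicit "!= NULL" comparison.
 */
static size_t toy_count(struct toy_walk *walk)
{
	const char *item;
	size_t seen = 0;

	while ((item = toy_next(walk)))
		seen++;
	return seen;
}
```

Calling `toy_count()` on a walk over three strings visits each string once and leaves the walk exhausted, so a later `toy_next()` returns NULL.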

> +               strbuf_reset(&prettybuf);
> +               pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);

Earlier, you talked about calling get_commit_format("oneline",...) to
get "oneline" output, so what is the purpose of CMIT_FMT_ONELINE here?
The text should explain more clearly what these two different
"oneline"-related bits mean.

> +               printf(_("%s\n"), prettybuf.buf);

There is nothing here to localize, so drop _(...):

    printf("%s\n", prettybuf.buf);

or perhaps just:

    puts(prettybuf.buf);

> +       }
> +
> +       return 0;
> +}

What does the return value signify?

> +=== Adding a Filter
> +
> +Next, we can modify the `grep_filter`. This is done with convenience functions
> +found in `grep.h`. For fun, we're filtering to only commits from folks using a
> +gmail.com email address - a not-very-precise guess at who may be working on Git

Perhaps? s/gmail.com/`&`/

> +=== Changing the Order
> +
> +Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
> +`REV_SORT_BY_AUTHOR_DATE`. Add the following:
> +
> +static void final_rev_info_setup(int argc, const char **argv,
> +                const char *prefix, struct rev_info *rev)
> +{
> +       ...
> +
> +       rev->topo_order = 1;
> +       rev->sort_order = REV_SORT_BY_COMMIT_DATE;

The assignment to rev->sort_order is obvious enough, but the
rev->topo_order assignment is quite mysterious to someone coming to
this tutorial to learn about revision walking, thus some commentary
explaining 'topo_order' would be a good idea.

> +Finally, compare the two. This is a little less helpful without object names or
> +dates, but hopefully we get the idea.
> +
> +----
> +$ diff -u commit-date.txt author-date.txt
> +----
> +
> +This display is an indicator for the latency between publishing a commit for
> +review the first time, and getting it actually merged into master.

Perhaps: s/master/`&`/

Even as a long-time contributor to the project, I had to pause over
this statement for several seconds before figuring out what it was
talking about. Without a long-winded explanation of how topics
progress from submission through 'pu' through 'next' through 'master'
and finally into a release, the above statement is likely to be
mystifying to a newcomer. Perhaps it should be dropped.

> +Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
> +However, it needs to be applied after `add_head_to_pending()` is called. Find

This leaves the reader hanging, wondering why 'reverse' needs to be
assigned after add_head_to_pending().

> +== Basic Object Walk
> +
> +static void walken_show_commit(struct commit *cmt, void *buf)
> +{
> +        commit_count++;
> +}
> +----
> +
> +Since we have the `struct commit` object, we can look at all the same parts that
> +we looked at in our earlier commit-only walk. For the sake of this tutorial,
> +though, we'll just increment the commit counter and move on.

This leaves the reader wondering what 'buf' is and what it's used for.
Presumably this is the 'show_data' context mentioned earlier? If so,
perhaps name this 'ctxt' or 'context' or something and, because this
is a tutorial trying to teach revision walking, say a quick word about
how it might be used.
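
As a rough illustration of how such a context pointer might be used, here is a self-contained sketch that passes a counter struct through the `void *` argument instead of relying on file-scope counters. Everything in it is hypothetical: `toy_traverse()` merely stands in for the traversal function, and `toy_type`, `toy_counts` and `toy_show()` are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB };

/* Per-walk state that the callback updates through its context pointer. */
struct toy_counts {
	int commits;
	int trees;
	int blobs;
};

typedef void (*toy_show_fn)(enum toy_type type, void *data);

/*
 * Hypothetical stand-in for the traversal: call show() once per
 * object and hand 'data' through to it untouched.
 */
static void toy_traverse(const enum toy_type *objects, size_t nr,
			 toy_show_fn show, void *data)
{
	size_t i;

	assert(show);
	for (i = 0; i < nr; i++)
		show(objects[i], data);
}

/* The callback recovers its own state from the opaque pointer. */
static void toy_show(enum toy_type type, void *data)
{
	struct toy_counts *counts = data;

	switch (type) {
	case TOY_COMMIT:
		counts->commits++;
		break;
	case TOY_TREE:
		counts->trees++;
		break;
	case TOY_BLOB:
		counts->blobs++;
		break;
	}
}
```

The caller owns the `struct toy_counts`, passes its address as the last argument to `toy_traverse()`, and reads the totals afterwards; two walks can then run with separate counters.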

> +static void walken_show_object(struct object *obj, const char *str, void *buf)
> +{
> +        switch (obj->type) {
> +        [...]
> +        case OBJ_COMMIT:
> +                printf(_("Unexpectedly encountered a commit in "
> +                         "walken_show_object!\n"));
> +                commit_count++;
> +                break;
> +        default:
> +                printf(_("Unexpected object type %s!\n"),
> +                       type_name(obj->type));
> +                break;
> +        }
> +}

Modern practice in this project is to start error messages with
lowercase and to not punctuate the end (no need for "!").

Also, same complaint about the mysterious 'str' argument to the
callback as for 'buf' mentioned above.

> +To help assure us that we aren't double-counting commits, we'll include some
> +complaining if a commit object is routed through our non-commit callback; we'll
> +also complain if we see an invalid object type.

Are these two error cases "impossible" conditions or can they
actually arise in practice? If the former, use die() instead and drop
use of _(...) so as to avoid confusing the reader into thinking that
the behavior is indeterminate.

> +Our main object walk implementation is substantially different from our commit
> +walk implementation, so let's make a new function to perform the object walk. We
> +can perform setup which is applicable to all objects here, too, to keep separate
> +from setup which is applicable to commit-only walks.
> +
> +----
> +static int walken_object_walk(struct rev_info *rev)
> +{
> +}
> +----

This skeleton function definition is populated immediately below, so
it's not clear why it needs to be shown here.

> +We'll start by enabling all types of objects in the `struct rev_info`, and
> +asking to have our trees and blobs shown in commit order. We'll also exclude
> +promisors as the walk becomes more complicated with those types of objects. When
> +our settings are ready, we'll perform the normal revision walk setup and
> +initialize our tracking variables.
> +
> +----
> +static int walken_object_walk(struct rev_info *rev)
> +{
> +        rev->tree_objects = 1;
> +        rev->blob_objects = 1;
> +        rev->tag_objects = 1;
> +        rev->tree_blobs_in_commit_order = 1;
> +        rev->exclude_promisor_objects = 1;
> +        [...]
> +----
> +
> +Unless you cloned or fetched your repository earlier with a filter,
> +`exclude_promisor_objects` is unlikely to make a difference, but we'll turn it
> +on just to make sure our lives are simple.  We'll also turn on
> +`tree_blobs_in_commit_order`, which means that we will walk a commit's tree and
> +everything it points to immediately after we find each commit, as opposed to
> +waiting for the end and walking through all trees after the commit history has
> +been discovered.

This paragraph is repeating much of the information in the paragraph
just above the code snippet. One or the other should be dropped or
thinned to avoid the duplication.

> +Let's start by calling just the unfiltered walk and reporting our counts.
> +Complete your implementation of `walken_object_walk()`:
> +
> +----
> +       traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
> +
> +       printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, "
> +                "and %d trees.\n"), commit_count, blob_count, tag_count,
> +              tree_count);

Or make the output more useful by having it be machine-parseable (and
not localized):

    printf("commits %d\nblobs %d\ntags %d\ntrees %d\n",
        commit_count, blob_count, tag_count, tree_count);
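
To show why a "key value" per line layout is easy for a script or another program to consume, here is a small standalone sketch that formats the counts and reads them back. The names `walk_totals`, `format_totals()` and `parse_totals()` are made up for the example and are not part of the tutorial code.

```c
#include <assert.h>
#include <stdio.h>

struct walk_totals {
	int commits;
	int blobs;
	int tags;
	int trees;
};

/*
 * Write one "key value" pair per line into 'out'.
 * Returns whatever snprintf() returns.
 */
static int format_totals(char *out, size_t len, const struct walk_totals *t)
{
	assert(out && len && t);
	return snprintf(out, len, "commits %d\nblobs %d\ntags %d\ntrees %d\n",
			t->commits, t->blobs, t->tags, t->trees);
}

/*
 * Read the same layout back. Whitespace in the sscanf() format
 * matches the newlines. Returns 0 on success, -1 otherwise.
 */
static int parse_totals(const char *in, struct walk_totals *t)
{
	assert(in && t);
	if (sscanf(in, "commits %d blobs %d tags %d trees %d",
		   &t->commits, &t->blobs, &t->tags, &t->trees) != 4)
		return -1;
	return 0;
}
```

A fixed, unlocalized layout like this round-trips cleanly, whereas a translated sentence would have to be matched differently in every locale.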

> +       return 0;
> +}

What does the return value signify?

> +Now we can try to run our command! It should take noticeably longer than the
> +commit walk, but an examination of the output will give you an idea why - for
> +example:
> +
> +----
> +Object walk completed. Found 55733 commits, 100274 blobs, 0 tags, and 104210 trees.
> +----
> +
> +This makes sense. We have more trees than commits because the Git project has
> +lots of subdirectories which can change, plus at least one tree per commit. We
> +have no tags because we started on a commit (`HEAD`) and while tags can point to
> +commits, commits can't point to tags.
> +
> +NOTE: You will have different counts when you run this yourself! The number of
> +objects grows along with the Git project.

Not sure if this NOTE is useful; after all, you introduced the output
by saying "for example".

> +=== Adding a Filter
> +
> +There are a handful of filters that we can apply to the object walk laid out in
> +`Documentation/rev-list-options.txt`. These filters are typically useful for
> +operations such as creating packfiles or performing a partial or shallow clone.
> +They are defined in `list-objects-filter-options.h`. For the purposes of this
> +tutorial we will use the "tree:1" filter, which causes the walk to omit all
> +trees and blobs which are not directly referenced by commits reachable from the
> +commit in `pending` when the walk begins. (In our case, that means we omit trees
> +and blobs not directly referenced by HEAD or HEAD's history.)

Need some explanation of what 'pending' is, as it's just mysterious as written.

> +First, we'll need to `#include "list-objects-filter-options.h`". Then, we can
> +set up the `struct list_objects_filter_options` and `struct oidset` at the top
> +of `walken_object_walk()`:
> +
> +----
> +static int walken_object_walk(struct rev_info *rev)
> +{
> +        struct list_objects_filter_options filter_options = {};
> +        struct oidset omitted;
> +        oidset_init(&omitted, 0);
> +       ...

This 'omitted' is so far removed from the description of the 'omitted'
argument to traverse_commit_list_filtered() way earlier in the
tutorial that a reader is likely to have forgotten what it's about
(indeed, I did). Some explanation, even if superficial, is likely
warranted here or at least mention that it is explained in more detail
below (as I discovered).

> +After we run `traverse_commit_list_filtered()` we would also be able to examine
> +`omitted`, which is a linked-list of all objects we did not include in our walk.
> +Since all omitted objects are included, the performance of
> +`traverse_commit_list_filtered()` with a non-null `omitted` arument is equitable

s/arument/argument/

> +with the performance of `traverse_commit_list()`; so for our purposes, we leave
> +it null. It's easy to provide one and iterate over it, though - check `oidset.h`
> +for the declaration of the accessor methods for `oidset`.

I'm confused. What are we leaving NULL here?

> +=== Changing the Order
> +
> +Finally, let's demonstrate that you can also reorder walks of all objects, not
> +just walks of commits. First, we'll make our handlers chattier - modify
> +`walken_show_commit()` and `walken_show_object` to print the object as they go:

s/walken_show_object/&()/

> +static void walken_show_commit(struct commit *cmt, void *buf)
> +{
> +        printf(_("commit: %s\n"), oid_to_hex(&cmt->object.oid));
> +        commit_count++;
> +}

Is there a bunch of trailing whitespace on these lines of the code
sample (and in some lines below)?

> +static void walken_show_object(struct object *obj, const char *str, void *buf)
> +{
> +        printf(_("%s: %s\n"), type_name(obj->type), oid_to_hex(&obj->oid));

Localizing "%s: %s\n" via _(...) probably doesn't add value, which
implies that you might not want to be localizing "commit" above
either.

> +(Try to leave the counter increment logic in place in `walken_show_object()`.)
> +
> +With only that change, run again (but save yourself some scrollback):
> +
> +----
> +$ ./bin-wrappers/git walken | head -n 10
> +----
> +
> +Take a look at the top commit with `git show` and the OID you printed; it should
> +be the same as the output of `git show HEAD`.

I think this is the first use of "OID", which might be mysterious and
confusing to a newcomer. Earlier, you used SHA-1 and I suggested
"object ID" instead. Perhaps use the same here, or define OID earlier
in the document in place of SHA-1.

> +Next, let's change a setting on our `struct rev_info` within
> +`walken_object_walk()`. Find where you're changing the other settings on `rev`,
> +such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add
> +another setting at the bottom:

Instead of nebulous "another setting", mentioning 'reverse' explicitly
would make this clearer.

> +        rev->tree_objects = 1;
> +        rev->blob_objects = 1;
> +        rev->tag_objects = 1;
> +        rev->tree_blobs_in_commit_order = 1;
> +        rev->exclude_promisor_objects = 1;
> +        rev->reverse = 1;


* Re: [RFC PATCH 11/13] walken: add filtered object walk
  2019-06-07  1:08   ` [RFC PATCH 11/13] walken: add filtered object walk Emily Shaffer
@ 2019-06-07 19:15     ` Jeff Hostetler
  2019-06-17 20:30       ` Emily Shaffer
  0 siblings, 1 reply; 102+ messages in thread
From: Jeff Hostetler @ 2019-06-07 19:15 UTC (permalink / raw)
  To: Emily Shaffer, git



On 6/6/2019 9:08 PM, Emily Shaffer wrote:
> Demonstrate how filter specs can be used when performing a revision walk
> of all object types. In this case, tree depth is used. Contributors who
> are following the revision walking tutorial will be encouraged to run
> the revision walk with and without the filter in order to compare the
> number of objects seen in each case.
> 
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
>   builtin/walken.c | 18 +++++++++++++++++-
>   1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/walken.c b/builtin/walken.c
> index 408af6c841..f2c98bcd6b 100644
> --- a/builtin/walken.c
> +++ b/builtin/walken.c
> @@ -13,6 +13,7 @@
>   #include "pretty.h"
>   #include "line-log.h"
>   #include "list-objects.h"
> +#include "list-objects-filter-options.h"
>   #include "grep.h"
>   
>   static const char * const walken_usage[] = {
> @@ -154,7 +155,22 @@ static int walken_object_walk(struct rev_info *rev)
>   	blob_count = 0;
>   	tree_count = 0;
>   
> -	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
> +	if (1) {
> +		/* Unfiltered: */
> +		printf(_("Unfiltered object walk.\n"));
> +		traverse_commit_list(rev, walken_show_commit,
> +				walken_show_object, NULL);
> +	} else {
> +		printf(_("Filtered object walk with filterspec 'tree:1'.\n"));
> +		/*
> +		 * We can parse a tree depth of 1 to demonstrate the kind of
> +		 * filtering that could occur eg during shallow cloning.
> +		 */

I think I'd avoid the term "shallow clone" here.  Shallow clone
refers to getting a limited commit history.  That's orthogonal to
partial clone and the filtered tree walk that operates *within* a commit
or a series of commits.

Granted, a user might want to do both a shallow and partial clone (and
then later partial fetches), but I wouldn't mix the concepts here.
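
To make the distinction concrete, here is a toy model of depth filtering inside a single commit's tree. It is only a sketch and not Git's implementation: `toy_entry` and `toy_count_within_depth()` are hypothetical. It assumes the rule described for `--filter=tree:<depth>` in rev-list-options, namely that trees and blobs whose depth from the root tree is greater than or equal to the limit are omitted. Nothing in it touches the number of commits walked, which is what a shallow clone limits.

```c
#include <assert.h>
#include <stddef.h>

/* Toy tree entry: a blob has no children, a tree has 'nr' of them. */
struct toy_entry {
	const struct toy_entry *children;
	size_t nr;
};

/*
 * Count the entries that survive a depth limit. The root tree sits
 * at depth 0; anything at depth >= limit is omitted together with
 * everything below it.
 */
static size_t toy_count_within_depth(const struct toy_entry *entry,
				     unsigned long depth,
				     unsigned long limit)
{
	size_t i, seen;

	assert(entry);
	if (depth >= limit)
		return 0;

	seen = 1;
	for (i = 0; i < entry->nr; i++)
		seen += toy_count_within_depth(&entry->children[i],
					       depth + 1, limit);
	return seen;
}
```

For a root tree that holds one blob and one subdirectory with two blobs, a limit of 1 keeps only the root tree, a limit of 2 also keeps the root's direct entries, and a limit of 3 keeps all five objects.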


> +		parse_list_objects_filter(&filter_options, "tree:1");
> +
> +		traverse_commit_list_filtered(&filter_options, rev,
> +			walken_show_commit, walken_show_object, NULL, &omitted);
> +	}
>   
>   	printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, "
>   	       "and %d trees.\n"), commit_count, blob_count, tag_count,
> 


* Re: [PATCH] documentation: add tutorial for revision walking
  2019-06-07  1:07 [PATCH] documentation: add tutorial for revision walking Emily Shaffer
  2019-06-07  1:07 ` [RFC PATCH 00/13] example implementation of revwalk tutorial Emily Shaffer
  2019-06-07  6:21 ` [PATCH] documentation: add tutorial for revision walking Eric Sunshine
@ 2019-06-10 20:25 ` Junio C Hamano
  2019-06-17 23:50   ` Emily Shaffer
  2019-06-10 20:49 ` Junio C Hamano
  2019-06-26 23:49 ` [PATCH v2] " Emily Shaffer
  4 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2019-06-10 20:25 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Emily Shaffer <emilyshaffer@google.com> writes:

> I'll also be mailing an RFC patchset In-Reply-To this message; the RFC
> patchset should not be merged to Git, as I intend to host it in my own
> mirror as an example. I hosted a similar example for the
> MyFirstContribution tutorial; it's visible at
> https://github.com/nasamuffin/git/tree/psuh. There might be a better
> place to host these so I don't "own" them but I'm not sure what it is;
> keeping them as a live branch somewhere struck me as an okay way to keep
> them from getting stale.

Yes, writing the initial version is one thing, but keeping it alive
is more work and more important.  As the underlying API changes over
time, it will become necessary to update the sample implementation,
but for a newbie who wants to learn by building "walken" on top of
the then-current codebase and API, it would not be so helpful to
show "these 7 patches were for older codebase, and the tip 2 are
incremental updates to adjust to the newer API", so the maintenance
of these sample patches may need a different paradigm than the norm
for our main codebase that values incremental polishing.



* Re: [PATCH] documentation: add tutorial for revision walking
  2019-06-07  1:07 [PATCH] documentation: add tutorial for revision walking Emily Shaffer
                   ` (2 preceding siblings ...)
  2019-06-10 20:25 ` Junio C Hamano
@ 2019-06-10 20:49 ` Junio C Hamano
  2019-06-17 23:33   ` Emily Shaffer
  2019-06-26 23:49 ` [PATCH v2] " Emily Shaffer
  4 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2019-06-10 20:49 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Emily Shaffer <emilyshaffer@google.com> writes:

> +My First Revision Walk
> +======================
> +
> +== What's a Revision Walk?
> +
> +The revision walk is a key concept in Git - this is the process that underpins
> +operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, the
> +list of objects is found by walking parent relationships between objects. The
> +revision walk can also be usedto determine whether or not a given object is
> +reachable from the current HEAD pointer.

s/usedto/used to/;

> +We'll put our fiddling into a new command. For fun, let's name it `git walken`.
> +Open up a new file `builtin/walken.c` and set up the command handler:
> +
> +----
> +/*
> + * "git walken"
> + *
> + * Part of the "My First Revision Walk" tutorial.
> + */
> +
> +#include <stdio.h>

Bad idea.  In the generic part of the codebase, system headers are
supposed to be supplied by including git-compat-util.h (or cache.h
or builtin.h, that are common header files that begin by including
it and are allowed by CodingGuidelines to be used as such).

> +#include "builtin.h"
> +
> +int cmd_walken(int argc, const char **argv, const char *prefix)
> +{
> +        printf(_("cmd_walken incoming...\n"));
> +        return 0;
> +}
> +----

I wonder if it makes sense to use trace instead of printf, as our
reader has already seen the psuh example for doing the above.

> +Add usage text and `-h` handling, in order to pass the test suite:

It is not wrong per-se, and it indeed is a very good practice to
make sure that our subcommands consistently give usage text and
short usage.  Encouraging them early is a good idea.

But "in order to pass the test suite" invites "eh, the test suite
does not pass without usage and -h?  why?".

Either drop the mention of "the test suite", or perhaps say
something like

	Add usage text and `-h` handling, like all the subcommands
	should consistently do (our test suite will notice and
	complain if you fail to do so).

i.e. the real purpose is consistency and usability; test suite is
merely an enforcement mechanism.

> +----
> +{ "walken", cmd_walken, RUN_SETUP },
> +----
> +
> +Add it to the `Makefile` near the line for `builtin\worktree.o`:

Backslash intended?


* Re: [PATCH] documentation: add tutorial for revision walking
  2019-06-07  6:21 ` [PATCH] documentation: add tutorial for revision walking Eric Sunshine
@ 2019-06-10 21:26   ` Junio C Hamano
  2019-06-10 21:38     ` Eric Sunshine
  2019-06-17 23:19   ` Emily Shaffer
  1 sibling, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2019-06-10 21:26 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Emily Shaffer, Git List

Eric Sunshine <sunshine@sunshineco.com> writes:

>> +/*
>> + * "git walken"
>> + *
>> + * Part of the "My First Revision Walk" tutorial.
>> + */
>> +
>> +#include <stdio.h>
>> +#include "builtin.h"
>
> Git source files must always include cache.h or git-compat-util.h (or,
> for builtins, builtin.h) as the very first header since those headers
> take care of differences which might crop up as problems with system
> headers on various platforms. System headers are included after Git
> headers. So, stdio.h should be included after builtin.h. In this case,

Actually the idea is that platform agnostic part of the codebase
should not have to include _any_ system header themselves; instead,
including git-compat-util.h should take care of the system header
files *including* the funky ordering requirements some platforms may
have.  So, we'd want to go stronger than "should be included after";
it shouldn't have to be included or our git-compat-util.h is wrong.

I've started reading the patch myself, but it seems that you've
already done a lot more thorough read-thru than I would have done,
so thank you very much for that.



* Re: [PATCH] documentation: add tutorial for revision walking
  2019-06-10 21:26   ` Junio C Hamano
@ 2019-06-10 21:38     ` Eric Sunshine
  0 siblings, 0 replies; 102+ messages in thread
From: Eric Sunshine @ 2019-06-10 21:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Emily Shaffer, Git List

On Mon, Jun 10, 2019 at 5:27 PM Junio C Hamano <gitster@pobox.com> wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
> >> +#include <stdio.h>
> >> +#include "builtin.h"
> >
> > Git source files must always include cache.h or git-compat-util.h (or,
> > for builtins, builtin.h) as the very first header since those headers
> > take care of differences which might crop up as problems with system
> > headers on various platforms. System headers are included after Git
> > headers. So, stdio.h should be included after builtin.h. In this case,
>
> Actually the idea is that platform agnostic part of the codebase
> should not have to include _any_ system header themselves; instead,
> including git-compat-util.h should take care of the system header
> files *including* the funky ordering requirements some platforms may
> have.  So, we'd want to go stronger than "should be included after";
> it shouldn't have to be included or our git-compat-util.h is wrong.

Thanks for clarifying that.


* Re: [RFC PATCH 11/13] walken: add filtered object walk
  2019-06-07 19:15     ` Jeff Hostetler
@ 2019-06-17 20:30       ` Emily Shaffer
  0 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-17 20:30 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: git

On Fri, Jun 07, 2019 at 03:15:53PM -0400, Jeff Hostetler wrote:
> 
> 
> On 6/6/2019 9:08 PM, Emily Shaffer wrote:
> > Demonstrate how filter specs can be used when performing a revision walk
> > of all object types. In this case, tree depth is used. Contributors who
> > are following the revision walking tutorial will be encouraged to run
> > the revision walk with and without the filter in order to compare the
> > number of objects seen in each case.
> > 
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> >   builtin/walken.c | 18 +++++++++++++++++-
> >   1 file changed, 17 insertions(+), 1 deletion(-)
> > 
> > diff --git a/builtin/walken.c b/builtin/walken.c
> > index 408af6c841..f2c98bcd6b 100644
> > --- a/builtin/walken.c
> > +++ b/builtin/walken.c
> > @@ -13,6 +13,7 @@
> >   #include "pretty.h"
> >   #include "line-log.h"
> >   #include "list-objects.h"
> > +#include "list-objects-filter-options.h"
> >   #include "grep.h"
> >   static const char * const walken_usage[] = {
> > @@ -154,7 +155,22 @@ static int walken_object_walk(struct rev_info *rev)
> >   	blob_count = 0;
> >   	tree_count = 0;
> > -	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
> > +	if (1) {
> > +		/* Unfiltered: */
> > +		printf(_("Unfiltered object walk.\n"));
> > +		traverse_commit_list(rev, walken_show_commit,
> > +				walken_show_object, NULL);
> > +	} else {
> > +		printf(_("Filtered object walk with filterspec 'tree:1'.\n"));
> > +		/*
> > +		 * We can parse a tree depth of 1 to demonstrate the kind of
> > +		 * filtering that could occur eg during shallow cloning.
> > +		 */
> 
> I think I'd avoid the term "shallow clone" here.  Shallow clone
> refers to getting a limited commit history.  That's orthogonal from
> partial clone and the filtered tree walk that operates *within* a commit
> or a series of commits.
> 
> Granted, a user might want to do both a shallow and partial clone (and
> then later partial fetches), but I wouldn't mix the concepts here.

It's a valid complaint. I removed the mention of shallow cloning and
replaced it with a reference to the documentation for --filter in
rev-list. Thanks.

 - Emily


* Re: [PATCH] documentation: add tutorial for revision walking
  2019-06-07  6:21 ` [PATCH] documentation: add tutorial for revision walking Eric Sunshine
  2019-06-10 21:26   ` Junio C Hamano
@ 2019-06-17 23:19   ` Emily Shaffer
  2019-06-19  8:13     ` Eric Sunshine
  1 sibling, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-06-17 23:19 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Fri, Jun 07, 2019 at 02:21:07AM -0400, Eric Sunshine wrote:
> On Thu, Jun 6, 2019 at 9:08 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> > [...]
> > The tutorial covers a basic overview of the structs involved during
> > revision walk, setting up a basic commit walk, setting up a basic
> > all-object walk, and adding some configuration changes to both walk
> > types. It intentionally does not cover how to create new commands or
> > search for options from the command line or gitconfigs.
> > [...]
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> > diff --git a/Documentation/.gitignore b/Documentation/.gitignore
> > @@ -12,6 +12,7 @@ cmds-*.txt
> >  SubmittingPatches.txt
> > +MyFirstRevWalk.txt
> 
> The new file itself is named Documentation/MyFirstRevWalk.txt, so why
> add it to .gitignore?

Yep, fixed. Holdover from an initial attempt which named the file
MyFirstRevWalk (no extension), which was then corrected for the earlier
tutorial I sent. Thanks.

> 
> > diff --git a/Documentation/MyFirstRevWalk.txt b/Documentation/MyFirstRevWalk.txt
> > @@ -0,0 +1,826 @@
> > +== What's a Revision Walk?
> > +
> > +The revision walk is a key concept in Git - this is the process that underpins
> > +operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, the
> > +list of objects is found by walking parent relationships between objects. The
> > +revision walk can also be usedto determine whether or not a given object is
> 
> s/usedto/used to/

Done.

> 
> > +reachable from the current HEAD pointer.
> > +
> > +We'll put our fiddling into a new command. For fun, let's name it `git walken`.
> > +Open up a new file `builtin/walken.c` and set up the command handler:
> > +
> > +----
> > +/*
> > + * "git walken"
> > + *
> > + * Part of the "My First Revision Walk" tutorial.
> > + */
> > +
> > +#include <stdio.h>
> > +#include "builtin.h"
> 
> Git source files must always include cache.h or git-compat-util.h (or,
> for builtins, builtin.h) as the very first header since those headers
> take care of differences which might crop up as problems with system
> headers on various platforms. System headers are included after Git
> headers. So, stdio.h should be included after builtin.h. In this case,
> however, stdio.h will get pulled in by git-compat-util.h anyhow, so
> you need not include it here.

Done.

> 
> > +Add usage text and `-h` handling, in order to pass the test suite:
> > +
> > +----
> > +static const char * const walken_usage[] = {
> > +       N_("git walken"),
> > +       NULL,
> > +}
> 
> Unless you plan on referencing this from functions other than
> cmd_walken(), it need not be global.

Done; bad C++ habits sneaking in. :)

> 
> > +int cmd_walken(int argc, const char **argv, const char *prefix)
> > +{
> > +       struct option options[] = {
> > +               OPT_END()
> > +       };
> > +
> > +       argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
> > +
> > +       ...
> 
> Perhaps comment out the "..." or remove it altogether to avoid having
> the compiler barf when the below instructions tell the reader to build
> the command.

Hmm. That part I'm not so sure about. I like to use the "..." to
indicate where the code in the snippet should be added around the other
code already in the file - which I suppose it does just as clearly if
it's commented - but I also hope folks are not simply copy-pasting
blindly from the tutorial.

It seems like including uncommented "..." in code tutorials is pretty
common.

I don't think I have a good reason to push back on this except that I
think "/* ... */" is ugly :)

I'll go through and replace "..." with some actual hints about what's
supposed to go there; for example, here I'll replace with "/* print and
return */".

> 
> > +}
> > +
> > +Also add the relevant line in builtin.h near `cmd_whatchanged()`:
> 
> s/builtin.h/`&`/

Done.

> 
> > +Build and test out your command, without forgetting to ensure the `DEVELOPER`
> > +flag is set:
> > +
> > +----
> > +echo DEVELOPER=1 >config.mak
> 
> This will blast existing content of 'config.mak' which could be
> dangerous. It might be better to suggest >> instead.

Done.

> 
> > +`name` is the SHA-1 of the object - a 40-digit hex string you may be familiar
> > +with from using Git to organize your source in the past. Check the tutorial
> > +mentioned above towards the top for a discussion of where the SHA-1 can come
> > +from.
> 
> With all the recent work to move away from SHA-1 and to support other
> hash functions, perhaps just call this "object ID" rather than SHA-1,
> and drop mention of it being exactly 40 digits. Instead, perhaps say
> something like "...is the hexadecimal representation of the object
> ID...".

Good point. Will do.

> 
> > +== Basic Commit Walk
> > +
> > +First, let's see if we can replicate the output of `git log --oneline`. We'll
> > +refer back to the implementation frequently to discover norms when performing
> > +a revision walk of our own.
> > +
> > +We'll need all the commits, in order, which preceded our current commit. We will
> > +also need to know the name and subject.
> 
> This paragraph confused me. I read it as these being prerequisites I
> would somehow have to provide in order to write the code. Perhaps it
> can be rephrased to state that this is what the code will be doing.
> Maybe: "To do this, we will find all the commits, in order, which
> precede the current commit, and extract from them the name and subject
> [of the commit message]" or something.

Yeah, good point. Thanks - this is the kind of thing that sounds logical
when you write it but not when you read it later :)

> 
> > +=== Setting Up
> > +
> > +Preparing for your revision walk has some distinct stages.
> > +
> > +1. Perform default setup for this mode, and others which may be invoked.
> > +2. Check configuration files for relevant settings.
> > +3. Set up the rev_info struct.
> > +4. Tweak the initialized rev_info to suit the current walk.
> > +5. Prepare the rev_info for the walk.
> 
> s/rev_info/`&`/ in the above three lines.

Done.

> 
> > +==== Default Setups
> > +
> > +Before you begin to examine user configuration for your revision walk, it's
> > +common practice for you to initialize to default any switches that your command
> > +may have, as well as ask any other components you may invoke to initialize as
> > +well. `git log` does this in `init_log_defaults()`; in that case, one global
> > +`decoration_style` is initialized, as well as the grep and diff-UI components.
> > +
> > +For our purposes, within `git walken`, for the first example we do we don't
> 
> "we do we don't"?
> 
> > +intend to invoke anything, and we don't have any configuration to do. However,
> 
> "invoke anything" is pretty nebulous, as is the earlier "components
> you may invoke". A newcomer is unlikely to know what this means, so
> perhaps it needs an example (even if just a short parenthetical
> comment).

I have tried to reword this; I hope this is a little clearer.

  Before you begin to examine user configuration for your revision walk, it's
  common practice for you to initialize to default any switches that your command
  may have, as well as ask any other components you may invoke to initialize as
  well (for example, how `git log` also uses the `grep` and `diff` components).
  `git log` does this in `init_log_defaults()`; in that case, one global
  `decoration_style` is initialized, as well as the grep and diff-UI components.

  For our purposes, within `git walken`, for the first example we don't intend to
  use any other components within Git, and we don't have any configuration to do.
  However, we may want to add some later, so for now, we can add an empty
  placeholder. Create a new function in `builtin/walken.c`:

> 
> > +we may want to add some later, so for now, we can add an empty placeholder.
> > +Create a new function in `builtin/walken.c`:
> > +
> > +----
> > +static void init_walken_defaults(void)
> > +{
> > +       /* We don't actually need the same components `git log` does; leave this
> > +        * empty for now.
> > +        */
> > +}
> 
> /*
>  * Git multi-line comments
>  * are formatted like this.
>  */

Done; I'll look through the rest of the samples for it too.
> 
> > +Add a new function to `builtin/walken.c`:
> > +
> > +----
> > +static int git_walken_config(const char *var, const char *value, void *cb)
> > +{
> > +       /* For now, let's not bother with anything. */
> > +       return git_default_config(var, value, cb);
> > +}
> 
> Comment is somewhat confusing. Perhaps say instead "We don't currently
> have custom configuration, so fall back to git_default_config()" or
> something.

Done.

> 
> > +==== Setting Up `rev_info`
> > +
> > +Now that we've gathered external configuration and options, it's time to
> > +initialize the `rev_info` object which we will use to perform the walk. This is
> > +typically done by calling `repo_init_revisions()` with the repository you intend
> > +to target, as well as the prefix and your `rev_info` struct.
> 
> Maybe: s/the prefix/the `&` argument of `cmd_walken`/

Done.

> 
> > +Add the `struct rev_info` and the `repo_init_revisions()` call:
> > +----
> > +int cmd_walken(int argc, const char **argv, const char *prefix)
> > +{
> > +       /* This can go wherever you like in your declarations.*/
> > +       struct rev_info rev;
> > +       ...
> 
> A less verbose way to indicate the same without using a /* comment */:
> 
>     ...
>     struct rev_info rev;
>     ...

Per the earlier comment about losing "..." I'm not going to take this
comment; I'll also be replacing the "..." after.

> 
> > +       /* This should go after the git_config() call. */
> > +       repo_init_revisions(the_repository, &rev, prefix);
> > +}
> > +----
> > +static void final_rev_info_setup(struct rev_info *rev)
> > +{
> > +       /* We want to mimick the appearance of `git log --oneline`, so let's
> > +        * force oneline format. */
> 
> s/mimick/mimic/
> 
> /*
>  * Multi-line
>  * comment.
>  */

Done.

> 
> > +==== Preparing `rev_info` For the Walk
> > +
> > +Now that `rev` is all initialized and configured, we've got one more setup step
> > +before we get rolling. We can do this in a helper, which will both prepare the
> > +`rev_info` for the walk, and perform the walk itself. Let's start the helper
> > +with the call to `prepare_revision_walk()`.
> > +
> > +----
> > +static int walken_commit_walk(struct rev_info *rev)
> > +{
> > +       /* prepare_revision_walk() gets the final steps ready for a revision
> > +        * walk. We check the return value for errors. */
> 
> Not at all sure what this comment is trying to say. Also, the second
> sentence adds no value to what the code itself already says clearly by
> actually checking the return value.

Attempted to rephrase. I ended up with:

  /*
   * prepare_revision_walk() does the final setup needed by revision.h
   * before a walk. It may return an error if there is a problem.
   */

Maybe the second sentence still doesn't serve a purpose, but I was
trying to express that prepare_revision_walk() won't die() on its own.

> 
> > +       if (prepare_revision_walk(rev))
> > +               die(_("revision walk setup failed"));
> > +}
> > +==== Performing the Walk!
> > +
> > +Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
> > +can also be used as an iterator; we move to the next item in the walk by using
> > +`get_revision()` repeatedly. Add the listed variable declarations at the top and
> > +the walk loop below the `prepare_revision_walk()` call within your
> > +`walken_commit_walk()`:
> > +
> > +----
> > +static int walken_commit_walk(struct rev_info *rev)
> > +{
> > +       struct commit *commit;
> > +       struct strbuf prettybuf;
> > +       strbuf_init(&prettybuf, 0);
> 
> More idiomatic:
> 
>     struct strbuf prettybuf = STRBUF_INIT;

Ok, I'll change it. I wasn't sure which one was preferred, so this is
super helpful. Thanks.
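
To spell out why the static initializer reads better, here is a standalone
sketch of the idiom. It is a toy buffer, not Git's `strbuf`: every `toy_`
name and `TOY_BUF_INIT` are made up for illustration, and the growth policy
is an assumption chosen only to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, loosely modelled on the strbuf idea. */
struct toy_buf {
	size_t alloc;
	size_t len;
	char *buf;
};

/* Static initializer: declaration and initialization happen in one place. */
#define TOY_BUF_INIT { 0, 0, NULL }

/* Appends a NUL-terminated string, growing the allocation as needed. */
static void toy_buf_addstr(struct toy_buf *sb, const char *s)
{
	size_t n = strlen(s);

	if (sb->len + n + 1 > sb->alloc) {
		size_t want = (sb->len + n + 1) * 2;
		char *grown = realloc(sb->buf, want);

		assert(grown);	/* a toy; real code would report the failure */
		sb->buf = grown;
		sb->alloc = want;
	}
	memcpy(sb->buf + sb->len, s, n + 1);
	sb->len += n;
}

/* Empties the buffer but keeps the allocation for reuse in a loop. */
static void toy_buf_reset(struct toy_buf *sb)
{
	sb->len = 0;
	if (sb->buf)
		sb->buf[0] = '\0';
}

/* Frees the allocation and returns the buffer to its initial state. */
static void toy_buf_release(struct toy_buf *sb)
{
	free(sb->buf);
	sb->alloc = 0;
	sb->len = 0;
	sb->buf = NULL;
}
```

With the initializer macro there is no window in which the variable exists
but has not been set up yet, which is the main appeal over a separate
init call placed after the declarations.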

> 
> > +       while ((commit = get_revision(rev)) != NULL) {
> > +               if (commit == NULL)
> > +                       continue;
> 
> Idiomatic Git code doesn't mention NULL explicitly in conditionals, so:
> 
>     while ((commit = get_revision(rev))) {
>         if (!commit)
>             continue;

Done, thanks.
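
One side note on the loop shape: once the assignment is the loop condition,
the inner NULL check can never fire, because a NULL result has already ended
the loop. A standalone sketch of that, using a toy iterator in place of
`get_revision()` (the `toy_` types and functions are hypothetical and exist
only for this example):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct commit; not Git's definition. */
struct toy_commit {
	int id;
};

/* Hypothetical stand-in for a rev_info being used as an iterator. */
struct toy_walk {
	struct toy_commit *items;
	size_t nr;
	size_t pos;
};

/* Like get_revision(): returns the next item, or NULL when exhausted. */
static struct toy_commit *toy_next(struct toy_walk *walk)
{
	assert(walk);
	if (walk->pos >= walk->nr)
		return NULL;
	return &walk->items[walk->pos++];
}

/* Sums the ids seen; the loop condition alone guards against NULL. */
static int toy_sum_ids(struct toy_walk *walk)
{
	struct toy_commit *commit;
	int sum = 0;

	while ((commit = toy_next(walk)))
		sum += commit->id;	/* commit is never NULL here */
	return sum;
}
```

The doubled parentheses around the assignment are the usual way to tell the
compiler that the assignment in a condition is intentional.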

> 
> > +               strbuf_reset(&prettybuf);
> > +               pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
> 
> Earlier, you talked about calling get_commit_format("oneline",...) to
> get "oneline" output, so what is the purpose of CMIT_FMT_ONELINE here?
> The text should explain more clearly what these two different
> "oneline"-related bits mean.

Thanks. I've got to research a little on this one. I'll clarify it
before the next reroll.

> 
> > +               printf(_("%s\n"), prettybuf.buf);
> 
> There is nothing here to localize, so drop _(...):
> 
>     printf("%s\n", prettybuf.buf);
> 
> or perhaps just:
> 
>     puts(prettybuf.buf);

Sure, I'll use this one.

> 
> > +       }
> > +
> > +       return 0;
> > +}
> 
> What does the return value signify?

Will double check that I don't use it for anything; I can probably drop
it and make this a void function instead.

> 
> > +=== Adding a Filter
> > +
> > +Next, we can modify the `grep_filter`. This is done with convenience functions
> > +found in `grep.h`. For fun, we're filtering to only commits from folks using a
> > +gmail.com email address - a not-very-precise guess at who may be working on Git
> 
> Perhaps? s/gmail.com/`&`/

Done.

> 
> > +=== Changing the Order
> > +
> > +Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
> > +`REV_SORT_BY_AUTHOR_DATE`. Add the following:
> > +
> > +static void final_rev_info_setup(int argc, const char **argv,
> > +                const char *prefix, struct rev_info *rev)
> > +{
> > +       ...
> > +
> > +       rev->topo_order = 1;
> > +       rev->sort_order = REV_SORT_BY_COMMIT_DATE;
> 
> The assignment to rev->sort_order is obvious enough, but the
> rev->topo_order assignment is quite mysterious to someone coming to
> this tutorial to learn about revision walking, thus some commentary
> explaining 'topo_order' would be a good idea.

Will do.

> 
> > +Finally, compare the two. This is a little less helpful without object names or
> > +dates, but hopefully we get the idea.
> > +
> > +----
> > +$ diff -u commit-date.txt author-date.txt
> > +----
> > +
> > +This display is an indicator for the latency between publishing a commit for
> > +review the first time, and getting it actually merged into master.
> 
> Perhaps: s/master/`&`/
> 
> Even as a long-time contributor to the project, I had to pause over
> this statement for several seconds before figuring out what it was
> talking about. Without a long-winded explanation of how topics
> progress from submission through 'pu' through 'next' through 'master'
> and finally into a release, the above statement is likely to be
> mystifying to a newcomer. Perhaps it should be dropped.

Such an explanation exists in MyFirstContribution.txt. I will include a
shameless plug to that document here. :)

> 
> > +Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
> > +However, it needs to be applied after `add_head_to_pending()` is called. Find
> 
> This leaves the reader hanging, wondering why 'reverse' needs to be
> assigned after add_head_to_pending().

Will address.

> 
> > +== Basic Object Walk
> > +
> > +static void walken_show_commit(struct commit *cmt, void *buf)
> > +{
> > +        commit_count++;
> > +}
> > +----
> > +
> > +Since we have the `struct commit` object, we can look at all the same parts that
> > +we looked at in our earlier commit-only walk. For the sake of this tutorial,
> > +though, we'll just increment the commit counter and move on.
> 
> This leaves the reader wondering what 'buf' is and what it's used for.
> Presumably this is the 'show_data' context mentioned earlier? If so,
> perhaps name this 'ctxt' or 'context' or something and, because this
> is a tutorial trying to teach revision walking, say a quick word about
> how it might be used.
> 
> > +static void walken_show_object(struct object *obj, const char *str, void *buf)
> > +{
> > +        switch (obj->type) {
> > +        [...]
> > +        case OBJ_COMMIT:
> > +                printf(_("Unexpectedly encountered a commit in "
> > +                         "walken_show_object!\n"));
> > +                commit_count++;
> > +                break;
> > +        default:
> > +                printf(_("Unexpected object type %s!\n"),
> > +                       type_name(obj->type));
> > +                break;
> > +        }
> > +}
> 
> Modern practice in this project is to start error messages with
> lowercase and to not punctuate the end (no need for "!").

Done.
 
> Also, same complaint about the mysterious 'str' argument to the
> callback as for 'buf' mentioned above.

Will do.
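
For the trailing `void *` argument, the usual pattern is that the traversal
forwards the pointer untouched and the callback casts it back to whatever
state it needs. A minimal standalone sketch of that pattern, assuming nothing
about Git's actual callback types (all `toy_`/`TOY_` names are invented for
this example):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-walk state, passed through the opaque pointer. */
struct toy_counts {
	int commits;
	int others;
};

enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB };

typedef void (*toy_show_fn)(enum toy_type type, void *context);

/* Toy traversal: calls show() per item, forwarding context untouched. */
static void toy_traverse(const enum toy_type *items, size_t nr,
			 toy_show_fn show, void *context)
{
	size_t i;

	assert(show);
	for (i = 0; i < nr; i++)
		show(items[i], context);
}

/* Callback: recovers the typed state from the opaque pointer. */
static void toy_count(enum toy_type type, void *context)
{
	struct toy_counts *counts = context;

	if (type == TOY_COMMIT)
		counts->commits++;
	else
		counts->others++;
}
```

Passing the counters this way would be an alternative to file-scope
variables such as `commit_count`; whether the tutorial should go that far is
a separate question.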

> 
> > +To help assure us that we aren't double-counting commits, we'll include some
> > +complaining if a commit object is routed through our non-commit callback; we'll
> > +also complain if we see an invalid object type.
> 
>  Are these two error cases "impossible" conditions or can they
> actually arise in practice? If the former, use die() instead and drop
> use of _(...) so as to avoid confusing the reader into thinking that
> the behavior is indeterminate.

Ah, these should be impossible. I'll turn them into die().

> 
> > +Our main object walk implementation is substantially different from our commit
> > +walk implementation, so let's make a new function to perform the object walk. We
> > +can perform setup which is applicable to all objects here, too, to keep separate
> > +from setup which is applicable to commit-only walks.
> > +
> > +----
> > +static int walken_object_walk(struct rev_info *rev)
> > +{
> > +}
> > +----
> 
> This skeleton function definition is populated immediately below, so
> it's not clear why it needs to be shown here.

Yeah, you're right. Removed the skeleton snippet.

> 
> > +We'll start by enabling all types of objects in the `struct rev_info`, and
> > +asking to have our trees and blobs shown in commit order. We'll also exclude
> > +promisors as the walk becomes more complicated with those types of objects. When
> > +our settings are ready, we'll perform the normal revision walk setup and
> > +initialize our tracking variables.
> > +
> > +----
> > +static int walken_object_walk(struct rev_info *rev)
> > +{
> > +        rev->tree_objects = 1;
> > +        rev->blob_objects = 1;
> > +        rev->tag_objects = 1;
> > +        rev->tree_blobs_in_commit_order = 1;
> > +        rev->exclude_promisor_objects = 1;
> > +        [...]
> > +----
> > +
> > +Unless you cloned or fetched your repository earlier with a filter,
> > +`exclude_promisor_objects` is unlikely to make a difference, but we'll turn it
> > +on just to make sure our lives are simple.  We'll also turn on
> > +`tree_blobs_in_commit_order`, which means that we will walk a commit's tree and
> > +everything it points to immediately after we find each commit, as opposed to
> > +waiting for the end and walking through all trees after the commit history has
> > +been discovered.
> 
> This paragraph is repeating much of the information in the paragraph
> just above the code snippet. One or the other should be dropped or
> thinned to avoid the duplication.

  We'll start by enabling all types of objects in the `struct rev_info`. Unless
  you cloned or fetched your repository earlier with a filter,
  `exclude_promisor_objects` is unlikely to make a difference, but we'll turn it
  on just to make sure our lives are simple. We'll also turn on
  `tree_blobs_in_commit_order`, which means that we will walk a commit's tree and
  everything it points to immediately after we find each commit, as opposed to
  waiting for the end and walking through all trees after the commit history has
  been discovered. With the appropriate settings configured, we are ready to call
  `prepare_revision_walk()`.

> 
> > +Let's start by calling just the unfiltered walk and reporting our counts.
> > +Complete your implementation of `walken_object_walk()`:
> > +
> > +----
> > +       traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
> > +
> > +       printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, "
> > +                "and %d trees.\n"), commit_count, blob_count, tag_count,
> > +              tree_count);
> 
> Or make the output more useful by having it be machine-parseable (and
> not localized):
> 
>     printf("commits %d\nblobs %d\ntags %d\ntrees %d\n",
>         commit_count, blob_count, tag_count, tree_count);

I'm not sure whether I agree, since it's a useless toy command only for human
parsing.

> 
> > +       return 0;
> > +}
> 
> What does the return value signify?

Yeah, again I think I can get rid of this; I'll take a look at the final
sample code and make sure it can go.

> 
> > +Now we can try to run our command! It should take noticeably longer than the
> > +commit walk, but an examination of the output will give you an idea why - for
> > +example:
> > +
> > +----
> > +Object walk completed. Found 55733 commits, 100274 blobs, 0 tags, and 104210 trees.
> > +----
> > +
> > +This makes sense. We have more trees than commits because the Git project has
> > +lots of subdirectories which can change, plus at least one tree per commit. We
> > +have no tags because we started on a commit (`HEAD`) and while tags can point to
> > +commits, commits can't point to tags.
> > +
> > +NOTE: You will have different counts when you run this yourself! The number of
> > +objects grows along with the Git project.
> 
> Not sure if this NOTE is useful; after all, you introduced the output
> by saying "for example".

I think you're probably right, but I'll try to fix this by slightly
fleshing out the "for example" phrasing.

> 
> > +=== Adding a Filter
> > +
> > +There are a handful of filters that we can apply to the object walk laid out in
> > +`Documentation/rev-list-options.txt`. These filters are typically useful for
> > +operations such as creating packfiles or performing a partial or shallow clone.
> > +They are defined in `list-objects-filter-options.h`. For the purposes of this
> > +tutorial we will use the "tree:1" filter, which causes the walk to omit all
> > +trees and blobs which are not directly referenced by commits reachable from the
> > +commit in `pending` when the walk begins. (In our case, that means we omit trees
> > +and blobs not directly referenced by HEAD or HEAD's history.)
> 
> Need some explanation of what 'pending' is, as it's just mysterious as written.

Done. I've tried to explain it by drawing a parallel to BFS tree
traversal, although that might be even more confusing as the DAG isn't
quite the same.
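
Roughly, the BFS parallel looks like the standalone sketch below: a work
queue is seeded with the starting point, and each visited commit pushes its
unseen parents. This is only the analogy, not how revision.c is written (the
real walk is not a plain FIFO), and the `toy_` names, the two-parent limit,
and the `TOY_MAX` bound on graph size are all assumptions of the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 16	/* the graph must not have more nodes than this */

/* Hypothetical commit node; parents are graph indices, -1 if none. */
struct toy_commit {
	int parent[2];
	int seen;
};

/*
 * Walks everything reachable from `start`, recording visit order in `out`.
 * `pending` plays the role of the work queue: it is seeded with the
 * starting point and refilled with parents as each commit is visited.
 */
static size_t toy_walk(struct toy_commit *graph, int start, int *out)
{
	int pending[TOY_MAX];
	size_t head = 0, tail = 0, nr = 0;
	int i;

	assert(start >= 0 && start < TOY_MAX);
	graph[start].seen = 1;
	pending[tail++] = start;

	while (head < tail) {
		int cur = pending[head++];

		out[nr++] = cur;
		for (i = 0; i < 2; i++) {
			int p = graph[cur].parent[i];

			/* The seen flag keeps a shared ancestor from being queued twice. */
			if (p >= 0 && !graph[p].seen) {
				graph[p].seen = 1;
				pending[tail++] = p;
			}
		}
	}
	return nr;
}
```

The seen flag is where the DAG differs from a tree: after a merge, two
branches can lead back to the same ancestor, and it must only be visited
once.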

> 
> > +First, we'll need to `#include "list-objects-filter-options.h"`. Then, we can
> > +set up the `struct list_objects_filter_options` and `struct oidset` at the top
> > +of `walken_object_walk()`:
> > +
> > +----
> > +static int walken_object_walk(struct rev_info *rev)
> > +{
> > +        struct list_objects_filter_options filter_options = {};
> > +        struct oidset omitted;
> > +        oidset_init(&omitted, 0);
> > +       ...
> 
> This 'omitted' is so far removed from the description of the 'omitted'
> argument to traverse_commit_list_filtered() way earlier in the
> tutorial that a reader is likely to have forgotten what it's about
> (indeed, I did). Some explanation, even if superficial, is likely
> warranted here or at least mention that it is explained in more detail
> below (as I discovered).
> 
> > +After we run `traverse_commit_list_filtered()` we would also be able to examine
> > +`omitted`, which is a linked-list of all objects we did not include in our walk.
> > +Since all omitted objects are included, the performance of
> > +`traverse_commit_list_filtered()` with a non-null `omitted` arument is equitable
> 
> s/arument/argument/
> 
> > +with the performance of `traverse_commit_list()`; so for our purposes, we leave
> > +it null. It's easy to provide one and iterate over it, though - check `oidset.h`
> > +for the declaration of the accessor methods for `oidset`.
> 
> I'm confused. What are we leaving NULL here?

Yeah, this isn't very well written. I'll try to rephrase it; I think I
meant to leave `omitted` out of the arglist to
`traverse_commit_list_filtered()` but looks like I didn't manage to do so
in the actual impl.  I think I'll break out an additional section to show
how `--filter-print-omitted` works, instead of just leaving this with an
RTFM at the end. (This will also end up with a reroll of the example
patchset, too.)
> 
> > +=== Changing the Order
> > +
> > +Finally, let's demonstrate that you can also reorder walks of all objects, not
> > +just walks of commits. First, we'll make our handlers chattier - modify
> > +`walken_show_commit()` and `walken_show_object` to print the object as they go:
> 
> s/walken_show_object/&()/

Done.

> 
> > +static void walken_show_commit(struct commit *cmt, void *buf)
> > +{
> > +        printf(_("commit: %s\n"), oid_to_hex(&cmt->object.oid));
> > +        commit_count++;
> > +}
> 
> Is there a bunch of trailing whitespace on these lines of the code
> sample (and in some lines below)?

Oh no, there might be. Bad on me for my copy/paste between vim windows
workflow; I thought I had trimmed from all of them but guess not. I'll
check over the whole doc and fix it up.
> 
> > +static void walken_show_object(struct object *obj, const char *str, void *buf)
> > +{
> > +        printf(_("%s: %s\n"), type_name(obj->type), oid_to_hex(&obj->oid));
> 
> Localizing "%s: %s\n" via _(...) probably doesn't add value, which
> implies that you might not want to be localizing "commit" above
> either.

This is closer to machine-readable, so I'll remove the locale.

> 
> > +(Try to leave the counter increment logic in place in `walken_show_object()`.)
> > +
> > +With only that change, run again (but save yourself some scrollback):
> > +
> > +----
> > +$ ./bin-wrappers/git walken | head -n 10
> > +----
> > +
> > +Take a look at the top commit with `git show` and the OID you printed; it should
> > +be the same as the output of `git show HEAD`.
> 
> I think this is the first use of "OID", which might be mysterious and
> confusing to a newcomer. Earlier, you used SHA-1 and I suggested
> "object ID" instead. Perhaps use the same here, or define OID earlier
> in the document in place of SHA-1.

Yeah, I ended up replacing it above with "object ID (OID)" but this is
far enough along that I think I'll replace it with "object ID" here too.

> 
> > +Next, let's change a setting on our `struct rev_info` within
> > +`walken_object_walk()`. Find where you're changing the other settings on `rev`,
> > +such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add
> > +another setting at the bottom:
> 
> Instead of nebulous "another setting", mentioning 'reverse' explicitly
> would make this clearer.

Done.

> 
> > +        rev->tree_objects = 1;
> > +        rev->blob_objects = 1;
> > +        rev->tag_objects = 1;
> > +        rev->tree_blobs_in_commit_order = 1;
> > +        rev->exclude_promisor_objects = 1;
> > +        rev->reverse = 1;

Thank you so much for taking the time to do a detailed review of this.
This is great feedback.

 - Emily


* Re: [PATCH] documentation: add tutorial for revision walking
  2019-06-10 20:49 ` Junio C Hamano
@ 2019-06-17 23:33   ` Emily Shaffer
  0 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-17 23:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, Jun 10, 2019 at 01:49:41PM -0700, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > +My First Revision Walk
> > +======================
> > +
> > +== What's a Revision Walk?
> > +
> > +The revision walk is a key concept in Git - this is the process that underpins
> > +operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, the
> > +list of objects is found by walking parent relationships between objects. The
> > +revision walk can also be usedto determine whether or not a given object is
> > +reachable from the current HEAD pointer.
> 
> s/usedto/used to/;
Done.
> 
> > +We'll put our fiddling into a new command. For fun, let's name it `git walken`.
> > +Open up a new file `builtin/walken.c` and set up the command handler:
> > +
> > +----
> > +/*
> > + * "git walken"
> > + *
> > + * Part of the "My First Revision Walk" tutorial.
> > + */
> > +
> > +#include <stdio.h>
> 
> Bad idea.  In the generic part of the codebase, system headers are
> supposed to be supplied by including git-compat-util.h (or cache.h
> or builtin.h, that are common header files that begin by including
> it and are allowed by CodingGuidelines to be used as such).
Done.
> 
> > +#include "builtin.h"
> > +
> > +int cmd_walken(int argc, const char **argv, const char *prefix)
> > +{
> > +        printf(_("cmd_walken incoming...\n"));
> > +        return 0;
> > +}
> > +----
> 
> I wonder if it makes sense to use trace instead of printf, as our
> reader has already seen the psuh example for doing the above.

Hmmm. I will think about it and look into the intended use of each. I
hadn't considered using a different logging method.

> 
> > +Add usage text and `-h` handling, in order to pass the test suite:
> 
> It is not wrong per-se, and it indeed is a very good practice to
> make sure that our subcommands consistently gives usage text and
> short usage.  Encouraging them early is a good idea.
> 
> But "in order to pass the test suite" invites "eh, the test suite
> does not pass without usage and -h?  why?".
> 
> Either drop the mention of "the test suite", or perhaps say
> something like
> 
> 	Add usage text and `-h` handling, like all the subcommands
> 	should consistently do (our test suite will notice and
> 	complain if you fail to do so).
> 
> i.e. the real purpose is consistency and usability; test suite is
> merely an enforcement mechanism.

Yeah, you're right. I'll reword this.

> 
> > +----
> > +{ "walken", cmd_walken, RUN_SETUP },
> > +----
> > +
> > +Add it to the `Makefile` near the line for `builtin\worktree.o`:
> 
> Backslash intended?

Nope, typo.


Thanks for the comments, Junio.

 - Emily


* Re: [PATCH] documentation: add tutorial for revision walking
  2019-06-10 20:25 ` Junio C Hamano
@ 2019-06-17 23:50   ` Emily Shaffer
  2019-06-19 15:17     ` Junio C Hamano
  0 siblings, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-06-17 23:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, Jun 10, 2019 at 01:25:14PM -0700, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > I'll also be mailing an RFC patchset In-Reply-To this message; the RFC
> > patchset should not be merged to Git, as I intend to host it in my own
> > mirror as an example. I hosted a similar example for the
> > MyFirstContribution tutorial; it's visible at
> > https://github.com/nasamuffin/git/tree/psuh. There might be a better
> > place to host these so I don't "own" them but I'm not sure what it is;
> > keeping them as a live branch somewhere struck me as an okay way to keep
> > them from getting stale.
> 
> Yes, writing the initial version is one thing, but keeping it alive
> is more work and more important.  As the underlying API changes over
> time, it will become necessary to update the sample implementation,
> but for a newbie who wants to learn by building "walken" on top of
> the then-current codebase and API, it would not be so helpful to
> show "these 7 patches were for older codebase, and the tip 2 are
> incremental updates to adjust to the newer API", so the maintenance
> of these sample patches may need different paradigm than the norm
> for our main codebase that values incremental polishing.
>
I'm trying to think of how it would end up working if I tried to use a
Github workflow. I think it wouldn't - someone would open a PR, and then
I'd have to rewrite that change into the appropriate commit in the live
branch and push the entire branch anew. Considering that workflow leaves
me doubly convinced that leaving it in my personal fork indefinitely
might not be wise (what if I become unable to continue maintaining it?).

I wonder if this is something that might fit well in
one of the more closely-associated mirrors, like gitster/git or
gitgitgadget/git - although I wonder if those count as "owned" by Junio
and Johannes, respectively. Hmmmm.

Maybe there's a case for storing them as a set of patch files that are
revision-controlled somewhere within Documentation/? There was some
discussion on the IRC a few weeks ago about trying to organize these
tutorials into their own directory to form a sort of "Git Contribution
101" course, maybe it makes sense to store there?

  Documentation/contributing/myfirstcontrib/MyFirstContrib.txt
  Documentation/contributing/myfirstcontrib/sample/*.patch
  Documentation/contributing/myfirstrevwalk/MyFirstRevWalk.txt
  Documentation/contributing/myfirstrevwalk/sample/*.patch

I don't love the idea of maintaining text patches with the expectation
that they should cleanly apply always, but it might make the idea that
they shouldn't contain 2 patches on the tip for API adjustment more
clear. And it would be probably pretty easy to inflate and build them
with a build target or something. Hmmmmmmmmm.

 - Emily


* Re: [PATCH] documentation: add tutorial for revision walking
  2019-06-17 23:19   ` Emily Shaffer
@ 2019-06-19  8:13     ` Eric Sunshine
  2019-06-19 23:35       ` Emily Shaffer
  0 siblings, 1 reply; 102+ messages in thread
From: Eric Sunshine @ 2019-06-19  8:13 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Git List

On Mon, Jun 17, 2019 at 7:20 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> On Fri, Jun 07, 2019 at 02:21:07AM -0400, Eric Sunshine wrote:
> > On Thu, Jun 6, 2019 at 9:08 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> > > +int cmd_walken(int argc, const char **argv, const char *prefix)
> > > +{
> > > +       struct option options[] = {
> > > +               OPT_END()
> > > +       };
> > > +
> > > +       argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
> > > +
> > > +       ...
> >
> > Perhaps comment out the "..." or remove it altogether to avoid having
> > the compiler barf when the below instructions tell the reader to build
> > the command.
>
> Hmm. That part I'm not so sure about. I like to use the "..." to
> indicate where the code in the snippet should be added around the other
> code already in the file - which I suppose it does just as clearly if
> it's commented - but I also hope folks are not simply copy-pasting
> blindly from the tutorial.
>
> It seems like including uncommented "..." in code tutorials is pretty
> common.

You're right, and that's not what I was "complaining" about. Looking
back at your original email, I see that I somehow got confused and
didn't realize or (quickly) forgot that you had already presented a
_complete_ cmd_walken() snippet just above that spot, and that the
cmd_walken() snippet upon which I was commenting was _incomplete_,
thus the "..." was perfectly justified. Not realizing that the
incomplete cmd_walken() example was just that (incomplete), I
"complained" that the following "compile the project" instructions
would barf on "...".

Maybe I got confused because the tiny cmd_walken() snippets followed
one another so closely (or because I got interrupted several times
during the review), but one way to avoid that would be to present a
single _complete_ snippet from the start, followed by a bit of
explanation. That is, something like this:

    Open up a new file `builtin/walken.c` and set up the command handler:

    ----
    /* "git walken" -- Part of the "My First Revision Walk" tutorial. */
    #include "builtin.h"

    int cmd_walken(int argc, const char **argv, const char *prefix)
    {
        const char * const usage[] = {
            N_("git walken"),
            NULL,
        };
        struct option options[] = {
            OPT_END()
        };

        argc = parse_options(argc, argv, prefix, options, usage, 0);

        printf(_("cmd_walken incoming...\n"));
        return 0;
    }
    ----

    `usage` is the usage message presented by `git walken -h`, and
    `options` will eventually specify command-line options.

> I don't think I have a good reason to push back on this except that I
> think "/* ... */" is ugly :)
>
> I'll go through and replace "..." with some actual hints about what's
> supposed to go there; for example, here I'll replace with "/* print and
> return */".

Seeing as my initial review comment was in error, I'm not sure that
you ought to replace "..." with anything else.

> > "invoke anything" is pretty nebulous, as is the earlier "components
> > you may invoke". A newcomer is unlikely to know what this means, so
> > perhaps it needs an example (even if just a short parenthetical
> > comment).
>
> I have tried to reword this; I hope this is a little clearer.
>
>   Before you begin to examine user configuration for your revision walk, it's
>   common practice for you to initialize to default any switches that your command
>   may have, as well as ask any other components you may invoke to initialize as
>   well (for example, how `git log` also uses the `grep` and `diff` components).
>   `git log` does this in `init_log_defaults()`; in that case, one global
>   `decoration_style` is initialized, as well as the grep and diff-UI components.

By trying to express too many things at once, it's still difficult to
follow. Perhaps use shorter, more easily digestible sentences, like
this:

    Before examining configuration files which may modify command
    behavior, set up default state for switches or options your
    command may have. If your command utilizes other Git components,
    ask them to set up their default states, as well. For instance,
    `git log` takes advantage of `grep` and `diff` functionality; its
    init_log_defaults() sets its own state (`decoration_style`) and
    asks `grep` and `diff` to initialize themselves by calling their
    initialization functions.
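
The ordering described above (defaults first, configuration second)
could be sketched roughly as follows. This is a toy model, not Git's
config API: `toy_walk_opts`, `toy_init_defaults()`,
`toy_apply_config()` and the `walken.*` keys are all made up for
illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical switches a command might own. */
struct toy_walk_opts {
	int reverse;
	int max_count;
};

/* Step 1: put every switch into a known default state. */
void toy_init_defaults(struct toy_walk_opts *opts)
{
	opts->reverse = 0;
	opts->max_count = -1; /* -1 means "no limit" in this sketch */
}

/*
 * Step 2: let one configuration key/value pair override a default.
 * Returns 0 if the key was recognized, -1 otherwise.
 */
int toy_apply_config(struct toy_walk_opts *opts, const char *key,
		     const char *value)
{
	if (!strcmp(key, "walken.reverse")) {
		opts->reverse = !strcmp(value, "true");
		return 0;
	}
	if (!strcmp(key, "walken.maxcount")) {
		opts->max_count = atoi(value);
		return 0;
	}
	return -1;
}
```

Because the defaults are set before any key is examined, a key that
never appears in the configuration leaves its switch at a predictable
value rather than at whatever happened to be in memory.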

> > > +static int walken_commit_walk(struct rev_info *rev)
> > > +{
> > > +       /* prepare_revision_walk() gets the final steps ready for a revision
> > > +        * walk. We check the return value for errors. */
> >
> > Not at all sure what this comment is trying to say. Also, the second
> > sentence adds no value to what the code itself already says clearly by
> > actually checking the return value.
>
> Attempted to rephrase. I ended up with:
>
>   /*
>    * prepare_revision_walk() does the final setup needed by revision.h
>    * before a walk. It may return an error if there is a problem.
>    */
>
> Maybe the second sentence still doesn't serve a purpose, but I was
> trying to express that prepare_revision_walk() won't die() on its own.
>
> >
> > > +       if (prepare_revision_walk(rev))
> > > +               die(_("revision walk setup failed"));

As this is just a toy example, I don't care too strongly about the
unnecessary second sentence. On the other hand, the tutorial is trying
to teach people how to contribute to this project, and on this
project, that sort of pointless comment is likely to be called out in
review. In fact, given that view, the entire comment block is
unnecessary (it doesn't add any value for anyone reviewing or reading
the code), so it might make more sense to drop the comment from the
code entirely, and just do a better job explaining in prose above the
snippet why you are calling that function. For instance:

    ... Let's start the helper with the call to `prepare_revision_walk()`,
    which does the final setup of the `rev_info` structure before it can
    be used.

The above observation may be more widely applicable than to just this
one instance. Don't use in-code comments for what should be explained
in prose if the in-code comment adds no value to the code itself (to
wit, if a reviewer would say "don't repeat in a comment what the code
already says clearly" or "don't use a comment to state the obvious").

> > > +This display is an indicator for the latency between publishing a commit for
> > > +review the first time, and getting it actually merged into master.
> >
> > Perhaps: s/master/`&`/
> >
> > Even as a long-time contributor to the project, I had to pause over
> > this statement for several seconds before figuring out what it was
> > talking about. Without a long-winded explanation of how topics
> > progress from submission through 'pu' through 'next' through 'master'
> > and finally into a release, the above statement is likely to be
> > mystifying to a newcomer. Perhaps it should be dropped.
>
> Such an explanation exists in MyFirstContribution.txt. I will include a
> shameless plug to that document here. :)

I found that this sort of tangential reference disturbed the flow of
the tutorial, leading the mind astray from the otherwise natural
progression of the presentation. So, I'm not convinced that talking
about the migration of a topic in the Git project itself adds value to
this tutorial. The same effect could be seen when commits have been
re-ordered via git-rebase, too, right? Perhaps mention that instead?

> > > +       printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, "
> > > +                "and %d trees.\n"), commit_count, blob_count, tag_count,
> > > +              tree_count);
> >
> > Or make the output more useful by having it be machine-parseable (and
> > not localized):
> >
> >     printf("commits %d\nblobs %d\ntags %d\ntrees %d\n",
> >         commit_count, blob_count, tag_count, tree_count);
>
> I'm not sure whether I agree, since it's a useless toy command only for human
> parsing.

True, it's not a big deal, and I don't insist upon it. But, if you
mention in prose that this output is easily machine-parseable, then
perhaps that nudges the reader a bit in the direction of thinking
about porcelain vs. plumbing, which is something a contributor to this
project eventually has to be concerned with (the sooner, the better).


* Re: [PATCH] documentation: add tutorial for revision walking
  2019-06-17 23:50   ` Emily Shaffer
@ 2019-06-19 15:17     ` Junio C Hamano
  2019-06-20 21:06       ` Emily Shaffer
  0 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2019-06-19 15:17 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Emily Shaffer <emilyshaffer@google.com> writes:

> Maybe there's a case for storing them as a set of patch files that are
> revision-controlled somewhere within Documentation/? There was some
> discussion on the IRC a few weeks ago about trying to organize these
> tutorials into their own directory to form a sort of "Git Contribution
> 101" course, maybe it makes sense to store there?
>
>   Documentation/contributing/myfirstcontrib/MyFirstContrib.txt
>   Documentation/contributing/myfirstcontrib/sample/*.patch
>   Documentation/contributing/myfirstrevwalk/MyFirstRevWalk.txt
>   Documentation/contributing/myfirstrevwalk/sample/*.patch
>
> I don't love the idea of maintaining text patches with the expectation
> that they should cleanly apply always,...

Well, I actually think the above organization does match the intent
of the "My first contribution codelab" perfectly.  When the codebase,
the workflow used by the project, and/or the coding or documentation
guideline gets updated, the text that documents how to contribute to
the project as well as the sample patches must be updated to match
the updated reality.

I agree with you that maintaining the *.patch files to always
cleanly apply is less than ideal.  A topic to update the sample
patches and tutorial text may be competing with another topic that
updates the very API the tutorials are teaching, and the sample
patches may not apply cleanly when two topics are merged together,
even if the "update sample patches and tutorial text" topic does
update them to match the API at the tip of the topic branch itself.
One thing we _could_ do is to pin the target version of the codebase
for the sake of tutorial.  IOW, the sample/*.patch may not apply
cleanly to the version of the tree these patches were taken from,
but would always apply cleanly to the most recent released version
before the last update to the tutorial, or something like that.

Also having to review the patch to sample/*.patch files will be
unpleasant.


* Re: [PATCH] documentation: add tutorial for revision walking
  2019-06-19  8:13     ` Eric Sunshine
@ 2019-06-19 23:35       ` Emily Shaffer
  2019-06-23 18:54         ` Eric Sunshine
  0 siblings, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-06-19 23:35 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Wed, Jun 19, 2019 at 04:13:35AM -0400, Eric Sunshine wrote:
> On Mon, Jun 17, 2019 at 7:20 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> > On Fri, Jun 07, 2019 at 02:21:07AM -0400, Eric Sunshine wrote:
> > > On Thu, Jun 6, 2019 at 9:08 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> > > > +int cmd_walken(int argc, const char **argv, const char *prefix)
> > > > +{
> > > > +       struct option options[] = {
> > > > +               OPT_END()
> > > > +       };
> > > > +
> > > > +       argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
> > > > +
> > > > +       ...
> > >
> > > Perhaps comment out the "..." or remove it altogether to avoid having
> > > the compiler barf when the below instructions tell the reader to build
> > > the command.
> >
> > Hmm. That part I'm not so sure about. I like to use the "..." to
> > indicate where the code in the snippet should be added around the other
> > code already in the file - which I suppose it does just as clearly if
> > it's commented - but I also hope folks are not simply copy-pasting
> > blindly from the tutorial.
> >
> > It seems like including uncommented "..." in code tutorials is pretty
> > common.
> 
> You're right, and that's not what I was "complaining" about. Looking
> back at your original email, I see that I somehow got confused and
> didn't realize or (quickly) forgot that you had already presented a
> _complete_ cmd_walken() snippet just above that spot, and that the
> cmd_walken() snippet upon which I was commenting was _incomplete_,
> thus the "..." was perfectly justified. Not realizing that the
> incomplete cmd_walken() example was just that (incomplete), I
> "complained" that the following "compile the project" instructions
> would barf on "...".
> 
> Maybe I got confused because the tiny cmd_walken() snippets followed
> one another so closely (or because I got interrupted several times
> during the review), but one way to avoid that would be to present a
> single _complete_ snippet from the start, followed by a bit of
> explanation. That is, something like this:
> 
>     Open up a new file `builtin/walken.c` and set up the command handler:
> 
>     ----
>     /* "git walken" -- Part of the "My First Revision Walk" tutorial. */
>     #include "builtin.h"
> 
>     int cmd_walken(int argc, const char **argv, const char *prefix)
>     {
>         const char * const usage[] = {
>             N_("git walken"),
>             NULL,
>         };
>         struct option options[] = {
>             OPT_END()
>         };
> 
>         argc = parse_options(argc, argv, prefix, options, usage, 0);
> 
>         printf(_("cmd_walken incoming...\n"));
>         return 0;
>     }
>     ----
> 
>     `usage` is the usage message presented by `git walken -h`, and
>     `options` will eventually specify command-line options.

Hmm. I can say that I personally would find that much more difficult to
follow interactively, and I'd be tempted to copy-and-paste and skim
through the wall of text if I was presented with such a snippet.
However, I could also imagine the reverse - someone becoming tired of
having their hand held through a fairly straightforward implementation,
when they're perfectly capable of reading a long description and would
just like to get on with it.

I'm really curious about what others think in this scenario, since I
imagine it boils down to individual learning styles.

(Maybe we can split the difference and present a complete patch or new
function, followed by a breakdown? That would end up even more verbose
than the current approach, though.)

... Now that I'm thinking more about this, and reading some of your
later comments on this mail, I think it might be valuable to lean on the
sample patchset for complete code samples, especially if we figure a
good way to distribute the patchset near the tutorial (as Junio and I
are discussing in another branch of this thread). Then we can keep the
tutorial concise, but have the complete code available for those who
prefer to look there.

> 
> > I don't think I have a good reason to push back on this except that I
> > think "/* ... */" is ugly :)
> >
> > I'll go through and replace "..." with some actual hints about what's
> > supposed to go there; for example, here I'll replace with "/* print and
> > return */".
> 
> Seeing as my initial review comment was in error, I'm not sure that
> you ought to replace "..." with anything else.
> 
> > > "invoke anything" is pretty nebulous, as is the earlier "components
> > > you may invoke". A newcomer is unlikely to know what this means, so
> > > perhaps it needs an example (even if just a short parenthetical
> > > comment).
> >
> > I have tried to reword this; I hope this is a little clearer.
> >
> >   Before you begin to examine user configuration for your revision walk, it's
> >   common practice for you to initialize to default any switches that your command
> >   may have, as well as ask any other components you may invoke to initialize as
> >   well (for example, how `git log` also uses the `grep` and `diff` components).
> >   `git log` does this in `init_log_defaults()`; in that case, one global
> >   `decoration_style` is initialized, as well as the grep and diff-UI components.
> 
> By trying to express too many things at once, it's still difficult to
> follow. Perhaps use shorter, more easily digestible sentences, like
> this:
> 
>     Before examining configuration files which may modify command
>     behavior, set up default state for switches or options your
>     command may have. If your command utilizes other Git components,
>     ask them to set up their default states, as well. For instance,
>     `git log` takes advantage of `grep` and `diff` functionality; its
>     init_log_defaults() sets its own state (`decoration_style`) and
>     asks `grep` and `diff` to initialize themselves by calling their
>     initialization functions.

Yeah, I like this a lot. Thanks! I took it word for word; will be adding
you to the Helped-by line of the commit.

> As this is just a toy example, I don't care too strongly about the
> unnecessary second sentence. On the other hand, the tutorial is trying
> to teach people how to contribute to this project, and on this
> project, that sort of pointless comment is likely to be called out in
> review. In fact, given that view, the entire comment block is
> unnecessary (it doesn't add any value for anyone reviewing or reading
> the code), so it might make more sense to drop the comment from the
> code entirely, and just do a better job explaining in prose above the
> snippet why you are calling that function. For instance:
> 
>     ... Let's start the helper with the call to `prepare_revision_walk()`,
>     which does the final setup of the `rev_info` structure before it can
>     be used.
> 
> The above observation may be more widely applicable than to just this
> one instance. Don't use in-code comments for what should be explained
> in prose if the in-code comment adds no value to the code itself (to
> wit, if a reviewer would say "don't repeat in a comment what the code
> already says clearly" or "don't use a comment to state the obvious").

I'm of two minds about this. On the one hand, I'm somewhat in favor of
leaving contextual, informational comments in the sample code, so the
sample code can teach on its own without the tutorial (specifically, I
mean the patchset that was sent alongside this one as RFC). On the other
hand, you're right that adding these informational comments doesn't
model best practices for real commits.

I don't have a strong opposition to removing those comments from the
in-place samples in the tutorial itself. But I do think it's useful to
include them in the sample patchset, which is intended as an additional
learning tool, rather than as a pristine code example - especially if we
make it clear in the commit messages there.

> 
> > > > +This display is an indicator for the latency between publishing a commit for
> > > > +review the first time, and getting it actually merged into master.
> > >
> > > Perhaps: s/master/`&`/
> > >
> > > Even as a long-time contributor to the project, I had to pause over
> > > this statement for several seconds before figuring out what it was
> > > talking about. Without a long-winded explanation of how topics
> > > progress from submission through 'pu' through 'next' through 'master'
> > > and finally into a release, the above statement is likely to be
> > > mystifying to a newcomer. Perhaps it should be dropped.
> >
> > Such an explanation exists in MyFirstContribution.txt. I will include a
> > shameless plug to that document here. :)
> 
> I found that this sort of tangential reference disturbed the flow of
> the tutorial, leading the mind astray from the otherwise natural
> progression of the presentation. So, I'm not convinced that talking
> about the migration of a topic in the Git project itself adds value to
> this tutorial. The same effect could be seen when commits have been
> re-ordered via git-rebase, too, right? Perhaps mention that instead?

Yeah, that's a good point. I'll try to mention it in a more
universally-applicable way, like you suggested.

> 
> > > > +       printf(_("Object walk completed. Found %d commits, %d blobs, %d tags, "
> > > > +                "and %d trees.\n"), commit_count, blob_count, tag_count,
> > > > +              tree_count);
> > >
> > > Or make the output more useful by having it be machine-parseable (and
> > > not localized):
> > >
> > >     printf("commits %d\nblobs %d\ntags %d\ntrees %d\n",
> > >         commit_count, blob_count, tag_count, tree_count);
> >
> > I'm not sure whether I agree, since it's a useless toy command only for human
> > parsing.
> 
> True, it's not a big deal, and I don't insist upon it. But, if you
> mention in prose that this output is easily machine-parseable, then
> perhaps that nudges the reader a bit in the direction of thinking
> about porcelain vs. plumbing, which is something a contributor to this
> project eventually has to be concerned with (the sooner, the better).

Oh, that's a very good point. I'll frame it that way - that's a handy
place to slip in some bonus context about Git. Thanks.

  NOTE: We aren't localizing the printf here because we have purposefully
  formatted it in a machine-parseable way. Commands in Git are divided into
  "plumbing" and "porcelain"; the "plumbing" commands are machine-parseable and
  intended for use in scripts, while the "porcelain" commands are intended for
  human interaction. Output intended for script usage doesn't need to be
  localized; output intended for humans does.
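
To make the "machine-parseable" half of that note concrete, here is a
small standalone sketch. The helper name `toy_format_counts()` is
hypothetical and the code does not depend on anything in Git. It emits
one `name value` pair per line and does not wrap the format string in
`_()`, so a script can rely on the field names staying the same in any
locale.

```c
#include <stdio.h>

/*
 * Write the four counters into `buf`, one "name value" pair per line.
 * Returns what snprintf() returns: the length the full output needs,
 * which exceeds len - 1 if the buffer was too small.
 */
int toy_format_counts(char *buf, size_t len, int commits, int blobs,
		      int tags, int trees)
{
	return snprintf(buf, len,
			"commits %d\nblobs %d\ntags %d\ntrees %d\n",
			commits, blobs, tags, trees);
}
```

A consumer can then split each line on the first space, which would be
much harder with a translated sentence such as "Found %d commits, ...".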


Thanks again for the review effort.

 - Emily


* Re: [PATCH] documentation: add tutorial for revision walking
  2019-06-19 15:17     ` Junio C Hamano
@ 2019-06-20 21:06       ` Emily Shaffer
  2019-07-13  0:39         ` Josh Steadmon
  0 siblings, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-06-20 21:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Jun 19, 2019 at 08:17:29AM -0700, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > Maybe there's a case for storing them as a set of patch files that are
> > revision-controlled somewhere within Documentation/? There was some
> > discussion on the IRC a few weeks ago about trying to organize these
> > tutorials into their own directory to form a sort of "Git Contribution
> > 101" course, maybe it makes sense to store there?
> >
> >   Documentation/contributing/myfirstcontrib/MyFirstContrib.txt
> >   Documentation/contributing/myfirstcontrib/sample/*.patch
> >   Documentation/contributing/myfirstrevwalk/MyFirstRevWalk.txt
> >   Documentation/contributing/myfirstrevwalk/sample/*.patch
> >
> > I don't love the idea of maintaining text patches with the expectation
> > that they should cleanly apply always,...
> 
> Well, I actually think the above organization does match the intent
> of the "My first contribution codelab" perfectly.  When the codebase,
> the workflow used by the project, and/or the coding or documentation
> guideline gets updated, the text that documents how to contribute to
> the project as well as the sample patches must be updated to match
> the updated reality.
> 
> I agree with you that maintaining the *.patch files to always
> cleanly apply is less than ideal.  A topic to update the sample
> patches and tutorial text may be competing with another topic that
> updates the very API the tutorials are teaching, and the sample
> patches may not apply cleanly when two topics are merged together,
> even if the "update sample patches and tutorial text" topic does
> update them to match the API at the tip of the topic branch itself.
> One thing we _could_ do is to pin the target version of the codebase
> for the sake of tutorial.  IOW, the sample/*.patch may not apply
> cleanly to the version of the tree these patches were taken from,
> but would always apply cleanly to the most recent released version
> before the last update to the tutorial, or something like that.
> 
> Also having to review the patch to sample/*.patch files will be
> unpleasant.

I wonder if we can ease some pain for both of the above issues by
including some scripts to "inflate" the patch files into a topic branch,
or figure out some more easily-reviewed (but more complicated, I
suppose) method for sending updates to the sample/*.patch files.

Imagining workflows like this:

Doing the tutorial:
 - In worktree a/.
 - Run a magic script which creates a worktree with the sample code, b/.
 - Read through a/Documentation/MyFirstContribution.txt and generate
   a/builtin/psuh.c, referring to b/builtin/psuh.c if confused.

Rebasing the tutorial patches:
 - In worktree a/.
 - Run a magic script which checks out a new branch at the last known
   good base for the patchset, then applies all the patches.
 - Now faced with, likely, a topic branch based on v<n-1> (where n is
   latest release).
 - `git rebase v<n> -x (make && ./bin-wrappers/git psuh)`
 - Interactively fix conflicts
 - Run a script to generate a magic interdiff from the old version of
   patches
 - Mail out magic interdiff to list and get approval
 - (Maybe maintainer does this when interdiff is happy? Maybe updater
   does this when review looks good?) Run a magic script to regenerate
   patches from rebased branch, and note somewhere they are based on
   v<n>
 - Mail sample/*.patch (based on v<n>) to list (if maintainer rolled the
   patches after interdiff approval, this step can be skipped)

(This seems to still be a lot of steps, even with the magic script..)

Alternatively, for the same process:
 Updater: Run a magic script to create topic branch based on v<n-1>
   (like before)
 U: `git rebase v<n> -x (make && ./bin-wrappers/git psuh)`
 U: Interactively fix conflicts
 U: Run a script to turn topic branch back into sample/*.patch
 U: Send email with changes to sample/*.patch (this will be ugly and
    unreadable) - message ID <M1>
 Reviewer: Run a magic script, providing <M1> argument, which grabs the
    diff-of-.patch and generates an interdiff, or a topic branch based
    on v<n>
 R: Send comments explaining where issue is (tricky to find where to
    inline in the diff-of-.patch)
 U: Reroll diff-of-.patch email
 R: Accepts
 Maintainer: Applies diff-of-.patch email normally

 I suppose for the first suggestion, there ends up being quite a lot of
 onus on the maintainer, and a lot of trust that there is no difference
 between the easy-to-read RFC interdiff and the patchset that actually
 gets applied. For the second suggestion, there ends up being onus on
 the reviewers to run some magical script. Maybe we can split the
 difference by expecting Updater to provide the interdiff below the ---
 line? Maybe in practice the diff-of-.patch isn't so unreadable, if
 it's only minor changes needed to bring the tutorial up to latest?

 I'm not sure there's a way to make this totally painless using email
 tools.

  - Emily


* Re: [PATCH] documentation: add tutorial for revision walking
  2019-06-19 23:35       ` Emily Shaffer
@ 2019-06-23 18:54         ` Eric Sunshine
  0 siblings, 0 replies; 102+ messages in thread
From: Eric Sunshine @ 2019-06-23 18:54 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Git List

On Wed, Jun 19, 2019 at 7:36 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> On Wed, Jun 19, 2019 at 04:13:35AM -0400, Eric Sunshine wrote:
> > Maybe I got confused because the tiny cmd_walken() snippets followed
> > one another so closely (or because I got interrupted several times
> > during the review), but one way to avoid that would be to present a
> > single _complete_ snippet from the start, followed by a bit of
> > explanation. [...]
>
> Hmm. I can say that I personally would find that much more difficult to
> follow interactively, and I'd be tempted to copy-and-paste and skim
> through the wall of text if I was presented with such a snippet.
> However, I could also imagine the reverse - someone becoming tired of
> having their hand held through a fairly straightforward implementation,
> when they're perfectly capable of reading a long description and would
> just like to get on with it.
>
> (Maybe we can split the difference and present a complete patch or new
> function, followed by a breakdown? That would end up even more verbose
> than the current approach, though.)

It might not be that important and may not need fixing considering
that I read it correctly the second time, and don't know how I managed
to get confused on the first read.

> > As this is just a toy example, I don't care too strongly about the
> > unnecessary second sentence. On the other hand, the tutorial is trying
> > to teach people how to contribute to this project, and on this
> > project, that sort of pointless comment is likely to be called out in
> > review. In fact, given that view, the entire comment block is
> > unnecessary (it doesn't add any value for anyone reviewing or reading
> > the code), so it might make more sense to drop the comment from the
> > code entirely, and just do a better job explaining in prose above the
> > snippet why you are calling that function. For instance:
> >
> >     ... Let's start the helper with the call to `prepare_revision_walk()`,
> >     which does the final setup of the `rev_info` structure before it can
> >     be used.
> >
> > The above observation may be more widely applicable than to just this
> > one instance. Don't use in-code comments for what should be explained
> > in prose if the in-code comment adds no value to the code itself (to
> > wit, if a reviewer would say "don't repeat in a comment what the code
> > already says clearly" or "don't use a comment to state the obvious").
>
> I'm of two minds about this. On the one hand, I'm somewhat in favor of
> leaving contextual, informational comments in the sample code, so the
> sample code can teach on its own without the tutorial (specifically, I
> mean the patchset that was sent alongside this one as RFC). On the other
> hand, you're right that adding these informational comments doesn't
> model best practices for real commits.
>
> I don't have a strong opposition to removing those comments from the
> in-place samples in the tutorial itself. But I do think it's useful to
> include them in the sample patchset, which is intended as an additional
> learning tool, rather than as a pristine code example - especially if we
> make it clear in the commit messages there.

Indeed, having the comments in the sample patch-set makes sense for
people who learn better that way (by seeing a complete piece of code).

> > > > Or make the output more useful by having it be machine-parseable (and
> > > > not localized):
> > > >
> > > >     printf("commits %d\nblobs %d\ntags %d\ntrees %d\n",
> > > >         commit_count, blob_count, tag_count, tree_count);
> > >
> > > I'm not sure whether I agree, since it's a useless toy command only for human
> > > parsing.
> >
> > True, it's not a big deal, and I don't insist upon it. But, if you
> > mention in prose that this output is easily machine-parseable, then
> > perhaps that nudges the reader a bit in the direction of thinking
> > about porcelain vs. plumbing, which is something a contributor to this
> > project eventually has to be concerned with (the sooner, the better).
>
> Oh, that's a very good point. I'll frame it that way - that's a handy
> place to slip in some bonus context about Git. Thanks.
>
>   NOTE: We aren't localizing the printf here because we have purposefully
>   formatted it in a machine-parseable way. Commands in Git are divided into
>   "plumbing" and "porcelain"; the "plumbing" commands are machine-parseable and
>   intended for use in scripts, while the "porcelain" commands are intended for
>   human interaction. Output intended for script usage doesn't need to be
>   localized; output intended for humans does.

I'd go with stronger language than "doesn't need to be localized" and
say instead that plumbing output "must not be localized" since scripts
depend upon stable output (and stable API).


* [PATCH v2] documentation: add tutorial for revision walking
  2019-06-07  1:07 [PATCH] documentation: add tutorial for revision walking Emily Shaffer
                   ` (3 preceding siblings ...)
  2019-06-10 20:49 ` Junio C Hamano
@ 2019-06-26 23:49 ` Emily Shaffer
  2019-06-26 23:50   ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
  2019-07-01 20:19   ` [PATCH v3] documentation: add tutorial for revision walking Emily Shaffer
  4 siblings, 2 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-26 23:49 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer, Eric Sunshine, Junio C Hamano

Existing documentation on revision walks seems to be primarily intended
as a reference for those already familiar with the procedure. This
tutorial attempts to give an entry-level guide to a couple of bare-bones
revision walks so that new Git contributors can learn the concepts
without having to wade through options parsing or special casing.

The target audience is a Git contributor who is just getting started
with the concept of revision walking. The goal is to prepare this
contributor to be able to understand and modify existing commands which
perform revision walks more easily, although it will also prepare
contributors to create new commands which perform walks.

The tutorial covers a basic overview of the structs involved during
revision walk, setting up a basic commit walk, setting up a basic
all-object walk, and adding some configuration changes to both walk
types. It intentionally does not cover how to create new commands or
search for options from the command line or gitconfigs.

There is an associated patchset at
https://github.com/nasamuffin/git/tree/revwalk that contains a reference
implementation of the code generated by this tutorial.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
---
Significant changes since r1 related to Eric and Junio's comments:
 - Formatting fixes (multiline comments, whitespaces, etc.)
 - Clarifications:
   - oneline options
   - `buf` arg in object walk callback
   - topo_order
   - placement of reverse setting
   - '-h' handling
 - Converted patchset to plumbing tool, using trace where appropriate
 - Removed unnecessary return values in walken_commit_walk and
   walken_object_walk
 - Added an entire section devoted to the `omitted` list
 - Tried to remove unnecessary explanatory comments in tutorial (and
   added more explanatory comments in sample code)

There's still the question of how the samples should be
source-controlled and kept fresh. I think we left it at "Maybe checking
in a stack of *.patch files would be tolerable", but it didn't sound
like anybody was quite happy with that solution.

 - Emily

 Documentation/Makefile           |   1 +
 Documentation/MyFirstRevWalk.txt | 910 +++++++++++++++++++++++++++++++
 2 files changed, 911 insertions(+)
 create mode 100644 Documentation/MyFirstRevWalk.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 76f2ecfc1b..91e5da67c4 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -78,6 +78,7 @@ SP_ARTICLES += $(API_DOCS)
 
 TECH_DOCS += MyFirstContribution
 TECH_DOCS += SubmittingPatches
+TECH_DOCS += MyFirstRevWalk
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
diff --git a/Documentation/MyFirstRevWalk.txt b/Documentation/MyFirstRevWalk.txt
new file mode 100644
index 0000000000..ea2f16c4b1
--- /dev/null
+++ b/Documentation/MyFirstRevWalk.txt
@@ -0,0 +1,910 @@
+My First Revision Walk
+======================
+
+== What's a Revision Walk?
+
+The revision walk is a key concept in Git - this is the process that underpins
+operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, the
+list of objects is found by walking parent relationships between objects. The
+revision walk can also be used to determine whether or not a given object is
+reachable from the current HEAD pointer.
+
+=== Related Reading
+
+- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
+  the revision walker in its various incarnations.
+- `Documentation/technical/api-revision-walking.txt`
+- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
+  gives a good overview of the types of objects in Git and what your revision
+  walk is really describing.
+
+== Setting Up
+
+Create a new branch from `master`.
+
+----
+git checkout -b revwalk origin/master
+----
+
+We'll put our fiddling into a new command. For fun, let's name it `git walken`.
+Open up a new file `builtin/walken.c` and set up the command handler:
+
+----
+/*
+ * "git walken"
+ *
+ * Part of the "My First Revision Walk" tutorial.
+ */
+
+#include "builtin.h"
+
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	trace_printf(_("cmd_walken incoming...\n"));
+	return 0;
+}
+----
+
+NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or
+off at runtime. For the purposes of this tutorial, we will write `walken` as
+though it is intended for use as a "plumbing" command: that is, a command which
+is used primarily in scripts, rather than interactively by humans (a "porcelain"
+command). So we will send our debug output to `trace_printf()` instead. When
+running, enable trace output by setting the environment variable `GIT_TRACE`.
+
+Add usage text and `-h` handling, like all subcommands should consistently do
+(our test suite will notice and complain if you fail to do so). You'll also need
+to include the `parse-options.h` header.
+
+----
+const char * const walken_usage[] = {
+	N_("git walken"),
+	NULL,
+};
+
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	struct option options[] = {
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
+
+	...
+}
+----
+
+Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:
+
+----
+extern int cmd_walken(int argc, const char **argv, const char *prefix);
+----
+
+Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
+maintaining alphabetical ordering:
+
+----
+{ "walken", cmd_walken, RUN_SETUP },
+----
+
+Add it to the `Makefile` near the line for `builtin/worktree.o`:
+
+----
+BUILTIN_OBJS += builtin/walken.o
+----
+
+Build and test out your command, without forgetting to ensure the `DEVELOPER`
+flag is set, and with `GIT_TRACE` enabled so the debug output can be seen:
+
+----
+$ echo DEVELOPER=1 >>config.mak
+$ make
+$ GIT_TRACE=1 ./bin-wrappers/git walken
+----
+
+NOTE: For a more exhaustive overview of the new command process, take a look at
+`Documentation/MyFirstContribution.txt`.
+
+NOTE: A reference implementation can be found at
+https://github.com/nasamuffin/git/tree/revwalk.
+
+=== `struct rev_cmdline_info`
+
+The definition of `struct rev_cmdline_info` can be found in `revision.h`.
+
+This struct is contained within the `rev_info` struct and is used to reflect
+parameters provided by the user over the CLI.
+
+`nr` represents the number of `rev_cmdline_entry` present in the array.
+
+`alloc` is used by the `ALLOC_GROW` macro. Check
+`Documentation/technical/api-allocation-growing.txt` - this variable is used to
+track the allocated size of the list.
+
+Per entry, we find:
+
+`item` is the object provided upon which to base the revision walk. Items in Git
+can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)
+
+`name` is the object ID (OID) of the object - a hex string you may be familiar
+with from using Git to organize your source in the past. Check the tutorial
+mentioned above towards the top for a discussion of where the OID can come
+from.
+
+`whence` indicates some information about what to do with the parents of the
+specified object. We'll explore this flag more later on; take a look at
+`Documentation/revisions.txt` to get an idea of what could set the `whence`
+value.
+
+`flags` hint at how the revision walk should begin; the possible values are
+defined in the first block under the `#include`s in `revision.h`. The most
+likely ones to be set in the `rev_cmdline_info` are `UNINTERESTING` and
+`BOTTOM`, but these same flags can be used during the walk, as well.
+
+=== `struct rev_info`
+
+This one is quite a bit longer, and many fields are only used during the walk
+by `revision.c` - not configuration options. Most of the configurable flags in
+`struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a
+good idea to take some time and read through that document.
+
+== Basic Commit Walk
+
+First, let's see if we can replicate the output of `git log --oneline`. We'll
+refer back to the implementation frequently to discover norms when performing
+a revision walk of our own.
+
+To do so, we'll first find all the commits, in order, which preceded the current
+commit. We'll extract the name and subject of the commit from each.
+
+Ideally, we will also be able to find out which ones are currently at the tip of
+various branches.
+
+=== Setting Up
+
+Preparing for your revision walk has some distinct stages.
+
+1. Perform default setup for this mode, and others which may be invoked.
+2. Check configuration files for relevant settings.
+3. Set up the `rev_info` struct.
+4. Tweak the initialized `rev_info` to suit the current walk.
+5. Prepare the `rev_info` for the walk.
+6. Iterate over the objects, processing each one.
+
+==== Default Setups
+
+Before examining configuration files which may modify command behavior, set up
+default state for switches or options your command may have. If your command
+utilizes other Git components, ask them to set up their default states as well.
+For instance, `git log` takes advantage of `grep` and `diff` functionality, so
+its `init_log_defaults()` sets its own state (`decoration_style`) and asks
+`grep` and `diff` to initialize themselves by calling each of their
+initialization functions.
+
+For our purposes, within `git walken`, for the first example we don't intend to
+use any other components within Git, and we don't have any configuration to do.
+However, we may want to add some later, so for now, we can add an empty
+placeholder. Create a new function in `builtin/walken.c`:
+
+----
+static void init_walken_defaults(void)
+{
+	/*
+	 * We don't actually need the same components `git log` does; leave this
+	 * empty for now.
+	 */
+}
+----
+
+Make sure to add a line invoking it inside of `cmd_walken()`.
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	init_walken_defaults();
+}
+----
+
+==== Configuring From `.gitconfig`
+
+Next, we should have a look at any relevant configuration settings (i.e.,
+settings readable and settable from `git config`). This is done by providing a
+callback to `git_config()`; within that callback, you can also invoke the
+config callbacks of any other components which need to see these options. Your
+callback will be invoked once for each configuration value which Git knows
+about (global, local, worktree, etc.).
+
+Similarly to the default values, we don't have anything to do here yet
+ourselves; however, we should call `git_default_config()` if we aren't calling
+any other existing config callbacks.
+
+Add a new function to `builtin/walken.c`:
+
+----
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+	/*
+	 * For now, we don't have any custom configuration, so fall back to
+	 * the default config.
+	 */
+	return git_default_config(var, value, cb);
+}
+----
+
+Make sure to invoke `git_config()` with it in your `cmd_walken()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	...
+
+	git_config(git_walken_config, NULL);
+
+	...
+}
+----
+
+// TODO: Checking CLI options
+
+==== Setting Up `rev_info`
+
+Now that we've gathered external configuration and options, it's time to
+initialize the `rev_info` object which we will use to perform the walk. This is
+typically done by calling `repo_init_revisions()` with the repository you intend
+to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
+struct.
+
+Add the `struct rev_info` and the `repo_init_revisions()` call:
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	/* This can go wherever you like in your declarations. */
+	struct rev_info rev;
+	...
+
+	/* This should go after the git_config() call. */
+	repo_init_revisions(the_repository, &rev, prefix);
+
+	...
+}
+----
+
+==== Tweaking `rev_info` For the Walk
+
+We're getting close, but we're still not quite ready to go. Now that `rev` is
+initialized, we can modify it to fit our needs. This is usually done within a
+helper for clarity, so let's add one:
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	/*
+	 * We want to mimic the appearance of `git log --oneline`, so let's
+	 * force oneline format.
+	 */
+	get_commit_format("oneline", rev);
+
+	/* Start our revision walk at HEAD. */
+	add_head_to_pending(rev);
+}
+----
+
+[NOTE]
+====
+Instead of using the shorthand `add_head_to_pending()`, you could do
+something like this:
+----
+	struct setup_revision_opt opt;
+
+	memset(&opt, 0, sizeof(opt));
+	opt.def = "HEAD";
+	opt.revarg_opt = REVARG_COMMITTISH;
+	setup_revisions(argc, argv, rev, &opt);
+----
+Using a `setup_revision_opt` gives you finer control over your walk's starting
+point.
+====
+
+Then let's invoke `final_rev_info_setup()` after the call to
+`repo_init_revisions()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	...
+
+	final_rev_info_setup(&rev);
+
+	...
+}
+----
+
+Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
+now, this is all we need.
+
+==== Preparing `rev_info` For the Walk
+
+Now that `rev` is all initialized and configured, we've got one more setup step
+before we get rolling. We can do this in a helper, which will both prepare the
+`rev_info` for the walk, and perform the walk itself. Let's start the helper
+with the call to `prepare_revision_walk()`, which can return an error without
+dying on its own:
+
+----
+static void walken_commit_walk(struct rev_info *rev)
+{
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+}
+----
+
+NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
+`stderr` it's likely to be seen by a human, so we will localize it.
+
+==== Performing the Walk!
+
+Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
+can also be used as an iterator; we move to the next item in the walk by using
+`get_revision()` repeatedly. Add the listed variable declarations at the top and
+the walk loop below the `prepare_revision_walk()` call within your
+`walken_commit_walk()`:
+
+----
+static void walken_commit_walk(struct rev_info *rev)
+{
+	struct commit *commit;
+	struct strbuf prettybuf = STRBUF_INIT;
+
+	...
+
+	while ((commit = get_revision(rev))) {
+		strbuf_reset(&prettybuf);
+		pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
+		puts(prettybuf.buf);
+	}
+
+	strbuf_release(&prettybuf);
+}
+----
+
+NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
+command we expect to be machine-parsed, we're sending it directly to stdout.
+
+Give it a shot.
+
+----
+$ make
+$ ./bin-wrappers/git walken
+----
+
+You should see all of the subject lines of all the commits in
+your tree's history, in order, ending with the initial commit, "Initial revision
+of "git", the information manager from hell". Congratulations! You've written
+your first revision walk. You can play with printing some additional fields
+from each commit if you're curious; have a look at the functions available in
+`commit.h`.
+
+=== Adding a Filter
+
+Next, let's try to filter the commits we see based on their author. This is
+equivalent to running `git log --author=<pattern>`. We can add a filter by
+modifying `rev_info.grep_filter`, which is a `struct grep_opt`.
+
+First some setup. Add `init_grep_defaults()` to `init_walken_defaults()` and add
+`grep_config()` to `git_walken_config()`:
+
+----
+static void init_walken_defaults(void)
+{
+	init_grep_defaults(the_repository);
+}
+
+...
+
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+	grep_config(var, value, cb);
+	return git_default_config(var, value, cb);
+}
+----
+
+Next, we can modify the `grep_filter`. This is done with convenience functions
+found in `grep.h`. For fun, we're filtering to only commits from folks using a
+`gmail.com` email address - a not-very-precise guess at who may be working on
+Git as a hobby. Since we're checking the author, which is a specific line in the
+header, we'll use the `append_header_grep_pattern()` helper. We can use
+the `enum grep_header_field` to indicate which part of the commit header we want
+to search.
+
+In `final_rev_info_setup()`, add your filter line:
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	...
+
+	append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
+		"gmail");
+	compile_grep_patterns(&rev->grep_filter);
+
+	...
+}
+----
+
+`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
+it won't work unless we compile it with `compile_grep_patterns()`.
+
+NOTE: If you are using `setup_revisions()` (for example, if you are passing a
+`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
+to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.
+
+NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
+wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
+`enum grep_pat_token` for us.
+
+=== Changing the Order
+
+There are a few ways that we can change the order of the commits during a
+revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
+sane orderings.
+
+`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
+before all of its children have been shown, and we avoid mixing commits which
+are in different lines of history. (`git help log`'s section on `--topo-order`
+has a very nice diagram to illustrate this.)
+
+Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
+`REV_SORT_BY_AUTHOR_DATE`. Add the following:
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	...
+
+	rev->topo_order = 1;
+	rev->sort_order = REV_SORT_BY_COMMIT_DATE;
+
+	...
+}
+----
+
+Let's output this into a file so we can easily diff it with the walk sorted by
+author date.
+
+----
+$ make
+$ ./bin-wrappers/git walken > commit-date.txt
+----
+
+Then, let's sort by author date and run it again.
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	...
+
+	rev->topo_order = 1;
+	rev->sort_order = REV_SORT_BY_AUTHOR_DATE;
+
+	...
+}
+----
+
+----
+$ make
+$ ./bin-wrappers/git walken > author-date.txt
+----
+
+Finally, compare the two. This is a little less helpful without object names or
+dates, but hopefully we get the idea.
+
+----
+$ diff -u commit-date.txt author-date.txt
+----
+
+This display indicates that commits can be reordered after they're written, for
+example with `git rebase`.
+
+Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
+Set that flag somewhere inside of `final_rev_info_setup()`:
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	...
+
+	rev->reverse = 1;
+
+	...
+}
+----
+
+Run your walk again and note the difference in order. (If you remove the grep
+pattern, you should see the last commit this call gives you as your current
+HEAD.)
+
+== Basic Object Walk
+
+So far we've been walking only commits. But Git has more types of objects than
+that! Let's see if we can walk _all_ objects, and find out some information
+about each one.
+
+We can base our work on an example. `git pack-objects` prepares all kinds of
+objects for packing into a bitmap or packfile. The work we are interested in
+resides in `builtin/pack-objects.c:get_object_list()`; examination of that
+function shows that the all-object walk is being performed by
+`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
+functions reside in `list-objects.c`; examining the source shows that, despite
+the name, these functions traverse all kinds of objects. Let's have a look at
+the arguments to `traverse_commit_list_filtered()`, which are a superset of the
+arguments to the unfiltered version.
+
+- `struct list_objects_filter_options *filter_options`: This is a struct which
+  stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
+- `struct rev_info *revs`: This is the `rev_info` used for the walk.
+- `show_commit_fn show_commit`: A callback which will be used to handle each
+  individual commit object.
+- `show_object_fn show_object`: A callback which will be used to handle each
+  non-commit object (so each blob, tree, or tag).
+- `void *show_data`: A context buffer which is passed in turn to `show_commit`
+  and `show_object`.
+- `struct oidset *omitted`: A set of object IDs which the provided filter
+  caused to be omitted.
+
+It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
+instead of needing us to call it repeatedly ourselves. Cool! Let's add the
+callbacks first.
+
+For the sake of this tutorial, we'll simply keep track of how many of each kind
+of object we find. At file scope in `builtin/walken.c` add the following
+tracking variables:
+
+----
+static int commit_count;
+static int tag_count;
+static int blob_count;
+static int tree_count;
+----
+
+Commits are handled by a different callback than other objects; let's do that
+one first:
+
+----
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+	commit_count++;
+}
+----
+
+The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
+the `buf` argument is actually the context buffer that we can provide to the
+traversal calls - `show_data`, which we mentioned a moment ago.
+
+Since we have the `struct commit` object, we can look at all the same parts that
+we looked at in our earlier commit-only walk. For the sake of this tutorial,
+though, we'll just increment the commit counter and move on.
+
+The callback for non-commits is a little different, as we'll need to check
+which kind of object we're dealing with:
+
+----
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+	switch (obj->type) {
+	case OBJ_TREE:
+		tree_count++;
+		break;
+	case OBJ_BLOB:
+		blob_count++;
+		break;
+	case OBJ_TAG:
+		tag_count++;
+		break;
+	case OBJ_COMMIT:
+		die("unexpectedly encountered a commit in "
+		    "walken_show_object");
+		/* Not reached: die() never returns, so no commit is counted. */
+		break;
+	default:
+		die("unexpected object type %s", type_name(obj->type));
+		break;
+	}
+}
+----
+
+Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
+context pointer that `walken_show_commit()` receives: the `show_data` argument
+to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
+`str` contains the name of the object, which ends up being something like
+`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).
+
+To help assure us that we aren't double-counting commits, we'll include some
+complaining if a commit object is routed through our non-commit callback; we'll
+also complain if we see an invalid object type.
+
+Our main object walk implementation is substantially different from our commit
+walk implementation, so let's make a new function to perform the object walk. We
+can perform setup which is applicable to all objects here, too, keeping it
+separate from setup which is applicable only to commit walks.
+
+We'll start by enabling all types of objects in the `struct rev_info`. Unless
+you cloned or fetched your repository earlier with a filter,
+`exclude_promisor_objects` is unlikely to make a difference, but we'll turn it
+on just to make sure our lives are simple. We'll also turn on
+`tree_blobs_in_commit_order`, which means that we will walk a commit's tree and
+everything it points to immediately after we find each commit, as opposed to
+waiting for the end and walking through all trees after the commit history has
+been discovered. With the appropriate settings configured, we are ready to call
+`prepare_revision_walk()`.
+
+----
+static void walken_object_walk(struct rev_info *rev)
+{
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+	rev->exclude_promisor_objects = 1;
+
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+
+	commit_count = 0;
+	tag_count = 0;
+	blob_count = 0;
+	tree_count = 0;
+----
+
+Let's start by calling just the unfiltered walk and reporting our counts.
+Complete your implementation of `walken_object_walk()`:
+
+----
+	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+
+	printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
+		blob_count, tag_count, tree_count);
+}
+----
+
+NOTE: This output is intended to be machine-parsed. Therefore, we are not
+sending it to `trace_printf()`, and we are not localizing it - we need scripts
+to be able to count on the formatting to be exactly the way it is shown here.
+If we were intending this output to be read by humans, we would need to localize
+it with `_()`.
+
+Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
+command line options is out of scope for this tutorial, so we'll just hardcode
+a branch we can change at compile time. Where you call `final_rev_info_setup()`
+and `walken_commit_walk()`, instead branch like so:
+
+----
+	if (1) {
+		add_head_to_pending(&rev);
+		walken_object_walk(&rev);
+	} else {
+		final_rev_info_setup(argc, argv, prefix, &rev);
+		walken_commit_walk(&rev);
+	}
+----
+
+NOTE: For simplicity, we've avoided all the filters and sorts we applied in
+`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
+want, you can certainly use the filters we added before by moving
+`final_rev_info_setup()` out of the conditional and removing the call to
+`add_head_to_pending()`.
+
+Now we can try to run our command! It should take noticeably longer than the
+commit walk, but an examination of the output will give you an idea why. Your
+output should look similar to this example, but with different counts:
+
+----
+commits 55733
+blobs 100274
+tags 0
+trees 104210
+----
+
+This makes sense. We have more trees than commits because the Git project has
+lots of subdirectories which can change, plus at least one tree per commit. We
+have no tags because we started on a commit (`HEAD`) and while tags can point to
+commits, commits can't point to tags.
+
+NOTE: You will have different counts when you run this yourself! The number of
+objects grows along with the Git project.
+
+=== Adding a Filter
+
+There are a handful of filters that we can apply to the object walk laid out in
+`Documentation/rev-list-options.txt`. These filters are typically useful for
+operations such as creating packfiles or performing a partial clone. They are
+defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
+will use the "tree:1" filter, which causes the walk to omit all trees and blobs
+which are not directly referenced by commits reachable from the commit in
+`pending` when the walk begins. (`pending` is the list of objects which need to
+be traversed during a walk; you can imagine a breadth-first tree traversal to
+help understand. In our case, that means we omit trees and blobs not directly
+referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
+`HEAD` in the `pending` list.)
+
+First, we'll need to `#include "list-objects-filter-options.h"` and set up the
+`struct list_objects_filter_options` at the top of the function.
+
+----
+static void walken_object_walk(struct rev_info *rev)
+{
+	struct list_objects_filter_options filter_options = { 0 };
+
+	...
+----
+
+For now, we are not going to track the omitted objects or pass along any
+context, so we'll pass `NULL` for both the `show_data` and `omitted`
+parameters. For the sake of simplicity, we'll add a simple build-time branch to
+use our filter or not. Replace the line calling `traverse_commit_list()` with
+the following, which will remind us which kind of walk we've just performed:
+
+----
+	if (1) {
+		/* Unfiltered: */
+		trace_printf(_("Unfiltered object walk.\n"));
+		traverse_commit_list(rev, walken_show_commit,
+				walken_show_object, NULL);
+	} else {
+		trace_printf(
+			_("Filtered object walk with filterspec 'tree:1'.\n"));
+		/*
+		 * We can parse a tree depth of 1 to demonstrate the kind of
+		 * filtering that could occur e.g. during a partial clone.
+		 */
+		parse_list_objects_filter(&filter_options, "tree:1");
+
+		traverse_commit_list_filtered(&filter_options, rev,
+			walken_show_commit, walken_show_object, NULL, NULL);
+	}
+----
+
+`struct list_objects_filter_options` is usually built directly from a command
+line argument, so the module provides an easy way to build one from a string.
+Even though we aren't taking user input right now, we can still build one with
+a hardcoded string using `parse_list_objects_filter()`.
+
+With the filter spec "tree:1", we are expecting to see _only_ the root tree for
+each commit; therefore, the tree object count should be less than or equal to
+the number of commits. (For an example of why that's true: a `git revert` of the
+previous commit points to the same tree object as its grandparent.)
+
+=== Counting Omitted Objects
+
+We also have the capability to enumerate all objects which were omitted by a
+filter, like with `git rev-list --objects --filter=<spec> --filter-print-omitted`.
+Asking `traverse_commit_list_filtered()` to populate the `omitted` set means
+that our revision walk performs no better than an unfiltered revision walk; all
+reachable objects are walked in order to populate the set.
+
+First, add the `struct oidset` and related items we will use to iterate it:
+
+----
+static void walken_object_walk(
+	...
+
+	struct oidset omitted;
+	struct oidset_iter oit;
+	struct object_id *oid = NULL;
+	int omitted_count = 0;
+	oidset_init(&omitted, 0);
+
+	...
+----
+
+Modify the call to `traverse_commit_list_filtered()` to include your `omitted`
+object:
+
+----
+	...
+
+		traverse_commit_list_filtered(&filter_options, rev,
+			walken_show_commit, walken_show_object, NULL, &omitted);
+
+	...
+----
+
+Then, after your traversal, the `oidset` traversal is pretty straightforward.
+Count all the objects within and modify the print statement:
+
+----
+	/* Count the omitted objects. */
+	oidset_iter_init(&omitted, &oit);
+
+	while ((oid = oidset_iter_next(&oit)))
+		omitted_count++;
+
+	printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
+		commit_count, blob_count, tag_count, tree_count, omitted_count);
+----
+
+By running your walk with and without the filter, you should find that the total
+object count in each case is identical. You can also time each invocation of
+the `walken` subcommand, with and without `omitted` being passed in, to confirm
+to yourself the runtime impact of tracking all omitted objects.
+
+=== Changing the Order
+
+Finally, let's demonstrate that you can also reorder walks of all objects, not
+just walks of commits. First, we'll make our handlers chattier - modify
+`walken_show_commit()` and `walken_show_object()` to print the object as they
+go:
+
+----
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+	trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
+	commit_count++;
+}
+
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+	trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));
+
+	...
+}
+----
+
+NOTE: Since we will be examining this output directly as humans, we'll use
+`trace_printf()` here. Additionally, since this change introduces a significant
+number of printed lines, using `trace_printf()` will allow us to easily silence
+those lines without having to recompile.
+
+(Leave the counter increment logic in place.)
+
+With only that change, run again (but save yourself some scrollback):
+
+----
+$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10
+----
+
+Take a look at the top commit with `git show` and the object ID you printed; it
+should be the same as the output of `git show HEAD`.
+
+Next, let's change a setting on our `struct rev_info` within
+`walken_object_walk()`. Find where you're changing the other settings on `rev`,
+such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
+`reverse` setting at the bottom:
+
+----
+	...
+
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+	rev->exclude_promisor_objects = 1;
+	rev->reverse = 1;
+
+	...
+----
+
+Now, run again, but this time, let's grab the last handful of objects instead
+of the first handful:
+
+----
+$ make
+$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10
+----
+
+The last commit object given should have the same OID as the one we saw at the
+top before, and running `git show <oid>` with that OID should give you again
+the same results as `git show HEAD`. Furthermore, if you run and examine the
+first ten lines again (with `head` instead of `tail` like we did before applying
+the `reverse` setting), you should see that now the first commit printed is the
+initial commit, `e83c5163`.
+
+== Wrapping Up
+
+Let's review. In this tutorial, we:
+
+- Built a commit walk from the ground up
+- Enabled a grep filter for that commit walk
+- Changed the sort order of that filtered commit walk
+- Built an object walk (tags, commits, trees, and blobs) from the ground up
+- Learned how to add a filter-spec to an object walk
+- Changed the display order of the filtered object walk
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH v2 00/13] example implementation of revwalk tutorial
  2019-06-26 23:49 ` [PATCH v2] " Emily Shaffer
@ 2019-06-26 23:50   ` Emily Shaffer
  2019-06-26 23:50     ` [RFC PATCH v2 01/13] walken: add infrastructure for revwalk demo Emily Shaffer
                       ` (13 more replies)
  2019-07-01 20:19   ` [PATCH v3] documentation: add tutorial for revision walking Emily Shaffer
  1 sibling, 14 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-26 23:50 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer, Jeff Hostetler

Since v1, I made some significant changes:

 - Added a commit for counting the 'omitted' list, to match the new
   section added to the tutorial.
 - Added significant comments to allow the sample to better stand on its
   own.
 - Fixed style issues (die() formatting, etc)
 - Distinguished between human- and machine-readable output with
   trace_printf() and printf(), to turn the command into plumbing.
 - More changes as mentioned in the tutorial patch.

Thanks.
 - Emily

Emily Shaffer (13):
  walken: add infrastructure for revwalk demo
  walken: add usage to enable -h
  walken: add placeholder to initialize defaults
  walken: add handler to git_config
  walken: configure rev_info and prepare for walk
  walken: perform our basic revision walk
  walken: filter for authors from gmail address
  walken: demonstrate various topographical sorts
  walken: demonstrate reversing a revision walk list
  walken: add unfiltered object walk from HEAD
  walken: add filtered object walk
  walken: count omitted objects
  walken: reverse the object walk order

 Makefile         |   1 +
 builtin.h        |   1 +
 builtin/walken.c | 290 +++++++++++++++++++++++++++++++++++++++++++++++
 git.c            |   1 +
 4 files changed, 293 insertions(+)
 create mode 100644 builtin/walken.c

-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply	[flat|nested] 102+ messages in thread

* [RFC PATCH v2 01/13] walken: add infrastructure for revwalk demo
  2019-06-26 23:50   ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
@ 2019-06-26 23:50     ` Emily Shaffer
  2019-06-26 23:50     ` [RFC PATCH v2 02/13] walken: add usage to enable -h Emily Shaffer
                       ` (12 subsequent siblings)
  13 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-26 23:50 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Begin to add scaffolding for `git walken`, a toy command which we will
teach to perform a number of revision walks, in order to demonstrate the
mechanics of revision walking for developers new to the Git project.

This commit is the beginning of an educational series which corresponds
to the tutorial in Documentation/MyFirstRevWalk.txt.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: I64297621919412f54701e111366e99c4ef0feae3
---
 Makefile         |  1 +
 builtin.h        |  1 +
 builtin/walken.c | 13 +++++++++++++
 3 files changed, 15 insertions(+)
 create mode 100644 builtin/walken.c

diff --git a/Makefile b/Makefile
index f58bf14c7b..5bac1dbf8d 100644
--- a/Makefile
+++ b/Makefile
@@ -1137,6 +1137,7 @@ BUILTIN_OBJS += builtin/var.o
 BUILTIN_OBJS += builtin/verify-commit.o
 BUILTIN_OBJS += builtin/verify-pack.o
 BUILTIN_OBJS += builtin/verify-tag.o
+BUILTIN_OBJS += builtin/walken.o
 BUILTIN_OBJS += builtin/worktree.o
 BUILTIN_OBJS += builtin/write-tree.o
 
diff --git a/builtin.h b/builtin.h
index ec7e0954c4..c919736c36 100644
--- a/builtin.h
+++ b/builtin.h
@@ -242,6 +242,7 @@ int cmd_var(int argc, const char **argv, const char *prefix);
 int cmd_verify_commit(int argc, const char **argv, const char *prefix);
 int cmd_verify_tag(int argc, const char **argv, const char *prefix);
 int cmd_version(int argc, const char **argv, const char *prefix);
+int cmd_walken(int argc, const char **argv, const char *prefix);
 int cmd_whatchanged(int argc, const char **argv, const char *prefix);
 int cmd_worktree(int argc, const char **argv, const char *prefix);
 int cmd_write_tree(int argc, const char **argv, const char *prefix);
diff --git a/builtin/walken.c b/builtin/walken.c
new file mode 100644
index 0000000000..d2772a0e8f
--- /dev/null
+++ b/builtin/walken.c
@@ -0,0 +1,13 @@
+/*
+ * "git walken"
+ *
+ * Part of the "My First Revision Walk" tutorial.
+ */
+
+#include "builtin.h"
+
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	trace_printf(_("cmd_walken incoming...\n"));
+	return 0;
+}
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH v2 02/13] walken: add usage to enable -h
  2019-06-26 23:50   ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
  2019-06-26 23:50     ` [RFC PATCH v2 01/13] walken: add infrastructure for revwalk demo Emily Shaffer
@ 2019-06-26 23:50     ` Emily Shaffer
  2019-06-27  4:47       ` Eric Sunshine
  2019-06-27  4:50       ` Eric Sunshine
  2019-06-26 23:50     ` [RFC PATCH v2 03/13] walken: add placeholder to initialize defaults Emily Shaffer
                       ` (11 subsequent siblings)
  13 siblings, 2 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-26 23:50 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

It's expected that Git commands support '-h' in order to provide a
consistent user experience (and this expectation is enforced by the
test suite). '-h' is captured by parse_options() by default; in order to
support this flag, we add a short usage text to walken.c and invoke
parse_options().

With this change, we can now add cmd_walken to the builtins set and
expect tests to pass, so we'll do so - cmd_walken is now open for
business.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: I2919dc1efadb82acb335617ea24371c84b03bbce
---
 builtin/walken.c | 25 +++++++++++++++++++++++++
 git.c            |  1 +
 2 files changed, 26 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index d2772a0e8f..9eea51ff70 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -5,9 +5,34 @@
  */
 
 #include "builtin.h"
+#include "parse-options.h"
+
+/*
+ * All builtins are expected to provide a usage to provide a consistent user
+ * experience.
+ */
+const char * const walken_usage[] = {
+	N_("git walken"),
+	NULL,
+};
 
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
+	struct option options[] = {
+		OPT_END()
+	};
+
+	/*
+	 * parse_options() handles showing usage if incorrect options are
+	 * provided, or if '-h' is passed.
+	 */
+	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
+
+	/*
+	 * This line is "human-readable" and we are writing a plumbing command,
+	 * so we localize it and use the trace library to print only when
+	 * the GIT_TRACE environment variable is set.
+	 */
 	trace_printf(_("cmd_walken incoming...\n"));
 	return 0;
 }
diff --git a/git.c b/git.c
index c2eec470c9..2a7fb9714f 100644
--- a/git.c
+++ b/git.c
@@ -601,6 +601,7 @@ static struct cmd_struct commands[] = {
 	{ "verify-pack", cmd_verify_pack },
 	{ "verify-tag", cmd_verify_tag, RUN_SETUP },
 	{ "version", cmd_version },
+	{ "walken", cmd_walken, RUN_SETUP },
 	{ "whatchanged", cmd_whatchanged, RUN_SETUP },
 	{ "worktree", cmd_worktree, RUN_SETUP | NO_PARSEOPT },
 	{ "write-tree", cmd_write_tree, RUN_SETUP },
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH v2 03/13] walken: add placeholder to initialize defaults
  2019-06-26 23:50   ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
  2019-06-26 23:50     ` [RFC PATCH v2 01/13] walken: add infrastructure for revwalk demo Emily Shaffer
  2019-06-26 23:50     ` [RFC PATCH v2 02/13] walken: add usage to enable -h Emily Shaffer
@ 2019-06-26 23:50     ` Emily Shaffer
  2019-06-26 23:50     ` [RFC PATCH v2 04/13] walken: add handler to git_config Emily Shaffer
                       ` (10 subsequent siblings)
  13 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-26 23:50 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Eventually, we will want a good place to initialize default variables
for use during our revision walk(s) in `git walken`. For now, there's
nothing to do here, but let's add the scaffolding so that it's easy to
tell where to put the setup later on.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 builtin/walken.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index 9eea51ff70..daae4f811a 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -16,6 +16,19 @@ const char * const walken_usage[] = {
 	NULL,
 };
 
+/*
+ * Within init_walken_defaults() we can call into other useful defaults to set
+ * in the global scope or on the_repository. It's okay to borrow from other
+ * functions which are doing something relatively similar to yours.
+ */
+static void init_walken_defaults(void)
+{
+	/*
+	 * We don't actually need the same components `git log` does; leave this
+	 * empty for now.
+	 */
+}
+
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
 	struct option options[] = {
@@ -28,6 +41,8 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
 	 */
 	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
 
+	init_walken_defaults();
+
 	/*
 	 * This line is "human-readable" and we are writing a plumbing command,
 	 * so we localize it and use the trace library to print only when
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH v2 04/13] walken: add handler to git_config
  2019-06-26 23:50   ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
                       ` (2 preceding siblings ...)
  2019-06-26 23:50     ` [RFC PATCH v2 03/13] walken: add placeholder to initialize defaults Emily Shaffer
@ 2019-06-26 23:50     ` Emily Shaffer
  2019-06-27  4:54       ` Eric Sunshine
  2019-06-26 23:50     ` [RFC PATCH v2 05/13] walken: configure rev_info and prepare for walk Emily Shaffer
                       ` (9 subsequent siblings)
  13 siblings, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-06-26 23:50 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

For now, we have no configuration options we want to set up for
ourselves, but in the future we may need to. At the very least, we
should invoke git_default_config() for each config option; we will do so
inside of a skeleton config callback so that we know where to add
configuration handling later on when we need it.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
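
Reader's aside, not part of the commit message: the callback chain
described above can be modelled outside of Git. The sketch below is a
standalone toy and does not use Git's config API. toy_config(),
toy_default_config(), toy_walken_config() and the "walken.verbose" key
are all made-up names that only mimic the shape of git_config(),
git_default_config() and git_walken_config().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the callback type that git_config() accepts. */
typedef int (*toy_config_fn)(const char *var, const char *value, void *cb);

struct toy_entry {
	const char *var;
	const char *value;
};

/* Tallies of how each entry was handled, reached through the `cb` pointer. */
struct toy_stats {
	int handled_here;
	int passed_along;
};

/* Plays the role of git_default_config(): the fallback stakeholder. */
int toy_default_config(const char *var, const char *value, void *cb)
{
	struct toy_stats *stats = cb;

	(void)var;
	(void)value;
	stats->passed_along++;
	return 0;
}

/* Plays the role of git_walken_config(): claim our keys, defer the rest. */
int toy_walken_config(const char *var, const char *value, void *cb)
{
	struct toy_stats *stats = cb;

	if (!strcmp(var, "walken.verbose")) {	/* made-up key */
		stats->handled_here++;
		return 0;
	}
	return toy_default_config(var, value, cb);
}

/* Plays the role of git_config(): one callback invocation per entry. */
int toy_config(const struct toy_entry *entries, size_t nr,
	       toy_config_fn fn, void *cb)
{
	size_t i;

	assert(fn != NULL);
	for (i = 0; i < nr; i++)
		if (fn(entries[i].var, entries[i].value, cb) < 0)
			return -1;
	return 0;
}
```

Feeding three entries through toy_config() with toy_walken_config() as
the callback, one of them being "walken.verbose", should leave one entry
handled locally and two passed along to the default handler.
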
 builtin/walken.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index daae4f811a..2474a0d7b2 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -5,6 +5,7 @@
  */
 
 #include "builtin.h"
+#include "config.h"
 #include "parse-options.h"
 
 /*
@@ -24,11 +25,36 @@ const char * const walken_usage[] = {
 static void init_walken_defaults(void)
 {
 	/*
-	 * We don't actually need the same components `git log` does; leave this
-	 * empty for now.
+	 * We don't use any other components or have settings to initialize, so
+	 * leave this empty.
 	 */
 }
 
+/*
+ * This method will be called back by git_config(). It is used to gather values
+ * from the configuration files available to Git.
+ *
+ * Each time git_config() finds a configuration file entry, it calls this
+ * callback. Then, this function should compare it to entries which concern us,
+ * and make settings changes as necessary.
+ *
+ * If we are called with a config setting we care about, we should use one of
+ * the helpers which exist in config.h to pull out the value for ourselves, i.e.
+ * git_config_string(...) or git_config_bool(...).
+ *
+ * If we don't match anything, we should pass it along to another stakeholder
+ * who may otherwise care - in log's case, grep, gpg, and diff-ui. For our case,
+ * we'll ignore everybody else.
+ */
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+	/*
+	 * For now, we don't have any custom configuration, so fall back on the
+	 * default config.
+	 */
+	return git_default_config(var, value, cb);
+}
+
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
 	struct option options[] = {
@@ -43,6 +69,8 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
 
 	init_walken_defaults();
 
+	git_config(git_walken_config, NULL);
+
 	/*
 	 * This line is "human-readable" and we are writing a plumbing command,
 	 * so we localize it and use the trace library to print only when
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH v2 05/13] walken: configure rev_info and prepare for walk
  2019-06-26 23:50   ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
                       ` (3 preceding siblings ...)
  2019-06-26 23:50     ` [RFC PATCH v2 04/13] walken: add handler to git_config Emily Shaffer
@ 2019-06-26 23:50     ` Emily Shaffer
  2019-06-27  5:06       ` Eric Sunshine
  2019-06-26 23:50     ` [RFC PATCH v2 06/13] walken: perform our basic revision walk Emily Shaffer
                       ` (8 subsequent siblings)
  13 siblings, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-06-26 23:50 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

`struct rev_info` is what's used by the walk itself.
`repo_init_revisions()` initializes the struct; then we need to set it
up for the walk we want to perform, which is done in
`final_rev_info_setup()`.

The most important step here is adding the first object we want to walk
to the pending array. Here, we take the easy road and use
`add_head_to_pending()`; there is also a way to do it with
`setup_revision_opt()` and `setup_revisions()` which we demonstrate but
do not use. If we were to forget this step, the walk would do nothing -
the pending queue would be checked, determined to be empty, and the walk
would terminate immediately.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: I76754b740227cf17a449f3f536dbbe37031e6f9a
---
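
Reader's aside, not part of the commit message: to see why an empty
pending queue ends the walk immediately, here is a standalone toy model.
It is a simplification under stated assumptions and is not how
revision.c is implemented; struct toy_commit, struct toy_walk,
toy_add_pending() and toy_walk_all() are hypothetical names invented for
this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PARENTS 2
#define TOY_MAX_PENDING 16

/* Hypothetical stand-in for a commit: an id plus links to its parents. */
struct toy_commit {
	int id;
	int seen;
	struct toy_commit *parents[TOY_MAX_PARENTS];
};

/* Hypothetical stand-in for the `pending` array inside struct rev_info. */
struct toy_walk {
	struct toy_commit *pending[TOY_MAX_PENDING];
	size_t nr;
};

/* Plays the role of add_head_to_pending(): seed the walk with a start point. */
void toy_add_pending(struct toy_walk *walk, struct toy_commit *c)
{
	assert(walk->nr < TOY_MAX_PENDING);
	walk->pending[walk->nr++] = c;
}

/*
 * Pop entries until `pending` is empty, queueing unseen parents as we go.
 * Returns the number of commits visited.
 */
int toy_walk_all(struct toy_walk *walk)
{
	int visited = 0;

	while (walk->nr) {
		struct toy_commit *c = walk->pending[--walk->nr];
		size_t i;

		if (c->seen)
			continue;
		c->seen = 1;
		visited++;
		for (i = 0; i < TOY_MAX_PARENTS; i++)
			if (c->parents[i] && !c->parents[i]->seen)
				toy_add_pending(walk, c->parents[i]);
	}
	return visited;
}
```

With nothing added to `pending`, the loop body never runs and
toy_walk_all() returns 0. Seeding it with the tip of a four-commit
history (a merge of two children of one root) visits all four commits
exactly once.
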
 builtin/walken.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index 2474a0d7b2..c463eca843 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -5,6 +5,7 @@
  */
 
 #include "builtin.h"
+#include "revision.h"
 #include "config.h"
 #include "parse-options.h"
 
@@ -30,6 +31,40 @@ static void init_walken_defaults(void)
 	 */
 }
 
+/*
+ * cmd_log calls a second set of init after the repo_init_revisions call. We'll
+ * mirror those settings in final_rev_info_setup().
+ */
+static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
+		struct rev_info *rev)
+{
+	/*
+	 * Optional:
+	 * setup_revision_opt is used to pass options to the setup_revisions()
+	 * call. It's got some special items for submodules and other types of
+	 * optimizations, but for now, we'll just point it to HEAD and call it
+	 * good. First we should make sure to reset it. This is useful for more
+	 * complicated stuff but a decent shortcut for the first pass is
+	 * add_head_to_pending().
+	 */
+
+	/*
+	 * struct setup_revision_opt opt;
+
+	 * memset(&opt, 0, sizeof(opt));
+	 * opt.def = "HEAD";
+	 * opt.revarg_opt = REVARG_COMMITTISH;
+	 * setup_revisions(argc, argv, rev, &opt);
+	 */
+
+	/* Let's force oneline format. */
+	get_commit_format("oneline", rev);
+	rev->verbose_header = 1;
+
+	/* add the HEAD to pending so we can start */
+	add_head_to_pending(rev);
+}
+
 /*
  * This method will be called back by git_config(). It is used to gather values
  * from the configuration files available to Git.
@@ -61,6 +96,8 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
 		OPT_END()
 	};
 
+	struct rev_info rev;
+
 	/*
 	 * parse_options() handles showing usage if incorrect options are
 	 * provided, or if '-h' is passed.
@@ -71,6 +108,19 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
 
 	git_config(git_walken_config, NULL);
 
+	/*
+	 * Time to set up the walk. repo_init_revisions sets up rev_info with
+	 * the defaults, but then you need to make some configuration settings
+	 * to make it do what's special about your walk.
+	 */
+	repo_init_revisions(the_repository, &rev, prefix);
+
+	/*
+	 * Before we do the walk, we need to set a starting point by giving it
+	 * something to go in `pending` - that happens in here
+	 */
+	final_rev_info_setup(argc, argv, prefix, &rev);
+
 	/*
 	 * This line is "human-readable" and we are writing a plumbing command,
 	 * so we localize it and use the trace library to print only when
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH v2 06/13] walken: perform our basic revision walk
  2019-06-26 23:50   ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
                       ` (4 preceding siblings ...)
  2019-06-26 23:50     ` [RFC PATCH v2 05/13] walken: configure rev_info and prepare for walk Emily Shaffer
@ 2019-06-26 23:50     ` Emily Shaffer
  2019-06-27  5:16       ` Eric Sunshine
  2019-06-26 23:50     ` [RFC PATCH v2 07/13] walken: filter for authors from gmail address Emily Shaffer
                       ` (7 subsequent siblings)
  13 siblings, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-06-26 23:50 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Add the final steps needed and implement the walk loop itself. We add a
method walken_commit_walk() which calls prepare_revision_walk() to finish
setup and then iterates over commits from get_revision().

This basic walk only prints the subject line of each commit in the
history. It is nearly equivalent to `git log --oneline`.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: If6dc5f3c9d14df077b99e42806cf790c96191582
---
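
Reader's aside, not part of the commit message: the loop added here is a
pull-style iteration, where the caller keeps asking for the next commit
until NULL comes back. The standalone sketch below mimics only that loop
shape over a linear history. struct demo_commit, struct demo_rev_info,
demo_get_revision() and demo_commit_walk() are hypothetical names, and
the real get_revision() does far more (sorting, filtering, parent
rewriting).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a commit in a linear history. */
struct demo_commit {
	const char *subject;
	struct demo_commit *parent;
};

/* Hypothetical stand-in for struct rev_info: remembers what comes next. */
struct demo_rev_info {
	struct demo_commit *next;
};

/* Plays the role of get_revision(): hand out one commit, or NULL when done. */
struct demo_commit *demo_get_revision(struct demo_rev_info *rev)
{
	struct demo_commit *c = rev->next;

	if (c)
		rev->next = c->parent;
	return c;
}

/*
 * Same loop shape as walken_commit_walk(): keep asking for the next commit
 * until NULL comes back. Subjects are appended to `out`, one per line, and
 * silently truncated if `out` is too small. Returns the number of commits.
 */
int demo_commit_walk(struct demo_rev_info *rev, char *out, size_t outsz)
{
	struct demo_commit *commit;
	int count = 0;

	assert(outsz > 0);
	out[0] = '\0';
	while ((commit = demo_get_revision(rev))) {
		strncat(out, commit->subject, outsz - strlen(out) - 1);
		strncat(out, "\n", outsz - strlen(out) - 1);
		count++;
	}
	return count;
}
```

Because the loop condition already stops on NULL, the body never sees a
NULL commit; once the walk is exhausted, further calls to
demo_get_revision() keep returning NULL.
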
 builtin/walken.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index c463eca843..335dcb6b21 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -6,8 +6,11 @@
 
 #include "builtin.h"
 #include "revision.h"
+#include "commit.h"
 #include "config.h"
 #include "parse-options.h"
+#include "pretty.h"
+#include "line-log.h"
 
 /*
  * All builtins are expected to provide a usage to provide a consistent user
@@ -90,6 +93,41 @@ static int git_walken_config(const char *var, const char *value, void *cb)
 	return git_default_config(var, value, cb);
 }
 
+/*
+ * walken_commit_walk() is invoked by cmd_walken() after initialization. It
+ * does the commit walk only.
+ */
+static void walken_commit_walk(struct rev_info *rev)
+{
+	struct commit *commit;
+	struct strbuf prettybuf = STRBUF_INIT;
+
+	/*
+	 * prepare_revision_walk() gets the final steps ready for a revision
+	 * walk. We check the return value for errors.
+	 */
+	if (prepare_revision_walk(rev)) {
+		die(_("revision walk setup failed"));
+	}
+
+	/*
+	 * Now we can start the real commit walk. get_revision grabs the next
+	 * revision based on the contents of rev.
+	 */
+	rev->diffopt.close_file = 0;
+	while ((commit = get_revision(rev))) {
+		if (!commit)
+			continue;
+		strbuf_reset(&prettybuf);
+		pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
+		/*
+		 * We expect this part of the output to be machine-parseable -
+		 * one commit message per line - so we must not localize it.
+		 */
+		puts(prettybuf.buf);
+	}
+}
+
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
 	struct option options[] = {
@@ -115,12 +153,17 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
 	 */
 	repo_init_revisions(the_repository, &rev, prefix);
 
+	/* We can set our traversal flags here. */
+	rev.always_show_header = 1;
+
 	/*
 	 * Before we do the walk, we need to set a starting point by giving it
 	 * something to go in `pending` - that happens in here
 	 */
 	final_rev_info_setup(argc, argv, prefix, &rev);
 
+	walken_commit_walk(&rev);
+
 	/*
 	 * This line is "human-readable" and we are writing a plumbing command,
 	 * so we localize it and use the trace library to print only when
-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v2 07/13] walken: filter for authors from gmail address
  2019-06-26 23:50   ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
                       ` (5 preceding siblings ...)
  2019-06-26 23:50     ` [RFC PATCH v2 06/13] walken: perform our basic revision walk Emily Shaffer
@ 2019-06-26 23:50     ` Emily Shaffer
  2019-06-27  5:20       ` Eric Sunshine
  2019-06-26 23:50     ` [RFC PATCH v2 08/13] walken: demonstrate various topographical sorts Emily Shaffer
                       ` (6 subsequent siblings)
  13 siblings, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-06-26 23:50 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

In order to demonstrate how to create grep filters for revision walks,
filter the walk performed by cmd_walken() to print only commits which
are authored by someone with a gmail address.

This commit demonstrates how to append a grep pattern to a
rev_info.grep_filter, to teach new contributors how to create their own
more generalized grep filters during revision walks.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 builtin/walken.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index 335dcb6b21..da2d197914 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -11,6 +11,7 @@
 #include "parse-options.h"
 #include "pretty.h"
 #include "line-log.h"
+#include "grep.h"
 
 /*
  * All builtins are expected to provide a usage to provide a consistent user
@@ -28,10 +29,8 @@ const char * const walken_usage[] = {
  */
 static void init_walken_defaults(void)
 {
-	/*
-	 * We don't use any other components or have settings to initialize, so
-	 * leave this empty.
-	 */
+	/* Needed by our grep filter. */
+	init_grep_defaults(the_repository);
 }
 
 /*
@@ -60,6 +59,10 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix
 	 * setup_revisions(argc, argv, rev, &opt);
 	 */
 
+	/* Add a grep pattern to the author line in the header. */
+	append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR, "gmail");
+	compile_grep_patterns(&rev->grep_filter);
+
 	/* Let's force oneline format. */
 	get_commit_format("oneline", rev);
 	rev->verbose_header = 1;
@@ -86,10 +89,7 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix
  */
 static int git_walken_config(const char *var, const char *value, void *cb)
 {
-	/*
-	 * For now, we don't have any custom configuration, so fall back on the
-	 * default config.
-	 */
+	grep_config(var, value, cb);
 	return git_default_config(var, value, cb);
 }
 
-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v2 08/13] walken: demonstrate various topographical sorts
  2019-06-26 23:50   ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
                       ` (6 preceding siblings ...)
  2019-06-26 23:50     ` [RFC PATCH v2 07/13] walken: filter for authors from gmail address Emily Shaffer
@ 2019-06-26 23:50     ` Emily Shaffer
  2019-06-27  5:22       ` Eric Sunshine
  2019-06-26 23:50     ` [RFC PATCH v2 09/13] walken: demonstrate reversing a revision walk list Emily Shaffer
                       ` (5 subsequent siblings)
  13 siblings, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-06-26 23:50 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Order the revision walk by author or commit dates, to demonstrate how to
apply topo_sort to a revision walk.

While following the tutorial, new contributors are guided to run a walk
with each sort and compare the results.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: I7ce2f3e8a77c42001293637ae209087afec4ce2c
---
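A note for readers following along: with topo_order set and sort_order
set to REV_SORT_BY_COMMIT_DATE, the intent is that no commit is shown
before all of its children have been shown, with dates deciding among
the commits that are otherwise ready. The standalone sketch below
illustrates that idea on a tiny in-memory graph. It is not code from
revision.c; struct tcommit and topo_sort_by_date() are hypothetical
names made up for this illustration, and the quadratic search is chosen
for brevity only.

```c
#include <stdlib.h>

#define MAX_PARENTS 2

/* Hypothetical miniature commit; not Git's struct commit. */
struct tcommit {
	long date;                /* timestamp used to pick among ready commits */
	int nparents;
	int parents[MAX_PARENTS]; /* indices into the same array */
};

/*
 * Write commit indices to out[] so that every commit appears before all of
 * its parents; among commits whose children have all been emitted, the one
 * with the newest date goes first. Returns the number of indices written
 * (n for an acyclic graph), or -1 if memory could not be allocated.
 */
int topo_sort_by_date(const struct tcommit *c, int n, int *out)
{
	int *pending;   /* per commit: children not yet emitted */
	char *emitted;  /* per commit: 1 once written to out[] */
	int count, i, j;

	if (n <= 0)
		return 0;
	pending = calloc(n, sizeof(*pending));
	emitted = calloc(n, sizeof(*emitted));
	if (!pending || !emitted) {
		free(pending);
		free(emitted);
		return -1;
	}

	/* Count how many children each commit has. */
	for (i = 0; i < n; i++)
		for (j = 0; j < c[i].nparents; j++)
			pending[c[i].parents[j]]++;

	for (count = 0; count < n; count++) {
		int best = -1;

		/* Pick the newest commit that has no unemitted children. */
		for (i = 0; i < n; i++) {
			if (emitted[i] || pending[i])
				continue;
			if (best < 0 || c[i].date > c[best].date)
				best = i;
		}
		if (best < 0)
			break; /* only reachable if the graph has a cycle */
		emitted[best] = 1;
		out[count] = best;
		for (j = 0; j < c[best].nparents; j++)
			pending[c[best].parents[j]]--;
	}

	free(pending);
	free(emitted);
	return count;
}
```

For a four-commit diamond (a root, two children of the root, and a merge
of those two), the merge comes out first and the root last even if the
root carries the newest timestamp, because the root cannot be shown until
both of its children have been.
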
 builtin/walken.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index da2d197914..6cc451324a 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -69,6 +69,13 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix
 
 	/* add the HEAD to pending so we can start */
 	add_head_to_pending(rev);
+	
+	/* Let's play with the sort order. */
+	rev->topo_order = 1;
+
+	/* Toggle between these and observe the difference. */
+	rev->sort_order = REV_SORT_BY_COMMIT_DATE;
+	/* rev->sort_order = REV_SORT_BY_AUTHOR_DATE; */
 }
 
 /*
-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v2 09/13] walken: demonstrate reversing a revision walk list
  2019-06-26 23:50   ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
                       ` (7 preceding siblings ...)
  2019-06-26 23:50     ` [RFC PATCH v2 08/13] walken: demonstrate various topographical sorts Emily Shaffer
@ 2019-06-26 23:50     ` Emily Shaffer
  2019-06-27  5:26       ` Eric Sunshine
  2019-06-26 23:50     ` [RFC PATCH v2 10/13] walken: add unfiltered object walk from HEAD Emily Shaffer
                       ` (4 subsequent siblings)
  13 siblings, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-06-26 23:50 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

The final installment in the tutorial about sorting revision walk
outputs. This commit reverses the commit list, so that we see newer
commits last (handy since we aren't using a pager).

It's important to note that rev->reverse needs to be set after
add_head_to_pending() or before setup_revisions(). (This is mentioned in
the accompanying tutorial.)

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
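For readers who want a picture of what "reversed" means for the output:
the entry shown first in a reversed walk is the one an unreversed walk
would reach last, so every entry has to be available before the first
one can be printed. The sketch below reverses a singly linked list in
place. struct item and reverse_list() are hypothetical names, and this
is an illustration only, not a claim about how revision.c implements
rev->reverse.

```c
#include <stddef.h>

/* Hypothetical list node standing in for one entry of a walk. */
struct item {
	int id;
	struct item *next;
};

/* Reverse a singly linked list in place and return the new head. */
struct item *reverse_list(struct item *head)
{
	struct item *prev = NULL;

	while (head) {
		struct item *next = head->next;

		head->next = prev; /* point this node back at its predecessor */
		prev = head;
		head = next;
	}
	return prev;
}
```
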
 builtin/walken.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index 6cc451324a..958923c172 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -69,6 +69,9 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix
 
 	/* add the HEAD to pending so we can start */
 	add_head_to_pending(rev);
+
+	/* Reverse the order */
+	rev->reverse = 1;
 	
 	/* Let's play with the sort order. */
 	rev->topo_order = 1;
-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v2 10/13] walken: add unfiltered object walk from HEAD
  2019-06-26 23:50   ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
                       ` (8 preceding siblings ...)
  2019-06-26 23:50     ` [RFC PATCH v2 09/13] walken: demonstrate reversing a revision walk list Emily Shaffer
@ 2019-06-26 23:50     ` Emily Shaffer
  2019-06-27  5:37       ` Eric Sunshine
  2019-06-26 23:50     ` [RFC PATCH v2 11/13] walken: add filtered object walk Emily Shaffer
                       ` (3 subsequent siblings)
  13 siblings, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-06-26 23:50 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Provide a demonstration of a revision walk which traverses all types of
object, not just commits. This type of revision walk is used for
operations such as creating packfiles and performing fetches or clones,
so it's useful to teach new developers how it works. For starters, only
demonstrate the unfiltered version, as this will make the tutorial
easier to follow.

This commit is part of a tutorial on revision walking.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: If3b11652ba011b28d29b1c3984dac4a3f80a5f53
---
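A note on the callback shape used below: traverse_commit_list() reports
what it finds through two callbacks, and each callback also receives an
opaque pointer supplied by the caller. This patch passes NULL for that
pointer and counts in file-scope statics. The standalone sketch below
shows the alternative of carrying the counters through the pointer. All
names in it (enum kind, struct counts, count_kind(), traverse()) are
hypothetical stand-ins, not Git's API.

```c
#include <stddef.h>

/* Hypothetical object kinds; Git's real enum object_type has more members. */
enum kind { KIND_COMMIT, KIND_TREE, KIND_BLOB, KIND_TAG };

struct counts {
	int commits;
	int trees;
	int blobs;
	int tags;
	int unknown;
};

typedef void (*show_fn)(enum kind k, void *data);

/* Callback: bump the counter that matches the object's kind. */
void count_kind(enum kind k, void *data)
{
	struct counts *c = data;

	switch (k) {
	case KIND_COMMIT:
		c->commits++;
		break;
	case KIND_TREE:
		c->trees++;
		break;
	case KIND_BLOB:
		c->blobs++;
		break;
	case KIND_TAG:
		c->tags++;
		break;
	default:
		c->unknown++;
		break;
	}
}

/* Hypothetical traversal: hands every object to the callback in turn. */
void traverse(const enum kind *objs, size_t n, show_fn show, void *data)
{
	size_t i;

	for (i = 0; i < n; i++)
		show(objs[i], data);
}
```

Keeping the counters in a caller-owned struct means two walks in the same
process cannot disturb each other's totals; the statics used in the patch
are simpler to read in a tutorial.
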
 builtin/walken.c | 88 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 79 insertions(+), 9 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index 958923c172..42b23ba1ec 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -11,6 +11,7 @@
 #include "parse-options.h"
 #include "pretty.h"
 #include "line-log.h"
+#include "list-objects.h"
 #include "grep.h"
 
 /*
@@ -22,6 +23,11 @@ const char * const walken_usage[] = {
 	NULL,
 };
 
+static int commit_count;
+static int tag_count;
+static int blob_count;
+static int tree_count;
+
 /*
  * Within init_walken_defaults() we can call into other useful defaults to set
  * in the global scope or on the_repository. It's okay to borrow from other
@@ -103,6 +109,65 @@ static int git_walken_config(const char *var, const char *value, void *cb)
 	return git_default_config(var, value, cb);
 }
 
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+	commit_count++;
+}
+
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+	switch (obj->type) {
+	case OBJ_TREE:
+		tree_count++;
+		break;
+	case OBJ_BLOB:
+		blob_count++;
+		break;
+	case OBJ_TAG:
+		tag_count++;
+		break;
+	case OBJ_COMMIT:
+		die(_("unexpectedly encountered a commit in "
+		      "walken_show_object"));
+		commit_count++;
+		break;
+	default:
+		die(_("unexpected object type %s"), type_name(obj->type));
+		break;
+	}
+}
+
+/*
+ * walken_object_walk() is invoked by cmd_walken() after initialization. It does
+ * a walk of all object types.
+ */
+static void walken_object_walk(struct rev_info *rev)
+{
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+	rev->exclude_promisor_objects = 1;
+
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+
+	commit_count = 0;
+	tag_count = 0;
+	blob_count = 0;
+	tree_count = 0;
+
+	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+
+	/*
+	 * This print statement is designed to be script-parseable. Script
+	 * authors will rely on the output not to change, so we will not
+	 * localize this string. It will go to stdout directly.
+	 */
+	printf("commits %d\n blobs %d\n tags %d\n trees %d\n", commit_count,
+	       blob_count, tag_count, tree_count);
+}
+
 /*
  * walken_commit_walk() is invoked by cmd_walken() after initialization. It
  * does the commit walk only.
@@ -113,15 +178,15 @@ static void walken_commit_walk(struct rev_info *rev)
 	struct strbuf prettybuf = STRBUF_INIT;
 
 	/*
-         * prepare_revision_walk() gets the final steps ready for a revision
+	 * prepare_revision_walk() gets the final steps ready for a revision
 	 * walk. We check the return value for errors.
-         */
+	 */
 	if (prepare_revision_walk(rev)) {
 		die(_("revision walk setup failed"));
 	}
 
 	/*
-         * Now we can start the real commit walk. get_revision grabs the next
+	 * Now we can start the real commit walk. get_revision grabs the next
 	 * revision based on the contents of rev.
 	 */
 	rev->diffopt.close_file = 0;
@@ -166,13 +231,18 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
 	/* We can set our traversal flags here. */
 	rev.always_show_header = 1;
 
-	/*
-	 * Before we do the walk, we need to set a starting point by giving it
-	 * something to go in `pending` - that happens in here
-	 */
-	final_rev_info_setup(argc, argv, prefix, &rev);
 
-	walken_commit_walk(&rev);
+	if (1) {
+		add_head_to_pending(&rev);
+		walken_object_walk(&rev);
+	} else {
+		/*
+		 * Before we do the walk, we need to set a starting point by giving it
+		 * something to go in `pending` - that happens in here
+		 */
+		final_rev_info_setup(argc, argv, prefix, &rev);
+		walken_commit_walk(&rev);
+	}
 
 	/*
 	 * This line is "human-readable" and we are writing a plumbing command,
-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v2 11/13] walken: add filtered object walk
  2019-06-26 23:50   ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
                       ` (9 preceding siblings ...)
  2019-06-26 23:50     ` [RFC PATCH v2 10/13] walken: add unfiltered object walk from HEAD Emily Shaffer
@ 2019-06-26 23:50     ` Emily Shaffer
  2019-06-27  5:42       ` Eric Sunshine
  2019-06-26 23:50     ` [RFC PATCH v2 12/13] walken: count omitted objects Emily Shaffer
                       ` (2 subsequent siblings)
  13 siblings, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-06-26 23:50 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Demonstrate how filter specs can be used when performing a revision walk
of all object types. In this case, tree depth is used. Contributors who
are following the revision walking tutorial will be encouraged to run
the revision walk with and without the filter in order to compare the
number of objects seen in each case.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: I6d22ba153c1afbc780c261c47f1fa03ea478b5ed
---
 builtin/walken.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index 42b23ba1ec..a744d042d8 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -12,6 +12,7 @@
 #include "pretty.h"
 #include "line-log.h"
 #include "list-objects.h"
+#include "list-objects-filter-options.h"
 #include "grep.h"
 
 /*
@@ -143,6 +144,10 @@ static void walken_show_object(struct object *obj, const char *str, void *buf)
  */
 static void walken_object_walk(struct rev_info *rev)
 {
+	struct list_objects_filter_options filter_options = {};
+
+	printf("walken_object_walk beginning...\n");
+
 	rev->tree_objects = 1;
 	rev->blob_objects = 1;
 	rev->tag_objects = 1;
@@ -157,7 +162,24 @@ static void walken_object_walk(struct rev_info *rev)
 	blob_count = 0;
 	tree_count = 0;
 
-	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+	if (1) {
+		/* Unfiltered: */
+		trace_printf(_("Unfiltered object walk.\n"));
+		traverse_commit_list(rev, walken_show_commit,
+				walken_show_object, NULL);
+	} else {
+		trace_printf(_("Filtered object walk with filterspec "
+				"'tree:1'.\n"));
+		/*
+		 * We can parse a tree depth of 1 to demonstrate the kind of
+		 * filtering that could occur during various operations (see
+		 * `git help rev-list` and read the entry on `--filter`).
+		 */
+		parse_list_objects_filter(&filter_options, "tree:1");
+
+		traverse_commit_list_filtered(&filter_options, rev,
+			walken_show_commit, walken_show_object, NULL, NULL);
+	}
 
 	/*
 	 * This print statement is designed to be script-parseable. Script
-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v2 12/13] walken: count omitted objects
  2019-06-26 23:50   ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
                       ` (10 preceding siblings ...)
  2019-06-26 23:50     ` [RFC PATCH v2 11/13] walken: add filtered object walk Emily Shaffer
@ 2019-06-26 23:50     ` Emily Shaffer
  2019-06-27  5:44       ` Eric Sunshine
  2019-06-26 23:50     ` [RFC PATCH v2 13/13] walken: reverse the object walk order Emily Shaffer
  2019-06-27 22:56     ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
  13 siblings, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-06-26 23:50 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

It may be illuminating to see how many objects were omitted by a given
filter. Since filter-spec "tree:1" is used, this also demonstrates that
the 'omitted' list contains all objects which are omitted, not just the
first objects which were omitted - that is, the walk continues to
dereference omitted trees and commits.

This is part of a tutorial on performing revision walks.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
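As a rough picture of what a depth-limited filter leaves out, the sketch
below walks a small in-memory tree. It assumes that the root tree sits at
depth 0 and that everything at a depth greater than or equal to the limit
is omitted, which is my reading of the `tree:<depth>` entry under
`--filter` in `git help rev-list`. struct entry, struct tally and
walk_tree() are hypothetical names; none of this is Git's implementation.

```c
#include <stddef.h>

/* Hypothetical tree entry: a blob has no children, a tree may have some. */
struct entry {
	size_t nchildren;
	const struct entry *const *children;
};

struct tally {
	int shown;
	int omitted;
};

/*
 * Visit e and everything below it. Entries at depth >= limit are counted as
 * omitted, but the walk still descends into them so that nested entries are
 * counted as well.
 */
void walk_tree(const struct entry *e, int depth, int limit, struct tally *t)
{
	size_t i;

	if (depth >= limit)
		t->omitted++;
	else
		t->shown++;
	for (i = 0; i < e->nchildren; i++)
		walk_tree(e->children[i], depth + 1, limit, t);
}
```

With a root tree holding one blob and one subtree that itself holds one
blob, a limit of 1 shows only the root and counts the other three entries
as omitted, including the blob nested inside the omitted subtree.
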
 builtin/walken.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index a744d042d8..dc59ff5009 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -45,7 +45,7 @@ static void init_walken_defaults(void)
  * mirror those settings in post_repo_init_init.
  */
 static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
-		struct rev_info *rev)
+				 struct rev_info *rev)
 {
 	/*
 	 * Optional:
@@ -145,6 +145,11 @@ static void walken_show_object(struct object *obj, const char *str, void *buf)
 static void walken_object_walk(struct rev_info *rev)
 {
 	struct list_objects_filter_options filter_options = {};
+	struct oidset omitted;
+	struct oidset_iter oit;
+	struct object_id *oid = NULL;
+	int omitted_count = 0;
+	oidset_init(&omitted, 0);
 
 	printf("walken_object_walk beginning...\n");
 
@@ -181,13 +186,19 @@ static void walken_object_walk(struct rev_info *rev)
 			walken_show_commit, walken_show_object, NULL, NULL);
 	}
 
+	/* Count the omitted objects. */
+	oidset_iter_init(&omitted, &oit);
+
+	while ((oid = oidset_iter_next(&oit)))
+		omitted_count++;
+
 	/*
 	 * This print statement is designed to be script-parseable. Script
 	 * authors will rely on the output not to change, so we will not
 	 * localize this string. It will go to stdout directly.
 	 */
-	printf("commits %d\n blobs %d\n tags %d\n trees %d\n", commit_count,
-	       blob_count, tag_count, tree_count);
+	printf("commits %d\n blobs %d\n tags %d\n trees %d\n omitted %d\n",
+	       commit_count, blob_count, tag_count, tree_count, omitted_count);
 }
 
 /*
-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v2 13/13] walken: reverse the object walk order
  2019-06-26 23:50   ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
                       ` (11 preceding siblings ...)
  2019-06-26 23:50     ` [RFC PATCH v2 12/13] walken: count omitted objects Emily Shaffer
@ 2019-06-26 23:50     ` Emily Shaffer
  2019-06-27 22:56     ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
  13 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-26 23:50 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Demonstrate that just like commit walks, object walks can have their
order reversed. Additionally, add verbose logging of objects encountered
in order to let contributors prove to themselves that the walk has
actually been reversed. With this commit, `git walken` becomes extremely
chatty - it's recommended to pipe the output through `head` or `tail` or
to redirect it into a file.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: I91883b209a61ae4d87855878291e487fe36220c4
---
 builtin/walken.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index dc59ff5009..37b02887a5 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -112,11 +112,13 @@ static int git_walken_config(const char *var, const char *value, void *cb)
 
 static void walken_show_commit(struct commit *cmt, void *buf)
 {
+	printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
 	commit_count++;
 }
 
 static void walken_show_object(struct object *obj, const char *str, void *buf)
 {
+	printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));
 	switch (obj->type) {
 	case OBJ_TREE:
 		tree_count++;
@@ -158,6 +160,7 @@ static void walken_object_walk(struct rev_info *rev)
 	rev->tag_objects = 1;
 	rev->tree_blobs_in_commit_order = 1;
 	rev->exclude_promisor_objects = 1;
+	rev->reverse = 1;
 
 	if (prepare_revision_walk(rev))
 		die(_("revision walk setup failed"));
-- 
2.22.0.410.gd8fdbe21b5-goog



* Re: [RFC PATCH v2 02/13] walken: add usage to enable -h
  2019-06-26 23:50     ` [RFC PATCH v2 02/13] walken: add usage to enable -h Emily Shaffer
@ 2019-06-27  4:47       ` Eric Sunshine
  2019-06-27 18:40         ` Emily Shaffer
  2019-06-27  4:50       ` Eric Sunshine
  1 sibling, 1 reply; 102+ messages in thread
From: Eric Sunshine @ 2019-06-27  4:47 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Git List

On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> It's expected that Git commands support '-h' in order to provide a
> consistent user experience (and this expectation is enforced by the
> test suite). '-h' is captured by parse_options() by default; in order to
> support this flag, we add a short usage text to walken.c and invoke
> parse_options().
> [...]
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> diff --git a/builtin/walken.c b/builtin/walken.c
> @@ -5,9 +5,34 @@
> +const char * const walken_usage[] = {
> +       N_("git walken"),
> +       NULL,
> +};

Unless you expect to reference this from multiple functions, there is
no need for it to reside here; instead, it can live inside
cmd_walken(). (And, if you do leave it in the global scope for some
reason, it should be 'static'.)

>  int cmd_walken(int argc, const char **argv, const char *prefix)
>  {
> +       struct option options[] = {
> +               OPT_END()
> +       };


* Re: [RFC PATCH v2 02/13] walken: add usage to enable -h
  2019-06-26 23:50     ` [RFC PATCH v2 02/13] walken: add usage to enable -h Emily Shaffer
  2019-06-27  4:47       ` Eric Sunshine
@ 2019-06-27  4:50       ` Eric Sunshine
  1 sibling, 0 replies; 102+ messages in thread
From: Eric Sunshine @ 2019-06-27  4:50 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Git List

On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> It's expected that Git commands support '-h' in order to provide a
> consistent user experience (and this expectation is enforced by the
> test suite). '-h' is captured by parse_options() by default; in order to
> support this flag, we add a short usage text to walken.c and invoke
> parse_options().
> [...]
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> diff --git a/builtin/walken.c b/builtin/walken.c
> @@ -5,9 +5,34 @@
>  int cmd_walken(int argc, const char **argv, const char *prefix)
>  {
> +       [...]
> +       /*
> +        * This line is "human-readable" and we are writing a plumbing command,
> +        * so we localize it and use the trace library to print only when
> +        * the GIT_TRACE environment variable is set.
> +        */
>         trace_printf(_("cmd_walken incoming...\n"));

Also, this in-code comment should have been introduced in patch 1/13,
not here in 2/13.


* Re: [RFC PATCH v2 04/13] walken: add handler to git_config
  2019-06-26 23:50     ` [RFC PATCH v2 04/13] walken: add handler to git_config Emily Shaffer
@ 2019-06-27  4:54       ` Eric Sunshine
  2019-06-27 18:47         ` Emily Shaffer
  0 siblings, 1 reply; 102+ messages in thread
From: Eric Sunshine @ 2019-06-27  4:54 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Git List

On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> For now, we have no configuration options we want to set up for
> ourselves, but in the future we may need to. At the very least, we
> should invoke git_default_config() for each config option; we will do so
> inside of a skeleton config callback so that we know where to add
> configuration handling later on when we need it.
>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> diff --git a/builtin/walken.c b/builtin/walken.c
> @@ -24,11 +25,36 @@ const char * const walken_usage[] = {
>  static void init_walken_defaults(void)
>  {
>         /*
> -        * We don't actually need the same components `git log` does; leave this
> -        * empty for now.
> +        * We don't use any other components or have settings to initialize, so
> +        * leave this empty.
>          */
>  }

Meh, I don't think this change has anything to do with this patch. If
the rewritten text is the one you prefer, then just introduce it like
that in patch 3/13 where the function itself was introduced.


* Re: [RFC PATCH v2 05/13] walken: configure rev_info and prepare for walk
  2019-06-26 23:50     ` [RFC PATCH v2 05/13] walken: configure rev_info and prepare for walk Emily Shaffer
@ 2019-06-27  5:06       ` Eric Sunshine
  2019-06-27 18:56         ` Emily Shaffer
  0 siblings, 1 reply; 102+ messages in thread
From: Eric Sunshine @ 2019-06-27  5:06 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Git List

On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> `struct rev_info` is what's used by the struct itself.

What "struct itself"? Do you mean 'struct rev_info' is used by the
_walk_ itself? Or something?

> `repo_init_revisions()` initializes the struct; then we need to set it
> up for the walk we want to perform, which is done in
> `final_rev_info_setup()`.
> [...]
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> diff --git a/builtin/walken.c b/builtin/walken.c
> @@ -30,6 +31,40 @@ static void init_walken_defaults(void)
> +/*
> + * cmd_log calls a second set of init after the repo_init_revisions call. We'll
> + * mirror those settings in post_repo_init_init.
> + */

What is 'post_repo_init_init'?

I found the reference to cmd_log() confusing because I was looking for
it in this patch (as if it was being introduced here). Newcomers might
be even more confused. Perhaps if you state explicitly that you're
referring to existing code in an existing file, it might be clearer.
Maybe:

    builtin/log.c:cmd_log() calls a second ...

Overall, I find this entire function comment mystifying.

> +static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
> +               struct rev_info *rev)
> +{
> +       /*
> +        * Optional:
> +        * setup_revision_opt is used to pass options to the setup_revisions()
> +        * call. It's got some special items for submodules and other types of
> +        * optimizations, but for now, we'll just point it to HEAD and call it
> +        * good. First we should make sure to reset it. This is useful for more
> +        * complicated stuff but a decent shortcut for the first pass is
> +        * add_head_to_pending().
> +        */

I had to pause over "call it good" for several seconds (since I
couldn't understand why someone would want to write "bad" code) until
I figured out you meant "do nothing else". It would be clearer simply
to drop that, ending the sentence at "HEAD":

    ..., but for now, we'll just point it at HEAD.

> +       /*
> +        * struct setup_revision_opt opt;
> +
> +        * memset(&opt, 0, sizeof(opt));
> +        * opt.def = "HEAD";
> +        * opt.revarg_opt = REVARG_COMMITTISH;
> +        * setup_revisions(argc, argv, rev, &opt);
> +        */
> +
> +       /* Let's force oneline format. */
> +       get_commit_format("oneline", rev);
> +       rev->verbose_header = 1;
> +
> +       /* add the HEAD to pending so we can start */
> +       add_head_to_pending(rev);
> +}

It would be easier for the reader to associate the
add_head_to_pending() invocation with the commented-out setting of
"HEAD" via 'setup_revision_opt' if the two bits abutted one another
without being separated by the "oneline" gunk.


* Re: [RFC PATCH v2 06/13] walken: perform our basic revision walk
  2019-06-26 23:50     ` [RFC PATCH v2 06/13] walken: perform our basic revision walk Emily Shaffer
@ 2019-06-27  5:16       ` Eric Sunshine
  2019-06-27 20:54         ` Emily Shaffer
  0 siblings, 1 reply; 102+ messages in thread
From: Eric Sunshine @ 2019-06-27  5:16 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Git List

On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> Add the final steps needed and implement the walk loop itself. We add a
> method walken_commit_walk() which performs the final setup to revision.c
> and then iterates over commits from get_revision().
> [...]
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> diff --git a/builtin/walken.c b/builtin/walken.c
> +/*
> + * walken_commit_walk() is invoked by cmd_walken() after initialization. It
> + * does the commit walk only.
> + */

"only" as opposed to what? Maybe just say:

    ... after initialization. It performs the actual commit walk.

> +static void walken_commit_walk(struct rev_info *rev)
> +{
> +       struct commit *commit;
> +       struct strbuf prettybuf = STRBUF_INIT;
> +
> +       /*
> +         * prepare_revision_walk() gets the final steps ready for a revision
> +        * walk. We check the return value for errors.
> +         */

You have some funky mix of spaces and tabs indenting the comment
lines. Same for the next comment block.

> +       if (prepare_revision_walk(rev)) {
> +               die(_("revision walk setup failed"));
> +       }
> +
> +       /*
> +         * Now we can start the real commit walk. get_revision grabs the next
> +        * revision based on the contents of rev.
> +        */

s/get_revision/&()/

> +       rev->diffopt.close_file = 0;

Why this? And, why isn't it set up where other 'rev' options are initialized?

> +       while ((commit = get_revision(rev))) {
> +               if (!commit)
> +                       continue;

If get_revision() returns NULL, then the while-loop exits, which means
that the "if (!commit)" condition will never be satisfied, thus is
unnecessary code.
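
To illustrate with a standalone sketch (hypothetical names, not Git's
API): in a loop of this shape the assignment is the loop condition, so
the body only ever runs with a non-NULL pointer and needs no check of its
own. Here next_node() stands in for get_revision() and struct node for
struct commit.

```c
#include <stddef.h>

/* Hypothetical stand-in for a commit in a walk; not Git's struct commit. */
struct node {
	int id;
	struct node *next;
};

/* Stand-in for get_revision(): returns the next node, or NULL when done. */
struct node *next_node(struct node **cursor)
{
	struct node *cur = *cursor;

	if (cur)
		*cursor = cur->next;
	return cur;
}

/* Counts the nodes visited; the loop body never needs its own NULL check. */
int count_nodes(struct node *head)
{
	struct node *cursor = head;
	struct node *n;
	int count = 0;

	while ((n = next_node(&cursor)))
		count++;
	return count;
}
```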

> +               strbuf_reset(&prettybuf);
> +               pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
> +               /*
> +                * We expect this part of the output to be machine-parseable -
> +                * one commit message per line - so we must not localize it.
> +                */
> +               puts(prettybuf.buf);

Meh, but there isn't any literal text here to localize anyway, so the
comment talking about not localizing it is just confusing.

> +       }

Leaking 'prettybuf'. Add here:

    strbuf_release(&prettybuf);
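
The general pattern, sketched standalone with a made-up miniature buffer
(struct sbuf and its helpers are hypothetical and much simpler than
Git's strbuf): reset the buffer at the top of each iteration so its
allocation is reused, and release it exactly once after the loop.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a growable string buffer; not Git's strbuf. */
struct sbuf {
	char *buf;
	size_t len;
	size_t alloc;
};

/* Empty the buffer but keep its allocation for reuse. */
void sbuf_reset(struct sbuf *sb)
{
	sb->len = 0;
	if (sb->buf)
		sb->buf[0] = '\0';
}

/* Append a NUL-terminated string, growing the allocation as needed. */
int sbuf_addstr(struct sbuf *sb, const char *s)
{
	size_t n = strlen(s);

	if (sb->len + n + 1 > sb->alloc) {
		size_t want = (sb->len + n + 1) * 2;
		char *p = realloc(sb->buf, want);

		if (!p)
			return -1;
		sb->buf = p;
		sb->alloc = want;
	}
	memcpy(sb->buf + sb->len, s, n + 1); /* copies the trailing NUL too */
	sb->len += n;
	return 0;
}

/* Free the allocation; call once, after the last use. */
void sbuf_release(struct sbuf *sb)
{
	free(sb->buf);
	sb->buf = NULL;
	sb->len = 0;
	sb->alloc = 0;
}

/* Reuse one buffer across iterations; returns the total bytes formatted. */
size_t format_all(const char **subjects, size_t n)
{
	struct sbuf sb = { NULL, 0, 0 };
	size_t total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		sbuf_reset(&sb);
		if (sbuf_addstr(&sb, subjects[i]))
			break;
		total += sb.len;
	}
	sbuf_release(&sb); /* without this, the last allocation would leak */
	return total;
}
```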

> +}


* Re: [RFC PATCH v2 07/13] walken: filter for authors from gmail address
  2019-06-26 23:50     ` [RFC PATCH v2 07/13] walken: filter for authors from gmail address Emily Shaffer
@ 2019-06-27  5:20       ` Eric Sunshine
  2019-06-27 20:58         ` Emily Shaffer
  0 siblings, 1 reply; 102+ messages in thread
From: Eric Sunshine @ 2019-06-27  5:20 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Git List

On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> In order to demonstrate how to create grep filters for revision walks,
> filter the walk performed by cmd_walken() to print only commits which
> are authored by someone with a gmail address.
> [...]
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> diff --git a/builtin/walken.c b/builtin/walken.c
> @@ -60,6 +59,10 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix
> +       /* Add a grep pattern to the author line in the header. */

This sounds as if we are adding something to the author line, which is
confusing. Maybe say instead:

    /* Apply a 'grep' pattern to the author header. */

> +       append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR, "gmail");
> +       compile_grep_patterns(&rev->grep_filter);
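
For readers new to this, the effect of such a filter can be pictured with
a standalone sketch. It uses a plain substring test in place of the
compiled pattern that the grep machinery applies, and struct mini_commit,
author_matches() and count_matching() are hypothetical names rather than
anything in Git.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified commit record; not Git's struct commit. */
struct mini_commit {
	const char *author;  /* e.g. "A U Thor <author@example.com>" */
	const char *subject;
};

/* Returns 1 if the author field contains the pattern as a plain substring. */
int author_matches(const struct mini_commit *c, const char *pattern)
{
	return strstr(c->author, pattern) != NULL;
}

/* Counts the commits that would survive the author filter. */
size_t count_matching(const struct mini_commit *commits, size_t n,
		      const char *pattern)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (author_matches(&commits[i], pattern))
			hits++;
	return hits;
}
```

Note that a bare "gmail" matches anywhere in the author field, the name
included, so a stricter pattern would be needed to match only the mail
domain.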


* Re: [RFC PATCH v2 08/13] walken: demonstrate various topographical sorts
  2019-06-26 23:50     ` [RFC PATCH v2 08/13] walken: demonstrate various topographical sorts Emily Shaffer
@ 2019-06-27  5:22       ` Eric Sunshine
  2019-06-27 22:12         ` Emily Shaffer
  0 siblings, 1 reply; 102+ messages in thread
From: Eric Sunshine @ 2019-06-27  5:22 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Git List

On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> Order the revision walk by author or commit dates, to demonstrate how to

s/,//

> apply topo_sort to a revision walk.
>
> While following the tutorial, new contributors are guided to run a walk
> with each sort and compare the results.
>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>

* Re: [RFC PATCH v2 09/13] walken: demonstrate reversing a revision walk list
  2019-06-26 23:50     ` [RFC PATCH v2 09/13] walken: demonstrate reversing a revision walk list Emily Shaffer
@ 2019-06-27  5:26       ` Eric Sunshine
  2019-06-27 22:20         ` Emily Shaffer
  0 siblings, 1 reply; 102+ messages in thread
From: Eric Sunshine @ 2019-06-27  5:26 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Git List

On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> The final installment in the tutorial about sorting revision walk
> outputs. This commit reverses the commit list, so that we see newer
> commits last (handy since we aren't using a pager).
>
> It's important to note that rev->reverse needs to be set after
> add_head_to_pending() or before setup_revisions(). (This is mentioned in
> the accompanying tutorial.)

This leaves the reader wondering "why that requirement?". Is it
because those functions may change the value or otherwise depend upon
the value?

Also, something this important probably deserves an in-code comment
(and need not be mentioned in the commit message if the in-code
comment explains it well.)

> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> diff --git a/builtin/walken.c b/builtin/walken.c
> @@ -69,6 +69,9 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix
>         /* add the HEAD to pending so we can start */
>         add_head_to_pending(rev);
> +
> +       /* Reverse the order */
> +       rev->reverse = 1;

* Re: [RFC PATCH v2 10/13] walken: add unfiltered object walk from HEAD
  2019-06-26 23:50     ` [RFC PATCH v2 10/13] walken: add unfiltered object walk from HEAD Emily Shaffer
@ 2019-06-27  5:37       ` Eric Sunshine
  2019-06-27 22:31         ` Emily Shaffer
  0 siblings, 1 reply; 102+ messages in thread
From: Eric Sunshine @ 2019-06-27  5:37 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Git List

On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> Provide a demonstration of a revision walk which traverses all types of
> object, not just commits. This type of revision walk is used for
> operations such as creating packfiles and performing fetches or clones,
> so it's useful to teach new developers how it works. For starters, only
> demonstrate the unfiltered version, as this will make the tutorial
> easier to follow.
> [...]
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> diff --git a/builtin/walken.c b/builtin/walken.c
> @@ -103,6 +109,65 @@ static int git_walken_config(const char *var, const char *value, void *cb)
> +static void walken_show_commit(struct commit *cmt, void *buf)
> +{
> +       commit_count++;
> +}
> +
> +static void walken_show_object(struct object *obj, const char *str, void *buf)
> +{
> +       switch (obj->type) {
> +       [...]
> +       case OBJ_TAG:
> +               tag_count++;
> +               break;
> +       case OBJ_COMMIT:
> +               die(_("unexpectedly encountered a commit in "
> +                     "walken_show_object\n"));
> +               commit_count++;

The "commit_count++" line is not only dead code, but it also confuses
the reader (or makes the reader think that the author of this code
doesn't understand C programming). You should drop this line.

> +               break;
> +       default:
> +               die(_("unexpected object type %s\n"), type_name(obj->type));
> +               break;

Likewise, this "break" (and the one in the OBJ_COMMIT case) are dead
code which should be dropped to avoid confusing the reader.

Don't localize the die() message via _() here or in the preceding
OBJ_COMMIT case.

The two die() messages are unnecessarily dissimilar. Try to unify them
so that they read in the same way.

> +       }
> +}
> @@ -113,15 +178,15 @@ static void walken_commit_walk(struct rev_info *rev)
>         /*
> -         * prepare_revision_walk() gets the final steps ready for a revision
> +        * prepare_revision_walk() gets the final steps ready for a revision
>          * walk. We check the return value for errors.
> -         */
> +        */
>         /*
> -         * Now we can start the real commit walk. get_revision grabs the next
> +        * Now we can start the real commit walk. get_revision grabs the next
>          * revision based on the contents of rev.
>          */

I think these are just correcting whitespace/indentation errors I
pointed out in an earlier patch (so they are unnecessary noise in this
patch).
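
To illustrate the dead-code remarks about the switch in
walken_show_object() above, here is a small standalone sketch. The
names (`fatal()`, `count_object()`, the `TOY_*` types) are hypothetical
stand-ins; `fatal()` only mimics a function that never returns, in the
way die() does.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum toy_type { TOY_COMMIT = 1, TOY_TREE, TOY_BLOB, TOY_TAG };

struct toy_counts { int tree, blob, tag; };

/* Hypothetical stand-in for a function that never returns. */
_Noreturn static void fatal(const char *msg)
{
	fprintf(stderr, "fatal: %s\n", msg);
	exit(128);
}

static void count_object(enum toy_type type, struct toy_counts *counts)
{
	assert(counts);
	switch (type) {
	case TOY_TREE:
		counts->tree++;
		break;
	case TOY_BLOB:
		counts->blob++;
		break;
	case TOY_TAG:
		counts->tag++;
		break;
	case TOY_COMMIT:
		/* No statement or "break" after this call can ever run. */
		fatal("unexpected commit in count_object()");
	default:
		fatal("unexpected object type in count_object()");
	}
}
```

Both error messages here follow the same "unexpected ... in
count_object()" shape, which is one possible way of unifying them.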

* Re: [RFC PATCH v2 11/13] walken: add filtered object walk
  2019-06-26 23:50     ` [RFC PATCH v2 11/13] walken: add filtered object walk Emily Shaffer
@ 2019-06-27  5:42       ` Eric Sunshine
  2019-06-27 22:33         ` Emily Shaffer
  0 siblings, 1 reply; 102+ messages in thread
From: Eric Sunshine @ 2019-06-27  5:42 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Git List

On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> Demonstrate how filter specs can be used when performing a revision walk
> of all object types. In this case, tree depth is used. Contributors who
> are following the revision walking tutorial will be encouraged to run
> the revision walk with and without the filter in order to compare the
> number of objects seen in each case.
>
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> diff --git a/builtin/walken.c b/builtin/walken.c
> @@ -143,6 +144,10 @@ static void walken_show_object(struct object *obj, const char *str, void *buf)
>  static void walken_object_walk(struct rev_info *rev)
>  {
> +       struct list_objects_filter_options filter_options = {};
> +
> +       printf("walken_object_walk beginning...\n");

Is this debugging code which you accidentally left in? Or is it meant
to use trace_printf()? Or something else? If it is a genuine message,
should it be localizable?

> @@ -157,7 +162,24 @@ static void walken_object_walk(struct rev_info *rev)
>         blob_count = 0;
>         tree_count = 0;
>
> -       traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
> +       if (1) {
> +               /* Unfiltered: */

The subject talks about adding a _filtered_ object walk (which is in
the 'else' arm), so should this be "if (0)" instead?

> +               trace_printf(_("Unfiltered object walk.\n"));
> +               traverse_commit_list(rev, walken_show_commit,
> +                               walken_show_object, NULL);
> +       } else {
> +               trace_printf(_("Filtered object walk with filterspec "
> +                               "'tree:1'.\n"));
> +               /*
> +                * We can parse a tree depth of 1 to demonstrate the kind of
> +                * filtering that could occur during various operations (see
> +                * `git help rev-list` and read the entry on `--filter`).
> +                */
> +               parse_list_objects_filter(&filter_options, "tree:1");
> +
> +               traverse_commit_list_filtered(&filter_options, rev,
> +                       walken_show_commit, walken_show_object, NULL, NULL);
> +       }

* Re: [RFC PATCH v2 12/13] walken: count omitted objects
  2019-06-26 23:50     ` [RFC PATCH v2 12/13] walken: count omitted objects Emily Shaffer
@ 2019-06-27  5:44       ` Eric Sunshine
  0 siblings, 0 replies; 102+ messages in thread
From: Eric Sunshine @ 2019-06-27  5:44 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Git List

On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> It may be illuminating to see which objects were not included within a
> given filter. This also demonstrates, since filter-spec "tree:1" is
> used, that the 'omitted' list contains all objects which are omitted,
> not just the first objects which were omitted - that is, it continues to
> dereference omitted trees and commits.
> [...]
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> ---
> diff --git a/builtin/walken.c b/builtin/walken.c
> @@ -45,7 +45,7 @@ static void init_walken_defaults(void)
>  static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
> -               struct rev_info *rev)
> +                                struct rev_info *rev)

Use the correct indentation in the patch which introduces this code
rather than adjusting it in this patch.
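
On the commit message's point that the 'omitted' list holds every
omitted object rather than only the first one on each path: the
following toy model sketches that behavior. It is only loosely modeled
on a depth filter and is not Git's list-objects-filter code;
`struct toy_node`, `struct toy_result` and `walk()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy tree node; not Git's struct tree. */
struct toy_node {
	const struct toy_node *children;
	size_t nr_children;
};

struct toy_result { size_t shown, omitted; };

/*
 * Count nodes above the depth limit as shown and the rest as omitted.
 * The walk keeps descending below the limit, so every omitted node is
 * counted, not just the first one reached on each path.
 */
static void walk(const struct toy_node *node, size_t depth, size_t limit,
		 struct toy_result *res)
{
	size_t i;

	assert(node && res);
	if (depth < limit)
		res->shown++;
	else
		res->omitted++;
	for (i = 0; i < node->nr_children; i++)
		walk(&node->children[i], depth + 1, limit, res);
}
```

With a limit of 1, only the root of the toy tree is shown and all of
its descendants land in the omitted count.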

* Re: [RFC PATCH v2 02/13] walken: add usage to enable -h
  2019-06-27  4:47       ` Eric Sunshine
@ 2019-06-27 18:40         ` Emily Shaffer
  0 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-27 18:40 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Thu, Jun 27, 2019 at 12:47:25AM -0400, Eric Sunshine wrote:
> On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> > It's expected that Git commands support '-h' in order to provide a
> > consistent user experience (and this expectation is enforced by the
> > test suite). '-h' is captured by parse_options() by default; in order to
> > support this flag, we add a short usage text to walken.c and invoke
> > parse_options().
> > [...]
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> > diff --git a/builtin/walken.c b/builtin/walken.c
> > @@ -5,9 +5,34 @@
> > +const char * const walken_usage[] = {
> > +       N_("git walken"),
> > +       NULL,
> > +};
> 
> Unless you expect to reference this from multiple functions, there is
> no need for it to reside here; instead, it can live inside
> cmd_walken(). (And, if you do leave it in the global scope for some
> reason, it should be 'static'.)

Thanks, done.

> 
> >  int cmd_walken(int argc, const char **argv, const char *prefix)
> >  {
> > +       struct option options[] = {
> > +               OPT_END()
> > +       };

Fixed the comment from your other mail too.

Thanks!

* Re: [RFC PATCH v2 04/13] walken: add handler to git_config
  2019-06-27  4:54       ` Eric Sunshine
@ 2019-06-27 18:47         ` Emily Shaffer
  0 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-27 18:47 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Thu, Jun 27, 2019 at 12:54:15AM -0400, Eric Sunshine wrote:
> On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> > For now, we have no configuration options we want to set up for
> > ourselves, but in the future we may need to. At the very least, we
> > should invoke git_default_config() for each config option; we will do so
> > inside of a skeleton config callback so that we know where to add
> > configuration handling later on when we need it.
> >
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> > diff --git a/builtin/walken.c b/builtin/walken.c
> > @@ -24,11 +25,36 @@ const char * const walken_usage[] = {
> >  static void init_walken_defaults(void)
> >  {
> >         /*
> > -        * We don't actually need the same components `git log` does; leave this
> > -        * empty for now.
> > +        * We don't use any other components or have settings to initialize, so
> > +        * leave this empty.
> >          */
> >  }
> 
> Meh, I don't think this change has anything to do with this patch. If
> the rewritten text is the one you prefer, then just introduce it like
> that in patch 3/13 where the function itself was introduced.

Whoops, yeah. I removed this change; I don't think it's significant.

* Re: [RFC PATCH v2 05/13] walken: configure rev_info and prepare for walk
  2019-06-27  5:06       ` Eric Sunshine
@ 2019-06-27 18:56         ` Emily Shaffer
  0 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-27 18:56 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Thu, Jun 27, 2019 at 01:06:32AM -0400, Eric Sunshine wrote:
> On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> > `struct rev_info` is what's used by the struct itself.
> 
> What "struct itself"? Do you mean 'struct rev_info' is used by the
> _walk_ itself? Or something?

Yep, that's the one. Thanks for the fresh eyes.

> 
> > `repo_init_revisions()` initializes the struct; then we need to set it
> > up for the walk we want to perform, which is done in
> > `final_rev_info_setup()`.
> > [...]
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> > diff --git a/builtin/walken.c b/builtin/walken.c
> > @@ -30,6 +31,40 @@ static void init_walken_defaults(void)
> > +/*
> > + * cmd_log calls a second set of init after the repo_init_revisions call. We'll
> > + * mirror those settings in post_repo_init_init.
> > + */
> 
> What is 'post_repo_init_init'?
> 
> I found the reference to cmd_log() confusing because I was looking for
> it in this patch (as if it was being introduced here). Newcomers might
> be even more confused. Perhaps if you state explicitly that you're
> referring to existing code in an existing file, it might be clearer.
> Maybe:
> 
>     builtin/log.c:cmd_log() calls a second ...
> 
> Overall, I find this entire function comment mystifying.

Yeah, this is very stale and never got updated when I realized cmd_log()
was calling two init functions for apparently legacy reasons, and I
didn't need to mirror it here. Again a case where fresh eyes caught
something that became invisible after I stared at it for weeks. I really
appreciate you doing the deep review, Eric.

I've replaced it:

 /*
  * Perform configuration for commit walk here. Within this function we set a
  * starting point, and can customize our walk in various ways.
  */

> 
> > +static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
> > +               struct rev_info *rev)
> > +{
> > +       /*
> > +        * Optional:
> > +        * setup_revision_opt is used to pass options to the setup_revisions()
> > +        * call. It's got some special items for submodules and other types of
> > +        * optimizations, but for now, we'll just point it to HEAD and call it
> > +        * good. First we should make sure to reset it. This is useful for more
> > +        * complicated stuff but a decent shortcut for the first pass is
> > +        * add_head_to_pending().
> > +        */
> 
> I had to pause over "call it good" for several seconds (since I
> couldn't understand why someone would want to write "bad" code) until
> I figured out you meant "do nothing else". It would be clearer simply
> to drop that, ending the sentence at "HEAD":
> 
>     ..., but for now, we'll just point it at HEAD.

Definitely. Done.

> 
> > +       /*
> > +        * struct setup_revision_opt opt;
> > +
> > +        * memset(&opt, 0, sizeof(opt));
> > +        * opt.def = "HEAD";
> > +        * opt.revarg_opt = REVARG_COMMITTISH;
> > +        * setup_revisions(argc, argv, rev, &opt);
> > +        */
> > +
> > +       /* Let's force oneline format. */
> > +       get_commit_format("oneline", rev);
> > +       rev->verbose_header = 1;
> > +
> > +       /* add the HEAD to pending so we can start */
> > +       add_head_to_pending(rev);
> > +}
> 
> It would be easier for the reader to associate the
> add_head_to_pending() invocation with the commented-out setting of
> "HEAD" via 'setup_revision_opt' if the two bits abutted one another
> without being separated by the "oneline" gunk.

Good point, done.

* Re: [RFC PATCH v2 06/13] walken: perform our basic revision walk
  2019-06-27  5:16       ` Eric Sunshine
@ 2019-06-27 20:54         ` Emily Shaffer
  0 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-27 20:54 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Thu, Jun 27, 2019 at 01:16:58AM -0400, Eric Sunshine wrote:
> On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> > Add the final steps needed and implement the walk loop itself. We add a
> > method walken_commit_walk() which performs the final setup to revision.c
> > and then iterates over commits from get_revision().
> > [...]
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> > diff --git a/builtin/walken.c b/builtin/walken.c
> > +/*
> > + * walken_commit_walk() is invoked by cmd_walken() after initialization. It
> > + * does the commit walk only.
> > + */
> 
> "only" as opposed to what? Maybe just say:
> 
>     ... after initialization. It performs the actual commit walk.

Done.

> 
> > +static void walken_commit_walk(struct rev_info *rev)
> > +{
> > +       struct commit *commit;
> > +       struct strbuf prettybuf = STRBUF_INIT;
> > +
> > +       /*
> > +         * prepare_revision_walk() gets the final steps ready for a revision
> > +        * walk. We check the return value for errors.
> > +         */
> 
> You have some funky mix of spaces and tabs indenting the comment
> lines. Same for the next comment block.

Done.

> 
> > +       if (prepare_revision_walk(rev)) {
> > +               die(_("revision walk setup failed"));
> > +       }
> > +
> > +       /*
> > +         * Now we can start the real commit walk. get_revision grabs the next
> > +        * revision based on the contents of rev.
> > +        */
> 
> s/get_revision/&()/

Done.

> 
> > +       rev->diffopt.close_file = 0;
> 
> Why this? And, why isn't it set up where other 'rev' options are initialized?

Removed. Artifact of closely mirroring log.

> 
> > +       while ((commit = get_revision(rev))) {
> > +               if (!commit)
> > +                       continue;
> 
> If get_revision() returns NULL, then the while-loop exits, which means
> that the "if (!commit)" condition will never be satisfied, thus is
> unnecessary code.

Yep, removed.
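
For reference, the loop shape discussed here can be sketched on its
own. `struct toy_iter` and `toy_next()` are hypothetical stand-ins for
the rev_info/get_revision() pair; like get_revision(), `toy_next()`
returns NULL once nothing is left.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterator over an array of strings. */
struct toy_iter {
	const char **items;
	size_t nr, pos;
};

/* Returns the next item, or NULL once the items are exhausted. */
static const char *toy_next(struct toy_iter *it)
{
	assert(it);
	if (it->pos >= it->nr)
		return NULL;
	return it->items[it->pos++];
}

static size_t count_items(struct toy_iter *it)
{
	const char *item;
	size_t count = 0;

	/* The loop condition already rejects NULL; no check is needed inside. */
	while ((item = toy_next(it)))
		count++;
	return count;
}
```

The extra parentheses around the assignment are the usual way to signal
that the assignment in the condition is intentional.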

> 
> > +               strbuf_reset(&prettybuf);
> > +               pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
> > +               /*
> > +                * We expect this part of the output to be machine-parseable -
> > +                * one commit message per line - so we must not localize it.
> > +                */
> > +               puts(prettybuf.buf);
> 
> Meh, but there isn't any literal text here to localize anyway, so the
> comment talking about not localizing it is just confusing.

Yeah, you're right. I'll change to "so we send it to stdout", which is
less obvious from puts().

> 
> > +       }
> 
> Leaking 'prettybuf'. Add here:
> 
>     strbuf_release(&prettybuf);

Thanks, done.

* Re: [RFC PATCH v2 07/13] walken: filter for authors from gmail address
  2019-06-27  5:20       ` Eric Sunshine
@ 2019-06-27 20:58         ` Emily Shaffer
  0 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-27 20:58 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Thu, Jun 27, 2019 at 01:20:25AM -0400, Eric Sunshine wrote:
> On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> > In order to demonstrate how to create grep filters for revision walks,
> > filter the walk performed by cmd_walken() to print only commits which
> > are authored by someone with a gmail address.
> > [...]
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> > diff --git a/builtin/walken.c b/builtin/walken.c
> > @@ -60,6 +59,10 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix
> > +       /* Add a grep pattern to the author line in the header. */
> 
> This sounds as if we are adding something to the author line, which is
> confusing. Maybe say instead:
> 
>     /* Apply a 'grep' pattern to the author header. */
> 

I also s/author/'&'/; thanks.

> > +       append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR, "gmail");
> > +       compile_grep_patterns(&rev->grep_filter);

* Re: [RFC PATCH v2 08/13] walken: demonstrate various topographical sorts
  2019-06-27  5:22       ` Eric Sunshine
@ 2019-06-27 22:12         ` Emily Shaffer
  0 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-27 22:12 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Thu, Jun 27, 2019 at 01:22:10AM -0400, Eric Sunshine wrote:
> On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> > Order the revision walk by author or commit dates, to demonstrate how to
> 
> s/,//

Done.

* Re: [RFC PATCH v2 09/13] walken: demonstrate reversing a revision walk list
  2019-06-27  5:26       ` Eric Sunshine
@ 2019-06-27 22:20         ` Emily Shaffer
  0 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-27 22:20 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Thu, Jun 27, 2019 at 01:26:19AM -0400, Eric Sunshine wrote:
> On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> > The final installment in the tutorial about sorting revision walk
> > outputs. This commit reverses the commit list, so that we see newer
> > commits last (handy since we aren't using a pager).
> >
> > It's important to note that rev->reverse needs to be set after
> > add_head_to_pending() or before setup_revisions(). (This is mentioned in
> > the accompanying tutorial.)
> 
> This leaves the reader wondering "why that requirement?". Is it
> because those functions may change the value or otherwise depend upon
> the value?
> 
> Also, something this important probably deserves an in-code comment
> (and need not be mentioned in the commit message if the in-code
> comment explains it well.)

I'll remove this. I also removed it from the tutorial, as it turned out I
was incorrect. Thanks.

> 
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> > diff --git a/builtin/walken.c b/builtin/walken.c
> > @@ -69,6 +69,9 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix
> >         /* add the HEAD to pending so we can start */
> >         add_head_to_pending(rev);
> > +
> > +       /* Reverse the order */
> > +       rev->reverse = 1;

* Re: [RFC PATCH v2 10/13] walken: add unfiltered object walk from HEAD
  2019-06-27  5:37       ` Eric Sunshine
@ 2019-06-27 22:31         ` Emily Shaffer
  2019-06-28  0:48           ` Eric Sunshine
  0 siblings, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-06-27 22:31 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Thu, Jun 27, 2019 at 01:37:58AM -0400, Eric Sunshine wrote:
> On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> > Provide a demonstration of a revision walk which traverses all types of
> > object, not just commits. This type of revision walk is used for
> > operations such as creating packfiles and performing fetches or clones,
> > so it's useful to teach new developers how it works. For starters, only
> > demonstrate the unfiltered version, as this will make the tutorial
> > easier to follow.
> > [...]
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> > diff --git a/builtin/walken.c b/builtin/walken.c
> > @@ -103,6 +109,65 @@ static int git_walken_config(const char *var, const char *value, void *cb)
> > +static void walken_show_commit(struct commit *cmt, void *buf)
> > +{
> > +       commit_count++;
> > +}
> > +
> > +static void walken_show_object(struct object *obj, const char *str, void *buf)
> > +{
> > +       switch (obj->type) {
> > +       [...]
> > +       case OBJ_TAG:
> > +               tag_count++;
> > +               break;
> > +       case OBJ_COMMIT:
> > +               die(_("unexpectedly encountered a commit in "
> > +                     "walken_show_object\n"));
> > +               commit_count++;
> 
> The "commit_count++" line is not only dead code, but it also confuses
> the reader (or makes the reader think that the author of this code
> doesn't understand C programming). You should drop this line.

Ow, yes. Removed. This is stale (pre-die()).

> 
> > +               break;
> > +       default:
> > +               die(_("unexpected object type %s\n"), type_name(obj->type));
> > +               break;
> 
> Likewise, this "break" (and the one in the OBJ_COMMIT case) are dead
> code which should be dropped to avoid confusing the reader.

Done.

> 
> Don't localize the die() message via _() here or in the preceding
> OBJ_COMMIT case.

I'm a little surprised by that. Is it because die() is expected to only
be seen by the developer? It seems like a poor user experience if
someone in a non-English locale encounters a bug that the Git team didn't
find, and needs to translate the English die() string to figure out
whether a workaround is possible.

> 
> The two die() messages are unnecessarily dissimilar. Try to unify them
> so that they read in the same way.

I'm a little surprised by this too; it seems to me the root cause of
each would be different. In the former case, I'd guess that
traverse_commit_list()'s behavior changed, and in the latter case I'd
guess that a new object type was recently added to the model. Can you
help me understand the motivation for making the messages similar?

(I don't think, though, that I did a good job of indicating either root
cause in the die() messages as they are now.)

> 
> > +       }
> > +}
> > @@ -113,15 +178,15 @@ static void walken_commit_walk(struct rev_info *rev)
> >         /*
> > -         * prepare_revision_walk() gets the final steps ready for a revision
> > +        * prepare_revision_walk() gets the final steps ready for a revision
> >          * walk. We check the return value for errors.
> > -         */
> > +        */
> >         /*
> > -         * Now we can start the real commit walk. get_revision grabs the next
> > +        * Now we can start the real commit walk. get_revision grabs the next
> >          * revision based on the contents of rev.
> >          */
> 
> I think these are just correcting whitespace/indentation errors I
> pointed out in an earlier patch (so they are unnecessary noise in this
> patch).

ACK

* Re: [RFC PATCH v2 11/13] walken: add filtered object walk
  2019-06-27  5:42       ` Eric Sunshine
@ 2019-06-27 22:33         ` Emily Shaffer
  0 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-27 22:33 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Thu, Jun 27, 2019 at 01:42:45AM -0400, Eric Sunshine wrote:
> On Wed, Jun 26, 2019 at 7:51 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> > Demonstrate how filter specs can be used when performing a revision walk
> > of all object types. In this case, tree depth is used. Contributors who
> > are following the revision walking tutorial will be encouraged to run
> > the revision walk with and without the filter in order to compare the
> > number of objects seen in each case.
> >
> > Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> > ---
> > diff --git a/builtin/walken.c b/builtin/walken.c
> > @@ -143,6 +144,10 @@ static void walken_show_object(struct object *obj, const char *str, void *buf)
> >  static void walken_object_walk(struct rev_info *rev)
> >  {
> > +       struct list_objects_filter_options filter_options = {};
> > +
> > +       printf("walken_object_walk beginning...\n");
> 
> Is this debugging code which you accidentally left in? Or is it meant
> to use trace_printf()? Or something else? If it is a genuine message,
> should it be localizable?

The former. Removed.

> 
> > @@ -157,7 +162,24 @@ static void walken_object_walk(struct rev_info *rev)
> >         blob_count = 0;
> >         tree_count = 0;
> >
> > -       traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
> > +       if (1) {
> > +               /* Unfiltered: */
> 
> The subject talks about adding a _filtered_ object walk (which is in
> the 'else' arm), so should this be "if (0)" instead?

Done.

* Re: [RFC PATCH v2 00/13] example implementation of revwalk tutorial
  2019-06-26 23:50   ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
                       ` (12 preceding siblings ...)
  2019-06-26 23:50     ` [RFC PATCH v2 13/13] walken: reverse the object walk order Emily Shaffer
@ 2019-06-27 22:56     ` Emily Shaffer
  13 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-06-27 22:56 UTC (permalink / raw)
  To: git; +Cc: Jeff Hostetler

On Wed, Jun 26, 2019 at 04:50:19PM -0700, Emily Shaffer wrote:
> Since r1, made some significant changes.
> 
>  - Added a commit for counting the 'omitted' list, to match the new
>    section added to the tutorial.
>  - Added significant comments to allow the sample to better stand on its
>    own.
>  - Fixed style issues (die() formatting, etc)
>  - Distinguished between human- and machine-readable output with
>    trace_printf() and printf(), to turn the command into plumbing.
>  - More changes as mentioned in the tutorial patch.
> 
> Thanks.
>  - Emily

I have a v3 ready with changes based on Eric's suggestions. However,
since they're almost all verbatim changes, I'm going to hold off on
sending it for a couple days to see if anybody else wants to peek
though. I'll likely be sending v3 on Monday.

 - Emily

* Re: [RFC PATCH v2 10/13] walken: add unfiltered object walk from HEAD
  2019-06-27 22:31         ` Emily Shaffer
@ 2019-06-28  0:48           ` Eric Sunshine
  2019-07-01 19:19             ` Emily Shaffer
  0 siblings, 1 reply; 102+ messages in thread
From: Eric Sunshine @ 2019-06-28  0:48 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Git List

On Thu, Jun 27, 2019 at 6:31 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> On Thu, Jun 27, 2019 at 01:37:58AM -0400, Eric Sunshine wrote:
> > Don't localize the die() message via _() here or in the preceding
> > OBJ_COMMIT case.
>
> I'm a little surprised by that. Is it because die() is expected to only
> be seen by the developer?

Sorry, I was reading those as BUG(), not die(), and we don't localize
BUG() text. But, why aren't those BUG()? Can those cases arise in
practice? (Genuine question; I haven't familiarized myself with that
code yet.)

If they legitimately should be die(), then ignore my comment about not
localizing them.

> > The two die() messages are unnecessarily dissimilar. Try to unify them
> > so that they read in the same way.
>
> I'm a little surprised by this too; it seems to me the root cause of
> each would be different. In the former case, I'd guess that
> traverse_commit_list()'s behavior changed, and in the latter case I'd
> guess that a new object type was recently added to the model. Can you
> help me understand the motivation for making the messages similar?

Both causes you describe here sound like BUG() cases, not die(). If
I'm understanding correctly, they could only trigger if someone made
some breaking or behavior changing modifications within Git and failed
to update all the code in the project impacted by the change. In other
words, these can't be triggered by user input, hence they would be
BUG()s indicating that a Git developer needs to fix the code.

As for the messages themselves, I was referring to the grammatical
dissimilarity of "unexpectedly" and "unexpected", and I also don't
understand why one message mentions walken_show_object() explicitly,
whereas the other does not.


* Re: [RFC PATCH v2 10/13] walken: add unfiltered object walk from HEAD
  2019-06-28  0:48           ` Eric Sunshine
@ 2019-07-01 19:19             ` Emily Shaffer
  0 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-07-01 19:19 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Thu, Jun 27, 2019 at 08:48:31PM -0400, Eric Sunshine wrote:
> On Thu, Jun 27, 2019 at 6:31 PM Emily Shaffer <emilyshaffer@google.com> wrote:
> > On Thu, Jun 27, 2019 at 01:37:58AM -0400, Eric Sunshine wrote:
> > > Don't localize the die() message via _() here or in the preceding
> > > OBJ_COMMIT case.
> >
> > I'm a little surprised by that. Is it because die() is expected to only
> > be seen by the developer?
> 
> Sorry, I was reading those as BUG(), not die(), and we don't localize
> BUG() text. But, why aren't those BUG()? Can those cases arise in
> practice? (Genuine question; I haven't familiarized myself with that
> code yet.)
> 
> If they legitimately should be die(), then ignore my comment about not
> localizing them.

Hmmm. Yeah, I'll switch them to BUG() - I think there are other
instances of die() in the example and it'd be good to describe yet
another way of reporting behavior.

> 
> > > The two die() messages are unnecessarily dissimilar. Try to unify them
> > > so that they read in the same way.
> >
> > I'm a little surprised by this too; it seems to me the root cause of
> > each would be different. In the former case, I'd guess that
> > traverse_commit_list()'s behavior changed, and in the latter case I'd
> > guess that a new object type was recently added to the model. Can you
> > help me understand the motivation for making the messages similar?
> 
> Both causes you describe here sound like BUG() cases, not die(). If
> I'm understanding correctly, they could only trigger if someone made
> some breaking or behavior changing modifications within Git and failed
> to update all the code in the project impacted by the change. In other
> words, these can't be triggered by user input, hence they would be
> BUG()s indicating that a Git developer needs to fix the code.
> 
> As for the messages themselves, I was referring to the grammatical
> dissimilarity of "unexpectedly" and "unexpected", and I also don't
> understand why one message mentions walken_show_object() explicitly,
> whereas the other does not.

I see - ok, I have reworded.


Thanks!
 - Emily


* [PATCH v3] documentation: add tutorial for revision walking
  2019-06-26 23:49 ` [PATCH v2] " Emily Shaffer
  2019-06-26 23:50   ` [RFC PATCH v2 00/13] example implementation of revwalk tutorial Emily Shaffer
@ 2019-07-01 20:19   ` Emily Shaffer
  2019-07-01 20:20     ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Emily Shaffer
                       ` (3 more replies)
  1 sibling, 4 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-07-01 20:19 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer, Junio C Hamano, Eric Sunshine

Existing documentation on revision walks seems to be primarily intended
as a reference for those already familiar with the procedure. This
tutorial attempts to give an entry-level guide to a couple of bare-bones
revision walks so that new Git contributors can learn the concepts
without having to wade through options parsing or special casing.

The target audience is a Git contributor who is just getting started
with the concept of revision walking. The goal is to prepare this
contributor to be able to understand and modify existing commands which
perform revision walks more easily, although it will also prepare
contributors to create new commands which perform walks.

The tutorial covers a basic overview of the structs involved during
revision walk, setting up a basic commit walk, setting up a basic
all-object walk, and adding some configuration changes to both walk
types. It intentionally does not cover how to create new commands or
search for options from the command line or gitconfigs.

There is an associated patchset at
https://github.com/nasamuffin/git/tree/revwalk that contains a reference
implementation of the code generated by this tutorial.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
---
Since v2, responded to Eric's substantive review of the sample
codebase. In most cases that review pointed to comments in the sample
code, but in a couple cases the code itself changed, and that's
reflected here.

 - Move usage and options struct to local scope of cmd_walken().
 - Use BUG() to complain when an unexpected object is found in
   show_object() during object walk.
 - Fix a hardcoded switch between commit and object walks pointing the
   wrong way for its place in the tutorial.
 - Fix a memory leak during commit walk.

 Documentation/Makefile           |   1 +
 Documentation/MyFirstRevWalk.txt | 908 +++++++++++++++++++++++++++++++
 2 files changed, 909 insertions(+)
 create mode 100644 Documentation/MyFirstRevWalk.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 76f2ecfc1b..91e5da67c4 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -78,6 +78,7 @@ SP_ARTICLES += $(API_DOCS)
 
 TECH_DOCS += MyFirstContribution
 TECH_DOCS += SubmittingPatches
+TECH_DOCS += MyFirstRevWalk
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
diff --git a/Documentation/MyFirstRevWalk.txt b/Documentation/MyFirstRevWalk.txt
new file mode 100644
index 0000000000..6a432f35a0
--- /dev/null
+++ b/Documentation/MyFirstRevWalk.txt
@@ -0,0 +1,908 @@
+My First Revision Walk
+======================
+
+== What's a Revision Walk?
+
+The revision walk is a key concept in Git - this is the process that underpins
+operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, the
+list of objects is found by walking parent relationships between objects. The
+revision walk can also be used to determine whether or not a given object is
+reachable from the current HEAD pointer.
+
+=== Related Reading
+
+- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
+  the revision walker in its various incarnations.
+- `Documentation/technical/api-revision-walking.txt`
+- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
+  gives a good overview of the types of objects in Git and what your revision
+  walk is really describing.
+
+== Setting Up
+
+Create a new branch from `master`.
+
+----
+git checkout -b revwalk origin/master
+----
+
+We'll put our fiddling into a new command. For fun, let's name it `git walken`.
+Open up a new file `builtin/walken.c` and set up the command handler:
+
+----
+/*
+ * "git walken"
+ *
+ * Part of the "My First Revision Walk" tutorial.
+ */
+
+#include "builtin.h"
+
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	trace_printf(_("cmd_walken incoming...\n"));
+	return 0;
+}
+----
+
+NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or
+off at runtime. For the purposes of this tutorial, we will write `walken` as
+though it is intended for use as a "plumbing" command: that is, a command which
+is used primarily in scripts, rather than interactively by humans (a "porcelain"
+command). So we will send our debug output to `trace_printf()` instead. When
+running, enable trace output by setting the environment variable `GIT_TRACE`.
+
+Add usage text and `-h` handling, like all subcommands should consistently do
+(our test suite will notice and complain if you fail to do so).
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	const char * const walken_usage[] = {
+		N_("git walken"),
+		NULL,
+	};
+	struct option options[] = {
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
+
+	...
+}
+----
+
+Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:
+
+----
+extern int cmd_walken(int argc, const char **argv, const char *prefix);
+----
+
+Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
+maintaining alphabetical ordering:
+
+----
+{ "walken", cmd_walken, RUN_SETUP },
+----
+
+Add it to the `Makefile` near the line for `builtin/worktree.o`:
+
+----
+BUILTIN_OBJS += builtin/walken.o
+----
+
+Build and test out your command, without forgetting to ensure the `DEVELOPER`
+flag is set, and with `GIT_TRACE` enabled so the debug output can be seen:
+
+----
+$ echo DEVELOPER=1 >>config.mak
+$ make
+$ GIT_TRACE=1 ./bin-wrappers/git walken
+----
+
+NOTE: For a more exhaustive overview of the new command process, take a look at
+`Documentation/MyFirstContribution.txt`.
+
+NOTE: A reference implementation can be found at
+https://github.com/nasamuffin/git/tree/revwalk.
+
+=== `struct rev_cmdline_info`
+
+The definition of `struct rev_cmdline_info` can be found in `revision.h`.
+
+This struct is contained within the `rev_info` struct and is used to reflect
+parameters provided by the user over the CLI.
+
+`nr` represents the number of `rev_cmdline_entry` present in the array.
+
+`alloc` is used by the `ALLOC_GROW` macro. Check
+`Documentation/technical/api-allocation-growing.txt` - this variable is used to
+track the allocated size of the list.
+
+Per entry, we find:
+
+`item` is the object provided upon which to base the revision walk. Items in Git
+can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)
+
+`name` is the object ID (OID) of the object - a hex string you may be familiar
+with from using Git to organize your source in the past. See
+`Documentation/gittutorial-2.txt`, mentioned above, for a discussion of where
+the OID can come from.
+
+`whence` indicates some information about what to do with the parents of the
+specified object. We'll explore this flag more later on; take a look at
+`Documentation/revisions.txt` to get an idea of what could set the `whence`
+value.
+
+`flags` are hints that apply to the beginning of the revision walk; they are
+defined in the first block under the `#include`s in `revision.h`. The most
+likely ones to be set in the `rev_cmdline_info` are `UNINTERESTING` and
+`BOTTOM`, but these same flags can be used during the walk, as well.
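
The `nr`/`alloc` bookkeeping described above can be sketched in a few lines of standalone C. This is not part of the patch and not Git's code: `toy_entry`, `toy_list`, and `toy_list_add()` are hypothetical names, and the doubling growth policy is an assumption made for illustration (Git's `ALLOC_GROW` macro picks its own growth factor).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one entry; only the name is modeled here. */
struct toy_entry {
	const char *name;
};

/* Hypothetical list using the same nr/alloc convention. */
struct toy_list {
	unsigned int nr;       /* number of entries in use */
	unsigned int alloc;    /* number of entries allocated */
	struct toy_entry *rev; /* backing array, may be NULL when alloc is 0 */
};

/* Append one entry, growing the backing array when it is full. */
static void toy_list_add(struct toy_list *list, const char *name)
{
	if (list->nr == list->alloc) {
		/* Assumed policy: start at 4 slots, then double. */
		unsigned int new_alloc = list->alloc ? list->alloc * 2 : 4;
		struct toy_entry *grown =
			realloc(list->rev, new_alloc * sizeof(*grown));

		/* A real program would report the failure instead. */
		assert(grown != NULL);
		list->rev = grown;
		list->alloc = new_alloc;
	}
	list->rev[list->nr++].name = name;
}
```

Starting from a zeroed `struct toy_list`, five calls to `toy_list_add()` leave `nr` at 5 and `alloc` at 8 under this assumed policy; the caller releases the array with `free(list.rev)`.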
+
+=== `struct rev_info`
+
+This one is quite a bit longer, and many fields are only used during the walk
+by `revision.c` - not configuration options. Most of the configurable flags in
+`struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a
+good idea to take some time and read through that document.
+
+== Basic Commit Walk
+
+First, let's see if we can replicate the output of `git log --oneline`. We'll
+refer back to the implementation frequently to discover norms when performing
+a revision walk of our own.
+
+To do so, we'll first find all the commits, in order, which preceded the current
+commit. We'll extract the name and subject of the commit from each.
+
+Ideally, we will also be able to find out which ones are currently at the tip of
+various branches.
+
+=== Setting Up
+
+Preparing for your revision walk has some distinct stages.
+
+1. Perform default setup for this mode, and others which may be invoked.
+2. Check configuration files for relevant settings.
+3. Set up the `rev_info` struct.
+4. Tweak the initialized `rev_info` to suit the current walk.
+5. Prepare the `rev_info` for the walk.
+6. Iterate over the objects, processing each one.
+
+==== Default Setups
+
+Before examining configuration files which may modify command behavior, set up
+default state for switches or options your command may have. If your command
+utilizes other Git components, ask them to set up their default states as well.
+For instance, `git log` takes advantage of `grep` and `diff` functionality, so
+its `init_log_defaults()` sets its own state (`decoration_style`) and asks
+`grep` and `diff` to initialize themselves by calling each of their
+initialization functions.
+
+For our purposes, within `git walken`, for the first example we don't intend to
+use any other components within Git, and we don't have any configuration to do.
+However, we may want to add some later, so for now, we can add an empty
+placeholder. Create a new function in `builtin/walken.c`:
+
+----
+static void init_walken_defaults(void)
+{
+	/*
+	 * We don't actually need the same components `git log` does; leave this
+	 * empty for now.
+	 */
+}
+----
+
+Make sure to add a line invoking it inside of `cmd_walken()`.
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	init_walken_defaults();
+}
+----
+
+==== Configuring From `.gitconfig`
+
+Next, we should have a look at any relevant configuration settings (i.e.,
+settings readable and settable from `git config`). This is done by providing a
+callback to `git_config()`; within that callback, you can also invoke the
+config callbacks of other components that need to see these options. Your
+callback will be invoked once for each configuration value which Git knows
+about (global, local, worktree, etc.).
+
+Similarly to the default values, we don't have anything to do here yet
+ourselves; however, we should call `git_default_config()` if we aren't calling
+any other existing config callbacks.
+
+Add a new function to `builtin/walken.c`:
+
+----
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+	/*
+	 * For now, we don't have any custom configuration, so fall back to
+	 * the default config.
+	 */
+	return git_default_config(var, value, cb);
+}
+----
+
+Make sure to invoke `git_config()` with it in your `cmd_walken()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	...
+
+	git_config(git_walken_config, NULL);
+
+	...
+}
+----
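
The "one callback invocation per configuration value" pattern can be illustrated with a standalone sketch. It is not part of the patch and does not use Git's API: `toy_config()`, `toy_default_config()`, `toy_walken_config()`, and the key `walken.example` are all hypothetical, and stopping at the first negative return is a simplification chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: return 0 on success, negative on error. */
typedef int (*toy_config_fn)(const char *var, const char *value, void *cb);

struct toy_config_entry {
	const char *var;
	const char *value;
};

/* Invoke fn once per entry, passing the caller's context pointer along. */
static int toy_config(const struct toy_config_entry *entries, size_t nr,
		      toy_config_fn fn, void *cb)
{
	size_t i;

	assert(fn != NULL);
	for (i = 0; i < nr; i++) {
		int ret = fn(entries[i].var, entries[i].value, cb);

		if (ret < 0)
			return ret;
	}
	return 0;
}

/* Fallback handler: accepts every key without acting on it. */
static int toy_default_config(const char *var, const char *value, void *cb)
{
	(void)var;
	(void)value;
	(void)cb;
	return 0;
}

/* Command handler: consume our own key, delegate everything else. */
static int toy_walken_config(const char *var, const char *value, void *cb)
{
	int *handled = cb;

	if (!strcmp(var, "walken.example")) {
		(*handled)++;
		return 0;
	}
	return toy_default_config(var, value, cb);
}
```

The shape mirrors the tutorial's callback: handle the keys the command cares about, then fall through to a default handler so that shared settings are still seen.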
+
+// TODO: Checking CLI options
+
+==== Setting Up `rev_info`
+
+Now that we've gathered external configuration and options, it's time to
+initialize the `rev_info` object which we will use to perform the walk. This is
+typically done by calling `repo_init_revisions()` with the repository you intend
+to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
+struct.
+
+Add the `struct rev_info` and the `repo_init_revisions()` call:
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	/* This can go wherever you like in your declarations.*/
+	struct rev_info rev;
+	...
+
+	/* This should go after the git_config() call. */
+	repo_init_revisions(the_repository, &rev, prefix);
+
+	...
+}
+----
+
+==== Tweaking `rev_info` For the Walk
+
+We're getting close, but we're still not quite ready to go. Now that `rev` is
+initialized, we can modify it to fit our needs. This is usually done within a
+helper for clarity, so let's add one:
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	/*
+	 * We want to mimic the appearance of `git log --oneline`, so let's
+	 * force oneline format.
+	 */
+	get_commit_format("oneline", rev);
+
+	/* Start our revision walk at HEAD. */
+	add_head_to_pending(rev);
+}
+----
+
+[NOTE]
+====
+Instead of using the shorthand `add_head_to_pending()`, you could do
+something like this:
+----
+	struct setup_revision_opt opt;
+
+	memset(&opt, 0, sizeof(opt));
+	opt.def = "HEAD";
+	opt.revarg_opt = REVARG_COMMITTISH;
+	setup_revisions(argc, argv, rev, &opt);
+----
+Using a `setup_revision_opt` gives you finer control over your walk's starting
+point.
+====
+
+Then let's invoke `final_rev_info_setup()` after the call to
+`repo_init_revisions()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	...
+
+	final_rev_info_setup(&rev);
+
+	...
+}
+----
+
+Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
+now, this is all we need.
+
+==== Preparing `rev_info` For the Walk
+
+Now that `rev` is all initialized and configured, we've got one more setup step
+before we get rolling. We can do this in a helper, which will both prepare the
+`rev_info` for the walk, and perform the walk itself. Let's start the helper
+with the call to `prepare_revision_walk()`, which can return an error without
+dying on its own:
+
+----
+static void walken_commit_walk(struct rev_info *rev)
+{
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+}
+----
+
+NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
+`stderr` it's likely to be seen by a human, so we will localize it.
+
+==== Performing the Walk!
+
+Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
+can also be used as an iterator; we move to the next item in the walk by using
+`get_revision()` repeatedly. Add the listed variable declarations at the top and
+the walk loop below the `prepare_revision_walk()` call within your
+`walken_commit_walk()`:
+
+----
+static void walken_commit_walk(struct rev_info *rev)
+{
+	struct commit *commit;
+	struct strbuf prettybuf = STRBUF_INIT;
+
+	...
+
+	while ((commit = get_revision(rev))) {
+		strbuf_reset(&prettybuf);
+		pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
+		puts(prettybuf.buf);
+	}
+	strbuf_release(&prettybuf);
+}
+----
+
+NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
+command we expect to be machine-parsed, we're sending it directly to stdout.
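
The iterator shape of the loop can be seen in isolation with a standalone sketch (not part of the patch, not Git's implementation). It assumes a purely linear history, and `toy_commit`, `toy_walk`, and `toy_get_revision()` are hypothetical names. Because the iterator returns `NULL` when the walk is exhausted, the loop condition doubles as the termination test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical commit with a single parent; NULL marks the root commit. */
struct toy_commit {
	const char *subject;
	struct toy_commit *parent;
};

/* Hypothetical walk state: just the next commit to hand out. */
struct toy_walk {
	struct toy_commit *next;
};

static void toy_walk_init(struct toy_walk *walk, struct toy_commit *tip)
{
	assert(walk != NULL);
	walk->next = tip;
}

/* Return the next commit in the walk, or NULL once it is exhausted. */
static struct toy_commit *toy_get_revision(struct toy_walk *walk)
{
	struct toy_commit *commit = walk->next;

	if (commit)
		walk->next = commit->parent;
	return commit;
}

/* Count the commits reachable from tip by draining the iterator. */
static int toy_count_commits(struct toy_commit *tip)
{
	struct toy_walk walk;
	struct toy_commit *commit;
	int count = 0;

	toy_walk_init(&walk, tip);
	while ((commit = toy_get_revision(&walk)))
		count++;
	return count;
}
```

A real revision walk has to deal with merges, ordering, and filtering, which is why the state lives in a much larger `struct rev_info`; the calling pattern is the same.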
+
+Give it a shot.
+
+----
+$ make
+$ ./bin-wrappers/git walken
+----
+
+You should see all of the subject lines of all the commits in
+your tree's history, in order, ending with the initial commit, "Initial revision
+of "git", the information manager from hell". Congratulations! You've written
+your first revision walk. You can play with printing some additional fields
+from each commit if you're curious; have a look at the functions available in
+`commit.h`.
+
+=== Adding a Filter
+
+Next, let's try to filter the commits we see based on their author. This is
+equivalent to running `git log --author=<pattern>`. We can add a filter by
+modifying `rev_info.grep_filter`, which is a `struct grep_opt`.
+
+First some setup. Add `init_grep_defaults()` to `init_walken_defaults()` and add
+`grep_config()` to `git_walken_config()`:
+
+----
+static void init_walken_defaults(void)
+{
+	init_grep_defaults(the_repository);
+}
+
+...
+
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+	grep_config(var, value, cb);
+	return git_default_config(var, value, cb);
+}
+----
+
+Next, we can modify the `grep_filter`. This is done with convenience functions
+found in `grep.h`. For fun, we're filtering to only commits from folks using a
+`gmail.com` email address - a not-very-precise guess at who may be working on
+Git as a hobby. Since we're checking the author, which is a specific line in the
+header, we'll use the `append_header_grep_pattern()` helper. We can use
+the `enum grep_header_field` to indicate which part of the commit header we want
+to search.
+
+In `final_rev_info_setup()`, add your filter line:
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	...
+
+	append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
+		"gmail");
+	compile_grep_patterns(&rev->grep_filter);
+
+	...
+}
+----
+
+`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
+it won't work unless we compile it with `compile_grep_patterns()`.
+
+NOTE: If you are using `setup_revisions()` (for example, if you are passing a
+`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
+to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.
+
+NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
+wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
+`enum grep_pat_token` for us.
+
+=== Changing the Order
+
+There are a few ways that we can change the order of the commits during a
+revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
+sane orderings.
+
+`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
+before all of its children have been shown, and we avoid mixing commits which
+are in different lines of history. (`git help log`'s section on `--topo-order`
+has a very nice diagram to illustrate this.)
+
+Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
+`REV_SORT_BY_AUTHOR_DATE`. Add the following:
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	...
+
+	rev->topo_order = 1;
+	rev->sort_order = REV_SORT_BY_COMMIT_DATE;
+
+	...
+}
+----
+
+Let's output this into a file so we can easily diff it with the walk sorted by
+author date.
+
+----
+$ make
+$ ./bin-wrappers/git walken > commit-date.txt
+----
+
+Then, let's sort by author date and run it again.
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	...
+
+	rev->topo_order = 1;
+	rev->sort_order = REV_SORT_BY_AUTHOR_DATE;
+
+	...
+}
+----
+
+----
+$ make
+$ ./bin-wrappers/git walken > author-date.txt
+----
+
+Finally, compare the two. This is a little less helpful without object names or
+dates, but hopefully we get the idea.
+
+----
+$ diff -u commit-date.txt author-date.txt
+----
+
+This display indicates that commits can be reordered after they're written, for
+example with `git rebase`.
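
Why the two orderings can differ is easy to see in a standalone sketch (not part of the patch, not Git's sorting code). It ignores parent relationships entirely, which the real topological sort does not, and `dated_commit` and `sort_commits()` are hypothetical names. Each commit carries two timestamps, and sorting newest-first by one or the other yields a different sequence when a commit was written early but committed late.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical commit carrying both timestamps, as plain integers. */
struct dated_commit {
	const char *subject;
	long author_date;    /* when the change was originally written */
	long committer_date; /* when this commit object was created */
};

/* Newest committer date first. */
static int newest_committer_first(const void *a, const void *b)
{
	const struct dated_commit *x = a, *y = b;

	return (x->committer_date < y->committer_date) -
	       (x->committer_date > y->committer_date);
}

/* Newest author date first. */
static int newest_author_first(const void *a, const void *b)
{
	const struct dated_commit *x = a, *y = b;

	return (x->author_date < y->author_date) -
	       (x->author_date > y->author_date);
}

/* Sort in place by the chosen timestamp. */
static void sort_commits(struct dated_commit *commits, size_t nr,
			 int by_author)
{
	assert(commits != NULL);
	qsort(commits, nr, sizeof(*commits),
	      by_author ? newest_author_first : newest_committer_first);
}
```

With commits A (authored at 100, committed at 300), B (200, 250), and C (150, 150), sorting by committer date gives A, B, C, while sorting by author date gives B, C, A.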
+
+Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
+Set that flag somewhere inside of `final_rev_info_setup()`:
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	...
+
+	rev->reverse = 1;
+
+	...
+}
+----
+
+Run your walk again and note the difference in order. (If you remove the grep
+pattern, you should see the last commit this call gives you as your current
+HEAD.)
+
+== Basic Object Walk
+
+So far we've been walking only commits. But Git has more types of objects than
+that! Let's see if we can walk _all_ objects, and find out some information
+about each one.
+
+We can base our work on an example. `git pack-objects` prepares all kinds of
+objects for packing into a bitmap or packfile. The work we are interested in
+resides in `builtin/pack-objects.c:get_object_list()`; examination of that
+function shows that the all-object walk is being performed by
+`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
+functions reside in `list-objects.c`; examining the source shows that, despite
+the name, these functions traverse all kinds of objects. Let's have a look at
+the arguments to `traverse_commit_list_filtered()`, which are a superset of the
+arguments to the unfiltered version.
+
+- `struct list_objects_filter_options *filter_options`: This is a struct which
+  stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
+- `struct rev_info *revs`: This is the `rev_info` used for the walk.
+- `show_commit_fn show_commit`: A callback which will be used to handle each
+  individual commit object.
+- `show_object_fn show_object`: A callback which will be used to handle each
+  non-commit object (so each blob, tree, or tag).
+- `void *show_data`: A context buffer which is passed in turn to `show_commit`
+  and `show_object`.
+- `struct oidset *omitted`: A set of object IDs which the provided
+  filter caused to be omitted.
+
+It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
+instead of needing us to call it repeatedly ourselves. Cool! Let's add the
+callbacks first.
+
+For the sake of this tutorial, we'll simply keep track of how many of each kind
+of object we find. At file scope in `builtin/walken.c` add the following
+tracking variables:
+
+----
+static int commit_count;
+static int tag_count;
+static int blob_count;
+static int tree_count;
+----
+
+Commits are handled by a different callback than other objects; let's do that
+one first:
+
+----
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+	commit_count++;
+}
+----
+
+The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
+the `buf` argument is actually the context buffer that we can provide to the
+traversal calls - `show_data`, which we mentioned a moment ago.
+
+Since we have the `struct commit` object, we can look at all the same parts that
+we looked at in our earlier commit-only walk. For the sake of this tutorial,
+though, we'll just increment the commit counter and move on.
+
+The callback for non-commits is a little different, as we'll need to check
+which kind of object we're dealing with:
+
+----
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+	switch (obj->type) {
+	case OBJ_TREE:
+		tree_count++;
+		break;
+	case OBJ_BLOB:
+		blob_count++;
+		break;
+	case OBJ_TAG:
+		tag_count++;
+		break;
+	case OBJ_COMMIT:
+		BUG("unexpected commit object in walken_show_object");
+	default:
+		BUG("unexpected object type %s in walken_show_object",
+			type_name(obj->type));
+	}
+}
+----
+
+Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
+context pointer that `walken_show_commit()` receives: the `show_data` argument
+to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
+`str` contains the name of the object, which ends up being something like
+`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).
+
+To help assure us that we aren't double-counting commits, we'll include some
+complaining if a commit object is routed through our non-commit callback; we'll
+also complain if we see an invalid object type. Since those two cases should be
+unreachable, and would only change in the event of a semantic change to the Git
+codebase, we complain by using `BUG()` - which is a signal to a developer that
+the change they made caused unintended consequences, and the rest of the
+codebase needs to be updated to understand that change. `BUG()` is not intended
+to be seen by the public, so it is not localized.
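
The division of labor between the two callbacks, and the role of the context pointer, can be sketched in standalone C. This is not part of the patch and not Git's traversal code: `toy_traverse()`, `toy_object`, and `toy_counts` are hypothetical, the objects are a flat array rather than a graph, and `assert()` stands in for `BUG()`. Unlike the tutorial's file-scope counters, this sketch keeps its counts in a struct passed through the context pointer.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

struct toy_object {
	enum toy_type type;
};

/* Context handed to both callbacks through the void pointer. */
struct toy_counts {
	int commits, trees, blobs, tags;
};

typedef void (*toy_show_fn)(const struct toy_object *obj, void *data);

/* Route commits to one callback and every other object to the other. */
static void toy_traverse(const struct toy_object *objs, size_t nr,
			 toy_show_fn show_commit, toy_show_fn show_object,
			 void *data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (objs[i].type == TOY_COMMIT)
			show_commit(&objs[i], data);
		else
			show_object(&objs[i], data);
	}
}

static void count_commit(const struct toy_object *obj, void *data)
{
	struct toy_counts *counts = data;

	(void)obj;
	counts->commits++;
}

static void count_object(const struct toy_object *obj, void *data)
{
	struct toy_counts *counts = data;

	switch (obj->type) {
	case TOY_TREE:
		counts->trees++;
		break;
	case TOY_BLOB:
		counts->blobs++;
		break;
	case TOY_TAG:
		counts->tags++;
		break;
	default:
		/* Only reachable if the traversal misroutes an object. */
		assert(!"unexpected object type in count_object");
	}
}
```

The `default` branch guards against a programming error rather than bad user input, which is the distinction the tutorial draws between `BUG()` and `die()`.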
+
+Our main object walk implementation is substantially different from our commit
+walk implementation, so let's make a new function to perform the object walk. We
+can perform setup which is applicable to all objects here, too, to keep it
+separate from setup which is applicable to commit-only walks.
+
+We'll start by enabling all types of objects in the `struct rev_info`. Unless
+you cloned or fetched your repository earlier with a filter,
+`exclude_promisor_objects` is unlikely to make a difference, but we'll turn it
+on just to make sure our lives are simple. We'll also turn on
+`tree_blobs_in_commit_order`, which means that we will walk a commit's tree and
+everything it points to immediately after we find each commit, as opposed to
+waiting for the end and walking through all trees after the commit history has
+been discovered. With the appropriate settings configured, we are ready to call
+`prepare_revision_walk()`.
+
+----
+static void walken_object_walk(struct rev_info *rev)
+{
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+	rev->exclude_promisor_objects = 1;
+
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+
+	commit_count = 0;
+	tag_count = 0;
+	blob_count = 0;
+	tree_count = 0;
+----
+
+Let's start by calling just the unfiltered walk and reporting our counts.
+Complete your implementation of `walken_object_walk()`:
+
+----
+	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+
+	printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
+		blob_count, tag_count, tree_count);
+}
+----
+
+NOTE: This output is intended to be machine-parsed. Therefore, we are not
+sending it to `trace_printf()`, and we are not localizing it - we need scripts
+to be able to count on the formatting to be exactly the way it is shown here.
+If we were intending this output to be read by humans, we would need to localize
+it with `_()`.
+
+Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
+command line options is out of scope for this tutorial, so we'll just hardcode
+a branch we can change at compile time. Where you call `final_rev_info_setup()`
+and `walken_commit_walk()`, instead branch like so:
+
+----
+	if (1) {
+		add_head_to_pending(&rev);
+		walken_object_walk(&rev);
+	} else {
+		final_rev_info_setup(&rev);
+		walken_commit_walk(&rev);
+	}
+----
+
+NOTE: For simplicity, we've avoided all the filters and sorts we applied in
+`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
+want, you can certainly use the filters we added before by moving
+`final_rev_info_setup()` out of the conditional and removing the call to
+`add_head_to_pending()`.
+
+Now we can try to run our command! It should take noticeably longer than the
+commit walk, but an examination of the output will give you an idea why. Your
+output should look similar to this example, but with different counts:
+
+----
+commits 55733
+blobs 100274
+tags 0
+trees 104210
+----
+
+This makes sense. We have more trees than commits because the Git project has
+lots of subdirectories which can change, plus at least one tree per commit. We
+have no tags because we started on a commit (`HEAD`) and while tags can point to
+commits, commits can't point to tags.
+
+NOTE: You will have different counts when you run this yourself! The number of
+objects grows along with the Git project.
+
+=== Adding a Filter
+
+There are a handful of filters that we can apply to the object walk laid out in
+`Documentation/rev-list-options.txt`. These filters are typically useful for
+operations such as creating packfiles or performing a partial clone. They are
+defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
+will use the "tree:1" filter, which causes the walk to omit all trees and blobs
+which are not directly referenced by commits reachable from the commit in
+`pending` when the walk begins. (`pending` is the list of objects which need to
+be traversed during a walk; you can imagine a breadth-first tree traversal to
+help understand. In our case, that means we omit trees and blobs not directly
+referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
+`HEAD` in the `pending` list.)
+
+First, we'll need to `#include "list-objects-filter-options.h"` and set up the
+`struct list_objects_filter_options` at the top of the function.
+
+----
+static void walken_object_walk(struct rev_info *rev)
+{
+	struct list_objects_filter_options filter_options = { 0 };
+
+	...
+----
+
+For now, we are not going to track the omitted objects, so we'll replace those
+parameters with `NULL`. For the sake of simplicity, we'll add a simple
+build-time branch to use our filter or not. Replace the line calling
+`traverse_commit_list()` with the following, which will remind us which kind of
+walk we've just performed:
+
+----
+	if (0) {
+		/* Unfiltered: */
+		trace_printf(_("Unfiltered object walk.\n"));
+		traverse_commit_list(rev, walken_show_commit,
+				walken_show_object, NULL);
+	} else {
+		trace_printf(
+			_("Filtered object walk with filterspec 'tree:1'.\n"));
+		parse_list_objects_filter(&filter_options, "tree:1");
+
+		traverse_commit_list_filtered(&filter_options, rev,
+			walken_show_commit, walken_show_object, NULL, NULL);
+	}
+----
+
+`struct list_objects_filter_options` is usually built directly from a command
+line argument, so the module provides an easy way to build one from a string.
+Even though we aren't taking user input right now, we can still build one with
+a hardcoded string using `parse_list_objects_filter()`.
+
+With the filter spec "tree:1", we are expecting to see _only_ the root tree for
+each commit; therefore, the tree object count should be less than or equal to
+the number of commits. (For an example of why that's true: a commit which
+reverts its own parent, as created by `git revert`, points to the same tree
+object as its grandparent.)
+
+=== Counting Omitted Objects
+
+We also have the capability to enumerate all objects which were omitted by a
+filter, like with `git rev-list --filter=<spec> --filter-print-omitted`. Asking
+`traverse_commit_list_filtered()` to populate the `omitted` list means that our
+revision walk does not perform any better than an unfiltered revision walk; all
+reachable objects are walked in order to populate the list.
+
+First, add the `struct oidset` and related items we will use to iterate it:
+
+----
+static void walken_object_walk(
+	...
+
+	struct oidset omitted;
+	struct oidset_iter oit;
+	struct object_id *oid = NULL;
+	int omitted_count = 0;
+	oidset_init(&omitted, 0);
+
+	...
+----
+
+Modify the call to `traverse_commit_list_filtered()` to include your `omitted`
+object:
+
+----
+	...
+
+		traverse_commit_list_filtered(&filter_options, rev,
+			walken_show_commit, walken_show_object, NULL, &omitted);
+
+	...
+----
+
+Then, after your traversal, the `oidset` traversal is pretty straightforward.
+Count all the objects within and modify the print statement:
+
+----
+	/* Count the omitted objects. */
+	oidset_iter_init(&omitted, &oit);
+
+	while ((oid = oidset_iter_next(&oit)))
+		omitted_count++;
+
+	printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
+		commit_count, blob_count, tag_count, tree_count, omitted_count);
+----
+
+By running your walk with and without the filter, you should find that the total
+object count in each case is identical. You can also time each invocation of
+the `walken` subcommand, with and without `omitted` being passed in, to confirm
+to yourself the runtime impact of tracking all omitted objects.
+
+=== Changing the Order
+
+Finally, let's demonstrate that you can also reorder walks of all objects, not
+just walks of commits. First, we'll make our handlers chattier - modify
+`walken_show_commit()` and `walken_show_object()` to print each object as they
+go:
+
+----
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+	trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
+	commit_count++;
+}
+
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+	trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));
+
+	...
+}
+----
+
+NOTE: Since we will be examining this output directly as humans, we'll use
+`trace_printf()` here. Additionally, since this change introduces a significant
+number of printed lines, using `trace_printf()` will allow us to easily silence
+those lines without having to recompile.
+
+(Leave the counter increment logic in place.)
+
+With only that change, run again (but save yourself some scrollback):
+
+----
+$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10
+----
+
+Take a look at the top commit with `git show` and the object ID you printed; it
+should be the same as the output of `git show HEAD`.
+
+Next, let's change a setting on our `struct rev_info` within
+`walken_object_walk()`. Find where you're changing the other settings on `rev`,
+such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
+`reverse` setting at the bottom:
+
+----
+	...
+
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+	rev->exclude_promisor_objects = 1;
+	rev->reverse = 1;
+
+	...
+----
+
+Now, run again, but this time, let's grab the last handful of objects instead
+of the first handful:
+
+----
+$ make
+$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10
+----
+
+The last commit object given should have the same OID as the one we saw at the
+top before, and running `git show <oid>` with that OID should give you again
+the same results as `git show HEAD`. Furthermore, if you run and examine the
+first ten lines again (with `head` instead of `tail` like we did before applying
+the `reverse` setting), you should see that now the first commit printed is the
+initial commit, `e83c5163`.
+
+== Wrapping Up
+
+Let's review. In this tutorial, we:
+
+- Built a commit walk from the ground up
+- Enabled a grep filter for that commit walk
+- Changed the sort order of that filtered commit walk
+- Built an object walk (tags, commits, trees, and blobs) from the ground up
+- Learned how to add a filter-spec to an object walk
+- Changed the display order of the filtered object walk
-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v3 00/13] example implementation of revwalk tutorial
  2019-07-01 20:19   ` [PATCH v3] documentation: add tutorial for revision walking Emily Shaffer
@ 2019-07-01 20:20     ` Emily Shaffer
  2019-07-01 20:20       ` [RFC PATCH v3 01/13] walken: add infrastructure for revwalk demo Emily Shaffer
                         ` (13 more replies)
  2019-07-24 23:11     ` [PATCH v3] documentation: add tutorial for revision walking Josh Steadmon
                       ` (2 subsequent siblings)
  3 siblings, 14 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-07-01 20:20 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Since v2, mostly reworded comments, plus fixed the issues mentioned in
the tutorial itself. Thanks Eric for the review.

Emily Shaffer (13):
  walken: add infrastructure for revwalk demo
  walken: add usage to enable -h
  walken: add placeholder to initialize defaults
  walken: add handler to git_config
  walken: configure rev_info and prepare for walk
  walken: perform our basic revision walk
  walken: filter for authors from gmail address
  walken: demonstrate various topographical sorts
  walken: demonstrate reversing a revision walk list
  walken: add unfiltered object walk from HEAD
  walken: add filtered object walk
  walken: count omitted objects
  walken: reverse the object walk order

 Makefile         |   1 +
 builtin.h        |   1 +
 builtin/walken.c | 297 +++++++++++++++++++++++++++++++++++++++++++++++
 git.c            |   1 +
 4 files changed, 300 insertions(+)
 create mode 100644 builtin/walken.c

-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v3 01/13] walken: add infrastructure for revwalk demo
  2019-07-01 20:20     ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Emily Shaffer
@ 2019-07-01 20:20       ` Emily Shaffer
  2019-07-01 20:20       ` [RFC PATCH v3 02/13] walken: add usage to enable -h Emily Shaffer
                         ` (12 subsequent siblings)
  13 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-07-01 20:20 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Begin to add scaffolding for `git walken`, a toy command which we will
teach to perform a number of revision walks, in order to demonstrate the
mechanics of revision walking for developers new to the Git project.

This commit is the beginning of an educational series which corresponds
to the tutorial in Documentation/MyFirstRevWalk.txt.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: I64297621919412f54701e111366e99c4ef0feae3
---
 Makefile         |  1 +
 builtin.h        |  1 +
 builtin/walken.c | 18 ++++++++++++++++++
 3 files changed, 20 insertions(+)
 create mode 100644 builtin/walken.c

diff --git a/Makefile b/Makefile
index f58bf14c7b..5bac1dbf8d 100644
--- a/Makefile
+++ b/Makefile
@@ -1137,6 +1137,7 @@ BUILTIN_OBJS += builtin/var.o
 BUILTIN_OBJS += builtin/verify-commit.o
 BUILTIN_OBJS += builtin/verify-pack.o
 BUILTIN_OBJS += builtin/verify-tag.o
+BUILTIN_OBJS += builtin/walken.o
 BUILTIN_OBJS += builtin/worktree.o
 BUILTIN_OBJS += builtin/write-tree.o
 
diff --git a/builtin.h b/builtin.h
index ec7e0954c4..c919736c36 100644
--- a/builtin.h
+++ b/builtin.h
@@ -242,6 +242,7 @@ int cmd_var(int argc, const char **argv, const char *prefix);
 int cmd_verify_commit(int argc, const char **argv, const char *prefix);
 int cmd_verify_tag(int argc, const char **argv, const char *prefix);
 int cmd_version(int argc, const char **argv, const char *prefix);
+int cmd_walken(int argc, const char **argv, const char *prefix);
 int cmd_whatchanged(int argc, const char **argv, const char *prefix);
 int cmd_worktree(int argc, const char **argv, const char *prefix);
 int cmd_write_tree(int argc, const char **argv, const char *prefix);
diff --git a/builtin/walken.c b/builtin/walken.c
new file mode 100644
index 0000000000..db3ca50b04
--- /dev/null
+++ b/builtin/walken.c
@@ -0,0 +1,18 @@
+/*
+ * "git walken"
+ *
+ * Part of the "My First Revision Walk" tutorial.
+ */
+
+#include "builtin.h"
+
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	/*
+	 * This line is "human-readable" and we are writing a plumbing command,
+	 * so we localize it and use the trace library to print only when
+	 * the GIT_TRACE environment variable is set.
+	 */
+	trace_printf(_("cmd_walken incoming...\n"));
+	return 0;
+}
-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v3 02/13] walken: add usage to enable -h
  2019-07-01 20:20     ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Emily Shaffer
  2019-07-01 20:20       ` [RFC PATCH v3 01/13] walken: add infrastructure for revwalk demo Emily Shaffer
@ 2019-07-01 20:20       ` Emily Shaffer
  2019-07-01 20:20       ` [RFC PATCH v3 03/13] walken: add placeholder to initialize defaults Emily Shaffer
                         ` (11 subsequent siblings)
  13 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-07-01 20:20 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

It's expected that Git commands support '-h' in order to provide a
consistent user experience (and this expectation is enforced by the
test suite). '-h' is captured by parse_options() by default; in order to
support this flag, we add a short usage text to walken.c and invoke
parse_options().

With this change, we can now add cmd_walken to the builtins set and
expect tests to pass, so we'll do so - cmd_walken is now open for
business.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: I2919dc1efadb82acb335617ea24371c84b03bbce
---
 builtin/walken.c | 21 +++++++++++++++++++++
 git.c            |  1 +
 2 files changed, 22 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index db3ca50b04..dd55f3b350 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -5,9 +5,30 @@
  */
 
 #include "builtin.h"
+#include "parse-options.h"
+
 
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
+	/*
+	 * All builtins are expected to provide a usage string so that users get
+	 * a consistent experience.
+	 */
+	const char * const walken_usage[] = {
+		N_("git walken"),
+		NULL,
+	};
+
+	struct option options[] = {
+		OPT_END()
+	};
+
+	/*
+	 * parse_options() handles showing usage if incorrect options are
+	 * provided, or if '-h' is passed.
+	 */
+	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
+
 	/*
 	 * This line is "human-readable" and we are writing a plumbing command,
 	 * so we localize it and use the trace library to print only when
diff --git a/git.c b/git.c
index c2eec470c9..2a7fb9714f 100644
--- a/git.c
+++ b/git.c
@@ -601,6 +601,7 @@ static struct cmd_struct commands[] = {
 	{ "verify-pack", cmd_verify_pack },
 	{ "verify-tag", cmd_verify_tag, RUN_SETUP },
 	{ "version", cmd_version },
+	{ "walken", cmd_walken, RUN_SETUP },
 	{ "whatchanged", cmd_whatchanged, RUN_SETUP },
 	{ "worktree", cmd_worktree, RUN_SETUP | NO_PARSEOPT },
 	{ "write-tree", cmd_write_tree, RUN_SETUP },
-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v3 03/13] walken: add placeholder to initialize defaults
  2019-07-01 20:20     ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Emily Shaffer
  2019-07-01 20:20       ` [RFC PATCH v3 01/13] walken: add infrastructure for revwalk demo Emily Shaffer
  2019-07-01 20:20       ` [RFC PATCH v3 02/13] walken: add usage to enable -h Emily Shaffer
@ 2019-07-01 20:20       ` Emily Shaffer
  2019-07-01 20:20       ` [RFC PATCH v3 04/13] walken: add handler to git_config Emily Shaffer
                         ` (10 subsequent siblings)
  13 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-07-01 20:20 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Eventually, we will want a good place to initialize default variables
for use during our revision walk(s) in `git walken`. For now, there's
nothing to do here, but let's add the scaffolding so that it's easy to
tell where to put the setup later on.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 builtin/walken.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index dd55f3b350..19657b5e31 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -8,6 +8,19 @@
 #include "parse-options.h"
 
 
+/*
+ * Within init_walken_defaults() we can call into other useful defaults to set
+ * in the global scope or on the_repository. It's okay to borrow from other
+ * functions which are doing something relatively similar to yours.
+ */
+static void init_walken_defaults(void)
+{
+	/*
+	 * We don't actually need the same components `git log` does; leave this
+	 * empty for now.
+	 */
+}
+
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
 	/*
@@ -29,6 +42,8 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
 	 */
 	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
 
+	init_walken_defaults();
+
 	/*
 	 * This line is "human-readable" and we are writing a plumbing command,
 	 * so we localize it and use the trace library to print only when
-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v3 04/13] walken: add handler to git_config
  2019-07-01 20:20     ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Emily Shaffer
                         ` (2 preceding siblings ...)
  2019-07-01 20:20       ` [RFC PATCH v3 03/13] walken: add placeholder to initialize defaults Emily Shaffer
@ 2019-07-01 20:20       ` Emily Shaffer
  2019-07-01 20:20       ` [RFC PATCH v3 05/13] walken: configure rev_info and prepare for walk Emily Shaffer
                         ` (9 subsequent siblings)
  13 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-07-01 20:20 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

For now, we have no configuration options we want to set up for
ourselves, but in the future we may need to. At the very least, we
should invoke git_default_config() for each config option; we will do so
inside of a skeleton config callback so that we know where to add
configuration handling later on when we need it.
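
As a rough illustration of the callback pattern described above, here is
a minimal standalone sketch. It is not Git's config code: every `toy_*`
name is hypothetical, and the "default" handler only counts the entries
it is handed. It shows a driver calling one callback per entry, and a
skeleton callback that defers to the default handler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type, loosely modeled on a config callback. */
typedef int (*toy_config_fn)(const char *var, const char *value, void *cb);

/* Hypothetical stand-in for one "var = value" configuration entry. */
struct toy_entry {
	const char *var;
	const char *value;
};

/* Stand-in for a default handler: accepts every entry and counts it. */
static int toy_default_config(const char *var, const char *value, void *cb)
{
	int *seen = cb;

	(void)var;
	(void)value;
	if (seen)
		(*seen)++;
	return 0;
}

/* Skeleton callback: no settings of our own yet, so defer to the default. */
static int toy_walken_config(const char *var, const char *value, void *cb)
{
	return toy_default_config(var, value, cb);
}

/* Call fn once per entry; stop early if a callback reports an error. */
static int toy_config(const struct toy_entry *entries, size_t nr,
		      toy_config_fn fn, void *cb)
{
	size_t i;

	assert(fn != NULL);
	for (i = 0; i < nr; i++) {
		int ret = fn(entries[i].var, entries[i].value, cb);

		if (ret < 0)
			return ret;
	}
	return 0;
}
```

Calling `toy_config()` with two entries and `toy_walken_config` invokes
the callback twice. Custom handling would go in `toy_walken_config()`
ahead of the fallback call.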

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 builtin/walken.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index 19657b5e31..e53c42ea18 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -5,6 +5,7 @@
  */
 
 #include "builtin.h"
+#include "config.h"
 #include "parse-options.h"
 
 
@@ -16,11 +17,36 @@
 static void init_walken_defaults(void)
 {
 	/*
-	 * We don't actually need the same components `git log` does; leave this
-	 * empty for now.
+	 * We don't use any other components or have settings to initialize, so
+	 * leave this empty.
 	 */
 }
 
+/*
+ * This method will be called back by git_config(). It is used to gather values
+ * from the configuration files available to Git.
+ *
+ * Each time git_config() finds a configuration file entry, it calls this
+ * callback. Then, this function should compare it to entries which concern us,
+ * and make settings changes as necessary.
+ *
+ * If we are called with a config setting we care about, we should use one of
+ * the helpers which exist in config.h to pull out the value for ourselves, e.g.
+ * git_config_string(...) or git_config_bool(...).
+ *
+ * If we don't match anything, we should pass it along to another stakeholder
+ * who may otherwise care - in log's case, grep, gpg, and diff-ui. For our case,
+ * we'll ignore everybody else.
+ */
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+	/*
+	 * For now, we don't have any custom configuration, so fall back on the
+	 * default config.
+	 */
+	return git_default_config(var, value, cb);
+}
+
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
 	/*
@@ -44,6 +70,8 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
 
 	init_walken_defaults();
 
+	git_config(git_walken_config, NULL);
+
 	/*
 	 * This line is "human-readable" and we are writing a plumbing command,
 	 * so we localize it and use the trace library to print only when
-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v3 05/13] walken: configure rev_info and prepare for walk
  2019-07-01 20:20     ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Emily Shaffer
                         ` (3 preceding siblings ...)
  2019-07-01 20:20       ` [RFC PATCH v3 04/13] walken: add handler to git_config Emily Shaffer
@ 2019-07-01 20:20       ` Emily Shaffer
  2019-07-01 20:20       ` [RFC PATCH v3 06/13] walken: perform our basic revision walk Emily Shaffer
                         ` (8 subsequent siblings)
  13 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-07-01 20:20 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

`struct rev_info` is what's used by the walk itself.
`repo_init_revisions()` initializes the struct; then we need to set it
up for the walk we want to perform, which is done in
`final_rev_info_setup()`.

The most important step here is adding the first object we want to walk
to the pending array. Here, we take the easy road and use
`add_head_to_pending()`; there is also a way to do it with
`setup_revision_opt()` and `setup_revisions()` which we demonstrate but
do not use. If we were to forget this step, the walk would do nothing -
the pending queue would be checked, determined to be empty, and the walk
would terminate immediately.
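
To illustrate why an empty pending list ends the walk at once, here is a
minimal standalone sketch. It is not Git's implementation:
`toy_commit`, `toy_pending`, `toy_push()` and `toy_walk()` are
hypothetical names, each toy commit has at most one parent, and the
pending list is modeled as a small fixed-size stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a commit: an id and at most one parent. */
struct toy_commit {
	int id;
	struct toy_commit *parent;
	int seen;
};

#define TOY_PENDING_MAX 16

/* Hypothetical stand-in for the pending list: a fixed-size stack. */
struct toy_pending {
	struct toy_commit *items[TOY_PENDING_MAX];
	size_t nr;
};

/* Queue one object to be visited by the walk. */
static void toy_push(struct toy_pending *p, struct toy_commit *c)
{
	assert(p->nr < TOY_PENDING_MAX);
	p->items[p->nr++] = c;
}

/* Pop entries until the stack is empty; return how many commits we saw. */
static int toy_walk(struct toy_pending *p)
{
	int visited = 0;

	while (p->nr) {
		struct toy_commit *c = p->items[--p->nr];

		if (c->seen)
			continue;
		c->seen = 1;
		visited++;
		if (c->parent)
			toy_push(p, c->parent);
	}
	return visited;
}
```

With nothing pushed, the loop body never runs and `toy_walk()` returns
0. After pushing a head commit with two ancestors, it returns 3.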

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: I76754b740227cf17a449f3f536dbbe37031e6f9a
---
 builtin/walken.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index e53c42ea18..333d9ecc5e 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -5,6 +5,7 @@
  */
 
 #include "builtin.h"
+#include "revision.h"
 #include "config.h"
 #include "parse-options.h"
 
@@ -22,6 +23,40 @@ static void init_walken_defaults(void)
 	 */
 }
 
+/*
+ * Perform configuration for commit walk here. Within this function we set a
+ * starting point, and can customize our walk in various ways.
+ */
+static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
+		struct rev_info *rev)
+{
+	/*
+	 * Optional:
+	 * setup_revision_opt is used to pass options to the setup_revisions()
+	 * call. It's got some special items for submodules and other types of
+	 * optimizations, but for now, we'll just point it to HEAD. First we
+	 * should make sure to reset it. This is useful for more complicated
+	 * stuff but a decent shortcut for the first pass is
+	 * add_head_to_pending().
+	 */
+
+	/*
+	 * struct setup_revision_opt opt;
+
+	 * memset(&opt, 0, sizeof(opt));
+	 * opt.def = "HEAD";
+	 * opt.revarg_opt = REVARG_COMMITTISH;
+	 * setup_revisions(argc, argv, rev, &opt);
+	 */
+
+	/* add the HEAD to pending so we can start */
+	add_head_to_pending(rev);
+
+	/* Let's force oneline format. */
+	get_commit_format("oneline", rev);
+	rev->verbose_header = 1;
+}
+
 /*
  * This method will be called back by git_config(). It is used to gather values
  * from the configuration files available to Git.
@@ -62,6 +97,8 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
 		OPT_END()
 	};
 
+	struct rev_info rev;
+
 	/*
 	 * parse_options() handles showing usage if incorrect options are
 	 * provided, or if '-h' is passed.
@@ -72,6 +109,19 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
 
 	git_config(git_walken_config, NULL);
 
+	/*
+	 * Time to set up the walk. repo_init_revisions sets up rev_info with
+	 * the defaults, but then you need to make some configuration settings
+	 * to make it do what's special about your walk.
+	 */
+	repo_init_revisions(the_repository, &rev, prefix);
+
+	/*
+	 * Before we do the walk, we need to set a starting point by giving it
+	 * something to go in `pending` - that happens in here
+	 */
+	final_rev_info_setup(argc, argv, prefix, &rev);
+
 	/*
 	 * This line is "human-readable" and we are writing a plumbing command,
 	 * so we localize it and use the trace library to print only when
-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v3 06/13] walken: perform our basic revision walk
  2019-07-01 20:20     ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Emily Shaffer
                         ` (4 preceding siblings ...)
  2019-07-01 20:20       ` [RFC PATCH v3 05/13] walken: configure rev_info and prepare for walk Emily Shaffer
@ 2019-07-01 20:20       ` Emily Shaffer
  2019-07-01 20:20       ` [RFC PATCH v3 07/13] walken: filter for authors from gmail address Emily Shaffer
                         ` (7 subsequent siblings)
  13 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-07-01 20:20 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Add the final steps needed and implement the walk loop itself. We add a
method walken_commit_walk() which performs the final setup to revision.c
and then iterates over commits from get_revision().

This basic walk only prints the subject line of each commit in the
history. It is nearly equivalent to `git log --oneline`.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: If6dc5f3c9d14df077b99e42806cf790c96191582
---
 builtin/walken.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index 333d9ecc5e..f116bb6fca 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -6,8 +6,11 @@
 
 #include "builtin.h"
 #include "revision.h"
+#include "commit.h"
 #include "config.h"
 #include "parse-options.h"
+#include "pretty.h"
+#include "line-log.h"
 
 
 /*
@@ -82,6 +85,40 @@ static int git_walken_config(const char *var, const char *value, void *cb)
 	return git_default_config(var, value, cb);
 }
 
+/*
+ * walken_commit_walk() is invoked by cmd_walken() after initialization. It
+ * performs the actual commit walk.
+ */
+static void walken_commit_walk(struct rev_info *rev)
+{
+	struct commit *commit;
+	struct strbuf prettybuf = STRBUF_INIT;
+
+	/*
+	 * prepare_revision_walk() gets the final steps ready for a revision
+	 * walk. We check the return value for errors.
+	 */
+	if (prepare_revision_walk(rev)) {
+		die(_("revision walk setup failed"));
+	}
+
+	/*
+	 * Now we can start the real commit walk. get_revision() grabs the next
+	 * revision based on the contents of rev.
+	 */
+	while ((commit = get_revision(rev))) {
+		strbuf_reset(&prettybuf);
+		pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
+		/*
+		 * We expect this part of the output to be machine-parseable -
+		 * one commit message per line - so we send it to stdout.
+		 */
+		puts(prettybuf.buf);
+	}
+
+	strbuf_release(&prettybuf);
+}
+
 int cmd_walken(int argc, const char **argv, const char *prefix)
 {
 	/*
@@ -116,12 +153,17 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
 	 */
 	repo_init_revisions(the_repository, &rev, prefix);
 
+	/* We can set our traversal flags here. */
+	rev.always_show_header = 1;
+
 	/*
 	 * Before we do the walk, we need to set a starting point by giving it
 	 * something to go in `pending` - that happens in here
 	 */
 	final_rev_info_setup(argc, argv, prefix, &rev);
 
+	walken_commit_walk(&rev);
+
 	/*
 	 * This line is "human-readable" and we are writing a plumbing command,
 	 * so we localize it and use the trace library to print only when
-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v3 07/13] walken: filter for authors from gmail address
  2019-07-01 20:20     ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Emily Shaffer
                         ` (5 preceding siblings ...)
  2019-07-01 20:20       ` [RFC PATCH v3 06/13] walken: perform our basic revision walk Emily Shaffer
@ 2019-07-01 20:20       ` Emily Shaffer
  2019-07-01 20:20       ` [RFC PATCH v3 08/13] walken: demonstrate various topographical sorts Emily Shaffer
                         ` (6 subsequent siblings)
  13 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-07-01 20:20 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

In order to demonstrate how to create grep filters for revision walks,
filter the walk performed by cmd_walken() to print only commits which
are authored by someone with a gmail address.

This commit demonstrates how to append a grep pattern to a
rev_info.grep_filter, to teach new contributors how to create their own
more generalized grep filters during revision walks.
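
The effect of such a filter can be sketched outside of Git as follows.
This is only a model under stated assumptions: the `toy_*` names are
hypothetical, and it uses a plain substring match via strstr() where a
real grep filter compiles and applies a pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parts of a commit we filter on. */
struct toy_log_entry {
	const char *author;
	const char *subject;
};

/* Return 1 if the author field contains the substring, else 0. */
static int toy_author_matches(const struct toy_log_entry *e,
			      const char *needle)
{
	assert(e != NULL && needle != NULL);
	return strstr(e->author, needle) != NULL;
}

/* Count the entries a walk would keep once the author filter applies. */
static size_t toy_count_kept(const struct toy_log_entry *entries, size_t nr,
			     const char *needle)
{
	size_t i, kept = 0;

	for (i = 0; i < nr; i++)
		if (toy_author_matches(&entries[i], needle))
			kept++;
	return kept;
}
```

Given three entries of which two have a gmail author address,
`toy_count_kept(entries, 3, "gmail")` keeps two and skips the third.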

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 builtin/walken.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index f116bb6fca..a600f88cf6 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -11,6 +11,7 @@
 #include "parse-options.h"
 #include "pretty.h"
 #include "line-log.h"
+#include "grep.h"
 
 
 /*
@@ -20,10 +21,8 @@
  */
 static void init_walken_defaults(void)
 {
-	/*
-	 * We don't use any other components or have settings to initialize, so
-	 * leave this empty.
-	 */
+	/* Needed by our grep filter. */
+	init_grep_defaults(the_repository);
 }
 
 /*
@@ -55,6 +54,10 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix
 	/* add the HEAD to pending so we can start */
 	add_head_to_pending(rev);
 
+	/* Apply a 'grep' pattern to the 'author' header. */
+	append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR, "gmail");
+	compile_grep_patterns(&rev->grep_filter);
+
 	/* Let's force oneline format. */
 	get_commit_format("oneline", rev);
 	rev->verbose_header = 1;
@@ -78,10 +81,7 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix
  */
 static int git_walken_config(const char *var, const char *value, void *cb)
 {
-	/*
-	 * For now, we don't have any custom configuration, so fall back on the
-	 * default config.
-	 */
+	grep_config(var, value, cb);
 	return git_default_config(var, value, cb);
 }
 
-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v3 08/13] walken: demonstrate various topographical sorts
  2019-07-01 20:20     ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Emily Shaffer
                         ` (6 preceding siblings ...)
  2019-07-01 20:20       ` [RFC PATCH v3 07/13] walken: filter for authors from gmail address Emily Shaffer
@ 2019-07-01 20:20       ` Emily Shaffer
  2019-07-01 20:20       ` [RFC PATCH v3 09/13] walken: demonstrate reversing a revision walk list Emily Shaffer
                         ` (5 subsequent siblings)
  13 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-07-01 20:20 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Order the revision walk by author or commit dates to demonstrate how to
apply topo_sort to a revision walk.

While following the tutorial, new contributors are guided to run a walk
with each sort and compare the results.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: I7ce2f3e8a77c42001293637ae209087afec4ce2c
---
 builtin/walken.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index a600f88cf6..b334f61e69 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -61,6 +61,13 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix
 	/* Let's force oneline format. */
 	get_commit_format("oneline", rev);
 	rev->verbose_header = 1;
+	
+	/* Let's play with the sort order. */
+	rev->topo_order = 1;
+
+	/* Toggle between these and observe the difference. */
+	rev->sort_order = REV_SORT_BY_COMMIT_DATE;
+	/* rev->sort_order = REV_SORT_BY_AUTHOR_DATE; */
 }
 
 /*
-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v3 09/13] walken: demonstrate reversing a revision walk list
  2019-07-01 20:20     ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Emily Shaffer
                         ` (7 preceding siblings ...)
  2019-07-01 20:20       ` [RFC PATCH v3 08/13] walken: demonstrate various topographical sorts Emily Shaffer
@ 2019-07-01 20:20       ` Emily Shaffer
  2019-07-01 20:20       ` [RFC PATCH v3 10/13] walken: add unfiltered object walk from HEAD Emily Shaffer
                         ` (4 subsequent siblings)
  13 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-07-01 20:20 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

The final installment in the tutorial about sorting revision walk
outputs. This commit reverses the commit list, so that we see newer
commits last (handy since we aren't using a pager).
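
For readers who want a picture of what reversing a list of results
means, here is a small standalone sketch. It is a generic in-place
reversal of a singly linked list and makes no claim about how revision.c
does it; `toy_list` and `toy_reverse()` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical stand-in for a list of walk results, newest first. */
struct toy_list {
	int id;
	struct toy_list *next;
};

/* Reverse the list in place and return the new head (the old tail). */
static struct toy_list *toy_reverse(struct toy_list *head)
{
	struct toy_list *prev = NULL;

	while (head) {
		struct toy_list *next = head->next;

		head->next = prev;
		prev = head;
		head = next;
	}
	return prev;
}
```

Reversing the list 1 -> 2 -> 3 yields 3 -> 2 -> 1, so whatever was
printed first before is printed last afterwards.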

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 builtin/walken.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index b334f61e69..03a50158fb 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -61,6 +61,9 @@ static void final_rev_info_setup(int argc, const char **argv, const char *prefix
 	/* Let's force oneline format. */
 	get_commit_format("oneline", rev);
 	rev->verbose_header = 1;
+
+	/* Reverse the order */
+	rev->reverse = 1;
 	
 	/* Let's play with the sort order. */
 	rev->topo_order = 1;
-- 
2.22.0.410.gd8fdbe21b5-goog



* [RFC PATCH v3 10/13] walken: add unfiltered object walk from HEAD
  2019-07-01 20:20     ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Emily Shaffer
                         ` (8 preceding siblings ...)
  2019-07-01 20:20       ` [RFC PATCH v3 09/13] walken: demonstrate reversing a revision walk list Emily Shaffer
@ 2019-07-01 20:20       ` Emily Shaffer
  2019-07-01 20:20       ` [RFC PATCH v3 11/13] walken: add filtered object walk Emily Shaffer
                         ` (3 subsequent siblings)
  13 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-07-01 20:20 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Provide a demonstration of a revision walk which traverses all types of
object, not just commits. This type of revision walk is used for
operations such as creating packfiles and performing fetches or clones,
so it's useful to teach new developers how it works. For starters, only
demonstrate the unfiltered version, as this will make the tutorial
easier to follow.

This commit is part of a tutorial on revision walking.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: If3b11652ba011b28d29b1c3984dac4a3f80a5f53
---
 builtin/walken.c | 91 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 85 insertions(+), 6 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index 03a50158fb..b613102cfb 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -11,9 +11,15 @@
 #include "parse-options.h"
 #include "pretty.h"
 #include "line-log.h"
+#include "list-objects.h"
 #include "grep.h"
 
 
+static int commit_count;
+static int tag_count;
+static int blob_count;
+static int tree_count;
+
 /*
  * Within init_walken_defaults() we can call into other useful defaults to set
  * in the global scope or on the_repository. It's okay to borrow from other
@@ -95,6 +101,74 @@ static int git_walken_config(const char *var, const char *value, void *cb)
 	return git_default_config(var, value, cb);
 }
 
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+	commit_count++;
+}
+
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+	switch (obj->type) {
+	case OBJ_TREE:
+		tree_count++;
+		break;
+	case OBJ_BLOB:
+		blob_count++;
+		break;
+	case OBJ_TAG:
+		tag_count++;
+		break;
+	case OBJ_COMMIT:
+		/*
+		 * BUG() is used to warn developers when they've made a change
+		 * which breaks some relied-upon behavior of Git. In this case,
+		 * we're telling developers that we don't expect commits to be
+		 * routed as objects during an object walk. BUG() messages
+		 * should not be localized.
+		 */
+		BUG("unexpected commit object in walken_show_object\n");
+	default:
+		/*
+		 * This statement will only be hit if a new object type is added
+		 * to Git; we BUG() to tell developers that the new object type
+		 * needs to be handled and counted here.
+		 */
+		BUG("unexpected object type %s in walken_show_object\n",
+				type_name(obj->type));
+	}
+}
+
+/*
+ * walken_object_walk() is invoked by cmd_walken() after initialization. It does
+ * a walk of all object types.
+ */
+static void walken_object_walk(struct rev_info *rev)
+{
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+	rev->exclude_promisor_objects = 1;
+
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+
+	commit_count = 0;
+	tag_count = 0;
+	blob_count = 0;
+	tree_count = 0;
+
+	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+
+	/*
+	 * This print statement is designed to be script-parseable. Script
+	 * authors will rely on the output not to change, so we will not
+	 * localize this string. It will go to stdout directly.
+	 */
+	printf("commits %d\n blobs %d\n tags %d\n trees %d\n", commit_count,
+	       blob_count, tag_count, tree_count);
+}
+
 /*
  * walken_commit_walk() is invoked by cmd_walken() after initialization. It
  * performs the actual commit walk.
@@ -166,13 +240,18 @@ int cmd_walken(int argc, const char **argv, const char *prefix)
 	/* We can set our traversal flags here. */
 	rev.always_show_header = 1;
 
-	/*
-	 * Before we do the walk, we need to set a starting point by giving it
-	 * something to go in `pending` - that happens in here
-	 */
-	final_rev_info_setup(argc, argv, prefix, &rev);
 
-	walken_commit_walk(&rev);
+	if (1) {
+		add_head_to_pending(&rev);
+		walken_object_walk(&rev);
+	} else {
+		/*
+		 * Before we do the walk, we need to set a starting point by giving it
+		 * something to go in `pending` - that happens in here
+		 */
+		final_rev_info_setup(argc, argv, prefix, &rev);
+		walken_commit_walk(&rev);
+	}
 
 	/*
 	 * This line is "human-readable" and we are writing a plumbing command,
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH v3 11/13] walken: add filtered object walk
  2019-07-01 20:20     ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Emily Shaffer
                         ` (9 preceding siblings ...)
  2019-07-01 20:20       ` [RFC PATCH v3 10/13] walken: add unfiltered object walk from HEAD Emily Shaffer
@ 2019-07-01 20:20       ` Emily Shaffer
  2019-07-01 20:20       ` [RFC PATCH v3 12/13] walken: count omitted objects Emily Shaffer
                         ` (2 subsequent siblings)
  13 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-07-01 20:20 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Demonstrate how filter specs can be used when performing a revision walk
of all object types. In this case, tree depth is used. Contributors who
are following the revision walking tutorial will be encouraged to run
the revision walk with and without the filter in order to compare the
number of objects seen in each case.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: I6d22ba153c1afbc780c261c47f1fa03ea478b5ed
---
 builtin/walken.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index b613102cfb..7b46377a2e 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -12,6 +12,7 @@
 #include "pretty.h"
 #include "line-log.h"
 #include "list-objects.h"
+#include "list-objects-filter-options.h"
 #include "grep.h"
 
 
@@ -144,6 +145,8 @@ static void walken_show_object(struct object *obj, const char *str, void *buf)
  */
 static void walken_object_walk(struct rev_info *rev)
 {
+	struct list_objects_filter_options filter_options = {};
+
 	rev->tree_objects = 1;
 	rev->blob_objects = 1;
 	rev->tag_objects = 1;
@@ -158,7 +161,24 @@ static void walken_object_walk(struct rev_info *rev)
 	blob_count = 0;
 	tree_count = 0;
 
-	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+	if (0) {
+		/* Unfiltered: */
+		trace_printf(_("Unfiltered object walk.\n"));
+		traverse_commit_list(rev, walken_show_commit,
+				walken_show_object, NULL);
+	} else {
+		trace_printf(_("Filtered object walk with filterspec "
+				"'tree:1'.\n"));
+		/*
+		 * We can parse a tree depth of 1 to demonstrate the kind of
+		 * filtering that could occur during various operations (see
+		 * `git help rev-list` and read the entry on `--filter`).
+		 */
+		parse_list_objects_filter(&filter_options, "tree:1");
+
+		traverse_commit_list_filtered(&filter_options, rev,
+			walken_show_commit, walken_show_object, NULL, NULL);
+	}
 
 	/*
 	 * This print statement is designed to be script-parseable. Script
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH v3 12/13] walken: count omitted objects
  2019-07-01 20:20     ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Emily Shaffer
                         ` (10 preceding siblings ...)
  2019-07-01 20:20       ` [RFC PATCH v3 11/13] walken: add filtered object walk Emily Shaffer
@ 2019-07-01 20:20       ` Emily Shaffer
  2019-07-01 20:20       ` [RFC PATCH v3 13/13] walken: reverse the object walk order Emily Shaffer
  2019-07-25  9:25       ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Johannes Schindelin
  13 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-07-01 20:20 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

It may be illuminating to see which objects were not included within a
given filter. This also demonstrates, since filter-spec "tree:1" is
used, that the 'omitted' list contains all objects which are omitted,
not just the first objects which were omitted - that is, it continues to
dereference omitted trees and commits.

This is part of a tutorial on performing revision walks.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
 builtin/walken.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/builtin/walken.c b/builtin/walken.c
index 7b46377a2e..1638f679f2 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -146,6 +146,11 @@ static void walken_show_object(struct object *obj, const char *str, void *buf)
 static void walken_object_walk(struct rev_info *rev)
 {
 	struct list_objects_filter_options filter_options = {};
+	struct oidset omitted;
+	struct oidset_iter oit;
+	struct object_id *oid = NULL;
+	int omitted_count = 0;
+	oidset_init(&omitted, 0);
 
 	rev->tree_objects = 1;
 	rev->blob_objects = 1;
@@ -180,13 +185,19 @@ static void walken_object_walk(struct rev_info *rev)
-			walken_show_commit, walken_show_object, NULL, NULL);
+			walken_show_commit, walken_show_object, NULL, &omitted);
 	}
 
+	/* Count the omitted objects. */
+	oidset_iter_init(&omitted, &oit);
+
+	while ((oid = oidset_iter_next(&oit)))
+		omitted_count++;
+
 	/*
 	 * This print statement is designed to be script-parseable. Script
 	 * authors will rely on the output not to change, so we will not
 	 * localize this string. It will go to stdout directly.
 	 */
-	printf("commits %d\n blobs %d\n tags %d\n trees %d\n", commit_count,
-	       blob_count, tag_count, tree_count);
+	printf("commits %d\n blobs %d\n tags %d\n trees %d\n omitted %d\n",
+	       commit_count, blob_count, tag_count, tree_count, omitted_count);
 }
 
 /*
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* [RFC PATCH v3 13/13] walken: reverse the object walk order
  2019-07-01 20:20     ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Emily Shaffer
                         ` (11 preceding siblings ...)
  2019-07-01 20:20       ` [RFC PATCH v3 12/13] walken: count omitted objects Emily Shaffer
@ 2019-07-01 20:20       ` Emily Shaffer
  2019-07-25  9:25       ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Johannes Schindelin
  13 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-07-01 20:20 UTC (permalink / raw)
  To: git; +Cc: Emily Shaffer

Demonstrate that just like commit walks, object walks can have their
order reversed. Additionally, add verbose logging of objects encountered
in order to let contributors prove to themselves that the walk has
actually been reversed. With this commit, `git walken` becomes extremely
chatty - it's recommended to pipe the output through `head` or `tail` or
to redirect it into a file.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Change-Id: I91883b209a61ae4d87855878291e487fe36220c4
---
 builtin/walken.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/walken.c b/builtin/walken.c
index 1638f679f2..2eb12f92ed 100644
--- a/builtin/walken.c
+++ b/builtin/walken.c
@@ -104,11 +104,13 @@ static int git_walken_config(const char *var, const char *value, void *cb)
 
 static void walken_show_commit(struct commit *cmt, void *buf)
 {
+	printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
 	commit_count++;
 }
 
 static void walken_show_object(struct object *obj, const char *str, void *buf)
 {
+	printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));
 	switch (obj->type) {
 	case OBJ_TREE:
 		tree_count++;
@@ -157,6 +159,7 @@ static void walken_object_walk(struct rev_info *rev)
 	rev->tag_objects = 1;
 	rev->tree_blobs_in_commit_order = 1;
 	rev->exclude_promisor_objects = 1;
+	rev->reverse = 1;
 
 	if (prepare_revision_walk(rev))
 		die(_("revision walk setup failed"));
-- 
2.22.0.410.gd8fdbe21b5-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH] documentation: add tutorial for revision walking
  2019-06-20 21:06       ` Emily Shaffer
@ 2019-07-13  0:39         ` Josh Steadmon
  2019-07-16  0:06           ` Emily Shaffer
  0 siblings, 1 reply; 102+ messages in thread
From: Josh Steadmon @ 2019-07-13  0:39 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Junio C Hamano, git

On 2019.06.20 14:06, Emily Shaffer wrote:
> On Wed, Jun 19, 2019 at 08:17:29AM -0700, Junio C Hamano wrote:
> > Emily Shaffer <emilyshaffer@google.com> writes:
> > 
> > > Maybe there's a case for storing them as a set of patch files that are
> > > revision-controlled somewhere within Documentation/? There was some
> > > discussion on the IRC a few weeks ago about trying to organize these
> > > tutorials into their own directory to form a sort of "Git Contribution
> > > 101" course, maybe it makes sense to store there?
> > >
> > >   Documentation/contributing/myfirstcontrib/MyFirstContrib.txt
> > >   Documentation/contributing/myfirstcontrib/sample/*.patch
> > >   Documentation/contributing/myfirstrevwalk/MyFirstRevWalk.txt
> > >   Documentation/contributing/myfirstrevwalk/sample/*.patch
> > >
> > > I don't love the idea of maintaining text patches with the expectation
> > > that they should cleanly apply always,...
> > 
> > Well, I actually think the above organization does match the intent
> > of the "My first contribution codelab" perfectly.  When the codebase,
> > the workflow used by the project, and/or the coding or documentation
> > guideline gets updated, the text that documents how to contribute to
> > the project as well as the sample patches must be updated to match
> > the updated reality.
> > 
> > I agree with you that maintaining the *.patch files to always
> > cleanly apply is less than ideal.  A topic to update the sample
> > patches and tutorial text may be competing with another topic that
> > updates the very API the tutorials are teaching, and the sample
> > patches may not apply cleanly when two topics are merged together,
> > even if the "update sample patches and tutorial text" topic does
> > update them to match the API at the tip of the topic branch itself.
> > One thing we _could_ do is to pin the target version of the codebase
> > for the sake of tutorial.  IOW, the sample/*.patch may not apply
> > cleanly to the version of the tree these patches were taken from,
> > but would always apply cleanly to the most recent released version
> > before the last update to the tutorial, or something like that.
> > 
> > Also having to review the patch to sample/*.patch files will be
> > unpleasant.
> 
> I wonder if we can ease some pain for both of the above issues by
> including some scripts to "inflate" the patch files into a topic branch,
> or figure out some more easily-reviewed (but more complicated, I
> suppose) method for sending updates to the sample/*.patch files.
> 
> Imagining workflows like this:
> 
> Doing the tutorial:
>  - In worktree a/.
>  - Run a magic script which creates a worktree with the sample code, b/.
>  - Read through a/Documentation/MyFirstContribution.txt and generate
>    a/builtins/psuh.c, referring to b/builtins/psuh.c if confused.
> 
> Rebasing the tutorial patches:
>  - In worktree a/.
>  - Run a magic script which checks out a new branch at the last known
>    good base for the patchset, then applies all the patches.
>  - Now faced with, likely, a topic branch based on v<n-1> (where n is
>    latest release).
>  - `git rebase v<n> -x (make && ./bin-wrappers/git psuh)`
>  - Interactively fix conflicts
>  - Run a script to generate a magic interdiff from the old version of
>    patches
>  - Mail out magic interdiff to list and get approval
>  - (Maybe maintainer does this when interdiff is happy? Maybe updater
>    does this when review looks good?) Run a magic script to regenerate
>    patches from rebased branch, and note somewhere they are based on
>    v<n>
>  - Mail sample/*.patch (based on v<n>) to list (if maintainer rolled the
>    patches after interdiff approval, this step can be skipped)
> 
> (This seems to still be a lot of steps, even with the magic script..)
> 
> Alternatively, for the same process:
>  Updater: Run a magic script to create topic branch based on v<n-1>
>    (like before)
>  U: `git rebase v<n> -x (make && ./bin-wrappers/git psuh)`
>  U: Interactively fix conflicts
>  U: Run a script to turn topic branch back into sample/*.patch
>  U: Send email with changes to sample/*.patch (this will be ugly and
>     unreadable) - message ID <M1>
>  Reviewer: Run a magic script, providing <M1> argument, which grabs the
>     diff-of-.patch and generates an interdiff, or a topic branch based
>     on v<n>
>  R: Send comments explaining where issue is (tricky to find where to
>     inline in the diff-of-.patch)
>  U: Reroll diff-of-.patch email
>  R: Accepts
>  Maintainer: Applies diff-of-.patch email normally
> 
>  I suppose for the first suggestion, there ends up being quite a lot of
>  onus on the maintainer, and a lot of trust that there is no difference
>  between the RFC easy-to-read interdiff patchset. For the second
>  suggestion, there ends up being onus on the reviewers to run some
>  magical script. Maybe we can split the difference by expecting Updater
>  to provide the interdiff below the --- line? Maybe in practice the
>  diff-of-.patch isn't so unreadable, if it's only minor changes needed
>  to bring the tutorial up to latest?
> 
>  I'm not sure there's a way to make this totally painless using email
>  tools.

Random thought about the "magic scripts": if we keep an mbox instead of
a directory of *.patch files, then it seems like git-format-patch and
git-am would solve the bulk of this. I don't think dealing with
diffs-of-patches-in-mbox is much worse than dealing with
diffs-of-patches-in-multiple-files. And for the "Doing the tutorial"
workflow, it nudges the new contributor to learn git-am.

But I guess the hard part here is the reviewing diffs-of-diffs part.
I'm leaning towards the second option here; I personally would not feel
too troubled as a reviewer by having to run an extra script. And as you
say, diff-of-diffs may not be so bad in practice. Reviewers already see
these whenever someone includes a range-diff in their v>=2 emails.
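
The mbox idea above can be sketched roughly as follows. This is only a
minimal illustration under stated assumptions, not a proposed script: the
throwaway repository, the `base-point` branch (standing in for the pinned
release tag), the `inflated` branch, and the `sample.mbox` file name are all
hypothetical. It relies only on `git format-patch --stdout` to write the
series as one mbox and on `git am` to turn it back into commits.

```shell
#!/bin/sh
# Sketch only: repository, branch, and file names here are hypothetical.
set -e

work=$(mktemp -d)
cd "$work"

# A throwaway repository standing in for git.git.
git init -q repo
cd repo
git config user.name "Tutorial Author"
git config user.email "author@example.com"

echo base >file.txt
git add file.txt
git commit -q -m "base"
git branch base-point	# stands in for the pinned release tag

# Two commits standing in for the tutorial's sample patches.
echo one >>file.txt
git commit -q -a -m "sample: step one"
echo two >>file.txt
git commit -q -a -m "sample: step two"

# Updater: store the whole series as a single mbox file.
git format-patch --stdout base-point..HEAD >../sample.mbox

# Reader or reviewer: inflate the mbox into a topic branch.
git checkout -q -b inflated base-point
git am -q ../sample.mbox

# Show the commits recreated from the mbox.
git log --oneline base-point..inflated
```

Keeping a single mbox means the "inflate" step is one `git am` invocation
rather than a loop over sample/*.patch, at the cost of one larger file to
diff when the series is rerolled.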

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH] documentation: add tutorial for revision walking
  2019-07-13  0:39         ` Josh Steadmon
@ 2019-07-16  0:06           ` Emily Shaffer
  2019-07-16 17:24             ` Junio C Hamano
  0 siblings, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-07-16  0:06 UTC (permalink / raw)
  To: Josh Steadmon, Junio C Hamano, git

On Fri, Jul 12, 2019 at 05:39:48PM -0700, Josh Steadmon wrote:
> On 2019.06.20 14:06, Emily Shaffer wrote:
> > On Wed, Jun 19, 2019 at 08:17:29AM -0700, Junio C Hamano wrote:
> > > Emily Shaffer <emilyshaffer@google.com> writes:
> > > 
> > > > Maybe there's a case for storing them as a set of patch files that are
> > > > revision-controlled somewhere within Documentation/? There was some
> > > > discussion on the IRC a few weeks ago about trying to organize these
> > > > tutorials into their own directory to form a sort of "Git Contribution
> > > > 101" course, maybe it makes sense to store there?
> > > >
> > > >   Documentation/contributing/myfirstcontrib/MyFirstContrib.txt
> > > >   Documentation/contributing/myfirstcontrib/sample/*.patch
> > > >   Documentation/contributing/myfirstrevwalk/MyFirstRevWalk.txt
> > > >   Documentation/contributing/myfirstrevwalk/sample/*.patch
> > > >
> > > > I don't love the idea of maintaining text patches with the expectation
> > > > that they should cleanly apply always,...
> > > 
> > > Well, I actually think the above organization does match the intent
> > > of the "My first contribution codelab" perfectly.  When the codebase,
> > > the workflow used by the project, and/or the coding or documentation
> > > guideline gets updated, the text that documents how to contribute to
> > > the project as well as the sample patches must be updated to match
> > > the updated reality.
> > > 
> > > I agree with you that maintaining the *.patch files to always
> > > cleanly apply is less than ideal.  A topic to update the sample
> > > patches and tutorial text may be competing with another topic that
> > > updates the very API the tutorials are teaching, and the sample
> > > patches may not apply cleanly when two topics are merged together,
> > > even if the "update sample patches and tutorial text" topic does
> > > update them to match the API at the tip of the topic branch itself.
> > > One thing we _could_ do is to pin the target version of the codebase
> > > for the sake of tutorial.  IOW, the sample/*.patch may not apply
> > > cleanly to the version of the tree these patches were taken from,
> > > but would always apply cleanly to the most recent released version
> > > before the last update to the tutorial, or something like that.
> > > 
> > > Also having to review the patch to sample/*.patch files will be
> > > unpleasant.
> > 
> > I wonder if we can ease some pain for both of the above issues by
> > including some scripts to "inflate" the patch files into a topic branch,
> > or figure out some more easily-reviewed (but more complicated, I
> > suppose) method for sending updates to the sample/*.patch files.
> > 
> > Imagining workflows like this:
> > 
> > Doing the tutorial:
> >  - In worktree a/.
> >  - Run a magic script which creates a worktree with the sample code, b/.
> >  - Read through a/Documentation/MyFirstContribution.txt and generate
> >    a/builtins/psuh.c, referring to b/builtins/psuh.c if confused.
> > 
> > Rebasing the tutorial patches:
> >  - In worktree a/.
> >  - Run a magic script which checks out a new branch at the last known
> >    good base for the patchset, then applies all the patches.
> >  - Now faced with, likely, a topic branch based on v<n-1> (where n is
> >    latest release).
> >  - `git rebase v<n> -x (make && ./bin-wrappers/git psuh)`
> >  - Interactively fix conflicts
> >  - Run a script to generate a magic interdiff from the old version of
> >    patches
> >  - Mail out magic interdiff to list and get approval
> >  - (Maybe maintainer does this when interdiff is happy? Maybe updater
> >    does this when review looks good?) Run a magic script to regenerate
> >    patches from rebased branch, and note somewhere they are based on
> >    v<n>
> >  - Mail sample/*.patch (based on v<n>) to list (if maintainer rolled the
> >    patches after interdiff approval, this step can be skipped)
> > 
> > (This seems to still be a lot of steps, even with the magic script..)
> > 
> > Alternatively, for the same process:
> >  Updater: Run a magic script to create topic branch based on v<n-1>
> >    (like before)
> >  U: `git rebase v<n> -x (make && ./bin-wrappers/git psuh)`
> >  U: Interactively fix conflicts
> >  U: Run a script to turn topic branch back into sample/*.patch
> >  U: Send email with changes to sample/*.patch (this will be ugly and
> >     unreadable) - message ID <M1>
> >  Reviewer: Run a magic script, providing <M1> argument, which grabs the
> >     diff-of-.patch and generates an interdiff, or a topic branch based
> >     on v<n>
> >  R: Send comments explaining where issue is (tricky to find where to
> >     inline in the diff-of-.patch)
> >  U: Reroll diff-of-.patch email
> >  R: Accepts
> >  Maintainer: Applies diff-of-.patch email normally
> > 
> >  I suppose for the first suggestion, there ends up being quite a lot of
> >  onus on the maintainer, and a lot of trust that there is no difference
> >  between the RFC easy-to-read interdiff patchset. For the second
> >  suggestion, there ends up being onus on the reviewers to run some
> >  magical script. Maybe we can split the difference by expecting Updater
> >  to provide the interdiff below the --- line? Maybe in practice the
> >  diff-of-.patch isn't so unreadable, if it's only minor changes needed
> >  to bring the tutorial up to latest?
> > 
> >  I'm not sure there's a way to make this totally painless using email
> >  tools.
> 
> Random thought about the "magic scripts": if we keep an mbox instead of
> a directory of *.patch files, then it seems like git-format-patch and
> git-am would solve the bulk of this. I don't think dealing with
> diffs-of-patches-in-mbox is much worse than dealing with
> diffs-of-patches-in-multiple-files. And for the "Doing the tutorial"
> workflow, it nudges the new contributor to learn git-am.
> 
> But I guess the hard part here is the reviewing diffs-of-diffs part.
> I'm leaning towards the second option here; I personally would not feel
> too troubled as a reviewer by having to run an extra script. And as you
> say, diff-of-diffs may not be so bad in practice. Reviewers already see
> these whenever someone includes a range-diff in their v>=2 emails.

There was also some suggestion of instead checking in ed scripts or
similar to populate the changes. On one hand, it might be nicer, as
there aren't diff markers on the front of all the code... but on the
other hand, I'm not sure how many folks are familiar with ed (I know I'm
not) and it might be complex to indicate where to insert changes.

I have been in a position of reviewing diff-of-.patch in a past life,
albeit via Gerrit, and it's not the worst when the code is simple (as we
should always hope this example tutorial code would be).

 - Emily

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH] documentation: add tutorial for revision walking
  2019-07-16  0:06           ` Emily Shaffer
@ 2019-07-16 17:24             ` Junio C Hamano
  0 siblings, 0 replies; 102+ messages in thread
From: Junio C Hamano @ 2019-07-16 17:24 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Josh Steadmon, git

Emily Shaffer <emilyshaffer@google.com> writes:

> I have been in a position of reviewing diff-of-.patch in a past life,
> albeit via Gerrit, and it's not the worst when the code is simple (as we
> should always hope this example tutorial code would be).

I personally think a directory full of patch files is OK.  I am not
sure if they (together with this rev walk tutorial) belong to the
main part of the project, though.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3] documentation: add tutorial for revision walking
  2019-07-01 20:19   ` [PATCH v3] documentation: add tutorial for revision walking Emily Shaffer
  2019-07-01 20:20     ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Emily Shaffer
@ 2019-07-24 23:11     ` Josh Steadmon
  2019-07-24 23:32     ` Jonathan Tan
  2019-08-06 23:19     ` [PATCH v4] " Emily Shaffer
  3 siblings, 0 replies; 102+ messages in thread
From: Josh Steadmon @ 2019-07-24 23:11 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git, Junio C Hamano, Eric Sunshine

On 2019.07.01 13:19, Emily Shaffer wrote:
> Existing documentation on revision walks seems to be primarily intended
> as a reference for those already familiar with the procedure. This
> tutorial attempts to give an entry-level guide to a couple of bare-bones
> revision walks so that new Git contributors can learn the concepts
> without having to wade through options parsing or special casing.
> 
> The target audience is a Git contributor who is just getting started
> with the concept of revision walking. The goal is to prepare this
> contributor to be able to understand and modify existing commands which
> perform revision walks more easily, although it will also prepare
> contributors to create new commands which perform walks.
> 
> The tutorial covers a basic overview of the structs involved during
> revision walk, setting up a basic commit walk, setting up a basic
> all-object walk, and adding some configuration changes to both walk
> types. It intentionally does not cover how to create new commands or
> search for options from the command line or gitconfigs.
> 
> There is an associated patchset at
> https://github.com/nasamuffin/git/tree/revwalk that contains a reference
> implementation of the code generated by this tutorial.
> 
> Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
> Helped-by: Eric Sunshine <sunshine@sunshineco.com>

This looks good to me; as a new Git developer I found it informative and
wish it had been present before I took my first look at the rev-walk
code.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3] documentation: add tutorial for revision walking
  2019-07-01 20:19   ` [PATCH v3] documentation: add tutorial for revision walking Emily Shaffer
  2019-07-01 20:20     ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Emily Shaffer
  2019-07-24 23:11     ` [PATCH v3] documentation: add tutorial for revision walking Josh Steadmon
@ 2019-07-24 23:32     ` Jonathan Tan
  2019-08-06 23:10       ` Emily Shaffer
  2019-08-06 23:19     ` [PATCH v4] " Emily Shaffer
  3 siblings, 1 reply; 102+ messages in thread
From: Jonathan Tan @ 2019-07-24 23:32 UTC (permalink / raw)
  To: emilyshaffer; +Cc: git, gitster, sunshine, Jonathan Tan

Thanks - I think this is a useful guide to what can be a complicated
topic. It looks good overall; I just have some minor comments below.

> diff --git a/Documentation/Makefile b/Documentation/Makefile
> index 76f2ecfc1b..91e5da67c4 100644
> --- a/Documentation/Makefile
> +++ b/Documentation/Makefile
> @@ -78,6 +78,7 @@ SP_ARTICLES += $(API_DOCS)
>  
>  TECH_DOCS += MyFirstContribution
>  TECH_DOCS += SubmittingPatches
> +TECH_DOCS += MyFirstRevWalk

Any reason why this is not in alphabetical order?

> +Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:
> +
> +----
> +extern int cmd_walken(int argc, const char **argv, const char *prefix);
> +----

builtin.h no longer has "extern", so we can delete it.

> +Add it to the `Makefile` near the line for `builtin\worktree.o`:
> +
> +----
> +BUILTIN_OBJS += builtin/walken.o
> +----

In the first line, change the backslash to a slash. (The line in
Makefile for "builtin/worktree.o" uses a forward slash as expected.)

> +NOTE: For a more exhaustive overview of the new command process, take a look at
> +`Documentation/MyFirstContribution.txt`.
> +
> +NOTE: A reference implementation can be found at TODO LINK.

I think you have a reference implementation at
https://github.com/nasamuffin/git/tree/revwalk?

> +We'll start by enabling all types of objects in the `struct rev_info`. Unless
> +you cloned or fetched your repository earlier with a filter,
> +`exclude_promisor_objects` is unlikely to make a difference, but we'll turn it
> +on just to make sure our lives are simple. We'll also turn on
> +`tree_blobs_in_commit_order`, which means that we will walk a commit's tree and
> +everything it points to immediately after we find each commit, as opposed to
> +waiting for the end and walking through all trees after the commit history has
> +been discovered. With the appropriate settings configured, we are ready to call
> +`prepare_revision_walk()`.
> +
> +----
> +static void walken_object_walk(struct rev_info *rev)
> +{
> +	rev->tree_objects = 1;
> +	rev->blob_objects = 1;
> +	rev->tag_objects = 1;
> +	rev->tree_blobs_in_commit_order = 1;
> +	rev->exclude_promisor_objects = 1;

Optional: I think we should not bother with exclude_promisor_objects. If
the user really cloned with a filter, then every object would be a
promisor object and the revision walk should output nothing, which is
very confusing.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC PATCH v3 00/13] example implementation of revwalk tutorial
  2019-07-01 20:20     ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Emily Shaffer
                         ` (12 preceding siblings ...)
  2019-07-01 20:20       ` [RFC PATCH v3 13/13] walken: reverse the object walk order Emily Shaffer
@ 2019-07-25  9:25       ` Johannes Schindelin
  2019-08-06 23:13         ` Emily Shaffer
  13 siblings, 1 reply; 102+ messages in thread
From: Johannes Schindelin @ 2019-07-25  9:25 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Hi Emily,

On Mon, 1 Jul 2019, Emily Shaffer wrote:

> Since v2, mostly reworded comments, plus fixed the issues mentioned in
> the tutorial itself. Thanks Eric for the review.
>
> Emily Shaffer (13):
>   walken: add infrastructure for revwalk demo
>   walken: add usage to enable -h
>   walken: add placeholder to initialize defaults
>   walken: add handler to git_config
>   walken: configure rev_info and prepare for walk
>   walken: perform our basic revision walk
>   walken: filter for authors from gmail address
>   walken: demonstrate various topographical sorts
>   walken: demonstrate reversing a revision walk list
>   walken: add unfiltered object walk from HEAD
>   walken: add filtered object walk
>   walken: count omitted objects
>   walken: reverse the object walk order
>
>  Makefile         |   1 +
>  builtin.h        |   1 +
>  builtin/walken.c | 297 +++++++++++++++++++++++++++++++++++++++++++++++

Since this is not really intended to be an end user-facing command, I
think it should not become a built-in, to be carried into every Git
user's setup.

Instead, I would recommend to implement this as a test helper.

This would have the following advantages:

- it won't clutter the end user installations,

- it will still be compile-tested with every build (guaranteeing that
  the tutorial won't become stale over time as so many other tutorials),

- it really opens the door very wide to follow up with another tutorial
  to guide new contributors to write stellar regression tests.

Thanks,
Dscho

>  git.c            |   1 +
>  4 files changed, 300 insertions(+)
>  create mode 100644 builtin/walken.c
>
> --
> 2.22.0.410.gd8fdbe21b5-goog
>
>
>

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v3] documentation: add tutorial for revision walking
  2019-07-24 23:32     ` Jonathan Tan
@ 2019-08-06 23:10       ` Emily Shaffer
  0 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-08-06 23:10 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster, sunshine

On Wed, Jul 24, 2019 at 04:32:53PM -0700, Jonathan Tan wrote:
> Thanks - I think this is a useful guide to what can be a complicated
> topic. It looks good overall; I just have some minor comments below.
> 
> > diff --git a/Documentation/Makefile b/Documentation/Makefile
> > index 76f2ecfc1b..91e5da67c4 100644
> > --- a/Documentation/Makefile
> > +++ b/Documentation/Makefile
> > @@ -78,6 +78,7 @@ SP_ARTICLES += $(API_DOCS)
> >  
> >  TECH_DOCS += MyFirstContribution
> >  TECH_DOCS += SubmittingPatches
> > +TECH_DOCS += MyFirstRevWalk
> 
> Any reason why this is not in alphabetical order?

No reason, will fix.

> 
> > +Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:
> > +
> > +----
> > +extern int cmd_walken(int argc, const char **argv, const char *prefix);
> > +----
> 
> builtin.h no longer has "extern", so we can delete it.

Done.

> 
> > +Add it to the `Makefile` near the line for `builtin\worktree.o`:
> > +
> > +----
> > +BUILTIN_OBJS += builtin/walken.o
> > +----
> 
> In the first line, change the backslash to a slash. (The line in
> Makefile for "builtin/worktree.o" uses a forward slash as expected.)

Done, not sure how this got in there. Thanks!

> 
> > +NOTE: For a more exhaustive overview of the new command process, take a look at
> > +`Documentation/MyFirstContribution.txt`.
> > +
> > +NOTE: A reference implementation can be found at TODO LINK.
> 
> I think you have a reference implementation at
> https://github.com/nasamuffin/git/tree/revwalk?

Yep, although it's not very fresh. I was hoping to wait for a way for us
to check in the reference implementation to Git source, although that
can wait and the off-project branch is maybe OK for now.

> 
> > +We'll start by enabling all types of objects in the `struct rev_info`. Unless
> > +you cloned or fetched your repository earlier with a filter,
> > +`exclude_promisor_objects` is unlikely to make a difference, but we'll turn it
> > +on just to make sure our lives are simple. We'll also turn on
> > +`tree_blobs_in_commit_order`, which means that we will walk a commit's tree and
> > +everything it points to immediately after we find each commit, as opposed to
> > +waiting for the end and walking through all trees after the commit history has
> > +been discovered. With the appropriate settings configured, we are ready to call
> > +`prepare_revision_walk()`.
> > +
> > +----
> > +static void walken_object_walk(struct rev_info *rev)
> > +{
> > +	rev->tree_objects = 1;
> > +	rev->blob_objects = 1;
> > +	rev->tag_objects = 1;
> > +	rev->tree_blobs_in_commit_order = 1;
> > +	rev->exclude_promisor_objects = 1;
> 
> Optional: I think we should not bother with exclude_promisor_objects. If
> the user really cloned with a filter, then every object would be a
> promisor object and the revision walk should output nothing, which is
> very confusing.

Sure, that makes sense. Ok, I removed it.


Thanks for looking - and for the patience with the latency on the reply.

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC PATCH v3 00/13] example implementation of revwalk tutorial
  2019-07-25  9:25       ` [RFC PATCH v3 00/13] example implementation of revwalk tutorial Johannes Schindelin
@ 2019-08-06 23:13         ` Emily Shaffer
  2019-08-08 19:19           ` Johannes Schindelin
  0 siblings, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-08-06 23:13 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

On Thu, Jul 25, 2019 at 11:25:02AM +0200, Johannes Schindelin wrote:
> Hi Emily,
> 
> On Mon, 1 Jul 2019, Emily Shaffer wrote:
> 
> > Since v2, mostly reworded comments, plus fixed the issues mentioned in
> > the tutorial itself. Thanks Eric for the review.
> >
> > Emily Shaffer (13):
> >   walken: add infrastructure for revwalk demo
> >   walken: add usage to enable -h
> >   walken: add placeholder to initialize defaults
> >   walken: add handler to git_config
> >   walken: configure rev_info and prepare for walk
> >   walken: perform our basic revision walk
> >   walken: filter for authors from gmail address
> >   walken: demonstrate various topographical sorts
> >   walken: demonstrate reversing a revision walk list
> >   walken: add unfiltered object walk from HEAD
> >   walken: add filtered object walk
> >   walken: count omitted objects
> >   walken: reverse the object walk order
> >
> >  Makefile         |   1 +
> >  builtin.h        |   1 +
> >  builtin/walken.c | 297 +++++++++++++++++++++++++++++++++++++++++++++++
> 
> Since this is not really intended to be an end user-facing command, I
> think it should not become a built-in, to be carried into every Git
> user's setup.

It's not intended to be checked into Git source as-is.

> 
> Instead, I would recommend to implement this as a test helper.

I'm not sure I follow how you imagine this looking, but the drawback I
see of implementing this in a different way than you would typically do
when writing a real feature for the project is that it becomes less
useful as a reference for new contributors.

> 
> This would have the following advantages:
> 
> - it won't clutter the end user installations,
> 
> - it will still be compile-tested with every build (guaranteeing that
>   the tutorial won't become stale over time as so many other tutorials),

This part of your suggestion appeals to me; so I'm really curious how
you would do it. Do you have something else written in the way you're
suggesting in mind?

> 
> - it really opens the door very wide to follow up with another tutorial
>   to guide new contributors to write stellar regression tests.
> 
> Thanks,
> Dscho
> 
> >  git.c            |   1 +
> >  4 files changed, 300 insertions(+)
> >  create mode 100644 builtin/walken.c
> >
> > --
> > 2.22.0.410.gd8fdbe21b5-goog
> >
> >
> >

^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH v4] documentation: add tutorial for revision walking
  2019-07-01 20:19   ` [PATCH v3] documentation: add tutorial for revision walking Emily Shaffer
                       ` (2 preceding siblings ...)
  2019-07-24 23:32     ` Jonathan Tan
@ 2019-08-06 23:19     ` Emily Shaffer
  2019-08-07 19:19       ` Junio C Hamano
  2019-10-10 15:19       ` [PATCH v5] documentation: add tutorial for object walking Emily Shaffer
  3 siblings, 2 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-08-06 23:19 UTC (permalink / raw)
  To: git
  Cc: Emily Shaffer, Junio C Hamano, Eric Sunshine, Jonathan Tan,
	Josh Steadmon

Existing documentation on revision walks seems to be primarily intended
as a reference for those already familiar with the procedure. This
tutorial attempts to give an entry-level guide to a couple of bare-bones
revision walks so that new Git contributors can learn the concepts
without having to wade through options parsing or special casing.

The target audience is a Git contributor who is just getting started
with the concept of revision walking. The goal is to prepare this
contributor to be able to understand and modify existing commands which
perform revision walks more easily, although it will also prepare
contributors to create new commands which perform walks.

The tutorial covers a basic overview of the structs involved during
revision walk, setting up a basic commit walk, setting up a basic
all-object walk, and adding some configuration changes to both walk
types. It intentionally does not cover how to create new commands or
search for options from the command line or gitconfigs.

There is an associated patchset at
https://github.com/nasamuffin/git/tree/revwalk that contains a reference
implementation of the code generated by this tutorial.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
---

Since v3, only a couple of minor changes from Jonathan Tan - thanks.

I'm dropping the updates for the RFC set, since they're incremental from
now. Next time you all see them they will be in a form which we would
hope to maintain over a long period of time, checked into source -
likely in the form of an mbox or a directory of .patch files.

I think the tutorial itself is pretty much ready; the example source is
more like "supporting material" - so I'd like to try to get this back on
people's minds and hopefully checked in.

Thanks!
 - Emily

 Documentation/Makefile           |   1 +
 Documentation/MyFirstRevWalk.txt | 904 +++++++++++++++++++++++++++++++
 2 files changed, 905 insertions(+)
 create mode 100644 Documentation/MyFirstRevWalk.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 76f2ecfc1b..7d136b480c 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -77,6 +77,7 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technica
 SP_ARTICLES += $(API_DOCS)
 
 TECH_DOCS += MyFirstContribution
+TECH_DOCS += MyFirstRevWalk
 TECH_DOCS += SubmittingPatches
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
diff --git a/Documentation/MyFirstRevWalk.txt b/Documentation/MyFirstRevWalk.txt
new file mode 100644
index 0000000000..5aa249df5c
--- /dev/null
+++ b/Documentation/MyFirstRevWalk.txt
@@ -0,0 +1,904 @@
+My First Revision Walk
+======================
+
+== What's a Revision Walk?
+
+The revision walk is a key concept in Git - this is the process that underpins
+operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, the
+list of objects is found by walking parent relationships between objects. The
+revision walk can also be used to determine whether or not a given object is
+reachable from the current HEAD pointer.
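
To make the idea of walking parent relationships concrete, here is a minimal, self-contained sketch. It does not use Git's API: `struct toy_commit`, `toy_walk()`, and the `seen` flag are hypothetical stand-ins invented for illustration, and the visiting order is a plain stack order, not any ordering Git itself applies.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PARENTS 2
#define TOY_MAX_PENDING 32

/* Hypothetical stand-in for a commit; this is not Git's `struct commit`. */
struct toy_commit {
	const char *subject;
	struct toy_commit *parents[TOY_MAX_PARENTS];
	int seen; /* set once the walk has visited this commit */
};

/*
 * Visit every commit reachable from `tip` exactly once by following
 * parent pointers. Stores visited commits in `out` (at most `cap` of
 * them) and returns how many commits were reached in total.
 */
static size_t toy_walk(struct toy_commit *tip, struct toy_commit **out,
		       size_t cap)
{
	struct toy_commit *pending[TOY_MAX_PENDING];
	size_t nr_pending = 0, nr_out = 0;

	pending[nr_pending++] = tip;
	while (nr_pending) {
		struct toy_commit *c = pending[--nr_pending];
		int i;

		if (c->seen)
			continue; /* already reached through another child */
		c->seen = 1;
		if (nr_out < cap)
			out[nr_out] = c;
		nr_out++;

		/* Queue the parents so the walk continues back in history. */
		for (i = 0; i < TOY_MAX_PARENTS; i++) {
			if (!c->parents[i] || c->parents[i]->seen)
				continue;
			assert(nr_pending < TOY_MAX_PENDING);
			pending[nr_pending++] = c->parents[i];
		}
	}
	return nr_out;
}
```

After the walk, any commit whose `seen` flag is still unset was not reachable from the starting point, which is the toy equivalent of the reachability check mentioned above.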
+
+=== Related Reading
+
+- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
+  the revision walker in its various incarnations.
+- `Documentation/technical/api-revision-walking.txt`
+- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
+  gives a good overview of the types of objects in Git and what your revision
+  walk is really describing.
+
+== Setting Up
+
+Create a new branch from `master`.
+
+----
+git checkout -b revwalk origin/master
+----
+
+We'll put our fiddling into a new command. For fun, let's name it `git walken`.
+Open up a new file `builtin/walken.c` and set up the command handler:
+
+----
+/*
+ * "git walken"
+ *
+ * Part of the "My First Revision Walk" tutorial.
+ */
+
+#include "builtin.h"
+
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	trace_printf(_("cmd_walken incoming...\n"));
+	return 0;
+}
+----
+
+NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or
+off at runtime. For the purposes of this tutorial, we will write `walken` as
+though it is intended for use as a "plumbing" command: that is, a command which
+is used primarily in scripts, rather than interactively by humans (a "porcelain"
+command). So we will send our debug output to `trace_printf()` instead. When
+running, enable trace output by setting the environment variable `GIT_TRACE`.
+
+Add usage text and `-h` handling, like all subcommands should consistently do
+(our test suite will notice and complain if you fail to do so).
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	const char * const walken_usage[] = {
+		N_("git walken"),
+		NULL,
+	};
+	struct option options[] = {
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
+
+	...
+}
+----
+
+Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix);
+----
+
+Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
+maintaining alphabetical ordering:
+
+----
+{ "walken", cmd_walken, RUN_SETUP },
+----
+
+Add it to the `Makefile` near the line for `builtin/worktree.o`:
+
+----
+BUILTIN_OBJS += builtin/walken.o
+----
+
+Build and test out your command, without forgetting to ensure the `DEVELOPER`
+flag is set, and with `GIT_TRACE` enabled so the debug output can be seen:
+
+----
+$ echo DEVELOPER=1 >>config.mak
+$ make
+$ GIT_TRACE=1 ./bin-wrappers/git walken
+----
+
+NOTE: For a more exhaustive overview of the new command process, take a look at
+`Documentation/MyFirstContribution.txt`.
+
+NOTE: A reference implementation can be found at
+https://github.com/nasamuffin/git/tree/revwalk.
+
+=== `struct rev_cmdline_info`
+
+The definition of `struct rev_cmdline_info` can be found in `revision.h`.
+
+This struct is contained within the `rev_info` struct and is used to reflect
+parameters provided by the user over the CLI.
+
+`nr` represents the number of `rev_cmdline_entry` present in the array.
+
+`alloc` is used by the `ALLOC_GROW` macro. Check
+`Documentation/technical/api-allocation-growing.txt` - this variable is used to
+track the allocated size of the list.
+
+Per entry, we find:
+
+`item` is the object provided upon which to base the revision walk. Items in Git
+can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)
+
+`name` is the object ID (OID) of the object - a hex string you may be familiar
+with from using Git to organize your source in the past. Check the tutorial
+mentioned above towards the top for a discussion of where the OID can come
+from.
+
+`whence` indicates some information about what to do with the parents of the
+specified object. We'll explore this flag more later on; take a look at
+`Documentation/revisions.txt` to get an idea of what could set the `whence`
+value.
+
+`flags` are used to hint the beginning of the revision walk and are the first
+block under the `#include`s in `revision.h`. The most likely ones to be set in
+the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
+can be used during the walk, as well.
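
The `nr`/`alloc` pair described above follows a common growable-array pattern. The sketch below imitates it with plain `realloc()`. `struct toy_cmdline_info`, `toy_add_entry()`, and the doubling growth policy are assumptions made for illustration; they are not taken from `revision.h` or from the `ALLOC_GROW` macro.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical entry: a name as typed by the user plus some flag bits. */
struct toy_entry {
	const char *name;
	unsigned flags;
};

/* Hypothetical list: `nr` entries in use out of `alloc` allocated. */
struct toy_cmdline_info {
	unsigned int nr;
	unsigned int alloc;
	struct toy_entry *rev;
};

/* Append one entry, growing the array first if it is full. */
static void toy_add_entry(struct toy_cmdline_info *info, const char *name,
			  unsigned flags)
{
	if (info->nr + 1 > info->alloc) {
		/* Assumed policy: start at 8 slots, then double. */
		unsigned int new_alloc = info->alloc ? info->alloc * 2 : 8;
		struct toy_entry *grown =
			realloc(info->rev, new_alloc * sizeof(*grown));

		assert(grown); /* a real program would report the failure */
		info->rev = grown;
		info->alloc = new_alloc;
	}
	info->rev[info->nr].name = name;
	info->rev[info->nr].flags = flags;
	info->nr++;
}
```

Keeping `alloc` separate from `nr` means most appends only bump `nr`; the array is reallocated only when the two meet.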
+
+=== `struct rev_info`
+
+This one is quite a bit longer, and many fields are only used during the walk
+by `revision.c` - not configuration options. Most of the configurable flags in
+`struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a
+good idea to take some time and read through that document.
+
+== Basic Commit Walk
+
+First, let's see if we can replicate the output of `git log --oneline`. We'll
+refer back to the implementation frequently to discover norms when performing
+a revision walk of our own.
+
+To do so, we'll first find all the commits, in order, which preceded the current
+commit. We'll extract the name and subject of the commit from each.
+
+Ideally, we will also be able to find out which ones are currently at the tip of
+various branches.
+
+=== Setting Up
+
+Preparing for your revision walk has some distinct stages.
+
+1. Perform default setup for this mode, and others which may be invoked.
+2. Check configuration files for relevant settings.
+3. Set up the `rev_info` struct.
+4. Tweak the initialized `rev_info` to suit the current walk.
+5. Prepare the `rev_info` for the walk.
+6. Iterate over the objects, processing each one.
+
+==== Default Setups
+
+Before examining configuration files which may modify command behavior, set up
+default state for switches or options your command may have. If your command
+utilizes other Git components, ask them to set up their default states as well.
+For instance, `git log` takes advantage of `grep` and `diff` functionality, so
+its `init_log_defaults()` sets its own state (`decoration_style`) and asks
+`grep` and `diff` to initialize themselves by calling each of their
+initialization functions.
+
+For our purposes, within `git walken`, for the first example we don't intend to
+use any other components within Git, and we don't have any configuration to do.
+However, we may want to add some later, so for now, we can add an empty
+placeholder. Create a new function in `builtin/walken.c`:
+
+----
+static void init_walken_defaults(void)
+{
+	/*
+	 * We don't actually need the same components `git log` does; leave this
+	 * empty for now.
+	 */
+}
+----
+
+Make sure to add a line invoking it inside of `cmd_walken()`.
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	init_walken_defaults();
+}
+----
+
+==== Configuring From `.gitconfig`
+
+Next, we should have a look at any relevant configuration settings (i.e.,
+settings readable and settable from `git config`). This is done by providing a
+callback to `git_config()`; within that callback, you can also invoke methods
+from other components you may need that need to intercept these options. Your
+callback will be invoked once for each configuration value which Git knows about
+(global, local, worktree, etc.).
+
+Similarly to the default values, we don't have anything to do here yet
+ourselves; however, we should call `git_default_config()` if we aren't calling
+any other existing config callbacks.
+
+Add a new function to `builtin/walken.c`:
+
+----
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+	/*
+	 * For now, we don't have any custom configuration, so fall back to
+	 * the default config.
+	 */
+	return git_default_config(var, value, cb);
+}
+----
+
+Make sure to invoke `git_config()` with it in your `cmd_walken()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	...
+
+	git_config(git_walken_config, NULL);
+
+	...
+}
+----
+
+// TODO: Checking CLI options
+
+==== Setting Up `rev_info`
+
+Now that we've gathered external configuration and options, it's time to
+initialize the `rev_info` object which we will use to perform the walk. This is
+typically done by calling `repo_init_revisions()` with the repository you intend
+to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
+struct.
+
+Add the `struct rev_info` and the `repo_init_revisions()` call:
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	/* This can go wherever you like in your declarations. */
+	struct rev_info rev;
+	...
+
+	/* This should go after the git_config() call. */
+	repo_init_revisions(the_repository, &rev, prefix);
+
+	...
+}
+----
+
+==== Tweaking `rev_info` For the Walk
+
+We're getting close, but we're still not quite ready to go. Now that `rev` is
+initialized, we can modify it to fit our needs. This is usually done within a
+helper for clarity, so let's add one:
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	/*
+	 * We want to mimic the appearance of `git log --oneline`, so let's
+	 * force oneline format.
+	 */
+	get_commit_format("oneline", rev);
+
+	/* Start our revision walk at HEAD. */
+	add_head_to_pending(rev);
+}
+----
+
+[NOTE]
+====
+Instead of using the shorthand `add_head_to_pending()`, you could do
+something like this:
+----
+	struct setup_revision_opt opt;
+
+	memset(&opt, 0, sizeof(opt));
+	opt.def = "HEAD";
+	opt.revarg_opt = REVARG_COMMITTISH;
+	setup_revisions(argc, argv, rev, &opt);
+----
+Using a `setup_revision_opt` gives you finer control over your walk's starting
+point.
+====
+
+Then let's invoke `final_rev_info_setup()` after the call to
+`repo_init_revisions()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	...
+
+	final_rev_info_setup(&rev);
+
+	...
+}
+----
+
+Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
+now, this is all we need.
+
+==== Preparing `rev_info` For the Walk
+
+Now that `rev` is all initialized and configured, we've got one more setup step
+before we get rolling. We can do this in a helper, which will both prepare the
+`rev_info` for the walk, and perform the walk itself. Let's start the helper
+with the call to `prepare_revision_walk()`, which can return an error without
+dying on its own:
+
+----
+static void walken_commit_walk(struct rev_info *rev)
+{
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+}
+----
+
+NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
+`stderr` it's likely to be seen by a human, so we will localize it.
+
+==== Performing the Walk!
+
+Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
+can also be used as an iterator; we move to the next item in the walk by using
+`get_revision()` repeatedly. Add the listed variable declarations at the top and
+the walk loop below the `prepare_revision_walk()` call within your
+`walken_commit_walk()`:
+
+----
+static void walken_commit_walk(struct rev_info *rev)
+{
+	struct commit *commit;
+	struct strbuf prettybuf = STRBUF_INIT;
+
+	...
+
+	while ((commit = get_revision(rev))) {
+		strbuf_reset(&prettybuf);
+		pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
+		puts(prettybuf.buf);
+	}
+	strbuf_release(&prettybuf);
+}
+----
+
+NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
+command we expect to be machine-parsed, we're sending it directly to stdout.
+
+Give it a shot.
+
+----
+$ make
+$ ./bin-wrappers/git walken
+----
+
+You should see all of the subject lines of all the commits in
+your tree's history, in order, ending with the initial commit, "Initial revision
+of "git", the information manager from hell". Congratulations! You've written
+your first revision walk. You can play with printing some additional fields
+from each commit if you're curious; have a look at the functions available in
+`commit.h`.
+
+=== Adding a Filter
+
+Next, let's try to filter the commits we see based on their author. This is
+equivalent to running `git log --author=<pattern>`. We can add a filter by
+modifying `rev_info.grep_filter`, which is a `struct grep_opt`.
+
+First some setup. Add `init_grep_defaults()` to `init_walken_defaults()` and add
+`grep_config()` to `git_walken_config()`:
+
+----
+static void init_walken_defaults(void)
+{
+	init_grep_defaults(the_repository);
+}
+
+...
+
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+	grep_config(var, value, cb);
+	return git_default_config(var, value, cb);
+}
+----
+
+Next, we can modify the `grep_filter`. This is done with convenience functions
+found in `grep.h`. For fun, we're filtering to only commits from folks using a
+`gmail.com` email address - a not-very-precise guess at who may be working on
+Git as a hobby. Since we're checking the author, which is a specific line in the
+header, we'll use the `append_header_grep_pattern()` helper. We can use
+the `enum grep_header_field` to indicate which part of the commit header we want
+to search.
+
+In `final_rev_info_setup()`, add your filter line:
+
+----
+static void final_rev_info_setup(int argc, const char **argv,
+		const char *prefix, struct rev_info *rev)
+{
+	...
+
+	append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
+		"gmail");
+	compile_grep_patterns(&rev->grep_filter);
+
+	...
+}
+----
+
+`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
+it won't work unless we compile it with `compile_grep_patterns()`.
+
+NOTE: If you are using `setup_revisions()` (for example, if you are passing a
+`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
+to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.
+
+NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
+wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
+`enum grep_pat_token` for us.
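
As a rough illustration of what an author filter does during a walk, the following standalone sketch skips entries whose author line does not contain a pattern. It is only an analogy: `struct toy_commit` and `toy_next_by_author()` are hypothetical, and plain `strstr()` substring matching stands in for the compiled patterns that `grep.h` provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical commit record holding just the fields the filter needs. */
struct toy_commit {
	const char *author;
	const char *subject;
};

/*
 * Return the next commit at or after `*pos` whose author contains
 * `pattern`, advancing `*pos` past it. Returns NULL once the list is
 * exhausted, much like an iterator that hides non-matching commits.
 */
static const struct toy_commit *toy_next_by_author(
		const struct toy_commit *list, size_t nr, size_t *pos,
		const char *pattern)
{
	assert(list && pos && pattern);

	while (*pos < nr) {
		const struct toy_commit *c = &list[(*pos)++];

		if (strstr(c->author, pattern))
			return c;
	}
	return NULL;
}
```

The caller's loop looks the same with or without a filter: it keeps asking for the next commit until it receives NULL, and filtered-out commits are never seen.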
+
+=== Changing the Order
+
+There are a few ways that we can change the order of the commits during a
+revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
+sane orderings.
+
+`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
+before all of its children have been shown, and we avoid mixing commits which
+are in different lines of history. (`git help log`'s section on `--topo-order`
+has a very nice diagram to illustrate this.)
+
+Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
+`REV_SORT_BY_AUTHOR_DATE`. Add the following:
+
+----
+static void final_rev_info_setup(int argc, const char **argv,
+		const char *prefix, struct rev_info *rev)
+{
+	...
+
+	rev->topo_order = 1;
+	rev->sort_order = REV_SORT_BY_COMMIT_DATE;
+
+	...
+}
+----
+
+Let's output this into a file so we can easily diff it with the walk sorted by
+author date.
+
+----
+$ make
+$ ./bin-wrappers/git walken > commit-date.txt
+----
+
+Then, let's sort by author date and run it again.
+
+----
+static void final_rev_info_setup(int argc, const char **argv,
+		const char *prefix, struct rev_info *rev)
+{
+	...
+
+	rev->topo_order = 1;
+	rev->sort_order = REV_SORT_BY_AUTHOR_DATE;
+
+	...
+}
+----
+
+----
+$ make
+$ ./bin-wrappers/git walken > author-date.txt
+----
+
+Finally, compare the two. This is a little less helpful without object names or
+dates, but hopefully we get the idea.
+
+----
+$ diff -u commit-date.txt author-date.txt
+----
+
+This display indicates that commits can be reordered after they're written, for
+example with `git rebase`.
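
The difference between the two sort keys can be seen in a standalone sketch. The structure, enum, and comparator names here are hypothetical, the dates are made-up integers, and unlike `topo_order` the sketch ignores parent/child constraints entirely. It only shows how one commit can land in different positions depending on which date is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical commit record with both timestamps. */
struct toy_commit {
	const char *subject;
	long author_date;    /* when the change was originally written */
	long committer_date; /* when this commit object was created */
};

enum toy_sort_order { TOY_SORT_BY_COMMIT_DATE, TOY_SORT_BY_AUTHOR_DATE };

/* Newest committer date first. */
static int cmp_committer_date(const void *a, const void *b)
{
	const struct toy_commit *x = a, *y = b;

	return (x->committer_date < y->committer_date) -
	       (x->committer_date > y->committer_date);
}

/* Newest author date first. */
static int cmp_author_date(const void *a, const void *b)
{
	const struct toy_commit *x = a, *y = b;

	return (x->author_date < y->author_date) -
	       (x->author_date > y->author_date);
}

static void toy_sort(struct toy_commit *list, size_t nr,
		     enum toy_sort_order order)
{
	assert(list || !nr);
	if (!nr)
		return;
	qsort(list, nr, sizeof(*list),
	      order == TOY_SORT_BY_AUTHOR_DATE ? cmp_author_date
					       : cmp_committer_date);
}
```

A commit that was authored early but recreated later (as a rebase does) sorts near the top by committer date and near the bottom by author date.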
+
+Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
+Set that flag somewhere inside of `final_rev_info_setup()`:
+
+----
+static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
+		struct rev_info *rev)
+{
+	...
+
+	rev->reverse = 1;
+
+	...
+}
+----
+
+Run your walk again and note the difference in order. (If you remove the grep
+pattern, you should see the last commit this call gives you as your current
+HEAD.)
+
+== Basic Object Walk
+
+So far we've been walking only commits. But Git has more types of objects than
+that! Let's see if we can walk _all_ objects, and find out some information
+about each one.
+
+We can base our work on an example. `git pack-objects` prepares all kinds of
+objects for packing into a bitmap or packfile. The work we are interested in
+resides in `builtin/pack-objects.c:get_object_list()`; examination of that
+function shows that the all-object walk is being performed by
+`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
+functions reside in `list-objects.c`; examining the source shows that, despite
+the name, these functions traverse all kinds of objects. Let's have a look at
+the arguments to `traverse_commit_list_filtered()`, which are a superset of the
+arguments to the unfiltered version.
+
+- `struct list_objects_filter_options *filter_options`: This is a struct which
+  stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
+- `struct rev_info *revs`: This is the `rev_info` used for the walk.
+- `show_commit_fn show_commit`: A callback which will be used to handle each
+  individual commit object.
+- `show_object_fn show_object`: A callback which will be used to handle each
+  non-commit object (so each blob, tree, or tag).
+- `void *show_data`: A context buffer which is passed in turn to `show_commit`
+  and `show_object`.
+- `struct oidset *omitted`: A set of object IDs which the provided filter
+  caused to be omitted.
+
+It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
+instead of needing us to call it repeatedly ourselves. Cool! Let's add the
+callbacks first.
+
+For the sake of this tutorial, we'll simply keep track of how many of each kind
+of object we find. At file scope in `builtin/walken.c` add the following
+tracking variables:
+
+----
+static int commit_count;
+static int tag_count;
+static int blob_count;
+static int tree_count;
+----
+
+Commits are handled by a different callback than other objects; let's do that
+one first:
+
+----
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+	commit_count++;
+}
+----
+
+The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
+the `buf` argument is actually the context buffer that we can provide to the
+traversal calls - `show_data`, which we mentioned a moment ago.
+
+Since we have the `struct commit` object, we can look at all the same parts that
+we looked at in our earlier commit-only walk. For the sake of this tutorial,
+though, we'll just increment the commit counter and move on.
+
+The callback for non-commits is a little different, as we'll need to check
+which kind of object we're dealing with:
+
+----
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+	switch (obj->type) {
+	case OBJ_TREE:
+		tree_count++;
+		break;
+	case OBJ_BLOB:
+		blob_count++;
+		break;
+	case OBJ_TAG:
+		tag_count++;
+		break;
+	case OBJ_COMMIT:
+		BUG("unexpected commit object in walken_show_object");
+	default:
+		BUG("unexpected object type %s in walken_show_object",
+			type_name(obj->type));
+	}
+}
+----
+
+Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
+context pointer that `walken_show_commit()` receives: the `show_data` argument
+to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
+`str` contains the name of the object, which ends up being something like
+`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).
+
+To help assure us that we aren't double-counting commits, we'll include some
+complaining if a commit object is routed through our non-commit callback; we'll
+also complain if we see an invalid object type. Since those two cases should be
+unreachable, and would only change in the event of a semantic change to the Git
+codebase, we complain by using `BUG()` - which is a signal to a developer that
+the change they made caused unintended consequences, and the rest of the
+codebase needs to be updated to understand that change. `BUG()` is not intended
+to be seen by the public, so it is not localized.
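
The callback arrangement can be imitated in a few lines of standalone C. Everything in this sketch is hypothetical (`toy_traverse()`, `struct toy_counts`, the `TOY_*` types); it mirrors only the shape of the interface described above. As a variation on the file-scope counters used in this tutorial, it passes the counters through the context pointer.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

/* Hypothetical object: a type tag plus a path-like name. */
struct toy_object {
	enum toy_type type;
	const char *name;
};

struct toy_counts {
	int commits, trees, blobs, tags;
};

typedef void (*toy_show_commit_fn)(const struct toy_object *, void *);
typedef void (*toy_show_object_fn)(const struct toy_object *, const char *,
				   void *);

/* Route commits to one callback and every other object to the other. */
static void toy_traverse(const struct toy_object *objs, size_t nr,
			 toy_show_commit_fn show_commit,
			 toy_show_object_fn show_object, void *show_data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (objs[i].type == TOY_COMMIT)
			show_commit(&objs[i], show_data);
		else
			show_object(&objs[i], objs[i].name, show_data);
	}
}

static void count_commit(const struct toy_object *obj, void *data)
{
	struct toy_counts *counts = data;

	(void)obj; /* only the fact that a commit was seen matters here */
	counts->commits++;
}

static void count_object(const struct toy_object *obj, const char *name,
			 void *data)
{
	struct toy_counts *counts = data;

	(void)name;
	switch (obj->type) {
	case TOY_TREE:
		counts->trees++;
		break;
	case TOY_BLOB:
		counts->blobs++;
		break;
	case TOY_TAG:
		counts->tags++;
		break;
	default:
		/* Commits must never reach the non-commit callback. */
		assert(!"commit routed through non-commit callback");
	}
}
```

Passing a `struct toy_counts` as the context keeps the callbacks free of global state, at the cost of a cast inside each callback; either approach fits the same callback signature.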
+
+Our main object walk implementation is substantially different from our commit
+walk implementation, so let's make a new function to perform the object walk. We
+can perform setup which is applicable to all objects here, too, to keep it
+separate from setup which is applicable to commit-only walks.
+
+We'll start by enabling all types of objects in the `struct rev_info`.  We'll
+also turn on `tree_blobs_in_commit_order`, which means that we will walk a
+commit's tree and everything it points to immediately after we find each commit,
+as opposed to waiting for the end and walking through all trees after the commit
+history has been discovered. With the appropriate settings configured, we are
+ready to call `prepare_revision_walk()`.
+
+----
+static void walken_object_walk(struct rev_info *rev)
+{
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+
+	commit_count = 0;
+	tag_count = 0;
+	blob_count = 0;
+	tree_count = 0;
+----
+
+Let's start by calling just the unfiltered walk and reporting our counts.
+Complete your implementation of `walken_object_walk()`:
+
+----
+	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+
+	printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
+		blob_count, tag_count, tree_count);
+}
+----
+
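To picture what `tree_blobs_in_commit_order` changes, here is a minimal standalone sketch that does not use Git's API. `toy_emit_order()` and `toy_append()` are hypothetical names invented for this illustration, and each commit is assumed to have exactly one root tree and nothing else. The sketch models only the emission order described above: trees interleaved with their commits, or deferred until every commit has been shown.

```c
#include <stdio.h>
#include <string.h>

/* Append "<prefix><id> " to out without writing past outlen bytes. */
static void toy_append(char *out, size_t outlen, const char *prefix, int id)
{
	size_t used = strlen(out);

	if (used < outlen)
		snprintf(out + used, outlen - used, "%s%d ", prefix, id);
}

/*
 * Hypothetical model, not Git code: record the order in which n commits
 * ("C0", "C1", ...) and their root trees ("T0", "T1", ...) are emitted.
 */
static void toy_emit_order(int n, int in_commit_order, char *out, size_t outlen)
{
	int i;

	if (!outlen)
		return;
	out[0] = '\0';

	for (i = 0; i < n; i++) {
		toy_append(out, outlen, "C", i);
		/* Interleaved: walk the commit's tree right away. */
		if (in_commit_order)
			toy_append(out, outlen, "T", i);
	}

	/* Deferred: walk all trees once every commit has been shown. */
	if (!in_commit_order)
		for (i = 0; i < n; i++)
			toy_append(out, outlen, "T", i);
}
```

With two commits, the interleaved order is C0, T0, C1, T1, while the deferred order is C0, C1, T0, T1.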
+NOTE: This output is intended to be machine-parsed. Therefore, we are not
+sending it to `trace_printf()`, and we are not localizing it - we need scripts
+to be able to count on the formatting to be exactly the way it is shown here.
+If we were intending this output to be read by humans, we would need to localize
+it with `_()`.
+
+Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
+command line options is out of scope for this tutorial, so we'll just hardcode
+a branch we can change at compile time. Where you call `final_rev_info_setup()`
+and `walken_commit_walk()`, instead branch like so:
+
+----
+	if (1) {
+		add_head_to_pending(&rev);
+		walken_object_walk(&rev);
+	} else {
+		final_rev_info_setup(argc, argv, prefix, &rev);
+		walken_commit_walk(&rev);
+	}
+----
+
+NOTE: For simplicity, we've avoided all the filters and sorts we applied in
+`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
+want, you can certainly use the filters we added before by moving
+`final_rev_info_setup()` out of the conditional and removing the call to
+`add_head_to_pending()`.
+
+Now we can try to run our command! It should take noticeably longer than the
+commit walk, but an examination of the output will give you an idea why. Your
+output should look similar to this example, but with different counts:
+
+----
+commits 55733
+blobs 100274
+tags 0
+trees 104210
+----
+
+This makes sense. We have more trees than commits because the Git project has
+lots of subdirectories which can change, plus at least one tree per commit. We
+have no tags because we started on a commit (`HEAD`) and while tags can point to
+commits, commits can't point to tags.
+
+NOTE: You will have different counts when you run this yourself! The number of
+objects grows along with the Git project.
+
+=== Adding a Filter
+
+There are a handful of filters that we can apply to the object walk; they are
+laid out in `Documentation/rev-list-options.txt`. These filters are typically
+useful for
+operations such as creating packfiles or performing a partial clone. They are
+defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
+will use the "tree:1" filter, which causes the walk to omit all trees and blobs
+which are not directly referenced by commits reachable from the commit in
+`pending` when the walk begins. (`pending` is the list of objects which need to
+be traversed during a walk; you can imagine a breadth-first tree traversal to
+help understand. In our case, that means we omit trees and blobs not directly
+referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
+`HEAD` in the `pending` list.)
+
+First, we'll need to `#include "list-objects-filter-options.h"` and set up the
+`struct list_objects_filter_options` at the top of the function.
+
+----
+static void walken_object_walk(struct rev_info *rev)
+{
+	struct list_objects_filter_options filter_options = { 0 };
+
+	...
+----
+
+For now, we are not going to track the omitted objects, so we'll replace those
+parameters with `NULL`. For the sake of simplicity, we'll add a simple
+build-time branch to use our filter or not. Replace the line calling
+`traverse_commit_list()` with the following, which will remind us which kind of
+walk we've just performed:
+
+----
+	if (0) {
+		/* Unfiltered: */
+		trace_printf(_("Unfiltered object walk.\n"));
+		traverse_commit_list(rev, walken_show_commit,
+				walken_show_object, NULL);
+	} else {
+		trace_printf(
+			_("Filtered object walk with filterspec 'tree:1'.\n"));
+		parse_list_objects_filter(&filter_options, "tree:1");
+
+		traverse_commit_list_filtered(&filter_options, rev,
+			walken_show_commit, walken_show_object, NULL, NULL);
+	}
+----
+
+`struct list_objects_filter_options` is usually built directly from a command
+line argument, so the module provides an easy way to build one from a string.
+Even though we aren't taking user input right now, we can still build one with
+a hardcoded string using `parse_list_objects_filter()`.
+
+With the filter spec "tree:1", we are expecting to see _only_ the root tree for
+each commit; therefore, the tree object count should be less than or equal to
+the number of commits. (For an example of why that's true: a commit created by
+`git revert` that undoes its parent points to the same tree object as its
+grandparent.)
+
+=== Counting Omitted Objects
+
+We also have the capability to enumerate all objects which were omitted by a
+filter, like with
+`git rev-list --objects --filter=<spec> --filter-print-omitted`. Asking
+`traverse_commit_list_filtered()` to populate the `omitted` list means that our
+revision walk does not perform any better than an unfiltered revision walk; all
+reachable objects are walked in order to populate the list.
+
+First, add the `struct oidset` and related items we will use to iterate it:
+
+----
+static void walken_object_walk(
+	...
+
+	struct oidset omitted;
+	struct oidset_iter oit;
+	struct object_id *oid = NULL;
+	int omitted_count = 0;
+	oidset_init(&omitted, 0);
+
+	...
+----
+
+Modify the call to `traverse_commit_list_filtered()` to include your `omitted`
+object:
+
+----
+	...
+
+		traverse_commit_list_filtered(&filter_options, rev,
+			walken_show_commit, walken_show_object, NULL, &omitted);
+
+	...
+----
+
+Then, after your traversal, the `oidset` traversal is pretty straightforward.
+Count all the objects within and modify the print statement:
+
+----
+	/* Count the omitted objects. */
+	oidset_iter_init(&omitted, &oit);
+
+	while ((oid = oidset_iter_next(&oit)))
+		omitted_count++;
+
+	printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
+		commit_count, blob_count, tag_count, tree_count, omitted_count);
+----
+
+By running your walk with and without the filter, you should find that the total
+object count in each case is identical. You can also time each invocation of
+the `walken` subcommand, with and without `omitted` being passed in, to confirm
+to yourself the runtime impact of tracking all omitted objects.
+
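The bookkeeping behind "shown plus omitted equals the unfiltered total" can be sketched on a toy object graph, independent of Git's real data structures. Everything here (`struct toy_object`, `struct toy_counts`, `toy_walk()`) is hypothetical and exists only for illustration. The sketch assumes a depth limit in the spirit of "tree:<depth>": the root tree sits at depth 0, and anything at a depth greater than or equal to the limit is counted as omitted rather than shown. It also takes one shortcut: an object is classified at the depth where it is first seen.

```c
#include <stddef.h>

/* Hypothetical toy object model; Git's real structs are much richer. */
enum toy_kind { TOY_TREE, TOY_BLOB };

struct toy_object {
	enum toy_kind kind;
	int seen;                     /* set once the walk has visited it */
	size_t nr;                    /* number of children (trees only) */
	struct toy_object **children; /* NULL for blobs */
};

struct toy_counts {
	int trees;
	int blobs;
	int omitted;
};

/*
 * Visit obj at the given depth below a root tree (root = depth 0).
 * Objects at depth >= max_depth are tallied as omitted. The walk still
 * descends into them, which is why tracking omissions saves no work.
 */
static void toy_walk(struct toy_object *obj, unsigned depth,
		     unsigned max_depth, struct toy_counts *counts)
{
	size_t i;

	if (obj->seen)
		return;
	obj->seen = 1;

	if (depth >= max_depth)
		counts->omitted++;
	else if (obj->kind == TOY_TREE)
		counts->trees++;
	else
		counts->blobs++;

	for (i = 0; i < obj->nr; i++)
		toy_walk(obj->children[i], depth + 1, max_depth, counts);
}
```

For a root tree holding one blob and one subtree with a single blob, a limit of 1 yields 1 tree, 0 blobs, and 3 omitted, while a large limit yields 2 trees, 2 blobs, and 0 omitted: four objects either way.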
+=== Changing the Order
+
+Finally, let's demonstrate that you can also reorder walks of all objects, not
+just walks of commits. First, we'll make our handlers chattier - modify
+`walken_show_commit()` and `walken_show_object()` to print the object as they
+go:
+
+----
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+	trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
+	commit_count++;
+}
+
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+	trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));
+
+	...
+}
+----
+
+NOTE: Since we will be examining this output directly as humans, we'll use
+`trace_printf()` here. Additionally, since this change introduces a significant
+number of printed lines, using `trace_printf()` will allow us to easily silence
+those lines without having to recompile.
+
+(Leave the counter increment logic in place.)
+
+With only that change, run again (but save yourself some scrollback):
+
+----
+$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10
+----
+
+Take a look at the top commit with `git show` and the object ID you printed; it
+should be the same as the output of `git show HEAD`.
+
+Next, let's change a setting on our `struct rev_info` within
+`walken_object_walk()`. Find where you're changing the other settings on `rev`,
+such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
+`reverse` setting at the bottom:
+
+----
+	...
+
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+	rev->reverse = 1;
+
+	...
+----
+
+Now, run again, but this time, let's grab the last handful of objects instead
+of the first handful:
+
+----
+$ make
+$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10
+----
+
+The last commit object given should have the same OID as the one we saw at the
+top before, and running `git show <oid>` with that OID should give you again
+the same results as `git show HEAD`. Furthermore, if you run and examine the
+first ten lines again (with `head` instead of `tail` like we did before applying
+the `reverse` setting), you should see that now the first commit printed is the
+initial commit, `e83c5163`.
+
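One way to picture why the reversed walk prints the initial commit first: a walk from `HEAD` discovers the oldest commit last, so the whole walk has to be drained before anything can be shown in reverse. The sketch below is only an analogy using a hypothetical singly linked list (`struct toy_list`, `toy_prepend()`, `toy_reverse_walk()`, and `toy_free()` are invented names); it makes no claim about how Git's own code is organized. Each ID is prepended as it arrives newest-first, which leaves the list oldest-first.

```c
#include <stdlib.h>

/* Hypothetical list node; stands in for a list of walked commits. */
struct toy_list {
	int id;
	struct toy_list *next;
};

/* Put a new node in front of head and return the new head. */
static struct toy_list *toy_prepend(int id, struct toy_list *head)
{
	struct toy_list *node = malloc(sizeof(*node));

	if (!node)
		abort();
	node->id = id;
	node->next = head;
	return node;
}

/*
 * ids arrive in walk order (newest first). Prepending each one as it
 * arrives produces a list in the opposite order (oldest first).
 */
static struct toy_list *toy_reverse_walk(const int *ids, size_t n)
{
	struct toy_list *reversed = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		reversed = toy_prepend(ids[i], reversed);
	return reversed;
}

/* Release every node of the list. */
static void toy_free(struct toy_list *head)
{
	while (head) {
		struct toy_list *next = head->next;

		free(head);
		head = next;
	}
}
```

Feeding in the IDs 3, 2, 1 (newest first) returns a list that reads 1, 2, 3.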
+== Wrapping Up
+
+Let's review. In this tutorial, we:
+
+- Built a commit walk from the ground up
+- Enabled a grep filter for that commit walk
+- Changed the sort order of that filtered commit walk
+- Built an object walk (tags, commits, trees, and blobs) from the ground up
+- Learned how to add a filter-spec to an object walk
+- Changed the display order of the filtered object walk
-- 
2.22.0.770.g0f2c4a37fd-goog


^ permalink raw reply related	[flat|nested] 102+ messages in thread

* Re: [PATCH v4] documentation: add tutorial for revision walking
  2019-08-06 23:19     ` [PATCH v4] " Emily Shaffer
@ 2019-08-07 19:19       ` Junio C Hamano
  2019-08-14 18:33         ` Emily Shaffer
  2019-10-10 15:19       ` [PATCH v5] documentation: add tutorial for object walking Emily Shaffer
  1 sibling, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2019-08-07 19:19 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git, Eric Sunshine, Jonathan Tan, Josh Steadmon

Emily Shaffer <emilyshaffer@google.com> writes:

> Since v3, only a couple of minor changes from Jonathan Tan - thanks.
>
> I'm dropping the updates for the RFC set, since they're incremental from
> now. Next time you all see them they will be in a form which we would
> hope to maintain over a long period of time, checked into source -
> likely in the form of an mbox or dir of .patch file.

Sure.

> I think the tutorial itself is pretty much ready...

A few comments after skimming this round; none of them may be a show
stopper, but others may have different opinions.

 - There is still a leftover "TODO: checking CLI options"; is that
   something we postpone teaching?

 - This is an offtopic tangent, but "my first contribution" being an
   addition of an entire command probably mistakenly raised the bar
   to contributors a bit too much.  A typical first contribution is
   a typofix, fix to a small (e.g. off-by-one) bug, etc.

 - For a revision walk tutorial, not seeing any mention of pathspec
   filtering and associated history simplification is somewhat
   unsatisfying.  On the other hand, I expect that enumeration of
   objects contained within commits is (hence various --filter
   options are) totally uninteresting for end users who run the
   command interactively and view the output of the command on
   screen.

 - Enumeration of objects is useful at least in three places in Git:
   (1) enumerate objects to be packed, with some filtering based on
   various criteria; (2) enumerate objects that are reachable from
   anchor points like refs, index, reflog, etc., to discover what
   are not reachable and can be discarded; (3) enumerate objects
   that still matter (i.e. the opposite of (2)) so that they can be
   fed to validation mechanisms (e.g. "cat-file --batch-check").  If
   this were titled "My first object enumeration", the reaction led
   to the latter half of the previous point may not have occurred
   (pathspec filtering would still be relevant, but not as
   much---for packing to create a narrow clone, you do not want to
   use pathspec with history simplification, but you would want to
   use something more like "root and intermediate trees that are
   necessary to cover these paths" filter in the
   list-objects-filter layer).

And from the point of view of the last item, I would think the new
document is covering a need that is different from what we
traditionally would call "revision walk", which is more about "git
log", not the upstream of "git pack-objects", which this new
document is more geared towards.

Unless "git walken" is an exercise of how to write code that does
random thing, use of --grep filter however may be out of place,
though.  I do not offhand think of a use case where --grep would be
useful in the revision walk/object enumeration that is placed
upstream of "pack-objects".

Thanks.


^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [RFC PATCH v3 00/13] example implementation of revwalk tutorial
  2019-08-06 23:13         ` Emily Shaffer
@ 2019-08-08 19:19           ` Johannes Schindelin
  0 siblings, 0 replies; 102+ messages in thread
From: Johannes Schindelin @ 2019-08-08 19:19 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git

Hi Emily,

On Tue, 6 Aug 2019, Emily Shaffer wrote:

> On Thu, Jul 25, 2019 at 11:25:02AM +0200, Johannes Schindelin wrote:
> >
> > On Mon, 1 Jul 2019, Emily Shaffer wrote:
> >
> > > Since v2, mostly reworded comments, plus fixed the issues mentioned in
> > > the tutorial itself. Thanks Eric for the review.
> > >
> > > Emily Shaffer (13):
> > >   walken: add infrastructure for revwalk demo
> > >   walken: add usage to enable -h
> > >   walken: add placeholder to initialize defaults
> > >   walken: add handler to git_config
> > >   walken: configure rev_info and prepare for walk
> > >   walken: perform our basic revision walk
> > >   walken: filter for authors from gmail address
> > >   walken: demonstrate various topographical sorts
> > >   walken: demonstrate reversing a revision walk list
> > >   walken: add unfiltered object walk from HEAD
> > >   walken: add filtered object walk
> > >   walken: count omitted objects
> > >   walken: reverse the object walk order
> > >
> > >  Makefile         |   1 +
> > >  builtin.h        |   1 +
> > >  builtin/walken.c | 297 +++++++++++++++++++++++++++++++++++++++++++++++
> >
> > Since this is not really intended to be an end user-facing command, I
> > think it should not become a built-in, to be carried into every Git
> > user's setup.
>
> It's not intended to be checked into Git source as-is.

Then it runs the very real danger of becoming stale: we do _not_
guarantee a stable API, not even an internal one.

> > Instead, I would recommend to implement this as a test helper.
>
> I'm not sure I follow how you imagine this looking, but the drawback I
> see of implementing this in a different way than you would typically do
> when writing a real feature for the project is that it becomes less
> useful as a reference for new contributors.

To the contrary. Some code in `t/helper/` is intended to test
functionality in a way that is copy-editable.

Your use case strikes me a perfect example for such a test helper:

- It guarantees that the example is valid,
- It demonstrates how to use the API,
- In case the API changes, the changes to the helper will inform
  contributors how to change their copy-edited versions

> > This would have the following advantages:
> >
> > - it won't clutter the end user installations,
> >
> > - it will still be compile-tested with every build (guaranteeing that
> >   the tutorial won't become stale over time as so many other tutorials),
>
> This part of your suggestion appeals to me; so I'm really curious how
> you would do it. Do you have something else written in the way you're
> suggesting in mind?

I looked at `t/helper/test-hashmap.c` and it looks _almost_ like a
perfect example for what I have in mind: it uses a given API,
demonstrates how to use it properly, and is copy-editable.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v4] documentation: add tutorial for revision walking
  2019-08-07 19:19       ` Junio C Hamano
@ 2019-08-14 18:33         ` Emily Shaffer
  2019-08-14 19:18           ` Junio C Hamano
  0 siblings, 1 reply; 102+ messages in thread
From: Emily Shaffer @ 2019-08-14 18:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Eric Sunshine, Jonathan Tan, Josh Steadmon

On Wed, Aug 07, 2019 at 12:19:12PM -0700, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > Since v3, only a couple of minor changes from Jonathan Tan - thanks.
> >
> > I'm dropping the updates for the RFC set, since they're incremental from
> > now. Next time you all see them they will be in a form which we would
> > hope to maintain over a long period of time, checked into source -
> > likely in the form of an mbox or dir of .patch file.
> 
> Sure.
> 
> > I think the tutorial itself is pretty much ready...
> 
> A few comments after skimming this round; none of them may be a show
> stopper, but others may have different opinions.
> 
>  - There is still a leftover "TODO: checking CLI options"; is that
>    something we postpone teaching?

Yeah, I think it would make more sense to include it into the other one
(my first contribution) instead.

> 
>  - This is an offtopic tangent, but "my first contribution" being an
>    addition of an entire command probably mistakenly raised the bar
>    to contributors a bit too much.  A typical first contribution is
>    a typofix, fix to a small (e.g. off-by-one) bug, etc.

Sure, I agree with that; but I think the larger (though less common)
project of new command allowed me to explain more about the architecture
of the project overall than a typo fix would. Maybe there's a clearer
name to use?

> 
>  - For a revision walk tutorial, not seeing any mention of pathspec
>    filtering and associated history simplification is somewhat
>    unsatisfying.  On the other hand, I expect that enumeration of
>    objects contained within commits is (hence various --filter
>    options are) totally uninteresting for end users who run the
>    command interactively and view the output of the command on
>    screen.
> 
>  - Enumeration of objects is useful at least in three places in Git:
>    (1) enumerate objects to be packed, with some filtering based on
>    various criteria; (2) enumerate objects that are reachable from
>    anchor points like refs, index, reflog, etc., to discover what
>    are not reachable and can be discarded; (3) enumerate objects
>    that still matter (i.e. the opposite of (2)) so that they can be
>    fed to validation mechanisms (e.g. "cat-file --batch-check").  If
>    this were titled "My first object enumeration", the reaction led
>    to the latter half of the previous point may not have occurred
>    (pathspec filtering would still be relevant, but not as
>    much---for packing to create a narrow clone, you do not want to
>    use pathspec with history simplification, but you would want to
>    use something more like "root and intermediate trees that are
>    necessary to cover these paths" filter in the
>    list-objects-filter layer).
> 
> And from the point of view of the last item, I would think the new
> document is covering a need that is different from what we
> traditionally would call "revision walk", which is more about "git
> log", not the upstream of "git pack-objects", which this new
> document is more geared towards.

Hmmm. It sounds like you're saying:

- This document covers walking objects, which is surprising since it's
  titled about "revision walks". Revision walks are more about commits
  ("git log").
- Using grep on objects doesn't make any sense.
- Other filters (like pathspecs) which do make sense for object walks
  aren't covered.

> 
> Unless "git walken" is an exercise of how to write code that does
> random thing, use of --grep filter however may be out of place,
> though.  I do not offhand think of a use case where --grep would be
> useful in the revision walk/object enumeration that is placed
> upstream of "pack-objects".

In this case, it might make sense to do one of these things:

- Apply the grep filter to the commit walk, and apply a more interesting
  object filter to the object walk.

Or,

- Choose a different kind of filter which is interesting when applied to
  commits alone _and_ all objects.

In the interest of covering more ground with this kind of tutorial, I'd
lean more towards the former.


It's possible that the added scope will make the document large enough
that we'd rather split it into two (one for "git log"-ish, one for "git
pack-objects"-ish). I think that's fine if we end up there.

Thanks. I hope to get back to this soon...

 - Emily

^ permalink raw reply	[flat|nested] 102+ messages in thread

* Re: [PATCH v4] documentation: add tutorial for revision walking
  2019-08-14 18:33         ` Emily Shaffer
@ 2019-08-14 19:18           ` Junio C Hamano
  0 siblings, 0 replies; 102+ messages in thread
From: Junio C Hamano @ 2019-08-14 19:18 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git, Eric Sunshine, Jonathan Tan, Josh Steadmon

Emily Shaffer <emilyshaffer@google.com> writes:

>> > I think the tutorial itself is pretty much ready...
>> 
>> A few comments after skimming this round; none of them may be a show
>> stopper, but others may have different opinions.
>>  ...
> Hmmm. It sounds like you're saying:
>
> - This document covers walking objects, which is surprising since it's
>   titled about "revision walks". Revision walks are more about commits
>   ("git log").

Yes, the document does not duplicate what existing docs on "revision
walk" API would cover, which is a very good thing, as it is (or at
least "feels to be") primarily about walking objects.

> - Using grep on objects doesn't make any sense.

The grep filter works on commit's log messages, and does not even
look at other types of objects, so while that point is true, what I
was driving at was that skipping commits using grep filter would
mean showing trees and blobs related only to the chosen commits, and
while it can be explained as such (i.e. "trees and blobs contained
only in commits without these strings are excluded"), the practical
usefulness of such a "feature" is dubious (here I am imagining the
primary practical use of "object walk" is to feed pack-objects).

> - Other filters (like pathspecs) which do make sense for object walks
>   aren't covered.

Yup.  For example, "trees and blobs that appear only outside of this
directory hierarchy are excluded" would be useful to enumerate
objects necessary for a narrow clone (again, to feed pack-objects).

> - Apply the grep filter to the commit walk, and apply a more interesting
>   object filter to the object walk.
>
> Or,
>
> - Choose a different kind of filter which is interesting when applied to
>   commits alone _and_ all objects.
>
> In the interest of covering more ground with this kind of tutorial, I'd
> lean more towards the former.

Sorry, I do not have enough imagination to cheer for either of these
two options---these may be "interesting" in the same way as "trees
and blobs contained only in commits without these strings are
excluded" enumeration, but I fail to see practical usefulness
(i.e. the reason why a user may be tempted to learn how to achieve
it).

In any case, that was my personal take and not a strong request to
change anything, as I said upfront.  The document just gave me an
impression that it was teaching coding exercise that may be
interesting but of dubious utility.


^ permalink raw reply	[flat|nested] 102+ messages in thread

* [PATCH v5] documentation: add tutorial for object walking
  2019-08-06 23:19     ` [PATCH v4] " Emily Shaffer
  2019-08-07 19:19       ` Junio C Hamano
@ 2019-10-10 15:19       ` Emily Shaffer
  2019-10-11  5:50         ` Junio C Hamano
                           ` (2 more replies)
  1 sibling, 3 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-10-10 15:19 UTC (permalink / raw)
  To: git
  Cc: Emily Shaffer, Junio C Hamano, Eric Sunshine, Jonathan Tan,
	Josh Steadmon

The target audience is a Git contributor who is just getting started
with the concept of object walking. The goal is to prepare this
contributor to be able to understand and modify existing commands which
perform revision walks more easily, although it will also prepare
contributors to create new commands which perform walks.

The tutorial covers a basic overview of the structs involved during
object walk, setting up a basic commit walk, setting up a basic
all-object walk, and adding some configuration changes to both walk
types. It intentionally does not cover how to create new commands or
search for options from the command line or gitconfigs.

There is an associated patchset at
https://github.com/nasamuffin/git/tree/revwalk that contains a reference
implementation of the code generated by this tutorial.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
---
Primarily since v4 this is a reword to try to frame the tutorial as
"object walk tutorial which also covers commit-only walk", rather than
"revision walk tutorial which also covers all-object walk" - per Junio's
suggestion.

Since I had spent some time away from the patch, I also made some small
changes to the wording.

Here's the rangediff from gitster/es/walken-tutorial:

1:  1ed29a34d1 ! 1:  554d940af7 documentation: add tutorial for revision walking
     a => b | 0
     1 file changed, 0 insertions(+), 0 deletions(-)
    
    @@ Metadata
     Author: Emily Shaffer <emilyshaffer@google.com>
     
      ## Commit message ##
    -    documentation: add tutorial for revision walking
    +    documentation: add tutorial for object walking
     
    -    Existing documentation on revision walks seems to be primarily intended
    +    Existing documentation on object walks seems to be primarily intended
         as a reference for those already familiar with the procedure. This
         tutorial attempts to give an entry-level guide to a couple of bare-bones
    -    revision walks so that new Git contributors can learn the concepts
    +    object walks so that new Git contributors can learn the concepts
         without having to wade through options parsing or special casing.
     
         The target audience is a Git contributor who is just getting started
    -    with the concept of revision walking. The goal is to prepare this
    +    with the concept of object walking. The goal is to prepare this
         contributor to be able to understand and modify existing commands which
         perform revision walks more easily, although it will also prepare
         contributors to create new commands which perform walks.
     
         The tutorial covers a basic overview of the structs involved during
    -    revision walk, setting up a basic commit walk, setting up a basic
    +    object walk, setting up a basic commit walk, setting up a basic
         all-object walk, and adding some configuration changes to both walk
         types. It intentionally does not cover how to create new commands or
         search for options from the command line or gitconfigs.
    @@ Commit message
     
         Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
         Helped-by: Eric Sunshine <sunshine@sunshineco.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      ## Documentation/Makefile ##
     @@ Documentation/Makefile: API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technica
    @@ Documentation/Makefile: API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api
      TECH_DOCS += technical/hash-function-transition
      TECH_DOCS += technical/http-protocol
     
    - ## Documentation/MyFirstRevWalk.txt (new) ##
    + ## Documentation/MyFirstObjectWalk.txt (new) ##
     @@
    -+My First Revision Walk
    ++My First Object Walk
     +======================
     +
    -+== What's a Revision Walk?
    ++== What's an Object Walk?
     +
    -+The revision walk is a key concept in Git - this is the process that underpins
    ++The object walk is a key concept in Git - this is the process that underpins
     +operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, the
     +list of objects is found by walking parent relationships between objects. The
    -+revision walk can also be used to determine whether or not a given object is
    ++object walk can also be used to determine whether or not a given object is
     +reachable from the current HEAD pointer.
     +
    ++A related concept is the revision walk, which is focused on commit objects and
    ++their relationships.
    ++
     +=== Related Reading
     +
     +- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
     +  the revision walker in its various incarnations.
     +- `Documentation/technical/api-revision-walking.txt`
     +- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
    -+  gives a good overview of the types of objects in Git and what your revision
    ++  gives a good overview of the types of objects in Git and what your object
     +  walk is really describing.
     +
     +== Setting Up
    @@ Documentation/MyFirstRevWalk.txt (new)
     +/*
     + * "git walken"
     + *
    -+ * Part of the "My First Revision Walk" tutorial.
    ++ * Part of the "My First Object Walk" tutorial.
     + */
     +
     +#include "builtin.h"
    @@ Documentation/MyFirstRevWalk.txt (new)
     +
     +Per entry, we find:
     +
    -+`item` is the object provided upon which to base the revision walk. Items in Git
    ++`item` is the object provided upon which to base the object walk. Items in Git
     +can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)
     +
     +`name` is the object ID (OID) of the object - a hex string you may be familiar
    @@ Documentation/MyFirstRevWalk.txt (new)
     +
     +First, let's see if we can replicate the output of `git log --oneline`. We'll
     +refer back to the implementation frequently to discover norms when performing
    -+a revision walk of our own.
    ++an object walk of our own.
     +
     +To do so, we'll first find all the commits, in order, which preceded the current
     +commit. We'll extract the name and subject of the commit from each.
    @@ Documentation/MyFirstRevWalk.txt (new)
     +
     +=== Setting Up
     +
    -+Preparing for your revision walk has some distinct stages.
    ++Preparing for your object walk has some distinct stages.
     +
     +1. Perform default setup for this mode, and others which may be invoked.
     +2. Check configuration files for relevant settings.
    @@ Documentation/MyFirstRevWalk.txt (new)
     +`grep` and `diff` to initialize themselves by calling each of their
     +initialization functions.
     +
    -+For our purposes, within `git walken`, for the first example we don't intend to
    -+use any other components within Git, and we don't have any configuration to do.
    -+However, we may want to add some later, so for now, we can add an empty
    -+placeholder. Create a new function in `builtin/walken.c`:
    ++For our first example within `git walken`, we don't intend to use any other
    ++components within Git, and we don't have any configuration to do.  However, we
    ++may want to add some later, so for now, we can add an empty placeholder. Create
    ++a new function in `builtin/walken.c`:
     +
     +----
     +static void init_walken_defaults(void)
    @@ Documentation/MyFirstRevWalk.txt (new)
     +}
     +----
     +
    -+// TODO: Checking CLI options
    -+
     +==== Setting Up `rev_info`
     +
     +Now that we've gathered external configuration and options, it's time to
    @@ Documentation/MyFirstRevWalk.txt (new)
     +	 */
     +	get_commit_format("oneline", rev);
     +
    -+	/* Start our revision walk at HEAD. */
    ++	/* Start our object walk at HEAD. */
     +	add_head_to_pending(rev);
     +}
     +----
    @@ Documentation/MyFirstRevWalk.txt (new)
     +
     +There are a few ways that we can change the order of the commits during a
     +revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
    -+sane orderings.
    ++typical orderings.
     +
     +`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
     +before all of its children have been shown, and we avoid mixing commits which
    @@ Documentation/MyFirstRevWalk.txt (new)
     +We also have the capability to enumerate all objects which were omitted by a
     +filter, like with `git log --filter=<spec> --filter-print-omitted`. Asking
     +`traverse_commit_list_filtered()` to populate the `omitted` list means that our
    -+revision walk does not perform any better than an unfiltered revision walk; all
    ++object walk does not perform any better than an unfiltered object walk; all
     +reachable objects are walked in order to populate the list.
     +
     +First, add the `struct oidset` and related items we will use to iterate it:

<end rangediff>

 Documentation/Makefile              |   1 +
 Documentation/MyFirstObjectWalk.txt | 905 ++++++++++++++++++++++++++++
 2 files changed, 906 insertions(+)
 create mode 100644 Documentation/MyFirstObjectWalk.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 06d85ad958..fc35cb29b1 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -77,6 +77,7 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technica
 SP_ARTICLES += $(API_DOCS)
 
 TECH_DOCS += MyFirstContribution
+TECH_DOCS += MyFirstRevWalk
 TECH_DOCS += SubmittingPatches
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
new file mode 100644
index 0000000000..7085f17072
--- /dev/null
+++ b/Documentation/MyFirstObjectWalk.txt
@@ -0,0 +1,905 @@
+My First Object Walk
+====================
+
+== What's an Object Walk?
+
+The object walk is a key concept in Git - this is the process that underpins
+operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, the
+list of objects is found by walking parent relationships between objects. The
+object walk can also be used to determine whether or not a given object is
+reachable from the current HEAD pointer.
+
+A related concept is the revision walk, which is focused on commit objects and
+their relationships.
+
+=== Related Reading
+
+- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
+  the revision walker in its various incarnations.
+- `Documentation/technical/api-revision-walking.txt`
+- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
+  gives a good overview of the types of objects in Git and what your object
+  walk is really describing.
+
+== Setting Up
+
+Create a new branch from `master`.
+
+----
+git checkout -b revwalk origin/master
+----
+
+We'll put our fiddling into a new command. For fun, let's name it `git walken`.
+Open up a new file `builtin/walken.c` and set up the command handler:
+
+----
+/*
+ * "git walken"
+ *
+ * Part of the "My First Object Walk" tutorial.
+ */
+
+#include "builtin.h"
+
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	trace_printf(_("cmd_walken incoming...\n"));
+	return 0;
+}
+----
+
+NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or
+off at runtime. For the purposes of this tutorial, we will write `walken` as
+though it is intended for use as a "plumbing" command: that is, a command which
+is used primarily in scripts, rather than interactively by humans (a "porcelain"
+command). So we will send our debug output to `trace_printf()` instead. When
+running, enable trace output by setting the environment variable `GIT_TRACE`.
+
+Add usage text and `-h` handling, like all subcommands should consistently do
+(our test suite will notice and complain if you fail to do so).
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	const char * const walken_usage[] = {
+		N_("git walken"),
+		NULL,
+	};
+	struct option options[] = {
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
+
+	...
+}
+----
+
+Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix);
+----
+
+Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
+maintaining alphabetical ordering:
+
+----
+{ "walken", cmd_walken, RUN_SETUP },
+----
+
+Add it to the `Makefile` near the line for `builtin/worktree.o`:
+
+----
+BUILTIN_OBJS += builtin/walken.o
+----
+
+Build and test out your command, without forgetting to ensure the `DEVELOPER`
+flag is set, and with `GIT_TRACE` enabled so the debug output can be seen:
+
+----
+$ echo DEVELOPER=1 >>config.mak
+$ make
+$ GIT_TRACE=1 ./bin-wrappers/git walken
+----
+
+NOTE: For a more exhaustive overview of the new command process, take a look at
+`Documentation/MyFirstContribution.txt`.
+
+NOTE: A reference implementation can be found at
+https://github.com/nasamuffin/git/tree/revwalk.
+
+=== `struct rev_cmdline_info`
+
+The definition of `struct rev_cmdline_info` can be found in `revision.h`.
+
+This struct is contained within the `rev_info` struct and is used to reflect
+parameters provided by the user over the CLI.
+
+`nr` represents the number of `rev_cmdline_entry` present in the array.
+
+`alloc` is used by the `ALLOC_GROW` macro. Check
+`Documentation/technical/api-allocation-growing.txt` - this variable is used to
+track the allocated size of the list.
+
+Per entry, we find:
+
+`item` is the object provided upon which to base the object walk. Items in Git
+can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)
+
+`name` is the object ID (OID) of the object - a hex string you may be familiar
+with from using Git to organize your source in the past. Check the tutorial
+mentioned above towards the top for a discussion of where the OID can come
+from.
+
+`whence` indicates some information about what to do with the parents of the
+specified object. We'll explore this flag more later on; take a look at
+`Documentation/revisions.txt` to get an idea of what could set the `whence`
+value.
+
+`flags` hint at where the revision walk begins and are defined in the first
+block under the `#include`s in `revision.h`. The most likely ones to be set in
+the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
+can be used during the walk, as well.
+
+=== `struct rev_info`
+
+This one is quite a bit longer, and many fields are only used during the walk
+by `revision.c` - not configuration options. Most of the configurable flags in
+`struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a
+good idea to take some time and read through that document.
+
+== Basic Commit Walk
+
+First, let's see if we can replicate the output of `git log --oneline`. We'll
+refer back to the implementation frequently to discover norms when performing
+an object walk of our own.
+
+To do so, we'll first find all the commits, in order, which preceded the current
+commit. We'll extract the name and subject of the commit from each.
+
+Ideally, we will also be able to find out which ones are currently at the tip of
+various branches.
+
+=== Setting Up
+
+Preparing for your object walk has some distinct stages.
+
+1. Perform default setup for this mode, and others which may be invoked.
+2. Check configuration files for relevant settings.
+3. Set up the `rev_info` struct.
+4. Tweak the initialized `rev_info` to suit the current walk.
+5. Prepare the `rev_info` for the walk.
+6. Iterate over the objects, processing each one.
+
+==== Default Setups
+
+Before examining configuration files which may modify command behavior, set up
+default state for switches or options your command may have. If your command
+utilizes other Git components, ask them to set up their default states as well.
+For instance, `git log` takes advantage of `grep` and `diff` functionality, so
+its `init_log_defaults()` sets its own state (`decoration_style`) and asks
+`grep` and `diff` to initialize themselves by calling each of their
+initialization functions.
+
+For our first example within `git walken`, we don't intend to use any other
+components within Git, and we don't have any configuration to do.  However, we
+may want to add some later, so for now, we can add an empty placeholder. Create
+a new function in `builtin/walken.c`:
+
+----
+static void init_walken_defaults(void)
+{
+	/*
+	 * We don't actually need the same components `git log` does; leave this
+	 * empty for now.
+	 */
+}
+----
+
+Make sure to add a line invoking it inside of `cmd_walken()`.
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	init_walken_defaults();
+}
+----
+
+==== Configuring From `.gitconfig`
+
+Next, we should have a look at any relevant configuration settings (i.e.,
+settings readable and settable from `git config`). This is done by providing a
+callback to `git_config()`; within that callback, you can also invoke the config
+callbacks of other components that need to see these options. Your callback
+will be invoked once for each configuration value which Git knows about
+(global, local, worktree, etc.).
+
+Similarly to the default values, we don't have anything to do here yet
+ourselves; however, we should call `git_default_config()` if we aren't calling
+any other existing config callbacks.
+
+Add a new function to `builtin/walken.c`:
+
+----
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+	/*
+	 * For now, we don't have any custom configuration, so fall back to
+	 * the default config.
+	 */
+	return git_default_config(var, value, cb);
+}
+----
+
+Make sure to invoke `git_config()` with it in your `cmd_walken()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	...
+
+	git_config(git_walken_config, NULL);
+
+	...
+}
+----
+
+==== Setting Up `rev_info`
+
+Now that we've gathered external configuration and options, it's time to
+initialize the `rev_info` object which we will use to perform the walk. This is
+typically done by calling `repo_init_revisions()` with the repository you intend
+to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
+struct.
+
+Add the `struct rev_info` and the `repo_init_revisions()` call:
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	/* This can go wherever you like in your declarations. */
+	struct rev_info rev;
+	...
+
+	/* This should go after the git_config() call. */
+	repo_init_revisions(the_repository, &rev, prefix);
+
+	...
+}
+----
+
+==== Tweaking `rev_info` For the Walk
+
+We're getting close, but we're still not quite ready to go. Now that `rev` is
+initialized, we can modify it to fit our needs. This is usually done within a
+helper for clarity, so let's add one:
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	/*
+	 * We want to mimic the appearance of `git log --oneline`, so let's
+	 * force oneline format.
+	 */
+	get_commit_format("oneline", rev);
+
+	/* Start our object walk at HEAD. */
+	add_head_to_pending(rev);
+}
+----
+
+[NOTE]
+====
+Instead of using the shorthand `add_head_to_pending()`, you could do
+something like this:
+----
+	struct setup_revision_opt opt;
+
+	memset(&opt, 0, sizeof(opt));
+	opt.def = "HEAD";
+	opt.revarg_opt = REVARG_COMMITTISH;
+	setup_revisions(argc, argv, rev, &opt);
+----
+Using a `setup_revision_opt` gives you finer control over your walk's starting
+point.
+====
+
+Then let's invoke `final_rev_info_setup()` after the call to
+`repo_init_revisions()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	...
+
+	final_rev_info_setup(&rev);
+
+	...
+}
+----
+
+Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
+now, this is all we need.
+
+==== Preparing `rev_info` For the Walk
+
+Now that `rev` is all initialized and configured, we've got one more setup step
+before we get rolling. We can do this in a helper, which will both prepare the
+`rev_info` for the walk, and perform the walk itself. Let's start the helper
+with the call to `prepare_revision_walk()`, which can return an error without
+dying on its own:
+
+----
+static void walken_commit_walk(struct rev_info *rev)
+{
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+}
+----
+
+NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
+`stderr` it's likely to be seen by a human, so we will localize it.
+
+==== Performing the Walk!
+
+Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
+can also be used as an iterator; we move to the next item in the walk by using
+`get_revision()` repeatedly. Add the listed variable declarations at the top and
+the walk loop below the `prepare_revision_walk()` call within your
+`walken_commit_walk()`:
+
+----
+static void walken_commit_walk(struct rev_info *rev)
+{
+	struct commit *commit;
+	struct strbuf prettybuf = STRBUF_INIT;
+
+	...
+
+	while ((commit = get_revision(rev))) {
+		if (!commit)
+			continue;
+
+		strbuf_reset(&prettybuf);
+		pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
+		puts(prettybuf.buf);
+	}
+	strbuf_release(&prettybuf);
+}
+----
+
+NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
+command we expect to be machine-parsed, we're sending it directly to stdout.
+
+Give it a shot.
+
+----
+$ make
+$ ./bin-wrappers/git walken
+----
+
+You should see all of the subject lines of all the commits in
+your tree's history, in order, ending with the initial commit, "Initial revision
+of "git", the information manager from hell". Congratulations! You've written
+your first revision walk. You can play with printing some additional fields
+from each commit if you're curious; have a look at the functions available in
+`commit.h`.
+
+=== Adding a Filter
+
+Next, let's try to filter the commits we see based on their author. This is
+equivalent to running `git log --author=<pattern>`. We can add a filter by
+modifying `rev_info.grep_filter`, which is a `struct grep_opt`.
+
+First some setup. Add `init_grep_defaults()` to `init_walken_defaults()` and add
+`grep_config()` to `git_walken_config()`:
+
+----
+static void init_walken_defaults(void)
+{
+	init_grep_defaults(the_repository);
+}
+
+...
+
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+	grep_config(var, value, cb);
+	return git_default_config(var, value, cb);
+}
+----
+
+Next, we can modify the `grep_filter`. This is done with convenience functions
+found in `grep.h`. For fun, we're filtering to only commits from folks using a
+`gmail.com` email address - a not-very-precise guess at who may be working on
+Git as a hobby. Since we're checking the author, which is a specific line in the
+header, we'll use the `append_header_grep_pattern()` helper. We can use
+the `enum grep_header_field` to indicate which part of the commit header we want
+to search.
+
+In `final_rev_info_setup()`, add your filter line:
+
+----
+static void final_rev_info_setup(int argc, const char **argv,
+		const char *prefix, struct rev_info *rev)
+{
+	...
+
+	append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
+		"gmail");
+	compile_grep_patterns(&rev->grep_filter);
+
+	...
+}
+----
+
+`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
+it won't work unless we compile it with `compile_grep_patterns()`.
+
+NOTE: If you are using `setup_revisions()` (for example, if you are passing a
+`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
+to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.
+
+NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
+wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
+`enum grep_pat_token` for us.
+
+=== Changing the Order
+
+There are a few ways that we can change the order of the commits during a
+revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
+typical orderings.
+
+`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
+before all of its children have been shown, and we avoid mixing commits which
+are in different lines of history. (`git help log`'s section on `--topo-order`
+has a very nice diagram to illustrate this.)
+
+Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
+`REV_SORT_BY_AUTHOR_DATE`. Add the following:
+
+----
+static void final_rev_info_setup(int argc, const char **argv,
+		const char *prefix, struct rev_info *rev)
+{
+	...
+
+	rev->topo_order = 1;
+	rev->sort_order = REV_SORT_BY_COMMIT_DATE;
+
+	...
+}
+----
+
+Let's output this into a file so we can easily diff it with the walk sorted by
+author date.
+
+----
+$ make
+$ ./bin-wrappers/git walken > commit-date.txt
+----
+
+Then, let's sort by author date and run it again.
+
+----
+static void final_rev_info_setup(int argc, const char **argv,
+		const char *prefix, struct rev_info *rev)
+{
+	...
+
+	rev->topo_order = 1;
+	rev->sort_order = REV_SORT_BY_AUTHOR_DATE;
+
+	...
+}
+----
+
+----
+$ make
+$ ./bin-wrappers/git walken > author-date.txt
+----
+
+Finally, compare the two. This is a little less helpful without object names or
+dates, but hopefully we get the idea.
+
+----
+$ diff -u commit-date.txt author-date.txt
+----
+
+This display indicates that commits can be reordered after they're written, for
+example with `git rebase`.
+
+Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
+Set that flag somewhere inside of `final_rev_info_setup()`:
+
+----
+static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
+		struct rev_info *rev)
+{
+	...
+
+	rev->reverse = 1;
+
+	...
+}
+----
+
+Run your walk again and note the difference in order. (If you remove the grep
+pattern, you should see the last commit this call gives you as your current
+HEAD.)
+
+== Basic Object Walk
+
+So far we've been walking only commits. But Git has more types of objects than
+that! Let's see if we can walk _all_ objects, and find out some information
+about each one.
+
+We can base our work on an example. `git pack-objects` prepares all kinds of
+objects for packing into a bitmap or packfile. The work we are interested in
+resides in `builtin/pack-objects.c:get_object_list()`; examination of that
+function shows that the all-object walk is being performed by
+`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
+functions reside in `list-objects.c`; examining the source shows that, despite
+the name, these functions traverse all kinds of objects. Let's have a look at
+the arguments to `traverse_commit_list_filtered()`, which are a superset of the
+arguments to the unfiltered version.
+
+- `struct list_objects_filter_options *filter_options`: This is a struct which
+  stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
+- `struct rev_info *revs`: This is the `rev_info` used for the walk.
+- `show_commit_fn show_commit`: A callback which will be used to handle each
+  individual commit object.
+- `show_object_fn show_object`: A callback which will be used to handle each
+  non-commit object (so each blob, tree, or tag).
+- `void *show_data`: A context buffer which is passed in turn to `show_commit`
+  and `show_object`.
+- `struct oidset *omitted`: A set of object IDs which the provided
+  filter caused to be omitted.
+
+It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
+instead of needing us to call it repeatedly ourselves. Cool! Let's add the
+callbacks first.
+
+For the sake of this tutorial, we'll simply keep track of how many of each kind
+of object we find. At file scope in `builtin/walken.c` add the following
+tracking variables:
+
+----
+static int commit_count;
+static int tag_count;
+static int blob_count;
+static int tree_count;
+----
+
+Commits are handled by a different callback than other objects; let's do that
+one first:
+
+----
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+	commit_count++;
+}
+----
+
+The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
+the `buf` argument is actually the context buffer that we can provide to the
+traversal calls - `show_data`, which we mentioned a moment ago.
+
+Since we have the `struct commit` object, we can look at all the same parts that
+we looked at in our earlier commit-only walk. For the sake of this tutorial,
+though, we'll just increment the commit counter and move on.
+
+The callback for non-commits is a little different, as we'll need to check
+which kind of object we're dealing with:
+
+----
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+	switch (obj->type) {
+	case OBJ_TREE:
+		tree_count++;
+		break;
+	case OBJ_BLOB:
+		blob_count++;
+		break;
+	case OBJ_TAG:
+		tag_count++;
+		break;
+	case OBJ_COMMIT:
+		BUG("unexpected commit object in walken_show_object\n");
+	default:
+		BUG("unexpected object type %s in walken_show_object\n",
+			type_name(obj->type));
+	}
+}
+----
+
+Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
+context pointer that `walken_show_commit()` receives: the `show_data` argument
+to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
+`str` contains the name of the object, which ends up being something like
+`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).
+
+To help assure us that we aren't double-counting commits, we'll include some
+complaining if a commit object is routed through our non-commit callback; we'll
+also complain if we see an invalid object type. Since those two cases should be
+unreachable, and would only change in the event of a semantic change to the Git
+codebase, we complain by using `BUG()` - which is a signal to a developer that
+the change they made caused unintended consequences, and the rest of the
+codebase needs to be updated to understand that change. `BUG()` is not intended
+to be seen by the public, so it is not localized.
+
+Our main object walk implementation is substantially different from our commit
+walk implementation, so let's make a new function to perform the object walk. We
+can perform setup which is applicable to all objects here, too, to keep separate
+from setup which is applicable to commit-only walks.
+
+We'll start by enabling all types of objects in the `struct rev_info`.  We'll
+also turn on `tree_blobs_in_commit_order`, which means that we will walk a
+commit's tree and everything it points to immediately after we find each commit,
+as opposed to waiting for the end and walking through all trees after the commit
+history has been discovered. With the appropriate settings configured, we are
+ready to call `prepare_revision_walk()`.
+
+----
+static void walken_object_walk(struct rev_info *rev)
+{
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+
+	commit_count = 0;
+	tag_count = 0;
+	blob_count = 0;
+	tree_count = 0;
+----
+
+Let's start by calling just the unfiltered walk and reporting our counts.
+Complete your implementation of `walken_object_walk()`:
+
+----
+	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+
+	printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
+		blob_count, tag_count, tree_count);
+}
+----
+
+NOTE: This output is intended to be machine-parsed. Therefore, we are not
+sending it to `trace_printf()`, and we are not localizing it - we need scripts
+to be able to count on the formatting to be exactly the way it is shown here.
+If we were intending this output to be read by humans, we would need to localize
+it with `_()`.
+
+Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
+command line options is out of scope for this tutorial, so we'll just hardcode
+a branch we can change at compile time. Where you call `final_rev_info_setup()`
+and `walken_commit_walk()`, instead branch like so:
+
+----
+	if (1) {
+		add_head_to_pending(&rev);
+		walken_object_walk(&rev);
+	} else {
+		final_rev_info_setup(argc, argv, prefix, &rev);
+		walken_commit_walk(&rev);
+	}
+----
+
+NOTE: For simplicity, we've avoided all the filters and sorts we applied in
+`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
+want, you can certainly use the filters we added before by moving
+`final_rev_info_setup()` out of the conditional and removing the call to
+`add_head_to_pending()`.
+
+Now we can try to run our command! It should take noticeably longer than the
+commit walk, and the output shows why. Your counts will differ from this
+example, because the number of objects grows along with the Git project:
+
+----
+commits 55733
+blobs 100274
+tags 0
+trees 104210
+----
+
+This makes sense. We have more trees than commits because the Git project has
+lots of subdirectories which can change, plus at least one tree per commit. We
+have no tags because we started on a commit (`HEAD`) and while tags can point to
+commits, commits can't point to tags.
+
+=== Adding a Filter
+
+There are a handful of filters that we can apply to the object walk laid out in
+`Documentation/rev-list-options.txt`. These filters are typically useful for
+operations such as creating packfiles or performing a partial clone. They are
+defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
+will use the "tree:1" filter, which causes the walk to omit all trees and blobs
+which are not directly referenced by commits reachable from the commit in
+`pending` when the walk begins. (`pending` is the list of objects which need to
+be traversed during a walk; you can imagine a breadth-first tree traversal to
+help understand. In our case, that means we omit trees and blobs not directly
+referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
+`HEAD` in the `pending` list.)
+
+First, we'll need to `#include "list-objects-filter-options.h"` and set up the
+`struct list_objects_filter_options` at the top of the function.
+
+----
+static void walken_object_walk(struct rev_info *rev)
+{
+	struct list_objects_filter_options filter_options = { 0 };
+
+	...
+----
+
+For now, we are not going to track the omitted objects, so we'll replace those
+parameters with `NULL`. For the sake of simplicity, we'll add a simple
+build-time branch to use our filter or not. Replace the line calling
+`traverse_commit_list()` with the following, which will remind us which kind of
+walk we've just performed:
+
+----
+	if (0) {
+		/* Unfiltered: */
+		trace_printf(_("Unfiltered object walk.\n"));
+		traverse_commit_list(rev, walken_show_commit,
+				walken_show_object, NULL);
+	} else {
+		trace_printf(
+			_("Filtered object walk with filterspec 'tree:1'.\n"));
+		parse_list_objects_filter(&filter_options, "tree:1");
+
+		traverse_commit_list_filtered(&filter_options, rev,
+			walken_show_commit, walken_show_object, NULL, NULL);
+	}
+----
+
+`struct list_objects_filter_options` is usually built directly from a command
+line argument, so the module provides an easy way to build one from a string.
+Even though we aren't taking user input right now, we can still build one with
+a hardcoded string using `parse_list_objects_filter()`.
+
+With the filter spec "tree:1", we are expecting to see _only_ the root tree for
+each commit; therefore, the tree object count should be less than or equal to
+the number of commits. (For an example of why that's true: a commit made by
+`git revert HEAD` points to the same tree object as its grandparent.)
+
+=== Counting Omitted Objects
+
+We also have the capability to enumerate all objects which were omitted by a
+filter, like with `git rev-list --filter=<spec> --filter-print-omitted`. Asking
+`traverse_commit_list_filtered()` to populate the `omitted` list means that our
+object walk does not perform any better than an unfiltered object walk; all
+reachable objects are walked in order to populate the list.
+
+First, add the `struct oidset` and related items we will use to iterate it:
+
+----
+static void walken_object_walk(
+	...
+
+	struct oidset omitted;
+	struct oidset_iter oit;
+	struct object_id *oid = NULL;
+	int omitted_count = 0;
+	oidset_init(&omitted, 0);
+
+	...
+----
+
+Modify the call to `traverse_commit_list_filtered()` to include your `omitted`
+object:
+
+----
+	...
+
+		traverse_commit_list_filtered(&filter_options, rev,
+			walken_show_commit, walken_show_object, NULL, &omitted);
+
+	...
+----
+
+Then, after your traversal, the `oidset` traversal is pretty straightforward.
+Count all the objects within and modify the print statement:
+
+----
+	/* Count the omitted objects. */
+	oidset_iter_init(&omitted, &oit);
+
+	while ((oid = oidset_iter_next(&oit)))
+		omitted_count++;
+
+	printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
+		commit_count, blob_count, tag_count, tree_count, omitted_count);
+----
+
+By running your walk with and without the filter, you should find that the total
+object count in each case is identical. You can also time each invocation of
+the `walken` subcommand, with and without `omitted` being passed in, to confirm
+to yourself the runtime impact of tracking all omitted objects.
+
+=== Changing the Order
+
+Finally, let's demonstrate that you can also reorder walks of all objects, not
+just walks of commits. First, we'll make our handlers chattier - modify
+`walken_show_commit()` and `walken_show_object()` to print each object as they
+go:
+
+----
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+	trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
+	commit_count++;
+}
+
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+	trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));
+
+	...
+}
+----
+
+NOTE: Since we will be examining this output directly as humans, we'll use
+`trace_printf()` here. Additionally, since this change introduces a significant
+number of printed lines, using `trace_printf()` will allow us to easily silence
+those lines without having to recompile.
+
+(Leave the counter increment logic in place.)
+
+With only that change, run again (but save yourself some scrollback):
+
+----
+$ GIT_TRACE=1 ./bin-wrappers/git walken | head -n 10
+----
+
+Take a look at the top commit with `git show` and the object ID you printed; it
+should be the same as the output of `git show HEAD`.
+
+Next, let's change a setting on our `struct rev_info` within
+`walken_object_walk()`. Find where you're changing the other settings on `rev`,
+such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
+`reverse` setting at the bottom:
+
+----
+	...
+
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+	rev->reverse = 1;
+
+	...
+----
+
+Now, run again, but this time, let's grab the last handful of objects instead
+of the first handful:
+
+----
+$ make
+$ GIT_TRACE=1 ./bin-wrappers/git walken | tail -n 10
+----
+
+The last commit object given should have the same OID as the one we saw at the
+top before, and running `git show <oid>` with that OID should give you again
+the same results as `git show HEAD`. Furthermore, if you run and examine the
+first ten lines again (with `head` instead of `tail` like we did before applying
+the `reverse` setting), you should see that now the first commit printed is the
+initial commit, `e83c5163`.
+
+== Wrapping Up
+
+Let's review. In this tutorial, we:
+
+- Built a commit walk from the ground up
+- Enabled a grep filter for that commit walk
+- Changed the sort order of that filtered commit walk
+- Built an object walk (tags, commits, trees, and blobs) from the ground up
+- Learned how to add a filter-spec to an object walk
+- Changed the display order of the filtered object walk
-- 
2.23.0.700.g56cf767bdb-goog



* Re: [PATCH v5] documentation: add tutorial for object walking
  2019-10-10 15:19       ` [PATCH v5] documentation: add tutorial for object walking Emily Shaffer
@ 2019-10-11  5:50         ` Junio C Hamano
  2019-10-11 23:26           ` Emily Shaffer
  2019-10-11 17:50         ` SZEDER Gábor
  2019-10-11 23:55         ` [PATCH v6] " Emily Shaffer
  2 siblings, 1 reply; 102+ messages in thread
From: Junio C Hamano @ 2019-10-11  5:50 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: git, Eric Sunshine, Jonathan Tan, Josh Steadmon

Emily Shaffer <emilyshaffer@google.com> writes:

> @@ -77,6 +77,7 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technica
>  SP_ARTICLES += $(API_DOCS)
>  
>  TECH_DOCS += MyFirstContribution
> +TECH_DOCS += MyFirstRevWalk

s/Rev/Object/ probably (if so I can locally amend).

> diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
> new file mode 100644
> index 0000000000..7085f17072
> --- /dev/null
> +++ b/Documentation/MyFirstObjectWalk.txt
> @@ -0,0 +1,905 @@
> +My First Object Walk
> +======================
> +
> +== What's an Object Walk?
> +
> +The object walk is a key concept in Git - this is the process that underpins
> +operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, the
> +list of objects is found by walking parent relationships between objects.

The above is more about revision walk, for which we have plenty of
docs already, isn't it?  Walking objects, while walking the commit
DAG, is a lesser concept than the key "revision walk" concept and
underpins a different set of operations like object transfer and fsck.

Also, the object walk, unlike the revision walk, follows containment
relationships between objects.

> +A related concept is the revision walk, which is focused on commit objects and
> +their relationships.

Yes, s/their/& parent/ perhaps, to contrast the two a bit better.
`git log` and friends, if they need to be listed, should come on
this side, I think.



* Re: [PATCH v5] documentation: add tutorial for object walking
  2019-10-10 15:19       ` [PATCH v5] documentation: add tutorial for object walking Emily Shaffer
  2019-10-11  5:50         ` Junio C Hamano
@ 2019-10-11 17:50         ` SZEDER Gábor
  2019-10-11 23:33           ` Emily Shaffer
  2019-10-11 23:55         ` [PATCH v6] " Emily Shaffer
  2 siblings, 1 reply; 102+ messages in thread
From: SZEDER Gábor @ 2019-10-11 17:50 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: git, Junio C Hamano, Eric Sunshine, Jonathan Tan, Josh Steadmon

On Thu, Oct 10, 2019 at 08:19:32AM -0700, Emily Shaffer wrote:
> diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
> new file mode 100644
> index 0000000000..7085f17072
> --- /dev/null
> +++ b/Documentation/MyFirstObjectWalk.txt
> @@ -0,0 +1,905 @@
> +My First Object Walk
> +======================

In our CI builds [1] Asciidoctor complains about the above line like
this:

  asciidoctor: WARNING: MyFirstObjectWalk.txt: line 2: unterminated example block

I have no idea what it is trying to say, but I suspect that it
complains about the length of that '=====' line not matching the
length of the previous title line.  I kicked off a build with the
'====' line shortened, and it did silence that warning, and the build
succeeded.
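
The suspected length check can be sketched as a small standalone function. This is only a guess at the rule, not Asciidoctor's actual implementation: the function name and the `tolerance` parameter are hypothetical, and `strlen()` counts bytes, so it only behaves for ASCII titles.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check: return 1 if `underline` consists only of '='
 * characters and its length is within `tolerance` characters of the
 * title's length, else 0.
 */
static int demo_title_underline_ok(const char *title, const char *underline,
				   size_t tolerance)
{
	size_t tlen, ulen, diff;

	assert(title && underline);
	tlen = strlen(title);
	ulen = strlen(underline);
	if (!ulen || strspn(underline, "=") != ulen)
		return 0;
	diff = tlen > ulen ? tlen - ulen : ulen - tlen;
	return diff <= tolerance;
}
```

Under an assumed tolerance of one character, a 22-character underline fits "My First Revision Walk" (22 characters) but not "My First Object Walk" (20 characters), which would be consistent with the warning appearing only after the rename.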

Note, however, that we recently had a patch [2] that argued that a
different header notation is better, at least for the Git User Manual.
I'm not sure whether that applies for this tutorial as well; just
mentioning it for consideration.


[1] https://travis-ci.org/git/git/jobs/596474664#L1192
[2] fd5b820d9c (user-manual.txt: change header notation, 2019-09-22)



* Re: [PATCH v5] documentation: add tutorial for object walking
  2019-10-11  5:50         ` Junio C Hamano
@ 2019-10-11 23:26           ` Emily Shaffer
  0 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-10-11 23:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Eric Sunshine, Jonathan Tan, Josh Steadmon

On Fri, Oct 11, 2019 at 02:50:34PM +0900, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > @@ -77,6 +77,7 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technica
> >  SP_ARTICLES += $(API_DOCS)
> >  
> >  TECH_DOCS += MyFirstContribution
> > +TECH_DOCS += MyFirstRevWalk
> 
> s/Rev/Object/ probably (if so I can locally amend).
Yes, that's the case, although I'll send a reroll shortly if you don't
want to amend locally - there's other stuff to fix according to you and
SZEDER.

> 
> > diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
> > new file mode 100644
> > index 0000000000..7085f17072
> > --- /dev/null
> > +++ b/Documentation/MyFirstObjectWalk.txt
> > @@ -0,0 +1,905 @@
> > +My First Object Walk
> > +======================
> > +
> > +== What's an Object Walk?
> > +
> > +The object walk is a key concept in Git - this is the process that underpins
> > +operations like `git log`, `git blame`, and `git reflog`. Beginning at HEAD, the
> > +list of objects is found by walking parent relationships between objects.
> 
> The above is more about revision walk, for which we have plenty of
> docs already, isn't it?  Walking objects, while walking the commit
> DAG, is a lessor concept than the key "revision walk" concept and
> underpins different set of operations like object transfer and fsck.
> 
> Also, the object walk, unlike the revision walk, follows containment
> relationships between objects.

Good point; I'll rewrite this paragraph rather than trying to just edit
it a little for the next reroll.

> 
> > +A related concept is the revision walk, which is focused on commit objects and
> > +their relationships.
> 
> Yes, s/their/& parent/ perhaps, to contrast the two a bit better.
> `git log` and friends, if they need to be listed, should come on
> this side, I think.
> 
OK.


* Re: [PATCH v5] documentation: add tutorial for object walking
  2019-10-11 17:50         ` SZEDER Gábor
@ 2019-10-11 23:33           ` Emily Shaffer
  0 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-10-11 23:33 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: git, Junio C Hamano, Eric Sunshine, Jonathan Tan, Josh Steadmon

On Fri, Oct 11, 2019 at 07:50:10PM +0200, SZEDER Gábor wrote:
> On Thu, Oct 10, 2019 at 08:19:32AM -0700, Emily Shaffer wrote:
> > diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
> > new file mode 100644
> > index 0000000000..7085f17072
> > --- /dev/null
> > +++ b/Documentation/MyFirstObjectWalk.txt
> > @@ -0,0 +1,905 @@
> > +My First Object Walk
> > +======================
> 
> In our CI builds [1] Asciidoctor complains about the above line like
> this:
> 
>   asciidoctor: WARNING: MyFirstObjectWalk.txt: line 2: unterminated example block
> 
> I have no idea what it is trying to say, but I suspect that it
> complains about the length of that '=====' line not matching the
> length of the previous title line.  I kicked off a build with the
> '====' line shortened, and it did silence that warning, and the build
> succeeded.

Oh, I'm sure it's a consequence of changing it from "Revision" to
"Object" in the title.

> 
> Note, however, that we recently had a patch [2] that argued that a
> different header notation is better, at least for the Git User Manual.
> I'm not sure whether that applies for this tutorial as well; just
> mentioning it for consideration.

Interesting read! It looks like we actually do use the format that that
patch suggests is better, except for the title. I'll change the title to
come into line with everything else, since I'm sending a reroll anyways.

Thanks!
 - Emily


* [PATCH v6] documentation: add tutorial for object walking
  2019-10-10 15:19       ` [PATCH v5] documentation: add tutorial for object walking Emily Shaffer
  2019-10-11  5:50         ` Junio C Hamano
  2019-10-11 17:50         ` SZEDER Gábor
@ 2019-10-11 23:55         ` Emily Shaffer
  2 siblings, 0 replies; 102+ messages in thread
From: Emily Shaffer @ 2019-10-11 23:55 UTC (permalink / raw)
  To: git
  Cc: Emily Shaffer, Junio C Hamano, Eric Sunshine, Jonathan Tan,
	Josh Steadmon, SZEDER Gábor

Existing documentation on object walks seems to be primarily intended
as a reference for those already familiar with the procedure. This
tutorial attempts to give an entry-level guide to a couple of bare-bones
object walks so that new Git contributors can learn the concepts
without having to wade through options parsing or special casing.

The target audience is a Git contributor who is just getting started
with the concept of object walking. The goal is to prepare this
contributor to be able to understand and modify existing commands which
perform revision walks more easily, although it will also prepare
contributors to create new commands which perform walks.

The tutorial covers a basic overview of the structs involved during
object walk, setting up a basic commit walk, setting up a basic
all-object walk, and adding some configuration changes to both walk
types. It intentionally does not cover how to create new commands or
search for options from the command line or gitconfigs.

There is an associated patchset at
https://github.com/nasamuffin/git/tree/revwalk that contains a reference
implementation of the code generated by this tutorial.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
---
Since v5 only a couple small changes:

The opening paragraph was reworked per Junio's comments to hopefully
make more sense for this document's new life as an object walking
tutorial (instead of a revision walking one).

The title formatting line was changed from "underlining" with === to
being prepended with a single "=", to both silence a warning and match
the headers through the rest of the document.

Thanks!
 - Emily

 Documentation/Makefile              |   1 +
 Documentation/MyFirstObjectWalk.txt | 906 ++++++++++++++++++++++++++++
 2 files changed, 907 insertions(+)
 create mode 100644 Documentation/MyFirstObjectWalk.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 06d85ad958..8fe829cc1b 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -77,6 +77,7 @@ API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt technica
 SP_ARTICLES += $(API_DOCS)
 
 TECH_DOCS += MyFirstContribution
+TECH_DOCS += MyFirstObjectWalk
 TECH_DOCS += SubmittingPatches
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
diff --git a/Documentation/MyFirstObjectWalk.txt b/Documentation/MyFirstObjectWalk.txt
new file mode 100644
index 0000000000..4d24daeb9f
--- /dev/null
+++ b/Documentation/MyFirstObjectWalk.txt
@@ -0,0 +1,906 @@
+= My First Object Walk
+
+== What's an Object Walk?
+
+The object walk is a key concept in Git - this is the process that underpins
+operations like object transfer and fsck. Beginning from a given commit, the
+list of objects is found by walking parent relationships between commits (commit
+X based on commit W) and containment relationships between objects (tree Y is
+contained within commit X, and blob Z is located within tree Y, giving our
+working tree for commit X something like `y/z.txt`).
+
+A related concept is the revision walk, which is focused on commit objects and
+their parent relationships and does not delve into other object types. The
+revision walk is used for operations like `git log`.
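
To make the distinction concrete, here is a minimal standalone sketch. The `toy_*` types and functions are invented for illustration and are not Git's real structs. Each commit has at most one parent, and, unlike a real walk, nothing is done to avoid revisiting a tree that two commits share.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object model; these types are illustrative, not Git's. */
struct toy_blob { const char *name; };

struct toy_tree {
	const struct toy_tree *subtrees[4];
	size_t nr_subtrees;
	const struct toy_blob *blobs[4];
	size_t nr_blobs;
};

struct toy_commit {
	const struct toy_commit *parent; /* single parent keeps the toy simple */
	const struct toy_tree *tree;
};

struct walk_counts { int commits, trees, blobs; };

/* Containment: a tree contains subtrees and blobs. */
static void walk_tree(const struct toy_tree *t, struct walk_counts *c)
{
	size_t i;

	assert(t && c);
	c->trees++;
	for (i = 0; i < t->nr_subtrees; i++)
		walk_tree(t->subtrees[i], c);
	c->blobs += (int)t->nr_blobs;
}

/* Revision walk: follow parent links only. */
static struct walk_counts toy_revision_walk(const struct toy_commit *tip)
{
	struct walk_counts c = { 0, 0, 0 };

	for (; tip; tip = tip->parent)
		c.commits++;
	return c;
}

/* Object walk: follow parent links and descend into each commit's tree. */
static struct walk_counts toy_object_walk(const struct toy_commit *tip)
{
	struct walk_counts c = { 0, 0, 0 };

	for (; tip; tip = tip->parent) {
		c.commits++;
		walk_tree(tip->tree, &c);
	}
	return c;
}
```

Both functions start from the same tip commit; only the object walk ever touches trees and blobs.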
+
+=== Related Reading
+
+- `Documentation/user-manual.txt` under "Hacking Git" contains some coverage of
+  the revision walker in its various incarnations.
+- `Documentation/technical/api-revision-walking.txt`
+- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
+  gives a good overview of the types of objects in Git and what your object
+  walk is really describing.
+
+== Setting Up
+
+Create a new branch from `master`.
+
+----
+git checkout -b revwalk origin/master
+----
+
+We'll put our fiddling into a new command. For fun, let's name it `git walken`.
+Open up a new file `builtin/walken.c` and set up the command handler:
+
+----
+/*
+ * "git walken"
+ *
+ * Part of the "My First Object Walk" tutorial.
+ */
+
+#include "builtin.h"
+
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	trace_printf(_("cmd_walken incoming...\n"));
+	return 0;
+}
+----
+
+NOTE: `trace_printf()` differs from `printf()` in that it can be turned on or
+off at runtime. For the purposes of this tutorial, we will write `walken` as
+though it is intended for use as a "plumbing" command: that is, a command which
+is used primarily in scripts, rather than interactively by humans (a "porcelain"
+command). So we will send our debug output to `trace_printf()` instead. When
+running, enable trace output by setting the environment variable `GIT_TRACE`.
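
The idea of output that can be switched on at runtime can be sketched without any Git headers. This is a simplified stand-in, not how `trace_printf()` is implemented: `demo_trace_enabled()`, `demo_trace_printf()` and the `DEMO_TRACE` variable are hypothetical names, and the real `GIT_TRACE` accepts more values than this sketch handles.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Tracing is on for any value other than unset, empty, or "0". */
static int demo_trace_enabled(const char *value)
{
	return value && *value && strcmp(value, "0") != 0;
}

/*
 * Hypothetical stand-in for trace_printf(): writes to stderr only when
 * the DEMO_TRACE environment variable enables it. Returns the number of
 * characters written, or 0 when tracing is off.
 */
static int demo_trace_printf(const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(fmt);
	if (!demo_trace_enabled(getenv("DEMO_TRACE")))
		return 0;
	va_start(ap, fmt);
	n = vfprintf(stderr, fmt, ap);
	va_end(ap);
	return n;
}
```

Because the decision is made from the environment on every call, the same binary can be noisy or silent without a rebuild.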
+
+Add usage text and `-h` handling, like all subcommands should consistently do
+(our test suite will notice and complain if you fail to do so).
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	const char * const walken_usage[] = {
+		N_("git walken"),
+		NULL,
+	};
+	struct option options[] = {
+		OPT_END()
+	};
+
+	argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
+
+	...
+}
+----
+
+Also add the relevant line in `builtin.h` near `cmd_whatchanged()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix);
+----
+
+Include the command in `git.c` in `commands[]` near the entry for `whatchanged`,
+maintaining alphabetical ordering:
+
+----
+{ "walken", cmd_walken, RUN_SETUP },
+----
+
+Add it to the `Makefile` near the line for `builtin/worktree.o`:
+
+----
+BUILTIN_OBJS += builtin/walken.o
+----
+
+Build and test out your command, without forgetting to ensure the `DEVELOPER`
+flag is set, and with `GIT_TRACE` enabled so the debug output can be seen:
+
+----
+$ echo DEVELOPER=1 >>config.mak
+$ make
+$ GIT_TRACE=1 ./bin-wrappers/git walken
+----
+
+NOTE: For a more exhaustive overview of the new command process, take a look at
+`Documentation/MyFirstContribution.txt`.
+
+NOTE: A reference implementation can be found at
+https://github.com/nasamuffin/git/tree/revwalk.
+
+=== `struct rev_cmdline_info`
+
+The definition of `struct rev_cmdline_info` can be found in `revision.h`.
+
+This struct is contained within the `rev_info` struct and is used to reflect
+parameters provided by the user over the CLI.
+
+`nr` represents the number of `rev_cmdline_entry` present in the array.
+
+`alloc` is used by the `ALLOC_GROW` macro. Check
+`Documentation/technical/api-allocation-growing.txt` - this variable is used to
+track the allocated size of the list.
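
The `nr`/`alloc` bookkeeping can be sketched as a standalone growable array. The `demo_*` names are hypothetical, and the growth step is an assumption modelled on what Git's `alloc_nr()` is believed to compute; treat the exact formula as illustrative rather than authoritative.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical entry type and list, mirroring the nr/alloc bookkeeping. */
struct demo_entry { const char *name; };

struct demo_list {
	struct demo_entry *items;
	size_t nr;    /* entries in use */
	size_t alloc; /* entries allocated */
};

/* Make room for at least `want` entries, growing geometrically. */
static void demo_grow(struct demo_list *l, size_t want)
{
	size_t new_alloc;
	struct demo_entry *p;

	assert(l);
	if (want <= l->alloc)
		return;
	new_alloc = (l->alloc + 16) * 3 / 2; /* assumed growth step */
	if (new_alloc < want)
		new_alloc = want;
	p = realloc(l->items, new_alloc * sizeof(*p));
	if (!p)
		abort();
	l->items = p;
	l->alloc = new_alloc;
}

/* Append one entry; nr counts entries, alloc counts capacity. */
static void demo_append(struct demo_list *l, const char *name)
{
	demo_grow(l, l->nr + 1);
	l->items[l->nr++].name = name;
}
```

Growing geometrically means most appends do not reallocate, which is why the struct tracks `alloc` separately from `nr`.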
+
+Per entry, we find:
+
+`item` is the object provided upon which to base the object walk. Items in Git
+can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.txt`.)
+
+`name` is the object ID (OID) of the object - a hex string you may be familiar
+with from using Git to organize your source in the past. Check the tutorial
+mentioned above towards the top for a discussion of where the OID can come
+from.
+
+`whence` indicates some information about what to do with the parents of the
+specified object. We'll explore this flag more later on; take a look at
+`Documentation/revisions.txt` to get an idea of what could set the `whence`
+value.
+
+`flags` are used to hint the beginning of the revision walk and are the first
+block under the `#include`s in `revision.h`. The most likely ones to be set in
+the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
+can be used during the walk, as well.
+
+=== `struct rev_info`
+
+This one is quite a bit longer, and many fields are only used during the walk
+by `revision.c` - not configuration options. Most of the configurable flags in
+`struct rev_info` have a mirror in `Documentation/rev-list-options.txt`. It's a
+good idea to take some time and read through that document.
+
+== Basic Commit Walk
+
+First, let's see if we can replicate the output of `git log --oneline`. We'll
+refer back to the implementation frequently to discover norms when performing
+an object walk of our own.
+
+To do so, we'll first find all the commits, in order, which preceded the current
+commit. We'll extract the name and subject of the commit from each.
+
+Ideally, we will also be able to find out which ones are currently at the tip of
+various branches.
+
+=== Setting Up
+
+Preparing for your object walk has some distinct stages.
+
+1. Perform default setup for this mode, and others which may be invoked.
+2. Check configuration files for relevant settings.
+3. Set up the `rev_info` struct.
+4. Tweak the initialized `rev_info` to suit the current walk.
+5. Prepare the `rev_info` for the walk.
+6. Iterate over the objects, processing each one.
+
+==== Default Setups
+
+Before examining configuration files which may modify command behavior, set up
+default state for switches or options your command may have. If your command
+utilizes other Git components, ask them to set up their default states as well.
+For instance, `git log` takes advantage of `grep` and `diff` functionality, so
+its `init_log_defaults()` sets its own state (`decoration_style`) and asks
+`grep` and `diff` to initialize themselves by calling each of their
+initialization functions.
+
+For our first example within `git walken`, we don't intend to use any other
+components within Git, and we don't have any configuration to do.  However, we
+may want to add some later, so for now, we can add an empty placeholder. Create
+a new function in `builtin/walken.c`:
+
+----
+static void init_walken_defaults(void)
+{
+	/*
+	 * We don't actually need the same components `git log` does; leave this
+	 * empty for now.
+	 */
+}
+----
+
+Make sure to add a line invoking it inside of `cmd_walken()`.
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	init_walken_defaults();
+}
+----
+
+==== Configuring From `.gitconfig`
+
+Next, we should have a look at any relevant configuration settings (i.e.,
+settings readable and settable from `git config`). This is done by providing a
+callback to `git_config()`; within that callback, you can also invoke the config
+handlers of any other components you use that need to see these options. Your
+callback will be invoked once for each configuration value which Git knows about
+(global, local, worktree, etc.).
+
+Similarly to the default values, we don't have anything to do here yet
+ourselves; however, we should call `git_default_config()` if we aren't calling
+any other existing config callbacks.
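
The callback-per-value pattern, with a fallback to a default handler, can be sketched in isolation. Everything here is a stand-in: the `demo_*` names are hypothetical, `walken.verbose` is an invented key, and stopping on a negative return is an assumption of the sketch rather than a statement about `git_config()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_config_entry { const char *var; const char *value; };

typedef int (*demo_config_fn)(const char *var, const char *value, void *cb);

/* Invoke `fn` once per entry; stop early if it reports an error (< 0). */
static int demo_for_each_config(const struct demo_config_entry *e, size_t nr,
				demo_config_fn fn, void *cb)
{
	size_t i;

	assert(fn);
	for (i = 0; i < nr; i++) {
		int ret = fn(e[i].var, e[i].value, cb);
		if (ret < 0)
			return ret;
	}
	return 0;
}

struct demo_seen { int ours, fallback; };

/* Stand-in for a default handler: accepts everything. */
static int demo_default_config(const char *var, const char *value, void *cb)
{
	struct demo_seen *seen = cb;

	(void)var;
	(void)value;
	seen->fallback++;
	return 0;
}

/* Handle the keys we own, and pass the rest to the default handler. */
static int demo_walken_config(const char *var, const char *value, void *cb)
{
	struct demo_seen *seen = cb;

	if (!strcmp(var, "walken.verbose")) { /* invented key */
		seen->ours++;
		return 0;
	}
	return demo_default_config(var, value, cb);
}
```

The `cb` pointer is how per-call state reaches the callback without file-scope variables.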
+
+Add a new function to `builtin/walken.c`:
+
+----
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+	/*
+	 * For now, we don't have any custom configuration, so fall back to
+	 * the default config.
+	 */
+	return git_default_config(var, value, cb);
+}
+----
+
+Make sure to invoke `git_config()` with it in your `cmd_walken()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	...
+
+	git_config(git_walken_config, NULL);
+
+	...
+}
+----
+
+==== Setting Up `rev_info`
+
+Now that we've gathered external configuration and options, it's time to
+initialize the `rev_info` object which we will use to perform the walk. This is
+typically done by calling `repo_init_revisions()` with the repository you intend
+to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
+struct.
+
+Add the `struct rev_info` and the `repo_init_revisions()` call:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	/* This can go wherever you like in your declarations.*/
+	struct rev_info rev;
+	...
+
+	/* This should go after the git_config() call. */
+	repo_init_revisions(the_repository, &rev, prefix);
+
+	...
+}
+----
+
+==== Tweaking `rev_info` For the Walk
+
+We're getting close, but we're still not quite ready to go. Now that `rev` is
+initialized, we can modify it to fit our needs. This is usually done within a
+helper for clarity, so let's add one:
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	/*
+	 * We want to mimic the appearance of `git log --oneline`, so let's
+	 * force oneline format.
+	 */
+	get_commit_format("oneline", rev);
+
+	/* Start our object walk at HEAD. */
+	add_head_to_pending(rev);
+}
+----
+
+[NOTE]
+====
+Instead of using the shorthand `add_head_to_pending()`, you could do
+something like this:
+----
+	struct setup_revision_opt opt;
+
+	memset(&opt, 0, sizeof(opt));
+	opt.def = "HEAD";
+	opt.revarg_opt = REVARG_COMMITTISH;
+	setup_revisions(argc, argv, rev, &opt);
+----
+Using a `setup_revision_opt` gives you finer control over your walk's starting
+point.
+====
+
+Then let's invoke `final_rev_info_setup()` after the call to
+`repo_init_revisions()`:
+
+----
+int cmd_walken(int argc, const char **argv, const char *prefix)
+{
+	...
+
+	final_rev_info_setup(&rev);
+
+	...
+}
+----
+
+Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
+now, this is all we need.
+
+==== Preparing `rev_info` For the Walk
+
+Now that `rev` is all initialized and configured, we've got one more setup step
+before we get rolling. We can do this in a helper, which will both prepare the
+`rev_info` for the walk, and perform the walk itself. Let's start the helper
+with the call to `prepare_revision_walk()`, which can return an error without
+dying on its own:
+
+----
+static void walken_commit_walk(struct rev_info *rev)
+{
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+}
+----
+
+NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
+`stderr` it's likely to be seen by a human, so we will localize it.
+
+==== Performing the Walk!
+
+Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
+can also be used as an iterator; we move to the next item in the walk by using
+`get_revision()` repeatedly. Add the listed variable declarations at the top and
+the walk loop below the `prepare_revision_walk()` call within your
+`walken_commit_walk()`:
+
+----
+static void walken_commit_walk(struct rev_info *rev)
+{
+	struct commit *commit;
+	struct strbuf prettybuf = STRBUF_INIT;
+
+	...
+
+	while ((commit = get_revision(rev))) {
+		strbuf_reset(&prettybuf);
+		pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
+		puts(prettybuf.buf);
+	}
+	strbuf_release(&prettybuf);
+}
+----
+
+NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
+command we expect to be machine-parsed, we're sending it directly to stdout.
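
The iterator shape of this loop can be sketched without Git. The `demo_*` types below are hypothetical and model a single-parent history only; the real `get_revision()` does far more work per step.

```c
#include <assert.h>
#include <stddef.h>

struct demo_rev { const char *subject; const struct demo_rev *parent; };

/* Iterator state: the next commit to hand out, or NULL when done. */
struct demo_walk { const struct demo_rev *next; };

static void demo_walk_init(struct demo_walk *w, const struct demo_rev *tip)
{
	assert(w);
	w->next = tip;
}

/* Return the next commit and advance, or NULL once history is exhausted. */
static const struct demo_rev *demo_get_revision(struct demo_walk *w)
{
	const struct demo_rev *c = w->next;

	if (c)
		w->next = c->parent;
	return c;
}
```

A caller drives it with `while ((c = demo_get_revision(&w)))`, so the loop body only ever sees non-NULL commits and the walk ends when the iterator returns NULL.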
+
+Give it a shot.
+
+----
+$ make
+$ ./bin-wrappers/git walken
+----
+
+You should see all of the subject lines of all the commits in
+your tree's history, in order, ending with the initial commit, "Initial revision
+of "git", the information manager from hell". Congratulations! You've written
+your first revision walk. You can play with printing some additional fields
+from each commit if you're curious; have a look at the functions available in
+`commit.h`.
+
+=== Adding a Filter
+
+Next, let's try to filter the commits we see based on their author. This is
+equivalent to running `git log --author=<pattern>`. We can add a filter by
+modifying `rev_info.grep_filter`, which is a `struct grep_opt`.
+
+First some setup. Add `init_grep_defaults()` to `init_walken_defaults()` and add
+`grep_config()` to `git_walken_config()`:
+
+----
+static void init_walken_defaults(void)
+{
+	init_grep_defaults(the_repository);
+}
+
+...
+
+static int git_walken_config(const char *var, const char *value, void *cb)
+{
+	grep_config(var, value, cb);
+	return git_default_config(var, value, cb);
+}
+----
+
+Next, we can modify the `grep_filter`. This is done with convenience functions
+found in `grep.h`. For fun, we're filtering to only commits from folks using a
+`gmail.com` email address - a not-very-precise guess at who may be working on
+Git as a hobby. Since we're checking the author, which is a specific line in the
+header, we'll use the `append_header_grep_pattern()` helper. We can use
+the `enum grep_header_field` to indicate which part of the commit header we want
+to search.
+
+In `final_rev_info_setup()`, add your filter line:
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	...
+
+	append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
+		"gmail");
+	compile_grep_patterns(&rev->grep_filter);
+
+	...
+}
+----
+
+`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
+it won't work unless we compile it with `compile_grep_patterns()`.
+
+NOTE: If you are using `setup_revisions()` (for example, if you are passing a
+`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
+to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.
+
+NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
+wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
+`enum grep_pat_token` for us.
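
The effect of an author filter can be sketched on toy data. This swaps in a plain substring match with `strstr()` for Git's compiled grep patterns, so it is not how `grep_filter` works internally; the `demo_*` names and the sample commits are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_commit { const char *author; const char *subject; };

/* Return nonzero when the author line contains `pattern` as a substring. */
static int demo_author_matches(const struct demo_commit *c, const char *pattern)
{
	assert(c && pattern);
	return strstr(c->author, pattern) != NULL;
}

/* Copy pointers to matching commits into `out`; return how many matched. */
static size_t demo_filter_by_author(const struct demo_commit *in, size_t nr,
				    const char *pattern,
				    const struct demo_commit **out)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++)
		if (demo_author_matches(&in[i], pattern))
			out[n++] = &in[i];
	return n;
}
```

The filter preserves walk order; it only decides which commits are shown.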
+
+=== Changing the Order
+
+There are a few ways that we can change the order of the commits during a
+revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
+typical orderings.
+
+`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
+before all of its children have been shown, and we avoid mixing commits which
+are in different lines of history. (`git help log`'s section on `--topo-order`
+has a very nice diagram to illustrate this.)
+
+Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
+`REV_SORT_BY_AUTHOR_DATE`. Add the following:
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	...
+
+	rev->topo_order = 1;
+	rev->sort_order = REV_SORT_BY_COMMIT_DATE;
+
+	...
+}
+----
+
+Let's output this into a file so we can easily diff it with the walk sorted by
+author date.
+
+----
+$ make
+$ ./bin-wrappers/git walken > commit-date.txt
+----
+
+Then, let's sort by author date and run it again.
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	...
+
+	rev->topo_order = 1;
+	rev->sort_order = REV_SORT_BY_AUTHOR_DATE;
+
+	...
+}
+----
+
+----
+$ make
+$ ./bin-wrappers/git walken > author-date.txt
+----
+
+Finally, compare the two. This is a little less helpful without object names or
+dates, but hopefully we get the idea.
+
+----
+$ diff -u commit-date.txt author-date.txt
+----
+
+This display indicates that commits can be reordered after they're written, for
+example with `git rebase`.
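
Why the two orderings can disagree is easy to see on toy data. The sketch below is a plain `qsort()` over invented records and ignores topology entirely, which the real walk does not; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_commit_dates {
	const char *subject;
	long author_date;    /* when the change was first written */
	long committer_date; /* when this commit object was created */
};

/* Newest first by committer date. */
static int by_committer_date(const void *a, const void *b)
{
	const struct demo_commit_dates *x = a, *y = b;

	return (x->committer_date < y->committer_date) -
	       (x->committer_date > y->committer_date);
}

/* Newest first by author date. */
static int by_author_date(const void *a, const void *b)
{
	const struct demo_commit_dates *x = a, *y = b;

	return (x->author_date < y->author_date) -
	       (x->author_date > y->author_date);
}

/* Sort in place using one of the two date keys. */
static void demo_sort(struct demo_commit_dates *c, size_t nr,
		      int use_author_date)
{
	assert(c || !nr);
	qsort(c, nr, sizeof(*c),
	      use_author_date ? by_author_date : by_committer_date);
}
```

A commit that was written early but rebased late has an old author date and a new committer date, so it moves between the two listings.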
+
+Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
+Set that flag somewhere inside of `final_rev_info_setup()`:
+
+----
+static void final_rev_info_setup(struct rev_info *rev)
+{
+	...
+
+	rev->reverse = 1;
+
+	...
+}
+----
+
+Run your walk again and note the difference in order. (If you remove the grep
+pattern, you should see the last commit this call gives you as your current
+HEAD.)
+
+== Basic Object Walk
+
+So far we've been walking only commits. But Git has more types of objects than
+that! Let's see if we can walk _all_ objects, and find out some information
+about each one.
+
+We can base our work on an example. `git pack-objects` prepares all kinds of
+objects for packing into a bitmap or packfile. The work we are interested in
+resides in `builtin/pack-objects.c:get_object_list()`; examination of that
+function shows that the all-object walk is being performed by
+`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
+functions reside in `list-objects.c`; examining the source shows that, despite
+the name, these functions traverse all kinds of objects. Let's have a look at
+the arguments to `traverse_commit_list_filtered()`, which are a superset of the
+arguments to the unfiltered version.
+
+- `struct list_objects_filter_options *filter_options`: This is a struct which
+  stores a filter-spec as outlined in `Documentation/rev-list-options.txt`.
+- `struct rev_info *revs`: This is the `rev_info` used for the walk.
+- `show_commit_fn show_commit`: A callback which will be used to handle each
+  individual commit object.
+- `show_object_fn show_object`: A callback which will be used to handle each
+  non-commit object (so each blob, tree, or tag).
+- `void *show_data`: A context buffer which is passed in turn to `show_commit`
+  and `show_object`.
+- `struct oidset *omitted`: A set of object IDs which the provided filter
+  caused to be omitted.
+
+It looks like this `traverse_commit_list_filtered()` uses callbacks we provide
+instead of needing us to call it repeatedly ourselves. Cool! Let's add the
+callbacks first.
+
+For the sake of this tutorial, we'll simply keep track of how many of each kind
+of object we find. At file scope in `builtin/walken.c` add the following
+tracking variables:
+
+----
+static int commit_count;
+static int tag_count;
+static int blob_count;
+static int tree_count;
+----
+
+Commits are handled by a different callback than other objects; let's do that
+one first:
+
+----
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+	commit_count++;
+}
+----
+
+The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
+the `buf` argument is actually the context buffer that we can provide to the
+traversal calls - `show_data`, which we mentioned a moment ago.
+
+Since we have the `struct commit` object, we can look at all the same parts that
+we looked at in our earlier commit-only walk. For the sake of this tutorial,
+though, we'll just increment the commit counter and move on.
+
+The callback for non-commits is a little different, as we'll need to check
+which kind of object we're dealing with:
+
+----
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+	switch (obj->type) {
+	case OBJ_TREE:
+		tree_count++;
+		break;
+	case OBJ_BLOB:
+		blob_count++;
+		break;
+	case OBJ_TAG:
+		tag_count++;
+		break;
+	case OBJ_COMMIT:
+		BUG("unexpected commit object in walken_show_object\n");
+	default:
+		BUG("unexpected object type %s in walken_show_object\n",
+			type_name(obj->type));
+	}
+}
+----
+
+Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
+context pointer that `walken_show_commit()` receives: the `show_data` argument
+to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
+`str` contains the name of the object, which ends up being something like
+`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).
+
+To help assure us that we aren't double-counting commits, we'll include some
+complaining if a commit object is routed through our non-commit callback; we'll
+also complain if we see an invalid object type. Since those two cases should be
+unreachable, and would only change in the event of a semantic change to the Git
+codebase, we complain by using `BUG()` - which is a signal to a developer that
+the change they made caused unintended consequences, and the rest of the
+codebase needs to be updated to understand that change. `BUG()` is not intended
+to be seen by the public, so it is not localized.
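+
+As a standalone illustration of that dispatch pattern, here is a minimal sketch
+which does not use Git's headers. The `toy_` names are hypothetical stand-ins,
+and `assert()` plays the role of `BUG()`. It also passes its counters through a
+context pointer, much as `buf` could carry state instead of file-scope statics:
+
```c
#include <assert.h>

/* Hypothetical stand-ins for Git's object types; not Git's real enum. */
enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB, TOY_TAG };

struct toy_counts {
	int trees;
	int blobs;
	int tags;
};

/*
 * Count one non-commit object. Commits must never reach this function,
 * so that case (and any unknown type) trips an assertion, which stands
 * in for BUG() in this sketch.
 */
static void toy_show_object(enum toy_type type, struct toy_counts *counts)
{
	switch (type) {
	case TOY_TREE:
		counts->trees++;
		break;
	case TOY_BLOB:
		counts->blobs++;
		break;
	case TOY_TAG:
		counts->tags++;
		break;
	case TOY_COMMIT:
	default:
		assert(0 && "unexpected object type in toy_show_object");
	}
}

/* Feed a list of types through the callback, as a traversal would. */
static struct toy_counts toy_count_all(const enum toy_type *types, int nr)
{
	struct toy_counts counts = { 0, 0, 0 };
	int i;

	for (i = 0; i < nr; i++)
		toy_show_object(types[i], &counts);
	return counts;
}
```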
+
+Our main object walk implementation is substantially different from our commit
+walk implementation, so let's make a new function to perform the object walk. We
+can perform setup which is applicable to all objects here, too, to keep it
+separate from setup which is applicable to commit-only walks.
+
+We'll start by enabling all types of objects in the `struct rev_info`.  We'll
+also turn on `tree_blobs_in_commit_order`, which means that we will walk a
+commit's tree and everything it points to immediately after we find each commit,
+as opposed to waiting for the end and walking through all trees after the commit
+history has been discovered. With the appropriate settings configured, we are
+ready to call `prepare_revision_walk()`.
+
+----
+static void walken_object_walk(struct rev_info *rev)
+{
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+
+	if (prepare_revision_walk(rev))
+		die(_("revision walk setup failed"));
+
+	commit_count = 0;
+	tag_count = 0;
+	blob_count = 0;
+	tree_count = 0;
+----
+
+Let's start by calling just the unfiltered walk and reporting our counts.
+Complete your implementation of `walken_object_walk()`:
+
+----
+	traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
+
+	printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
+		blob_count, tag_count, tree_count);
+}
+----
+
+NOTE: This output is intended to be machine-parsed. Therefore, we are not
+sending it to `trace_printf()`, and we are not localizing it - we need scripts
+to be able to count on the formatting to be exactly the way it is shown here.
+If we were intending this output to be read by humans, we would need to localize
+it with `_()`.
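+
+To see why a fixed format matters, consider a consumer of that output. This is
+only a sketch of one possible consumer; `toy_parse_counts()` and its struct are
+hypothetical names, not part of Git:
+
```c
#include <assert.h>
#include <stdio.h>

struct toy_walk_counts {
	int commits;
	int blobs;
	int tags;
	int trees;
};

/*
 * Parse the four "name count" lines in the order the tutorial's printf()
 * emits them. Returns 0 on success, or -1 if the text does not follow
 * that exact layout (for example, if it had been reworded or localized).
 */
static int toy_parse_counts(const char *text, struct toy_walk_counts *out)
{
	int matched;

	assert(text && out);
	matched = sscanf(text, "commits %d\nblobs %d\ntags %d\ntrees %d",
			 &out->commits, &out->blobs, &out->tags, &out->trees);

	return matched == 4 ? 0 : -1;
}
```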
+
+Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
+command line options is out of scope for this tutorial, so we'll just hardcode
+a branch we can change at compile time. Where you call `final_rev_info_setup()`
+and `walken_commit_walk()`, instead branch like so:
+
+----
+	if (1) {
+		add_head_to_pending(&rev);
+		walken_object_walk(&rev);
+	} else {
+		final_rev_info_setup(argc, argv, prefix, &rev);
+		walken_commit_walk(&rev);
+	}
+----
+
+NOTE: For simplicity, we've avoided all the filters and sorts we applied in
+`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
+want, you can certainly use the filters we added before by moving
+`final_rev_info_setup()` out of the conditional and removing the call to
+`add_head_to_pending()`.
+
+Now we can try to run our command! It should take noticeably longer than the
+commit walk, but an examination of the output will give you an idea why. Your
+output should look similar to this example, but with different counts:
+
+----
+commits 55733
+blobs 100274
+tags 0
+trees 104210
+----
+
+This makes sense. We have more trees than commits because the Git project has
+lots of subdirectories which can change, plus at least one tree per commit. We
+have no tags because we started on a commit (`HEAD`) and while tags can point to
+commits, commits can't point to tags.
+
+NOTE: You will have different counts when you run this yourself! The number of
+objects grows along with the Git project.
+
+=== Adding a Filter
+
+There are a handful of filters that we can apply to the object walk laid out in
+`Documentation/rev-list-options.txt`. These filters are typically useful for
+operations such as creating packfiles or performing a partial clone. They are
+defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
+will use the "tree:1" filter, which causes the walk to omit all trees and blobs
+which are not directly referenced by commits reachable from the commit in
+`pending` when the walk begins. (`pending` is the list of objects which need to
+be traversed during a walk; you can imagine a breadth-first tree traversal to
+help understand. In our case, that means we omit trees and blobs not directly
+referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
+`HEAD` in the `pending` list.)
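+
+The effect of a `tree:<depth>` filter can be modeled without any Git internals.
+The sketch below assumes, following the description in
+`Documentation/rev-list-options.txt`, that the filter omits trees and blobs
+whose depth below the root tree is at least `<depth>`. The `toy_` types are
+hypothetical and exist only for this illustration:
+
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object: just a kind and its depth below the root tree. */
enum toy_kind { TOY_KIND_TREE, TOY_KIND_BLOB };

struct toy_entry {
	enum toy_kind kind;
	int depth; /* 0 for a commit's root tree, 1 for its entries, ... */
};

/*
 * Return how many entries a "tree:<limit>" style filter would keep,
 * assuming it omits every tree and blob whose depth is >= limit.
 */
static size_t toy_count_kept(const struct toy_entry *entries, size_t nr,
			     int limit)
{
	size_t kept = 0;
	size_t i;

	assert(limit >= 0);
	for (i = 0; i < nr; i++)
		if (entries[i].depth < limit)
			kept++;
	return kept;
}
```
+
+With a limit of 1, only entries at depth 0 (the root trees) are kept, which is
+the behavior we rely on when using "tree:1".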
+
+First, we'll need to `#include "list-objects-filter-options.h"` and set up the
+`struct list_objects_filter_options` at the top of the function.
+
+----
+static void walken_object_walk(struct rev_info *rev)
+{
+	struct list_objects_filter_options filter_options = { 0 };
+
+	...
+----
+
+For now, we are not going to track the omitted objects, so we'll replace those
+parameters with `NULL`. For the sake of simplicity, we'll add a simple
+build-time branch to use our filter or not. Replace the line calling
+`traverse_commit_list()` with the following, which will remind us which kind of
+walk we've just performed:
+
+----
+	if (0) {
+		/* Unfiltered: */
+		trace_printf(_("Unfiltered object walk.\n"));
+		traverse_commit_list(rev, walken_show_commit,
+				walken_show_object, NULL);
+	} else {
+		trace_printf(
+			_("Filtered object walk with filterspec 'tree:1'.\n"));
+		parse_list_objects_filter(&filter_options, "tree:1");
+
+		traverse_commit_list_filtered(&filter_options, rev,
+			walken_show_commit, walken_show_object, NULL, NULL);
+	}
+----
+
+`struct list_objects_filter_options` is usually built directly from a command
+line argument, so the module provides an easy way to build one from a string.
+Even though we aren't taking user input right now, we can still build one with
+a hardcoded string using `parse_list_objects_filter()`.
+
+With the filter spec "tree:1", we are expecting to see _only_ the root tree for
+each commit; therefore, the tree object count should be less than or equal to
+the number of commits. (For an example of why that's true: a commit created by
+running `git revert` on its parent points to the same tree object as its
+grandparent.)
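+
+The revert case can be checked with a toy model. This sketch assumes each
+commit is reduced to a hypothetical integer ID for its root tree; it is not how
+Git stores or compares trees:
+
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy commit: only the ID of its root tree matters here. */
struct toy_commit {
	int root_tree_id;
};

/* Count distinct root trees among the commits (quadratic, fine for a toy). */
static size_t toy_count_unique_trees(const struct toy_commit *commits,
				     size_t nr)
{
	size_t unique = 0;
	size_t i, j;

	assert(commits || !nr);
	for (i = 0; i < nr; i++) {
		int seen = 0;

		/* Has an earlier commit already pointed at this tree? */
		for (j = 0; j < i; j++)
			if (commits[j].root_tree_id == commits[i].root_tree_id)
				seen = 1;
		if (!seen)
			unique++;
	}
	return unique;
}
```
+
+For three commits whose root trees are 1, 2, and 1 again (the revert), the
+function counts two unique trees, which is fewer than the number of commits.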
+
+=== Counting Omitted Objects
+
+We also have the capability to enumerate all objects which were omitted by a
+filter, like with `git rev-list --objects --filter=<spec>
+--filter-print-omitted`. Asking `traverse_commit_list_filtered()` to populate
+the `omitted` list means that our object walk does not perform any better than
+an unfiltered object walk; all reachable objects are walked in order to
+populate the list.
+
+First, add the `struct oidset` and related items we will use to iterate it:
+
+----
+static void walken_object_walk(
+	...
+
+	struct oidset omitted;
+	struct oidset_iter oit;
+	struct object_id *oid = NULL;
+	int omitted_count = 0;
+	oidset_init(&omitted, 0);
+
+	...
+----
+
+Modify the call to `traverse_commit_list_filtered()` to include your `omitted`
+object:
+
+----
+	...
+
+		traverse_commit_list_filtered(&filter_options, rev,
+			walken_show_commit, walken_show_object, NULL, &omitted);
+
+	...
+----
+
+Then, after your traversal, the `oidset` traversal is pretty straightforward.
+Count all the objects within and modify the print statement:
+
+----
+	/* Count the omitted objects. */
+	oidset_iter_init(&omitted, &oit);
+
+	while ((oid = oidset_iter_next(&oit)))
+		omitted_count++;
+
+	printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
+		commit_count, blob_count, tag_count, tree_count, omitted_count);
+----
+
+By running your walk with and without the filter, you should find that the total
+object count in each case is identical. You can also time each invocation of
+the `walken` subcommand, with and without `omitted` being passed in, to confirm
+to yourself the runtime impact of tracking all omitted objects.
+
+=== Changing the Order
+
+Finally, let's demonstrate that you can also reorder walks of all objects, not
+just walks of commits. First, we'll make our handlers chattier - modify
+`walken_show_commit()` and `walken_show_object()` to print each object as they
+go:
+
+----
+static void walken_show_commit(struct commit *cmt, void *buf)
+{
+	trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
+	commit_count++;
+}
+
+static void walken_show_object(struct object *obj, const char *str, void *buf)
+{
+	trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));
+
+	...
+}
+----
+
+NOTE: Since we will be examining this output directly as humans, we'll use
+`trace_printf()` here. Additionally, since this change introduces a significant
+number of printed lines, using `trace_printf()` will allow us to easily silence
+those lines without having to recompile.
+
+(Leave the counter increment logic in place.)
+
+With only that change, run again (but save yourself some scrollback):
+
+----
+$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10
+----
+
+Take a look at the top commit with `git show` and the object ID you printed; it
+should be the same as the output of `git show HEAD`.
+
+Next, let's change a setting on our `struct rev_info` within
+`walken_object_walk()`. Find where you're changing the other settings on `rev`,
+such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
+`reverse` setting at the bottom:
+
+----
+	...
+
+	rev->tree_objects = 1;
+	rev->blob_objects = 1;
+	rev->tag_objects = 1;
+	rev->tree_blobs_in_commit_order = 1;
+	rev->reverse = 1;
+
+	...
+----
+
+Now, run again, but this time, let's grab the last handful of objects instead
+of the first handful:
+
+----
+$ make
+$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10
+----
+
+The last commit object given should have the same OID as the one we saw at the
+top before, and running `git show <oid>` with that OID should give you again
+the same results as `git show HEAD`. Furthermore, if you run and examine the
+first ten lines again (with `head` instead of `tail` like we did before applying
+the `reverse` setting), you should see that now the first commit printed is the
+initial commit, `e83c5163`.
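+
+The observable effect can be pictured with a plain array. This is only a model
+of the resulting order, with hypothetical integer IDs standing in for commits;
+it says nothing about how Git implements the `reverse` setting internally:
+
```c
#include <assert.h>
#include <stddef.h>

/* Reverse an array of toy commit IDs in place. */
static void toy_reverse(int *ids, size_t nr)
{
	size_t i;

	assert(ids || !nr);
	for (i = 0; i < nr / 2; i++) {
		int tmp = ids[i];

		ids[i] = ids[nr - 1 - i];
		ids[nr - 1 - i] = tmp;
	}
}
```
+
+Whatever was first before the reversal ends up last afterwards, which is why
+`HEAD` moves from the `head` of the output to the `tail`.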
+
+== Wrapping Up
+
+Let's review. In this tutorial, we:
+
+- Built a commit walk from the ground up
+- Enabled a grep filter for that commit walk
+- Changed the sort order of that filtered commit walk
+- Built an object walk (tags, commits, trees, and blobs) from the ground up
+- Learned how to add a filter-spec to an object walk
+- Changed the display order of the filtered object walk
-- 
2.23.0.700.g56cf767bdb-goog



