git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
* [ANNOUNCE] Git v2.19.0-rc0
@ 2018-08-20 22:13 Junio C Hamano
  2018-08-20 22:41 ` Stefan Beller
  2018-08-21 20:41 ` Derrick Stolee
  0 siblings, 2 replies; 58+ messages in thread
From: Junio C Hamano @ 2018-08-20 22:13 UTC (permalink / raw)
  To: git; +Cc: Linux Kernel, git-packagers

An early preview release Git v2.19.0-rc0 is now available for
testing at the usual places.  It is comprised of 707 non-merge
commits since v2.18.0, contributed by 60 people, 14 of which are
new faces.

The tarballs are found at:

    https://www.kernel.org/pub/software/scm/git/testing/

The following public repositories all have a copy of the
'v2.19.0-rc0' tag and the 'master' branch that the tag points at:

  url = https://kernel.googlesource.com/pub/scm/git/git
  url = git://repo.or.cz/alt-git.git
  url = https://github.com/gitster/git

New contributors whose contributions weren't in v2.18.0 are as follows.
Welcome to the Git development community!

  Aleksandr Makarov, Andrei Rybak, Chen Bin, Henning Schild,
  Isabella Stephens, Josh Steadmon, Jules Maselbas, Kana Natsuno,
  Marc Strapetz, Masaya Suzuki, Nicholas Guriev, Sebastian Kisela,
  Vladimir Parfinenko, and William Chargin.

Returning contributors who helped this release are as follows.
Thanks for your continued support.

  Aaron Schrab, Ævar Arnfjörð Bjarmason, Alban Gruin, Alejandro
  R. Sedeño, Anthony Sottile, Antonio Ospite, Beat Bolli, Ben
  Peart, Brandon Williams, brian m. carlson, Christian Couder,
  Derrick Stolee, Elijah Newren, Eric Sunshine, Han-Wen Nienhuys,
  Jameson Miller, Jeff Hostetler, Jeff King, Johannes Schindelin,
  Johannes Sixt, Jonathan Nieder, Jonathan Tan, Junio C Hamano,
  Kim Gybels, Kirill Smelkov, Luis Marsano, Łukasz Stelmach,
  Luke Diamand, Martin Ågren, Max Kirillov, Michael Barabanov,
  Mike Hommey, Nguyễn Thái Ngọc Duy, Olga Telezhnaya,
  Phillip Wood, Prathamesh Chavan, Ramsay Jones, René Scharfe,
  Stefan Beller, SZEDER Gábor, Taylor Blau, Thomas Rast, Tobias
  Klauser, Todd Zullinger, Ville Skyttä, and Xiaolong Ye.

----------------------------------------------------------------

Git 2.19 Release Notes (draft)
==============================

Updates since v2.18
-------------------

UI, Workflows & Features

 * "git diff" compares the index and the working tree.  For paths
   added with intent-to-add bit, the command shows the full contents
   of them as added, but the paths themselves were not marked as new
   files.  They are now shown as new by default.

   "git apply" learned the "--intent-to-add" option so that an
   otherwise working-tree-only application of a patch will add new
   paths to the index marked with the "intent-to-add" bit.

 * "git grep" learned the "--column" option that gives not just the
   line number but the column number of the hit.

 * The "-l" option in "git branch -l" is an unfortunate short-hand for
   "--create-reflog", but many users, both old and new, somehow expect
   it to be something else, perhaps "--list".  This step warns when "-l"
   is used as a short-hand for "--create-reflog" and warns about the
   future repurposing of the it when it is used.

 * The userdiff pattern for .php has been updated.

 * The content-transfer-encoding of the message "git send-email" sends
   out by default was 8bit, which can cause trouble when there is an
   overlong line to bust RFC 5322/2822 limit.  A new option 'auto' to
   automatically switch to quoted-printable when there is such a line
   in the payload has been introduced and is made the default.

 * "git checkout" and "git worktree add" learned to honor
   checkout.defaultRemote when auto-vivifying a local branch out of a
   remote tracking branch in a repository with multiple remotes that
   have tracking branches that share the same names.
   (merge 8d7b558bae ab/checkout-default-remote later to maint).

 * "git grep" learned the "--only-matching" option.

 * "git rebase --rebase-merges" mode now handles octopus merges as
   well.

 * Add a server-side knob to skip commits in exponential/fibbonacci
   stride in an attempt to cover wider swath of history with a smaller
   number of iterations, potentially accepting a larger packfile
   transfer, instead of going back one commit a time during common
   ancestor discovery during the "git fetch" transaction.
   (merge 42cc7485a2 jt/fetch-negotiator-skipping later to maint).

 * A new configuration variable core.usereplacerefs has been added,
   primarily to help server installations that want to ignore the
   replace mechanism altogether.

 * Teach "git tag -s" etc. a few configuration variables (gpg.format
   that can be set to "openpgp" or "x509", and gpg.<format>.program
   that is used to specify what program to use to deal with the format)
   to allow x.509 certs with CMS via "gpgsm" to be used instead of
   openpgp via "gnupg".

 * Many more strings are prepared for l10n.

 * "git p4 submit" learns to ask its own pre-submit hook if it should
   continue with submitting.

 * The test performed at the receiving end of "git push" to prevent
   bad objects from entering repository can be customized via
   receive.fsck.* configuration variables; we now have gained a
   counterpart to do the same on the "git fetch" side, with
   fetch.fsck.* configuration variables.

 * "git pull --rebase=interactive" learned "i" as a short-hand for
   "interactive".

 * "git instaweb" has been adjusted to run better with newer Apache on
   RedHat based distros.

 * "git range-diff" is a reimplementation of "git tbdiff" that lets us
   compare individual patches in two iterations of a topic.

 * The sideband code learned to optionally paint selected keywords at
   the beginning of incoming lines on the receiving end.


Performance, Internal Implementation, Development Support etc.

 * The bulk of "git submodule foreach" has been rewritten in C.

 * The in-core "commit" object had an all-purpose "void *util" field,
   which was tricky to use especially in library-ish part of the
   code.  All of the existing uses of the field has been migrated to a
   more dedicated "commit-slab" mechanism and the field is eliminated.

 * A less often used command "git show-index" has been modernized.
   (merge fb3010c31f jk/show-index later to maint).

 * The conversion to pass "the_repository" and then "a_repository"
   throughout the object access API continues.

 * Continuing with the idea to programatically enumerate various
   pieces of data required for command line completion, teach the
   codebase to report the list of configuration variables
   subcommands care about to help complete them.

 * Separate "rebase -p" codepath out of "rebase -i" implementation to
   slim down the latter and make it easier to manage.

 * Make refspec parsing codepath more robust.

 * Some flaky tests have been fixed.

 * Continuing with the idea to programmatically enumerate various
   pieces of data required for command line completion, the codebase
   has been taught to enumerate options prefixed with "--no-" to
   negate them.

 * Build and test procedure for netrc credential helper (in contrib/)
   has been updated.

 * The conversion to pass "the_repository" and then "a_repository"
   throughout the object access API continues.

 * Remove unused function definitions and declarations from ewah
   bitmap subsystem.

 * Code preparation to make "git p4" closer to be usable with Python 3.

 * Tighten the API to make it harder to misuse in-tree .gitmodules
   file, even though it shares the same syntax with configuration
   files, to read random configuration items from it.

 * "git fast-import" has been updated to avoid attempting to create
   delta against a zero-byte-long string, which is pointless.

 * The codebase has been updated to compile cleanly with -pedantic
   option.
   (merge 2b647a05d7 bb/pedantic later to maint).

 * The character display width table has been updated to match the
   latest Unicode standard.
   (merge 570951eea2 bb/unicode-11-width later to maint).

 * test-lint now looks for broken use of "VAR=VAL shell_func" in test
   scripts.

 * Conversion from uchar[40] to struct object_id continues.

 * Recent "security fix" to pay attention to contents of ".gitmodules"
   while accepting "git push" was a bit overly strict than necessary,
   which has been adjusted.

 * "git fsck" learns to make sure the optional commit-graph file is in
   a sane state.

 * "git diff --color-moved" feature has further been tweaked.

 * Code restructuring and a small fix to transport protocol v2 during
   fetching.

 * Parsing of -L[<N>][,[<M>]] parameters "git blame" and "git log"
   take has been tweaked.

 * lookup_commit_reference() and friends have been updated to find
   in-core object for a specific in-core repository instance.

 * Various glitches in the heuristics of merge-recursive strategy have
   been documented in new tests.

 * "git fetch" learned a new option "--negotiation-tip" to limit the
   set of commits it tells the other end as "have", to reduce wasted
   bandwidth and cycles, which would be helpful when the receiving
   repository has a lot of refs that have little to do with the
   history at the remote it is fetching from.

 * For a large tree, the index needs to hold many cache entries
   allocated on heap.  These cache entries are now allocated out of a
   dedicated memory pool to amortize malloc(3) overhead.

 * Tests to cover various conflicting cases have been added for
   merge-recursive.

 * Tests to cover conflict cases that involve submodules have been
   added for merge-recursive.

 * Look for broken "&&" chains that are hidden in subshell, many of
   which have been found and corrected.

 * The singleton commit-graph in-core instance is made per in-core
   repository instance.

 * "make DEVELOPER=1 DEVOPTS=pedantic" allows developers to compile
   with -pedantic option, which may catch more problematic program
   constructs and potential bugs.

 * Preparatory code to later add json output for telemetry data has
   been added.

 * Update the way we use Coccinelle to find out-of-style code that
   need to be modernised.

 * It is too easy to misuse system API functions such as strcat();
   these selected functions are now forbidden in this codebase and
   will cause a compilation failure.

 * Add a script (in contrib/) to help users of VSCode work better with
   our codebase.

 * The Travis CI scripts were taught to ship back the test data from
   failed tests.
   (merge aea8879a6a sg/travis-retrieve-trash-upon-failure later to maint).

 * The parse-options machinery learned to refrain from enclosing
   placeholder string inside a "<bra" and "ket>" pair automatically
   without PARSE_OPT_LITERAL_ARGHELP.  Existing help text for option
   arguments that are not formatted correctly have been identified and
   fixed.
   (merge 5f0df44cd7 rs/parse-opt-lithelp later to maint).

 * Noiseword "extern" has been removed from function decls in the
   header files.

 * A few atoms like %(objecttype) and %(objectsize) in the format
   specifier of "for-each-ref --format=<format>" can be filled without
   getting the full contents of the object, but just with the object
   header.  These cases have been optimized by calling
   oid_object_info() API (instead of reading and inspecting the data).

 * The end result of documentation update has been made to be
   inspected more easily to help developers.

 * The API to iterate over all objects learned to optionally list
   objects in the order they appear in packfiles, which helps locality
   of access if the caller accesses these objects while as objects are
   enumerated.

 * Improve built-in facility to catch broken &&-chain in the tests.

 * The more library-ish parts of the codebase learned to work on the
   in-core index-state instance that is passed in by their callers,
   instead of always working on the singleton "the_index" instance.

 * A test prerequisite defined by various test scripts with slightly
   different semantics has been consolidated into a single copy and
   made into a lazily defined one.
   (merge 6ec633059a wc/make-funnynames-shared-lazy-prereq later to maint).

 * After a partial clone, repeated fetches from promisor remote would
   have accumulated many packfiles marked with .promisor bit without
   getting them coalesced into fewer packfiles, hurting performance.
   "git repack" now learned to repack them.


Fixes since v2.18
-----------------

 * "git remote update" can take both a single remote nickname and a
   nickname for remote groups, and the completion script (in contrib/)
   has been taught about it.
   (merge 9cd4382ad5 ls/complete-remote-update-names later to maint).

 * "git fetch --shallow-since=<cutoff>" that specifies the cut-off
   point that is newer than the existing history used to end up
   grabbing the entire history.  Such a request now errors out.
   (merge e34de73c56 nd/reject-empty-shallow-request later to maint).

 * Fix for 2.17-era regression around `core.safecrlf`.
   (merge 6cb09125be as/safecrlf-quiet-fix later to maint).

 * The recent addition of "partial clone" experimental feature kicked
   in when it shouldn't, namely, when there is no partial-clone filter
   defined even if extensions.partialclone is set.
   (merge cac1137dc4 jh/partial-clone later to maint).

 * "git send-pack --signed" (hence "git push --signed" over the http
   transport) did not read user ident from the config mechanism to
   determine whom to sign the push certificate as, which has been
   corrected.
   (merge d067d98887 ms/send-pack-honor-config later to maint).

 * "git fetch-pack --all" used to unnecessarily fail upon seeing an
   annotated tag that points at an object other than a commit.
   (merge c12c9df527 jk/fetch-all-peeled-fix later to maint).

 * When user edits the patch in "git add -p" and the user's editor is
   set to strip trailing whitespaces indiscriminately, an empty line
   that is unchanged in the patch would become completely empty
   (instead of a line with a sole SP on it).  The code introduced in
   Git 2.17 timeframe failed to parse such a patch, but now it learned
   to notice the situation and cope with it.
   (merge f4d35a6b49 pw/add-p-recount later to maint).

 * The code to try seeing if a fetch is necessary in a submodule
   during a fetch with --recurse-submodules got confused when the path
   to the submodule was changed in the range of commits in the
   superproject, sometimes showing "(null)".  This has been corrected.

 * "git submodule" did not correctly adjust core.worktree setting that
   indicates whether/where a submodule repository has its associated
   working tree across various state transitions, which has been
   corrected.
   (merge 984cd77ddb sb/submodule-core-worktree later to maint).

 * Bugfix for "rebase -i" corner case regression.
   (merge a9279c6785 pw/rebase-i-keep-reword-after-conflict later to maint).

 * Recently added "--base" option to "git format-patch" command did
   not correctly generate prereq patch ids.
   (merge 15b76c1fb3 xy/format-patch-prereq-patch-id-fix later to maint).

 * POSIX portability fix in Makefile to fix a glitch introduced a few
   releases ago.
   (merge 6600054e9b dj/runtime-prefix later to maint).

 * "git filter-branch" when used with the "--state-branch" option
   still attempted to rewrite the commits whose filtered result is
   known from the previous attempt (which is recorded on the state
   branch); the command has been corrected not to waste cycles doing
   so.
   (merge 709cfe848a mb/filter-branch-optim later to maint).

 * Clarify that setting core.ignoreCase to deviate from reality would
   not turn a case-incapable filesystem into a case-capable one.
   (merge 48294b512a ms/core-icase-doc later to maint).

 * "fsck.skipList" did not prevent a blob object listed there from
   being inspected for is contents (e.g. we recently started to
   inspect the contents of ".gitmodules" for certain malicious
   patterns), which has been corrected.
   (merge fb16287719 rj/submodule-fsck-skip later to maint).

 * "git checkout --recurse-submodules another-branch" did not report
   in which submodule it failed to update the working tree, which
   resulted in an unhelpful error message.
   (merge ba95d4e4bd sb/submodule-move-head-error-msg later to maint).

 * "git rebase" behaved slightly differently depending on which one of
   the three backends gets used; this has been documented and an
   effort to make them more uniform has begun.
   (merge b00bf1c9a8 en/rebase-consistency later to maint).

 * The "--ignore-case" option of "git for-each-ref" (and its friends)
   did not work correctly, which has been fixed.
   (merge e674eb2528 jk/for-each-ref-icase later to maint).

 * "git fetch" failed to correctly validate the set of objects it
   received when making a shallow history deeper, which has been
   corrected.
   (merge cf1e7c0770 jt/connectivity-check-after-unshallow later to maint).

 * Partial clone support of "git clone" has been updated to correctly
   validate the objects it receives from the other side.  The server
   side has been corrected to send objects that are directly
   requested, even if they may match the filtering criteria (e.g. when
   doing a "lazy blob" partial clone).
   (merge a7e67c11b8 jt/partial-clone-fsck-connectivity later to maint).

 * Handling of an empty range by "git cherry-pick" was inconsistent
   depending on how the range ended up to be empty, which has been
   corrected.
   (merge c5e358d073 jk/empty-pick-fix later to maint).

 * "git reset --merge" (hence "git merge ---abort") and "git reset --hard"
   had trouble working correctly in a sparsely checked out working
   tree after a conflict, which has been corrected.
   (merge b33fdfc34c mk/merge-in-sparse-checkout later to maint).

 * Correct a broken use of "VAR=VAL shell_func" in a test.
   (merge 650161a277 jc/t3404-one-shot-export-fix later to maint).

 * "git rev-parse ':/substring'" did not consider the history leading
   only to HEAD when looking for a commit with the given substring,
   when the HEAD is detached.  This has been fixed.
   (merge 6b3351e799 wc/find-commit-with-pattern-on-detached-head later to maint).

 * Build doc update for Windows.
   (merge ede8d89bb1 nd/command-list later to maint).

 * core.commentchar is now honored when preparing the list of commits
   to replay in "rebase -i".

 * "git pull --rebase" on a corrupt HEAD caused a segfault.  In
   general we substitute an empty tree object when running the in-core
   equivalent of the diff-index command, and the codepath has been
   corrected to do so as well to fix this issue.
   (merge 3506dc9445 jk/has-uncommitted-changes-fix later to maint).

 * httpd tests saw occasional breakage due to the way its access log
   gets inspected by the tests, which has been updated to make them
   less flaky.
   (merge e8b3b2e275 sg/httpd-test-unflake later to maint).

 * Tests to cover more D/F conflict cases have been added for
   merge-recursive.

 * "git gc --auto" opens file descriptors for the packfiles before
   spawning "git repack/prune", which would upset Windows that does
   not want a process to work on a file that is open by another
   process.  The issue has been worked around.
   (merge 12e73a3ce4 kg/gc-auto-windows-workaround later to maint).

 * The recursive merge strategy did not properly ensure there was no
   change between HEAD and the index before performing its operation,
   which has been corrected.
   (merge 55f39cf755 en/dirty-merge-fixes later to maint).

 * "git rebase" started exporting GIT_DIR environment variable and
   exposing it to hook scripts when part of it got rewritten in C.
   Instead of matching the old scripted Porcelains' behaviour,
   compensate by also exporting GIT_WORK_TREE environment as well to
   lessen the damage.  This can harm existing hooks that want to
   operate on different repository, but the current behaviour is
   already broken for them anyway.
   (merge ab5e67d751 bc/sequencer-export-work-tree-as-well later to maint).

 * "git send-email" when using in a batched mode that limits the
   number of messages sent in a single SMTP session lost the contents
   of the variable used to choose between tls/ssl, unable to send the
   second and later batches, which has been fixed.
   (merge 636f3d7ac5 jm/send-email-tls-auth-on-batch later to maint).

 * The lazy clone support had a few places where missing but promised
   objects were not correctly tolerated, which have been fixed.

 * One of the "diff --color-moved" mode "dimmed_zebra" that was named
   in an unusual way has been deprecated and replaced by
   "dimmed-zebra".
   (merge e3f2f5f9cd es/diff-color-moved-fix later to maint).

 * The wire-protocol v2 relies on the client to send "ref prefixes" to
   limit the bandwidth spent on the initial ref advertisement.  "git
   clone" when learned to speak v2 forgot to do so, which has been
   corrected.
   (merge 402c47d939 bw/clone-ref-prefixes later to maint).

 * "git diff --histogram" had a bad memory usage pattern, which has
   been rearranged to reduce the peak usage.
   (merge 79cb2ebb92 sb/histogram-less-memory later to maint).

 * Code clean-up to use size_t/ssize_t when they are the right type.
   (merge 7726d360b5 jk/size-t later to maint).

 * The wire-protocol v2 relies on the client to send "ref prefixes" to
   limit the bandwidth spent on the initial ref advertisement.  "git
   fetch $remote branch:branch" that asks tags that point into the
   history leading to the "branch" automatically followed sent to
   narrow prefix and broke the tag following, which has been fixed.
   (merge 2b554353a5 jt/tag-following-with-proto-v2-fix later to maint).

 * When the sparse checkout feature is in use, "git cherry-pick" and
   other mergy operations lost the skip_worktree bit when a path that
   is excluded from checkout requires content level merge, which is
   resolved as the same as the HEAD version, without materializing the
   merge result in the working tree, which made the path appear as
   deleted.  This has been corrected by preserving the skip_worktree
   bit (and not materializing the file in the working tree).
   (merge 2b75fb601c en/merge-recursive-skip-fix later to maint).

 * The "author-script" file "git rebase -i" creates got broken when
   we started to move the command away from shell script, which is
   getting fixed now.
   (merge 5522bbac20 es/rebase-i-author-script-fix later to maint).

 * The automatic tree-matching in "git merge -s subtree" was broken 5
   years ago and nobody has noticed since then, which is now fixed.
   (merge 2ec4150713 jk/merge-subtree-heuristics later to maint).

 * "git fetch $there refs/heads/s" ought to fetch the tip of the
   branch 's', but when "refs/heads/refs/heads/s", i.e. a branch whose
   name is "refs/heads/s" exists at the same time, fetched that one
   instead by mistake.  This has been corrected to honor the usual
   disambiguation rules for abbreviated refnames.
   (merge 60650a48c0 jt/refspec-dwim-precedence-fix later to maint).

 * Futureproofing a helper function that can easily be misused.
   (merge 65bb21e77e es/want-color-fd-defensive later to maint).

 * The http-backend (used for smart-http transport) used to slurp the
   whole input until EOF, without paying attention to CONTENT_LENGTH
   that is supplied in the environment and instead expecting the Web
   server to close the input stream.  This has been fixed.
   (merge eebfe40962 mk/http-backend-content-length later to maint).

 * "git merge --abort" etc. did not clean things up properly when
   there were conflicted entries in the index in certain order that
   are involved in D/F conflicts.  This has been corrected.
   (merge ad3762042a en/abort-df-conflict-fixes later to maint).

 * "git diff --indent-heuristic" had a bad corner case performance.
   (merge 301ef85401 sb/indent-heuristic-optim later to maint).

 * The "--exec" option to "git rebase --rebase-merges" placed the exec
   commands at wrong places, which has been corrected.

 * "git verify-tag" and "git verify-commit" have been taught to use
   the exit status of underlying "gpg --verify" to signal bad or
   untrusted signature they found.
   (merge 4e5dc9ca17 jc/gpg-status later to maint).

 * "git mergetool" stopped and gave an extra prompt to continue after
   the last path has been handled, which did not make much sense.
   (merge d651a54b8a ng/mergetool-lose-final-prompt later to maint).

 * Among the three codepaths we use O_APPEND to open a file for
   appending, one used for writing GIT_TRACE output requires O_APPEND
   implementation that behaves sensibly when multiple processes are
   writing to the same file.  POSIX emulation used in the Windows port
   has been updated to improve in this area.
   (merge d641097589 js/mingw-o-append later to maint).

 * "git pull --rebase -v" in a repository with a submodule barfed as
   an intermediate process did not understand what "-v(erbose)" flag
   meant, which has been fixed.
   (merge e84c3cf3dc sb/pull-rebase-submodule later to maint).

 * Recent update to "git config" broke updating variable in a
   subsection, which has been corrected.
   (merge bff7df7a87 sb/config-write-fix later to maint).

 * When "git rebase -i" is told to squash two or more commits into
   one, it labeled the log message for each commit with its number.
   It correctly called the first one "1st commit", but the next one
   was "commit #1", which was off-by-one.  This has been corrected.
   (merge dd2e36ebac pw/rebase-i-squash-number-fix later to maint).

 * "git rebase -i", when a 'merge <branch>' insn in its todo list
   fails, segfaulted, which has been (minimally) corrected.
   (merge bc9238bb09 pw/rebase-i-merge-segv-fix later to maint).

 * "git cherry-pick --quit" failed to remove CHERRY_PICK_HEAD even
   though we won't be in a cherry-pick session after it returns, which
   has been corrected.
   (merge 3e7dd99208 nd/cherry-pick-quit-fix later to maint).

 * Code cleanup, docfix, build fix, etc.
   (merge aee9be2ebe sg/update-ref-stdin-cleanup later to maint).
   (merge 037714252f jc/clean-after-sanity-tests later to maint).
   (merge 5b26c3c941 en/merge-recursive-cleanup later to maint).
   (merge 0dcbc0392e bw/config-refer-to-gitsubmodules-doc later to maint).
   (merge bb4d000e87 bw/protocol-v2 later to maint).
   (merge 928f0ab4ba vs/typofixes later to maint).
   (merge d7f590be84 en/rebase-i-microfixes later to maint).
   (merge 81d395cc85 js/rebase-recreate-merge later to maint).
   (merge 51d1863168 tz/exclude-doc-smallfixes later to maint).
   (merge a9aa3c0927 ds/commit-graph later to maint).
   (merge 5cf8e06474 js/enhanced-version-info later to maint).
   (merge 6aaded5509 tb/config-default later to maint).
   (merge 022d2ac1f3 sb/blame-color later to maint).
   (merge 5a06a20e0c bp/test-drop-caches-for-windows later to maint).
   (merge dd61cc1c2e jk/ui-color-always-to-auto later to maint).
   (merge 1e83b9bfdd sb/trailers-docfix later to maint).
   (merge ab29f1b329 sg/fast-import-dump-refs-on-checkpoint-fix later to maint).
   (merge 6a8ad880f0 jn/subtree-test-fixes later to maint).
   (merge ffbd51cc60 nd/pack-objects-threading-doc later to maint).
   (merge e9dac7be60 es/mw-to-git-chain-fix later to maint).
   (merge fe583c6c7a rs/remote-mv-leakfix later to maint).
   (merge 69885ab015 en/t3031-title-fix later to maint).
   (merge 8578037bed nd/config-blame-sort later to maint).
   (merge 8ad169c4ba hn/config-in-code-comment later to maint).
   (merge b7446fcfdf ar/t4150-am-scissors-test-fix later to maint).
   (merge a8132410ee js/typofixes later to maint).
   (merge 388d0ff6e5 en/update-index-doc later to maint).
   (merge e05aa688dd jc/update-index-doc later to maint).
   (merge 10c600172c sg/t5310-empty-input-fix later to maint).
   (merge 5641eb9465 jh/partial-clone-doc later to maint).
   (merge 2711b1ad5e ab/submodule-relative-url-tests later to maint).

----------------------------------------------------------------

Changes since v2.18.0 are as follows:

Aaron Schrab (1):
      sequencer: use configured comment character

Alban Gruin (4):
      rebase: introduce a dedicated backend for --preserve-merges
      rebase: strip unused code in git-rebase--preserve-merges.sh
      rebase: use the new git-rebase--preserve-merges.sh
      rebase: remove -p code from git-rebase--interactive.sh

Alejandro R. Sedeño (1):
      Makefile: tweak sed invocation

Aleksandr Makarov (1):
      for-each-ref: consistently pass WM_IGNORECASE flag

Andrei Rybak (2):
      Documentation: fix --color option formatting
      t4150: fix broken test for am --scissors

Anthony Sottile (1):
      config.c: fix regression for core.safecrlf false

Antonio Ospite (6):
      config: move config_from_gitmodules to submodule-config.c
      submodule-config: add helper function to get 'fetch' config from .gitmodules
      submodule-config: add helper to get 'update-clone' config from .gitmodules
      submodule-config: make 'config_from_gitmodules' private
      submodule-config: pass repository as argument to config_from_gitmodules
      submodule-config: reuse config_from_gitmodules in repo_read_gitmodules

Beat Bolli (10):
      builtin/config: work around an unsized array forward declaration
      unicode: update the width tables to Unicode 11
      connect.h: avoid forward declaration of an enum
      refs/refs-internal.h: avoid forward declaration of an enum
      convert.c: replace "\e" escapes with "\033".
      sequencer.c: avoid empty statements at top level
      string-list.c: avoid conversion from void * to function pointer
      utf8.c: avoid char overflow
      Makefile: add a DEVOPTS flag to get pedantic compilation
      packfile: ensure that enum object_type is defined

Ben Peart (3):
      convert log_ref_write_fd() to use strbuf
      handle lower case drive letters on Windows
      t3507: add a testcase showing failure with sparse checkout

Brandon Williams (15):
      commit: convert commit_graft_pos() to handle arbitrary repositories
      commit: convert register_commit_graft to handle arbitrary repositories
      commit: convert read_graft_file to handle arbitrary repositories
      test-pkt-line: add unpack-sideband subcommand
      docs: link to gitsubmodules
      upload-pack: implement ref-in-want
      upload-pack: test negotiation with changing repository
      fetch: refactor the population of peer ref OIDs
      fetch: refactor fetch_refs into two functions
      fetch: refactor to make function args narrower
      fetch-pack: put shallow info in output parameter
      fetch-pack: implement ref-in-want
      clone: send ref-prefixes when using protocol v2
      fetch-pack: mark die strings for translation
      pack-protocol: mention and point to docs for protocol v2

Chen Bin (1):
      git-p4: add the `p4-pre-submit` hook

Christian Couder (1):
      t9104: kosherly remove remote refs

Derrick Stolee (43):
      ref-filter: fix outdated comment on in_commit_list
      commit: add generation number to struct commit
      commit-graph: compute generation numbers
      commit: use generations in paint_down_to_common()
      commit-graph: always load commit-graph information
      ref-filter: use generation number for --contains
      commit: use generation numbers for in_merge_bases()
      commit: add short-circuit to paint_down_to_common()
      commit: use generation number in remove_redundant()
      merge: check config before loading commits
      commit-graph.txt: update design document
      commit-graph: fix UX issue when .lock file exists
      ewah/bitmap.c: delete unused 'bitmap_clear()'
      ewah/bitmap.c: delete unused 'bitmap_each_bit()'
      ewah_bitmap: delete unused 'ewah_and()'
      ewah_bitmap: delete unused 'ewah_and_not()'
      ewah_bitmap: delete unused 'ewah_not()'
      ewah_bitmap: delete unused 'ewah_or()'
      ewah_io: delete unused 'ewah_serialize()'
      t5318-commit-graph.sh: use core.commitGraph
      commit-graph: UNLEAK before die()
      commit-graph: fix GRAPH_MIN_SIZE
      commit-graph: parse commit from chosen graph
      commit: force commit to parse from object database
      commit-graph: load a root tree from specific graph
      commit-graph: add 'verify' subcommand
      commit-graph: verify catches corrupt signature
      commit-graph: verify required chunks are present
      commit-graph: verify corrupt OID fanout and lookup
      commit-graph: verify objects exist
      commit-graph: verify root tree OIDs
      commit-graph: verify parent list
      commit-graph: verify generation number
      commit-graph: verify commit date
      commit-graph: test for corrupted octopus edge
      commit-graph: verify contents match checksum
      fsck: verify commit-graph
      commit-graph: use string-list API for input
      commit-graph: add '--reachable' option
      gc: automatically write commit-graph files
      commit-graph: update design document
      commit-graph: fix documentation inconsistencies
      coccinelle: update commit.cocci

Elijah Newren (63):
      t6036, t6042: use test_create_repo to keep tests independent
      t6036, t6042: use test_line_count instead of wc -l
      t6036, t6042: prefer test_path_is_file, test_path_is_missing
      t6036, t6042: prefer test_cmp to sequences of test
      t6036: prefer test_when_finished to manual cleanup in following test
      merge-recursive: fix miscellaneous grammar error in comment
      merge-recursive: fix numerous argument alignment issues
      merge-recursive: align labels with their respective code blocks
      merge-recursive: clarify the rename_dir/RENAME_DIR meaning
      merge-recursive: rename conflict_rename_*() family of functions
      merge-recursive: add pointer about unduly complex looking code
      git-rebase.txt: document incompatible options
      git-rebase.sh: update help messages a bit
      t3422: new testcases for checking when incompatible options passed
      git-rebase: error out when incompatible options passed
      git-rebase.txt: address confusion between --no-ff vs --force-rebase
      directory-rename-detection.txt: technical docs on abilities and limitations
      git-rebase.txt: document behavioral differences between modes
      t3401: add directory rename testcases for rebase and am
      git-rebase: make --allow-empty-message the default
      t3418: add testcase showing problems with rebase -i and strategy options
      Fix use of strategy options with interactive rebases
      git-rebase--merge: modernize "git-$cmd" to "git $cmd"
      apply: fix grammar error in comment
      t5407: fix test to cover intended arguments
      read-cache.c: move index_has_changes() from merge.c
      index_has_changes(): avoid assuming operating on the_index
      t6044: verify that merges expected to abort actually abort
      t6036: add a failed conflict detection case with symlink modify/modify
      t6036: add a failed conflict detection case with symlink add/add
      t6036: add a failed conflict detection case with submodule modify/modify
      t6036: add a failed conflict detection case with submodule add/add
      t6036: add a failed conflict detection case with conflicting types
      t6042: add testcase covering rename/add/delete conflict type
      t6042: add testcase covering rename/rename(2to1)/delete/delete conflict
      t6042: add testcase covering long chains of rename conflicts
      t6036: add lots of detail for directory/file conflicts in recursive case
      t6036: add a failed conflict detection case: regular files, different modes
      t6044: add a testcase for index matching head, when head doesn't match HEAD
      merge-recursive: make sure when we say we abort that we actually abort
      merge-recursive: fix assumption that head tree being merged is HEAD
      t6044: add more testcases with staged changes before a merge is invoked
      merge-recursive: enforce rule that index matches head before merging
      merge: fix misleading pre-merge check documentation
      t7405: add a file/submodule conflict
      t7405: add a directory/submodule conflict
      t7405: verify 'merge --abort' works after submodule/path conflicts
      merge-recursive: preserve skip_worktree bit when necessary
      t1015: demonstrate directory/file conflict recovery failures
      read-cache: fix directory/file conflict handling in read_index_unmerged()
      t3031: update test description to mention desired behavior
      t7406: fix call that was failing for the wrong reason
      t7406: simplify by using diff --name-only instead of diff --raw
      t7406: avoid having git commands upstream of a pipe
      t7406: prefer test_* helper functions to test -[feds]
      t7406: avoid using test_must_fail for commands other than git
      git-update-index.txt: reword possibly confusing example
      Add missing includes and forward declarations
      alloc: make allocate_alloc_state and clear_alloc_state more consistent
      Move definition of enum branch_track from cache.h to branch.h
      urlmatch.h: fix include guard
      compat/precompose_utf8.h: use more common include guard style
      Remove forward declaration of an enum

Eric Sunshine (53):
      t: use test_might_fail() instead of manipulating exit code manually
      t: use test_write_lines() instead of series of 'echo' commands
      t: use sane_unset() rather than 'unset' with broken &&-chain
      t: drop unnecessary terminating semicolon in subshell
      t/lib-submodule-update: fix "absorbing" test
      t5405: use test_must_fail() instead of checking exit code manually
      t5406: use write_script() instead of birthing shell script manually
      t5505: modernize and simplify hard-to-digest test
      t6036: fix broken "merge fails but has appropriate contents" tests
      t7201: drop pointless "exit 0" at end of subshell
      t7400: fix broken "submodule add/reconfigure --force" test
      t7810: use test_expect_code() instead of hand-rolled comparison
      t9001: fix broken "invoke hook" test
      t9814: simplify convoluted check that command correctly errors out
      t0000-t0999: fix broken &&-chains
      t1000-t1999: fix broken &&-chains
      t2000-t2999: fix broken &&-chains
      t3000-t3999: fix broken &&-chains
      t3030: fix broken &&-chains
      t4000-t4999: fix broken &&-chains
      t5000-t5999: fix broken &&-chains
      t6000-t6999: fix broken &&-chains
      t7000-t7999: fix broken &&-chains
      t9000-t9999: fix broken &&-chains
      t9119: fix broken &&-chains
      t6046/t9833: fix use of "VAR=VAL cmd" with a shell function
      t/check-non-portable-shell: stop being so polite
      t/check-non-portable-shell: make error messages more compact
      t/check-non-portable-shell: detect "FOO=bar shell_func"
      t/test-lib: teach --chain-lint to detect broken &&-chains in subshells
      t/Makefile: add machinery to check correctness of chainlint.sed
      t/chainlint: add chainlint "basic" test cases
      t/chainlint: add chainlint "whitespace" test cases
      t/chainlint: add chainlint "one-liner" test cases
      t/chainlint: add chainlint "nested subshell" test cases
      t/chainlint: add chainlint "loop" and "conditional" test cases
      t/chainlint: add chainlint "cuddled" test cases
      t/chainlint: add chainlint "complex" test cases
      t/chainlint: add chainlint "specialized" test cases
      diff: --color-moved: rename "dimmed_zebra" to "dimmed-zebra"
      mw-to-git/t9360: fix broken &&-chain
      t/chainlint.sed: drop extra spaces from regex character class
      sequencer: fix "rebase -i --root" corrupting author header
      sequencer: fix "rebase -i --root" corrupting author header timezone
      sequencer: fix "rebase -i --root" corrupting author header timestamp
      sequencer: don't die() on bogus user-edited timestamp
      color: protect against out-of-bounds reads and writes
      chainlint: match arbitrary here-docs tags rather than hard-coded names
      chainlint: match 'quoted' here-doc tags
      chainlint: recognize multi-line $(...) when command cuddled with "$("
      chainlint: let here-doc and multi-line string commence on same line
      chainlint: recognize multi-line quoted strings more robustly
      chainlint: add test of pathological case which triggered false positive

Han-Wen Nienhuys (2):
      config: document git config getter return value
      sideband: highlight keywords in remote sideband output

Henning Schild (9):
      builtin/receive-pack: use check_signature from gpg-interface
      gpg-interface: make parse_gpg_output static and remove from interface header
      gpg-interface: add new config to select how to sign a commit
      t/t7510: check the validation of the new config gpg.format
      gpg-interface: introduce an abstraction for multiple gpg formats
      gpg-interface: do not hardcode the key string len anymore
      gpg-interface: introduce new config to select per gpg format program
      gpg-interface: introduce new signature format "x509" using gpgsm
      gpg-interface t: extend the existing GPG tests with GPGSM

Isabella Stephens (2):
      blame: prevent error if range ends past end of file
      log: prevent error if line range ends past end of file

Jameson Miller (8):
      read-cache: teach refresh_cache_entry to take istate
      read-cache: teach make_cache_entry to take object_id
      block alloc: add lifecycle APIs for cache_entry structs
      mem-pool: only search head block for available space
      mem-pool: add life cycle management functions
      mem-pool: fill out functionality
      block alloc: allocate cache entries from mem_pool
      block alloc: add validations around cache_entry lifecyle

Jeff Hostetler (1):
      json_writer: new routines to create JSON data

Jeff King (48):
      make show-index a builtin
      show-index: update documentation for index v2
      fetch-pack: don't try to fetch peel values with --all
      ewah: drop ewah_deserialize function
      ewah: drop ewah_serialize_native function
      t3200: unset core.logallrefupdates when testing reflog creation
      t: switch "branch -l" to "branch --create-reflog"
      branch: deprecate "-l" option
      config: turn die_on_error into caller-facing enum
      config: add CONFIG_ERROR_SILENT handler
      config: add options parameter to git_config_from_mem
      fsck: silence stderr when parsing .gitmodules
      t6300: add a test for --ignore-case
      ref-filter: avoid backend filtering with --ignore-case
      t5500: prettify non-commit tag tests
      sequencer: handle empty-set cases consistently
      sequencer: don't say BUG on bogus input
      has_uncommitted_changes(): fall back to empty tree
      fsck: split ".gitmodules too large" error from parse failure
      fsck: downgrade gitmodulesParse default to "info"
      blame: prefer xsnprintf to strcpy for colors
      check_replace_refs: fix outdated comment
      check_replace_refs: rename to read_replace_refs
      add core.usereplacerefs config option
      reencode_string: use st_add/st_mult helpers
      reencode_string: use size_t for string lengths
      strbuf: use size_t for length in intermediate variables
      strbuf_readlink: use ssize_t
      pass st.st_size as hint for strbuf_readlink()
      strbuf_humanise: use unsigned variables
      automatically ban strcpy()
      banned.h: mark strcat() as banned
      banned.h: mark sprintf() as banned
      banned.h: mark strncpy() as banned
      score_trees(): fix iteration over trees with missing entries
      add a script to diff rendered documentation
      t5552: suppress upload-pack trace output
      for_each_*_object: store flag definitions in a single location
      for_each_*_object: take flag arguments as enum
      for_each_*_object: give more comprehensive docstrings
      for_each_packed_object: support iterating in pack-order
      t1006: test cat-file --batch-all-objects with duplicates
      cat-file: rename batch_{loose,packed}_object callbacks
      cat-file: support "unordered" output for --batch-all-objects
      cat-file: use oidset check-and-insert
      cat-file: split batch "buf" into two variables
      cat-file: use a single strbuf for all output
      for_each_*_object: move declarations to object-store.h

Johannes Schindelin (41):
      Makefile: fix the "built from commit" code
      merge: allow reading the merge commit message from a file
      rebase --rebase-merges: add support for octopus merges
      rebase --rebase-merges: adjust man page for octopus support
      vcbuild/README: update to accommodate for missing common-cmds.h
      t7406: avoid failures solely due to timing issues
      contrib: add a script to initialize VS Code configuration
      vscode: hard-code a couple defines
      cache.h: extract enum declaration from inside a struct declaration
      mingw: define WIN32 explicitly
      vscode: only overwrite C/C++ settings
      vscode: wrap commit messages at column 72 by default
      vscode: use 8-space tabs, no trailing ws, etc for Git's source code
      vscode: add a dictionary for cSpell
      vscode: let cSpell work on commit messages, too
      pull --rebase=<type>: allow single-letter abbreviations for the type
      t3430: demonstrate what -r, --autosquash & --exec should do
      git-compat-util.h: fix typo
      remote-curl: remove spurious period
      rebase --exec: make it work with --rebase-merges
      linear-assignment: a function to solve least-cost assignment problems
      Introduce `range-diff` to compare iterations of a topic branch
      range-diff: first rudimentary implementation
      range-diff: improve the order of the shown commits
      range-diff: also show the diff between patches
      range-diff: right-trim commit messages
      range-diff: indent the diffs just like tbdiff
      range-diff: suppress the diff headers
      range-diff: adjust the output of the commit pairs
      range-diff: do not show "function names" in hunk headers
      range-diff: use color for the commit pairs
      color: add the meta color GIT_COLOR_REVERSE
      diff: add an internal option to dual-color diffs of diffs
      range-diff: offer to dual-color the diffs
      range-diff --dual-color: skip white-space warnings
      range-diff: populate the man page
      completion: support `git range-diff`
      range-diff: left-pad patch numbers
      range-diff: make --dual-color the default mode
      range-diff: use dim/bold cues to improve dual color mode
      chainlint: fix for core.autocrlf=true

Johannes Sixt (1):
      mingw: enable atomic O_APPEND

Jonathan Nieder (11):
      object: add repository argument to grow_object_hash
      object: move grafts to object parser
      commit: add repository argument to commit_graft_pos
      commit: add repository argument to register_commit_graft
      commit: add repository argument to read_graft_file
      commit: add repository argument to prepare_commit_graft
      commit: add repository argument to lookup_commit_graft
      subtree test: add missing && to &&-chain
      subtree test: simplify preparation of expected results
      doc hash-function-transition: pick SHA-256 as NewHash
      partial-clone: render design doc using asciidoc

Jonathan Tan (28):
      list-objects: check if filter is NULL before using
      fetch-pack: split up everything_local()
      fetch-pack: clear marks before re-marking
      fetch-pack: directly end negotiation if ACK ready
      fetch-pack: use ref adv. to prune "have" sent
      fetch-pack: make negotiation-related vars local
      fetch-pack: move common check and marking together
      fetch-pack: introduce negotiator API
      pack-bitmap: remove bitmap_git global variable
      pack-bitmap: add free function
      fetch-pack: write shallow, then check connectivity
      fetch-pack: support negotiation tip whitelist
      upload-pack: send refs' objects despite "filter"
      clone: check connectivity even if clone is partial
      revision: tolerate promised targets of tags
      tag: don't warn if target is missing but promised
      negotiator/skipping: skip commits during fetch
      commit-graph: refactor preparing commit graph
      object-store: add missing include
      commit-graph: add missing forward declaration
      commit-graph: add free_commit_graph
      commit-graph: store graph in struct object_store
      commit-graph: add repo arg to graph readers
      t5702: test fetch with multiple refspecs at a time
      fetch: send "refs/tags/" prefix upon CLI refspecs
      fetch-pack: unify ref in and out param
      repack: refactor setup of pack-objects cmd
      repack: repack promisor objects if -a or -A is set

Josh Steadmon (1):
      protocol-v2 doc: put HTTP headers after request

Jules Maselbas (1):
      send-email: fix tls AUTH when sending batch

Junio C Hamano (18):
      tests: clean after SANITY tests
      ewah: delete unused 'rlwit_discharge_empty()'
      Prepare to start 2.19 cycle
      First batch for 2.19 cycle
      Second batch for 2.19 cycle
      fixup! connect.h: avoid forward declaration of an enum
      fixup! refs/refs-internal.h: avoid forward declaration of an enum
      t3404: fix use of "VAR=VAL cmd" with a shell function
      Third batch for 2.19 cycle
      Fourth batch for 2.19 cycle
      remote: make refspec follow the same disambiguation rule as local refs
      Fifth batch for 2.19 cycle
      update-index: there no longer is `apply --index-info`
      gpg-interface: propagate exit status from gpg back to the callers
      Sixth batch for 2.19 cycle
      Seventh batch for 2.19 cycle
      sideband: do not read beyond the end of input
      Git 2.19-rc0

Kana Natsuno (2):
      t4018: add missing test cases for PHP
      userdiff: support new keywords in PHP hunk header

Kim Gybels (1):
      gc --auto: release pack files before auto packing

Kirill Smelkov (1):
      fetch-pack: test explicitly that --all can fetch tag references pointing to non-commits

Luis Marsano (2):
      git-credential-netrc: use in-tree Git.pm for tests
      git-credential-netrc: fix exit status when tests fail

Luke Diamand (6):
      git-p4: python3: replace <> with !=
      git-p4: python3: replace dict.has_key(k) with "k in dict"
      git-p4: python3: remove backticks
      git-p4: python3: basestring workaround
      git-p4: python3: use print() function
      git-p4: python3: fix octal constants

Marc Strapetz (1):
      Documentation: declare "core.ignoreCase" as internal variable

Martin Ågren (1):
      refspec: initalize `refspec_item` in `valid_fetch_refspec()`

Masaya Suzuki (2):
      builtin/send-pack: populate the default configs
      doc: fix want-capability separator

Max Kirillov (4):
      http-backend: cleanup writing to child process
      http-backend: respect CONTENT_LENGTH as specified by rfc3875
      unpack-trees: do not fail reset because of unmerged skipped entry
      http-backend: respect CONTENT_LENGTH for receive-pack

Michael Barabanov (1):
      filter-branch: skip commits present on --state-branch

Mike Hommey (1):
      fast-import: do not call diff_delta() with empty buffer

Nguyễn Thái Ngọc Duy (98):
      commit-slab.h: code split
      commit-slab: support shared commit-slab
      blame: use commit-slab for blame suspects instead of commit->util
      describe: use commit-slab for commit names instead of commit->util
      shallow.c: use commit-slab for commit depth instead of commit->util
      sequencer.c: use commit-slab to mark seen commits
      sequencer.c: use commit-slab to associate todo items to commits
      revision.c: use commit-slab for show_source
      bisect.c: use commit-slab for commit weight instead of commit->util
      name-rev: use commit-slab for rev-name instead of commit->util
      show-branch: use commit-slab for commit-name instead of commit->util
      show-branch: note about its object flags usage
      log: use commit-slab in prepare_bases() instead of commit->util
      merge: use commit-slab in merge remote desc instead of commit->util
      commit.h: delete 'util' field in struct commit
      diff: ignore --ita-[in]visible-in-index when diffing worktree-to-tree
      diff: turn --ita-invisible-in-index on by default
      t2203: add a test about "diff HEAD" case
      apply: add --intent-to-add
      parse-options: option to let --git-completion-helper show negative form
      completion: suppress some -no- options
      Add and use generic name->id mapping code for color slot parsing
      grep: keep all colors in an array
      fsck: factor out msg_id_info[] lazy initialization code
      help: add --config to list all available config
      fsck: produce camelCase config key names
      advice: keep config name in camelCase in advice_config[]
      am: move advice.amWorkDir parsing back to advice.c
      completion: drop the hard coded list of config vars
      completion: keep other config var completion in camelCase
      completion: support case-insensitive config vars
      log-tree: allow to customize 'grafted' color
      completion: complete general config vars in two steps
      upload-pack: reject shallow requests that would return nothing
      completion: collapse extra --no-.. options
      Update messages in preparation for i18n
      archive-tar.c: mark more strings for translation
      archive-zip.c: mark more strings for translation
      builtin/config.c: mark more strings for translation
      builtin/grep.c: mark strings for translation
      builtin/pack-objects.c: mark more strings for translation
      builtin/replace.c: mark more strings for translation
      commit-graph.c: mark more strings for translation
      config.c: mark more strings for translation
      connect.c: mark more strings for translation
      convert.c: mark more strings for translation
      dir.c: mark more strings for translation
      environment.c: mark more strings for translation
      exec-cmd.c: mark more strings for translation
      object.c: mark more strings for translation
      pkt-line.c: mark more strings for translation
      refs.c: mark more strings for translation
      refspec.c: mark more strings for translation
      replace-object.c: mark more strings for translation
      sequencer.c: mark more strings for translation
      sha1-file.c: mark more strings for translation
      transport.c: mark more strings for translation
      transport-helper.c: mark more strings for translation
      pack-objects: document about thread synchronization
      apply.h: drop extern on func declaration
      attr.h: drop extern from function declaration
      blame.h: drop extern on func declaration
      cache-tree.h: drop extern from function declaration
      convert.h: drop 'extern' from function declaration
      diffcore.h: drop extern from function declaration
      diff.h: remove extern from function declaration
      line-range.h: drop extern from function declaration
      rerere.h: drop extern from function declaration
      repository.h: drop extern from function declaration
      revision.h: drop extern from function declaration
      submodule.h: drop extern from function declaration
      config.txt: reorder blame stuff to keep config keys sorted
      Makefile: add missing dependency for command-list.h
      diff.c: move read_index() code back to the caller
      cache-tree: wrap the_index based wrappers with #ifdef
      attr: remove an implicit dependency on the_index
      convert.c: remove an implicit dependency on the_index
      dir.c: remove an implicit dependency on the_index in pathspec code
      preload-index.c: use the right index instead of the_index
      ls-files: correct index argument to get_convert_attr_ascii()
      unpack-trees: remove 'extern' on function declaration
      unpack-trees: add a note about path invalidation
      unpack-trees: don't shadow global var the_index
      unpack-trees: convert clear_ce_flags* to avoid the_index
      unpack-trees: avoid the_index in verify_absent()
      pathspec.c: use the right index instead of the_index
      submodule.c: use the right index instead of the_index
      entry.c: use the right index instead of the_index
      attr: remove index from git_attr_set_direction()
      grep: use the right index instead of the_index
      archive.c: avoid access to the_index
      archive-*.c: use the right repository
      resolve-undo.c: use the right index instead of the_index
      apply.c: pass struct apply_state to more functions
      apply.c: make init_apply_state() take a struct repository
      apply.c: remove implicit dependency on the_index
      blame.c: remove implicit dependency on the_index
      cherry-pick: fix --quit not deleting CHERRY_PICK_HEAD

Nicholas Guriev (1):
      mergetool: don't suggest to continue after last file

Olga Telezhnaya (5):
      ref-filter: add info_source to valid_atom
      ref-filter: fill empty fields with empty values
      ref-filter: initialize eaten variable
      ref-filter: merge get_obj and get_object
      ref-filter: use oid_object_info() to get object

Phillip Wood (5):
      add -p: fix counting empty context lines in edited patches
      sequencer: do not squash 'reword' commits when we hit conflicts
      rebase -i: fix numbering in squash message
      t3430: add conflicting commit
      rebase -i: fix SIGSEGV when 'merge <branch>' fails

Prathamesh Chavan (4):
      submodule foreach: correct '$path' in nested submodules from a subdirectory
      submodule foreach: document '$sm_path' instead of '$path'
      submodule foreach: document variable '$displaypath'
      submodule: port submodule subcommand 'foreach' from shell to C

Ramsay Jones (3):
      fsck: check skiplist for object in fsck_blob()
      t6036: fix broken && chain in sub-shell
      t5562: avoid non-portable "export FOO=bar" construct

René Scharfe (7):
      remote: clear string_list after use in mv()
      add, update-index: fix --chmod argument help
      difftool: remove angular brackets from argument help
      pack-objects: specify --index-version argument help explicitly
      send-pack: specify --force-with-lease argument help explicitly
      shortlog: correct option help for -w
      parse-options: automatically infer PARSE_OPT_LITERAL_ARGHELP

SZEDER Gábor (19):
      update-ref --stdin: use skip_prefix()
      t7510-signed-commit: use 'test_must_fail'
      tests: make forging GPG signed commits and tags more robust
      t5541: clean up truncating access log
      t/lib-httpd: add the strip_access_log() helper function
      t/lib-httpd: avoid occasional failures when checking access.log
      t5608: fix broken &&-chain
      t9300: wait for background fast-import process to die after killing it
      travis-ci: run Coccinelle static analysis with two parallel jobs
      travis-ci: fail if Coccinelle static analysis found something to transform
      coccinelle: mark the 'coccicheck' make target as .PHONY
      coccinelle: use $(addsuffix) in 'coccicheck' make target
      coccinelle: exclude sha1dc source files from static analysis
      coccinelle: put sane filenames into output patches
      coccinelle: extract dedicated make target to clean Coccinelle's results
      travis-ci: include the trash directories of failed tests in the trace log
      t5318: use 'test_cmp_bin' to compare commit-graph files
      t5318: avoid unnecessary command substitutions
      t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test

Sebastian Kisela (2):
      git-instaweb: support Fedora/Red Hat apache module path
      git-instaweb: fix apache2 config with apache >= 2.4

Stefan Beller (87):
      repository: introduce parsed objects field
      object: add repository argument to create_object
      alloc: add repository argument to alloc_blob_node
      alloc: add repository argument to alloc_tree_node
      alloc: add repository argument to alloc_commit_node
      alloc: add repository argument to alloc_tag_node
      alloc: add repository argument to alloc_object_node
      alloc: add repository argument to alloc_report
      alloc: add repository argument to alloc_commit_index
      object: allow grow_object_hash to handle arbitrary repositories
      object: allow create_object to handle arbitrary repositories
      alloc: allow arbitrary repositories for alloc functions
      object-store: move object access functions to object-store.h
      shallow: add repository argument to set_alternate_shallow_file
      shallow: add repository argument to register_shallow
      shallow: add repository argument to check_shallow_file_for_update
      shallow: add repository argument to is_repository_shallow
      cache: convert get_graft_file to handle arbitrary repositories
      path.c: migrate global git_path_* to take a repository argument
      shallow: migrate shallow information into the object parser
      commit: allow prepare_commit_graft to handle arbitrary repositories
      commit: allow lookup_commit_graft to handle arbitrary repositories
      refs/packed-backend.c: close fd of empty file
      submodule--helper: plug mem leak in print_default_remote
      sequencer.c: plug leaks in do_pick_commit
      submodule: fix NULL correctness in renamed broken submodules
      t5526: test recursive submodules when fetching moved submodules
      submodule: unset core.worktree if no working tree is present
      submodule: ensure core.worktree is set after update
      submodule deinit: unset core.worktree
      submodule.c: report the submodule that an error occurs in
      sequencer.c: plug mem leak in git_sequencer_config
      .mailmap: merge different spellings of names
      object: add repository argument to parse_object
      object: add repository argument to lookup_object
      object: add repository argument to parse_object_buffer
      object: add repository argument to object_as_type
      blob: add repository argument to lookup_blob
      tree: add repository argument to lookup_tree
      commit: add repository argument to lookup_commit_reference_gently
      commit: add repository argument to lookup_commit_reference
      commit: add repository argument to lookup_commit
      commit: add repository argument to parse_commit_buffer
      commit: add repository argument to set_commit_buffer
      commit: add repository argument to get_cached_commit_buffer
      tag: add repository argument to lookup_tag
      tag: add repository argument to parse_tag_buffer
      tag: add repository argument to deref_tag
      object: allow object_as_type to handle arbitrary repositories
      object: allow lookup_object to handle arbitrary repositories
      blob: allow lookup_blob to handle arbitrary repositories
      tree: allow lookup_tree to handle arbitrary repositories
      commit: allow lookup_commit to handle arbitrary repositories
      tag: allow lookup_tag to handle arbitrary repositories
      tag: allow parse_tag_buffer to handle arbitrary repositories
      commit.c: allow parse_commit_buffer to handle arbitrary repositories
      commit-slabs: remove realloc counter outside of slab struct
      commit.c: migrate the commit buffer to the parsed object store
      commit.c: allow set_commit_buffer to handle arbitrary repositories
      commit.c: allow get_cached_commit_buffer to handle arbitrary repositories
      object.c: allow parse_object_buffer to handle arbitrary repositories
      object.c: allow parse_object to handle arbitrary repositories
      tag.c: allow deref_tag to handle arbitrary repositories
      commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories
      commit.c: allow lookup_commit_reference to handle arbitrary repositories
      xdiff/xdiff.h: remove unused flags
      xdiff/xdiffi.c: remove unneeded function declarations
      t4015: avoid git as a pipe input
      diff.c: do not pass diff options as keydata to hashmap
      diff.c: adjust hash function signature to match hashmap expectation
      diff.c: add a blocks mode for moved code detection
      diff.c: decouple white space treatment from move detection algorithm
      diff.c: factor advance_or_nullify out of mark_color_as_moved
      diff.c: add white space mode to move detection that allows indent changes
      diff.c: offer config option to control ws handling in move detection
      xdiff/xhistogram: pass arguments directly to fall_back_to_classic_diff
      xdiff/xhistogram: factor out memory cleanup into free_index()
      xdiff/xhistogram: move index allocation into find_lcs
      Documentation/git-interpret-trailers: explain possible values
      xdiff/histogram: remove tail recursion
      t1300: document current behavior of setting options
      xdiff: reduce indent heuristic overhead
      config: fix case sensitive subsection names on writing
      git-config: document accidental multi-line setting in deprecated syntax
      git-submodule.sh: accept verbose flag in cmd_update to be non-quiet
      t7410: update to new style
      builtin/submodule--helper: remove stray new line

Taylor Blau (9):
      Documentation/config.txt: camel-case lineNumber for consistency
      grep.c: expose {,inverted} match column in match_line()
      grep.[ch]: extend grep_opt to allow showing matched column
      grep.c: display column number of first match
      builtin/grep.c: add '--column' option to 'git-grep(1)'
      grep.c: add configuration variables to show matched option
      contrib/git-jump/git-jump: jump to exact location
      grep.c: extract show_line_header()
      grep.c: teach 'git grep --only-matching'

Thomas Rast (1):
      range-diff: add tests

Tobias Klauser (1):
      git-rebase--preserve-merges: fix formatting of todo help message

Todd Zullinger (4):
      git-credential-netrc: minor whitespace cleanup in test script
      git-credential-netrc: make "all" default target of Makefile
      gitignore.txt: clarify default core.excludesfile path
      dir.c: fix typos in core.excludesfile comment

Ville Skyttä (1):
      Documentation: spelling and grammar fixes

Vladimir Parfinenko (1):
      rebase: fix documentation formatting

William Chargin (2):
      sha1-name.c: for ":/", find detached HEAD commits
      t: factor out FUNNYNAMES as shared lazy prereq

Xiaolong Ye (1):
      format-patch: clear UNINTERESTING flag before prepare_bases

brian m. carlson (21):
      send-email: add an auto option for transfer encoding
      send-email: accept long lines with suitable transfer encoding
      send-email: automatically determine transfer-encoding
      docs: correct RFC specifying email line length
      sequencer: pass absolute GIT_WORK_TREE to exec commands
      cache: update object ID functions for the_hash_algo
      tree-walk: replace hard-coded constants with the_hash_algo
      hex: switch to using the_hash_algo
      commit: express tree entry constants in terms of the_hash_algo
      strbuf: allocate space with GIT_MAX_HEXSZ
      sha1-name: use the_hash_algo when parsing object names
      refs/files-backend: use the_hash_algo for writing refs
      builtin/update-index: convert to using the_hash_algo
      builtin/update-index: simplify parsing of cacheinfo
      builtin/fmt-merge-msg: make hash independent
      builtin/merge: switch to use the_hash_algo
      builtin/merge-recursive: make hash independent
      diff: switch GIT_SHA1_HEXSZ to use the_hash_algo
      log-tree: switch GIT_SHA1_HEXSZ to the_hash_algo->hexsz
      sha1-file: convert constants to uses of the_hash_algo
      pretty: switch hard-coded constants to the_hash_algo

Ævar Arnfjörð Bjarmason (36):
      checkout tests: index should be clean after dwim checkout
      checkout.h: wrap the arguments to unique_tracking_name()
      checkout.c: introduce an *_INIT macro
      checkout.c: change "unique" member to "num_matches"
      checkout: pass the "num_matches" up to callers
      builtin/checkout.c: use "ret" variable for return
      checkout: add advice for ambiguous "checkout <branch>"
      checkout & worktree: introduce checkout.defaultRemote
      refspec: s/refspec_item_init/&_or_die/g
      refspec: add back a refspec_item_init() function
      doc hash-function-transition: note the lack of a changelog
      receive.fsck.<msg-id> tests: remove dead code
      config doc: don't describe *.fetchObjects twice
      config doc: unify the description of fsck.* and receive.fsck.*
      config doc: elaborate on what transfer.fsckObjects does
      config doc: elaborate on fetch.fsckObjects security
      transfer.fsckObjects tests: untangle confusing setup
      fetch: implement fetch.fsck.*
      fsck: test & document {fetch,receive}.fsck.* config fallback
      fsck: add stress tests for fsck.skipList
      fsck: test and document unknown fsck.<msg-id> values
      tests: make use of the test_must_be_empty function
      tests: make use of the test_must_be_empty function
      fetch tests: change "Tag" test tag to "testTag"
      push tests: remove redundant 'git push' invocation
      push tests: fix logic error in "push" test assertion
      push tests: add more testing for forced tag pushing
      push tests: assert re-pushing annotated tags
      negotiator: unknown fetch.negotiationAlgorithm should error out
      fetch doc: cross-link two new negotiation options
      sha1dc: update from upstream
      push: use PARSE_OPT_LITERAL_ARGHELP instead of unbalanced brackets
      fetch tests: correct a comment "remove it" -> "remove them"
      pull doc: fix a long-standing grammar error
      submodule: add more exhaustive up-path testing
      t2024: mark test using "checkout -p" with PERL prerequisite

Łukasz Stelmach (1):
      completion: complete remote names too


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-20 22:13 [ANNOUNCE] Git v2.19.0-rc0 Junio C Hamano
@ 2018-08-20 22:41 ` Stefan Beller
  2018-08-20 23:39   ` Jonathan Nieder
  2018-08-21 20:41 ` Derrick Stolee
  1 sibling, 1 reply; 58+ messages in thread
From: Stefan Beller @ 2018-08-20 22:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linux-Kernel@Vger. Kernel. Org, git-packagers

>  * The conversion to pass "the_repository" and then "a_repository"
>    throughout the object access API continues.
>
[...]
>
>  * The conversion to pass "the_repository" and then "a_repository"
>    throughout the object access API continues.

I guess it continues twice as two large series were merged? ;-)
sb/object-store-grafts
sb/object-store-lookup

The latter one is not the correct one, as later we'll have

  * lookup_commit_reference() and friends have been updated to find
    in-core object for a specific in-core repository instance.

>  * "git submodule" did not correctly adjust core.worktree setting that
>    indicates whether/where a submodule repository has its associated
>    working tree across various state transitions, which has been
>    corrected.
>    (merge 984cd77ddb sb/submodule-core-worktree later to maint).

Personally I do not view this as a bug fix but a feature
(but then again my thinking might be tainted of too much
submodule work) hence I would not merge it down.

Stefan

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-20 22:41 ` Stefan Beller
@ 2018-08-20 23:39   ` Jonathan Nieder
  2018-08-21  0:27     ` Jonathan Nieder
  0 siblings, 1 reply; 58+ messages in thread
From: Jonathan Nieder @ 2018-08-20 23:39 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git

(-cc: other lists)
Stefan Beller wrote:
> Junio C Hamano wrote:

>>  * "git submodule" did not correctly adjust core.worktree setting that
>>    indicates whether/where a submodule repository has its associated
>>    working tree across various state transitions, which has been
>>    corrected.
>>    (merge 984cd77ddb sb/submodule-core-worktree later to maint).
>
> Personally I do not view this as a bug fix but a feature
> (but then again my thinking might be tainted of too much
> submodule work) hence I would not merge it down.

Can you elaborate?

The symptom that this series fixes was pretty bad, so I'm pretty glad
you wrote it.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-20 23:39   ` Jonathan Nieder
@ 2018-08-21  0:27     ` Jonathan Nieder
  2018-08-21  0:46       ` Stefan Beller
  0 siblings, 1 reply; 58+ messages in thread
From: Jonathan Nieder @ 2018-08-21  0:27 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git

Jonathan Nieder wrote:
> Stefan Beller wrote:
>> Junio C Hamano wrote:

>>>  * "git submodule" did not correctly adjust core.worktree setting that
>>>    indicates whether/where a submodule repository has its associated
>>>    working tree across various state transitions, which has been
>>>    corrected.
>>>    (merge 984cd77ddb sb/submodule-core-worktree later to maint).
>>
>> Personally I do not view this as a bug fix but a feature
>> (but then again my thinking might be tainted of too much
>> submodule work) hence I would not merge it down.
>
> Can you elaborate?

... ah, I figured it out.  You are saying "would not merge it down to
maint".  In that case, I agree, since this this is not a recent bug
(it's existed since before v1.7.10-rc1~14^2~2, 2012-03-02).

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-21  0:27     ` Jonathan Nieder
@ 2018-08-21  0:46       ` Stefan Beller
  0 siblings, 0 replies; 58+ messages in thread
From: Stefan Beller @ 2018-08-21  0:46 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Junio C Hamano, git

On Mon, Aug 20, 2018 at 5:27 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> Jonathan Nieder wrote:
> > Stefan Beller wrote:
> >> Junio C Hamano wrote:
>
> >>>  * "git submodule" did not correctly adjust core.worktree setting that
> >>>    indicates whether/where a submodule repository has its associated
> >>>    working tree across various state transitions, which has been
> >>>    corrected.
> >>>    (merge 984cd77ddb sb/submodule-core-worktree later to maint).
> >>
> >> Personally I do not view this as a bug fix but a feature
> >> (but then again my thinking might be tainted of too much
> >> submodule work) hence I would not merge it down.
> >
> > Can you elaborate?
>
> ... ah, I figured it out.  You are saying "would not merge it down to
> maint".  In that case, I agree, since this this is not a recent bug
> (it's existed since before v1.7.10-rc1~14^2~2, 2012-03-02).

Yeah; the behavior was the gold standard for submodules ever since,
so I am wary of changing it under the guise of fixing a bug.
The core.worktree setting doesn't harm the user by default; you
need to craft a very specific situation to benefit from this feature.

Stefan

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-20 22:13 [ANNOUNCE] Git v2.19.0-rc0 Junio C Hamano
  2018-08-20 22:41 ` Stefan Beller
@ 2018-08-21 20:41 ` Derrick Stolee
  2018-08-21 21:29   ` Jeff King
  1 sibling, 1 reply; 58+ messages in thread
From: Derrick Stolee @ 2018-08-21 20:41 UTC (permalink / raw)
  To: Junio C Hamano, git

On 8/20/2018 6:13 PM, Junio C Hamano wrote:
> An early preview release Git v2.19.0-rc0 is now available for
> testing at the usual places.

As part of testing the release candidate, I ran the performance suite 
against a fresh clone of the Linux repository using v2.18.0 and 
v2.19.0-rc0 (also: GIT_PERF_REPEAT_COUNT=10). I found a few nice 
improvements, but I also found a possible regression in tree walking. I 
say "tree walking" because it was revealed using p0001-rev-list.sh, but 
only with the "--objects" flag. I also saw some similar numbers on 'git 
log --raw'.

Test v2.18.0             v2.19.0-rc0
--------------------------------------------------------------------------------------------
0001.1: rev-list --all 6.69(6.33+0.35)     6.52(6.20+0.31) -2.5%
0001.2: rev-list --all --objects 52.14(47.43+1.02)   57.15(51.09+1.18) +9.6%

To me, 9.6% seems out of the range of just noise for this length of a 
command, but I could be wrong. Could anyone else try to repro these results?

(This may also not just be tree-walking, but general pack-file loading 
and decompression, since I computed and stored a commit-graph file. 
Hence, commits are not being parsed from the pack-file by either command.)

Aside: the perf results were not all bad. Here was an interesting 
improvement:

Test v2.18.0             v2.19.0-rc0
--------------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times 5.63(5.30+0.32)       
3.34(3.03+0.30) -40.7%

Thanks,

-Stolee


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-21 20:41 ` Derrick Stolee
@ 2018-08-21 21:29   ` Jeff King
  2018-08-22  0:48     ` brian m. carlson
  0 siblings, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-08-21 21:29 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: brian m. carlson, Junio C Hamano, git

On Tue, Aug 21, 2018 at 04:41:02PM -0400, Derrick Stolee wrote:

> On 8/20/2018 6:13 PM, Junio C Hamano wrote:
> > An early preview release Git v2.19.0-rc0 is now available for
> > testing at the usual places.
> 
> As part of testing the release candidate, I ran the performance suite
> against a fresh clone of the Linux repository using v2.18.0 and v2.19.0-rc0
> (also: GIT_PERF_REPEAT_COUNT=10).

Wow, you're a glutton for punishment. :)

> I found a few nice improvements, but I
> also found a possible regression in tree walking. I say "tree walking"
> because it was revealed using p0001-rev-list.sh, but only with the
> "--objects" flag. I also saw some similar numbers on 'git log --raw'.
> 
> Test v2.18.0             v2.19.0-rc0
> --------------------------------------------------------------------------------------------
> 0001.1: rev-list --all 6.69(6.33+0.35)     6.52(6.20+0.31) -2.5%
> 0001.2: rev-list --all --objects 52.14(47.43+1.02)   57.15(51.09+1.18) +9.6%
> 
> To me, 9.6% seems out of the range of just noise for this length of a
> command, but I could be wrong. Could anyone else try to repro these results?

I got:

0001.2: rev-list --all --objects  37.07(36.62+0.45)   39.11(38.58+0.51) +5.5%

Less change, but my overall times were smaller, too, so clearly our
hardware or exact repos are a little bit different. Those numbers seem
pretty consistent in further runs.

It bisects to 509f6f62a4 (cache: update object ID functions for
the_hash_algo, 2018-07-16). Which make sense. An "--objects" traversal
spends a huge amount of time checking each tree entry to see if we've
processed that object yet, which ends up as hashcmp() in the hash table.
I expect that a fixed 20-byte memcmp() can be optimized a lot more than
one with an arbitrary value.

Even if _we_ know the value can only take on one of a few values, I
don't know that we have an easy way to tell the compiler that. Possibly
we could improve things by jumping directly to an optimized code path.
Sort of a poor-man's JIT. ;)

Doing this:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..9c004a26c9 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	if (the_hash_algo->rawsz == 20)
+		return memcmp(sha1, sha2, 20);
+	else
+		return memcmp(sha1, sha1, the_hash_algo->rawsz);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
on top of v2.19-rc0 seems to give me about a 3% speedup (though I might
be imaging it, as there's a bit of noise). A function pointer in
the_hash_algo might make even more sense.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-21 21:29   ` Jeff King
@ 2018-08-22  0:48     ` brian m. carlson
  2018-08-22  3:03       ` Jeff King
  0 siblings, 1 reply; 58+ messages in thread
From: brian m. carlson @ 2018-08-22  0:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee, Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 2253 bytes --]

On Tue, Aug 21, 2018 at 05:29:24PM -0400, Jeff King wrote:
> 0001.2: rev-list --all --objects  37.07(36.62+0.45)   39.11(38.58+0.51) +5.5%
> 
> Less change, but my overall times were smaller, too, so clearly our
> hardware or exact repos are a little bit different. Those numbers seem
> pretty consistent in further runs.
> 
> It bisects to 509f6f62a4 (cache: update object ID functions for
> the_hash_algo, 2018-07-16). Which make sense. An "--objects" traversal
> spends a huge amount of time checking each tree entry to see if we've
> processed that object yet, which ends up as hashcmp() in the hash table.
> I expect that a fixed 20-byte memcmp() can be optimized a lot more than
> one with an arbitrary value.
> 
> Even if _we_ know the value can only take on one of a few values, I
> don't know that we have an easy way to tell the compiler that. Possibly
> we could improve things by jumping directly to an optimized code path.
> Sort of a poor-man's JIT. ;)
> 
> Doing this:
> 
> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..9c004a26c9 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	if (the_hash_algo->rawsz == 20)
> +		return memcmp(sha1, sha2, 20);
> +	else
> +		return memcmp(sha1, sha1, the_hash_algo->rawsz);
>  }
>  
>  static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
> on top of v2.19-rc0 seems to give me about a 3% speedup (though I might
> be imaging it, as there's a bit of noise). A function pointer in
> the_hash_algo might make even more sense.

It's possible that might be a better solution.  I looked into a GCC
assertion that the value was either 20 or 32, and that in itself didn't
seem to help, at least in the generated code.  Your solution is likely
better in that regard.

We could wire it up to be either 20 or 32 and let people experimenting
with other sizes of algorithms just add another branch.  I haven't
tested how that performs, though.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 867 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22  0:48     ` brian m. carlson
@ 2018-08-22  3:03       ` Jeff King
  2018-08-22  3:36         ` Jeff King
                           ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Jeff King @ 2018-08-22  3:03 UTC (permalink / raw)
  To: brian m. carlson, Derrick Stolee, Junio C Hamano, git

On Wed, Aug 22, 2018 at 12:48:16AM +0000, brian m. carlson wrote:

> > diff --git a/cache.h b/cache.h
> > index b1fd3d58ab..9c004a26c9 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;
> >  
> >  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
> >  {
> > -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > +	if (the_hash_algo->rawsz == 20)
> > +		return memcmp(sha1, sha2, 20);
> > +	else
> > +		return memcmp(sha1, sha1, the_hash_algo->rawsz);
> >  }
> >  
> >  static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
> > on top of v2.19-rc0 seems to give me about a 3% speedup (though I might
> > be imaging it, as there's a bit of noise). A function pointer in
> > the_hash_algo might make even more sense.
> 
> It's possible that might be a better solution.  I looked into a GCC
> assertion that the value was either 20 or 32, and that in itself didn't
> seem to help, at least in the generated code.  Your solution is likely
> better in that regard.
> 
> We could wire it up to be either 20 or 32 and let people experimenting
> with other sizes of algorithms just add another branch.  I haven't
> tested how that performs, though.

Here's a _really_ dirty one:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..a6750524ea 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,6 +1023,7 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
+	assert(the_hash_algo->rawsz == 20);
 	return memcmp(sha1, sha2, the_hash_algo->rawsz);
 }
 

We probably don't want to do that, because it makes experimenting with
new hash algos a bit painful, but it gives the same 3-4% speedup pretty
consistently. But I think it demonstrates pretty clearly that giving the
compiler the extra limit information is sufficient. Presumably the
fixed-size memcmp turns into a few multi-word compares.

And indeed, if I look at the generated asm for the call in lookup_object
(which is likely the one we're hitting a lot in this case), I see:

  # cache.h:1027:         return memcmp(sha1, sha2, the_hash_algo->rawsz);
          .loc 4 1027 9 is_stmt 0 view .LVU86
          movq    (%rsi), %rcx    # MEM[(void *)sha1_25(D)], MEM[(void *)sha1_25(D)]
          movq    8(%rsi), %rdi   # MEM[(void *)sha1_25(D)], tmp125
          xorq    4(%rax), %rcx   # MEM[(void *)_6], tmp116
          xorq    8(%r8), %rdi    # MEM[(void *)_6], tmp115
          orq     %rcx, %rdi      # tmp116, tmp115
          jne     .L27    #,
          movl    16(%r8), %ecx   # MEM[(void *)_6], tmp129
          cmpl    %ecx, 16(%rsi)  # tmp129, MEM[(void *)sha1_25(D)]
          jne     .L27    #,

So I wonder if there's some other way to tell the compiler that we'll
only have a few values. An enum comes to mind, though I don't think the
enum rules are strict enough to make this guarantee (after all, it's OK
to bitwise-OR enums, so they clearly don't specify all possible values).

Having a dedicate hashcmp function for each hash_algo seems like the
sanest approach. We pay for one indirect function call, but the function
itself will have the constants available. But it does introduce one
extra complication. We're benefiting here from knowing that the size is
always 20, but also the inline hashcmp knows that we only care about
equality, not comparison.

So if I do this:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..da56da7be2 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,7 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	return the_hash_algo->cmp_fn(sha1, sha2);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
diff --git a/hash.h b/hash.h
index 7c8238bc2e..ac22ba63b6 100644
--- a/hash.h
+++ b/hash.h
@@ -64,6 +64,7 @@ typedef union git_hash_ctx git_hash_ctx;
 typedef void (*git_hash_init_fn)(git_hash_ctx *ctx);
 typedef void (*git_hash_update_fn)(git_hash_ctx *ctx, const void *in, size_t len);
 typedef void (*git_hash_final_fn)(unsigned char *hash, git_hash_ctx *ctx);
+typedef int (*git_hash_cmp_fn)(const void *a, const void *b);
 
 struct git_hash_algo {
 	/*
@@ -90,6 +91,8 @@ struct git_hash_algo {
 	/* The hash finalization function. */
 	git_hash_final_fn final_fn;
 
+	git_hash_cmp_fn cmp_fn;
+
 	/* The OID of the empty tree. */
 	const struct object_id *empty_tree;
 
diff --git a/sha1-file.c b/sha1-file.c
index 97b7423848..7072e360d7 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -69,6 +69,11 @@ static void git_hash_sha1_final(unsigned char *hash, git_hash_ctx *ctx)
 	git_SHA1_Final(hash, &ctx->sha1);
 }
 
+static int git_hash_sha1_cmp(const void *a, const void *b)
+{
+	return memcmp(a, b, 20);
+}
+
 static void git_hash_unknown_init(git_hash_ctx *ctx)
 {
 	BUG("trying to init unknown hash");
@@ -84,6 +89,11 @@ static void git_hash_unknown_final(unsigned char *hash, git_hash_ctx *ctx)
 	BUG("trying to finalize unknown hash");
 }
 
+static int git_hash_unknown_cmp(const void *a, const void *b)
+{
+	BUG("trying to compare unknown hash");
+}
+
 const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 	{
 		NULL,
@@ -93,6 +103,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		git_hash_unknown_init,
 		git_hash_unknown_update,
 		git_hash_unknown_final,
+		git_hash_unknown_cmp,
 		NULL,
 		NULL,
 	},
@@ -105,6 +116,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
 		git_hash_sha1_init,
 		git_hash_sha1_update,
 		git_hash_sha1_final,
+		git_hash_sha1_cmp,
 		&empty_tree_oid,
 		&empty_blob_oid,
 	},

the result is actually _slower_ than the current code.

If I instead introduce an "eq" function, then that function can do the
optimized thing. So if I do something more like this:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..52533a9710 100644
--- a/cache.h
+++ b/cache.h
@@ -1026,6 +1026,11 @@ static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 	return memcmp(sha1, sha2, the_hash_algo->rawsz);
 }
 
+static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
+{
+	return the_hash_algo->eq_fn(sha1, sha2);
+}
+
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
 {
 	return hashcmp(oid1->hash, oid2->hash);
diff --git a/sha1-file.c b/sha1-file.c
index 97b7423848..a491ff5bef 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -69,6 +69,11 @@ static void git_hash_sha1_final(unsigned char *hash, git_hash_ctx *ctx)
 	git_SHA1_Final(hash, &ctx->sha1);
 }
 
+static int git_hash_sha1_eq(const void *a, const void *b)
+{
+	return !memcmp(a, b, 20);
+}
+
 static void git_hash_unknown_init(git_hash_ctx *ctx)
 {
 	BUG("trying to init unknown hash");
> [omitting similar hunks from the last one]

plus converting this one callsite:

diff --git a/object.c b/object.c
index 51c4594515..e54160550c 100644
--- a/object.c
+++ b/object.c
@@ -95,7 +95,7 @@ struct object *lookup_object(struct repository *r, const unsigned char *sha1)
 
 	first = i = hash_obj(sha1, r->parsed_objects->obj_hash_size);
 	while ((obj = r->parsed_objects->obj_hash[i]) != NULL) {
-		if (!hashcmp(sha1, obj->oid.hash))
+		if (hasheq(sha1, obj->oid.hash))
 			break;
 		i++;
 		if (i == r->parsed_objects->obj_hash_size)

I get about a 1.5% speedup. If I apply this coccinelle patch:

@@
expression a, b;
@@
- !hashcmp(a, b)
+ hasheq(a, b)

@@
expression a, b;
@@
- !oidcmp(a, b)
+ oideq(a, b)

with the obvious "oideq()" implementation added, that seems to get me to
2-3%. Not _quite_ as good as the original branching version I showed.
And we had to touch all the callsites (although arguably that kind of
"eq" function is a better interface anyway, since it obviously allows
for more optimization.

So maybe the branching thing is actually not so insane. It makes new
hash_algo's Just Work; they just won't be optimized. And the change is
very localized.

Or maybe it's crazy to spend any effort at all chasing a few percent.
It's not like people's large repositories aren't just going to grow by
that much after a few months anyway. ;) It just seems like if we can do
it for little cost, it's worth it.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22  3:03       ` Jeff King
@ 2018-08-22  3:36         ` Jeff King
  2018-08-22 11:11           ` Derrick Stolee
  2018-08-22  5:36         ` brian m. carlson
  2018-08-22 12:42         ` Paul Smith
  2 siblings, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-08-22  3:36 UTC (permalink / raw)
  To: brian m. carlson, Derrick Stolee, Junio C Hamano, git

On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:

> with the obvious "oideq()" implementation added, that seems to get me to
> 2-3%. Not _quite_ as good as the original branching version I showed.
> And we had to touch all the callsites (although arguably that kind of
> "eq" function is a better interface anyway, since it obviously allows
> for more optimization.
> 
> So maybe the branching thing is actually not so insane. It makes new
> hash_algo's Just Work; they just won't be optimized. And the change is
> very localized.

Hmph. So I went back to double-check my measurements on that branching
version, and I couldn't replicate it!

It turns out what I showed (and measured) before has a bug. Can you see
it?

diff --git a/cache.h b/cache.h
index b1fd3d58ab..9c004a26c9 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,10 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	if (the_hash_algo->rawsz == 20)
+		return memcmp(sha1, sha2, 20);
+	else
+		return memcmp(sha1, sha1, the_hash_algo->rawsz);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)


The problem is the fallback code compares "sha1" to "sha1". The compiler
realizes that's a noop and is able to treat it like a constant. Thus
essentially leaving only the first branch, which it then expands into a
few instructions.

If we fix that bug, then we really do memcmp on either side of the
conditional. And the compiler is smart enough to realize that hey,
that's the same as just calling memcmp with the_hash_algo->rawsz on
either side. And we end up with roughly the same code that we started
with.

So the assert() version really is the fastest. I didn't test, but I
suspect we could "trick" the compiler by having the fallback call an
opaque wrapper around memcmp(). That would prevent it from combining the
two paths, and presumably it would still optimize the constant-20 side.
Or maybe it would eventually decide our inline function is getting too
big and scrap it. Which probably crosses a line of craziness (if I
didn't already cross it two emails ago).

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22  3:03       ` Jeff King
  2018-08-22  3:36         ` Jeff King
@ 2018-08-22  5:36         ` brian m. carlson
  2018-08-22  6:07           ` Jeff King
  2018-08-22 14:28           ` Derrick Stolee
  2018-08-22 12:42         ` Paul Smith
  2 siblings, 2 replies; 58+ messages in thread
From: brian m. carlson @ 2018-08-22  5:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee, Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 2225 bytes --]

On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
> So I wonder if there's some other way to tell the compiler that we'll
> only have a few values. An enum comes to mind, though I don't think the
> enum rules are strict enough to make this guarantee (after all, it's OK
> to bitwise-OR enums, so they clearly don't specify all possible values).

I was thinking about this:

diff --git a/cache.h b/cache.h
index 1398b2a4e4..1f5c6e9319 100644
--- a/cache.h
+++ b/cache.h
@@ -1033,7 +1033,14 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	switch (the_hash_algo->rawsz) {
+		case 20:
+			return memcmp(sha1, sha2, 20);
+		case 32:
+			return memcmp(sha1, sha2, 32);
+		default:
+			assert(0);
+	}
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)

That would make it obvious that there are at most two options.
Unfortunately, gcc for me determines that the buffer in walker.c is 20
bytes in size and steadfastly refuses to compile because it doesn't know
that the value will never be 32 in our codebase currently.  I'd need to
send in more patches before it would compile.

I don't know if something like this is an improvement or now, but this
seems to at least compile:

diff --git a/cache.h b/cache.h
index 1398b2a4e4..3207f74771 100644
--- a/cache.h
+++ b/cache.h
@@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	switch (the_hash_algo->rawsz) {
+		case 20:
+		case 32:
+			return memcmp(sha1, sha2, the_hash_algo->rawsz);
+		default:
+			assert(0);
+	}
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)

I won't have time to sit down and test this out until tomorrow afternoon
at the earliest.  If you want to send in something in the mean time,
even if that limits things to just 20 for now, that's fine.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 867 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22  5:36         ` brian m. carlson
@ 2018-08-22  6:07           ` Jeff King
  2018-08-22  7:39             ` Ævar Arnfjörð Bjarmason
  2018-08-22 14:28           ` Derrick Stolee
  1 sibling, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-08-22  6:07 UTC (permalink / raw)
  To: brian m. carlson, Derrick Stolee, Junio C Hamano, git

On Wed, Aug 22, 2018 at 05:36:26AM +0000, brian m. carlson wrote:

> On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
> > So I wonder if there's some other way to tell the compiler that we'll
> > only have a few values. An enum comes to mind, though I don't think the
> > enum rules are strict enough to make this guarantee (after all, it's OK
> > to bitwise-OR enums, so they clearly don't specify all possible values).
> 
> I was thinking about this:
> 
> diff --git a/cache.h b/cache.h
> index 1398b2a4e4..1f5c6e9319 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1033,7 +1033,14 @@ extern const struct object_id null_oid;
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	switch (the_hash_algo->rawsz) {
> +		case 20:
> +			return memcmp(sha1, sha2, 20);
> +		case 32:
> +			return memcmp(sha1, sha2, 32);
> +		default:
> +			assert(0);
> +	}
>  }

Unfortunately this version doesn't seem to be any faster than the status
quo. And looking at the generated asm, it still looks to be calling
memcpy(). Removing the "case 32" branch switches it back to fast
assembly (this is all using gcc 8.2.0, btw). So I think we're deep into
guessing what the optimizer is going to do, and there's a good chance
that other versions are going to optimize it differently.

We might be better off just writing it out manually. Unfortunately, it's
a bit hard because the neg/0/pos return is more expensive to compute
than pure equality. And only the compiler knows at each inlined site
whether we actually want equality. So now we're back to switching every
caller to use hasheq() if that's what they want.

But _if_ we're OK with that, and _if_ we don't mind some ifdefs for
portability, then this seems as fast as the original (memcmp+constant)
code on my machine:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..c406105f3c 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,16 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	switch (the_hash_algo->rawsz) {
+	case 20:
+		if (*(uint32_t *)sha1 == *(uint32_t *)sha2 &&
+		    *(unsigned __int128 *)(sha1+4) == *(unsigned __int128 *)(sha2+4))
+			return 0;
+	case 32:
+		return memcmp(sha1, sha2, 32);
+	default:
+		assert(0);
+	}
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)

Which is really no surprise, because the generated asm looks about the
same. There are obviously alignment questions there. It's possible it
could even be written portably as a simple loop. Or maybe not. We used
to do that, but modern compilers were able to optimize the memcmp
better. Maybe that's changed. Or maybe they were simply unwilling to
unroll a 20-length loop to find out that it could be turned into a few
quad-word compares.

> That would make it obvious that there are at most two options.
> Unfortunately, gcc for me determines that the buffer in walker.c is 20
> bytes in size and steadfastly refuses to compile because it doesn't know
> that the value will never be 32 in our codebase currently.  I'd need to
> send in more patches before it would compile.

Yeah, I see that warning all over the place (everywhere that calls
is_null_oid(), which is passing in a 20-byte buffer).

> I don't know if something like this is an improvement or now, but this
> seems to at least compile:
> 
> diff --git a/cache.h b/cache.h
> index 1398b2a4e4..3207f74771 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	switch (the_hash_algo->rawsz) {
> +		case 20:
> +		case 32:
> +			return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +		default:
> +			assert(0);
> +	}

I think that would end up with the same slow code, as gcc would rather
call memcmp than expand out the two sets of asm.

> I won't have time to sit down and test this out until tomorrow afternoon
> at the earliest.  If you want to send in something in the mean time,
> even if that limits things to just 20 for now, that's fine.

I don't have a good option. The assert() thing works until I add in the
"32" branch, but that's just punting the issue off until you add support
for the new hash.

Hand-rolling our own asm or C is a portability headache, and we need to
change all of the callsites to use a new hasheq().

Hiding it behind a per-hash function is conceptually cleanest, but not
quite as fast. And it also requires hasheq().

So all of the solutions seem non-trivial.  Again, I'm starting to wonder
if it's worth chasing this few percent.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22  6:07           ` Jeff King
@ 2018-08-22  7:39             ` Ævar Arnfjörð Bjarmason
  2018-08-22 11:14               ` Derrick Stolee
  2018-08-22 15:14               ` Jeff King
  0 siblings, 2 replies; 58+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-22  7:39 UTC (permalink / raw)
  To: Jeff King
  Cc: brian m. carlson, Derrick Stolee, Junio C Hamano, Git Mailing List

On Wed, Aug 22, 2018 at 8:20 AM Jeff King <peff@peff.net> wrote:
>
> On Wed, Aug 22, 2018 at 05:36:26AM +0000, brian m. carlson wrote:
>
> > On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
> > > So I wonder if there's some other way to tell the compiler that we'll
> > > only have a few values. An enum comes to mind, though I don't think the
> > > enum rules are strict enough to make this guarantee (after all, it's OK
> > > to bitwise-OR enums, so they clearly don't specify all possible values).
> >
> > I was thinking about this:
> >
> > diff --git a/cache.h b/cache.h
> > index 1398b2a4e4..1f5c6e9319 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1033,7 +1033,14 @@ extern const struct object_id null_oid;
> >
> >  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
> >  {
> > -     return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > +     switch (the_hash_algo->rawsz) {
> > +             case 20:
> > +                     return memcmp(sha1, sha2, 20);
> > +             case 32:
> > +                     return memcmp(sha1, sha2, 32);
> > +             default:
> > +                     assert(0);
> > +     }
> >  }
>
> Unfortunately this version doesn't seem to be any faster than the status
> quo. And looking at the generated asm, it still looks to be calling
> memcpy(). Removing the "case 32" branch switches it back to fast
> assembly (this is all using gcc 8.2.0, btw). So I think we're deep into
> guessing what the optimizer is going to do, and there's a good chance
> that other versions are going to optimize it differently.
>
> We might be better off just writing it out manually. Unfortunately, it's
> a bit hard because the neg/0/pos return is more expensive to compute
> than pure equality. And only the compiler knows at each inlined site
> whether we actually want equality. So now we're back to switching every
> caller to use hasheq() if that's what they want.
>
> But _if_ we're OK with that, and _if_ we don't mind some ifdefs for
> portability, then this seems as fast as the original (memcmp+constant)
> code on my machine:
>
> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..c406105f3c 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,7 +1023,16 @@ extern const struct object_id null_oid;
>
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> -       return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +       switch (the_hash_algo->rawsz) {
> +       case 20:
> +               if (*(uint32_t *)sha1 == *(uint32_t *)sha2 &&
> +                   *(unsigned __int128 *)(sha1+4) == *(unsigned __int128 *)(sha2+4))
> +                       return 0;
> +       case 32:
> +               return memcmp(sha1, sha2, 32);
> +       default:
> +               assert(0);
> +       }
>  }
>
>  static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
>
> Which is really no surprise, because the generated asm looks about the
> same. There are obviously alignment questions there. It's possible it
> could even be written portably as a simple loop. Or maybe not. We used
> to do that, but modern compilers were able to optimize the memcmp
> better. Maybe that's changed. Or maybe they were simply unwilling to
> unroll a 20-length loop to find out that it could be turned into a few
> quad-word compares.
>
> > That would make it obvious that there are at most two options.
> > Unfortunately, gcc for me determines that the buffer in walker.c is 20
> > bytes in size and steadfastly refuses to compile because it doesn't know
> > that the value will never be 32 in our codebase currently.  I'd need to
> > send in more patches before it would compile.
>
> Yeah, I see that warning all over the place (everywhere that calls
> is_null_oid(), which is passing in a 20-byte buffer).
>
> > I don't know if something like this is an improvement or now, but this
> > seems to at least compile:
> >
> > diff --git a/cache.h b/cache.h
> > index 1398b2a4e4..3207f74771 100644
> > --- a/cache.h
> > +++ b/cache.h
> > @@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
> >
> >  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
> >  {
> > -     return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > +     switch (the_hash_algo->rawsz) {
> > +             case 20:
> > +             case 32:
> > +                     return memcmp(sha1, sha2, the_hash_algo->rawsz);
> > +             default:
> > +                     assert(0);
> > +     }
>
> I think that would end up with the same slow code, as gcc would rather
> call memcmp than expand out the two sets of asm.
>
> > I won't have time to sit down and test this out until tomorrow afternoon
> > at the earliest.  If you want to send in something in the mean time,
> > even if that limits things to just 20 for now, that's fine.
>
> I don't have a good option. The assert() thing works until I add in the
> "32" branch, but that's just punting the issue off until you add support
> for the new hash.
>
> Hand-rolling our own asm or C is a portability headache, and we need to
> change all of the callsites to use a new hasheq().
>
> Hiding it behind a per-hash function is conceptually cleanest, but not
> quite as fast. And it also requires hasheq().
>
> So all of the solutions seem non-trivial.  Again, I'm starting to wonder
> if it's worth chasing this few percent.

Did you try __builtin_expect? It's a GCC builtin for these sorts of
situations, and sometimes helps:
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

I.e. you'd tell GCC we expect to have the 20 there with:

    if (__builtin_expect(the_hash_algo->rawsz == 20, 1)) { ... }

The perl codebase has LIKELY() and UNLIKELY() macros for this which if
the feature isn't available fall back on just plain C code:
https://github.com/Perl/perl5/blob/v5.27.7/perl.h#L3335-L3344

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22  3:36         ` Jeff King
@ 2018-08-22 11:11           ` Derrick Stolee
  0 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-08-22 11:11 UTC (permalink / raw)
  To: Jeff King, brian m. carlson, Junio C Hamano, git

On 8/21/2018 11:36 PM, Jeff King wrote:
> On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
>
>> with the obvious "oideq()" implementation added, that seems to get me to
>> 2-3%. Not _quite_ as good as the original branching version I showed.
>> And we had to touch all the callsites (although arguably that kind of
>> "eq" function is a better interface anyway, since it obviously allows
>> for more optimization.
>>
>> So maybe the branching thing is actually not so insane. It makes new
>> hash_algo's Just Work; they just won't be optimized. And the change is
>> very localized.
> Hmph. So I went back to double-check my measurements on that branching
> version, and I couldn't replicate it!
I'm actually relieved to see this, as I couldn't either.
>
> It turns out what I showed (and measured) before has a bug. Can you see
> it?
I had rewritten the section from scratch instead of applying your diff, 
so I didn't get the sha1-sha1 error. I decided to sleep on it instead of 
sending my email.

> So the assert() version really is the fastest. I didn't test, but I
> suspect we could "trick" the compiler by having the fallback call an
> opaque wrapper around memcmp(). That would prevent it from combining the
> two paths, and presumably it would still optimize the constant-20 side.
> Or maybe it would eventually decide our inline function is getting too
> big and scrap it. Which probably crosses a line of craziness (if I
> didn't already cross it two emails ago).
I appreciate your effort here.

Thanks
-Stolee

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22  7:39             ` Ævar Arnfjörð Bjarmason
@ 2018-08-22 11:14               ` Derrick Stolee
  2018-08-22 15:17                 ` Jeff King
  2018-08-22 15:14               ` Jeff King
  1 sibling, 1 reply; 58+ messages in thread
From: Derrick Stolee @ 2018-08-22 11:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Jeff King
  Cc: brian m. carlson, Junio C Hamano, Git Mailing List


On 8/22/2018 3:39 AM, Ævar Arnfjörð Bjarmason wrote:
> On Wed, Aug 22, 2018 at 8:20 AM Jeff King <peff@peff.net> wrote:
>> On Wed, Aug 22, 2018 at 05:36:26AM +0000, brian m. carlson wrote:
>>
>>> On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
>>> I don't know if something like this is an improvement or now, but this
>>> seems to at least compile:
>>>
>>> diff --git a/cache.h b/cache.h
>>> index 1398b2a4e4..3207f74771 100644
>>> --- a/cache.h
>>> +++ b/cache.h
>>> @@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
>>>
>>>   static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>>>   {
>>> -     return memcmp(sha1, sha2, the_hash_algo->rawsz);
>>> +     switch (the_hash_algo->rawsz) {
>>> +             case 20:
>>> +             case 32:
>>> +                     return memcmp(sha1, sha2, the_hash_algo->rawsz);
>>> +             default:
>>> +                     assert(0);
>>> +     }
>> I think that would end up with the same slow code, as gcc would rather
>> call memcmp than expand out the two sets of asm.
>>
>>> I won't have time to sit down and test this out until tomorrow afternoon
>>> at the earliest.  If you want to send in something in the mean time,
>>> even if that limits things to just 20 for now, that's fine.
>> I don't have a good option. The assert() thing works until I add in the
>> "32" branch, but that's just punting the issue off until you add support
>> for the new hash.
>>
>> Hand-rolling our own asm or C is a portability headache, and we need to
>> change all of the callsites to use a new hasheq().
>>
>> Hiding it behind a per-hash function is conceptually cleanest, but not
>> quite as fast. And it also requires hasheq().
>>
>> So all of the solutions seem non-trivial.  Again, I'm starting to wonder
>> if it's worth chasing this few percent.
> Did you try __builtin_expect? It's a GCC builtin for these sorts of
> situations, and sometimes helps:
> https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
>
> I.e. you'd tell GCC we expect to have the 20 there with:
>
>      if (__builtin_expect(the_hash_algo->rawsz == 20, 1)) { ... }
>
> The perl codebase has LIKELY() and UNLIKELY() macros for this which if
> the feature isn't available fall back on just plain C code:
> https://github.com/Perl/perl5/blob/v5.27.7/perl.h#L3335-L3344
The other thing I was going to recommend (and I'll try to test this out 
myself later) is to see if 'the_hash_algo->rawsz' is being treated as a 
volatile variable, since it is being referenced through a pointer. 
Perhaps storing the value locally and then casing on it would help?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22  3:03       ` Jeff King
  2018-08-22  3:36         ` Jeff King
  2018-08-22  5:36         ` brian m. carlson
@ 2018-08-22 12:42         ` Paul Smith
  2018-08-22 15:23           ` Jeff King
  2 siblings, 1 reply; 58+ messages in thread
From: Paul Smith @ 2018-08-22 12:42 UTC (permalink / raw)
  To: Jeff King, git

On Tue, 2018-08-21 at 23:03 -0400, Jeff King wrote:
>  static inline int hashcmp(const unsigned char *sha1, const unsigned
> char *sha2)
>  {
> +       assert(the_hash_algo->rawsz == 20);
>         return memcmp(sha1, sha2, the_hash_algo->rawsz);
>  }

I'm not familiar with Git code, but for most environments assert() is a
macro which is compiled out when built for "release mode" (whatever
that might mean).  If that's the case for Git too, then relying on
assert() to provide a side-effect (even an optimizer hint side-effect)
won't work and this will actually get slower when built for "release
mode".

Just a thought...

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22  5:36         ` brian m. carlson
  2018-08-22  6:07           ` Jeff King
@ 2018-08-22 14:28           ` Derrick Stolee
  2018-08-22 15:24             ` Jeff King
  1 sibling, 1 reply; 58+ messages in thread
From: Derrick Stolee @ 2018-08-22 14:28 UTC (permalink / raw)
  To: brian m. carlson, Jeff King, Junio C Hamano, git

On 8/22/2018 1:36 AM, brian m. carlson wrote:
> On Tue, Aug 21, 2018 at 11:03:44PM -0400, Jeff King wrote:
>> So I wonder if there's some other way to tell the compiler that we'll
>> only have a few values. An enum comes to mind, though I don't think the
>> enum rules are strict enough to make this guarantee (after all, it's OK
>> to bitwise-OR enums, so they clearly don't specify all possible values).
> I was thinking about this:
>
> diff --git a/cache.h b/cache.h
> index 1398b2a4e4..1f5c6e9319 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1033,7 +1033,14 @@ extern const struct object_id null_oid;
>   
>   static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>   {
> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	switch (the_hash_algo->rawsz) {
> +		case 20:
> +			return memcmp(sha1, sha2, 20);
> +		case 32:
> +			return memcmp(sha1, sha2, 32);
> +		default:
> +			assert(0);
> +	}
>   }
>   
>   static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
>
> That would make it obvious that there are at most two options.
> Unfortunately, gcc for me determines that the buffer in walker.c is 20
> bytes in size and steadfastly refuses to compile because it doesn't know
> that the value will never be 32 in our codebase currently.  I'd need to
> send in more patches before it would compile.
>
> I don't know if something like this is an improvement or now, but this
> seems to at least compile:
>
> diff --git a/cache.h b/cache.h
> index 1398b2a4e4..3207f74771 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1033,7 +1033,13 @@ extern const struct object_id null_oid;
>   
>   static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>   {
> -	return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +	switch (the_hash_algo->rawsz) {
> +		case 20:
> +		case 32:
> +			return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +		default:
> +			assert(0);
> +	}
>   }
In my testing, I've had the best luck with this change:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..6c8b51c390 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,7 +1023,14 @@ extern const struct object_id null_oid;

  static inline int hashcmp(const unsigned char *sha1, const unsigned 
char *sha2)
  {
-       return memcmp(sha1, sha2, the_hash_algo->rawsz);
+       switch (the_hash_algo->rawsz) {
+               case 20:
+                       return memcmp(sha1, sha2, 20);
+               case 32:
+                       return memcmp(sha1, sha2, 32);
+               default:
+                       assert(0);
+       }
  }

The fact that '20' and '32' are constants here may be helpful to the 
compiler. Can someone else test the perf?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22  7:39             ` Ævar Arnfjörð Bjarmason
  2018-08-22 11:14               ` Derrick Stolee
@ 2018-08-22 15:14               ` Jeff King
  1 sibling, 0 replies; 58+ messages in thread
From: Jeff King @ 2018-08-22 15:14 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: brian m. carlson, Derrick Stolee, Junio C Hamano, Git Mailing List

On Wed, Aug 22, 2018 at 09:39:57AM +0200, Ævar Arnfjörð Bjarmason wrote:

> > I don't have a good option. The assert() thing works until I add in the
> > "32" branch, but that's just punting the issue off until you add support
> > for the new hash.
> >
> > Hand-rolling our own asm or C is a portability headache, and we need to
> > change all of the callsites to use a new hasheq().
> >
> > Hiding it behind a per-hash function is conceptually cleanest, but not
> > quite as fast. And it also requires hasheq().
> >
> > So all of the solutions seem non-trivial.  Again, I'm starting to wonder
> > if it's worth chasing this few percent.
> 
> Did you try __builtin_expect? It's a GCC builtin for these sorts of
> situations, and sometimes helps:
> https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
> 
> I.e. you'd tell GCC we expect to have the 20 there with:
> 
>     if (__builtin_expect(the_hash_algo->rawsz == 20, 1)) { ... }
> 
> The perl codebase has LIKELY() and UNLIKELY() macros for this which if
> the feature isn't available fall back on just plain C code:
> https://github.com/Perl/perl5/blob/v5.27.7/perl.h#L3335-L3344

Sadly, no, this doesn't seem to change anything. We still end up with a
single call to memcmp.

I also tried "hiding" the fallback call like this:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..7808bf3d6b 100644
--- a/cache.h
+++ b/cache.h
@@ -1021,9 +1021,13 @@ extern int find_unique_abbrev_r(char *hex, const struct object_id *oid, int len)
 extern const unsigned char null_sha1[GIT_MAX_RAWSZ];
 extern const struct object_id null_oid;
 
+int super_secret_memcmp(const void *a, const void *b, size_t len);
+
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
-	return memcmp(sha1, sha2, the_hash_algo->rawsz);
+	if (the_hash_algo->rawsz == 20)
+		return memcmp(sha1, sha2, 20);
+	return super_secret_memcmp(sha1, sha2, the_hash_algo->rawsz);
 }
 
 static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
diff --git a/sha1-file.c b/sha1-file.c
index 97b7423848..5cd0a4b73f 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -2280,3 +2280,8 @@ int read_loose_object(const char *path,
 		munmap(map, mapsize);
 	return ret;
 }
+
+int super_secret_memcmp(const void *a, const void *b, size_t len)
+{
+	return memcmp(a, b, len);
+}

but that just results in calling memcmp and super_secret_memcmp on the
two codepaths (with or without the __builtin_expect).

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22 11:14               ` Derrick Stolee
@ 2018-08-22 15:17                 ` Jeff King
  2018-08-22 16:08                   ` Duy Nguyen
  0 siblings, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-08-22 15:17 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Ævar Arnfjörð Bjarmason, brian m. carlson,
	Junio C Hamano, Git Mailing List

On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:

> The other thing I was going to recommend (and I'll try to test this out
> myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
> volatile variable, since it is being referenced through a pointer. Perhaps
> storing the value locally and then casing on it would help?

I tried various sprinkling of "const" around the declarations to make it
clear that the values wouldn't change once we saw them. But I couldn't
detect any difference. At most I think that would let us hoist the "if"
out of the loop, but gcc still seems unwilling to expand the memcmp when
there are other branches.

I think if that's the thing we want to have happen, we really do need to
just write it out on that branch rather than saying "memcmp".

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22 12:42         ` Paul Smith
@ 2018-08-22 15:23           ` Jeff King
  2018-08-23  1:23             ` Jonathan Nieder
  0 siblings, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-08-22 15:23 UTC (permalink / raw)
  To: Paul Smith; +Cc: git

On Wed, Aug 22, 2018 at 08:42:20AM -0400, Paul Smith wrote:

> On Tue, 2018-08-21 at 23:03 -0400, Jeff King wrote:
> >  static inline int hashcmp(const unsigned char *sha1, const unsigned
> > char *sha2)
> >  {
> > +       assert(the_hash_algo->rawsz == 20);
> >         return memcmp(sha1, sha2, the_hash_algo->rawsz);
> >  }
> 
> I'm not familiar with Git code, but for most environments assert() is a
> macro which is compiled out when built for "release mode" (whatever
> that might mean).  If that's the case for Git too, then relying on
> assert() to provide a side-effect (even an optimizer hint side-effect)
> won't work and this will actually get slower when built for "release
> mode".
> 
> Just a thought...

We don't have such a "release mode" in Git, though of course people may
pass -DNDEBUG to the compiler if they want.

However, to me how we spell the assert is mostly orthogonal to the
discussion. We can do "if (...) BUG(...)" to get a guaranteed-present
conditional. The bigger questions are:

  - are we OK with such an assertion; and

  - does the assertion still give us the desired behavior when we add in
    a branch for rawsz==32?

And I think the answers for those are both "probably not".

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22 14:28           ` Derrick Stolee
@ 2018-08-22 15:24             ` Jeff King
  0 siblings, 0 replies; 58+ messages in thread
From: Jeff King @ 2018-08-22 15:24 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: brian m. carlson, Junio C Hamano, git

On Wed, Aug 22, 2018 at 10:28:56AM -0400, Derrick Stolee wrote:

> In my testing, I've had the best luck with this change:
> 
> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..6c8b51c390 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,7 +1023,14 @@ extern const struct object_id null_oid;
> 
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char
> *sha2)
>  {
> -       return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +       switch (the_hash_algo->rawsz) {
> +               case 20:
> +                       return memcmp(sha1, sha2, 20);
> +               case 32:
> +                       return memcmp(sha1, sha2, 32);
> +               default:
> +                       assert(0);
> +       }
>  }
> 
> The fact that '20' and '32' are constants here may be helpful to the
> compiler. Can someone else test the perf?

I tested that one last night (and just re-tested it now to be sure). It
seems to just generate two separate calls to memcmp, with no speed
improvement.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22 15:17                 ` Jeff King
@ 2018-08-22 16:08                   ` Duy Nguyen
  2018-08-22 16:14                     ` Duy Nguyen
  0 siblings, 1 reply; 58+ messages in thread
From: Duy Nguyen @ 2018-08-22 16:08 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	brian m. carlson, Junio C Hamano, Git Mailing List

On Wed, Aug 22, 2018 at 6:03 PM Jeff King <peff@peff.net> wrote:
>
> On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
>
> > The other thing I was going to recommend (and I'll try to test this out
> > myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
> > volatile variable, since it is being referenced through a pointer. Perhaps
> > storing the value locally and then casing on it would help?
>
> I tried various sprinkling of "const" around the declarations to make it
> clear that the values wouldn't change once we saw them. But I couldn't
> detect any difference. At most I think that would let us hoist the "if"
> out of the loop, but gcc still seems unwilling to expand the memcmp when
> there are other branches.
>
> I think if that's the thing we want to have happen, we really do need to
> just write it out on that branch rather than saying "memcmp".

This reminds me of an old discussion about memcpy() vs doing explicit
compare loop with lots of performance measurements.. Is that what you
meant by "write it out"?
-- 
Duy

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22 16:08                   ` Duy Nguyen
@ 2018-08-22 16:14                     ` Duy Nguyen
  2018-08-22 16:26                       ` Jeff King
  0 siblings, 1 reply; 58+ messages in thread
From: Duy Nguyen @ 2018-08-22 16:14 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	brian m. carlson, Junio C Hamano, Git Mailing List

On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Wed, Aug 22, 2018 at 6:03 PM Jeff King <peff@peff.net> wrote:
> >
> > On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
> >
> > > The other thing I was going to recommend (and I'll try to test this out
> > > myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
> > > volatile variable, since it is being referenced through a pointer. Perhaps
> > > storing the value locally and then casing on it would help?
> >
> > I tried various sprinkling of "const" around the declarations to make it
> > clear that the values wouldn't change once we saw them. But I couldn't
> > detect any difference. At most I think that would let us hoist the "if"
> > out of the loop, but gcc still seems unwilling to expand the memcmp when
> > there are other branches.
> >
> > I think if that's the thing we want to have happen, we really do need to
> > just write it out on that branch rather than saying "memcmp".
>
> This reminds me of an old discussion about memcpy() vs doing explicit
> compare loop with lots of performance measurements..

Ah found it. Not sure if it is still relevant in light of multiple hash support

https://public-inbox.org/git/20110427225114.GA16765@elte.hu/
-- 
Duy

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22 16:14                     ` Duy Nguyen
@ 2018-08-22 16:26                       ` Jeff King
  2018-08-22 16:49                         ` Derrick Stolee
  0 siblings, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-08-22 16:26 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Derrick Stolee, Ævar Arnfjörð Bjarmason,
	brian m. carlson, Junio C Hamano, Git Mailing List

On Wed, Aug 22, 2018 at 06:14:24PM +0200, Duy Nguyen wrote:

> On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen <pclouds@gmail.com> wrote:
> >
> > On Wed, Aug 22, 2018 at 6:03 PM Jeff King <peff@peff.net> wrote:
> > >
> > > On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
> > >
> > > > The other thing I was going to recommend (and I'll try to test this out
> > > > myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
> > > > volatile variable, since it is being referenced through a pointer. Perhaps
> > > > storing the value locally and then casing on it would help?
> > >
> > > I tried various sprinkling of "const" around the declarations to make it
> > > clear that the values wouldn't change once we saw them. But I couldn't
> > > detect any difference. At most I think that would let us hoist the "if"
> > > out of the loop, but gcc still seems unwilling to expand the memcmp when
> > > there are other branches.
> > >
> > > I think if that's the thing we want to have happen, we really do need to
> > > just write it out on that branch rather than saying "memcmp".
> >
> > This reminds me of an old discussion about memcpy() vs doing explicit
> > compare loop with lots of performance measurements..
> 
> Ah found it. Not sure if it is still relevant in light of multiple hash support
> 
> https://public-inbox.org/git/20110427225114.GA16765@elte.hu/

Yes, that was what I meant. We actually did switch to that hand-rolled
loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
memcmp instead of open-coded loop, 2017-08-09).

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22 16:26                       ` Jeff King
@ 2018-08-22 16:49                         ` Derrick Stolee
  2018-08-22 16:58                           ` Duy Nguyen
  2018-08-22 16:59                           ` Jeff King
  0 siblings, 2 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-08-22 16:49 UTC (permalink / raw)
  To: Jeff King, Duy Nguyen
  Cc: Ævar Arnfjörð Bjarmason, brian m. carlson,
	Junio C Hamano, Git Mailing List

On 8/22/2018 12:26 PM, Jeff King wrote:
> On Wed, Aug 22, 2018 at 06:14:24PM +0200, Duy Nguyen wrote:
>
>> On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen <pclouds@gmail.com> wrote:
>>> On Wed, Aug 22, 2018 at 6:03 PM Jeff King <peff@peff.net> wrote:
>>>> On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
>>>>
>>>>> The other thing I was going to recommend (and I'll try to test this out
>>>>> myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
>>>>> volatile variable, since it is being referenced through a pointer. Perhaps
>>>>> storing the value locally and then casing on it would help?
>>>> I tried various sprinkling of "const" around the declarations to make it
>>>> clear that the values wouldn't change once we saw them. But I couldn't
>>>> detect any difference. At most I think that would let us hoist the "if"
>>>> out of the loop, but gcc still seems unwilling to expand the memcmp when
>>>> there are other branches.
>>>>
>>>> I think if that's the thing we want to have happen, we really do need to
>>>> just write it out on that branch rather than saying "memcmp".
>>> This reminds me of an old discussion about memcpy() vs doing explicit
>>> compare loop with lots of performance measurements..
>> Ah found it. Not sure if it is still relevant in light of multiple hash support
>>
>> https://public-inbox.org/git/20110427225114.GA16765@elte.hu/
> Yes, that was what I meant. We actually did switch to that hand-rolled
> loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
> memcmp instead of open-coded loop, 2017-08-09).

Looking at that commit, I'm surprised the old logic was just a for loop, instead of a word-based approach, such as the following:

diff --git a/cache.h b/cache.h
index b1fd3d58ab..5e5819ad49 100644
--- a/cache.h
+++ b/cache.h
@@ -1021,9 +1021,41 @@ extern int find_unique_abbrev_r(char *hex, const 
struct object_id *oid, int len)
  extern const unsigned char null_sha1[GIT_MAX_RAWSZ];
  extern const struct object_id null_oid;

+static inline int word_cmp_32(uint32_t a, uint32_t b)
+{
+       return memcmp(&a, &b, sizeof(uint32_t));
+}
+
+static inline int word_cmp_64(uint64_t a, uint64_t b)
+{
+       return memcmp(&a, &b, sizeof(uint64_t));
+}
+
+struct object_id_20 {
+       uint64_t data0;
+       uint64_t data1;
+       uint32_t data2;
+};
+
  static inline int hashcmp(const unsigned char *sha1, const unsigned 
char *sha2)
  {
-       return memcmp(sha1, sha2, the_hash_algo->rawsz);
+       if (the_hash_algo->rawsz == 20) {
+               struct object_id_20 *obj1 = (struct object_id_20 *)sha1;
+               struct object_id_20 *obj2 = (struct object_id_20 *)sha2;
+
+               if (obj1->data0 == obj2->data0) {
+                       if (obj1->data1 == obj2->data1) {
+                               if (obj1->data2 == obj2->data2) {
+                                       return 0;
+                               }
+                               return word_cmp_32(obj1->data2, 
obj2->data2);
+                       }
+                       return word_cmp_64(obj1->data1, obj2->data1);
+               }
+               return word_cmp_64(obj1->data0, obj2->data0);
+       }
+
+       assert(0);
  }

  static inline int oidcmp(const struct object_id *oid1, const struct 
object_id *oid2)



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22 16:49                         ` Derrick Stolee
@ 2018-08-22 16:58                           ` Duy Nguyen
  2018-08-22 17:04                             ` Derrick Stolee
  2018-08-22 16:59                           ` Jeff King
  1 sibling, 1 reply; 58+ messages in thread
From: Duy Nguyen @ 2018-08-22 16:58 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Jeff King, Ævar Arnfjörð Bjarmason,
	brian m. carlson, Junio C Hamano, Git Mailing List

On Wed, Aug 22, 2018 at 6:49 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 8/22/2018 12:26 PM, Jeff King wrote:
> > On Wed, Aug 22, 2018 at 06:14:24PM +0200, Duy Nguyen wrote:
> >
> >> On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen <pclouds@gmail.com> wrote:
> >>> On Wed, Aug 22, 2018 at 6:03 PM Jeff King <peff@peff.net> wrote:
> >>>> On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
> >>>>
> >>>>> The other thing I was going to recommend (and I'll try to test this out
> >>>>> myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
> >>>>> volatile variable, since it is being referenced through a pointer. Perhaps
> >>>>> storing the value locally and then casing on it would help?
> >>>> I tried various sprinkling of "const" around the declarations to make it
> >>>> clear that the values wouldn't change once we saw them. But I couldn't
> >>>> detect any difference. At most I think that would let us hoist the "if"
> >>>> out of the loop, but gcc still seems unwilling to expand the memcmp when
> >>>> there are other branches.
> >>>>
> >>>> I think if that's the thing we want to have happen, we really do need to
> >>>> just write it out on that branch rather than saying "memcmp".
> >>> This reminds me of an old discussion about memcpy() vs doing explicit
> >>> compare loop with lots of performance measurements..
> >> Ah found it. Not sure if it is still relevant in light of multiple hash support
> >>
> >> https://public-inbox.org/git/20110427225114.GA16765@elte.hu/
> > Yes, that was what I meant. We actually did switch to that hand-rolled
> > loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
> > memcmp instead of open-coded loop, 2017-08-09).
>
> Looking at that commit, I'm surprised the old logic was just a for loop, instead of a word-based approach, such as the following:

Might work on x86 but it breaks on cpu architectures with stricter
alignment. I don't think we have a guarantee that object_id is always
8 byte aligned.

>
> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..5e5819ad49 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1021,9 +1021,41 @@ extern int find_unique_abbrev_r(char *hex, const
> struct object_id *oid, int len)
>   extern const unsigned char null_sha1[GIT_MAX_RAWSZ];
>   extern const struct object_id null_oid;
>
> +static inline int word_cmp_32(uint32_t a, uint32_t b)
> +{
> +       return memcmp(&a, &b, sizeof(uint32_t));
> +}
> +
> +static inline int word_cmp_64(uint64_t a, uint64_t b)
> +{
> +       return memcmp(&a, &b, sizeof(uint64_t));
> +}
> +
> +struct object_id_20 {
> +       uint64_t data0;
> +       uint64_t data1;
> +       uint32_t data2;
> +};
> +
>   static inline int hashcmp(const unsigned char *sha1, const unsigned
> char *sha2)
>   {
> -       return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +       if (the_hash_algo->rawsz == 20) {
> +               struct object_id_20 *obj1 = (struct object_id_20 *)sha1;
> +               struct object_id_20 *obj2 = (struct object_id_20 *)sha2;
> +
> +               if (obj1->data0 == obj2->data0) {
> +                       if (obj1->data1 == obj2->data1) {
> +                               if (obj1->data2 == obj2->data2) {
> +                                       return 0;
> +                               }
> +                               return word_cmp_32(obj1->data2,
> obj2->data2);
> +                       }
> +                       return word_cmp_64(obj1->data1, obj2->data1);
> +               }
> +               return word_cmp_64(obj1->data0, obj2->data0);
> +       }
> +
> +       assert(0);
>   }
>
>   static inline int oidcmp(const struct object_id *oid1, const struct
> object_id *oid2)
>
>


-- 
Duy

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22 16:49                         ` Derrick Stolee
  2018-08-22 16:58                           ` Duy Nguyen
@ 2018-08-22 16:59                           ` Jeff King
  2018-08-22 17:02                             ` Junio C Hamano
  1 sibling, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-08-22 16:59 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Duy Nguyen, Ævar Arnfjörð Bjarmason,
	brian m. carlson, Junio C Hamano, Git Mailing List

On Wed, Aug 22, 2018 at 12:49:34PM -0400, Derrick Stolee wrote:

> > Yes, that was what I meant. We actually did switch to that hand-rolled
> > loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
> > memcmp instead of open-coded loop, 2017-08-09).
> 
> Looking at that commit, I'm surprised the old logic was just a for
> loop, instead of a word-based approach, such as the following:
> [...]
> +struct object_id_20 {
> +       uint64_t data0;
> +       uint64_t data1;
> +       uint32_t data2;
> +};
> +
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char
> *sha2)
>  {
> -       return memcmp(sha1, sha2, the_hash_algo->rawsz);
> +       if (the_hash_algo->rawsz == 20) {
> +               struct object_id_20 *obj1 = (struct object_id_20 *)sha1;
> +               struct object_id_20 *obj2 = (struct object_id_20 *)sha2;

I wonder if you're potentially running afoul of alignment requirements
here.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22 16:59                           ` Jeff King
@ 2018-08-22 17:02                             ` Junio C Hamano
  0 siblings, 0 replies; 58+ messages in thread
From: Junio C Hamano @ 2018-08-22 17:02 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee, Duy Nguyen,
	Ævar Arnfjörð Bjarmason, brian m. carlson,
	Git Mailing List

Jeff King <peff@peff.net> writes:

> On Wed, Aug 22, 2018 at 12:49:34PM -0400, Derrick Stolee wrote:
>
>> > Yes, that was what I meant. We actually did switch to that hand-rolled
>> > loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
>> > memcmp instead of open-coded loop, 2017-08-09).
>> 
>> Looking at that commit, I'm surprised the old logic was just a for
>> loop, instead of a word-based approach, such as the following:
>> [...]
>> +struct object_id_20 {
>> +       uint64_t data0;
>> +       uint64_t data1;
>> +       uint32_t data2;
>> +};
>> +
>>  static inline int hashcmp(const unsigned char *sha1, const unsigned char
>> *sha2)
>>  {
>> -       return memcmp(sha1, sha2, the_hash_algo->rawsz);
>> +       if (the_hash_algo->rawsz == 20) {
>> +               struct object_id_20 *obj1 = (struct object_id_20 *)sha1;
>> +               struct object_id_20 *obj2 = (struct object_id_20 *)sha2;
>
> I wonder if you're potentially running afoul of alignment requirements
> here.

Yup, and I think that all was discussed in that old thread ;-)

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22 16:58                           ` Duy Nguyen
@ 2018-08-22 17:04                             ` Derrick Stolee
  0 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-08-22 17:04 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Jeff King, Ævar Arnfjörð Bjarmason,
	brian m. carlson, Junio C Hamano, Git Mailing List

On 8/22/2018 12:58 PM, Duy Nguyen wrote:
> On Wed, Aug 22, 2018 at 6:49 PM Derrick Stolee <stolee@gmail.com> wrote:
>> On 8/22/2018 12:26 PM, Jeff King wrote:
>>> On Wed, Aug 22, 2018 at 06:14:24PM +0200, Duy Nguyen wrote:
>>>
>>>> On Wed, Aug 22, 2018 at 6:08 PM Duy Nguyen <pclouds@gmail.com> wrote:
>>>>> On Wed, Aug 22, 2018 at 6:03 PM Jeff King <peff@peff.net> wrote:
>>>>>> On Wed, Aug 22, 2018 at 07:14:42AM -0400, Derrick Stolee wrote:
>>>>>>
>>>>>>> The other thing I was going to recommend (and I'll try to test this out
>>>>>>> myself later) is to see if 'the_hash_algo->rawsz' is being treated as a
>>>>>>> volatile variable, since it is being referenced through a pointer. Perhaps
>>>>>>> storing the value locally and then casing on it would help?
>>>>>> I tried various sprinkling of "const" around the declarations to make it
>>>>>> clear that the values wouldn't change once we saw them. But I couldn't
>>>>>> detect any difference. At most I think that would let us hoist the "if"
>>>>>> out of the loop, but gcc still seems unwilling to expand the memcmp when
>>>>>> there are other branches.
>>>>>>
>>>>>> I think if that's the thing we want to have happen, we really do need to
>>>>>> just write it out on that branch rather than saying "memcmp".
>>>>> This reminds me of an old discussion about memcpy() vs doing explicit
>>>>> compare loop with lots of performance measurements..
>>>> Ah found it. Not sure if it is still relevant in light of multiple hash support
>>>>
>>>> https://public-inbox.org/git/20110427225114.GA16765@elte.hu/
>>> Yes, that was what I meant. We actually did switch to that hand-rolled
>>> loop, but later we went back to memcmp in 0b006014c8 (hashcmp: use
>>> memcmp instead of open-coded loop, 2017-08-09).
>> Looking at that commit, I'm surprised the old logic was just a for loop, instead of a word-based approach, such as the following:
> Might work on x86 but it breaks on cpu architectures with stricter
> alignment. I don't think we have a guarantee that object_id is always
> 8 byte aligned.
You (and Peff) are probably correct here, which is unfortunate. I'm not 
familiar with alignment constraints, but assume that such a word-based 
approach is best.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-22 15:23           ` Jeff King
@ 2018-08-23  1:23             ` Jonathan Nieder
  2018-08-23  2:16               ` Jeff King
  0 siblings, 1 reply; 58+ messages in thread
From: Jonathan Nieder @ 2018-08-23  1:23 UTC (permalink / raw)
  To: Jeff King
  Cc: Paul Smith, git, Derrick Stolee, Duy Nguyen,
	Ævar Arnfjörð Bjarmason, brian m. carlson

Hi,

Jeff King wrote:
>> On Tue, 2018-08-21 at 23:03 -0400, Jeff King wrote:

>>>  static inline int hashcmp(const unsigned char *sha1, const unsigned
>>> char *sha2)
>>>  {
>>> +       assert(the_hash_algo->rawsz == 20);
>>>         return memcmp(sha1, sha2, the_hash_algo->rawsz);
>>>  }
[...]
>              The bigger questions are:
>
>   - are we OK with such an assertion; and
>
>   - does the assertion still give us the desired behavior when we add in
>     a branch for rawsz==32?
>
> And I think the answers for those are both "probably not".

At this point in the release process, I think the answer to the first
question is a pretty clear "yes".

A ~10% increase in latency of some operations is quite significant, in
exchange for no user benefit yet.  We can continue to try to figure
out how to convince compilers to generate good code for this (and
that's useful), but in the meantime we should also do the simple thing
to avoid the regression for users.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23  1:23             ` Jonathan Nieder
@ 2018-08-23  2:16               ` Jeff King
  2018-08-23  2:27                 ` Jonathan Nieder
  2018-08-23  3:47                 ` brian m. carlson
  0 siblings, 2 replies; 58+ messages in thread
From: Jeff King @ 2018-08-23  2:16 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Paul Smith, git, Derrick Stolee, Duy Nguyen,
	Ævar Arnfjörð Bjarmason, brian m. carlson

On Wed, Aug 22, 2018 at 06:23:43PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> >> On Tue, 2018-08-21 at 23:03 -0400, Jeff King wrote:
> 
> >>>  static inline int hashcmp(const unsigned char *sha1, const unsigned
> >>> char *sha2)
> >>>  {
> >>> +       assert(the_hash_algo->rawsz == 20);
> >>>         return memcmp(sha1, sha2, the_hash_algo->rawsz);
> >>>  }
> [...]
> >              The bigger questions are:
> >
> >   - are we OK with such an assertion; and
> >
> >   - does the assertion still give us the desired behavior when we add in
> >     a branch for rawsz==32?
> >
> > And I think the answers for those are both "probably not".
> 
> At this point in the release process, I think the answer to the first
> question is a pretty clear "yes".
> 
> A ~10% increase in latency of some operations is quite significant, in
> exchange for no user benefit yet.  We can continue to try to figure
> out how to convince compilers to generate good code for this (and
> that's useful), but in the meantime we should also do the simple thing
> to avoid the regression for users.

FWIW, it's not 10%. The best I measured was ~4% on a very
hashcmp-limited operation, and I suspect even that may be highly
dependent on the compiler. We might be able to improve more by
sprinkling more asserts around, but there are 75 mentions of
the_hash_algo->rawsz. I wouldn't want to an assert at each one.

I don't mind doing one or a handful of these asserts as part of v2.19 if
we want to try to reclaim those few percent. But I suspect the very
first commit in any further hash-transition work is just going to be to
rip them all out.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23  2:16               ` Jeff King
@ 2018-08-23  2:27                 ` Jonathan Nieder
  2018-08-23  5:02                   ` Jeff King
  2018-08-23  3:47                 ` brian m. carlson
  1 sibling, 1 reply; 58+ messages in thread
From: Jonathan Nieder @ 2018-08-23  2:27 UTC (permalink / raw)
  To: Jeff King
  Cc: Paul Smith, git, Derrick Stolee, Duy Nguyen,
	Ævar Arnfjörð Bjarmason, brian m. carlson

Jeff King wrote:

> FWIW, it's not 10%. The best I measured was ~4% on a very
> hashcmp-limited operation, and I suspect even that may be highly
> dependent on the compiler. We might be able to improve more by
> sprinkling more asserts around, but there are 75 mentions of
> the_hash_algo->rawsz. I wouldn't want to an assert at each one.
>
> I don't mind doing one or a handful of these asserts as part of v2.19 if
> we want to try to reclaim those few percent. But I suspect the very
> first commit in any further hash-transition work is just going to be to
> rip them all out.

I was thinking just hashcmp and hashcpy.

Ideally such a change would come with a performance test to help the
person writing that very first commit.  Except we already have
performance tests that capture this. ;-)

For further hash-transition work, I agree someone may want to revert
this, and I don't mind such a revert appearing right away in "next".
And it's possible that we might have to do the equivalent of manual
template expansion to recover the performance in some
performance-sensitive areas.  Maybe we can get the compiler to
cooperate with us in that and maybe we can't.  That's okay with me.

Anyway, I'll resend your patch with a commit message added some time
this evening.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23  2:16               ` Jeff King
  2018-08-23  2:27                 ` Jonathan Nieder
@ 2018-08-23  3:47                 ` brian m. carlson
  2018-08-23  5:04                   ` Jeff King
  1 sibling, 1 reply; 58+ messages in thread
From: brian m. carlson @ 2018-08-23  3:47 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, Paul Smith, git, Derrick Stolee, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 1372 bytes --]

On Wed, Aug 22, 2018 at 10:16:18PM -0400, Jeff King wrote:
> FWIW, it's not 10%. The best I measured was ~4% on a very
> hashcmp-limited operation, and I suspect even that may be highly
> dependent on the compiler. We might be able to improve more by
> sprinkling more asserts around, but there are 75 mentions of
> the_hash_algo->rawsz. I wouldn't want to an assert at each one.
> 
> I don't mind doing one or a handful of these asserts as part of v2.19 if
> we want to try to reclaim those few percent. But I suspect the very
> first commit in any further hash-transition work is just going to be to
> rip them all out.

I expect that's going to be the case as well.  I have patches that
wire up actual SHA-256 support in my hash-impl branch.

However, having said that, I'm happy to defer to whatever everyone else
thinks is best for 2.19.  The assert solution would be fine with me in
this situation, and if we need to pull it out in the future, that's okay
with me.

I don't really have a strong opinion on this either way, so if someone
else does, please say so.  I have somewhat more limited availability
over the next couple days, as I'm travelling on business, but I'm happy
to review a patch (and it seems like Peff has one minus the actual
commit message).
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 867 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23  2:27                 ` Jonathan Nieder
@ 2018-08-23  5:02                   ` Jeff King
  2018-08-23  5:09                     ` brian m. carlson
                                       ` (2 more replies)
  0 siblings, 3 replies; 58+ messages in thread
From: Jeff King @ 2018-08-23  5:02 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Paul Smith, git, Derrick Stolee, Duy Nguyen,
	Ævar Arnfjörð Bjarmason, brian m. carlson

On Wed, Aug 22, 2018 at 07:27:56PM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> 
> > FWIW, it's not 10%. The best I measured was ~4% on a very
> > hashcmp-limited operation, and I suspect even that may be highly
> > dependent on the compiler. We might be able to improve more by
> > sprinkling more asserts around, but there are 75 mentions of
> > the_hash_algo->rawsz. I wouldn't want to an assert at each one.
> >
> > I don't mind doing one or a handful of these asserts as part of v2.19 if
> > we want to try to reclaim those few percent. But I suspect the very
> > first commit in any further hash-transition work is just going to be to
> > rip them all out.
> 
> I was thinking just hashcmp and hashcpy.
> 
> Ideally such a change would come with a performance test to help the
> person writing that very first commit.  Except we already have
> performance tests that capture this. ;-)
> 
> For further hash-transition work, I agree someone may want to revert
> this, and I don't mind such a revert appearing right away in "next".
> And it's possible that we might have to do the equivalent of manual
> template expansion to recover the performance in some
> performance-sensitive areas.  Maybe we can get the compiler to
> cooperate with us in that and maybe we can't.  That's okay with me.
> 
> Anyway, I'll resend your patch with a commit message added some time
> this evening.

Here's the patch. For some reason my numbers aren't quite as large as
they were yesterday (I was very careful to keep the system unloaded
today, whereas yesterday I was doing a few other things, so perhaps that
is the difference).

-- >8 --
Subject: [PATCH] hashcmp: assert constant hash size

Prior to 509f6f62a4 (cache: update object ID functions for
the_hash_algo, 2018-07-16), hashcmp() called memcmp() with a
constant size of 20 bytes. Some compilers were able to turn
that into a few quad-word comparisons, which is faster than
actually calling memcmp().

In 509f6f62a4, we started using the_hash_algo->rawsz
instead. Even though this will always be 20, the compiler
doesn't know that while inlining hashcmp() and ends up just
generating a call to memcmp().

Eventually we'll have to deal with multiple hash sizes, but
for the upcoming v2.19, we can restore some of the original
performance by asserting on the size. That gives the
compiler enough information to know that the memcmp will
always be called with a length of 20, and it performs the
same optimization.

Here are numbers for p0001.2 run against linux.git on a few
versions. This is using -O2 with gcc 8.2.0.

  Test     v2.18.0             v2.19.0-rc0               HEAD
  ------------------------------------------------------------------------------
  0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%

You can see that v2.19 is a little slower than v2.18. This
commit ended up slightly faster than v2.18, but there's a
fair bit of run-to-run noise (the generated code in the two
cases is basically the same). This patch does seem to be
consistently 1-2% faster than v2.19.

I tried changing hashcpy(), which was also touched by
509f6f62a4, in the same way, but couldn't measure any
speedup. Which makes sense, at least for this workload. A
traversal of the whole commit graph requires looking up
every entry of every tree via lookup_object(). That's many
multiples of the numbers of objects in the repository (most
of the lookups just return "yes, we already saw that
object").

Reported-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/cache.h b/cache.h
index b1fd3d58ab..4d014541ab 100644
--- a/cache.h
+++ b/cache.h
@@ -1023,6 +1023,16 @@ extern const struct object_id null_oid;
 
 static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
 {
+	/*
+	 * This is a temporary optimization hack. By asserting the size here,
+	 * we let the compiler know that it's always going to be 20, which lets
+	 * it turn this fixed-size memcmp into a few inline instructions.
+	 *
+	 * This will need to be extended or ripped out when we learn about
+	 * hashes of different sizes.
+	 */
+	if (the_hash_algo->rawsz != 20)
+		BUG("hash size not yet supported by hashcmp");
 	return memcmp(sha1, sha2, the_hash_algo->rawsz);
 }
 
-- 
2.19.0.rc0.412.g7005db4e88


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23  3:47                 ` brian m. carlson
@ 2018-08-23  5:04                   ` Jeff King
  2018-08-23 10:26                     ` Derrick Stolee
  0 siblings, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-08-23  5:04 UTC (permalink / raw)
  To: brian m. carlson, Jonathan Nieder, Paul Smith, git,
	Derrick Stolee, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

On Thu, Aug 23, 2018 at 03:47:07AM +0000, brian m. carlson wrote:

> I expect that's going to be the case as well.  I have patches that
> wire up actual SHA-256 support in my hash-impl branch.
> 
> However, having said that, I'm happy to defer to whatever everyone else
> thinks is best for 2.19.  The assert solution would be fine with me in
> this situation, and if we need to pull it out in the future, that's okay
> with me.
> 
> I don't really have a strong opinion on this either way, so if someone
> else does, please say so.  I have somewhat more limited availability
> over the next couple days, as I'm travelling on business, but I'm happy
> to review a patch (and it seems like Peff has one minus the actual
> commit message).

I just posted the patch elsewhere in the thread. I think you can safely
ignore the rest of it if you are otherwise occupied. Even if v2.19 ships
without some mitigation, I don't know that it's all that big a deal,
given the numbers I generated (which for some reason are less dramatic
than Stolee's).

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23  5:02                   ` Jeff King
@ 2018-08-23  5:09                     ` brian m. carlson
  2018-08-23  5:10                     ` Jonathan Nieder
  2018-08-23 13:20                     ` Junio C Hamano
  2 siblings, 0 replies; 58+ messages in thread
From: brian m. carlson @ 2018-08-23  5:09 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, Paul Smith, git, Derrick Stolee, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

[-- Attachment #1: Type: text/plain, Size: 437 bytes --]

On Thu, Aug 23, 2018 at 01:02:25AM -0400, Jeff King wrote:
> Here's the patch. For some reason my numbers aren't quite as large as
> they were yesterday (I was very careful to keep the system unloaded
> today, whereas yesterday I was doing a few other things, so perhaps that
> is the difference).

This looks sane to me.  Thanks for writing this up.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 867 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23  5:02                   ` Jeff King
  2018-08-23  5:09                     ` brian m. carlson
@ 2018-08-23  5:10                     ` Jonathan Nieder
  2018-08-23 13:20                     ` Junio C Hamano
  2 siblings, 0 replies; 58+ messages in thread
From: Jonathan Nieder @ 2018-08-23  5:10 UTC (permalink / raw)
  To: Jeff King
  Cc: Paul Smith, git, Derrick Stolee, Duy Nguyen,
	Ævar Arnfjörð Bjarmason, brian m. carlson

Jeff King wrote:

> Here's the patch. For some reason my numbers aren't quite as large as
> they were yesterday (I was very careful to keep the system unloaded
> today, whereas yesterday I was doing a few other things, so perhaps that
> is the difference).
>
> -- >8 --
> Subject: [PATCH] hashcmp: assert constant hash size
>
> Prior to 509f6f62a4 (cache: update object ID functions for
> the_hash_algo, 2018-07-16), hashcmp() called memcmp() with a
> constant size of 20 bytes. Some compilers were able to turn
> that into a few quad-word comparisons, which is faster than
> actually calling memcmp().
>
> In 509f6f62a4, we started using the_hash_algo->rawsz
> instead. Even though this will always be 20, the compiler
> doesn't know that while inlining hashcmp() and ends up just
> generating a call to memcmp().
>
> Eventually we'll have to deal with multiple hash sizes, but
> for the upcoming v2.19, we can restore some of the original
> performance by asserting on the size. That gives the
> compiler enough information to know that the memcmp will
> always be called with a length of 20, and it performs the
> same optimization.
>
> Here are numbers for p0001.2 run against linux.git on a few
> versions. This is using -O2 with gcc 8.2.0.
>
>   Test     v2.18.0             v2.19.0-rc0               HEAD
>   ------------------------------------------------------------------------------
>   0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%
>
> You can see that v2.19 is a little slower than v2.18. This
> commit ended up slightly faster than v2.18, but there's a
> fair bit of run-to-run noise (the generated code in the two
> cases is basically the same). This patch does seem to be
> consistently 1-2% faster than v2.19.
>
> I tried changing hashcpy(), which was also touched by
> 509f6f62a4, in the same way, but couldn't measure any
> speedup. Which makes sense, at least for this workload. A
> traversal of the whole commit graph requires looking up
> every entry of every tree via lookup_object(). That's many
> multiples of the numbers of objects in the repository (most
> of the lookups just return "yes, we already saw that
> object").
>
> Reported-by: Derrick Stolee <stolee@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  cache.h | 10 ++++++++++
>  1 file changed, 10 insertions(+)

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

Verified using "make object.s" that the memcmp call goes away.  Thank
you.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23  5:04                   ` Jeff King
@ 2018-08-23 10:26                     ` Derrick Stolee
  2018-08-23 13:16                       ` Junio C Hamano
                                         ` (3 more replies)
  0 siblings, 4 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-08-23 10:26 UTC (permalink / raw)
  To: Jeff King, brian m. carlson, Jonathan Nieder, Paul Smith, git,
	Duy Nguyen, Ævar Arnfjörð Bjarmason

On 8/23/2018 1:04 AM, Jeff King wrote:
> On Thu, Aug 23, 2018 at 03:47:07AM +0000, brian m. carlson wrote:
>
>> I expect that's going to be the case as well.  I have patches that
>> wire up actual SHA-256 support in my hash-impl branch.
>>
>> However, having said that, I'm happy to defer to whatever everyone else
>> thinks is best for 2.19.  The assert solution would be fine with me in
>> this situation, and if we need to pull it out in the future, that's okay
>> with me.
>>
>> I don't really have a strong opinion on this either way, so if someone
>> else does, please say so.  I have somewhat more limited availability
>> over the next couple days, as I'm travelling on business, but I'm happy
>> to review a patch (and it seems like Peff has one minus the actual
>> commit message).
> I just posted the patch elsewhere in the thread.

Thank you for that!

> I think you can safely
> ignore the rest of it if you are otherwise occupied. Even if v2.19 ships
> without some mitigation, I don't know that it's all that big a deal,
> given the numbers I generated (which for some reason are less dramatic
> than Stolee's).
My numbers may be more dramatic because my Linux environment is a 
virtual machine.

I was thinking that having a mitigation for 2.19 is best, and then we 
can focus as part of the 2.20 cycle how we can properly avoid this cost, 
especially when 32 is a valid option.

Around the time that my proposed approaches were getting vetoed for 
alignment issues, I figured I was out of my depth here. I reached out to 
Daniel Lemire (of EWAH bitmap fame) on Twitter [1]. His blog is full of 
posts of word-based approaches to different problems, so I thought he 
might know something off the top of his head that would be applicable. 
His conclusion (after looking only a short time) was to take a 'hasheq' 
approach [2] like Peff suggested [3]. Since that requires auditing all 
callers of hashcmp to see if hasheq is appropriate, it is not a good 
solution for 2.19 but (in my opinion) should be evaluated as part of the 
2.20 cycle.

Of course, if someone with knowledge of word-alignment issues across the 
platforms we support knows how to enforce an alignment for object_id, 
then something word-based like [4] could be reconsidered.

Thanks, everyone!
-Stolee

[1] https://twitter.com/stolee/status/1032312965754748930

[2] 
https://lemire.me/blog/2018/08/22/avoid-lexicographical-comparisons-when-testing-for-string-equality/

[3] 
https://public-inbox.org/git/20180822030344.GA14684@sigill.intra.peff.net/

[4] 
https://public-inbox.org/git/7ea416cf-b043-1274-e161-85a8780b8e1c@gmail.com/

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23 10:26                     ` Derrick Stolee
@ 2018-08-23 13:16                       ` Junio C Hamano
  2018-08-23 16:14                       ` Jeff King
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 58+ messages in thread
From: Junio C Hamano @ 2018-08-23 13:16 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Jeff King, brian m. carlson, Jonathan Nieder, Paul Smith, git,
	Duy Nguyen, Ævar Arnfjörð Bjarmason

Derrick Stolee <stolee@gmail.com> writes:

> I was thinking that having a mitigation for 2.19 is best, and then we
> can focus as part of the 2.20 cycle how we can properly avoid this
> cost, especially when 32 is a valid option.
> ...
> ... to
> take a 'hasheq' approach [2] like Peff suggested [3]. Since that
> requires auditing all callers of hashcmp to see if hasheq is
> appropriate, it is not a good solution for 2.19 but (in my opinion)
> should be evaluated as part of the 2.20 cycle.

Thanks for thoughtful comments.  I think it makes sense to go with
the "tell compiler hashcmp() currently is only about 20-byte array"
for now, as it is trivial to see why it cannot break things.

During 2.20 cycle, we may find out that hasheq() abstraction
performs better than hashcmp() with multiple lengths, and end up
doing that "audit and replace to use hasheq()".  But as you said,
it probably isnot a good idea to rush it in this cycle.

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23  5:02                   ` Jeff King
  2018-08-23  5:09                     ` brian m. carlson
  2018-08-23  5:10                     ` Jonathan Nieder
@ 2018-08-23 13:20                     ` Junio C Hamano
  2018-08-23 16:31                       ` wide t/perf output, was " Jeff King
  2 siblings, 1 reply; 58+ messages in thread
From: Junio C Hamano @ 2018-08-23 13:20 UTC (permalink / raw)
  To: Jeff King
  Cc: Jonathan Nieder, Paul Smith, git, Derrick Stolee, Duy Nguyen,
	Ævar Arnfjörð Bjarmason, brian m. carlson

Jeff King <peff@peff.net> writes:

> Here are numbers for p0001.2 run against linux.git on a few
> versions. This is using -O2 with gcc 8.2.0.
>
>   Test     v2.18.0             v2.19.0-rc0               HEAD
>   ------------------------------------------------------------------------------
>   0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%

I see what you did to the formatting here, which is a topic of
another thread ;-).

Thanks, as Derrick also noted, I agree this is an appropriate
workaround for the upcoming release and we may want to explore
hasheq() and other solutions as part of the effort during the next
cycle (which can start now if people are bored---after all working
on the codebase is the best way to hunt for recent bugs).

Thanks, will queue.

> diff --git a/cache.h b/cache.h
> index b1fd3d58ab..4d014541ab 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1023,6 +1023,16 @@ extern const struct object_id null_oid;
>  
>  static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
>  {
> +	/*
> +	 * This is a temporary optimization hack. By asserting the size here,
> +	 * we let the compiler know that it's always going to be 20, which lets
> +	 * it turn this fixed-size memcmp into a few inline instructions.
> +	 *
> +	 * This will need to be extended or ripped out when we learn about
> +	 * hashes of different sizes.
> +	 */
> +	if (the_hash_algo->rawsz != 20)
> +		BUG("hash size not yet supported by hashcmp");
>  	return memcmp(sha1, sha2, the_hash_algo->rawsz);
>  }

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23 10:26                     ` Derrick Stolee
  2018-08-23 13:16                       ` Junio C Hamano
@ 2018-08-23 16:14                       ` Jeff King
  2018-08-23 23:30                         ` Jacob Keller
  2018-08-23 18:53                       ` Jeff King
  2018-09-02 18:53                       ` Kaartic Sivaraam
  3 siblings, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-08-23 16:14 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: brian m. carlson, Jonathan Nieder, Paul Smith, git, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

On Thu, Aug 23, 2018 at 06:26:58AM -0400, Derrick Stolee wrote:

> Around the time that my proposed approaches were getting vetoed for
> alignment issues, I figured I was out of my depth here. I reached out to
> Daniel Lemire (of EWAH bitmap fame) on Twitter [1]. His blog is full of
> posts of word-based approaches to different problems, so I thought he might
> know something off the top of his head that would be applicable. His
> conclusion (after looking only a short time) was to take a 'hasheq' approach
> [2] like Peff suggested [3]. Since that requires auditing all callers of
> hashcmp to see if hasheq is appropriate, it is not a good solution for 2.19
> but (in my opinion) should be evaluated as part of the 2.20 cycle.

I think that audit isn't actually too bad (but definitely not something
we should do for v2.19). The cocci patch I showed earlier hits most of
them. It misses the negated ones (e.g., "if (oidcmp(...))"). I'm not
sure if there's a way to ask coccinelle only for oidcmp()
used in a boolean context.

Just skimming the grep output, it's basically every call except the ones
we use for binary search.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* wide t/perf output, was Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23 13:20                     ` Junio C Hamano
@ 2018-08-23 16:31                       ` " Jeff King
  0 siblings, 0 replies; 58+ messages in thread
From: Jeff King @ 2018-08-23 16:31 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder, Paul Smith, git, Derrick Stolee, Duy Nguyen,
	Ævar Arnfjörð Bjarmason, brian m. carlson

On Thu, Aug 23, 2018 at 06:20:26AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Here are numbers for p0001.2 run against linux.git on a few
> > versions. This is using -O2 with gcc 8.2.0.
> >
> >   Test     v2.18.0             v2.19.0-rc0               HEAD
> >   ------------------------------------------------------------------------------
> >   0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%
> 
> I see what you did to the formatting here, which is a topic of
> another thread ;-).

Do you happen to have a link? I missed that one, and digging turned up
nothing.

A while ago I wrote the patch below. I don't recall why I never sent it
in (and it doesn't apply cleanly these days, though I'm sure it could be
forward-ported).

-- >8 --
Date: Wed, 20 Jan 2016 23:54:14 -0500
Subject: [PATCH] t/perf: add "tall" output format

When aggregating results, we usually show a list of tests,
one per line, with the tested revisions in columns across.
Like:

    $ ./aggregate.perl 348d4f2^ 348d4f2 p7000-filter-branch.sh
    Test                  348d4f2^               348d4f2
    -------------------------------------------------------------------
    7000.2: noop filter   295.32(269.61+14.36)   7.92(0.85+0.72) -97.3%

This is useful if you have a lot of tests to show, but few
revisions; you're effectively comparing the two items on
each line. But sometimes you have the opposite: few tests,
but a large number of revisions. In this case, the lines
get very long, and it's hard to compare values.

This patch introduces a "tall" format that shows the same
data in a more vertical manner:

    $ ./aggregate.perl --tall \
        348d4f2^ 348d4f2 \
        jk/filter-branch-empty^ \
        jk/filter-branch-empty \
        p7000-filter-branch.sh
    Test: p7000-filter-branch.2
    348d4f2^                  295.32(269.61+14.36)
    348d4f2                        7.92(0.85+0.72) -97.3%
    jk/filter-branch-empty^        9.37(0.87+0.80) -96.8%
    jk/filter-branch-empty         7.71(0.92+0.62) -97.4%

Signed-off-by: Jeff King <peff@peff.net>
---
 t/perf/aggregate.perl | 124 ++++++++++++++++++++++++++++++------------
 1 file changed, 88 insertions(+), 36 deletions(-)

diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
index e401208488..d108a02ccd 100755
--- a/t/perf/aggregate.perl
+++ b/t/perf/aggregate.perl
@@ -17,29 +17,41 @@ sub get_times {
 	return ($rt, $4, $5);
 }
 
-sub format_times {
+sub format_times_list {
 	my ($r, $u, $s, $firstr) = @_;
 	if (!defined $r) {
 		return "<missing>";
 	}
 	my $out = sprintf "%.2f(%.2f+%.2f)", $r, $u, $s;
+	my $pct;
 	if (defined $firstr) {
 		if ($firstr > 0) {
-			$out .= sprintf " %+.1f%%", 100.0*($r-$firstr)/$firstr;
+			$pct = sprintf "%+.1f%%", 100.0*($r-$firstr)/$firstr;
 		} elsif ($r == 0) {
-			$out .= " =";
+			$pct = "=";
 		} else {
-			$out .= " +inf";
+			$pct = "+inf";
 		}
 	}
-	return $out;
+	return ($out, $pct);
+}
+
+sub format_times {
+	my ($times, $pct) = format_times_list(@_);
+	return defined $pct ? "$times $pct" : $times;
 }
 
 my (@dirs, %dirnames, %dirabbrevs, %prefixes, @tests);
+my ($tall_format);
 while (scalar @ARGV) {
 	my $arg = $ARGV[0];
 	my $dir;
 	last if -f $arg or $arg eq "--";
+	if ($arg eq "--tall") {
+		$tall_format = 1;
+		shift @ARGV;
+		next;
+	}
 	if (! -d $arg) {
 		my $rev = Git::command_oneline(qw(rev-parse --verify), $arg);
 		$dir = "build/".$rev;
@@ -122,6 +134,11 @@ sub have_slash {
 	return 0;
 }
 
+sub printable_dir {
+	my ($d) = @_;
+	return exists $dirabbrevs{$d} ? $dirabbrevs{$d} : $dirnames{$d};
+}
+
 my %newdirabbrevs = %dirabbrevs;
 while (!have_duplicate(values %newdirabbrevs)) {
 	%dirabbrevs = %newdirabbrevs;
@@ -132,44 +149,79 @@ sub have_slash {
 	}
 }
 
-my %times;
-my @colwidth = ((0)x@dirs);
-for my $i (0..$#dirs) {
-	my $d = $dirs[$i];
-	my $w = length (exists $dirabbrevs{$d} ? $dirabbrevs{$d} : $dirnames{$d});
-	$colwidth[$i] = $w if $w > $colwidth[$i];
-}
-for my $t (@subtests) {
-	my $firstr;
+binmode STDOUT, ":utf8" or die "PANIC on binmode: $!";
+
+if (!$tall_format) {
+	my %times;
+	my @colwidth = ((0)x@dirs);
 	for my $i (0..$#dirs) {
 		my $d = $dirs[$i];
-		$times{$prefixes{$d}.$t} = [get_times("$resultsdir/$prefixes{$d}$t.times")];
-		my ($r,$u,$s) = @{$times{$prefixes{$d}.$t}};
-		my $w = length format_times($r,$u,$s,$firstr);
+		my $w = length(printable_dir($d));
 		$colwidth[$i] = $w if $w > $colwidth[$i];
-		$firstr = $r unless defined $firstr;
 	}
-}
-my $totalwidth = 3*@dirs+$descrlen;
-$totalwidth += $_ for (@colwidth);
-
-binmode STDOUT, ":utf8" or die "PANIC on binmode: $!";
+	for my $t (@subtests) {
+		my $firstr;
+		for my $i (0..$#dirs) {
+			my $d = $dirs[$i];
+			$times{$prefixes{$d}.$t} = [get_times("$resultsdir/$prefixes{$d}$t.times")];
+			my ($r,$u,$s) = @{$times{$prefixes{$d}.$t}};
+			my $w = length format_times($r,$u,$s,$firstr);
+			$colwidth[$i] = $w if $w > $colwidth[$i];
+			$firstr = $r unless defined $firstr;
+		}
+	}
+	my $totalwidth = 3*@dirs+$descrlen;
+	$totalwidth += $_ for (@colwidth);
 
-printf "%-${descrlen}s", "Test";
-for my $i (0..$#dirs) {
-	my $d = $dirs[$i];
-	printf "   %-$colwidth[$i]s", (exists $dirabbrevs{$d} ? $dirabbrevs{$d} : $dirnames{$d});
-}
-print "\n";
-print "-"x$totalwidth, "\n";
-for my $t (@subtests) {
-	printf "%-${descrlen}s", $descrs{$t};
-	my $firstr;
+	printf "%-${descrlen}s", "Test";
 	for my $i (0..$#dirs) {
 		my $d = $dirs[$i];
-		my ($r,$u,$s) = @{$times{$prefixes{$d}.$t}};
-		printf "   %-$colwidth[$i]s", format_times($r,$u,$s,$firstr);
-		$firstr = $r unless defined $firstr;
+		printf "   %-$colwidth[$i]s", printable_dir($d);
 	}
 	print "\n";
+	print "-"x$totalwidth, "\n";
+	for my $t (@subtests) {
+		printf "%-${descrlen}s", $descrs{$t};
+		my $firstr;
+		for my $i (0..$#dirs) {
+			my $d = $dirs[$i];
+			my ($r,$u,$s) = @{$times{$prefixes{$d}.$t}};
+			printf "   %-$colwidth[$i]s", format_times($r,$u,$s,$firstr);
+			$firstr = $r unless defined $firstr;
+		}
+		print "\n";
+	}
+} else {
+	my $shown = 0;
+	for my $t (@subtests) {
+		print "\n" if $shown++;
+		print "Test: $t\n";
+
+		my %times;
+		my $firstr;
+		for my $d (@dirs) {
+			my ($r, $u, $s) = get_times("test-results/$prefixes{$d}$t.times");
+			$times{$d} = [format_times_list($r, $u, $s, $firstr)];
+			$firstr = $r unless defined $firstr;
+		}
+
+		my $maxdirlen = 0;
+		my $maxtimelen = 0;
+		for my $d (@dirs) {
+			if (length($d) > $maxdirlen) {
+				$maxdirlen = length(printable_dir($d));
+			}
+			if (length($times{$d}->[0]) > $maxtimelen) {
+				$maxtimelen = length($times{$d}->[0]);
+			}
+		}
+		$maxdirlen++;
+
+		for my $d (@dirs) {
+			printf "%-${maxdirlen}s", printable_dir($d);
+			printf "   %${maxtimelen}s", $times{$d}->[0];
+			print " $times{$d}->[1]" if defined $times{$d}->[1];
+			print "\n";
+		}
+	}
 }
-- 
2.19.0.rc0.412.g7005db4e88


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23 10:26                     ` Derrick Stolee
  2018-08-23 13:16                       ` Junio C Hamano
  2018-08-23 16:14                       ` Jeff King
@ 2018-08-23 18:53                       ` Jeff King
  2018-08-23 20:59                         ` Derrick Stolee
  2018-09-02 18:53                       ` Kaartic Sivaraam
  3 siblings, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-08-23 18:53 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: brian m. carlson, Jonathan Nieder, Paul Smith, git, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

On Thu, Aug 23, 2018 at 06:26:58AM -0400, Derrick Stolee wrote:

> > I think you can safely
> > ignore the rest of it if you are otherwise occupied. Even if v2.19 ships
> > without some mitigation, I don't know that it's all that big a deal,
> > given the numbers I generated (which for some reason are less dramatic
> > than Stolee's).
> My numbers may be more dramatic because my Linux environment is a virtual
> machine.

If you have a chance, can you run p0001 on my patch (compared to
2.19-rc0, or to both v2.18 and v2.19-rc0)? It would be nice to double
check that it really is fixing the problem you saw.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23 18:53                       ` Jeff King
@ 2018-08-23 20:59                         ` Derrick Stolee
  2018-08-24  6:56                           ` Jeff King
  2018-08-24 16:45                           ` Derrick Stolee
  0 siblings, 2 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-08-23 20:59 UTC (permalink / raw)
  To: Jeff King
  Cc: brian m. carlson, Jonathan Nieder, Paul Smith, git, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

On 8/23/2018 2:53 PM, Jeff King wrote:
> On Thu, Aug 23, 2018 at 06:26:58AM -0400, Derrick Stolee wrote:
>
>>> I think you can safely
>>> ignore the rest of it if you are otherwise occupied. Even if v2.19 ships
>>> without some mitigation, I don't know that it's all that big a deal,
>>> given the numbers I generated (which for some reason are less dramatic
>>> than Stolee's).
>> My numbers may be more dramatic because my Linux environment is a virtual
>> machine.
> If you have a chance, can you run p0001 on my patch (compared to
> 2.19-rc0, or to both v2.18 and v2.19-rc0)? It would be nice to double
> check that it really is fixing the problem you saw.

Sure. Note: I had to create a new Linux VM on a different machine 
between Tuesday and today, so the absolute numbers are different.

Using git/git:

Test      v2.18.0           v2.19.0-rc0             HEAD
-------------------------------------------------------------------------
0001.2:   3.10(3.02+0.08)   3.27(3.17+0.09) +5.5% 3.14(3.02+0.11) +1.3%


Using torvalds/linux:

Test     v2.18.0             v2.19.0-rc0               HEAD
------------------------------------------------------------------------------
0001.2:  56.08(45.91+1.50)   56.60(46.62+1.50) +0.9% 54.61(45.47+1.46) -2.6%


Now here is where I get on my soapbox (and create a TODO for myself 
later). I ran the above with GIT_PERF_REPEAT_COUNT=10, which intuitively 
suggests that the results should be _more_ accurate than the default of 
3. However, I then remember that we only report the *minimum* time from 
all the runs, which is likely to select an outlier from the 
distribution. To test this, I ran a few tests manually and found the 
variation between runs to be larger than 3%.

When I choose my own metrics for performance tests, I like to run at 
least 10 runs, remove the largest AND smallest runs from the samples, 
and then take the average. I did this manually for 'git rev-list --all 
--objects' on git/git and got the following results:

v2.18.0    v2.19.0-rc0   HEAD
--------------------------------
3.126 s    3.308 s       3.170 s

For full disclosure, here is a full table including all samples:

|      | v2.18.0 | v2.19.0-rc0 | HEAD    |
|------|---------|-------------|---------|
|      | 4.58    | 3.302       | 3.239   |
|      | 3.13    | 3.337       | 3.133   |
|      | 3.213   | 3.291       | 3.159   |
|      | 3.219   | 3.318       | 3.131   |
|      | 3.077   | 3.302       | 3.163   |
|      | 3.074   | 3.328       | 3.119   |
|      | 3.022   | 3.277       | 3.125   |
|      | 3.083   | 3.259       | 3.203   |
|      | 3.057   | 3.311       | 3.223   |
|      | 3.155   | 3.413       | 3.225   |
| Max  | 4.58    | 3.413       | 3.239   |
| Min  | 3.022   | 3.259       | 3.119   |
| Avg* | 3.126   | 3.30825     | 3.17025 |

(Note that the largest one was the first run, on v2.18.0, which is due 
to a cold disk.)

I just kicked off a script that will run this test on the Linux repo 
while I drive home. I'll be able to report a similar table of data easily.

My TODO is to consider aggregating the data this way (or with a median) 
instead of reporting the minimum.

Thanks,

-Stolee


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23 16:14                       ` Jeff King
@ 2018-08-23 23:30                         ` Jacob Keller
  2018-08-23 23:40                           ` Jeff King
  0 siblings, 1 reply; 58+ messages in thread
From: Jacob Keller @ 2018-08-23 23:30 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee, brian m. carlson, Jonathan Nieder, paul,
	Git mailing list, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

On Thu, Aug 23, 2018 at 9:30 AM Jeff King <peff@peff.net> wrote:
> I think that audit isn't actually too bad (but definitely not something
> we should do for v2.19). The cocci patch I showed earlier hits most of
> them. It misses the negated ones (e.g., "if (oidcmp(...))"). I'm not
> sure if there's a way to ask coccinelle only for oidcmp()
> used in a boolean context.
>

You can look for explicitly "if (oidcmp(...))" though. I don't know if
you can catch *any* use which degrades to boolean outside of an if
statement, but I wouldn't expect there to be too many of those?

Thanks,
Jake

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23 23:30                         ` Jacob Keller
@ 2018-08-23 23:40                           ` Jeff King
  2018-08-24  0:06                             ` Jeff King
  0 siblings, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-08-23 23:40 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Derrick Stolee, brian m. carlson, Jonathan Nieder, paul,
	Git mailing list, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

On Thu, Aug 23, 2018 at 04:30:10PM -0700, Jacob Keller wrote:

> On Thu, Aug 23, 2018 at 9:30 AM Jeff King <peff@peff.net> wrote:
> > I think that audit isn't actually too bad (but definitely not something
> > we should do for v2.19). The cocci patch I showed earlier hits most of
> > them. It misses the negated ones (e.g., "if (oidcmp(...))"). I'm not
> > sure if there's a way to ask coccinelle only for oidcmp()
> > used in a boolean context.
> >
> 
> You can look for explicitly "if (oidcmp(...))" though. I don't know if
> you can catch *any* use which degrades to boolean outside of an if
> statement, but I wouldn't expect there to be too many of those?

Yeah, that was my thought, too. And I've been trying this all afternoon
without success. Why doesn't this work:

  @@
  expression a, b;
  @@
  - if (oidcmp(a, b))
  + if (!oideq(a, b))

I get:

  Fatal error: exception Failure("minus: parse error: \n = File
  \"contrib/coccinelle/oideq.cocci\", line 21, column 0,  charpos =
  221\n    around = '', whole content = \n")

If I do:

  - if (oidcmp(a, b)) { ... }

that seems to please the parser for the minus line. But I cannot include
the "..." on the plus line. Clearly the "..." part should be context,
but I can't seem to find the right syntax.

FWIW, I do have patches adding hasheq() and converting all of the
!oidcmp() cases. I may resort to hand-investigating each of the negated
ones, but I really feel like I should be able to do better with
coccinelle.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23 23:40                           ` Jeff King
@ 2018-08-24  0:06                             ` Jeff King
  2018-08-24  0:16                               ` Jeff King
  0 siblings, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-08-24  0:06 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Derrick Stolee, brian m. carlson, Jonathan Nieder, paul,
	Git mailing list, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

On Thu, Aug 23, 2018 at 07:40:49PM -0400, Jeff King wrote:

> > You can look for explicitly "if (oidcmp(...))" though. I don't know if
> > you can catch *any* use which degrades to boolean outside of an if
> > statement, but I wouldn't expect there to be too many of those?
> 
> Yeah, that was my thought, too. And I've been trying this all afternoon
> without success. Why doesn't this work:
> 
>   @@
>   expression a, b;
>   @@
>   - if (oidcmp(a, b))
>   + if (!oideq(a, b))
> 
> I get:
> 
>   Fatal error: exception Failure("minus: parse error: \n = File
>   \"contrib/coccinelle/oideq.cocci\", line 21, column 0,  charpos =
>   221\n    around = '', whole content = \n")
> 
> If I do:
> 
>   - if (oidcmp(a, b)) { ... }
> 
> that seems to please the parser for the minus line. But I cannot include
> the "..." on the plus line. Clearly the "..." part should be context,
> but I can't seem to find the right syntax.

This almost works:

  @@
  expression a, b;
  statement s;
  @@
  - if (oidcmp(a, b)) s
  + if (!oideq(a, b)) s

It generates this, for example:

diff -u -p a/bisect.c b/bisect.c
--- a/bisect.c
+++ b/bisect.c
@@ -595,7 +595,7 @@ static struct commit_list *skip_away(str
 
 	for (i = 0; cur; cur = cur->next, i++) {
 		if (i == index) {
-			if (oidcmp(&cur->item->object.oid, current_bad_oid))
+			if (!oideq(&cur->item->object.oid, current_bad_oid))
 				return cur;
 			if (previous)
 				return previous;

which is what we want. But it also generates this:

diff -u -p a/bundle.c b/bundle.c
--- a/bundle.c
+++ b/bundle.c
@@ -369,25 +369,11 @@ static int write_bundle_refs(int bundle_
 		 * commit that is referenced by the tag, and not the tag
 		 * itself.
 		 */
-		if (oidcmp(&oid, &e->item->oid)) {
-			/*
-			 * Is this the positive end of a range expressed
-			 * in terms of a tag (e.g. v2.0 from the range
-			 * "v1.0..v2.0")?
-			 */
-			struct commit *one = lookup_commit_reference(the_repository,
-								     &oid);
+		if (!oideq(&oid, &e->item->oid)) {
+			struct commit *one=lookup_commit_reference(the_repository,
+								   &oid);
 			struct object *obj;
-
 			if (e->item == &(one->object)) {
-				/*
-				 * Need to include e->name as an
-				 * independent ref to the pack-objects
-				 * input, so that the tag is included
-				 * in the output; otherwise we would
-				 * end up triggering "empty bundle"
-				 * error.
-				 */
 				obj = parse_object_or_die(&oid, e->name);
 				obj->flags |= SHOWN;
 				add_pending_object(revs, obj, e->name);

So I really do want some way of saying "all of the block, no matter what
it is". Or of leaving it out as context.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-24  0:06                             ` Jeff King
@ 2018-08-24  0:16                               ` Jeff King
  2018-08-24  2:48                                 ` Jacob Keller
  0 siblings, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-08-24  0:16 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Derrick Stolee, brian m. carlson, Jonathan Nieder, paul,
	Git mailing list, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

On Thu, Aug 23, 2018 at 08:06:37PM -0400, Jeff King wrote:

> This almost works:
> 
>   @@
>   expression a, b;
>   statement s;
>   @@
>   - if (oidcmp(a, b)) s
>   + if (!oideq(a, b)) s
>
> [...]

> So I really do want some way of saying "all of the block, no matter what
> it is". Or of leaving it out as context.

Aha. The magic invocation is:

  @@
  expression a, b;
  statement s;
  @@
  - if (oidcmp(a, b))
  + if (!oideq(a, b))
      s

I would have expected that you could replace "s" with "...", but that
does not seem to work.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-24  0:16                               ` Jeff King
@ 2018-08-24  2:48                                 ` Jacob Keller
  2018-08-24  2:59                                   ` Jeff King
  0 siblings, 1 reply; 58+ messages in thread
From: Jacob Keller @ 2018-08-24  2:48 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee, brian m. carlson, Jonathan Nieder, Paul Smith,
	Git mailing list, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

On Thu, Aug 23, 2018 at 5:16 PM Jeff King <peff@peff.net> wrote:
>
> On Thu, Aug 23, 2018 at 08:06:37PM -0400, Jeff King wrote:
>
> > This almost works:
> >
> >   @@
> >   expression a, b;
> >   statement s;
> >   @@
> >   - if (oidcmp(a, b)) s
> >   + if (!oideq(a, b)) s
> >
> > [...]
>
> > So I really do want some way of saying "all of the block, no matter what
> > it is". Or of leaving it out as context.
>
> Aha. The magic invocation is:
>
>   @@
>   expression a, b;
>   statement s;
>   @@
>   - if (oidcmp(a, b))
>   + if (!oideq(a, b))
>       s
>
> I would have expected that you could replace "s" with "...", but that
> does not seem to work.
>
> -Peff

Odd...

What about..

- if (oidcmp(a,b))
+ if(!oideq(a,b))
  { ... }

Or maybe you need to use something like

  <...
- if (oidcmp(a,b))
+ if (!oideq(a,b))
  ...>

Hmm. Yea, semantic patches are a bit confusing overall sometimes.

But it looks like you got something which works?

Thanks,
Jake

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-24  2:48                                 ` Jacob Keller
@ 2018-08-24  2:59                                   ` Jeff King
  2018-08-24  6:45                                     ` Jeff King
  2018-08-27 19:36                                     ` Junio C Hamano
  0 siblings, 2 replies; 58+ messages in thread
From: Jeff King @ 2018-08-24  2:59 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Derrick Stolee, brian m. carlson, Jonathan Nieder, Paul Smith,
	Git mailing list, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

On Thu, Aug 23, 2018 at 07:48:42PM -0700, Jacob Keller wrote:

> Odd...
> 
> What about..
> 
> - if (oidcmp(a,b))
> + if(!oideq(a,b))
>   { ... }

Nope, it doesn't like that syntactically.

> Or maybe you need to use something like
> 
>   <...
> - if (oidcmp(a,b))
> + if (!oideq(a,b))
>   ...>

Nor that (I also tried finding documentation on what exactly the angle
brackets mean, but couldn't).

> Hmm. Yea, semantic patches are a bit confusing overall sometimes.
> 
> But it looks like you got something which works?

Actually, what I showed earlier does seem to have some weirdness with
else-if. But I finally stumbled on something even better:

  - oidcmp(a, b) != 0
  + !oideq(a, b)

Because of the isomorphisms that coccinelle knows about, that catches
everything we want.  Obvious ones like:

diff --git a/bisect.c b/bisect.c
index 41c56a665e..7c1d8f1a6d 100644
--- a/bisect.c
+++ b/bisect.c
@@ -595,7 +595,7 @@ static struct commit_list *skip_away(struct commit_list *list, int count)
 
        for (i = 0; cur; cur = cur->next, i++) {
                if (i == index) {
-                       if (oidcmp(&cur->item->object.oid, current_bad_oid))
+                       if (!oideq(&cur->item->object.oid, current_bad_oid))
                                return cur;
                        if (previous)
                                return previous;

and compound conditionals like:

diff --git a/blame.c b/blame.c
index 10d72e36dd..538d0ab1aa 100644
--- a/blame.c
+++ b/blame.c
@@ -1834,7 +1834,7 @@ void setup_scoreboard(struct blame_scoreboard *sb,

                sb->revs->children.name = "children";
                while (c->parents &&
-                      oidcmp(&c->object.oid, &sb->final->object.oid)) {
+                      !oideq(&c->object.oid, &sb->final->object.oid)) {
                        struct commit_list *l = xcalloc(1, sizeof(*l));

                        l->item = c;

and even non-if contexts, like:

diff --git a/sha1-file.c b/sha1-file.c
index 631f6b9dc2..d85f4e93e1 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -825,7 +825,7 @@ int check_object_signature(const struct object_id *oid, void *map,
 
        if (map) {
                hash_object_file(map, size, type, &real_oid);
-               return oidcmp(oid, &real_oid) ? -1 : 0;
+               return !oideq(oid, &real_oid) ? -1 : 0;
        }

So I think we have a winner. I'll polish that up into patches and send
it out later tonight.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-24  2:59                                   ` Jeff King
@ 2018-08-24  6:45                                     ` Jeff King
  2018-08-24 11:04                                       ` Derrick Stolee
  2018-08-27 19:36                                     ` Junio C Hamano
  1 sibling, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-08-24  6:45 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Derrick Stolee, brian m. carlson, Jonathan Nieder, Paul Smith,
	Git mailing list, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

On Thu, Aug 23, 2018 at 10:59:55PM -0400, Jeff King wrote:

> So I think we have a winner. I'll polish that up into patches and send
> it out later tonight.

Oof. This rabbit hole keeps going deeper and deeper. I wrote up my
coccinelle findings separately in:

  https://public-inbox.org/git/20180824064228.GA3183@sigill.intra.peff.net/

which is possibly a coccinelle bug (there I talked about oidcmp, since
it can be demonstrated with the existing transformations, but the same
thing happens with my hasheq patches). I'll wait to see how that
discussion plays out, but I do otherwise have hasheq() patches ready to
go, so it's probably not worth anybody else digging in further.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23 20:59                         ` Derrick Stolee
@ 2018-08-24  6:56                           ` Jeff King
  2018-08-24  7:57                             ` Ævar Arnfjörð Bjarmason
  2018-08-24 16:45                           ` Derrick Stolee
  1 sibling, 1 reply; 58+ messages in thread
From: Jeff King @ 2018-08-24  6:56 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: brian m. carlson, Jonathan Nieder, Paul Smith, git, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

On Thu, Aug 23, 2018 at 04:59:27PM -0400, Derrick Stolee wrote:

> Using git/git:
> 
> Test      v2.18.0           v2.19.0-rc0             HEAD
> -------------------------------------------------------------------------
> 0001.2:   3.10(3.02+0.08)   3.27(3.17+0.09) +5.5% 3.14(3.02+0.11) +1.3%
> 
> 
> Using torvalds/linux:
> 
> Test     v2.18.0             v2.19.0-rc0               HEAD
> ------------------------------------------------------------------------------
> 0001.2:  56.08(45.91+1.50)   56.60(46.62+1.50) +0.9% 54.61(45.47+1.46) -2.6%

Interesting that these timings aren't as dramatic as the ones you got
the other day (mine seemed to shift, too; for whatever reason it seems
like under load the difference is larger).

> Now here is where I get on my soapbox (and create a TODO for myself later).
> I ran the above with GIT_PERF_REPEAT_COUNT=10, which intuitively suggests
> that the results should be _more_ accurate than the default of 3. However, I
> then remember that we only report the *minimum* time from all the runs,
> which is likely to select an outlier from the distribution. To test this, I
> ran a few tests manually and found the variation between runs to be larger
> than 3%.

Yes, I agree it's not a great system. The whole "best of 3" thing is
OK for throwing out cold-cache warmups, but it's really bad for teasing
out the significance of small changes, or even understanding how much
run-to-run noise there is.

> When I choose my own metrics for performance tests, I like to run at least
> 10 runs, remove the largest AND smallest runs from the samples, and then
> take the average. I did this manually for 'git rev-list --all --objects' on
> git/git and got the following results:

I agree that technique is better. I wonder if there's something even
more statistically rigorous we could do. E.g., to compute the variance
and throw away outliers based on standard deviations. And also to report
the variance to give a sense of the significance of any changes.

Obviously more runs gives greater confidence in the results, but 10
sounds like a lot. Many of these tests take minutes to run. Letting it
go overnight is OK if you're doing a once-per-release mega-run, but it's
pretty painful if you just want to generate some numbers to show off
your commit.

> v2.18.0    v2.19.0-rc0   HEAD
> --------------------------------
> 3.126 s    3.308 s       3.170 s

So that's 5% worsening in 2.19, and we reclaim all but 1.4% of it. Those
numbers match what I expect (and what I was seeing in some of my earlier
timings).

> I just kicked off a script that will run this test on the Linux repo while I
> drive home. I'll be able to report a similar table of data easily.

Thanks, I'd expect it to come up with similar percentages. So we'll see
if that holds true. :)

> My TODO is to consider aggregating the data this way (or with a median)
> instead of reporting the minimum.

Yes, I think that would be a great improvement for t/perf.

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-24  6:56                           ` Jeff King
@ 2018-08-24  7:57                             ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 58+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-08-24  7:57 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee, brian m. carlson, Jonathan Nieder, Paul Smith,
	git, Duy Nguyen, Steffen Mueller


On Fri, Aug 24 2018, Jeff King wrote:

> On Thu, Aug 23, 2018 at 04:59:27PM -0400, Derrick Stolee wrote:
>
>> Using git/git:
>>
>> Test v2.18.0 v2.19.0-rc0 HEAD
>> -------------------------------------------------------------------------
>> 0001.2: 3.10(3.02+0.08) 3.27(3.17+0.09) +5.5% 3.14(3.02+0.11) +1.3%
>>
>>
>> Using torvalds/linux:
>>
>> Test v2.18.0 v2.19.0-rc0 HEAD
>> ------------------------------------------------------------------------------
>> 0001.2: 56.08(45.91+1.50) 56.60(46.62+1.50) +0.9% 54.61(45.47+1.46) -2.6%
>
> Interesting that these timings aren't as dramatic as the ones you got
> the other day (mine seemed to shift, too; for whatever reason it seems
> like under load the difference is larger).
>
>> Now here is where I get on my soapbox (and create a TODO for myself later).
>> I ran the above with GIT_PERF_REPEAT_COUNT=10, which intuitively suggests
>> that the results should be _more_ accurate than the default of 3. However, I
>> then remember that we only report the *minimum* time from all the runs,
>> which is likely to select an outlier from the distribution. To test this, I
>> ran a few tests manually and found the variation between runs to be larger
>> than 3%.
>
> Yes, I agree it's not a great system. The whole "best of 3" thing is
> OK for throwing out cold-cache warmups, but it's really bad for teasing
> out the significance of small changes, or even understanding how much
> run-to-run noise there is.
>
>> When I choose my own metrics for performance tests, I like to run at least
>> 10 runs, remove the largest AND smallest runs from the samples, and then
>> take the average. I did this manually for 'git rev-list --all --objects' on
>> git/git and got the following results:
>
> I agree that technique is better. I wonder if there's something even
> more statistically rigorous we could do. E.g., to compute the variance
> and throw away outliers based on standard deviations. And also to report
> the variance to give a sense of the significance of any changes.
>
> Obviously more runs gives greater confidence in the results, but 10
> sounds like a lot. Many of these tests take minutes to run. Letting it
> go overnight is OK if you're doing a once-per-release mega-run, but it's
> pretty painful if you just want to generate some numbers to show off
> your commit.

An ex-coworker who's a *lot* smarter about these things than I am wrote
this module: https://metacpan.org/pod/Dumbbench

I while ago I dabbled briefly with integrating it into t/perf/ but got
distracted by something else:

    The module currently works similar to [more traditional iterative
    benchmark modules], except (in layman terms) it will run the command
    many times, estimate the uncertainty of the result and keep
    iterating until a certain user-defined precision has been
    reached. Then, it calculates the resulting uncertainty and goes
    through some pain to discard bad runs and subtract overhead from the
    timings. The reported timing includes an uncertainty, so that
    multiple benchmarks can more easily be compared.

Details of how it works here:
https://metacpan.org/pod/Dumbbench#HOW-IT-WORKS-AND-WHY-IT-DOESN'T

Something like that seems to me to be an inherently better
approach. I.e. we have lots of test cases that take 500ms, and some that
take maybe 5 minutes (depending on the size of the repository).

Indiscriminately repeating all of those for GIT_PERF_REPEAT_COUNT must
be dumber than something like the above method.

We could also speed up the runtime of the perf tests a lot with such a
method, by e.g. saying that we're OK with less certainty on tests that
take a longer time than those that take a shorter time.


>> v2.18.0 v2.19.0-rc0 HEAD
>> --------------------------------
>> 3.126 s 3.308 s 3.170 s
>
> So that's 5% worsening in 2.19, and we reclaim all but 1.4% of it. Those
> numbers match what I expect (and what I was seeing in some of my earlier
> timings).
>
>> I just kicked off a script that will run this test on the Linux repo while I
>> drive home. I'll be able to report a similar table of data easily.
>
> Thanks, I'd expect it to come up with similar percentages. So we'll see
> if that holds true. :)
>
>> My TODO is to consider aggregating the data this way (or with a median)
>> instead of reporting the minimum.
>
> Yes, I think that would be a great improvement for t/perf.
>
> -Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-24  6:45                                     ` Jeff King
@ 2018-08-24 11:04                                       ` Derrick Stolee
  0 siblings, 0 replies; 58+ messages in thread
From: Derrick Stolee @ 2018-08-24 11:04 UTC (permalink / raw)
  To: Jeff King, Jacob Keller
  Cc: brian m. carlson, Jonathan Nieder, Paul Smith, Git mailing list,
	Duy Nguyen, Ævar Arnfjörð Bjarmason

On 8/24/2018 2:45 AM, Jeff King wrote:
> On Thu, Aug 23, 2018 at 10:59:55PM -0400, Jeff King wrote:
>
>> So I think we have a winner. I'll polish that up into patches and send
>> it out later tonight.
> Oof. This rabbit hole keeps going deeper and deeper. I wrote up my
> coccinelle findings separately in:
>
>    https://public-inbox.org/git/20180824064228.GA3183@sigill.intra.peff.net/
>
> which is possibly a coccinelle bug (there I talked about oidcmp, since
> it can be demonstrated with the existing transformations, but the same
> thing happens with my hasheq patches). I'll wait to see how that
> discussion plays out, but I do otherwise have hasheq() patches ready to
> go, so it's probably not worth anybody else digging in further.
>
I was trying this myself yesterday, and found a doc _somewhere_ that 
said what we should do is this:

@@
expression e1, e2;
@@
if (
- oidcmp(e1, e2)
+ !oideq(e1, e2)
  )

(Don't forget the space before the last end-paren!)

-Stolee

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23 20:59                         ` Derrick Stolee
  2018-08-24  6:56                           ` Jeff King
@ 2018-08-24 16:45                           ` Derrick Stolee
  2018-08-25  8:26                             ` Jeff King
  1 sibling, 1 reply; 58+ messages in thread
From: Derrick Stolee @ 2018-08-24 16:45 UTC (permalink / raw)
  To: Jeff King
  Cc: brian m. carlson, Jonathan Nieder, Paul Smith, git, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

On 8/23/2018 4:59 PM, Derrick Stolee wrote:
>
> When I choose my own metrics for performance tests, I like to run at 
> least 10 runs, remove the largest AND smallest runs from the samples, 
> and then take the average. I did this manually for 'git rev-list --all 
> --objects' on git/git and got the following results:
>
> v2.18.0    v2.19.0-rc0   HEAD
> --------------------------------
> 3.126 s    3.308 s       3.170 s
>
> For full disclosure, here is a full table including all samples:
>
> |      | v2.18.0 | v2.19.0-rc0 | HEAD    |
> |------|---------|-------------|---------|
> |      | 4.58    | 3.302       | 3.239   |
> |      | 3.13    | 3.337       | 3.133   |
> |      | 3.213   | 3.291       | 3.159   |
> |      | 3.219   | 3.318       | 3.131   |
> |      | 3.077   | 3.302       | 3.163   |
> |      | 3.074   | 3.328       | 3.119   |
> |      | 3.022   | 3.277       | 3.125   |
> |      | 3.083   | 3.259       | 3.203   |
> |      | 3.057   | 3.311       | 3.223   |
> |      | 3.155   | 3.413       | 3.225   |
> | Max  | 4.58    | 3.413       | 3.239   |
> | Min  | 3.022   | 3.259       | 3.119   |
> | Avg* | 3.126   | 3.30825     | 3.17025 |
>
> (Note that the largest one was the first run, on v2.18.0, which is due 
> to a cold disk.)
>
> I just kicked off a script that will run this test on the Linux repo 
> while I drive home. I'll be able to report a similar table of data 
> easily.

Here are the numbers for Linux:

|      | v2.18.0  | v2.19.0-rc0 | HEAD   |
|------|----------|-------------|--------|
|      | 86.5     | 70.739      | 57.266 |
|      | 60.582   | 101.928     | 56.641 |
|      | 58.964   | 60.139      | 60.258 |
|      | 59.47    | 61.141      | 58.213 |
|      | 62.554   | 60.73       | 84.54  |
|      | 59.139   | 85.424      | 57.745 |
|      | 58.487   | 59.31       | 59.979 |
|      | 58.653   | 69.845      | 60.181 |
|      | 58.085   | 102.777     | 61.455 |
|      | 58.304   | 60.459      | 62.551 |
| Max  | 86.5     | 102.777     | 84.54  |
| Min  | 58.085   | 59.31       | 56.641 |
| Avg* | 59.51913 | 71.30063    | 59.706 |
| Med  | 59.0515  | 65.493      | 60.08  |

Thanks,

-Stolee


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-24 16:45                           ` Derrick Stolee
@ 2018-08-25  8:26                             ` Jeff King
  0 siblings, 0 replies; 58+ messages in thread
From: Jeff King @ 2018-08-25  8:26 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: brian m. carlson, Jonathan Nieder, Paul Smith, git, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

On Fri, Aug 24, 2018 at 12:45:10PM -0400, Derrick Stolee wrote:

> Here are the numbers for Linux:
> 
> |      | v2.18.0  | v2.19.0-rc0 | HEAD   |
> |------|----------|-------------|--------|
> |      | 86.5     | 70.739      | 57.266 |
> |      | 60.582   | 101.928     | 56.641 |
> |      | 58.964   | 60.139      | 60.258 |
> |      | 59.47    | 61.141      | 58.213 |
> |      | 62.554   | 60.73       | 84.54  |
> |      | 59.139   | 85.424      | 57.745 |
> |      | 58.487   | 59.31       | 59.979 |
> |      | 58.653   | 69.845      | 60.181 |
> |      | 58.085   | 102.777     | 61.455 |
> |      | 58.304   | 60.459      | 62.551 |
> | Max  | 86.5     | 102.777     | 84.54  |
> | Min  | 58.085   | 59.31       | 56.641 |
> | Avg* | 59.51913 | 71.30063    | 59.706 |
> | Med  | 59.0515  | 65.493      | 60.08  |

Thanks. The median ones are the most interesting, I think (and show a
very satisfying recovery from my patch).

I'm surprised at the variance, especially in your v2.19 runs. It makes
me distrust the mean (and implies to me we could do better by throwing
away outliers based on value, not just the single high/low; or just
using the median also solves that).

The variance in the v2.18 column is what I'd expect based on past
experience (slow cold cache to start, then a few percent change
run-to-run).

-Peff

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-24  2:59                                   ` Jeff King
  2018-08-24  6:45                                     ` Jeff King
@ 2018-08-27 19:36                                     ` Junio C Hamano
  1 sibling, 0 replies; 58+ messages in thread
From: Junio C Hamano @ 2018-08-27 19:36 UTC (permalink / raw)
  To: Jeff King
  Cc: Jacob Keller, Derrick Stolee, brian m. carlson, Jonathan Nieder,
	Paul Smith, Git mailing list, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

Jeff King <peff@peff.net> writes:

> Actually, what I showed earlier does seem to have some weirdness with
> else-if. But I finally stumbled on something even better:
>
>   - oidcmp(a, b) != 0
>   + !oideq(a, b)
>
> Because of the isomorphisms that coccinelle knows about, that catches
> everything we want.

Nice ;-)


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [ANNOUNCE] Git v2.19.0-rc0
  2018-08-23 10:26                     ` Derrick Stolee
                                         ` (2 preceding siblings ...)
  2018-08-23 18:53                       ` Jeff King
@ 2018-09-02 18:53                       ` Kaartic Sivaraam
  3 siblings, 0 replies; 58+ messages in thread
From: Kaartic Sivaraam @ 2018-09-02 18:53 UTC (permalink / raw)
  To: Derrick Stolee, Jeff King, brian m. carlson, Jonathan Nieder,
	Paul Smith, git, Duy Nguyen,
	Ævar Arnfjörð Bjarmason

On Thu, 2018-08-23 at 06:26 -0400, Derrick Stolee wrote:
> 
> Around the time that my proposed approaches were getting vetoed for 
> alignment issues, I figured I was out of my depth here. I reached out to 
> Daniel Lemire (of EWAH bitmap fame) on Twitter [1]. His blog is full of 
> posts of word-based approaches to different problems, so I thought he 
> might know something off the top of his head that would be applicable. 
> His conclusion (after looking only a short time) was to take a 'hasheq' 
> approach [2] like Peff suggested [3]. Since that requires auditing all 
> callers of hashcmp to see if hasheq is appropriate, it is not a good 
> solution for 2.19 but (in my opinion) should be evaluated as part of the 
> 2.20 cycle.
> 

That was an interesting blog post, indeed. It had an interesting
comments section. One comment especially caught my eyes was [a]:

"So the way gcc (and maybe clang) handles this is specifically by
recognizing memcmp and checking whether a only a 2-way result is needed
and then essentially replacing it with a memcmp_eq call.

..."

I find this to be an interesting note. It seems GCC does optimize when
we clearly indicate that we use the result of the memcmp as a boolean.
So would that help in anyway? Maybe it would help in writing a `hasheq`
method easily? I'm not sure.

> [1] https://twitter.com/stolee/status/1032312965754748930
> 
> [2] 
> https://lemire.me/blog/2018/08/22/avoid-lexicographical-comparisons-when-testing-for-string-equality/
> 
> [3] 
> https://public-inbox.org/git/20180822030344.GA14684@sigill.intra.peff.net/
> 
> [4] 
> https://public-inbox.org/git/7ea416cf-b043-1274-e161-85a8780b8e1c@gmail.com/


[a]: 
https://lemire.me/blog/2018/08/22/avoid-lexicographical-comparisons-when-testing-for-string-equality/#comment-344073

--
Sivaraam


^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, back to index

Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-20 22:13 [ANNOUNCE] Git v2.19.0-rc0 Junio C Hamano
2018-08-20 22:41 ` Stefan Beller
2018-08-20 23:39   ` Jonathan Nieder
2018-08-21  0:27     ` Jonathan Nieder
2018-08-21  0:46       ` Stefan Beller
2018-08-21 20:41 ` Derrick Stolee
2018-08-21 21:29   ` Jeff King
2018-08-22  0:48     ` brian m. carlson
2018-08-22  3:03       ` Jeff King
2018-08-22  3:36         ` Jeff King
2018-08-22 11:11           ` Derrick Stolee
2018-08-22  5:36         ` brian m. carlson
2018-08-22  6:07           ` Jeff King
2018-08-22  7:39             ` Ævar Arnfjörð Bjarmason
2018-08-22 11:14               ` Derrick Stolee
2018-08-22 15:17                 ` Jeff King
2018-08-22 16:08                   ` Duy Nguyen
2018-08-22 16:14                     ` Duy Nguyen
2018-08-22 16:26                       ` Jeff King
2018-08-22 16:49                         ` Derrick Stolee
2018-08-22 16:58                           ` Duy Nguyen
2018-08-22 17:04                             ` Derrick Stolee
2018-08-22 16:59                           ` Jeff King
2018-08-22 17:02                             ` Junio C Hamano
2018-08-22 15:14               ` Jeff King
2018-08-22 14:28           ` Derrick Stolee
2018-08-22 15:24             ` Jeff King
2018-08-22 12:42         ` Paul Smith
2018-08-22 15:23           ` Jeff King
2018-08-23  1:23             ` Jonathan Nieder
2018-08-23  2:16               ` Jeff King
2018-08-23  2:27                 ` Jonathan Nieder
2018-08-23  5:02                   ` Jeff King
2018-08-23  5:09                     ` brian m. carlson
2018-08-23  5:10                     ` Jonathan Nieder
2018-08-23 13:20                     ` Junio C Hamano
2018-08-23 16:31                       ` wide t/perf output, was " Jeff King
2018-08-23  3:47                 ` brian m. carlson
2018-08-23  5:04                   ` Jeff King
2018-08-23 10:26                     ` Derrick Stolee
2018-08-23 13:16                       ` Junio C Hamano
2018-08-23 16:14                       ` Jeff King
2018-08-23 23:30                         ` Jacob Keller
2018-08-23 23:40                           ` Jeff King
2018-08-24  0:06                             ` Jeff King
2018-08-24  0:16                               ` Jeff King
2018-08-24  2:48                                 ` Jacob Keller
2018-08-24  2:59                                   ` Jeff King
2018-08-24  6:45                                     ` Jeff King
2018-08-24 11:04                                       ` Derrick Stolee
2018-08-27 19:36                                     ` Junio C Hamano
2018-08-23 18:53                       ` Jeff King
2018-08-23 20:59                         ` Derrick Stolee
2018-08-24  6:56                           ` Jeff King
2018-08-24  7:57                             ` Ævar Arnfjörð Bjarmason
2018-08-24 16:45                           ` Derrick Stolee
2018-08-25  8:26                             ` Jeff King
2018-09-02 18:53                       ` Kaartic Sivaraam

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox