git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
* What's cooking in git.git (Jan 2019, #01; Mon, 7)
@ 2019-01-07 23:34 Junio C Hamano
  2019-01-08  9:50 ` tg/checkout-no-overlay, was " Thomas Gummerer
                   ` (5 more replies)
  0 siblings, 6 replies; 41+ messages in thread
From: Junio C Hamano @ 2019-01-07 23:34 UTC (permalink / raw)
  To: git

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

The 'master' branch now has the first batch of topics (many of which
have been cooking in 'next' during the pre-release freeze).

You can find the changes described here in the integration branches
of the repositories listed at

    http://git-blame.blogspot.com/p/git-public-repositories.html

--------------------------------------------------
[Graduated to "master"]

* ab/push-dwim-dst (2018-11-14) 7 commits
  (merged to 'next' on 2018-12-28 at d9f618de10)
 + push doc: document the DWYM behavior pushing to unqualified <dst>
 + push: test that <src> doesn't DWYM if <dst> is unqualified
 + push: add an advice on unqualified <dst> push
 + push: move unqualified refname error into a function
 + push: improve the error shown on unqualified <dst> push
 + i18n: remote.c: mark error(...) messages for translation
 + remote.c: add braces in anticipation of a follow-up change

 Originally merged to 'next' on 2018-11-18

 "git push $there $src:$dst" rejects when $dst is not a fully
 qualified refname and not clear what the end user meant.  The
 codepath has been taught to give a clearer error message, and also
 guess where the push should go by taking the type of the pushed
 object into account (e.g. a tag object would want to go under
 refs/tags/).


* en/fast-export-import (2018-11-17) 11 commits
  (merged to 'next' on 2018-12-28 at a1b09cf515)
 + fast-export: add a --show-original-ids option to show original names
 + fast-import: remove unmaintained duplicate documentation
 + fast-export: add --reference-excluded-parents option
 + fast-export: ensure we export requested refs
 + fast-export: when using paths, avoid corrupt stream with non-existent mark
 + fast-export: move commit rewriting logic into a function for reuse
 + fast-export: avoid dying when filtering by paths and old tags exist
 + fast-export: use value from correct enum
 + git-fast-export.txt: clarify misleading documentation about rev-list args
 + git-fast-import.txt: fix documentation for --quiet option
 + fast-export: convert sha1 to oid

 Originally merged to 'next' on 2018-11-18

 Small fixes and features for fast-export and fast-import, mostly on
 the fast-export side.


* en/merge-path-collision (2018-12-01) 11 commits
  (merged to 'next' on 2018-12-28 at b50d3eee25)
 + t6036: avoid non-portable "cp -a"
 + merge-recursive: combine error handling
 + t6036, t6043: increase code coverage for file collision handling
 + merge-recursive: improve rename/rename(1to2)/add[/add] handling
 + merge-recursive: use handle_file_collision for add/add conflicts
 + merge-recursive: improve handling for rename/rename(2to1) conflicts
 + merge-recursive: fix rename/add conflict handling
 + merge-recursive: new function for better colliding conflict resolutions
 + merge-recursive: increase marker length with depth of recursion
 + t6036, t6042: testcases for rename collision of already conflicting files
 + t6042: add tests for consistency in file collision conflict handling

 Originally merged to 'next' on 2018-12-01

 Updates for corner cases in merge-recursive.


* fc/http-version (2018-11-09) 1 commit
  (merged to 'next' on 2018-12-28 at 56bcbb0fa9)
 + http: add support selecting http version

 Originally merged to 'next' on 2018-11-18

 The "http.version" configuration variable can be used with recent
 enough cURL library to force the version of HTTP used to talk when
 fetching and pushing.


* jk/loose-object-cache (2018-11-24) 10 commits
  (merged to 'next' on 2018-12-28 at 5a5faf384e)
 + odb_load_loose_cache: fix strbuf leak
 + fetch-pack: drop custom loose object cache
 + sha1-file: use loose object cache for quick existence check
 + object-store: provide helpers for loose_objects_cache
 + sha1-file: use an object_directory for the main object dir
 + handle alternates paths the same as the main object dir
 + sha1_file_name(): overwrite buffer instead of appending
 + rename "alternate_object_database" to "object_directory"
 + submodule--helper: prefer strip_suffix() to ends_with()
 + fsck: do not reuse child_process structs

 Originally merged to 'next' on 2018-11-24

 Code clean-up with optimization for the codepath that checks
 (non-)existence of loose objects.


* mk/http-backend-kill-children-before-exit (2018-11-26) 1 commit
  (merged to 'next' on 2018-12-28 at 81188d93c3)
 + http-backend: enable cleaning up forked upload/receive-pack on exit

 Originally merged to 'next' on 2018-11-29

 The http-backend CGI process did not correctly clean up the child
 processes it spawns to run upload-pack etc. when it dies itself,
 which has been corrected.


* nd/checkout-dwim-fix (2018-11-14) 1 commit
  (merged to 'next' on 2018-12-28 at 3183c9305b)
 + checkout: disambiguate dwim tracking branches and local files

 Originally merged to 'next' on 2018-11-18

 "git checkout frotz" (without any double-dash) avoids ambiguity by
 making sure 'frotz' cannot be interpreted as a revision and as a
 path at the same time.  This safety has been updated to check also
 a unique remote-tracking branch 'frotz' in a remote, when dwimming
 to create a local branch 'frotz' out of a remote-tracking branch
 'frotz' from a remote.


* nd/i18n (2018-11-12) 16 commits
  (merged to 'next' on 2018-12-28 at 8e2de8338e)
 + fsck: mark strings for translation
 + fsck: reduce word legos to help i18n
 + parse-options.c: mark more strings for translation
 + parse-options.c: turn some die() to BUG()
 + parse-options: replace opterror() with optname()
 + repack: mark more strings for translation
 + remote.c: mark messages for translation
 + remote.c: turn some error() or die() to BUG()
 + reflog: mark strings for translation
 + read-cache.c: add missing colon separators
 + read-cache.c: mark more strings for translation
 + read-cache.c: turn die("internal error") to BUG()
 + attr.c: mark more string for translation
 + archive.c: mark more strings for translation
 + alias.c: mark split_cmdline_strerror() strings for translation
 + git.c: mark more strings for translation

 Originally merged to 'next' on 2018-11-18

 More _("i18n") markings.


* nd/the-index (2018-11-12) 22 commits
  (merged to 'next' on 2018-12-28 at 6bbd3befbe)
 + rebase-interactive.c: remove the_repository references
 + rerere.c: remove the_repository references
 + pack-*.c: remove the_repository references
 + pack-check.c: remove the_repository references
 + notes-cache.c: remove the_repository references
 + line-log.c: remove the_repository reference
 + diff-lib.c: remove the_repository references
 + delta-islands.c: remove the_repository references
 + cache-tree.c: remove the_repository references
 + bundle.c: remove the_repository references
 + branch.c: remove the_repository reference
 + bisect.c: remove the_repository reference
 + blame.c: remove implicit dependency the_repository
 + sequencer.c: remove implicit dependency on the_repository
 + sequencer.c: remove implicit dependency on the_index
 + transport.c: remove implicit dependency on the_index
 + notes-merge.c: remove implicit dependency the_repository
 + notes-merge.c: remove implicit dependency on the_index
 + list-objects.c: reduce the_repository references
 + list-objects-filter.c: remove implicit dependency on the_index
 + wt-status.c: remove implicit dependency the_repository
 + wt-status.c: remove implicit dependency on the_index
 (this branch is used by md/list-objects-filter-by-depth.)

 Originally merged to 'next' on 2018-11-18

 More codepaths become aware of working with in-core repository
 instance other than the default "the_repository".


* sd/stash-wo-user-name (2018-11-19) 1 commit
  (merged to 'next' on 2018-12-28 at 99197ef5a1)
 + stash: tolerate missing user identity
 (this branch is used by ps/stash-in-c.)

 Originally merged to 'next' on 2018-11-19

 A properly configured username/email is required under
 user.useConfigOnly in order to create commits; now "git stash"
 (even though it creates commit objects to represent stash entries)
 command is exempt from the requirement.


* sg/clone-initial-fetch-configuration (2018-11-16) 3 commits
  (merged to 'next' on 2018-12-28 at 82e104f221)
 + Documentation/clone: document ignored configuration variables
 + clone: respect additional configured fetch refspecs during initial fetch
 + clone: use a more appropriate variable name for the default refspec

 Originally merged to 'next' on 2018-11-18

 Refspecs configured with "git -c var=val clone" did not propagate
 to the resulting repository, which has been corrected.

--------------------------------------------------
[New Topics]

* ab/proto-v2-fixes (2018-12-14) 5 commits
 - tests: mark tests broken under GIT_TEST_PROTOCOL_VERSION=2
 - builtin/fetch-pack: support protocol version 2
 - tests: mark & fix tests broken under GIT_TEST_PROTOCOL_VERSION=1
 - tests: add a special setup where for protocol.version
 - tests: add a check for unportable env --unset

 This is (just a part of) v2, and I see v3 posted on the list, but
 it seems that v3 needs further updates.
 cf. <20181218124852.GD30471@sigill.intra.peff.net>

 I'll drop the whole thing and wait for an update.


* jn/stripspace-wo-repository (2018-12-26) 1 commit
 - stripspace: allow -s/-c outside git repository

 "git stripspace" should be usable outside a git repository, but
 under the "-s" or "-c" mode, it didn't.

 Will merge to 'next'.


* ma/asciidoctor (2018-12-26) 3 commits
 - git-status.txt: render tables correctly under Asciidoctor
 - Documentation: do not nest open blocks
 - git-column.txt: fix section header

 Some of the documentation pages formatted incorrectly with
 Asciidoctor, which have been fixed.

 Will merge to 'next'.


* nd/worktree-remove-with-uninitialized-submodules (2019-01-07) 1 commit
 - worktree: allow to (re)move worktrees with uninitialized submodules

 "git worktree remove" and "git worktree move" refused to work when
 there is a submodule involved.  This has been loosened to ignore
 uninitialized submodules.

 Will merge to 'next'.


* sb/submodule-unset-core-worktree-when-worktree-is-lost (2018-12-26) 4 commits
 - submodule deinit: unset core.worktree
 - submodule--helper: fix BUG message in ensure_core_worktree
 - submodule: unset core.worktree if no working tree is present
 - submodule update: add regression test with old style setups

 The core.worktree setting in a submodule repository should not be
 pointing at a directory when the submodule loses its working tree
 (e.g. getting deinit'ed), but the code did not properly maintain
 this invariant.

 Will merge to 'next'.


* so/cherry-pick-always-allow-m1 (2019-01-07) 4 commits
 - t3506: validate '-m 1 -ff' is now accepted for non-merge commits
 - t3502: validate '-m 1' argument is now accepted for non-merge commits
 - cherry-pick: do not error on non-merge commits when '-m 1' is specified
 - t3510: stop using '-m 1' to force failure mid-sequence of cherry-picks

 "git cherry-pick -m1" was forbidden when picking a non-merge
 commit, even though there _is_ parent number 1 for such a commit.
 This was done to avoid mistakes back when "cherry-pick" was about
 picking a single commit, but is no longer useful with "cherry-pick"
 that can pick a range of commits.  Now the "-m$num" option is
 allowed when picking any commit, as long as $num names an existing
 parent of the commit.

 Technically this is a backward incompatible change; hopefully
 nobody is relying on the error-checking behaviour.

 Will merge to 'next'.


* cy/completion-typofix (2019-01-03) 1 commit
 - completion: fix typo in git-completion.bash

 Typofix.

 Will merge to 'next'.


* cy/zsh-completion-SP-in-path (2019-01-03) 2 commits
 - completion: treat results of git ls-tree as file paths
 - zsh: complete unquoted paths with spaces correctly

 With zsh, "git cmd path<TAB>" was completed to "git cmd path name"
 when the completed path has a special character like SP in it,
 without any attempt to keep "path name" a single filename.  This
 has been fixed to complete it to "git cmd path\ name" just like
 Bash completion does.

 Will merge to 'next'.


* ds/commit-graph-assert-missing-parents (2019-01-02) 1 commit
 - commit-graph: writing missing parents is a BUG

 Tightening error checking in commit-graph writer.

 Will merge to 'next'.


* ed/simplify-setup-git-dir (2019-01-03) 1 commit
 - Simplify handling of setup_git_directory_gently() failure cases.

 Code simplification.

 Will merge to 'next'.


* es/doc-worktree-guessremote-config (2018-12-28) 1 commit
 - doc/config: do a better job of introducing 'worktree.guessRemote'

 Doc clarification.

 Will merge to 'next'.


* ew/ban-strncat (2019-01-02) 1 commit
 - banned.h: mark strncat() as banned

 The "strncat()" function is now among the banned functions.

 Will merge to 'next'.


* jk/dev-build-format-security (2019-01-07) 1 commit
 - config.mak.dev: add -Wall, primarily for -Wformat, to help autoconf users

 Earlier we added "-Wformat-security" to developer builds, assuming
 that "-Wall" (which includes "-Wformat" which in turn is required
 to use "-Wformat-security") is always in effect.  This is not true
 when config.mak.autogen is in use, unfortunately.  This has been
 fixed by unconditionally adding "-Wall" to developer builds.

 Will merge to 'next'.


* jp/author-committer-config (2019-01-02) 2 commits
 - DONTMERGE
 - Add author and committer configuration settings

 Four new configuration variables {author,committer}.{name,email}
 have been introduced to override user.{name,email} in more specific
 cases.

 Expecting a reroll.
 cf. <xmqq1s5uk6qh.fsf@gitster-ct.c.googlers.com>


* js/rebase-am (2019-01-03) 4 commits
 - built-in rebase: call `git am` directly
 - rebase: teach `reset_head()` to optionally skip the worktree
 - rebase: avoid double reflog entry when switching branches
 - rebase: move `reset_head()` into a better spot

 Instead of going through "git-rebase--am" scriptlet to use the "am"
 backend, the built-in version of "git rebase" learned to drive the
 "am" backend directly.

 Waiting for a review response.
 Looked almost ready.


* ms/packet-err-check (2019-01-02) 2 commits
 - pack-protocol.txt: accept error packets in any context
 - Use packet_reader instead of packet_read_line

 Error checking of data sent over the pack-protocol has been
 revamped so that error packets are always diagnosed properly.


* os/rebase-runs-post-checkout-hook (2019-01-02) 2 commits
 - rebase: run post-checkout hook on checkout
 - t5403: simplify by using a single repository

 "git rebase" internally runs "checkout" to switch between branches,
 and the command used to call the post-checkout hook, but the
 reimplementation stopped doing so, which is getting fixed.


* rb/hpe (2019-01-03) 5 commits
 - compat/regex/regcomp.c: define intptr_t and uintptr_t on NonStop
 - git-compat-util.h: add FLOSS headers for HPE NonStop
 - config.mak.uname: support for modern HPE NonStop config.
 - transport-helper: drop read/write errno checks
 - transport-helper: use xread instead of read

 Portability updates for the HPE NonStop platform.

 Will merge to 'next'.


* sg/test-bash-version-fix (2019-01-03) 2 commits
 - Merge branch 'sg/test-bash-version-fix'
 - test-lib: check Bash version for '-x' without using shell arrays
 (this branch is used by sg/stress-test.)

 The test suite tried to see if it is run under bash, but the check
 itself failed under some other implementations of shell (notably
 under NetBSD).  This has been corrected.

 Will merge to 'next'.


* sm/http-no-more-failonerror (2019-01-03) 2 commits
 - Unset CURLOPT_FAILONERROR
 - Change how HTTP response body is returned

 Waiting for clarifications.


* tg/t5570-drop-racy-test (2019-01-07) 2 commits
 - Revert "t/lib-git-daemon: record daemon log"
 - t5570: drop racy test

 An inherently racy test that caused intermittent failures has been
 removed.

 Will merge to 'next'.


* tt/bisect-in-c (2019-01-02) 7 commits
 - bisect--helper: `bisect_start` shell function partially in C
 - bisect--helper: `get_terms` & `bisect_terms` shell function in C
 - bisect--helper: `bisect_next_check` shell function in C
 - bisect--helper: `check_and_set_terms` shell function in C
 - wrapper: move is_empty_file() and rename it as is_empty_or_missing_file()
 - bisect--helper: `bisect_write` shell function in C
 - bisect--helper: `bisect_reset` shell function in C

 More code in "git bisect" has been rewritten in C.


* ja/doc-build-l10n (2019-01-07) 1 commit
 - Documentation/Makefile add optional targets for l10n

 Prepare Documentation/Makefile so that manpage localization can
 reuse it by overriding and tweaking the list of build products.

 Will merge to 'next'.


* jk/loose-object-cache-oid (2019-01-07) 5 commits
 - sha1-file: modernize loose header/stream functions
 - sha1-file: modernize loose object file functions
 - http: use struct object_id instead of bare sha1
 - update comment references to sha1_object_info()
 - sha1-file: fix outdated sha1 comment references
 (this branch uses rs/loose-object-cache-perffix.)

 Code clean-up.

 Will merge to 'next'.


* mm/multimail-1.5 (2019-01-07) 1 commit
 - git-multimail: update to release 1.5.0

 Update "git multimail" from the upstream.

 Will merge to 'next'.


* po/git-p4-wo-login (2019-01-07) 1 commit
 - git-p4: fix problem when p4 login is not necessary

 "git p4" update.

 Will merge to 'next'.


* rs/loose-object-cache-perffix (2019-01-07) 4 commits
 - object-store: retire odb_load_loose_cache()
 - object-store: use one oid_array per subdirectory for loose cache
 - object-store: factor out odb_clear_loose_cache()
 - object-store: factor out odb_loose_cache()
 (this branch is used by jk/loose-object-cache-oid.)

 The loose object cache used to optimize existence look-up has been
 updated.

 Will merge to 'next'.


* rs/sha1-file-close-mapped-file-on-error (2019-01-07) 1 commit
 - sha1-file: close fd of empty file in map_sha1_file_1()

 Code clean-up.

 Will merge to 'next'.

--------------------------------------------------
[Cooking]

* lt/date-human (2019-01-02) 3 commits
 - t0006-date.sh: add `human` date format tests.
 - Add 'human' date format documentation
 - Add 'human' date format

 A new date format "--date=human" that morphs its output depending
 on how far the time is from the current time has been introduced.
 "--date=auto" can be used to use this new format when the output is
 goint to the pager or to the terminal and otherwise the default
 format.

 The design around "auto" may need to be rethought.
 The tests need to be updated, too.
 cf. <20190104075034.GA26014@sigill.intra.peff.net>
 cf. <a5412274-028f-3662-e4f5-dbbcad4d9a40@iee.org>


* ds/midx-expire-repack (2019-01-02) 7 commits
 - midx: implement midx_repack()
 - multi-pack-index: prepare 'repack' subcommand
 - multi-pack-index: implement 'expire' verb
 - midx: refactor permutation logic
 - multi-pack-index: prepare for 'expire' subcommand
 - Docs: rearrange subcommands for multi-pack-index
 - repack: refactor pack deletion for future use


* ds/push-sparse-tree-walk (2018-12-11) 6 commits
 - pack-objects: create GIT_TEST_PACK_SPARSE
 - pack-objects: create pack.useSparse setting
 - revision: implement sparse algorithm
 - pack-objects: add --sparse option
 - list-objects: consume sparse tree walk
 - revision: add mark_tree_uninteresting_sparse


* js/rebase-i-redo-exec (2018-12-11) 3 commits
 - rebase: introduce a shortcut for --reschedule-failed-exec
 - rebase: add a config option to default to --reschedule-failed-exec
 - rebase: introduce --reschedule-failed-exec


* md/list-objects-filter-by-depth (2018-12-28) 4 commits
 - tree:<depth>: skip some trees even when collecting omits
 - list-objects-filter: teach tree:# how to handle >0
 - Merge branch 'nd/the-index' into md/list-objects-filter-by-depth
 - Merge branch 'sb/more-repo-in-api' into md/list-objects-filter-by-depth
 (this branch uses sb/more-repo-in-api; is tangled with jt/get-reference-with-commit-graph.)


* nd/backup-log (2018-12-10) 24 commits
 - FIXME
 - rebase: keep backup of overwritten files on --skip or --abort
 - am: keep backup of overwritten files on --skip or --abort
 - checkout -f: keep backup of overwritten files
 - reset --hard: keep backup of overwritten files
 - unpack-trees.c: keep backup of ignored files being overwritten
 - refs: keep backup of deleted reflog
 - config --edit: support backup log
 - sha1-file.c: let index_path() accept NULL istate
 - backup-log: keep all blob references around
 - gc: prune backup logs
 - backup-log: add prune command
 - backup-log: add log command
 - backup-log: add diff command
 - backup-log: add cat command
 - backup-log.c: add API for walking backup log
 - add--interactive: support backup log
 - apply: support backup log with --keep-backup
 - commit: support backup log
 - update-index: support backup log with --keep-backup
 - add: support backup log
 - read-cache.c: new flag for add_index_entry() to write to backup log
 - backup-log: add "update" subcommand
 - doc: introduce new "backup log" concept


* nd/style-opening-brace (2018-12-10) 1 commit
 - style: the opening '{' of a function is in a separate line

 Code clean-up.

 Will merge to 'next'.


* sg/stress-test (2019-01-07) 8 commits
 - test-lib: add the '--stress' option to run a test repeatedly under load
 - test-lib-functions: introduce the 'test_set_port' helper function
 - test-lib: set $TRASH_DIRECTORY earlier
 - test-lib: consolidate naming of test-results paths
 - test-lib: parse command line options earlier
 - test-lib: parse options in a for loop to keep $@ intact
 - test-lib: extract Bash version check for '-x' tracing
 - test-lib: translate SIGTERM and SIGHUP to an exit
 (this branch uses sg/test-bash-version-fix.)

 Flaky tests can now be repeatedly run under load with the
 "--stress" option.

 Will merge to 'next'.


* tg/checkout-no-overlay (2019-01-02) 8 commits
 - checkout: introduce checkout.overlayMode config
 - checkout: introduce --{,no-}overlay option
 - checkout: factor out mark_cache_entry_for_checkout function
 - checkout: clarify comment
 - read-cache: add invalidate parameter to remove_marked_cache_entries
 - entry: support CE_WT_REMOVE flag in checkout_entry
 - entry: factor out unlink_entry function
 - move worktree tests to t24*

 "git checkout --no-overlay" can be used to trigger a new mode of
 checking out paths out of the tree-ish, that allows paths that
 match the pathspec that are in the current index and working tree
 and are not in the tree-ish.

 Will merge to 'next'.


* jk/proto-v2-hidden-refs-fix (2018-12-14) 3 commits
 - upload-pack: support hidden refs with protocol v2
 - parse_hide_refs_config: handle NULL section
 - serve: pass "config context" through to individual commands

 The v2 upload-pack protocol implementation failed to honor
 hidden-ref configuration, which has been corrected.

 Will merge to 'next'.


* la/quiltimport-keep-non-patch (2018-12-14) 1 commit
 - git-quiltimport: Add --keep-non-patch option

 "git quiltimport" learned "--keep-non-patch" option.

 Will merge to 'next'.


* sb/submodule-fetchjobs-default-to-one (2018-12-14) 1 commit
 - submodule update: run at most one fetch job unless otherwise set

 "git submodule update" ought to use a single job unless asked, but
 by mistake used multiple jobs, which has been fixed.

 Will merge to 'next'.


* cb/openbsd-allows-reading-directory (2018-12-03) 1 commit
  (merged to 'next' on 2019-01-04 at 9d865107fd)
 + config.mak.uname: OpenBSD uses BSD semantics with fread for directories

 BSD port update.

 Will merge to 'master'.


* cb/t5004-empty-tar-archive-fix (2018-12-03) 1 commit
  (merged to 'next' on 2019-01-04 at 39f4cf94ce)
 + t5004: avoid using tar for empty packages

 BSD port update.

 Will merge to 'master'.


* cb/test-lint-cp-a (2018-12-03) 1 commit
  (merged to 'next' on 2019-01-04 at d13e6cfcb0)
 + tests: add lint for non portable cp -a

 BSD port update.

 Will merge to 'master'.


* hb/t0061-dot-in-path-fix (2018-12-03) 1 commit
  (merged to 'next' on 2019-01-04 at 789f990c4e)
 + t0061: do not fail test if '.' is part of $PATH

 Test update.

 Will merge to 'master'.


* hn/highlight-sideband-keywords (2018-12-04) 1 commit
  (merged to 'next' on 2019-01-04 at b039601533)
 + sideband: color lines with keyword only

 Lines that begin with a certain keyword that come over the wire, as
 well as lines that consist only of one of these keywords, ought to
 be painted in color for easier eyeballing, but the latter was
 broken ever since the feature was introduced in 2.19, which has
 been corrected.

 Will merge to 'master'.


* js/commit-graph-chunk-table-fix (2018-12-14) 3 commits
 - Makefile: correct example fuzz build
 - commit-graph: fix buffer read-overflow
 - commit-graph, fuzz: add fuzzer for commit-graph

 The codepath to read from the commit-graph file attempted to read
 past the end of it when the file's table-of-contents was corrupt.


* jt/get-reference-with-commit-graph (2018-12-28) 1 commit
 - revision: use commit graph in get_reference()
 (this branch uses sb/more-repo-in-api; is tangled with md/list-objects-filter-by-depth.)

 Micro-optimize the code that prepares commit objects to be walked
 by "git rev-list" when the commit-graph is available.

 Needs to be rebuilt when sb/more-repo-in-api is rewound.


* md/exclude-promisor-objects-fix-cleanup (2018-12-06) 1 commit
  (merged to 'next' on 2019-01-04 at c15362832d)
 + revision.c: put promisor option in specialized struct

 Code clean-up.

 Will merge to 'master'.


* bw/mailmap (2018-12-09) 1 commit
  (merged to 'next' on 2019-01-04 at 02b6e83231)
 + mailmap: update brandon williams's email address

 Will merge to 'master'.


* do/gitweb-strict-export-conf-doc (2018-12-09) 1 commit
  (merged to 'next' on 2019-01-04 at 5249c9386a)
 + docs: fix $strict_export text in gitweb.conf.txt

 Doc update.

 Will merge to 'master'.


* en/directory-renames-nothanks-doc-update (2018-12-09) 1 commit
  (merged to 'next' on 2019-01-04 at cb7134b54c)
 + git-rebase.txt: update note about directory rename detection and am

 Doc update.

 Will merge to 'master'.


* fd/gitweb-snapshot-conf-doc-fix (2018-12-09) 1 commit
  (merged to 'next' on 2019-01-04 at 7ba71fca17)
 + docs/gitweb.conf: config variable typo

 Doc update.

 Will merge to 'master'.


* km/rebase-doc-typofix (2018-12-10) 1 commit
  (merged to 'next' on 2019-01-04 at c89f646e8f)
 + rebase docs: drop stray word in merge command description

 Doc update.

 Will merge to 'master'.


* nd/indentation-fix (2018-12-09) 1 commit
  (merged to 'next' on 2019-01-04 at 738b17d365)
 + Indent code with TABs

 Code cleanup.

 Will merge to 'master'.


* tb/use-common-win32-pathfuncs-on-cygwin (2018-12-26) 1 commit
  (merged to 'next' on 2019-01-04 at c3b2b1f3c3)
 + git clone <url> C:\cygwin\home\USER\repo' is working (again)

 Cygwin update.

 Will merge to 'master'.


* tb/log-G-binary (2018-12-26) 1 commit
  (merged to 'next' on 2019-01-04 at a713ef389c)
 + log -G: ignore binary files

 "git log -G<regex>" looked for a hunk in the "git log -p" patch
 output that contained a string that matches the given pattern.
 Optimize this code to ignore binary files, which by default will
 not show any hunk that would match any pattern (unless textconv or
 the --text option is in effect, that is).

 Will merge to 'master'.


* dl/merge-cleanup-scissors-fix (2018-11-21) 2 commits
 - merge: add scissors line on merge conflict
 - t7600: clean up 'merge --squash c3 with c7' test

 The list of conflicted paths shown in the editor while concluding a
 conflicted merge was shown above the scissors line when the
 clean-up mode is set to "scissors", even though it was commented
 out just like the list of updated paths and other information to
 help the user explain the merge better.

 Kicked out of 'next', to replace with a newer iteration.
 cf. <cover.1545745331.git.liu.denton@gmail.com>


* aw/pretty-trailers (2018-12-09) 7 commits
 - pretty: add support for separator option in %(trailers)
 - strbuf: separate callback for strbuf_expand:ing literals
 - pretty: add support for "valueonly" option in %(trailers)
 - pretty: allow showing specific trailers
 - pretty: single return path in %(trailers) handling
 - pretty: allow %(trailers) options with explicit value
 - doc: group pretty-format.txt placeholders descriptions

 The %(trailers) formatter in "git log --format=..."  now allows to
 optionally pick trailers selectively by keyword, show only values,
 etc.

 How's the doneness of this one?


* nd/attr-pathspec-in-tree-walk (2018-11-19) 5 commits
  (merged to 'next' on 2019-01-04 at 6a07e5b905)
 + tree-walk: support :(attr) matching
 + dir.c: move, rename and export match_attrs()
 + pathspec.h: clean up "extern" in function declarations
 + tree-walk.c: make tree_entry_interesting() take an index
 + tree.c: make read_tree*() take 'struct repository *'

 The traversal over tree objects has learned to honor
 ":(attr:label)" pathspec match, which has been implemented only for
 enumerating paths on the filesystem.

 Will merge to 'master'.


* ab/commit-graph-progress-fix (2018-11-20) 1 commit
  (merged to 'next' on 2019-01-04 at 405a1a2cf5)
 + commit-graph: split up close_reachable() progress output

 Will merge to 'master'.


* jn/unknown-index-extensions (2018-11-21) 2 commits
 - index: offer advice for unknown index extensions
 - index: do not warn about unrecognized extensions

 A bit too alarming warning given when unknown index extensions
 exist is getting revamped.

 Expecting a reroll.


* nd/checkout-noisy (2018-11-20) 2 commits
  (merged to 'next' on 2019-01-04 at 480172d3d7)
 + t0027: squelch checkout path run outside test_expect_* block
 + checkout: print something when checking out paths

 "git checkout [<tree-ish>] path..." learned to report the number of
 paths that have been checked out of the index or the tree-ish,
 which gives it the same degree of noisy-ness as the case in which
 the command checks out a branch.

 Will merge to 'master'.


* en/rebase-merge-on-sequencer (2019-01-07) 8 commits
 - rebase: implement --merge via the interactive machinery
 - rebase: define linearization ordering and enforce it
 - git-legacy-rebase: simplify unnecessary triply-nested if
 - git-rebase, sequencer: extend --quiet option for the interactive machinery
 - am, rebase--merge: do not overlook --skip'ed commits with post-rewrite
 - t5407: add a test demonstrating how interactive handles --skip differently
 - rebase: fix incompatible options error message
 - rebase: make builtin and legacy script error messages the same

 "git rebase --merge" as been reimplemented by reusing the internal
 machinery used for "git rebase -i".


* dl/remote-save-to-push (2018-12-11) 1 commit
 - remote: add --save-to-push option to git remote set-url

 "git remote set-url" learned a new option that moves existing value
 of the URL field to pushURL field of the remote before replacing
 the URL field with a new value.

 I am personally not yet quite convinced if this is worth pursuing.


* js/protocol-advertise-multi (2018-12-28) 1 commit
 - protocol: advertise multiple supported versions

 The transport layer has been updated so that the protocol version
 used can be negotiated between the parties, by the initiator
 listing the protocol versions it is willing to talk, and the other
 side choosing from one of them.


* js/smart-http-detect-remote-error (2019-01-07) 3 commits
 - remote-curl: die on server-side errors
 - remote-curl: tighten "version 2" check for smart-http
 - remote-curl: refactor smart-http discovery

 Some errors from the other side coming over smart HTTP transport
 were not noticed, which has been corrected.


* nb/branch-show-other-worktrees-head (2019-01-07) 4 commits
 - branch: add an extra verbose output displaying worktree path for checked out branch
 - branch: mark and color a branch that is checked out in a linked worktree differently
 - SQUASH???
 - ref-filter: add worktreepath atom

 "git branch --list" learned to show branches that are checked out
 in other worktrees connected to the same repository prefixed with
 '+', similar to the way the currently checked out branch is shown
 with '*' in front.
 

* ot/ref-filter-object-info (2018-12-28) 6 commits
 - ref-filter: add docs for new options
 - ref-filter: add tests for deltabase
 - ref-filter: add deltabase option
 - ref-filter: add tests for objectsize:disk
 - ref-filter: add check for negative file size
 - ref-filter: add objectsize:disk option

 The "--format=<placeholder>" option of for-each-ref, branch and tag
 learned to show a few more traits of objects that can be learned by
 the object_info API.

 Will merge to 'next'.


* sb/diff-color-moved-config-option-fixup (2018-11-14) 1 commit
  (merged to 'next' on 2019-01-04 at 46de5f42d1)
 + diff: align move detection error handling with other options

 Minor inconsistency fix.

 Will merge to 'master'.


* md/list-lazy-objects-fix (2018-12-06) 1 commit
  (merged to 'next' on 2019-01-04 at 93bd38fff9)
 + list-objects.c: don't segfault for missing cmdline objects

 "git rev-list --exclude-promisor-objects" had to take an object
 that does not exist locally (and is lazily available) from the
 command line without barfing, but the code dereferenced NULL.

 Will merge to 'master'.


* sb/more-repo-in-api (2018-12-28) 23 commits
 - t/helper/test-repository: celebrate independence from the_repository
 - path.h: make REPO_GIT_PATH_FUNC repository agnostic
 - commit: prepare free_commit_buffer and release_commit_memory for any repo
 - commit-graph: convert remaining functions to handle any repo
 - submodule: don't add submodule as odb for push
 - submodule: use submodule repos for object lookup
 - pretty: prepare format_commit_message to handle arbitrary repositories
 - commit: prepare logmsg_reencode to handle arbitrary repositories
 - commit: prepare repo_unuse_commit_buffer to handle any repo
 - commit: prepare get_commit_buffer to handle any repo
 - commit-reach: prepare in_merge_bases[_many] to handle any repo
 - commit-reach: prepare get_merge_bases to handle any repo
 - commit-reach.c: allow get_merge_bases_many_0 to handle any repo
 - commit-reach.c: allow remove_redundant to handle any repo
 - commit-reach.c: allow merge_bases_many to handle any repo
 - commit-reach.c: allow paint_down_to_common to handle any repo
 - commit: allow parse_commit* to handle any repo
 - object: parse_object to honor its repository argument
 - object-store: prepare has_{sha1, object}_file to handle any repo
 - object-store: prepare read_object_file to deal with any repo
 - object-store: allow read_object_file_extended to read from any repo
 - packfile: allow has_packed_and_bad to handle arbitrary repositories
 - sha1_file: allow read_object to read objects in arbitrary repositories
 (this branch is used by jt/get-reference-with-commit-graph and md/list-objects-filter-by-depth.)

 The in-core repository instances are passed through more codepaths.


* bc/sha-256 (2018-11-14) 12 commits
 - hash: add an SHA-256 implementation using OpenSSL
 - sha256: add an SHA-256 implementation using libgcrypt
 - Add a base implementation of SHA-256 support
 - commit-graph: convert to using the_hash_algo
 - t/helper: add a test helper to compute hash speed
 - sha1-file: add a constant for hash block size
 - t: make the sha1 test-tool helper generic
 - t: add basic tests for our SHA-1 implementation
 - cache: make hashcmp and hasheq work with larger hashes
 - hex: introduce functions to print arbitrary hashes
 - sha1-file: provide functions to look up hash algorithms
 - sha1-file: rename algorithm to "sha1"

 Add sha-256 hash and plug it through the code to allow building Git
 with the "NewHash".


* js/vsts-ci (2018-10-16) 13 commits
 . travis: fix skipping tagged releases
 . README: add a build badge (status of the Azure Pipelines build)
 . tests: record more stderr with --write-junit-xml in case of failure
 . tests: include detailed trace logs with --write-junit-xml upon failure
 . git-p4: use `test_atexit` to kill the daemon
 . git-daemon: use `test_atexit` in the tests
 . tests: introduce `test_atexit`
 . ci: add a build definition for Azure DevOps
 . ci/lib.sh: add support for Azure Pipelines
 . tests: optionally write results as JUnit-style .xml
 . test-date: add a subcommand to measure times in shell scripts
 . ci/lib.sh: encapsulate Travis-specific things
 . ci: rename the library of common functions

 Prepare to run test suite on Azure DevOps.

 Ejected out of 'pu', as doing so seems to help other topics get
 tested at TravisCI.

 https://travis-ci.org/git/git/builds/452713184 is a sample of a
 build whose tests on 4 hang (with this series in).  Ejecting it
 gave us https://travis-ci.org/git/git/builds/452778963 which still
 shows breakages from other topics not yet in 'next', but at least
 the tests do not stall.


* du/branch-show-current (2018-10-26) 1 commit
 - branch: introduce --show-current display option

 "git branch" learned a new subcommand "--show-current".

 I am personally not yet quite convinced if this is worth pursuing.


* mk/use-size-t-in-zlib (2018-10-15) 1 commit
 - zlib.c: use size_t for size

 The wrapper to call into zlib followed our long tradition to use
 "unsigned long" for sizes of regions in memory, which have been
 updated to use "size_t".


* ag/sequencer-reduce-rewriting-todo (2018-11-12) 16 commits
 . rebase--interactive: move transform_todo_file() to rebase--interactive.c
 . sequencer: fix a call to error() in transform_todo_file()
 . sequencer: use edit_todo_list() in complete_action()
 . rebase-interactive: rewrite edit_todo_list() to handle the initial edit
 . rebase-interactive: append_todo_help() changes
 . rebase-interactive: use todo_list_write_to_file() in edit_todo_list()
 . sequencer: refactor skip_unnecessary_picks() to work on a todo_list
 . sequencer: change complete_action() to use the refactored functions
 . sequencer: make sequencer_make_script() write its script to a strbuf
 . sequencer: refactor rearrange_squash() to work on a todo_list
 . sequencer: refactor sequencer_add_exec_commands() to work on a todo_list
 . sequencer: refactor check_todo_list() to work on a todo_list
 . sequencer: introduce todo_list_write_to_file()
 . sequencer: refactor transform_todos() to work on a todo_list
 . sequencer: make the todo_list structure public
 . sequencer: changes in parse_insn_buffer()

 The scripted version of "git rebase -i" wrote and rewrote the todo
 list many times during a single step of its operation, and the
 recent C-rewrite made a faithful conversion of the logic to C.  The
 implementation has been updated to carry necessary information
 around in-core to avoid rewriting the same file over and over
 unnecessarily.

 With too many topics in-flight that touch sequencer and rebaser,
 this need to wait giving precedence to other topics that fix bugs.


* sb/submodule-recursive-fetch-gets-the-tip (2018-12-09) 9 commits
 - fetch: ensure submodule objects fetched
 - submodule.c: fetch in submodules git directory instead of in worktree
 - submodule: migrate get_next_submodule to use repository structs
 - repository: repo_submodule_init to take a submodule struct
 - submodule: store OIDs in changed_submodule_names
 - submodule.c: tighten scope of changed_submodule_names struct
 - submodule.c: sort changed_submodule_names before searching it
 - submodule.c: fix indentation
 - sha1-array: provide oid_array_filter

 "git fetch --recurse-submodules" may not fetch the necessary commit
 that is bound to the superproject, which is getting corrected.

 Ready?


* js/add-i-coalesce-after-editing-hunk (2018-08-28) 1 commit
 - add -p: coalesce hunks before testing applicability

 Applicability check after a patch is edited in a "git add -i/p"
 session has been improved.

 Will hold.
 cf. <e5b2900a-0558-d3bf-8ea1-d526b078bbc2@talktalk.net>


* ps/stash-in-c (2019-01-04) 27 commits
 - tests: add a special setup where stash.useBuiltin is off
 - stash: optionally use the scripted version again
 - stash: add back the original, scripted `git stash`
 - stash: convert `stash--helper.c` into `stash.c`
 - stash: replace all `write-tree` child processes with API calls
 - stash: optimize `get_untracked_files()` and `check_changes()`
 - stash: convert save to builtin
 - stash: make push -q quiet
 - stash: convert push to builtin
 - stash: convert create to builtin
 - stash: convert store to builtin
 - stash: convert show to builtin
 - stash: convert list to builtin
 - stash: convert pop to builtin
 - stash: convert branch to builtin
 - stash: convert drop and clear to builtin
 - stash: convert apply to builtin
 - stash: mention options in `show` synopsis
 - stash: add tests for `git stash show` config
 - stash: rename test cases to be more descriptive
 - t3903: modernize style
 - stash: improve option parsing test coverage
 - ident: add the ability to provide a "fallback identity"
 - strbuf.c: add `strbuf_insertf()` and `strbuf_vinsertf()`
 - strbuf.c: add `strbuf_join_argv()`
 - sha1-name.c: add `get_oidf()` which acts like `get_oid()`
 - Merge branch 'sd/stash-wo-user-name'

 "git stash" rewritten in C.


* pw/add-p-select (2018-07-26) 4 commits
 - add -p: optimize line selection for short hunks
 - add -p: allow line selection to be inverted
 - add -p: select modified lines correctly
 - add -p: select individual hunk lines

 "git add -p" interactive interface learned to let users choose
 individual added/removed lines to be used in the operation, instead
 of accepting or rejecting a whole hunk.

 Will discard.
 No further feedbacks on the topic for quite some time.

 cf. <d622a95b-7302-43d4-4ec9-b2cf3388c653@talktalk.net>
 I found the feature to be hard to explain, and may result in more
 end-user complaints, but let's see.

--------------------------------------------------
[Discarded]

* ab/reject-alias-loop (2018-10-19) 1 commit
 . alias: detect loops in mixed execution mode

 Two (or more) aliases that mutually refer to each other can form an
 infinite loop; we now attempt to notice and stop.

 Discarded.
 Reverted out of 'next'.
 cf. <87sh0slvxm.fsf@evledraar.gmail.com>


* gl/bundle-unlock-before-aborting (2018-11-14) 1 commit
 . bundle: rollback lock file while refusing to create an empty bundle

 Superseded by jk/close-duped-fd-before-unlock-for-bundle


* js/remote-archive-v2 (2018-09-28) 4 commits
 . archive: allow archive over HTTP(S) with proto v2
 . archive: implement protocol v2 archive command
 . archive: use packet_reader for communications
 . archive: follow test standards around assertions

 The original implementation of "git archive --remote" more or less
 bypassed the transport layer and did not work over http(s).  The
 version 2 of the protocol is defined to allow going over http(s) as
 well as Git native transport.

 Retracted; reverted out of next.
 cf. <20181114195142.GI126896@google.com>


* ab/format-patch-rangediff-not-stat (2018-11-24) 1 commit
 . format-patch: don't include --stat with --range-diff output

 The "--rangediff" option recently added to "format-patch"
 interspersed a bogus and useless "--stat" information by mistake,
 which is being corrected.

 Reverted out of 'next'.


* jc/postpone-rebase-in-c (2018-11-26) 1 commit
 . rebase: mark the C reimplementation as an experimental opt-in feature

 People seem to be still finding latent bugs in the "rebase in C"
 reimplementation.  For the upcoming release, use the scripted
 version by default and adjust the documentation accordingly.

 Reverted out of 'next'.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* tg/checkout-no-overlay, was Re: What's cooking in git.git (Jan 2019, #01; Mon, 7)
  2019-01-07 23:34 What's cooking in git.git (Jan 2019, #01; Mon, 7) Junio C Hamano
@ 2019-01-08  9:50 ` " Thomas Gummerer
  2019-01-08 17:51   ` Junio C Hamano
  2019-01-08 17:30 ` ag/sequencer-reduce-rewriting-todo " Alban Gruin
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 41+ messages in thread
From: Thomas Gummerer @ 2019-01-08  9:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On Mon, Jan 7, 2019 at 11:34 PM Junio C Hamano <gitster@pobox.com> wrote:
> * tg/checkout-no-overlay (2019-01-02) 8 commits
>  - checkout: introduce checkout.overlayMode config
>  - checkout: introduce --{,no-}overlay option
>  - checkout: factor out mark_cache_entry_for_checkout function
>  - checkout: clarify comment
>  - read-cache: add invalidate parameter to remove_marked_cache_entries
>  - entry: support CE_WT_REMOVE flag in checkout_entry
>  - entry: factor out unlink_entry function
>  - move worktree tests to t24*
>
>  "git checkout --no-overlay" can be used to trigger a new mode of
>  checking out paths out of the tree-ish, that allows paths that
>  match the pathspec that are in the current index and working tree
>  and are not in the tree-ish.
>
>  Will merge to 'next'.

Please hold off on merging this to 'next'. There's still an
outstanding comment from Duy and Eric [*1*], that should be
addressed before this goes to next.  (Their comments also apply
to the documentation in 8/8).  I'll send v3 of the series with
this fixed today.

*1*: <CAPig+cSOyCQZXiG7sJWb12WzzujM-nsqqpt+cFZTFvXB1+-SVQ@mail.gmail.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* ag/sequencer-reduce-rewriting-todo Re: What's cooking in git.git (Jan 2019, #01; Mon, 7)
  2019-01-07 23:34 What's cooking in git.git (Jan 2019, #01; Mon, 7) Junio C Hamano
  2019-01-08  9:50 ` tg/checkout-no-overlay, was " Thomas Gummerer
@ 2019-01-08 17:30 ` " Alban Gruin
  2019-01-08 21:20 ` sb/more-repo-in-api, was " Jonathan Tan
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: Alban Gruin @ 2019-01-08 17:30 UTC (permalink / raw)
  To: Junio C Hamano, git

Hi Junio,

Le 08/01/2019 à 00:34, Junio C Hamano a écrit :
> * ag/sequencer-reduce-rewriting-todo (2018-11-12) 16 commits
>  . rebase--interactive: move transform_todo_file() to rebase--interactive.c
>  . sequencer: fix a call to error() in transform_todo_file()
>  . sequencer: use edit_todo_list() in complete_action()
>  . rebase-interactive: rewrite edit_todo_list() to handle the initial edit
>  . rebase-interactive: append_todo_help() changes
>  . rebase-interactive: use todo_list_write_to_file() in edit_todo_list()
>  . sequencer: refactor skip_unnecessary_picks() to work on a todo_list
>  . sequencer: change complete_action() to use the refactored functions
>  . sequencer: make sequencer_make_script() write its script to a strbuf
>  . sequencer: refactor rearrange_squash() to work on a todo_list
>  . sequencer: refactor sequencer_add_exec_commands() to work on a todo_list
>  . sequencer: refactor check_todo_list() to work on a todo_list
>  . sequencer: introduce todo_list_write_to_file()
>  . sequencer: refactor transform_todos() to work on a todo_list
>  . sequencer: make the todo_list structure public
>  . sequencer: changes in parse_insn_buffer()
> 
>  The scripted version of "git rebase -i" wrote and rewrote the todo
>  list many times during a single step of its operation, and the
>  recent C-rewrite made a faithful conversion of the logic to C.  The
>  implementation has been updated to carry necessary information
>  around in-core to avoid rewriting the same file over and over
>  unnecessarily.
> 
>  With too many topics in-flight that touch sequencer and rebaser,
>  this need to wait giving precedence to other topics that fix bugs.
> 
> 

I submitted a new version of this topic a week ago, you may have missed
it: <20181229160413.19333-1-alban.gruin@gmail.com>.

Cheers,
Alban


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: tg/checkout-no-overlay, was Re: What's cooking in git.git (Jan 2019, #01; Mon, 7)
  2019-01-08  9:50 ` tg/checkout-no-overlay, was " Thomas Gummerer
@ 2019-01-08 17:51   ` Junio C Hamano
  0 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2019-01-08 17:51 UTC (permalink / raw)
  To: Thomas Gummerer; +Cc: Git Mailing List

Thomas Gummerer <t.gummerer@gmail.com> writes:

> On Mon, Jan 7, 2019 at 11:34 PM Junio C Hamano <gitster@pobox.com> wrote:
>> * tg/checkout-no-overlay (2019-01-02) 8 commits
>>  - checkout: introduce checkout.overlayMode config
>>  - checkout: introduce --{,no-}overlay option
>>  - checkout: factor out mark_cache_entry_for_checkout function
>>  - checkout: clarify comment
>>  - read-cache: add invalidate parameter to remove_marked_cache_entries
>>  - entry: support CE_WT_REMOVE flag in checkout_entry
>>  - entry: factor out unlink_entry function
>>  - move worktree tests to t24*
>>
>>  "git checkout --no-overlay" can be used to trigger a new mode of
>>  checking out paths out of the tree-ish, that allows paths that
>>  match the pathspec that are in the current index and working tree
>>  and are not in the tree-ish.
>>
>>  Will merge to 'next'.
>
> Please hold off on merging this to 'next'.

Thanks, will do so.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* sb/more-repo-in-api, was Re: What's cooking in git.git (Jan 2019, #01; Mon, 7)
  2019-01-07 23:34 What's cooking in git.git (Jan 2019, #01; Mon, 7) Junio C Hamano
  2019-01-08  9:50 ` tg/checkout-no-overlay, was " Thomas Gummerer
  2019-01-08 17:30 ` ag/sequencer-reduce-rewriting-todo " Alban Gruin
@ 2019-01-08 21:20 ` " Jonathan Tan
  2019-01-08 21:35   ` Junio C Hamano
  2019-01-09  7:37 ` Martin Ågren
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 41+ messages in thread
From: Jonathan Tan @ 2019-01-08 21:20 UTC (permalink / raw)
  To: gitster; +Cc: git, stolee, Jonathan Tan

> * sb/more-repo-in-api (2018-12-28) 23 commits
>  - t/helper/test-repository: celebrate independence from the_repository
>  - path.h: make REPO_GIT_PATH_FUNC repository agnostic
>  - commit: prepare free_commit_buffer and release_commit_memory for any repo
>  - commit-graph: convert remaining functions to handle any repo
>  - submodule: don't add submodule as odb for push
>  - submodule: use submodule repos for object lookup
>  - pretty: prepare format_commit_message to handle arbitrary repositories
>  - commit: prepare logmsg_reencode to handle arbitrary repositories
>  - commit: prepare repo_unuse_commit_buffer to handle any repo
>  - commit: prepare get_commit_buffer to handle any repo
>  - commit-reach: prepare in_merge_bases[_many] to handle any repo
>  - commit-reach: prepare get_merge_bases to handle any repo
>  - commit-reach.c: allow get_merge_bases_many_0 to handle any repo
>  - commit-reach.c: allow remove_redundant to handle any repo
>  - commit-reach.c: allow merge_bases_many to handle any repo
>  - commit-reach.c: allow paint_down_to_common to handle any repo
>  - commit: allow parse_commit* to handle any repo
>  - object: parse_object to honor its repository argument
>  - object-store: prepare has_{sha1, object}_file to handle any repo
>  - object-store: prepare read_object_file to deal with any repo
>  - object-store: allow read_object_file_extended to read from any repo
>  - packfile: allow has_packed_and_bad to handle arbitrary repositories
>  - sha1_file: allow read_object to read objects in arbitrary repositories
>  (this branch is used by jt/get-reference-with-commit-graph and md/list-objects-filter-by-depth.)
> 
>  The in-core repository instances are passed through more codepaths.

I think this is ready to be considered for merging to next. This series looks
good both to Stolee [1] and to me (I replied to a previous version with
comments on patch 18 [2] which Stefan has addressed, as can be seen in the
inter-diff provided by Junio [3] - I probably should have replied to the latest
version stating this too).

[1] https://public-inbox.org/git/b2ff842d-4d60-0db7-c11d-dcc006dade18@gmail.com/
[2] https://public-inbox.org/git/20181115221254.45373-1-jonathantanmy@google.com/
[3] https://public-inbox.org/git/xmqq36qho1gf.fsf@gitster-ct.c.googlers.com/

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: sb/more-repo-in-api, was Re: What's cooking in git.git (Jan 2019, #01; Mon, 7)
  2019-01-08 21:20 ` sb/more-repo-in-api, was " Jonathan Tan
@ 2019-01-08 21:35   ` Junio C Hamano
  2019-01-09 21:28     ` Stefan Beller
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2019-01-08 21:35 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, stolee

Jonathan Tan <jonathantanmy@google.com> writes:

>>  The in-core repository instances are passed through more codepaths.
>
> I think this is ready to be considered for merging to next. This series looks
> good both to Stolee [1] and to me (I replied to a previous version with
> comments on patch 18 [2] which Stefan has addressed, as can be seen in the
> inter-diff provided by Junio [3] - I probably should have replied to the latest
> version stating this too).

Alright, thanks.

While attempting to resolve conflicts with this and Peff's "get rid
of has_sha1_file()" topic, I found it somewhat disturbing that
sha1-file.c still was a mixture of functions that do and do not take
the repository pointer, but perhaps it's something we need to polish
over time on top.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7)
  2019-01-07 23:34 What's cooking in git.git (Jan 2019, #01; Mon, 7) Junio C Hamano
                   ` (2 preceding siblings ...)
  2019-01-08 21:20 ` sb/more-repo-in-api, was " Jonathan Tan
@ 2019-01-09  7:37 ` Martin Ågren
  2019-01-09 21:06   ` Martin Ågren
  2019-01-09 10:28 ` What's cooking in git.git (Jan 2019, #01; Mon, 7) Jeff King
  2019-01-10 18:02 ` Stefan Beller
  5 siblings, 1 reply; 41+ messages in thread
From: Martin Ågren @ 2019-01-09  7:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, brian m. carlson

On Tue, 8 Jan 2019 at 00:34, Junio C Hamano <gitster@pobox.com> wrote:
> * bc/sha-256 (2018-11-14) 12 commits
>  - hash: add an SHA-256 implementation using OpenSSL
>  - sha256: add an SHA-256 implementation using libgcrypt
>  - Add a base implementation of SHA-256 support
>  - commit-graph: convert to using the_hash_algo
>  - t/helper: add a test helper to compute hash speed
>  - sha1-file: add a constant for hash block size
>  - t: make the sha1 test-tool helper generic
>  - t: add basic tests for our SHA-1 implementation
>  - cache: make hashcmp and hasheq work with larger hashes
>  - hex: introduce functions to print arbitrary hashes
>  - sha1-file: provide functions to look up hash algorithms
>  - sha1-file: rename algorithm to "sha1"
>
>  Add sha-256 hash and plug it through the code to allow building Git
>  with the "NewHash".

AddressSanitizer barks at current pu (855f98be272f19d16564e) for a
handful of tests.

One example is t5702-protocol-v2.sh. I've tracked this one down to
ce1a82c251 ("Merge branch 'bc/sha-256' into jch", 2019-01-08).
Reverting that merge makes the problem go away, at least in t5702.
Notably bc/sha-256 on its own has no problems with t5702, possibly
because there has been quite some work done on the test script itself
in bc/sha-256..pu.

There are a few failing tests with AddressSanitizer on pu and they might
be caused by different topics. I've got to run, but thought I'd just
mention this one for now. Here's AddressSanitizer's first complaint for
t5702:

==1691823==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x6040000004f2 at pc 0x0000004ea0fd bp 0x7ffc53082590 sp
0x7ffc53081d40
READ of size 32 at 0x6040000004f2 thread T0
    #0 0x4ea0fc in __asan_memcpy
llvm/projects/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cc:23
    #1 0x8603ec in oidset_insert oidset.c
    #2 0x86c977 in add_promisor_object packfile.c:2129:4
    #3 0x86c07a in for_each_object_in_pack packfile.c:2070:7
    #4 0x86c535 in for_each_packed_object packfile.c:2095:7
    #5 0x86c651 in is_promisor_object packfile.c:2151:4
    #6 0x5ae77d in mark_object builtin/fsck.c:173:6
    #7 0x5b0824 in mark_object_reachable builtin/fsck.c:200:2
    #8 0x5b0824 in fsck_handle_ref builtin/fsck.c:509
    #9 0x8e6987 in do_for_each_repo_ref_iterator refs/iterator.c:418:12
    #10 0x8cb84f in do_for_each_ref refs.c:1466:9
    #11 0x8cb84f in refs_for_each_rawref refs.c:1532
    #12 0x8cb84f in for_each_rawref refs.c:1538
    #13 0x5acd1f in get_default_heads builtin/fsck.c:524:2
    #14 0x5acd1f in cmd_fsck builtin/fsck.c:846
    #15 0x52f022 in run_builtin git.c:422:11
    #16 0x52d583 in handle_builtin git.c:655:8
    #17 0x52c00b in run_argv git.c:709:4
    #18 0x52c00b in cmd_main git.c:806
    #19 0x6c4569 in main common-main.c:45:9
    #20 0x7f5fd6c23b96 in __libc_start_main
/build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
    #21 0x41c799 in _start (git+0x41c799)

0x6040000004f2 is located 0 bytes to the right of 34-byte region
[0x6040000004d0,0x6040000004f2)
allocated by thread T0 here:
    #0 0x4eb4cf in malloc
llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:146
    #1 0x9fa1db in do_xmalloc wrapper.c:60:8
    #2 0x9fa2fd in do_xmallocz wrapper.c:100:8
    #3 0x9fa2fd in xmallocz_gently wrapper.c:113
    #4 0x86a877 in unpack_compressed_entry packfile.c:1588:11
    #5 0x86a02e in unpack_entry packfile.c:1737:11
    #6 0x867431 in cache_or_unpack_entry packfile.c:1439:10
    #7 0x867431 in packed_object_info packfile.c:1506
    #8 0x96b7be in oid_object_info_extended sha1-file.c:1394:10
    #9 0x96d7d0 in read_object sha1-file.c:1434:6
    #10 0x96d7d0 in read_object_file_extended sha1-file.c:1476
    #11 0x85cf40 in repo_read_object_file ./object-store.h:174:9
    #12 0x85cf40 in parse_object object.c:273
    #13 0x86c752 in add_promisor_object packfile.c:2108:23
    #14 0x86c07a in for_each_object_in_pack packfile.c:2070:7
    #15 0x86c535 in for_each_packed_object packfile.c:2095:7
    #16 0x86c651 in is_promisor_object packfile.c:2151:4
    #17 0x5ae77d in mark_object builtin/fsck.c:173:6
    #18 0x5b0824 in mark_object_reachable builtin/fsck.c:200:2
    #19 0x5b0824 in fsck_handle_ref builtin/fsck.c:509
    #20 0x8e6987 in do_for_each_repo_ref_iterator refs/iterator.c:418:12
    #21 0x8cb84f in do_for_each_ref refs.c:1466:9
    #22 0x8cb84f in refs_for_each_rawref refs.c:1532
    #23 0x8cb84f in for_each_rawref refs.c:1538
    #24 0x5acd1f in get_default_heads builtin/fsck.c:524:2
    #25 0x5acd1f in cmd_fsck builtin/fsck.c:846
    #26 0x52f022 in run_builtin git.c:422:11
    #27 0x52d583 in handle_builtin git.c:655:8
    #28 0x52c00b in run_argv git.c:709:4
    #29 0x52c00b in cmd_main git.c:806
    #30 0x6c4569 in main common-main.c:45:9
    #31 0x7f5fd6c23b96 in __libc_start_main
/build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7)
  2019-01-07 23:34 What's cooking in git.git (Jan 2019, #01; Mon, 7) Junio C Hamano
                   ` (3 preceding siblings ...)
  2019-01-09  7:37 ` Martin Ågren
@ 2019-01-09 10:28 ` Jeff King
  2019-01-10 19:05   ` Junio C Hamano
  2019-01-10 19:46   ` Junio C Hamano
  2019-01-10 18:02 ` Stefan Beller
  5 siblings, 2 replies; 41+ messages in thread
From: Jeff King @ 2019-01-09 10:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder, git

On Mon, Jan 07, 2019 at 03:34:10PM -0800, Junio C Hamano wrote:

> * jk/proto-v2-hidden-refs-fix (2018-12-14) 3 commits
>  - upload-pack: support hidden refs with protocol v2
>  - parse_hide_refs_config: handle NULL section
>  - serve: pass "config context" through to individual commands
> 
>  The v2 upload-pack protocol implementation failed to honor
>  hidden-ref configuration, which has been corrected.
> 
>  Will merge to 'next'.

Sorry I didn't catch this before it hit 'next', but I think we were
planning to scrap this in favor of:

  https://public-inbox.org/git/20181218124750.GC30471@sigill.intra.peff.net/

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7)
  2019-01-09  7:37 ` Martin Ågren
@ 2019-01-09 21:06   ` Martin Ågren
  2019-01-10  1:02     ` brian m. carlson
                       ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Martin Ågren @ 2019-01-09 21:06 UTC (permalink / raw)
  To: Git Mailing List; +Cc: brian m. carlson, Junio C Hamano

On Wed, 9 Jan 2019 at 08:37, Martin Ågren <martin.agren@gmail.com> wrote:
>
> On Tue, 8 Jan 2019 at 00:34, Junio C Hamano <gitster@pobox.com> wrote:
> > * bc/sha-256 (2018-11-14) 12 commits
> >  - hash: add an SHA-256 implementation using OpenSSL
> >  - sha256: add an SHA-256 implementation using libgcrypt
> >  - Add a base implementation of SHA-256 support
> >  - commit-graph: convert to using the_hash_algo
> >  - t/helper: add a test helper to compute hash speed
> >  - sha1-file: add a constant for hash block size
> >  - t: make the sha1 test-tool helper generic
> >  - t: add basic tests for our SHA-1 implementation
> >  - cache: make hashcmp and hasheq work with larger hashes
> >  - hex: introduce functions to print arbitrary hashes
> >  - sha1-file: provide functions to look up hash algorithms
> >  - sha1-file: rename algorithm to "sha1"
> >
> >  Add sha-256 hash and plug it through the code to allow building Git
> >  with the "NewHash".
>
> AddressSanitizer barks at current pu (855f98be272f19d16564e) for a
> handful of tests.
>
> One example is t5702-protocol-v2.sh. [...]
>
> ==1691823==ERROR: AddressSanitizer: heap-buffer-overflow on address
> 0x6040000004f2 at pc 0x0000004ea0fd bp 0x7ffc53082590 sp
> 0x7ffc53081d40
> READ of size 32 at 0x6040000004f2 thread T0
>     #0 0x4ea0fc in __asan_memcpy
> llvm/projects/compiler-rt/lib/asan/asan_interceptors_memintrinsics.cc:23
>     #1 0x8603ec in oidset_insert oidset.c
>     #2 0x86c977 in add_promisor_object packfile.c:2129:4
>     #3 0x86c07a in for_each_object_in_pack packfile.c:2070:7
>     #4 0x86c535 in for_each_packed_object packfile.c:2095:7
>     #5 0x86c651 in is_promisor_object packfile.c:2151:4

> 0x6040000004f2 is located 0 bytes to the right of 34-byte region
> [0x6040000004d0,0x6040000004f2)
> allocated by thread T0 here:
>     #0 0x4eb4cf in malloc
> llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:146
>     #1 0x9fa1db in do_xmalloc wrapper.c:60:8
>     #2 0x9fa2fd in do_xmallocz wrapper.c:100:8
>     #3 0x9fa2fd in xmallocz_gently wrapper.c:113
>     #4 0x86a877 in unpack_compressed_entry packfile.c:1588:11
>     #5 0x86a02e in unpack_entry packfile.c:1737:11
>     #6 0x867431 in cache_or_unpack_entry packfile.c:1439:10
>     #7 0x867431 in packed_object_info packfile.c:1506
>     #8 0x96b7be in oid_object_info_extended sha1-file.c:1394:10
>     #9 0x96d7d0 in read_object sha1-file.c:1434:6
>     #10 0x96d7d0 in read_object_file_extended sha1-file.c:1476
>     #11 0x85cf40 in repo_read_object_file ./object-store.h:174:9
>     #12 0x85cf40 in parse_object object.c:273
>     #13 0x86c752 in add_promisor_object packfile.c:2108:23
>     #14 0x86c07a in for_each_object_in_pack packfile.c:2070:7
>     #15 0x86c535 in for_each_packed_object packfile.c:2095:7
>     #16 0x86c651 in is_promisor_object packfile.c:2151:4

I found some more time to look into this.

It seems we have a buffer with raw data and we set up a `struct
object_id *` pointing into it, at a (supposed) OID value. Then
`update_tree_entry_internal()` verifies that the buffer contains
sufficiently many bytes, i.e., at least `the_hash_algo->rawsz` (=20).
We immediately call `oidset_insert()` which copies an entire struct,
i.e., we copy sizeof(struct object_id) (=32) bytes. Which is 12 more
than what is known to be safe. For this particular input data, we read
outside allocated memory.

I can think of three possible approaches:

* Allocate with a margin (GIT_MAX_RAWSZ - the_hash_algo->rawsz) where
  "necessary" (TM). Maybe not so maintainable.

* Teach `oidset_insert()` (i.e., khash) to only copy
  `the_hash_algo->rawsz` bytes. Maybe not so good for performance.

* Ignore.

I wonder which of these is the least awful, or if there are other ideas.

Martin

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: sb/more-repo-in-api, was Re: What's cooking in git.git (Jan 2019, #01; Mon, 7)
  2019-01-08 21:35   ` Junio C Hamano
@ 2019-01-09 21:28     ` Stefan Beller
  0 siblings, 0 replies; 41+ messages in thread
From: Stefan Beller @ 2019-01-09 21:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Tan, git, Derrick Stolee

On Tue, Jan 8, 2019 at 1:36 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jonathan Tan <jonathantanmy@google.com> writes:
>
> >>  The in-core repository instances are passed through more codepaths.
> >
> > I think this is ready to be considered for merging to next. This series looks
> > good both to Stolee [1] and to me (I replied to a previous version with
> > comments on patch 18 [2] which Stefan has addressed, as can be seen in the
> > inter-diff provided by Junio [3] - I probably should have replied to the latest
> > version stating this too).
>
> Alright, thanks.
>
> While attempting to resolve conflicts with this and Peff's "get rid
> of has_sha1_file()" topic, I found it somewhat disturbing that
> sha1-file.c still was a mixture of functions that do and do not take
> the repository pointer,

I agree that the current state is less than optimal, but that comes
down to the workflow? Most of the later series regarding the repo
refactoring are crafting some minimal set of changes that make
sense to reach one specific goal. It would have been easier if we
could have refactored the whole code base at once.

> but perhaps it's something we need to polish
> over time on top.

Yeah, that is my current take. And best we'd have
more patches as "t/helper/test-repository: celebrate
independence from the_repository" to make sure
we don't have bugs creeping into the code base.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7)
  2019-01-09 21:06   ` Martin Ågren
@ 2019-01-10  1:02     ` brian m. carlson
  2019-01-10 18:55       ` Junio C Hamano
  2019-01-10 19:03       ` Martin Ågren
  2019-01-10  4:25     ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson
  2019-01-15  0:39     ` [PATCH v2 " brian m. carlson
  2 siblings, 2 replies; 41+ messages in thread
From: brian m. carlson @ 2019-01-10  1:02 UTC (permalink / raw)
  To: Martin Ågren; +Cc: Git Mailing List, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1964 bytes --]

On Wed, Jan 09, 2019 at 10:06:08PM +0100, Martin Ågren wrote:
> I found some more time to look into this.
> 
> It seems we have a buffer with raw data and we set up a `struct
> object_id *` pointing into it, at a (supposed) OID value. Then
> `update_tree_entry_internal()` verifies that the buffer contains
> sufficiently many bytes, i.e., at least `the_hash_algo->rawsz` (=20).
> We immediately call `oidset_insert()` which copies an entire struct,
> i.e., we copy sizeof(struct object_id) (=32) bytes. Which is 12 more
> than what is known to be safe. For this particular input data, we read
> outside allocated memory.

Anything pointing to a struct object_id has to support at least
GIT_MAX_RAWSZ bytes, and that code doesn't, because it's a tree buffer.

I ran into this later on in my SHA-256 work and have a series that fixes
the tree-walk code, but it's a bit involved and requires copying the
struct object_id out of the buffer.

I thought we were going to be triggering this case only with some new
code I was introducing, but apparently somebody else got there first.

> I can think of three possible approaches:
> 
> * Allocate with a margin (GIT_MAX_RAWSZ - the_hash_algo->rawsz) where
>   "necessary" (TM). Maybe not so maintainable.

I think there are actually several places where we allocate for these
buffers, so this is not likely to be a great solution. Even worse, in
some cases, we intentionally use a too-short buffer knowing that we'll
never dereference the data.

> * Teach `oidset_insert()` (i.e., khash) to only copy
>   `the_hash_algo->rawsz` bytes. Maybe not so good for performance.

This is probably the best fix for the moment if you want an immediate
fix.

As for my series, I'll need to run the testsuite on it, but I'll try to
get it out tonight or at the latest tomorrow if people want to use that
instead.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 0/5] tree-walk object_id refactor
  2019-01-09 21:06   ` Martin Ågren
  2019-01-10  1:02     ` brian m. carlson
@ 2019-01-10  4:25     ` brian m. carlson
  2019-01-10  4:25       ` [PATCH 1/5] tree-walk: copy object ID before use brian m. carlson
                         ` (5 more replies)
  2019-01-15  0:39     ` [PATCH v2 " brian m. carlson
  2 siblings, 6 replies; 41+ messages in thread
From: brian m. carlson @ 2019-01-10  4:25 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren, Junio C Hamano

There are a small number of places in our codebase where we cast a
buffer of unsigned char to a struct object_id pointer. When we have
GIT_MAX_RAWSZ set to 32 (because we have SHA-256), one of these places
(the buffer for tree objects) can lead to us copying too much data when
using SHA-1 as the hash, since there are only 20 bytes to read.

This was not expected to be a problem before future code was introduced,
but due to a combination of series the issue became noticeable.

This series introduces a refactor to avoid referencing the struct
object_id directly from a buffer and instead storing an additional
struct object_id (and an int) in struct name_entry and referring to
that.

This series, while based on master, addresses the interactions seen on
pu between the SHA-256 series and the oidset series. There are a small
number of conflicts, both textual and logical, when merging this series
and pu, but they should be easily resolved.

This series contains a final patch which will become necessary at some
point for hygienic code, but which could be deferred until later if
desired.

The testsuite passes with AddressSanitizer at each stage and when merged
into pu.

brian m. carlson (5):
  tree-walk: copy object ID before use
  match-trees: compute buffer offset correctly when splicing
  match-trees: use hashcpy to splice trees
  tree-walk: store object_id in a separate member
  cache: make oidcpy always copy GIT_MAX_RAWSZ bytes

 builtin/grep.c                     |  8 ++++----
 builtin/merge-tree.c               | 20 ++++++++++----------
 builtin/pack-objects.c             |  4 ++--
 builtin/reflog.c                   |  4 ++--
 cache-tree.c                       |  4 ++--
 cache.h                            |  2 +-
 contrib/coccinelle/object_id.cocci | 30 ------------------------------
 delta-islands.c                    |  2 +-
 fsck.c                             |  4 ++--
 http-push.c                        |  4 ++--
 list-objects.c                     |  6 +++---
 match-trees.c                      | 11 ++++++-----
 notes.c                            |  4 ++--
 packfile.c                         |  2 +-
 revision.c                         |  4 ++--
 tree-diff.c                        |  6 +++---
 tree-walk.c                        | 21 ++++++++++++---------
 tree-walk.h                        |  9 ++++++---
 tree.c                             | 10 +++++-----
 unpack-trees.c                     |  6 +++---
 walker.c                           |  4 ++--
 21 files changed, 71 insertions(+), 94 deletions(-)


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/5] tree-walk: copy object ID before use
  2019-01-10  4:25     ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson
@ 2019-01-10  4:25       ` brian m. carlson
  2019-01-10  4:25       ` [PATCH 2/5] match-trees: compute buffer offset correctly when splicing brian m. carlson
                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: brian m. carlson @ 2019-01-10  4:25 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren, Junio C Hamano

In a future commit, the pointer returned by tree_entry_extract will
point into the struct tree_desc, causing its lifetime to be bound to
that of the struct tree_desc itself.  To ensure this code path keeps
working, copy the object_id into a local variable so that it lives long
enough.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 tree-walk.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index 79bafbd1a2..1e040fc20e 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -498,10 +498,10 @@ static int find_tree_entry(struct tree_desc *t, const char *name, struct object_
 	int namelen = strlen(name);
 	while (t->size) {
 		const char *entry;
-		const struct object_id *oid;
+		struct object_id oid;
 		int entrylen, cmp;
 
-		oid = tree_entry_extract(t, &entry, mode);
+		oidcpy(&oid, tree_entry_extract(t, &entry, mode));
 		entrylen = tree_entry_len(&t->entry);
 		update_tree_entry(t);
 		if (entrylen > namelen)
@@ -512,7 +512,7 @@ static int find_tree_entry(struct tree_desc *t, const char *name, struct object_
 		if (cmp < 0)
 			break;
 		if (entrylen == namelen) {
-			oidcpy(result, oid);
+			oidcpy(result, &oid);
 			return 0;
 		}
 		if (name[entrylen] != '/')
@@ -520,10 +520,10 @@ static int find_tree_entry(struct tree_desc *t, const char *name, struct object_
 		if (!S_ISDIR(*mode))
 			break;
 		if (++entrylen == namelen) {
-			oidcpy(result, oid);
+			oidcpy(result, &oid);
 			return 0;
 		}
-		return get_tree_entry(oid, name + entrylen, result, mode);
+		return get_tree_entry(&oid, name + entrylen, result, mode);
 	}
 	return -1;
 }

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 2/5] match-trees: compute buffer offset correctly when splicing
  2019-01-10  4:25     ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson
  2019-01-10  4:25       ` [PATCH 1/5] tree-walk: copy object ID before use brian m. carlson
@ 2019-01-10  4:25       ` brian m. carlson
  2019-01-10  4:25       ` [PATCH 3/5] match-trees: use hashcpy to splice trees brian m. carlson
                         ` (3 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: brian m. carlson @ 2019-01-10  4:25 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren, Junio C Hamano

Currently, the struct object_id pointer returned from tree_entry_extract
lives directly inside the parsed tree buffer. In a future commit, this
will change so that it instead points to a dedicated struct member.
Since in this code path, we want to modify the buffer directly, compute
the buffer offset we want to modify by using the pointer to the path
instead.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 match-trees.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/match-trees.c b/match-trees.c
index 2b6d31ef9d..feca48a5fd 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -199,15 +199,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 	while (desc.size) {
 		const char *name;
 		unsigned mode;
-		const struct object_id *oid;
 
-		oid = tree_entry_extract(&desc, &name, &mode);
+		tree_entry_extract(&desc, &name, &mode);
 		if (strlen(name) == toplen &&
 		    !memcmp(name, prefix, toplen)) {
 			if (!S_ISDIR(mode))
 				die("entry %s in tree %s is not a tree", name,
 				    oid_to_hex(oid1));
-			rewrite_here = (struct object_id *)oid;
+			rewrite_here = (struct object_id *)(desc.entry.path +
+							    strlen(desc.entry.path) +
+							    1);
 			break;
 		}
 		update_tree_entry(&desc);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 3/5] match-trees: use hashcpy to splice trees
  2019-01-10  4:25     ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson
  2019-01-10  4:25       ` [PATCH 1/5] tree-walk: copy object ID before use brian m. carlson
  2019-01-10  4:25       ` [PATCH 2/5] match-trees: compute buffer offset correctly when splicing brian m. carlson
@ 2019-01-10  4:25       ` brian m. carlson
  2019-01-10  6:45         ` Jeff King
  2019-01-10  4:25       ` [PATCH 4/5] tree-walk: store object_id in a separate member brian m. carlson
                         ` (2 subsequent siblings)
  5 siblings, 1 reply; 41+ messages in thread
From: brian m. carlson @ 2019-01-10  4:25 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren, Junio C Hamano

When we're splicing trees, we're writing directly from one location into
a buffer that is exactly the same size as a tree object. If the current
hash algorithm is SHA-1, we may not have a full 32 (GIT_MAX_RAWSZ) bytes
available to write, nor would we want to write that many bytes even if
we did. In a future commit, we'll split out hashcpy to respect
the_hash_algo while oidcpy uses GIT_MAX_RAWSZ, so convert the oidcpy to
a hashcpy so we copy the right number of bytes.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 match-trees.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/match-trees.c b/match-trees.c
index feca48a5fd..b1fbd022d1 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -224,7 +224,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 	} else {
 		rewrite_with = oid2;
 	}
-	oidcpy(rewrite_here, rewrite_with);
+	hashcpy(rewrite_here->hash, rewrite_with->hash);
 	status = write_object_file(buf, sz, tree_type, result);
 	free(buf);
 	return status;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 4/5] tree-walk: store object_id in a separate member
  2019-01-10  4:25     ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson
                         ` (2 preceding siblings ...)
  2019-01-10  4:25       ` [PATCH 3/5] match-trees: use hashcpy to splice trees brian m. carlson
@ 2019-01-10  4:25       ` brian m. carlson
  2019-01-10  6:49         ` Jeff King
  2019-01-10  4:25       ` [PATCH 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes brian m. carlson
  2019-01-10  6:40       ` [PATCH 0/5] tree-walk object_id refactor Jeff King
  5 siblings, 1 reply; 41+ messages in thread
From: brian m. carlson @ 2019-01-10  4:25 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren, Junio C Hamano

When parsing a tree, we read the object ID directly out of the tree
buffer. This is normally fine, but such an object ID cannot be used with
oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1,
there may not be that many bytes to copy.

Instead, store the object ID in a separate struct member. Since we can
no longer efficiently compute the path length, store that information as
well in struct name_entry. Ensure we only copy the object ID into the
new buffer if the path length is nonzero, as some callers will pass us
an empty path with no object ID following it, and we will not want to
read past the end of the buffer.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/grep.c         |  8 ++++----
 builtin/merge-tree.c   | 20 ++++++++++----------
 builtin/pack-objects.c |  4 ++--
 builtin/reflog.c       |  4 ++--
 cache-tree.c           |  4 ++--
 delta-islands.c        |  2 +-
 fsck.c                 |  4 ++--
 http-push.c            |  4 ++--
 list-objects.c         |  6 +++---
 match-trees.c          |  2 +-
 notes.c                |  4 ++--
 packfile.c             |  2 +-
 revision.c             |  4 ++--
 tree-diff.c            |  6 +++---
 tree-walk.c            | 11 +++++++----
 tree-walk.h            |  9 ++++++---
 tree.c                 | 10 +++++-----
 unpack-trees.c         |  6 +++---
 walker.c               |  4 ++--
 19 files changed, 60 insertions(+), 54 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index bad9c0a3d5..6fb93c0370 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -566,7 +566,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		strbuf_add(base, entry.path, te_len);
 
 		if (S_ISREG(entry.mode)) {
-			hit |= grep_oid(opt, entry.oid, base->buf, tn_len,
+			hit |= grep_oid(opt, &entry.oid, base->buf, tn_len,
 					 check_attr ? base->buf + tn_len : NULL);
 		} else if (S_ISDIR(entry.mode)) {
 			enum object_type type;
@@ -574,10 +574,10 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 			void *data;
 			unsigned long size;
 
-			data = lock_and_read_oid_file(entry.oid, &type, &size);
+			data = lock_and_read_oid_file(&entry.oid, &type, &size);
 			if (!data)
 				die(_("unable to read tree (%s)"),
-				    oid_to_hex(entry.oid));
+				    oid_to_hex(&entry.oid));
 
 			strbuf_addch(base, '/');
 			init_tree_desc(&sub, data, size);
@@ -585,7 +585,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 					 check_attr, repo);
 			free(data);
 		} else if (recurse_submodules && S_ISGITLINK(entry.mode)) {
-			hit |= grep_submodule(opt, repo, pathspec, entry.oid,
+			hit |= grep_submodule(opt, repo, pathspec, &entry.oid,
 					      base->buf, base->buf + tn_len);
 		}
 
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 70f6fc9167..4500c41e94 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -154,15 +154,15 @@ static void show_result(void)
 /* An empty entry never compares same, not even to another empty entry */
 static int same_entry(struct name_entry *a, struct name_entry *b)
 {
-	return	a->oid &&
-		b->oid &&
-		oideq(a->oid, b->oid) &&
+	return	!is_null_oid(&a->oid) &&
+		!is_null_oid(&b->oid) &&
+		oideq(&a->oid, &b->oid) &&
 		a->mode == b->mode;
 }
 
 static int both_empty(struct name_entry *a, struct name_entry *b)
 {
-	return !(a->oid || b->oid);
+	return is_null_oid(&a->oid) && is_null_oid(&b->oid);
 }
 
 static struct merge_list *create_entry(unsigned stage, unsigned mode, const struct object_id *oid, const char *path)
@@ -178,7 +178,7 @@ static struct merge_list *create_entry(unsigned stage, unsigned mode, const stru
 
 static char *traverse_path(const struct traverse_info *info, const struct name_entry *n)
 {
-	char *path = xmallocz(traverse_path_len(info, n));
+	char *path = xmallocz(traverse_path_len(info, n) + the_hash_algo->rawsz);
 	return make_traverse_path(path, info, n);
 }
 
@@ -192,8 +192,8 @@ static void resolve(const struct traverse_info *info, struct name_entry *ours, s
 		return;
 
 	path = traverse_path(info, result);
-	orig = create_entry(2, ours->mode, ours->oid, path);
-	final = create_entry(0, result->mode, result->oid, path);
+	orig = create_entry(2, ours->mode, &ours->oid, path);
+	final = create_entry(0, result->mode, &result->oid, path);
 
 	final->link = orig;
 
@@ -217,7 +217,7 @@ static void unresolved_directory(const struct traverse_info *info,
 
 	newbase = traverse_path(info, p);
 
-#define ENTRY_OID(e) (((e)->mode && S_ISDIR((e)->mode)) ? (e)->oid : NULL)
+#define ENTRY_OID(e) (((e)->mode && S_ISDIR((e)->mode)) ? &(e)->oid : NULL)
 	buf0 = fill_tree_descriptor(t + 0, ENTRY_OID(n + 0));
 	buf1 = fill_tree_descriptor(t + 1, ENTRY_OID(n + 1));
 	buf2 = fill_tree_descriptor(t + 2, ENTRY_OID(n + 2));
@@ -243,7 +243,7 @@ static struct merge_list *link_entry(unsigned stage, const struct traverse_info
 		path = entry->path;
 	else
 		path = traverse_path(info, n);
-	link = create_entry(stage, n->mode, n->oid, path);
+	link = create_entry(stage, n->mode, &n->oid, path);
 	link->link = entry;
 	return link;
 }
@@ -318,7 +318,7 @@ static int threeway_callback(int n, unsigned long mask, unsigned long dirmask, s
 	}
 
 	if (same_entry(entry+0, entry+1)) {
-		if (entry[2].oid && !S_ISDIR(entry[2].mode)) {
+		if (!is_null_oid(&entry[2].oid) && !S_ISDIR(entry[2].mode)) {
 			/* We did not touch, they modified -- take theirs */
 			resolve(info, entry+1, entry+2);
 			return mask;
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 24bba8147f..327d9170c4 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1334,7 +1334,7 @@ static void add_pbase_object(struct tree_desc *tree,
 		if (cmp < 0)
 			return;
 		if (name[cmplen] != '/') {
-			add_object_entry(entry.oid,
+			add_object_entry(&entry.oid,
 					 object_type(entry.mode),
 					 fullname, 1);
 			return;
@@ -1345,7 +1345,7 @@ static void add_pbase_object(struct tree_desc *tree,
 			const char *down = name+cmplen+1;
 			int downlen = name_cmp_len(down);
 
-			tree = pbase_tree_get(entry.oid);
+			tree = pbase_tree_get(&entry.oid);
 			if (!tree)
 				return;
 			init_tree_desc(&sub, tree->tree_data, tree->tree_size);
diff --git a/builtin/reflog.c b/builtin/reflog.c
index 64a8df4f25..1f1010e2d9 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -94,8 +94,8 @@ static int tree_is_complete(const struct object_id *oid)
 	init_tree_desc(&desc, tree->buffer, tree->size);
 	complete = 1;
 	while (tree_entry(&desc, &entry)) {
-		if (!has_sha1_file(entry.oid->hash) ||
-		    (S_ISDIR(entry.mode) && !tree_is_complete(entry.oid))) {
+		if (!has_sha1_file(entry.oid.hash) ||
+		    (S_ISDIR(entry.mode) && !tree_is_complete(&entry.oid))) {
 			tree->object.flags |= INCOMPLETE;
 			complete = 0;
 		}
diff --git a/cache-tree.c b/cache-tree.c
index 190c6e5aa6..98cb66587b 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -675,7 +675,7 @@ static void prime_cache_tree_rec(struct repository *r,
 			cnt++;
 		else {
 			struct cache_tree_sub *sub;
-			struct tree *subtree = lookup_tree(r, entry.oid);
+			struct tree *subtree = lookup_tree(r, &entry.oid);
 			if (!subtree->object.parsed)
 				parse_tree(subtree);
 			sub = cache_tree_sub(it, entry.path);
@@ -724,7 +724,7 @@ int cache_tree_matches_traversal(struct cache_tree *root,
 
 	it = find_cache_tree_from_traversal(root, info);
 	it = cache_tree_find(it, ent->path);
-	if (it && it->entry_count > 0 && oideq(ent->oid, &it->oid))
+	if (it && it->entry_count > 0 && oideq(&ent->oid, &it->oid))
 		return it->entry_count;
 	return 0;
 }
diff --git a/delta-islands.c b/delta-islands.c
index 191a930705..2186bd0738 100644
--- a/delta-islands.c
+++ b/delta-islands.c
@@ -296,7 +296,7 @@ void resolve_tree_islands(struct repository *r,
 			if (S_ISGITLINK(entry.mode))
 				continue;
 
-			obj = lookup_object(r, entry.oid->hash);
+			obj = lookup_object(r, entry.oid.hash);
 			if (!obj)
 				continue;
 
diff --git a/fsck.c b/fsck.c
index 68502ce85b..2260adb71e 100644
--- a/fsck.c
+++ b/fsck.c
@@ -410,14 +410,14 @@ static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *op
 			continue;
 
 		if (S_ISDIR(entry.mode)) {
-			obj = (struct object *)lookup_tree(the_repository, entry.oid);
+			obj = (struct object *)lookup_tree(the_repository, &entry.oid);
 			if (name && obj)
 				put_object_name(options, obj, "%s%s/", name,
 					entry.path);
 			result = options->walk(obj, OBJ_TREE, data, options);
 		}
 		else if (S_ISREG(entry.mode) || S_ISLNK(entry.mode)) {
-			obj = (struct object *)lookup_blob(the_repository, entry.oid);
+			obj = (struct object *)lookup_blob(the_repository, &entry.oid);
 			if (name && obj)
 				put_object_name(options, obj, "%s%s", name,
 					entry.path);
diff --git a/http-push.c b/http-push.c
index cd48590912..bb802d80ee 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1311,11 +1311,11 @@ static struct object_list **process_tree(struct tree *tree,
 	while (tree_entry(&desc, &entry))
 		switch (object_type(entry.mode)) {
 		case OBJ_TREE:
-			p = process_tree(lookup_tree(the_repository, entry.oid),
+			p = process_tree(lookup_tree(the_repository, &entry.oid),
 					 p);
 			break;
 		case OBJ_BLOB:
-			p = process_blob(lookup_blob(the_repository, entry.oid),
+			p = process_blob(lookup_blob(the_repository, &entry.oid),
 					 p);
 			break;
 		default:
diff --git a/list-objects.c b/list-objects.c
index cf7f25bed3..ee6aff0cef 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -123,15 +123,15 @@ static void process_tree_contents(struct traversal_context *ctx,
 		}
 
 		if (S_ISDIR(entry.mode)) {
-			struct tree *t = lookup_tree(ctx->revs->repo, entry.oid);
+			struct tree *t = lookup_tree(ctx->revs->repo, &entry.oid);
 			t->object.flags |= NOT_USER_GIVEN;
 			process_tree(ctx, t, base, entry.path);
 		}
 		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(ctx, entry.oid->hash,
+			process_gitlink(ctx, entry.oid.hash,
 					base, entry.path);
 		else {
-			struct blob *b = lookup_blob(ctx->revs->repo, entry.oid);
+			struct blob *b = lookup_blob(ctx->revs->repo, &entry.oid);
 			b->object.flags |= NOT_USER_GIVEN;
 			process_blob(ctx, b, base, entry.path);
 		}
diff --git a/match-trees.c b/match-trees.c
index b1fbd022d1..03e81b56e1 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -106,7 +106,7 @@ static int score_trees(const struct object_id *hash1, const struct object_id *ha
 			update_tree_entry(&two);
 		} else {
 			/* path appears in both */
-			if (!oideq(one.entry.oid, two.entry.oid)) {
+			if (!oideq(&one.entry.oid, &two.entry.oid)) {
 				/* they are different */
 				score += score_differs(one.entry.mode,
 						       two.entry.mode,
diff --git a/notes.c b/notes.c
index 25cdce28b7..7f7cc4d511 100644
--- a/notes.c
+++ b/notes.c
@@ -450,7 +450,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 
 		l = xcalloc(1, sizeof(*l));
 		oidcpy(&l->key_oid, &object_oid);
-		oidcpy(&l->val_oid, entry.oid);
+		oidcpy(&l->val_oid, &entry.oid);
 		if (note_tree_insert(t, node, n, l, type,
 				     combine_notes_concatenate))
 			die("Failed to load %s %s into notes tree "
@@ -481,7 +481,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 			}
 			strbuf_addstr(&non_note_path, entry.path);
 			add_non_note(t, strbuf_detach(&non_note_path, NULL),
-				     entry.mode, entry.oid->hash);
+				     entry.mode, entry.oid.hash);
 		}
 	}
 	free(buf);
diff --git a/packfile.c b/packfile.c
index 8c6b47cc77..e38af1a275 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2100,7 +2100,7 @@ static int add_promisor_object(const struct object_id *oid,
 			 */
 			return 0;
 		while (tree_entry_gently(&desc, &entry))
-			oidset_insert(set, entry.oid);
+			oidset_insert(set, &entry.oid);
 	} else if (obj->type == OBJ_COMMIT) {
 		struct commit *commit = (struct commit *) obj;
 		struct commit_list *parents = commit->parents;
diff --git a/revision.c b/revision.c
index 13e0519c02..05fb390d55 100644
--- a/revision.c
+++ b/revision.c
@@ -67,10 +67,10 @@ static void mark_tree_contents_uninteresting(struct repository *r,
 	while (tree_entry(&desc, &entry)) {
 		switch (object_type(entry.mode)) {
 		case OBJ_TREE:
-			mark_tree_uninteresting(r, lookup_tree(r, entry.oid));
+			mark_tree_uninteresting(r, lookup_tree(r, &entry.oid));
 			break;
 		case OBJ_BLOB:
-			mark_blob_uninteresting(lookup_blob(r, entry.oid));
+			mark_blob_uninteresting(lookup_blob(r, &entry.oid));
 			break;
 		default:
 			/* Subproject commit - not in this repository */
diff --git a/tree-diff.c b/tree-diff.c
index 0e54324610..4ffa00ae0c 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -239,7 +239,7 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
 						DIFF_STATUS_ADDED;
 
 			if (tpi_valid) {
-				oid_i = tp[i].entry.oid;
+				oid_i = &tp[i].entry.oid;
 				mode_i = tp[i].entry.mode;
 			}
 			else {
@@ -280,7 +280,7 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
 			/* same rule as in emitthis */
 			int tpi_valid = tp && !(tp[i].entry.mode & S_IFXMIN_NEQ);
 
-			parents_oid[i] = tpi_valid ? tp[i].entry.oid : NULL;
+			parents_oid[i] = tpi_valid ? &tp[i].entry.oid : NULL;
 		}
 
 		strbuf_add(base, path, pathlen);
@@ -491,7 +491,7 @@ static struct combine_diff_path *ll_diff_tree_paths(
 						continue;
 
 					/* diff(t,pi) != ø */
-					if (!oideq(t.entry.oid, tp[i].entry.oid) ||
+					if (!oideq(&t.entry.oid, &tp[i].entry.oid) ||
 					    (t.entry.mode != tp[i].entry.mode))
 						continue;
 
diff --git a/tree-walk.c b/tree-walk.c
index 1e040fc20e..b6daeab16d 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -48,7 +48,8 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
 	/* Initialize the descriptor entry */
 	desc->entry.path = path;
 	desc->entry.mode = canon_mode(mode);
-	desc->entry.oid  = (const struct object_id *)(path + len);
+	desc->entry.pathlen = len - 1;
+	memcpy(desc->entry.oid.hash, path + len, the_hash_algo->rawsz);
 
 	return 0;
 }
@@ -107,7 +108,7 @@ static void entry_extract(struct tree_desc *t, struct name_entry *a)
 static int update_tree_entry_internal(struct tree_desc *desc, struct strbuf *err)
 {
 	const void *buf = desc->buffer;
-	const unsigned char *end = desc->entry.oid->hash + the_hash_algo->rawsz;
+	const unsigned char *end = (const unsigned char *)desc->entry.path + desc->entry.pathlen + 1 + the_hash_algo->rawsz;
 	unsigned long size = desc->size;
 	unsigned long len = end - (const unsigned char *)buf;
 
@@ -175,9 +176,11 @@ void setup_traverse_info(struct traverse_info *info, const char *base)
 		pathlen--;
 	info->pathlen = pathlen ? pathlen + 1 : 0;
 	info->name.path = base;
-	info->name.oid = (void *)(base + pathlen + 1);
-	if (pathlen)
+	info->name.pathlen = pathlen;
+	if (pathlen) {
+		memcpy(info->name.oid.hash, (base + pathlen + 1), the_hash_algo->rawsz);
 		info->prev = &dummy;
+	}
 }
 
 char *make_traverse_path(char *path, const struct traverse_info *info, const struct name_entry *n)
diff --git a/tree-walk.h b/tree-walk.h
index 196831007e..08d349c54f 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -1,11 +1,14 @@
 #ifndef TREE_WALK_H
 #define TREE_WALK_H
 
+#include "cache.h"
+
 struct strbuf;
 
 struct name_entry {
-	const struct object_id *oid;
+	struct object_id oid;
 	const char *path;
+	int pathlen;
 	unsigned int mode;
 };
 
@@ -19,12 +22,12 @@ static inline const struct object_id *tree_entry_extract(struct tree_desc *desc,
 {
 	*pathp = desc->entry.path;
 	*modep = desc->entry.mode;
-	return desc->entry.oid;
+	return &desc->entry.oid;
 }
 
 static inline int tree_entry_len(const struct name_entry *ne)
 {
-	return (const char *)ne->oid - ne->path - 1;
+	return ne->pathlen;
 }
 
 /*
diff --git a/tree.c b/tree.c
index 215d3fdc7c..7badea0009 100644
--- a/tree.c
+++ b/tree.c
@@ -84,7 +84,7 @@ static int read_tree_1(struct tree *tree, struct strbuf *base,
 				continue;
 		}
 
-		switch (fn(entry.oid, base,
+		switch (fn(&entry.oid, base,
 			   entry.path, entry.mode, stage, context)) {
 		case 0:
 			continue;
@@ -95,19 +95,19 @@ static int read_tree_1(struct tree *tree, struct strbuf *base,
 		}
 
 		if (S_ISDIR(entry.mode))
-			oidcpy(&oid, entry.oid);
+			oidcpy(&oid, &entry.oid);
 		else if (S_ISGITLINK(entry.mode)) {
 			struct commit *commit;
 
-			commit = lookup_commit(the_repository, entry.oid);
+			commit = lookup_commit(the_repository, &entry.oid);
 			if (!commit)
 				die("Commit %s in submodule path %s%s not found",
-				    oid_to_hex(entry.oid),
+				    oid_to_hex(&entry.oid),
 				    base->buf, entry.path);
 
 			if (parse_commit(commit))
 				die("Invalid commit %s in submodule path %s%s",
-				    oid_to_hex(entry.oid),
+				    oid_to_hex(&entry.oid),
 				    base->buf, entry.path);
 
 			oidcpy(&oid, get_commit_tree_oid(commit));
diff --git a/unpack-trees.c b/unpack-trees.c
index 6d53cbfc86..734828fda2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -679,7 +679,7 @@ static int switch_cache_bottom(struct traverse_info *info)
 
 static inline int are_same_oid(struct name_entry *name_j, struct name_entry *name_k)
 {
-	return name_j->oid && name_k->oid && oideq(name_j->oid, name_k->oid);
+	return !is_null_oid(&name_j->oid) && !is_null_oid(&name_k->oid) && oideq(&name_j->oid, &name_k->oid);
 }
 
 static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
@@ -857,7 +857,7 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 		else {
 			const struct object_id *oid = NULL;
 			if (dirmask & 1)
-				oid = names[i].oid;
+				oid = &names[i].oid;
 			buf[nr_buf++] = fill_tree_descriptor(t + i, oid);
 		}
 	}
@@ -981,7 +981,7 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	ce->ce_mode = create_ce_mode(n->mode);
 	ce->ce_flags = create_ce_flags(stage);
 	ce->ce_namelen = len;
-	oidcpy(&ce->oid, n->oid);
+	oidcpy(&ce->oid, &n->oid);
 	make_traverse_path(ce->name, info, n);
 
 	return ce;
diff --git a/walker.c b/walker.c
index 96990d84da..d74ae59c77 100644
--- a/walker.c
+++ b/walker.c
@@ -50,13 +50,13 @@ static int process_tree(struct walker *walker, struct tree *tree)
 			continue;
 		if (S_ISDIR(entry.mode)) {
 			struct tree *tree = lookup_tree(the_repository,
-							entry.oid);
+							&entry.oid);
 			if (tree)
 				obj = &tree->object;
 		}
 		else {
 			struct blob *blob = lookup_blob(the_repository,
-							entry.oid);
+							&entry.oid);
 			if (blob)
 				obj = &blob->object;
 		}

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes
  2019-01-10  4:25     ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson
                         ` (3 preceding siblings ...)
  2019-01-10  4:25       ` [PATCH 4/5] tree-walk: store object_id in a separate member brian m. carlson
@ 2019-01-10  4:25       ` brian m. carlson
  2019-01-10  6:50         ` Jeff King
  2019-01-10  6:40       ` [PATCH 0/5] tree-walk object_id refactor Jeff King
  5 siblings, 1 reply; 41+ messages in thread
From: brian m. carlson @ 2019-01-10  4:25 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren, Junio C Hamano

There are some situations in which we want to store an object ID into
struct object_id without the_hash_algo necessarily being set correctly.
One such case is when cloning a repository, where we must read refs from
the remote side without having a repository from which to read the
preferred algorithm.

In this cases, we may have the_hash_algo set to SHA-1, which is the
default, but read refs into struct object_id that are SHA-256. When
copying these values, we will want to copy them completely, not just the
first 20 bytes. Consequently, make sure that oidcpy copies the maximum
number of bytes at all times, regardless of the setting of
the_hash_algo.

Since oidcpy and hashcpy are no longer functionally identical, remove
the Cocinelle object_id transformations that convert from one into the
other.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h                            |  2 +-
 contrib/coccinelle/object_id.cocci | 30 ------------------------------
 2 files changed, 1 insertion(+), 31 deletions(-)

diff --git a/cache.h b/cache.h
index ca36b44ee0..8b8c6a1a2a 100644
--- a/cache.h
+++ b/cache.h
@@ -1072,7 +1072,7 @@ static inline void hashcpy(unsigned char *sha_dst, const unsigned char *sha_src)
 
 static inline void oidcpy(struct object_id *dst, const struct object_id *src)
 {
-	hashcpy(dst->hash, src->hash);
+	memcpy(dst->hash, src->hash, GIT_MAX_RAWSZ);
 }
 
 static inline struct object_id *oiddup(const struct object_id *src)
diff --git a/contrib/coccinelle/object_id.cocci b/contrib/coccinelle/object_id.cocci
index 6a7cf3e02d..3e536a9834 100644
--- a/contrib/coccinelle/object_id.cocci
+++ b/contrib/coccinelle/object_id.cocci
@@ -86,36 +86,6 @@ struct object_id OID;
 - hashcmp(OID.hash, OIDPTR->hash)
 + oidcmp(&OID, OIDPTR)
 
-@@
-struct object_id OID1, OID2;
-@@
-- hashcpy(OID1.hash, OID2.hash)
-+ oidcpy(&OID1, &OID2)
-
-@@
-identifier f != oidcpy;
-struct object_id *OIDPTR1;
-struct object_id *OIDPTR2;
-@@
-  f(...) {<...
-- hashcpy(OIDPTR1->hash, OIDPTR2->hash)
-+ oidcpy(OIDPTR1, OIDPTR2)
-  ...>}
-
-@@
-struct object_id *OIDPTR;
-struct object_id OID;
-@@
-- hashcpy(OIDPTR->hash, OID.hash)
-+ oidcpy(OIDPTR, &OID)
-
-@@
-struct object_id *OIDPTR;
-struct object_id OID;
-@@
-- hashcpy(OID.hash, OIDPTR->hash)
-+ oidcpy(&OID, OIDPTR)
-
 @@
 struct object_id *OIDPTR1;
 struct object_id *OIDPTR2;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/5] tree-walk object_id refactor
  2019-01-10  4:25     ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson
                         ` (4 preceding siblings ...)
  2019-01-10  4:25       ` [PATCH 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes brian m. carlson
@ 2019-01-10  6:40       ` Jeff King
  2019-01-11  0:17         ` brian m. carlson
  5 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-01-10  6:40 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Martin Ågren, Junio C Hamano

On Thu, Jan 10, 2019 at 04:25:46AM +0000, brian m. carlson wrote:

> There are a small number of places in our codebase where we cast a
> buffer of unsigned char to a struct object_id pointer. When we have
> GIT_MAX_RAWSZ set to 32 (because we have SHA-256), one of these places
> (the buffer for tree objects) can lead to us copying too much data when
> using SHA-1 as the hash, since there are only 20 bytes to read.
> 
> This was not expected to be a problem before future code was introduced,
> but due to a combination of series the issue became noticeable.
> 
> This series introduces a refactor to avoid referencing the struct
> object_id directly from a buffer and instead storing an additional
> struct object_id (and an int) in struct name_entry and referring to
> that.

I think this is really the only safe and sane solution. We resisted it
because of the cost of the extra copies (especially the
update_tree_entry() one). But I don't know that anybody actually
measured it. Do you have any performance numbers before/after this
series?

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/5] match-trees: use hashcpy to splice trees
  2019-01-10  4:25       ` [PATCH 3/5] match-trees: use hashcpy to splice trees brian m. carlson
@ 2019-01-10  6:45         ` Jeff King
  2019-01-10 23:55           ` brian m. carlson
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-01-10  6:45 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Martin Ågren, Junio C Hamano

On Thu, Jan 10, 2019 at 04:25:49AM +0000, brian m. carlson wrote:

> When we're splicing trees, we're writing directly from one location into
> a buffer that is exactly the same size as a tree object. If the current
> hash algorithm is SHA-1, we may not have a full 32 (GIT_MAX_RAWSZ) bytes
> available to write, nor would we want to write that many bytes even if
> we did. In a future commit, we'll split out hashcpy to respect
> the_hash_algo while oidcpy uses GIT_MAX_RAWSZ, so convert the oidcpy to
> a hashcpy so we copy the right number of bytes.
> 
> Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
> ---
>  match-trees.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/match-trees.c b/match-trees.c
> index feca48a5fd..b1fbd022d1 100644
> --- a/match-trees.c
> +++ b/match-trees.c
> @@ -224,7 +224,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
>  	} else {
>  		rewrite_with = oid2;
>  	}
> -	oidcpy(rewrite_here, rewrite_with);
> +	hashcpy(rewrite_here->hash, rewrite_with->hash);

Hrm. Our coccinelle patches will want to convert this back to oidcpy(),
though I see you drop them in the final patch.

However, I wonder if it points to another mismatch. Isn't the point that
we _don't_ actually have "struct object_id"s here? I.e., rewrite_here
and rewrite_with should actually be "const unsigned char *" that we
happen to know are the_hash_algo->raw_sz?

I think the only reason they are "struct object_id" is because that's
what tree_entry_extract() returns. Should we be providing another
function to allow more byte-oriented access?

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 4/5] tree-walk: store object_id in a separate member
  2019-01-10  4:25       ` [PATCH 4/5] tree-walk: store object_id in a separate member brian m. carlson
@ 2019-01-10  6:49         ` Jeff King
  2019-01-10 23:57           ` brian m. carlson
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-01-10  6:49 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Martin Ågren, Junio C Hamano

On Thu, Jan 10, 2019 at 04:25:50AM +0000, brian m. carlson wrote:

> diff --git a/tree-walk.c b/tree-walk.c
> index 1e040fc20e..b6daeab16d 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -48,7 +48,8 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
>  	/* Initialize the descriptor entry */
>  	desc->entry.path = path;
>  	desc->entry.mode = canon_mode(mode);
> -	desc->entry.oid  = (const struct object_id *)(path + len);
> +	desc->entry.pathlen = len - 1;
> +	memcpy(desc->entry.oid.hash, path + len, the_hash_algo->rawsz);
>  
>  	return 0;
>  }

This one really is a hashcpy() now, right, even after your final patch?
I guess using rawsz explicitly makes it match the computation here:

> @@ -107,7 +108,7 @@ static void entry_extract(struct tree_desc *t, struct name_entry *a)
>  static int update_tree_entry_internal(struct tree_desc *desc, struct strbuf *err)
>  {
>  	const void *buf = desc->buffer;
> -	const unsigned char *end = desc->entry.oid->hash + the_hash_algo->rawsz;
> +	const unsigned char *end = (const unsigned char *)desc->entry.path + desc->entry.pathlen + 1 + the_hash_algo->rawsz;
>  	unsigned long size = desc->size;
>  	unsigned long len = end - (const unsigned char *)buf;

So maybe it's better to be explicit as you have here. (Mostly just as I
was reading it, I was looking for a use of hashcpy and was surprised not
to find it ;) ).

> @@ -175,9 +176,11 @@ void setup_traverse_info(struct traverse_info *info, const char *base)
>  		pathlen--;
>  	info->pathlen = pathlen ? pathlen + 1 : 0;
>  	info->name.path = base;
> -	info->name.oid = (void *)(base + pathlen + 1);
> -	if (pathlen)
> +	info->name.pathlen = pathlen;
> +	if (pathlen) {
> +		memcpy(info->name.oid.hash, (base + pathlen + 1), the_hash_algo->rawsz);
>  		info->prev = &dummy;
> +	}
>  }
>  

Ditto here, I think.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes
  2019-01-10  4:25       ` [PATCH 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes brian m. carlson
@ 2019-01-10  6:50         ` Jeff King
  0 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2019-01-10  6:50 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Martin Ågren, Junio C Hamano

On Thu, Jan 10, 2019 at 04:25:51AM +0000, brian m. carlson wrote:

> Since oidcpy and hashcpy are no longer functionally identical, remove
> the Cocinelle object_id transformations that convert from one into the
> other.

Unfortunately this means we'll no longer automatically find cases where
"foo" got converted into "struct object_id" and could be updated. I
guess at some point we assume that most everything has been converted
anyway, and that coccinelle rule loses its usefulness.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7)
  2019-01-07 23:34 What's cooking in git.git (Jan 2019, #01; Mon, 7) Junio C Hamano
                   ` (4 preceding siblings ...)
  2019-01-09 10:28 ` What's cooking in git.git (Jan 2019, #01; Mon, 7) Jeff King
@ 2019-01-10 18:02 ` Stefan Beller
  5 siblings, 0 replies; 41+ messages in thread
From: Stefan Beller @ 2019-01-10 18:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

> * sb/submodule-recursive-fetch-gets-the-tip (2018-12-09) 9 commits
>  - fetch: ensure submodule objects fetched
>  - submodule.c: fetch in submodules git directory instead of in worktree
>  - submodule: migrate get_next_submodule to use repository structs
>  - repository: repo_submodule_init to take a submodule struct
>  - submodule: store OIDs in changed_submodule_names
>  - submodule.c: tighten scope of changed_submodule_names struct
>  - submodule.c: sort changed_submodule_names before searching it
>  - submodule.c: fix indentation
>  - sha1-array: provide oid_array_filter
>
>  "git fetch --recurse-submodules" may not fetch the necessary commit
>  that is bound to the superproject, which is getting corrected.
>
>  Ready?

I checked the last discussion at
https://public-inbox.org/git/20181129002756.167615-1-sbeller@google.com/
and I think it is ready as I did not see any outstanding issues.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7)
  2019-01-10  1:02     ` brian m. carlson
@ 2019-01-10 18:55       ` Junio C Hamano
  2019-01-10 19:03       ` Martin Ågren
  1 sibling, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2019-01-10 18:55 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Martin Ågren, Git Mailing List

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

>> I can think of three possible approaches:
>> 
>> * Allocate with a margin (GIT_MAX_RAWSZ - the_hash_algo->rawsz) where
>>   "necessary" (TM). Maybe not so maintainable.
>
> I think there are actually several places where we allocate for these
> buffers, so this is not likely to be a great solution. Even worse, in
> some cases, we intentionally use a too-short buffer knowing that we'll
> never dereference the data.
>
>> * Teach `oidset_insert()` (i.e., khash) to only copy
>>   `the_hash_algo->rawsz` bytes. Maybe not so good for performance.
>
> This is probably the best fix for the moment if you want an immediate
> fix.
>
> As for my series, I'll need to run the testsuite on it, but I'll try to
> get it out tonight or at the latest tomorrow if people want to use that
> instead.

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7)
  2019-01-10  1:02     ` brian m. carlson
  2019-01-10 18:55       ` Junio C Hamano
@ 2019-01-10 19:03       ` Martin Ågren
  1 sibling, 0 replies; 41+ messages in thread
From: Martin Ågren @ 2019-01-10 19:03 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Git Mailing List, Junio C Hamano

On Thu, 10 Jan 2019 at 02:03, brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On Wed, Jan 09, 2019 at 10:06:08PM +0100, Martin Ågren wrote:
> > i.e., we copy sizeof(struct object_id) (=32) bytes. Which is 12 more
> > than what is known to be safe. For this particular input data, we read
> > outside allocated memory.
>
> Anything pointing to a struct object_id has to support at least
> GIT_MAX_RAWSZ bytes, and that code doesn't, because it's a tree buffer.
>
> I ran into this later on in my SHA-256 work and have a series that fixes
> the tree-walk code, but it's a bit involved and requires copying the
> struct object_id out of the buffer.
>
> I thought we were going to be triggering this case only with some new
> code I was introducing, but apparently somebody else got there first.

> As for my series, I'll need to run the testsuite on it, but I'll try to
> get it out tonight or at the latest tomorrow if people want to use that
> instead.

Cool. I should have known that you had something in the pipeline. Thanks
for working on this.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7)
  2019-01-09 10:28 ` What's cooking in git.git (Jan 2019, #01; Mon, 7) Jeff King
@ 2019-01-10 19:05   ` Junio C Hamano
  2019-01-10 19:46   ` Junio C Hamano
  1 sibling, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2019-01-10 19:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, git

Jeff King <peff@peff.net> writes:

> On Mon, Jan 07, 2019 at 03:34:10PM -0800, Junio C Hamano wrote:
>
>> * jk/proto-v2-hidden-refs-fix (2018-12-14) 3 commits
>>  - upload-pack: support hidden refs with protocol v2
>>  - parse_hide_refs_config: handle NULL section
>>  - serve: pass "config context" through to individual commands
>> 
>>  The v2 upload-pack protocol implementation failed to honor
>>  hidden-ref configuration, which has been corrected.
>> 
>>  Will merge to 'next'.
>
> Sorry I didn't catch this before it hit 'next', but I think we were
> planning to scrap this in favor of:
>
>   https://public-inbox.org/git/20181218124750.GC30471@sigill.intra.peff.net/

Ouch, I 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: What's cooking in git.git (Jan 2019, #01; Mon, 7)
  2019-01-09 10:28 ` What's cooking in git.git (Jan 2019, #01; Mon, 7) Jeff King
  2019-01-10 19:05   ` Junio C Hamano
@ 2019-01-10 19:46   ` Junio C Hamano
  1 sibling, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2019-01-10 19:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Nieder, git

Jeff King <peff@peff.net> writes:

> On Mon, Jan 07, 2019 at 03:34:10PM -0800, Junio C Hamano wrote:
>
>> * jk/proto-v2-hidden-refs-fix (2018-12-14) 3 commits
>>  - upload-pack: support hidden refs with protocol v2
>>  - parse_hide_refs_config: handle NULL section
>>  - serve: pass "config context" through to individual commands
>> 
>>  The v2 upload-pack protocol implementation failed to honor
>>  hidden-ref configuration, which has been corrected.
>> 
>>  Will merge to 'next'.
>
> Sorry I didn't catch this before it hit 'next', but I think we were
> planning to scrap this in favor of:
>
>   https://public-inbox.org/git/20181218124750.GC30471@sigill.intra.peff.net/

Ouch, indeed.

I did recall that <20181218124852.GD30471@sigill.intra.peff.net> was
the final verdict, which did mention that message, but somehow I
final onefailed to act on it.

Let me revert the merge and redo the topic.

Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/5] match-trees: use hashcpy to splice trees
  2019-01-10  6:45         ` Jeff King
@ 2019-01-10 23:55           ` brian m. carlson
  2019-01-11 14:51             ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: brian m. carlson @ 2019-01-10 23:55 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Martin Ågren, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1601 bytes --]

On Thu, Jan 10, 2019 at 01:45:20AM -0500, Jeff King wrote:
> On Thu, Jan 10, 2019 at 04:25:49AM +0000, brian m. carlson wrote:
> > diff --git a/match-trees.c b/match-trees.c
> > index feca48a5fd..b1fbd022d1 100644
> > --- a/match-trees.c
> > +++ b/match-trees.c
> > @@ -224,7 +224,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
> >  	} else {
> >  		rewrite_with = oid2;
> >  	}
> > -	oidcpy(rewrite_here, rewrite_with);
> > +	hashcpy(rewrite_here->hash, rewrite_with->hash);
> 
> Hrm. Our coccinelle patches will want to convert this back to oidcpy(),
> though I see you drop them in the final patch.
> 
> However, I wonder if it points to another mismatch. Isn't the point that
> we _don't_ actually have "struct object_id"s here? I.e., rewrite_here
> and rewrite_with should actually be "const unsigned char *" that we
> happen to know are the_hash_algo->raw_sz?

Correct.

> I think the only reason they are "struct object_id" is because that's
> what tree_entry_extract() returns. Should we be providing another
> function to allow more byte-oriented access?

The reason is we recursively call splice_tree, passing rewrite_here as
the first argument. That argument is then used for read_object_file,
which requires a struct object_id * (because it is, logically, an object
ID).

I *think* we could fix this by copying an unsigned char *rewrite_here
into a temporary struct object_id before we recurse, but it's not
obvious to me if that's safe to do.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 4/5] tree-walk: store object_id in a separate member
  2019-01-10  6:49         ` Jeff King
@ 2019-01-10 23:57           ` brian m. carlson
  0 siblings, 0 replies; 41+ messages in thread
From: brian m. carlson @ 2019-01-10 23:57 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Martin Ågren, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1055 bytes --]

On Thu, Jan 10, 2019 at 01:49:45AM -0500, Jeff King wrote:
> This one really is a hashcpy() now, right, even after your final patch?
> I guess using rawsz explicitly makes it match the computation here:
> 
> > @@ -107,7 +108,7 @@ static void entry_extract(struct tree_desc *t, struct name_entry *a)
> >  static int update_tree_entry_internal(struct tree_desc *desc, struct strbuf *err)
> >  {
> >  	const void *buf = desc->buffer;
> > -	const unsigned char *end = desc->entry.oid->hash + the_hash_algo->rawsz;
> > +	const unsigned char *end = (const unsigned char *)desc->entry.path + desc->entry.pathlen + 1 + the_hash_algo->rawsz;
> >  	unsigned long size = desc->size;
> >  	unsigned long len = end - (const unsigned char *)buf;
> 
> So maybe it's better to be explicit as you have here. (Mostly just as I
> was reading it, I was looking for a use of hashcpy and was surprised not
> to find it ;) ).

Yeah, I think a hashcpy is a better choice. Will fix.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/5] tree-walk object_id refactor
  2019-01-10  6:40       ` [PATCH 0/5] tree-walk object_id refactor Jeff King
@ 2019-01-11  0:17         ` brian m. carlson
  2019-01-11 14:17           ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: brian m. carlson @ 2019-01-11  0:17 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Martin Ågren, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1793 bytes --]

On Thu, Jan 10, 2019 at 01:40:31AM -0500, Jeff King wrote:
> On Thu, Jan 10, 2019 at 04:25:46AM +0000, brian m. carlson wrote:
> 
> > There are a small number of places in our codebase where we cast a
> > buffer of unsigned char to a struct object_id pointer. When we have
> > GIT_MAX_RAWSZ set to 32 (because we have SHA-256), one of these places
> > (the buffer for tree objects) can lead to us copying too much data when
> > using SHA-1 as the hash, since there are only 20 bytes to read.
> > 
> > This was not expected to be a problem before future code was introduced,
> > but due to a combination of series the issue became noticeable.
> > 
> > This series introduces a refactor to avoid referencing the struct
> > object_id directly from a buffer and instead storing an additional
> > struct object_id (and an int) in struct name_entry and referring to
> > that.
> 
> I think this is really the only safe and sane solution. We resisted it
> because of the cost of the extra copies (especially the
> update_tree_entry() one). But I don't know that anybody actually
> measured it. Do you have any performance numbers before/after this
> series?

Unfortunately, I don't. I'm not really sure in what situations we hit
this code path a lot, so I'm not sure what exactly we should performance
test. If you have suggestions, I can set up some perf tests.

I will say that I resisted writing this series for a long time, since I
knew it was going to be a difficult slog to get everything working (and
it was). If there had been a way to avoid it, I would have done it.
However, writing this series led to about ten tests in my SHA-256 work
suddenly passing where they hadn't before.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 0/5] tree-walk object_id refactor
  2019-01-11  0:17         ` brian m. carlson
@ 2019-01-11 14:17           ` Jeff King
  0 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2019-01-11 14:17 UTC (permalink / raw)
  To: brian m. carlson, git, Martin Ågren, Junio C Hamano

On Fri, Jan 11, 2019 at 12:17:50AM +0000, brian m. carlson wrote:

> > I think this is really the only safe and sane solution. We resisted it
> > because of the cost of the extra copies (especially the
> > update_tree_entry() one). But I don't know that anybody actually
> > measured it. Do you have any performance numbers before/after this
> > series?
> 
> Unfortunately, I don't. I'm not really sure in what situations we hit
> this code path a lot, so I'm not sure what exactly we should performance
> test. If you have suggestions, I can set up some perf tests.

I think the interesting one here is the copy in update_tree_entry(),
which is called for every entry of every tree we parse for a diff. So
the command with the most noticeable impact, I think, would be something
like:

  git log -- pathspec

I couldn't demonstrate any measurable slowdown (I didn't compute the
mean to see if it gets worse, but certainly any change is within the
run-to-run noise).

I guess that is competing with the cost to access the commit objects.
Perhaps a pure tree diff would be a more precise test. Here's the most
pathological case I could come up with:

  git init
  for i in $(seq 10000); do echo $i >$i; done
  git add .
  git commit -m base
  echo change >9999
  git commit -am change
  time git diff-tree HEAD^ HEAD

which should really be spending 99% of its time inflating and walking
through the trees. And even there I couldn't measure anything.

So I'd say it's probably fine.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/5] match-trees: use hashcpy to splice trees
  2019-01-10 23:55           ` brian m. carlson
@ 2019-01-11 14:51             ` Jeff King
  2019-01-11 14:54               ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-01-11 14:51 UTC (permalink / raw)
  To: brian m. carlson, git, Martin Ågren, Junio C Hamano

On Thu, Jan 10, 2019 at 11:55:51PM +0000, brian m. carlson wrote:

> > I think the only reason they are "struct object_id" is because that's
> > what tree_entry_extract() returns. Should we be providing another
> > function to allow more byte-oriented access?
> 
> The reason is we recursively call splice_tree, passing rewrite_here as
> the first argument. That argument is then used for read_object_file,
> which requires a struct object_id * (because it is, logically, an object
> ID).
> 
> I *think* we could fix this by copying an unsigned char *rewrite_here
> into a temporary struct object_id before we recurse, but it's not
> obvious to me if that's safe to do.

I think rewrite_here needs to be a direct pointer into the buffer, since
we plan to modify it.

I think rewrite_with is correct to be an object_id. It's either the oid
passed in from the caller, or the subtree we generated (in which case
it's the result from write_object_file).

So the "most correct" thing is probably something like this:

diff --git a/match-trees.c b/match-trees.c
index 03e81b56e1..129b13a970 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -179,7 +179,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 	char *buf;
 	unsigned long sz;
 	struct tree_desc desc;
-	struct object_id *rewrite_here;
+	unsigned char *rewrite_here;
 	const struct object_id *rewrite_with;
 	struct object_id subtree;
 	enum object_type type;
@@ -206,9 +206,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 			if (!S_ISDIR(mode))
 				die("entry %s in tree %s is not a tree", name,
 				    oid_to_hex(oid1));
-			rewrite_here = (struct object_id *)(desc.entry.path +
-							    strlen(desc.entry.path) +
-							    1);
+			/*
+			 * We cast here for two reasons:
+			 *
+			 *   - to flip the "char *" (for the path) to "unsigned
+			 *     char *" (for the hash stored after it)
+			 *
+			 *   - to discard the "const"; this is OK because we
+			 *     know it points into our non-const "buf"
+			 */
+			rewrite_here = (unsigned char *)desc.entry.path + strlen(desc.entry.path) + 1;
 			break;
 		}
 		update_tree_entry(&desc);
@@ -224,7 +231,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 	} else {
 		rewrite_with = oid2;
 	}
-	hashcpy(rewrite_here->hash, rewrite_with->hash);
+	hashcpy(rewrite_here, rewrite_with->hash);
 	status = write_object_file(buf, sz, tree_type, result);
 	free(buf);
 	return status;

I think if I were trying to write this in a less-subtle way, I'd
probably stop trying to do it in-place, and have a copy loop more like:

  for entry in src_tree
    if match_entry(entry, prefix)
      entry = rewrite_entry(entry) /* either oid2 or subtree */
    push_entry(dst_tree)

We may even have to go that way eventually if we might ever be rewriting
to a tree with a different hash size (i.e., there is a hidden assumption
here that rewrite_here points to exactly the_hash_algo->rawsz bytes of
hash). But I think for now it's not necessary, and it's way outside the
scope of what you're trying to do here.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/5] match-trees: use hashcpy to splice trees
  2019-01-11 14:51             ` Jeff King
@ 2019-01-11 14:54               ` Jeff King
  2019-01-14  1:30                 ` brian m. carlson
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-01-11 14:54 UTC (permalink / raw)
  To: brian m. carlson, git, Martin Ågren, Junio C Hamano

On Fri, Jan 11, 2019 at 09:51:06AM -0500, Jeff King wrote:

> I think rewrite_here needs to be a direct pointer into the buffer, since
> we plan to modify it.
> 
> I think rewrite_with is correct to be an object_id. It's either the oid
> passed in from the caller, or the subtree we generated (in which case
> it's the result from write_object_file).
> 
> So the "most correct" thing is probably something like this:

Of course, it would be nice if what I sent compiled. ;)

rewrite_here does double duty: it's the pointer in the tree that we need
to rewrite, and it's also the oid we pass down recursively. So we have
to handle both cases, like so:

diff --git a/match-trees.c b/match-trees.c
index 03e81b56e1..3e0ed889b4 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -179,7 +179,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 	char *buf;
 	unsigned long sz;
 	struct tree_desc desc;
-	struct object_id *rewrite_here;
+	unsigned char *rewrite_here;
 	const struct object_id *rewrite_with;
 	struct object_id subtree;
 	enum object_type type;
@@ -206,9 +206,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 			if (!S_ISDIR(mode))
 				die("entry %s in tree %s is not a tree", name,
 				    oid_to_hex(oid1));
-			rewrite_here = (struct object_id *)(desc.entry.path +
-							    strlen(desc.entry.path) +
-							    1);
+			/*
+			 * We cast here for two reasons:
+			 *
+			 *   - to flip the "char *" (for the path) to "unsigned
+			 *     char *" (for the hash stored after it)
+			 *
+			 *   - to discard the "const"; this is OK because we
+			 *     know it points into our non-const "buf"
+			 */
+			rewrite_here = (unsigned char *)desc.entry.path + strlen(desc.entry.path) + 1;
 			break;
 		}
 		update_tree_entry(&desc);
@@ -217,14 +224,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 		die("entry %.*s not found in tree %s", toplen, prefix,
 		    oid_to_hex(oid1));
 	if (*subpath) {
-		status = splice_tree(rewrite_here, subpath, oid2, &subtree);
+		struct object_id tree_oid;
+		hashcpy(tree_oid.hash, rewrite_here);
+		status = splice_tree(&tree_oid, subpath, oid2, &subtree);
 		if (status)
 			return status;
 		rewrite_with = &subtree;
 	} else {
 		rewrite_with = oid2;
 	}
-	hashcpy(rewrite_here->hash, rewrite_with->hash);
+	hashcpy(rewrite_here, rewrite_with->hash);
 	status = write_object_file(buf, sz, tree_type, result);
 	free(buf);
 	return status;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/5] match-trees: use hashcpy to splice trees
  2019-01-11 14:54               ` Jeff King
@ 2019-01-14  1:30                 ` brian m. carlson
  2019-01-14 15:40                   ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: brian m. carlson @ 2019-01-14  1:30 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Martin Ågren, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 944 bytes --]

On Fri, Jan 11, 2019 at 09:54:46AM -0500, Jeff King wrote:
> On Fri, Jan 11, 2019 at 09:51:06AM -0500, Jeff King wrote:
> 
> > I think rewrite_here needs to be a direct pointer into the buffer, since
> > we plan to modify it.
> > 
> > I think rewrite_with is correct to be an object_id. It's either the oid
> > passed in from the caller, or the subtree we generated (in which case
> > it's the result from write_object_file).
> > 
> > So the "most correct" thing is probably something like this:
> 
> Of course, it would be nice if what I sent compiled. ;)
> 
> rewrite_here does double duty: it's the pointer in the tree that we need
> to rewrite, and it's also the oid we pass down recursively. So we have
> to handle both cases, like so:

Since I took most of the patch you wrote, may I apply your sign-off to
the resulting patch I send out?
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/5] match-trees: use hashcpy to splice trees
  2019-01-14  1:30                 ` brian m. carlson
@ 2019-01-14 15:40                   ` Jeff King
  0 siblings, 0 replies; 41+ messages in thread
From: Jeff King @ 2019-01-14 15:40 UTC (permalink / raw)
  To: brian m. carlson, git, Martin Ågren, Junio C Hamano

On Mon, Jan 14, 2019 at 01:30:17AM +0000, brian m. carlson wrote:

> > > So the "most correct" thing is probably something like this:
> > 
> > Of course, it would be nice if what I sent compiled. ;)
> > 
> > rewrite_here does double duty: it's the pointer in the tree that we need
> > to rewrite, and it's also the oid we pass down recursively. So we have
> > to handle both cases, like so:
> 
> Since I took most of the patch you wrote, may I apply your sign-off to
> the resulting patch I send out?

Yes, definitely.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 0/5] tree-walk object_id refactor
  2019-01-09 21:06   ` Martin Ågren
  2019-01-10  1:02     ` brian m. carlson
  2019-01-10  4:25     ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson
@ 2019-01-15  0:39     ` " brian m. carlson
  2019-01-15  0:39       ` [PATCH v2 1/5] tree-walk: copy object ID before use brian m. carlson
                         ` (5 more replies)
  2 siblings, 6 replies; 41+ messages in thread
From: brian m. carlson @ 2019-01-15  0:39 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren, Junio C Hamano, Jeff King

There are a small number of places in our codebase where we cast a
buffer of unsigned char to a struct object_id pointer. When we have
GIT_MAX_RAWSZ set to 32 (because we have SHA-256), one of these places
(the buffer for tree objects) can lead to us copying too much data when
using SHA-1 as the hash, since there are only 20 bytes to read.

Changes from v1:
* Use hashcpy instead of memcpy.
* Adopt Peff's suggestion for improving patch 3.

brian m. carlson (5):
  tree-walk: copy object ID before use
  match-trees: compute buffer offset correctly when splicing
  match-trees: use hashcpy to splice trees
  tree-walk: store object_id in a separate member
  cache: make oidcpy always copy GIT_MAX_RAWSZ bytes

 builtin/grep.c                     |  8 ++++----
 builtin/merge-tree.c               | 20 ++++++++++----------
 builtin/pack-objects.c             |  4 ++--
 builtin/reflog.c                   |  4 ++--
 cache-tree.c                       |  4 ++--
 cache.h                            |  2 +-
 contrib/coccinelle/object_id.cocci | 30 ------------------------------
 delta-islands.c                    |  2 +-
 fsck.c                             |  4 ++--
 http-push.c                        |  4 ++--
 list-objects.c                     |  6 +++---
 match-trees.c                      | 27 ++++++++++++++++++++-------
 notes.c                            |  4 ++--
 packfile.c                         |  2 +-
 revision.c                         |  4 ++--
 tree-diff.c                        |  6 +++---
 tree-walk.c                        | 21 ++++++++++++---------
 tree-walk.h                        |  9 ++++++---
 tree.c                             | 10 +++++-----
 unpack-trees.c                     |  6 +++---
 walker.c                           |  4 ++--
 21 files changed, 85 insertions(+), 96 deletions(-)


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 1/5] tree-walk: copy object ID before use
  2019-01-15  0:39     ` [PATCH v2 " brian m. carlson
@ 2019-01-15  0:39       ` brian m. carlson
  2019-01-15  0:39       ` [PATCH v2 2/5] match-trees: compute buffer offset correctly when splicing brian m. carlson
                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: brian m. carlson @ 2019-01-15  0:39 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren, Junio C Hamano, Jeff King

In a future commit, the pointer returned by tree_entry_extract will
point into the struct tree_desc, causing its lifetime to be bound to
that of the struct tree_desc itself.  To ensure this code path keeps
working, copy the object_id into a local variable so that it lives long
enough.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 tree-walk.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index 79bafbd1a2..1e040fc20e 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -498,10 +498,10 @@ static int find_tree_entry(struct tree_desc *t, const char *name, struct object_
 	int namelen = strlen(name);
 	while (t->size) {
 		const char *entry;
-		const struct object_id *oid;
+		struct object_id oid;
 		int entrylen, cmp;
 
-		oid = tree_entry_extract(t, &entry, mode);
+		oidcpy(&oid, tree_entry_extract(t, &entry, mode));
 		entrylen = tree_entry_len(&t->entry);
 		update_tree_entry(t);
 		if (entrylen > namelen)
@@ -512,7 +512,7 @@ static int find_tree_entry(struct tree_desc *t, const char *name, struct object_
 		if (cmp < 0)
 			break;
 		if (entrylen == namelen) {
-			oidcpy(result, oid);
+			oidcpy(result, &oid);
 			return 0;
 		}
 		if (name[entrylen] != '/')
@@ -520,10 +520,10 @@ static int find_tree_entry(struct tree_desc *t, const char *name, struct object_
 		if (!S_ISDIR(*mode))
 			break;
 		if (++entrylen == namelen) {
-			oidcpy(result, oid);
+			oidcpy(result, &oid);
 			return 0;
 		}
-		return get_tree_entry(oid, name + entrylen, result, mode);
+		return get_tree_entry(&oid, name + entrylen, result, mode);
 	}
 	return -1;
 }

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 2/5] match-trees: compute buffer offset correctly when splicing
  2019-01-15  0:39     ` [PATCH v2 " brian m. carlson
  2019-01-15  0:39       ` [PATCH v2 1/5] tree-walk: copy object ID before use brian m. carlson
@ 2019-01-15  0:39       ` brian m. carlson
  2019-01-15  0:39       ` [PATCH v2 3/5] match-trees: use hashcpy to splice trees brian m. carlson
                         ` (3 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: brian m. carlson @ 2019-01-15  0:39 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren, Junio C Hamano, Jeff King

Currently, the struct object_id pointer returned from tree_entry_extract
lives directly inside the parsed tree buffer. In a future commit, this
will change so that it instead points to a dedicated struct member.
Since in this code path, we want to modify the buffer directly, compute
the buffer offset we want to modify by using the pointer to the path
instead.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 match-trees.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/match-trees.c b/match-trees.c
index 2b6d31ef9d..feca48a5fd 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -199,15 +199,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 	while (desc.size) {
 		const char *name;
 		unsigned mode;
-		const struct object_id *oid;
 
-		oid = tree_entry_extract(&desc, &name, &mode);
+		tree_entry_extract(&desc, &name, &mode);
 		if (strlen(name) == toplen &&
 		    !memcmp(name, prefix, toplen)) {
 			if (!S_ISDIR(mode))
 				die("entry %s in tree %s is not a tree", name,
 				    oid_to_hex(oid1));
-			rewrite_here = (struct object_id *)oid;
+			rewrite_here = (struct object_id *)(desc.entry.path +
+							    strlen(desc.entry.path) +
+							    1);
 			break;
 		}
 		update_tree_entry(&desc);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 3/5] match-trees: use hashcpy to splice trees
  2019-01-15  0:39     ` [PATCH v2 " brian m. carlson
  2019-01-15  0:39       ` [PATCH v2 1/5] tree-walk: copy object ID before use brian m. carlson
  2019-01-15  0:39       ` [PATCH v2 2/5] match-trees: compute buffer offset correctly when splicing brian m. carlson
@ 2019-01-15  0:39       ` brian m. carlson
  2019-01-15  0:39       ` [PATCH v2 4/5] tree-walk: store object_id in a separate member brian m. carlson
                         ` (2 subsequent siblings)
  5 siblings, 0 replies; 41+ messages in thread
From: brian m. carlson @ 2019-01-15  0:39 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren, Junio C Hamano, Jeff King

When we splice trees together, we operate in place on the tree buffer.
If we're using SHA-1 for the hash algorithm, we may not have a full
GIT_MAX_RAWSZ (32) bytes to copy. Consequently, it doesn't logically
make sense for us to use a struct object_id to represent this type,
since it isn't a complete object.

Represent this value as a unsigned char pointer instead and copy it when
necessary.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 match-trees.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/match-trees.c b/match-trees.c
index feca48a5fd..c2b7329e09 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -179,7 +179,7 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 	char *buf;
 	unsigned long sz;
 	struct tree_desc desc;
-	struct object_id *rewrite_here;
+	unsigned char *rewrite_here;
 	const struct object_id *rewrite_with;
 	struct object_id subtree;
 	enum object_type type;
@@ -206,9 +206,19 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 			if (!S_ISDIR(mode))
 				die("entry %s in tree %s is not a tree", name,
 				    oid_to_hex(oid1));
-			rewrite_here = (struct object_id *)(desc.entry.path +
-							    strlen(desc.entry.path) +
-							    1);
+
+			/*
+			 * We cast here for two reasons:
+			 *
+			 *   - to flip the "char *" (for the path) to "unsigned
+			 *     char *" (for the hash stored after it)
+			 *
+			 *   - to discard the "const"; this is OK because we
+			 *     know it points into our non-const "buf"
+			 */
+			rewrite_here = (unsigned char *)(desc.entry.path +
+							 strlen(desc.entry.path) +
+							 1);
 			break;
 		}
 		update_tree_entry(&desc);
@@ -217,14 +227,16 @@ static int splice_tree(const struct object_id *oid1, const char *prefix,
 		die("entry %.*s not found in tree %s", toplen, prefix,
 		    oid_to_hex(oid1));
 	if (*subpath) {
-		status = splice_tree(rewrite_here, subpath, oid2, &subtree);
+		struct object_id tree_oid;
+		hashcpy(tree_oid.hash, rewrite_here);
+		status = splice_tree(&tree_oid, subpath, oid2, &subtree);
 		if (status)
 			return status;
 		rewrite_with = &subtree;
 	} else {
 		rewrite_with = oid2;
 	}
-	oidcpy(rewrite_here, rewrite_with);
+	hashcpy(rewrite_here, rewrite_with->hash);
 	status = write_object_file(buf, sz, tree_type, result);
 	free(buf);
 	return status;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 4/5] tree-walk: store object_id in a separate member
  2019-01-15  0:39     ` [PATCH v2 " brian m. carlson
                         ` (2 preceding siblings ...)
  2019-01-15  0:39       ` [PATCH v2 3/5] match-trees: use hashcpy to splice trees brian m. carlson
@ 2019-01-15  0:39       ` brian m. carlson
  2019-01-15  0:39       ` [PATCH v2 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes brian m. carlson
  2019-01-15 17:51       ` [PATCH v2 0/5] tree-walk object_id refactor Junio C Hamano
  5 siblings, 0 replies; 41+ messages in thread
From: brian m. carlson @ 2019-01-15  0:39 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren, Junio C Hamano, Jeff King

When parsing a tree, we read the object ID directly out of the tree
buffer. This is normally fine, but such an object ID cannot be used with
oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1,
there may not be that many bytes to copy.

Instead, store the object ID in a separate struct member. Since we can
no longer efficiently compute the path length, store that information as
well in struct name_entry. Ensure we only copy the object ID into the
new buffer if the path length is nonzero, as some callers will pass us
an empty path with no object ID following it, and we will not want to
read past the end of the buffer.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 builtin/grep.c         |  8 ++++----
 builtin/merge-tree.c   | 20 ++++++++++----------
 builtin/pack-objects.c |  4 ++--
 builtin/reflog.c       |  4 ++--
 cache-tree.c           |  4 ++--
 delta-islands.c        |  2 +-
 fsck.c                 |  4 ++--
 http-push.c            |  4 ++--
 list-objects.c         |  6 +++---
 match-trees.c          |  2 +-
 notes.c                |  4 ++--
 packfile.c             |  2 +-
 revision.c             |  4 ++--
 tree-diff.c            |  6 +++---
 tree-walk.c            | 11 +++++++----
 tree-walk.h            |  9 ++++++---
 tree.c                 | 10 +++++-----
 unpack-trees.c         |  6 +++---
 walker.c               |  4 ++--
 19 files changed, 60 insertions(+), 54 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index bad9c0a3d5..6fb93c0370 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -566,7 +566,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 		strbuf_add(base, entry.path, te_len);
 
 		if (S_ISREG(entry.mode)) {
-			hit |= grep_oid(opt, entry.oid, base->buf, tn_len,
+			hit |= grep_oid(opt, &entry.oid, base->buf, tn_len,
 					 check_attr ? base->buf + tn_len : NULL);
 		} else if (S_ISDIR(entry.mode)) {
 			enum object_type type;
@@ -574,10 +574,10 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 			void *data;
 			unsigned long size;
 
-			data = lock_and_read_oid_file(entry.oid, &type, &size);
+			data = lock_and_read_oid_file(&entry.oid, &type, &size);
 			if (!data)
 				die(_("unable to read tree (%s)"),
-				    oid_to_hex(entry.oid));
+				    oid_to_hex(&entry.oid));
 
 			strbuf_addch(base, '/');
 			init_tree_desc(&sub, data, size);
@@ -585,7 +585,7 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 					 check_attr, repo);
 			free(data);
 		} else if (recurse_submodules && S_ISGITLINK(entry.mode)) {
-			hit |= grep_submodule(opt, repo, pathspec, entry.oid,
+			hit |= grep_submodule(opt, repo, pathspec, &entry.oid,
 					      base->buf, base->buf + tn_len);
 		}
 
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 70f6fc9167..4500c41e94 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -154,15 +154,15 @@ static void show_result(void)
 /* An empty entry never compares same, not even to another empty entry */
 static int same_entry(struct name_entry *a, struct name_entry *b)
 {
-	return	a->oid &&
-		b->oid &&
-		oideq(a->oid, b->oid) &&
+	return	!is_null_oid(&a->oid) &&
+		!is_null_oid(&b->oid) &&
+		oideq(&a->oid, &b->oid) &&
 		a->mode == b->mode;
 }
 
 static int both_empty(struct name_entry *a, struct name_entry *b)
 {
-	return !(a->oid || b->oid);
+	return is_null_oid(&a->oid) && is_null_oid(&b->oid);
 }
 
 static struct merge_list *create_entry(unsigned stage, unsigned mode, const struct object_id *oid, const char *path)
@@ -178,7 +178,7 @@ static struct merge_list *create_entry(unsigned stage, unsigned mode, const stru
 
 static char *traverse_path(const struct traverse_info *info, const struct name_entry *n)
 {
-	char *path = xmallocz(traverse_path_len(info, n));
+	char *path = xmallocz(traverse_path_len(info, n) + the_hash_algo->rawsz);
 	return make_traverse_path(path, info, n);
 }
 
@@ -192,8 +192,8 @@ static void resolve(const struct traverse_info *info, struct name_entry *ours, s
 		return;
 
 	path = traverse_path(info, result);
-	orig = create_entry(2, ours->mode, ours->oid, path);
-	final = create_entry(0, result->mode, result->oid, path);
+	orig = create_entry(2, ours->mode, &ours->oid, path);
+	final = create_entry(0, result->mode, &result->oid, path);
 
 	final->link = orig;
 
@@ -217,7 +217,7 @@ static void unresolved_directory(const struct traverse_info *info,
 
 	newbase = traverse_path(info, p);
 
-#define ENTRY_OID(e) (((e)->mode && S_ISDIR((e)->mode)) ? (e)->oid : NULL)
+#define ENTRY_OID(e) (((e)->mode && S_ISDIR((e)->mode)) ? &(e)->oid : NULL)
 	buf0 = fill_tree_descriptor(t + 0, ENTRY_OID(n + 0));
 	buf1 = fill_tree_descriptor(t + 1, ENTRY_OID(n + 1));
 	buf2 = fill_tree_descriptor(t + 2, ENTRY_OID(n + 2));
@@ -243,7 +243,7 @@ static struct merge_list *link_entry(unsigned stage, const struct traverse_info
 		path = entry->path;
 	else
 		path = traverse_path(info, n);
-	link = create_entry(stage, n->mode, n->oid, path);
+	link = create_entry(stage, n->mode, &n->oid, path);
 	link->link = entry;
 	return link;
 }
@@ -318,7 +318,7 @@ static int threeway_callback(int n, unsigned long mask, unsigned long dirmask, s
 	}
 
 	if (same_entry(entry+0, entry+1)) {
-		if (entry[2].oid && !S_ISDIR(entry[2].mode)) {
+		if (!is_null_oid(&entry[2].oid) && !S_ISDIR(entry[2].mode)) {
 			/* We did not touch, they modified -- take theirs */
 			resolve(info, entry+1, entry+2);
 			return mask;
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 24bba8147f..327d9170c4 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1334,7 +1334,7 @@ static void add_pbase_object(struct tree_desc *tree,
 		if (cmp < 0)
 			return;
 		if (name[cmplen] != '/') {
-			add_object_entry(entry.oid,
+			add_object_entry(&entry.oid,
 					 object_type(entry.mode),
 					 fullname, 1);
 			return;
@@ -1345,7 +1345,7 @@ static void add_pbase_object(struct tree_desc *tree,
 			const char *down = name+cmplen+1;
 			int downlen = name_cmp_len(down);
 
-			tree = pbase_tree_get(entry.oid);
+			tree = pbase_tree_get(&entry.oid);
 			if (!tree)
 				return;
 			init_tree_desc(&sub, tree->tree_data, tree->tree_size);
diff --git a/builtin/reflog.c b/builtin/reflog.c
index 64a8df4f25..1f1010e2d9 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -94,8 +94,8 @@ static int tree_is_complete(const struct object_id *oid)
 	init_tree_desc(&desc, tree->buffer, tree->size);
 	complete = 1;
 	while (tree_entry(&desc, &entry)) {
-		if (!has_sha1_file(entry.oid->hash) ||
-		    (S_ISDIR(entry.mode) && !tree_is_complete(entry.oid))) {
+		if (!has_sha1_file(entry.oid.hash) ||
+		    (S_ISDIR(entry.mode) && !tree_is_complete(&entry.oid))) {
 			tree->object.flags |= INCOMPLETE;
 			complete = 0;
 		}
diff --git a/cache-tree.c b/cache-tree.c
index 190c6e5aa6..98cb66587b 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -675,7 +675,7 @@ static void prime_cache_tree_rec(struct repository *r,
 			cnt++;
 		else {
 			struct cache_tree_sub *sub;
-			struct tree *subtree = lookup_tree(r, entry.oid);
+			struct tree *subtree = lookup_tree(r, &entry.oid);
 			if (!subtree->object.parsed)
 				parse_tree(subtree);
 			sub = cache_tree_sub(it, entry.path);
@@ -724,7 +724,7 @@ int cache_tree_matches_traversal(struct cache_tree *root,
 
 	it = find_cache_tree_from_traversal(root, info);
 	it = cache_tree_find(it, ent->path);
-	if (it && it->entry_count > 0 && oideq(ent->oid, &it->oid))
+	if (it && it->entry_count > 0 && oideq(&ent->oid, &it->oid))
 		return it->entry_count;
 	return 0;
 }
diff --git a/delta-islands.c b/delta-islands.c
index 191a930705..2186bd0738 100644
--- a/delta-islands.c
+++ b/delta-islands.c
@@ -296,7 +296,7 @@ void resolve_tree_islands(struct repository *r,
 			if (S_ISGITLINK(entry.mode))
 				continue;
 
-			obj = lookup_object(r, entry.oid->hash);
+			obj = lookup_object(r, entry.oid.hash);
 			if (!obj)
 				continue;
 
diff --git a/fsck.c b/fsck.c
index 68502ce85b..2260adb71e 100644
--- a/fsck.c
+++ b/fsck.c
@@ -410,14 +410,14 @@ static int fsck_walk_tree(struct tree *tree, void *data, struct fsck_options *op
 			continue;
 
 		if (S_ISDIR(entry.mode)) {
-			obj = (struct object *)lookup_tree(the_repository, entry.oid);
+			obj = (struct object *)lookup_tree(the_repository, &entry.oid);
 			if (name && obj)
 				put_object_name(options, obj, "%s%s/", name,
 					entry.path);
 			result = options->walk(obj, OBJ_TREE, data, options);
 		}
 		else if (S_ISREG(entry.mode) || S_ISLNK(entry.mode)) {
-			obj = (struct object *)lookup_blob(the_repository, entry.oid);
+			obj = (struct object *)lookup_blob(the_repository, &entry.oid);
 			if (name && obj)
 				put_object_name(options, obj, "%s%s", name,
 					entry.path);
diff --git a/http-push.c b/http-push.c
index cd48590912..bb802d80ee 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1311,11 +1311,11 @@ static struct object_list **process_tree(struct tree *tree,
 	while (tree_entry(&desc, &entry))
 		switch (object_type(entry.mode)) {
 		case OBJ_TREE:
-			p = process_tree(lookup_tree(the_repository, entry.oid),
+			p = process_tree(lookup_tree(the_repository, &entry.oid),
 					 p);
 			break;
 		case OBJ_BLOB:
-			p = process_blob(lookup_blob(the_repository, entry.oid),
+			p = process_blob(lookup_blob(the_repository, &entry.oid),
 					 p);
 			break;
 		default:
diff --git a/list-objects.c b/list-objects.c
index cf7f25bed3..ee6aff0cef 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -123,15 +123,15 @@ static void process_tree_contents(struct traversal_context *ctx,
 		}
 
 		if (S_ISDIR(entry.mode)) {
-			struct tree *t = lookup_tree(ctx->revs->repo, entry.oid);
+			struct tree *t = lookup_tree(ctx->revs->repo, &entry.oid);
 			t->object.flags |= NOT_USER_GIVEN;
 			process_tree(ctx, t, base, entry.path);
 		}
 		else if (S_ISGITLINK(entry.mode))
-			process_gitlink(ctx, entry.oid->hash,
+			process_gitlink(ctx, entry.oid.hash,
 					base, entry.path);
 		else {
-			struct blob *b = lookup_blob(ctx->revs->repo, entry.oid);
+			struct blob *b = lookup_blob(ctx->revs->repo, &entry.oid);
 			b->object.flags |= NOT_USER_GIVEN;
 			process_blob(ctx, b, base, entry.path);
 		}
diff --git a/match-trees.c b/match-trees.c
index c2b7329e09..18ab825bef 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -106,7 +106,7 @@ static int score_trees(const struct object_id *hash1, const struct object_id *ha
 			update_tree_entry(&two);
 		} else {
 			/* path appears in both */
-			if (!oideq(one.entry.oid, two.entry.oid)) {
+			if (!oideq(&one.entry.oid, &two.entry.oid)) {
 				/* they are different */
 				score += score_differs(one.entry.mode,
 						       two.entry.mode,
diff --git a/notes.c b/notes.c
index 25cdce28b7..7f7cc4d511 100644
--- a/notes.c
+++ b/notes.c
@@ -450,7 +450,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 
 		l = xcalloc(1, sizeof(*l));
 		oidcpy(&l->key_oid, &object_oid);
-		oidcpy(&l->val_oid, entry.oid);
+		oidcpy(&l->val_oid, &entry.oid);
 		if (note_tree_insert(t, node, n, l, type,
 				     combine_notes_concatenate))
 			die("Failed to load %s %s into notes tree "
@@ -481,7 +481,7 @@ static void load_subtree(struct notes_tree *t, struct leaf_node *subtree,
 			}
 			strbuf_addstr(&non_note_path, entry.path);
 			add_non_note(t, strbuf_detach(&non_note_path, NULL),
-				     entry.mode, entry.oid->hash);
+				     entry.mode, entry.oid.hash);
 		}
 	}
 	free(buf);
diff --git a/packfile.c b/packfile.c
index 8c6b47cc77..e38af1a275 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2100,7 +2100,7 @@ static int add_promisor_object(const struct object_id *oid,
 			 */
 			return 0;
 		while (tree_entry_gently(&desc, &entry))
-			oidset_insert(set, entry.oid);
+			oidset_insert(set, &entry.oid);
 	} else if (obj->type == OBJ_COMMIT) {
 		struct commit *commit = (struct commit *) obj;
 		struct commit_list *parents = commit->parents;
diff --git a/revision.c b/revision.c
index 13e0519c02..05fb390d55 100644
--- a/revision.c
+++ b/revision.c
@@ -67,10 +67,10 @@ static void mark_tree_contents_uninteresting(struct repository *r,
 	while (tree_entry(&desc, &entry)) {
 		switch (object_type(entry.mode)) {
 		case OBJ_TREE:
-			mark_tree_uninteresting(r, lookup_tree(r, entry.oid));
+			mark_tree_uninteresting(r, lookup_tree(r, &entry.oid));
 			break;
 		case OBJ_BLOB:
-			mark_blob_uninteresting(lookup_blob(r, entry.oid));
+			mark_blob_uninteresting(lookup_blob(r, &entry.oid));
 			break;
 		default:
 			/* Subproject commit - not in this repository */
diff --git a/tree-diff.c b/tree-diff.c
index 0e54324610..4ffa00ae0c 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -239,7 +239,7 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
 						DIFF_STATUS_ADDED;
 
 			if (tpi_valid) {
-				oid_i = tp[i].entry.oid;
+				oid_i = &tp[i].entry.oid;
 				mode_i = tp[i].entry.mode;
 			}
 			else {
@@ -280,7 +280,7 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
 			/* same rule as in emitthis */
 			int tpi_valid = tp && !(tp[i].entry.mode & S_IFXMIN_NEQ);
 
-			parents_oid[i] = tpi_valid ? tp[i].entry.oid : NULL;
+			parents_oid[i] = tpi_valid ? &tp[i].entry.oid : NULL;
 		}
 
 		strbuf_add(base, path, pathlen);
@@ -491,7 +491,7 @@ static struct combine_diff_path *ll_diff_tree_paths(
 						continue;
 
 					/* diff(t,pi) != ø */
-					if (!oideq(t.entry.oid, tp[i].entry.oid) ||
+					if (!oideq(&t.entry.oid, &tp[i].entry.oid) ||
 					    (t.entry.mode != tp[i].entry.mode))
 						continue;
 
diff --git a/tree-walk.c b/tree-walk.c
index 1e040fc20e..56b951d3cb 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -48,7 +48,8 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
 	/* Initialize the descriptor entry */
 	desc->entry.path = path;
 	desc->entry.mode = canon_mode(mode);
-	desc->entry.oid  = (const struct object_id *)(path + len);
+	desc->entry.pathlen = len - 1;
+	hashcpy(desc->entry.oid.hash, (const unsigned char *)path + len);
 
 	return 0;
 }
@@ -107,7 +108,7 @@ static void entry_extract(struct tree_desc *t, struct name_entry *a)
 static int update_tree_entry_internal(struct tree_desc *desc, struct strbuf *err)
 {
 	const void *buf = desc->buffer;
-	const unsigned char *end = desc->entry.oid->hash + the_hash_algo->rawsz;
+	const unsigned char *end = (const unsigned char *)desc->entry.path + desc->entry.pathlen + 1 + the_hash_algo->rawsz;
 	unsigned long size = desc->size;
 	unsigned long len = end - (const unsigned char *)buf;
 
@@ -175,9 +176,11 @@ void setup_traverse_info(struct traverse_info *info, const char *base)
 		pathlen--;
 	info->pathlen = pathlen ? pathlen + 1 : 0;
 	info->name.path = base;
-	info->name.oid = (void *)(base + pathlen + 1);
-	if (pathlen)
+	info->name.pathlen = pathlen;
+	if (pathlen) {
+		hashcpy(info->name.oid.hash, (const unsigned char *)base + pathlen + 1);
 		info->prev = &dummy;
+	}
 }
 
 char *make_traverse_path(char *path, const struct traverse_info *info, const struct name_entry *n)
diff --git a/tree-walk.h b/tree-walk.h
index 196831007e..08d349c54f 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -1,11 +1,14 @@
 #ifndef TREE_WALK_H
 #define TREE_WALK_H
 
+#include "cache.h"
+
 struct strbuf;
 
 struct name_entry {
-	const struct object_id *oid;
+	struct object_id oid;
 	const char *path;
+	int pathlen;
 	unsigned int mode;
 };
 
@@ -19,12 +22,12 @@ static inline const struct object_id *tree_entry_extract(struct tree_desc *desc,
 {
 	*pathp = desc->entry.path;
 	*modep = desc->entry.mode;
-	return desc->entry.oid;
+	return &desc->entry.oid;
 }
 
 static inline int tree_entry_len(const struct name_entry *ne)
 {
-	return (const char *)ne->oid - ne->path - 1;
+	return ne->pathlen;
 }
 
 /*
diff --git a/tree.c b/tree.c
index 215d3fdc7c..7badea0009 100644
--- a/tree.c
+++ b/tree.c
@@ -84,7 +84,7 @@ static int read_tree_1(struct tree *tree, struct strbuf *base,
 				continue;
 		}
 
-		switch (fn(entry.oid, base,
+		switch (fn(&entry.oid, base,
 			   entry.path, entry.mode, stage, context)) {
 		case 0:
 			continue;
@@ -95,19 +95,19 @@ static int read_tree_1(struct tree *tree, struct strbuf *base,
 		}
 
 		if (S_ISDIR(entry.mode))
-			oidcpy(&oid, entry.oid);
+			oidcpy(&oid, &entry.oid);
 		else if (S_ISGITLINK(entry.mode)) {
 			struct commit *commit;
 
-			commit = lookup_commit(the_repository, entry.oid);
+			commit = lookup_commit(the_repository, &entry.oid);
 			if (!commit)
 				die("Commit %s in submodule path %s%s not found",
-				    oid_to_hex(entry.oid),
+				    oid_to_hex(&entry.oid),
 				    base->buf, entry.path);
 
 			if (parse_commit(commit))
 				die("Invalid commit %s in submodule path %s%s",
-				    oid_to_hex(entry.oid),
+				    oid_to_hex(&entry.oid),
 				    base->buf, entry.path);
 
 			oidcpy(&oid, get_commit_tree_oid(commit));
diff --git a/unpack-trees.c b/unpack-trees.c
index 6d53cbfc86..734828fda2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -679,7 +679,7 @@ static int switch_cache_bottom(struct traverse_info *info)
 
 static inline int are_same_oid(struct name_entry *name_j, struct name_entry *name_k)
 {
-	return name_j->oid && name_k->oid && oideq(name_j->oid, name_k->oid);
+	return !is_null_oid(&name_j->oid) && !is_null_oid(&name_k->oid) && oideq(&name_j->oid, &name_k->oid);
 }
 
 static int all_trees_same_as_cache_tree(int n, unsigned long dirmask,
@@ -857,7 +857,7 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 		else {
 			const struct object_id *oid = NULL;
 			if (dirmask & 1)
-				oid = names[i].oid;
+				oid = &names[i].oid;
 			buf[nr_buf++] = fill_tree_descriptor(t + i, oid);
 		}
 	}
@@ -981,7 +981,7 @@ static struct cache_entry *create_ce_entry(const struct traverse_info *info,
 	ce->ce_mode = create_ce_mode(n->mode);
 	ce->ce_flags = create_ce_flags(stage);
 	ce->ce_namelen = len;
-	oidcpy(&ce->oid, n->oid);
+	oidcpy(&ce->oid, &n->oid);
 	make_traverse_path(ce->name, info, n);
 
 	return ce;
diff --git a/walker.c b/walker.c
index 96990d84da..d74ae59c77 100644
--- a/walker.c
+++ b/walker.c
@@ -50,13 +50,13 @@ static int process_tree(struct walker *walker, struct tree *tree)
 			continue;
 		if (S_ISDIR(entry.mode)) {
 			struct tree *tree = lookup_tree(the_repository,
-							entry.oid);
+							&entry.oid);
 			if (tree)
 				obj = &tree->object;
 		}
 		else {
 			struct blob *blob = lookup_blob(the_repository,
-							entry.oid);
+							&entry.oid);
 			if (blob)
 				obj = &blob->object;
 		}

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes
  2019-01-15  0:39     ` [PATCH v2 " brian m. carlson
                         ` (3 preceding siblings ...)
  2019-01-15  0:39       ` [PATCH v2 4/5] tree-walk: store object_id in a separate member brian m. carlson
@ 2019-01-15  0:39       ` brian m. carlson
  2019-01-15 17:51       ` [PATCH v2 0/5] tree-walk object_id refactor Junio C Hamano
  5 siblings, 0 replies; 41+ messages in thread
From: brian m. carlson @ 2019-01-15  0:39 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren, Junio C Hamano, Jeff King

There are some situations in which we want to store an object ID into
struct object_id without the_hash_algo necessarily being set correctly.
One such case is when cloning a repository, where we must read refs from
the remote side without having a repository from which to read the
preferred algorithm.

In this cases, we may have the_hash_algo set to SHA-1, which is the
default, but read refs into struct object_id that are SHA-256. When
copying these values, we will want to copy them completely, not just the
first 20 bytes. Consequently, make sure that oidcpy copies the maximum
number of bytes at all times, regardless of the setting of
the_hash_algo.

Since oidcpy and hashcpy are no longer functionally identical, remove
the Cocinelle object_id transformations that convert from one into the
other.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 cache.h                            |  2 +-
 contrib/coccinelle/object_id.cocci | 30 ------------------------------
 2 files changed, 1 insertion(+), 31 deletions(-)

diff --git a/cache.h b/cache.h
index ca36b44ee0..8b8c6a1a2a 100644
--- a/cache.h
+++ b/cache.h
@@ -1072,7 +1072,7 @@ static inline void hashcpy(unsigned char *sha_dst, const unsigned char *sha_src)
 
 static inline void oidcpy(struct object_id *dst, const struct object_id *src)
 {
-	hashcpy(dst->hash, src->hash);
+	memcpy(dst->hash, src->hash, GIT_MAX_RAWSZ);
 }
 
 static inline struct object_id *oiddup(const struct object_id *src)
diff --git a/contrib/coccinelle/object_id.cocci b/contrib/coccinelle/object_id.cocci
index 6a7cf3e02d..3e536a9834 100644
--- a/contrib/coccinelle/object_id.cocci
+++ b/contrib/coccinelle/object_id.cocci
@@ -86,36 +86,6 @@ struct object_id OID;
 - hashcmp(OID.hash, OIDPTR->hash)
 + oidcmp(&OID, OIDPTR)
 
-@@
-struct object_id OID1, OID2;
-@@
-- hashcpy(OID1.hash, OID2.hash)
-+ oidcpy(&OID1, &OID2)
-
-@@
-identifier f != oidcpy;
-struct object_id *OIDPTR1;
-struct object_id *OIDPTR2;
-@@
-  f(...) {<...
-- hashcpy(OIDPTR1->hash, OIDPTR2->hash)
-+ oidcpy(OIDPTR1, OIDPTR2)
-  ...>}
-
-@@
-struct object_id *OIDPTR;
-struct object_id OID;
-@@
-- hashcpy(OIDPTR->hash, OID.hash)
-+ oidcpy(OIDPTR, &OID)
-
-@@
-struct object_id *OIDPTR;
-struct object_id OID;
-@@
-- hashcpy(OID.hash, OIDPTR->hash)
-+ oidcpy(&OID, OIDPTR)
-
 @@
 struct object_id *OIDPTR1;
 struct object_id *OIDPTR2;

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 0/5] tree-walk object_id refactor
  2019-01-15  0:39     ` [PATCH v2 " brian m. carlson
                         ` (4 preceding siblings ...)
  2019-01-15  0:39       ` [PATCH v2 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes brian m. carlson
@ 2019-01-15 17:51       ` Junio C Hamano
  5 siblings, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2019-01-15 17:51 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git, Martin Ågren, Jeff King

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> There are a small number of places in our codebase where we cast a
> buffer of unsigned char to a struct object_id pointer. When we have
> GIT_MAX_RAWSZ set to 32 (because we have SHA-256), one of these places
> (the buffer for tree objects) can lead to us copying too much data when
> using SHA-1 as the hash, since there are only 20 bytes to read.

Thanks.  And thanks for a pleasant-to-follow discussion during the
review of the previous round.

> Changes from v1:
> * Use hashcpy instead of memcpy.
> * Adopt Peff's suggestion for improving patch 3.
>
> brian m. carlson (5):
>   tree-walk: copy object ID before use
>   match-trees: compute buffer offset correctly when splicing
>   match-trees: use hashcpy to splice trees
>   tree-walk: store object_id in a separate member
>   cache: make oidcpy always copy GIT_MAX_RAWSZ bytes
>
>  builtin/grep.c                     |  8 ++++----
>  builtin/merge-tree.c               | 20 ++++++++++----------
>  builtin/pack-objects.c             |  4 ++--
>  builtin/reflog.c                   |  4 ++--
>  cache-tree.c                       |  4 ++--
>  cache.h                            |  2 +-
>  contrib/coccinelle/object_id.cocci | 30 ------------------------------
>  delta-islands.c                    |  2 +-
>  fsck.c                             |  4 ++--
>  http-push.c                        |  4 ++--
>  list-objects.c                     |  6 +++---
>  match-trees.c                      | 27 ++++++++++++++++++++-------
>  notes.c                            |  4 ++--
>  packfile.c                         |  2 +-
>  revision.c                         |  4 ++--
>  tree-diff.c                        |  6 +++---
>  tree-walk.c                        | 21 ++++++++++++---------
>  tree-walk.h                        |  9 ++++++---
>  tree.c                             | 10 +++++-----
>  unpack-trees.c                     |  6 +++---
>  walker.c                           |  4 ++--
>  21 files changed, 85 insertions(+), 96 deletions(-)

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, back to index

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-07 23:34 What's cooking in git.git (Jan 2019, #01; Mon, 7) Junio C Hamano
2019-01-08  9:50 ` tg/checkout-no-overlay, was " Thomas Gummerer
2019-01-08 17:51   ` Junio C Hamano
2019-01-08 17:30 ` ag/sequencer-reduce-rewriting-todo " Alban Gruin
2019-01-08 21:20 ` sb/more-repo-in-api, was " Jonathan Tan
2019-01-08 21:35   ` Junio C Hamano
2019-01-09 21:28     ` Stefan Beller
2019-01-09  7:37 ` Martin Ågren
2019-01-09 21:06   ` Martin Ågren
2019-01-10  1:02     ` brian m. carlson
2019-01-10 18:55       ` Junio C Hamano
2019-01-10 19:03       ` Martin Ågren
2019-01-10  4:25     ` [PATCH 0/5] tree-walk object_id refactor brian m. carlson
2019-01-10  4:25       ` [PATCH 1/5] tree-walk: copy object ID before use brian m. carlson
2019-01-10  4:25       ` [PATCH 2/5] match-trees: compute buffer offset correctly when splicing brian m. carlson
2019-01-10  4:25       ` [PATCH 3/5] match-trees: use hashcpy to splice trees brian m. carlson
2019-01-10  6:45         ` Jeff King
2019-01-10 23:55           ` brian m. carlson
2019-01-11 14:51             ` Jeff King
2019-01-11 14:54               ` Jeff King
2019-01-14  1:30                 ` brian m. carlson
2019-01-14 15:40                   ` Jeff King
2019-01-10  4:25       ` [PATCH 4/5] tree-walk: store object_id in a separate member brian m. carlson
2019-01-10  6:49         ` Jeff King
2019-01-10 23:57           ` brian m. carlson
2019-01-10  4:25       ` [PATCH 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes brian m. carlson
2019-01-10  6:50         ` Jeff King
2019-01-10  6:40       ` [PATCH 0/5] tree-walk object_id refactor Jeff King
2019-01-11  0:17         ` brian m. carlson
2019-01-11 14:17           ` Jeff King
2019-01-15  0:39     ` [PATCH v2 " brian m. carlson
2019-01-15  0:39       ` [PATCH v2 1/5] tree-walk: copy object ID before use brian m. carlson
2019-01-15  0:39       ` [PATCH v2 2/5] match-trees: compute buffer offset correctly when splicing brian m. carlson
2019-01-15  0:39       ` [PATCH v2 3/5] match-trees: use hashcpy to splice trees brian m. carlson
2019-01-15  0:39       ` [PATCH v2 4/5] tree-walk: store object_id in a separate member brian m. carlson
2019-01-15  0:39       ` [PATCH v2 5/5] cache: make oidcpy always copy GIT_MAX_RAWSZ bytes brian m. carlson
2019-01-15 17:51       ` [PATCH v2 0/5] tree-walk object_id refactor Junio C Hamano
2019-01-09 10:28 ` What's cooking in git.git (Jan 2019, #01; Mon, 7) Jeff King
2019-01-10 19:05   ` Junio C Hamano
2019-01-10 19:46   ` Junio C Hamano
2019-01-10 18:02 ` Stefan Beller

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox