git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* What's cooking in git.git (Sep 2022, #08; Tue, 27)
@ 2022-09-27 21:11 Junio C Hamano
  2022-09-28  1:52 ` Victoria Dye
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2022-09-27 21:11 UTC (permalink / raw)
  To: git

Here are the topics that have been cooking in my tree.  Commits
prefixed with '+' are in 'next' (being in 'next' is a sign that a
topic is stable enough to be used and are candidate to be in a future
release).  Commits prefixed with '-' are only in 'seen', and aren't
considered "accepted" at all.  A topic without enough support may be
discarded after a long period of no activity.

The -rc2 has been tagged for this cycle.  People are free to discuss
topics that has no relevance to the upcoming release, but we would
appreciate if they instead concentrated on finding and fixing recent
regressions in the upcoming release.  Even though I may be replacing
topics in 'seen' with their new iterations, I may not be picking up
patches on new topics to 'seen', until the final around the
beginning of the next month (cf. https://tinyurl.com/gitCal).  These
patches are welcome to come back in a more polished form after that
happens (read: discussions on them are not forbidden. just allow me
to leave my tree less distracted by new topics).

Copies of the source code to Git live in many repositories, and the
following is a list of the ones I push into or their mirrors.  Some
repositories have only a subset of branches.

With maint, master, next, seen, todo:

	git://git.kernel.org/pub/scm/git/git.git/
	git://repo.or.cz/alt-git.git/
	https://kernel.googlesource.com/pub/scm/git/git/
	https://github.com/git/git/
	https://gitlab.com/git-vcs/git/

With all the integration branches and topics broken out:

	https://github.com/gitster/git/

Even though the preformatted documentation in HTML and man format
are not sources, they are published in these repositories for
convenience (replace "htmldocs" with "manpages" for the manual
pages):

	git://git.kernel.org/pub/scm/git/git-htmldocs.git/
	https://github.com/gitster/git-htmldocs.git/

Release tarballs are available at:

	https://www.kernel.org/pub/software/scm/git/

--------------------------------------------------
[Graduated to 'master']

* ds/bitmap-lookup-remove-tracing (2022-09-26) 1 commit
  (merged to 'next' on 2022-09-26 at a0d94b95e0)
 + pack-bitmap: remove trace2 region from hot path

 Perf-fix.
 source: <pull.1365.v2.git.1664198277250.gitgitgadget@gmail.com>

--------------------------------------------------
[New Topics]

* rj/ref-filter-get-head-description-leakfix (2022-09-26) 1 commit
 - ref-filter.c: fix a leak in get_head_description

 Leakfix.

 Will merge to 'next'?
 source: <6ff29e96-7f8d-c354-dced-b1b363e54467@gmail.com>

--------------------------------------------------
[Stalled]

* es/mark-gc-cruft-as-experimental (2022-08-03) 2 commits
 - config: let feature.experimental imply gc.cruftPacks=true
 - gc: add tests for --cruft and friends

 Enable gc.cruftpacks by default for those who opt into
 feature.experimental setting.

 Expecting a reroll.
 cf. <220804.86a68ke9d5.gmgdl@evledraar.gmail.com>
 cf. <6803b725-526e-a1c8-f15c-a9ed4a144d4c@github.com>
 source: <20220803205721.3686361-1-emilyshaffer@google.com>


* es/doc-creation-factor-fix (2022-07-28) 2 commits
 - range-diff: clarify --creation-factor=<factor>
 - format-patch: clarify --creation-factor=<factor>

 Expecting a reroll by somebody more familiar with the logic
 cf. <xmqqo7wfix7p.fsf@gitster.g>
 source: <7229p500-p2r4-on87-6802-8o90s36rr3s4@tzk.qr>


* cw/remote-object-info (2022-08-13) 7 commits
 - SQUASH???
 - cat-file: add remote-object-info to batch-command
 - transport: add client support for object-info
 - serve: advertise object-info feature
 - protocol-caps: initialization bug fix
 - fetch-pack: move fetch initialization
 - fetch-pack: refactor packet writing

 A client component to talk with the object-info endpoint.

 Expecting a reroll.
 cf. <20220728230210.2952731-1-calvinwan@google.com>
 cf. <CAFySSZDvgwbbHCHfyuaqX3tKsr-GjJ9iihygg6rNNe46Ys7_EA@mail.gmail.com>
 source: <20220728230210.2952731-1-calvinwan@google.com>

--------------------------------------------------
[Cooking]

* ds/scalar-unregister-idempotent (2022-09-27) 4 commits
 - string-list: document iterator behavior on NULL input
 - gc: replace config subprocesses with API calls
 - scalar: make 'unregister' idempotent
 - maintenance: add 'unregister --force'

 "git maintenance unregister" in a repository that is already been
 unregistered reported an error.

 Will merge to 'next'.
 source: <pull.1358.v4.git.1664287021.gitgitgadget@gmail.com>


* jk/clone-allow-bare-and-o-together (2022-09-22) 1 commit
  (merged to 'next' on 2022-09-27 at 1feca721ac)
 + clone: allow "--bare" with "-o"

 "git clone" did not like to see the "--bare" and the "--origin"
 options used together without a good reason.

 Will cook in 'next'.
 source: <YyvzVdfQVdysvMp2@coredump.intra.peff.net>


* jk/fsck-on-diet (2022-09-22) 3 commits
  (merged to 'next' on 2022-09-27 at c2e93082a1)
 + parse_object_buffer(): respect save_commit_buffer
 + fsck: turn off save_commit_buffer
 + fsck: free tree buffers after walking unreachable objects

 "git fsck" failed to release contents of tree objects already used
 from the memory, which has been fixed.

 Will cook in 'next'.
 source: <Yyw0PSVe3YTQGgRS@coredump.intra.peff.net>


* jk/remote-rename-without-fetch-refspec (2022-09-22) 1 commit
  (merged to 'next' on 2022-09-27 at 165fe0a832)
 + remote: handle rename of remote without fetch refspec

 "git remote rename" failed to rename a remote without fetch
 refspec, which has been corrected.

 Will cook in 'next'.
 source: <YyvzqZ5tmI0UdRAW@coredump.intra.peff.net>


* js/merge-ort-in-read-only-repo (2022-09-26) 2 commits
 - merge-ort: return early when failing to write a blob
 - merge-ort: fix segmentation fault in read-only repositories

 In read-only repositories, "git merge-tree" tried to come up with a
 merge result tree object, which it failed (which is not wrong) and
 led to a segfault (which is bad), which has been corrected.

 Expecting a reroll.
 cf. <CABPp-BGJueKtcdvzGWH_ZK6yeA6r3457ue5Uub9_gdU5M-WmvQ@mail.gmail.com>
 The proposed log message for the second step may need updating.
 source: <pull.1362.v4.git.1664229348.gitgitgadget@gmail.com>


* mc/cred-helper-ignore-unknown (2022-09-22) 3 commits
  (merged to 'next' on 2022-09-27 at ce6e3616aa)
 + osxkeychain: clarify that we ignore unknown lines
 + netrc: ignore unknown lines (do not die)
 + wincred: ignore unknown lines (do not die)

 Most credential helpers ignored unknown entries in a credential
 description, but a few died upon seeing them.  The latter were
 taught to ignore them, too

 Will cook in 'next'.
 source: <pull.1363.git.1663865974.gitgitgadget@gmail.com>


* es/retire-efgrep (2022-09-23) 2 commits
  (merged to 'next' on 2022-09-27 at 344fdd138e)
 + check-non-portable-shell: detect obsolescent egrep/fgrep
 + Merge branch 'dd/retire-efgrep' into es/retire-efgrep
 (this branch uses dd/retire-efgrep.)

 Prepare for GNU [ef]grep that throw warning of their uses.

 Will cook in 'next'.
 source: <pull.1338.git.git.1663805905554.gitgitgadget@gmail.com>


* cw/submodule-status-in-parallel (2022-09-23) 4 commits
 - diff-lib: parallelize run_diff_files for submodules
 - diff-lib: refactor functions
 - submodule: move status parsing into function
 - run-command: add pipe_output to run_processes_parallel

 Allow the internal "diff-files" engine to run "how has this
 submodule changed?" in parallel to speed up "git status".

 Breaks its self check.
 cf. https://github.com/git/git/actions/runs/3115673002/jobs/5052804463
 source: <20220922232947.631309-1-calvinwan@google.com>


* vd/fix-unaligned-read-index-v4 (2022-09-23) 1 commit
 - read-cache: avoid misaligned reads in index v4

 The codepath that reads from the index v4 had unaligned memory
 accesses, which has been corrected.

 Expecting a reroll.
 cf. <Yy4nkEnhuzt2iH+R@coredump.intra.peff.net>
 cf. <bb3a2470-7ff5-e4a6-040a-96e0e3833978@gmail.com>
 source: <pull.1366.git.1663962236069.gitgitgadget@gmail.com>


* dd/retire-efgrep (2022-09-21) 4 commits
  (merged to 'next' on 2022-09-21 at 22bc339be1)
 + t: convert fgrep usage to "grep -F"
 + t: convert egrep usage to "grep -E"
 + t: remove \{m,n\} from BRE grep usage
 + CodingGuidelines: allow grep -E
 (this branch is used by es/retire-efgrep.)

 Prepare for GNU [ef]grep that throw warning of their uses.

 Will cook in 'next'.
 source: <cover.1663765176.git.congdanhqx@gmail.com>


* tb/midx-repack-ignore-cruft-packs (2022-09-21) 7 commits
 - midx.c: avoid cruft packs with non-zero `repack --batch-size`
 - midx.c: remove unnecessary loop condition
 - midx.c: replace `xcalloc()` with `CALLOC_ARRAY()`
 - midx.c: avoid cruft packs with `repack --batch-size=0`
 - midx.c: prevent `expire` from removing the cruft pack
 - Documentation/git-multi-pack-index.txt: clarify expire behavior
 - Documentation/git-multi-pack-index.txt: fix typo

 "git multi-pack-index repack/expire" used to repack unreachable
 cruft into a new pack, which have been corrected.

 Will merge to 'next'?
 cf. <63a1c3d4-eff3-af10-4263-058c88e74594@github.com>
 source: <cover.1663638929.git.me@ttaylorr.com>


* ah/fsmonitor-daemon-usage-non-l10n (2022-09-21) 1 commit
  (merged to 'next' on 2022-09-21 at bc69a73c6c)
 + fsmonitor--daemon: don't translate literal commands

 Fix messages incorrectly marked for translation.

 Will cook in 'next'.
 source: <20220920050709.326359-1-alexhenrie24@gmail.com>


* so/diff-merges-cleanup (2022-09-16) 3 commits
  (merged to 'next' on 2022-09-22 at 57694bbed7)
 + diff-merges: clarify log.diffMerges documentation
 + diff-merges: cleanup set_diff_merges()
 + diff-merges: cleanup func_by_opt()

 Code clean-up.

 Will cook in 'next'.
 source: <20220914193102.5275-1-sorganov@gmail.com>


* ja/rebase-i-avoid-amending-self (2022-09-26) 1 commit
 - sequencer: avoid dropping fixup commit that targets self via commit-ish

 "git rebase -i" can mistakenly attempt to apply a fixup to a commit
 itself, which has been corrected.

 Will merge to 'next'?
 source: <20220924222904.1784975-1-aclopte@gmail.com>


* ac/fuzzers (2022-09-19) 1 commit
 - fuzz: reorganise the path for existing oss-fuzz fuzzers

 Source file shuffling.

 Will merge to 'next'?
 source: <pull.1353.v4.git.1663598215154.gitgitgadget@gmail.com>


* hn/parse-worktree-ref (2022-09-19) 1 commit
 - refs: unify parse_worktree_ref() and ref_type()

 Code and semantics cleaning.

 Will merge to 'next'?
 source: <pull.1325.v2.git.git.1663605291172.gitgitgadget@gmail.com>


* ed/fsmonitor-on-networked-macos (2022-09-25) 6 commits
 - fsmonitor: add documentation for allowRemote and socketDir options
 - fsmonitor: check for compatability before communicating with fsmonitor
 - fsmonitor: deal with synthetic firmlinks on macOS
 - fsmonitor: avoid socket location check if using hook
 - fsmonitor: relocate socket file if .git directory is remote
 - fsmonitor: refactor filesystem checks to common interface

 By default, use of fsmonitor on a repository on networked
 filesystem is disabled. Add knobs to make it workable on macOS.

 Will merge to 'next'?
 source: <pull.1326.v12.git.1664048782.gitgitgadget@gmail.com>


* rj/branch-edit-description-with-nth-checkout (2022-09-12) 2 commits
 - branch: support for shortcuts like @{-1} completed
 - branch: refactor "edit_description" code path

 "git branch --edit-description @{-1}" is now a way to edit branch
 description of the branch you were on before switching to the
 current branch.

 Needs review.
 source: <7abdb5a9-5707-7897-4196-8d2892beeb81@gmail.com>


* mj/credential-helper-auth-headers (2022-09-13) 8 commits
 - http: set specific auth scheme depending on credential
 - http: move proactive auth to first slot creation
 - http: store all request headers on active_request_slot
 - credential: add WWW-Authenticate header to cred requests
 - http: read HTTP WWW-Authenticate response headers
 - osxkeychain: clarify that we ignore unknown lines
 - netrc: ignore unknown lines (do not die)
 - wincred: ignore unknown lines (do not die)

 RFC
 source: <pull.1352.git.1663097156.gitgitgadget@gmail.com>


* jc/environ-docs (2022-09-16) 5 commits
 - environ: GIT_INDEX_VERSION affects not just a new repository
 - environ: simplify description of GIT_INDEX_FILE
 - environ: GIT_FLUSH should be made a usual Boolean
 - environ: explain Boolean environment variables
 - environ: document GIT_SSL_NO_VERIFY

 Documentation on various Boolean GIT_* environment variables have
 been clarified.

 Will merge to 'next'?
 source: <20220915160659.126441-1-gitster@pobox.com>


* rs/diff-caret-bang-with-parents (2022-09-15) 6 commits
 - revision: add parents after child for ^!
 - revision: rename add_parents_only() to add_nth_parent()
 - revision: factor out add_parents()
 - revision: factor out add_parent()
 - revision: factor out get_commit()
 - revision: use strtol_i() for exclude_parent

 "git diff rev^!" did not show combined diff to go to the rev from
 its parents.

 Needs review.
 source: <ba6eea28-fb3a-b376-2529-351727c02f1a@web.de>


* ab/doc-synopsis-and-cmd-usage (2022-09-07) 34 commits
 - tests: start asserting that *.txt SYNOPSIS matches -h output
 - doc txt & -h consistency: make "worktree" consistent
 - worktree: define subcommand -h in terms of command -h
 - reflog doc: list real subcommands up-front
 - doc txt & -h consistency: make "commit" consistent
 - doc txt & -h consistency: make "diff-tree" consistent
 - doc txt & -h consistency: use "[<label>...]" for "zero or more"
 - doc txt & -h consistency: make "annotate" consistent
 - doc txt & -h consistency: make "stash" consistent
 - doc txt & -h consistency: add missing options
 - doc txt & -h consistency: use "git foo" form, not "git-foo"
 - doc txt & -h consistency: make "bundle" consistent
 - doc txt & -h consistency: make "read-tree" consistent
 - doc txt & -h consistency: make "rerere" consistent
 - doc txt & -h consistency: add missing options and labels
 - doc txt & -h consistency: make output order consistent
 - doc txt & -h consistency: add or fix optional "--" syntax
 - doc txt & -h consistency: fix mismatching labels
 - t/helper/test-proc-receive.c: use "<options>", not "<options>..."
 - doc txt & -h consistency: use "<options>", not "<options>..."
 - stash doc SYNOPSIS & -h: correct padding around "[]()"
 - doc txt & -h consistency: correct padding around "[]()"
 - doc txt & -h consistency: add missing "]" to bugreport "-h"
 - doc txt & -h consistency: add "-z" to cat-file "-h"
 - doc txt & -h consistency: fix incorrect alternates syntax
 - doc txt & -h consistency: word-wrap
 - built-ins: consistently add "\n" between "usage" and options
 - doc SYNOPSIS & -h: fix incorrect alternates syntax
 - doc SYNOPSIS: consistently use ' for commands
 - doc SYNOPSIS: don't use ' for subcommands
 - blame: use a more detailed usage_msg_optf() error on bad -L
 - bundle: define subcommand -h in terms of command -h
 - builtin/bundle.c: use \t, not fix indentation 2-SP indentation
 - CodingGuidelines: update and clarify command-line conventions

 The short-help text shown by "git cmd -h" and the synopsis text
 shown at the beginning of "git help cmd" have been made more
 consistent.

 Needs review.
 source: <cover-00.34-00000000000-20220902T092734Z-avarab@gmail.com>


* ab/coccicheck-incremental (2022-08-31) 9 commits
 - spatchcache: add a ccache-alike for "spatch"
 - cocci: optimistically use COMPUTE_HEADER_DEPENDENCIES
 - cocci: make "coccicheck" rule incremental
 - cocci: split off "--all-includes" from SPATCH_FLAGS
 - cocci: split off include-less "tests" from SPATCH_FLAGS
 - Makefile: split off SPATCH_BATCH_SIZE comment from "cocci" heading
 - Makefile: have "coccicheck" re-run if flags change
 - Makefile: add ability to TAB-complete cocci *.patch rules
 - cocci rules: remove unused "F" metavariable from pending rule

 "make coccicheck" is time consuming. It has been made to run more
 incrementally.

 Needs review.
 source: <cover-v2-0.9-00000000000-20220831T205130Z-avarab@gmail.com>


* tb/diffstat-with-utf8-strwidth (2022-09-14) 1 commit
 - diff.c: use utf8_strwidth() to count display width

 "git diff --stat" etc. were invented back when everything was ASCII
 and strlen() was a way to measure the display width of a string;
 adjust them to compute the display width assuming UTF-8 pathnames.

 Expecting a reroll.
 source: <20220914151333.3309-1-tboegi@web.de>


* gc/submodule-clone-update-with-branches (2022-08-29) 6 commits
 - clone, submodule update: check out branches
 - submodule--helper: refactor up-to-date criterion
 - submodule: return target of submodule symref
 - t5617: drop references to remote-tracking branches
 - repo-settings: add submodule_propagate_branches
 - clone: teach --detach option

 "git clone --recurse-submodules" and "git submodule update" learns
 to honor the "propagete branches" option.

 Expecting a reroll.
 cf. <20220901200047.515294-1-jonathantanmy@google.com> and others
 source: <pull.1321.git.git.1661806456.gitgitgadget@gmail.com>


* sy/sparse-grep (2022-09-23) 1 commit
 - builtin/grep.c: integrate with sparse index

 "git grep" learned to expand the sparse-index more lazily and on
 demand in a sparse checkout.
 source: <20220923041842.27817-2-shaoxuan.yuan02@gmail.com>


* ds/use-platform-regex-on-macos (2022-08-26) 1 commit
  (merged to 'next' on 2022-09-14 at 80905596d8)
 + grep: fix multibyte regex handling under macOS

 With a bit of header twiddling, use the native regexp library on
 macOS instead of the compat/ one.

 Will cook in 'next'.
 cf. <xmqqzgf389k9.fsf@gitster.g>
 source: <20220826085815.2771102-1-dds@aueb.gr>


* ds/bundle-uri-3 (2022-09-09) 10 commits
 - bundle-uri: fetch a list of bundles
 - bundle-uri: limit recursion depth for bundle lists
 - bundle-uri: parse bundle list in config format
 - bundle-uri: unit test "key=value" parsing
 - bundle-uri: create "key=value" line parsing
 - bundle-uri: create base key-value pair parsing
 - bundle-uri: create bundle_list struct and helpers
 - bundle-uri: use plain string in find_temp_filename()
 - bundle-uri: short-circuit capability parsing
 - Merge branch 'ds/bundle-uri-clone' into ds/bundle-uri-3

 Define the logical elements of a "bundle list", data structure to
 store them in-core, format to transfer them, and code to parse
 them.

 Needs review.
 source: <pull.1333.v2.git.1662734015.gitgitgadget@gmail.com>


* js/cmake-updates (2022-08-24) 5 commits
 - cmake: increase time-out for a long-running test
 - cmake: avoid editing t/test-lib.sh
 - add -p: avoid ambiguous signed/unsigned comparison
 - cmake: copy the merge tools for testing
 - cmake: make it easier to diagnose regressions in CTest runs

 Update to build procedure with VS using CMake/CTest.

 Expecting a reroll.
 cf. <3df77ffd-85a2-3a54-9005-34a24ec6e82d@github.com>
 cf. <531620e1-de4c-74aa-c840-c12ce81f8740@github.com> and others
 source: <pull.1320.v2.git.1661243463.gitgitgadget@gmail.com>


* pw/rebase-keep-base-fixes (2022-09-07) 7 commits
 - rebase --keep-base: imply --no-fork-point
 - rebase --keep-base: imply --reapply-cherry-picks
 - rebase: factor out branch_base calculation
 - rebase: rename merge_base to branch_base
 - rebase: store orig_head as a commit
 - t3416: set $EDITOR in subshell
 - t3416: tighten two tests

 "git rebase --keep-base" used to discard the commits that are
 already cherry-picked to the upstream, even when "keep-base" meant
 that the base, on top of which the history is being rebuilt, does
 not yet include these cherry-picked commits.  The --keep-base
 option now implies --reapply-cherry-picks and --no-fork-point
 options.

 Expecting a reroll.
 cf. <e25127f3-6135-b716-a12f-5dbe4f40dc42@gmail.com>
 source: <pull.1323.v2.git.1662561470.gitgitgadget@gmail.com>


* ag/merge-strategies-in-c (2022-08-10) 14 commits
 - sequencer: use the "octopus" strategy without forking
 - sequencer: use the "resolve" strategy without forking
 - merge: use the "octopus" strategy without forking
 - merge: use the "resolve" strategy without forking
 - merge-octopus: rewrite in C
 - merge-recursive: move better_branch_name() to merge.c
 - merge-resolve: rewrite in C
 - merge-one-file: rewrite in C
 - update-index: move add_cacheinfo() to read-cache.c
 - merge-index: add a new way to invoke `git-merge-one-file'
 - merge-index: drop the index
 - merge-index: libify merge_one_path() and merge_all()
 - t6060: add tests for removed files
 - t6060: modify multiple files to expose a possible issue with merge-index

 An attempt to rewrite remaining merge strategies from shell to C.

 Needs more work.
 At the minimum, we should lose 11/14 and possibly 08/14.
 cf. <xmqq7d36vfur.fsf@gitster.g>
 source: <20220809185429.20098-1-alban.gruin@gmail.com>


* po/glossary-around-traversal (2022-07-09) 3 commits
 - glossary: add reachability bitmap description
 - glossary: add commit graph description
 - glossary: add Object DataBase (ODB) abbreviation

 The glossary entries for "commit-graph file" and "reachability
 bitmap" have been added.

 Expecting a reroll.
 cf. <dfe0c1ab-33f8-f13e-71ce-1829bb0d2d7f@iee.email>
 source: <pull.1282.git.1657385781.gitgitgadget@gmail.com>


* js/bisect-in-c (2022-08-30) 17 commits
 - bisect: no longer try to clean up left-over `.git/head-name` files
 - bisect: remove Cogito-related code
 - Turn `git bisect` into a full built-in
 - bisect: move even the command-line parsing to `bisect--helper`
 - bisect--helper: make `state` optional
 - bisect--helper: calling `bisect_state()` without an argument is a bug
 - bisect: avoid double-quoting when printing the failed command
 - bisect run: fix the error message
 - bisect: verify that a bogus option won't try to start a bisection
 - bisect--helper: migrate to OPT_SUBCOMMAND()
 - bisect--helper: make the order consistently `argc, argv`
 - bisect--helper: make `terms` an explicit singleton
 - bisect--helper: simplify exit code computation
 - bisect--helper: really retire `--bisect-autostart`
 - bisect--helper: really retire --bisect-next-check
 - bisect--helper: retire the --no-log option
 - Merge branch 'sg/parse-options-subcommand' into js/bisect-in-c

 Final bits of "git bisect.sh" have been rewritten in C.

 Needs review.
 cf. <xmqqv8pr8903.fsf@gitster.g>
 source: <pull.1132.v6.git.1661885419.gitgitgadget@gmail.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: What's cooking in git.git (Sep 2022, #08; Tue, 27)
  2022-09-27 21:11 What's cooking in git.git (Sep 2022, #08; Tue, 27) Junio C Hamano
@ 2022-09-28  1:52 ` Victoria Dye
  2022-09-28  4:18   ` vd/fix-unaligned-read-index-v4, was " Jeff King
  0 siblings, 1 reply; 11+ messages in thread
From: Victoria Dye @ 2022-09-28  1:52 UTC (permalink / raw)
  To: Junio C Hamano, git, Jeff King, Phillip Wood

Junio C Hamano wrote:
> * vd/fix-unaligned-read-index-v4 (2022-09-23) 1 commit
>  - read-cache: avoid misaligned reads in index v4
> 
>  The codepath that reads from the index v4 had unaligned memory
>  accesses, which has been corrected.
> 
>  Expecting a reroll.
>  cf. <Yy4nkEnhuzt2iH+R@coredump.intra.peff.net>
>  cf. <bb3a2470-7ff5-e4a6-040a-96e0e3833978@gmail.com>
>  source: <pull.1366.git.1663962236069.gitgitgadget@gmail.com>
> 

How drastic an update were you expecting for this re-roll? To keep the fix
minimal (that is, focused on 'create_from_disk()'), I was planning to just
add some comments explaining the implementation (in response to [1]). If the
goal is to get this merged quickly, I'd want to avoid a larger refactor
(suggested in [2] & [3]), since doing so would either make the
implementations of "read from disk" ('create_from_disk()') and "write to
disk" ('copy_cache_entry_to_ondisk()') different/difficult to compare, or
would involve a more invasive refactor to update both functions [4].

However, if there's no time pressure (after all, this bug has existed since
the introduction of index v4!), I'm happy to do that refactor. It would
expand the series to 2 or 3 patches, but should address all of the
suggestions made so far and make the code overall a bit clearer.

Let me know what you think!

Thanks,
- Victoria

[1] https://lore.kernel.org/git/bb3a2470-7ff5-e4a6-040a-96e0e3833978@gmail.com/
[2] https://lore.kernel.org/git/Yy4nkEnhuzt2iH+R@coredump.intra.peff.net/
[3] https://lore.kernel.org/git/YzH+IPFBGleIsAUe@coredump.intra.peff.net/
[4] https://lore.kernel.org/git/e5954e90-6b5c-46a6-0842-b3d7d1e06b33@github.com/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* vd/fix-unaligned-read-index-v4, was Re: What's cooking in git.git (Sep 2022, #08; Tue, 27)
  2022-09-28  1:52 ` Victoria Dye
@ 2022-09-28  4:18   ` Jeff King
  2022-09-28  4:19     ` [PATCH 1/3] pack-bitmap: make read_be32() public Jeff King
                       ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Jeff King @ 2022-09-28  4:18 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Junio C Hamano, git, Phillip Wood

On Tue, Sep 27, 2022 at 06:52:02PM -0700, Victoria Dye wrote:

> Junio C Hamano wrote:
> > * vd/fix-unaligned-read-index-v4 (2022-09-23) 1 commit
> >  - read-cache: avoid misaligned reads in index v4
> > 
> >  The codepath that reads from the index v4 had unaligned memory
> >  accesses, which has been corrected.
> > 
> >  Expecting a reroll.
> >  cf. <Yy4nkEnhuzt2iH+R@coredump.intra.peff.net>
> >  cf. <bb3a2470-7ff5-e4a6-040a-96e0e3833978@gmail.com>
> >  source: <pull.1366.git.1663962236069.gitgitgadget@gmail.com>
> > 
> 
> How drastic an update were you expecting for this re-roll? To keep the fix
> minimal (that is, focused on 'create_from_disk()'), I was planning to just
> add some comments explaining the implementation (in response to [1]). If the
> goal is to get this merged quickly, I'd want to avoid a larger refactor
> (suggested in [2] & [3]), since doing so would either make the
> implementations of "read from disk" ('create_from_disk()') and "write to
> disk" ('copy_cache_entry_to_ondisk()') different/difficult to compare, or
> would involve a more invasive refactor to update both functions [4].

The first "cf" there is my initial request for a v2, but I since
retracted that. I have no objection to adding more comments, but I am
happy enough without them (like Junio, it may be that I'm overly
familiar with how I expect our get_be() functions to handle alignment).

I think even if we want to go further in the near term, it is still best
to build it on top of your basic fix. And we can take that fix and make
further refactoring (if any) its own topic.

I don't think even that basic fix needs to happen before the release,
though.  While it is a slight regression that SANITIZE=undefined does
not pass in the 2.38 release candidates:

  - the bug really was there all along. It's just that a new test
    triggers it.

  - I don't think it's a sign of actual problems; we are forming an
    unaligned pointer via a funny cast, which is technically UB, but I
    don't think any real-world platforms actually care, since we
    dereference it only via get_be16().

So it's mostly just a minor annoyance for running the tests; we're
probably better not to change any code, even trivially, this late in the
release cycle.

I was going to point to my branch with commits that are slightly less
rough than what I posted, for somebody to work on it later if
interested. But after applying a minimal amount of polish, I think they
are pretty reasonable, so I'll actually share them here.

IMHO it would be OK to proceed with them even if we don't redo the
writing side. But I can see the argument that they should come together.

I'm also OK if we just drop it. It was a fun puzzle to make it all work,
and I do like the result. But arguably it's just churn, and in
particular I haven't measured the re-ordering patch to see if it
produces a performance difference. I doubt it does, but it's important
enough that we should get a clear answer.

So here are my refactoring patches for the read side, built on top of
your earlier patch. They resolve the "yuck" comment by storing and
copying individual fields (which is much less verbose than you'd think
because of struct assignment). I left the pointer type as signed; there
are still come casts, but the same ones that were necessary before my
series.

  [1/3]: pack-bitmap: make read_be32() public
  [2/3]: read-cache: read on-disk entries in byte order
  [3/3]: read-cache: use read_be32() in create_from_disk()

 compat/bswap.h | 22 ++++++++++++++++++
 pack-bitmap.c  | 12 ----------
 read-cache.c   | 60 +++++++++++++++++++++++++-------------------------
 3 files changed, 52 insertions(+), 42 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] pack-bitmap: make read_be32() public
  2022-09-28  4:18   ` vd/fix-unaligned-read-index-v4, was " Jeff King
@ 2022-09-28  4:19     ` Jeff King
  2022-09-28  4:21     ` [PATCH 2/3] read-cache: read on-disk entries in byte order Jeff King
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2022-09-28  4:19 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Junio C Hamano, git, Phillip Wood

It's a handy shortcut for other in-order parsers of binary formats.
We'll drag along read_be8(), as well, for consistency, and add
read_be16() to round out the set.

We'll also switch the signature to take a void pointer. This lets it be
used equally well with either signed or unsigned byte strings (the
get_be functions all cast to unsigned under the hood, so we don't care
either way at this level).

Signed-off-by: Jeff King <peff@peff.net>
---
 compat/bswap.h | 22 ++++++++++++++++++++++
 pack-bitmap.c  | 12 ------------
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/compat/bswap.h b/compat/bswap.h
index 512f6f4b99..6cda2cc50d 100644
--- a/compat/bswap.h
+++ b/compat/bswap.h
@@ -190,4 +190,26 @@ static inline void put_be64(void *ptr, uint64_t value)
 	p[7] = value >>  0;
 }
 
+static inline uint32_t read_be32(const void *vbuffer, size_t *pos)
+{
+	const unsigned char *buffer = vbuffer;
+	uint32_t result = get_be32(buffer + *pos);
+	(*pos) += sizeof(result);
+	return result;
+}
+
+static inline uint16_t read_be16(const void *vbuffer, size_t *pos)
+{
+	const unsigned char *buffer = vbuffer;
+	uint16_t result = get_be16(buffer + *pos);
+	(*pos) += sizeof(result);
+	return result;
+}
+
+static inline uint8_t read_u8(const void *vbuffer, size_t *pos)
+{
+	const unsigned char *buffer = vbuffer;
+	return buffer[(*pos)++];
+}
+
 #endif /* COMPAT_BSWAP_H */
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 440407f1be..51d1e79b70 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -242,18 +242,6 @@ static struct stored_bitmap *store_bitmap(struct bitmap_index *index,
 	return stored;
 }
 
-static inline uint32_t read_be32(const unsigned char *buffer, size_t *pos)
-{
-	uint32_t result = get_be32(buffer + *pos);
-	(*pos) += sizeof(result);
-	return result;
-}
-
-static inline uint8_t read_u8(const unsigned char *buffer, size_t *pos)
-{
-	return buffer[(*pos)++];
-}
-
 #define MAX_XOR_OFFSET 160
 
 static int nth_bitmap_object_oid(struct bitmap_index *index,
-- 
2.38.0.rc2.615.g4fac75f9e3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] read-cache: read on-disk entries in byte order
  2022-09-28  4:18   ` vd/fix-unaligned-read-index-v4, was " Jeff King
  2022-09-28  4:19     ` [PATCH 1/3] pack-bitmap: make read_be32() public Jeff King
@ 2022-09-28  4:21     ` Jeff King
  2022-09-29 11:27       ` Jeff King
  2022-09-28  4:23     ` [PATCH 3/3] read-cache: use read_be32() in create_from_disk() Jeff King
  2022-09-28 16:41     ` vd/fix-unaligned-read-index-v4, was Re: What's cooking in git.git (Sep 2022, #08; Tue, 27) Junio C Hamano
  3 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2022-09-28  4:21 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Junio C Hamano, git, Phillip Wood

An index entry starts with stat data and mode, followed by an oid,
followed by flags, followed by the entry name. We parse this out of
order:

  1. we must read the flags to know how long the name is

  2. we must know how long the name is in order to allocate the
     struct cache_entry, since it has a FLEX_ARRAY

  3. we must allocate the cache_entry in order to parse the stat_data
     and oid into it

This makes the parser a little hard to follow, because we have to access
the flags using an offset, rather than walking through it in byte order.

We can break the cyclic dependency by parsing the stat_data, etc, into
temporary variables, then allocating the cache_entry and copying the
parsed values in. This sets us up for simplifying the parsing in the
next commit.

The downside is that we're copying the data an extra time. It's not very
much data, and it's all fixed size, so the compiler should be able to do
a reasonable job of optimizing here. But I didn't time the potential
impact.

Note one subtlety in the patch: besides reordering the flags/name versus
stat data, we adjust the order within the stat data to match the on-disk
order (notably both fields of each cache_time struct are adjacent).

Signed-off-by: Jeff King <peff@peff.net>
---
 read-cache.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index d16eb97906..efb9efa5ee 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1879,11 +1879,14 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 					    unsigned long *ent_size,
 					    const struct cache_entry *previous_ce)
 {
+	struct stat_data sd;
+	unsigned int mode;
+	struct object_id oid;
 	struct cache_entry *ce;
 	size_t len;
 	const char *name;
 	const unsigned hashsz = the_hash_algo->rawsz;
-	const char *flagsp = ondisk + offsetof(struct ondisk_cache_entry, data) + hashsz;
+	const char *flagsp;
 	unsigned int flags;
 	size_t copy_len = 0;
 	/*
@@ -1895,6 +1898,24 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 	 */
 	int expand_name_field = version == 4;
 
+	sd.sd_ctime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime)
+							+ offsetof(struct cache_time, sec));
+	sd.sd_ctime.nsec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime)
+							 + offsetof(struct cache_time, nsec));
+	sd.sd_mtime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)
+							+ offsetof(struct cache_time, sec));
+	sd.sd_mtime.nsec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)
+							 + offsetof(struct cache_time, nsec));
+	sd.sd_dev   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, dev));
+	sd.sd_ino   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ino));
+	mode        = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mode));
+	sd.sd_uid   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, uid));
+	sd.sd_gid   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, gid));
+	sd.sd_size  = get_be32(ondisk + offsetof(struct ondisk_cache_entry, size));
+
+	oidread(&oid, (const unsigned char *)ondisk + offsetof(struct ondisk_cache_entry, data));
+	flagsp = ondisk + offsetof(struct ondisk_cache_entry, data) + hashsz;
+
 	/* On-disk flags are just 16 bits */
 	flags = get_be16(flagsp);
 	len = flags & CE_NAMEMASK;
@@ -1934,25 +1955,12 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 	}
 
 	ce = mem_pool__ce_alloc(ce_mem_pool, len);
-
-	ce->ce_stat_data.sd_ctime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime)
-							+ offsetof(struct cache_time, sec));
-	ce->ce_stat_data.sd_mtime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)
-							+ offsetof(struct cache_time, sec));
-	ce->ce_stat_data.sd_ctime.nsec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime)
-							 + offsetof(struct cache_time, nsec));
-	ce->ce_stat_data.sd_mtime.nsec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)
-							 + offsetof(struct cache_time, nsec));
-	ce->ce_stat_data.sd_dev   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, dev));
-	ce->ce_stat_data.sd_ino   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ino));
-	ce->ce_mode  = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mode));
-	ce->ce_stat_data.sd_uid   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, uid));
-	ce->ce_stat_data.sd_gid   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, gid));
-	ce->ce_stat_data.sd_size  = get_be32(ondisk + offsetof(struct ondisk_cache_entry, size));
+	ce->ce_stat_data = sd;
+	ce->ce_mode = mode;
 	ce->ce_flags = flags & ~CE_NAMEMASK;
 	ce->ce_namelen = len;
 	ce->index = 0;
-	oidread(&ce->oid, (const unsigned char *)ondisk + offsetof(struct ondisk_cache_entry, data));
+	oidcpy(&ce->oid, &oid);
 
 	if (expand_name_field) {
 		if (copy_len)
-- 
2.38.0.rc2.615.g4fac75f9e3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] read-cache: use read_be32() in create_from_disk()
  2022-09-28  4:18   ` vd/fix-unaligned-read-index-v4, was " Jeff King
  2022-09-28  4:19     ` [PATCH 1/3] pack-bitmap: make read_be32() public Jeff King
  2022-09-28  4:21     ` [PATCH 2/3] read-cache: read on-disk entries in byte order Jeff King
@ 2022-09-28  4:23     ` Jeff King
  2022-09-28 16:41     ` vd/fix-unaligned-read-index-v4, was Re: What's cooking in git.git (Sep 2022, #08; Tue, 27) Junio C Hamano
  3 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2022-09-28  4:23 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Junio C Hamano, git, Phillip Wood

Since the previous patch put our parsing in byte order, we can simplify
the function by keeping a "pos" variable with our current offset into
the on-disk data. That lets us use read_be32() to read and advance the
pos, rather than calculating an offset for each field. This is easier to
read and harder to get wrong.

Likewise, we no longer need the "flagsp" pointer; by the time we read
the flags, our "pos" index has advanced there. And we no longer need a
separate "name" pointer. After parsing the flags, "pos" points there
(and we increment as expected if CE_EXTENDED asks us to read extra
flags).

We can also drop the local "hashsz", as its value is used in only one
place now.

Signed-off-by: Jeff King <peff@peff.net>
---
I guess you could argue that dropping "name" reduces readability, as now
we just say "ondisk + pos". It is nice having only one position pointer,
but we lose the human-readable word "name" which tells what we expect
there. I dunno.

 read-cache.c | 53 ++++++++++++++++++++++------------------------------
 1 file changed, 22 insertions(+), 31 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index efb9efa5ee..a277836ee2 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1884,11 +1884,9 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 	struct object_id oid;
 	struct cache_entry *ce;
 	size_t len;
-	const char *name;
-	const unsigned hashsz = the_hash_algo->rawsz;
-	const char *flagsp;
 	unsigned int flags;
 	size_t copy_len = 0;
+	size_t pos = 0;
 	/*
 	 * Adjacent cache entries tend to share the leading paths, so it makes
 	 * sense to only store the differences in later entries.  In the v4
@@ -1898,42 +1896,35 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 	 */
 	int expand_name_field = version == 4;
 
-	sd.sd_ctime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime)
-							+ offsetof(struct cache_time, sec));
-	sd.sd_ctime.nsec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime)
-							 + offsetof(struct cache_time, nsec));
-	sd.sd_mtime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)
-							+ offsetof(struct cache_time, sec));
-	sd.sd_mtime.nsec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)
-							 + offsetof(struct cache_time, nsec));
-	sd.sd_dev   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, dev));
-	sd.sd_ino   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ino));
-	mode        = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mode));
-	sd.sd_uid   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, uid));
-	sd.sd_gid   = get_be32(ondisk + offsetof(struct ondisk_cache_entry, gid));
-	sd.sd_size  = get_be32(ondisk + offsetof(struct ondisk_cache_entry, size));
-
-	oidread(&oid, (const unsigned char *)ondisk + offsetof(struct ondisk_cache_entry, data));
-	flagsp = ondisk + offsetof(struct ondisk_cache_entry, data) + hashsz;
+	sd.sd_ctime.sec = read_be32(ondisk, &pos);
+	sd.sd_ctime.nsec = read_be32(ondisk, &pos);
+	sd.sd_mtime.sec = read_be32(ondisk, &pos);
+	sd.sd_mtime.nsec = read_be32(ondisk, &pos);
+	sd.sd_dev   = read_be32(ondisk, &pos);
+	sd.sd_ino   = read_be32(ondisk, &pos);
+	mode        = read_be32(ondisk, &pos);
+	sd.sd_uid   = read_be32(ondisk, &pos);
+	sd.sd_gid   = read_be32(ondisk, &pos);
+	sd.sd_size  = read_be32(ondisk, &pos);
+
+	oidread(&oid, (const unsigned char *)ondisk + pos);
+	pos += the_hash_algo->rawsz;
 
 	/* On-disk flags are just 16 bits */
-	flags = get_be16(flagsp);
+	flags = read_be16(ondisk, &pos);
 	len = flags & CE_NAMEMASK;
 
 	if (flags & CE_EXTENDED) {
 		int extended_flags;
-		extended_flags = get_be16(flagsp + sizeof(uint16_t)) << 16;
+		extended_flags = read_be16(ondisk, &pos) << 16;
 		/* We do not yet understand any bit out of CE_EXTENDED_FLAGS */
 		if (extended_flags & ~CE_EXTENDED_FLAGS)
 			die(_("unknown index entry format 0x%08x"), extended_flags);
 		flags |= extended_flags;
-		name = (const char *)(flagsp + 2 * sizeof(uint16_t));
 	}
-	else
-		name = (const char *)(flagsp + sizeof(uint16_t));
 
 	if (expand_name_field) {
-		const unsigned char *cp = (const unsigned char *)name;
+		const unsigned char *cp = (const unsigned char *)ondisk + pos;
 		size_t strip_len, previous_len;
 
 		/* If we're at the beginning of a block, ignore the previous name */
@@ -1945,11 +1936,11 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 					previous_ce->name);
 			copy_len = previous_len - strip_len;
 		}
-		name = (const char *)cp;
+		pos = (const char *)cp - ondisk;
 	}
 
 	if (len == CE_NAMEMASK) {
-		len = strlen(name);
+		len = strlen(ondisk + pos);
 		if (expand_name_field)
 			len += copy_len;
 	}
@@ -1965,10 +1956,10 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
 	if (expand_name_field) {
 		if (copy_len)
 			memcpy(ce->name, previous_ce->name, copy_len);
-		memcpy(ce->name + copy_len, name, len + 1 - copy_len);
-		*ent_size = (name - ((char *)ondisk)) + len + 1 - copy_len;
+		memcpy(ce->name + copy_len, ondisk + pos, len + 1 - copy_len);
+		*ent_size = pos + len + 1 - copy_len;
 	} else {
-		memcpy(ce->name, name, len + 1);
+		memcpy(ce->name, ondisk + pos, len + 1);
 		*ent_size = ondisk_ce_size(ce);
 	}
 	return ce;
-- 
2.38.0.rc2.615.g4fac75f9e3

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: vd/fix-unaligned-read-index-v4, was Re: What's cooking in git.git (Sep 2022, #08; Tue, 27)
  2022-09-28  4:18   ` vd/fix-unaligned-read-index-v4, was " Jeff King
                       ` (2 preceding siblings ...)
  2022-09-28  4:23     ` [PATCH 3/3] read-cache: use read_be32() in create_from_disk() Jeff King
@ 2022-09-28 16:41     ` Junio C Hamano
  2022-09-28 17:01       ` Ævar Arnfjörð Bjarmason
  3 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2022-09-28 16:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Victoria Dye, git, Phillip Wood

Jeff King <peff@peff.net> writes:

> ... I have no objection to adding more comments, but I am
> happy enough without them (like Junio, it may be that I'm overly
> familiar with how I expect our get_be() functions to handle alignment).
> ...
> So it's mostly just a minor annoyance for running the tests; we're
> probably better not to change any code, even trivially, this late in the
> release cycle.

Yup.  I never planned to merge the topic to 'master'.  The finishing
touch I expected was to help Phillip and friends with a bit of
explanation in the log message, and then it would be ready to wait
in 'next' for the next cycle.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: vd/fix-unaligned-read-index-v4, was Re: What's cooking in git.git (Sep 2022, #08; Tue, 27)
  2022-09-28 16:41     ` vd/fix-unaligned-read-index-v4, was Re: What's cooking in git.git (Sep 2022, #08; Tue, 27) Junio C Hamano
@ 2022-09-28 17:01       ` Ævar Arnfjörð Bjarmason
  2022-09-28 17:41         ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-09-28 17:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Victoria Dye, git, Phillip Wood


On Wed, Sep 28 2022, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
>> ... I have no objection to adding more comments, but I am
>> happy enough without them (like Junio, it may be that I'm overly
>> familiar with how I expect our get_be() functions to handle alignment).
>> ...
>> So it's mostly just a minor annoyance for running the tests; we're
>> probably better not to change any code, even trivially, this late in the
>> release cycle.
>
> Yup.  I never planned to merge the topic to 'master'.  The finishing
> touch I expected was to help Phillip and friends with a bit of
> explanation in the log message, and then it would be ready to wait
> in 'next' for the next cycle.

In the interim are we interested in a minimal patch to the specific
scalar test that's finding this under SANITIZE=undefined, as running
un-cleanly will be new in this release?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: vd/fix-unaligned-read-index-v4, was Re: What's cooking in git.git (Sep 2022, #08; Tue, 27)
  2022-09-28 17:01       ` Ævar Arnfjörð Bjarmason
@ 2022-09-28 17:41         ` Junio C Hamano
  0 siblings, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2022-09-28 17:41 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Victoria Dye, git, Phillip Wood

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Wed, Sep 28 2022, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>>
>>> ... I have no objection to adding more comments, but I am
>>> happy enough without them (like Junio, it may be that I'm overly
>>> familiar with how I expect our get_be() functions to handle alignment).
>>> ...
>>> So it's mostly just a minor annoyance for running the tests; we're
>>> probably better not to change any code, even trivially, this late in the
>>> release cycle.
>>
>> Yup.  I never planned to merge the topic to 'master'.  The finishing
>> touch I expected was to help Phillip and friends with a bit of
>> explanation in the log message, and then it would be ready to wait
>> in 'next' for the next cycle.
>
> In the interim are we interested in a minimal patch to the specific
> scalar test that's finding this under SANITIZE=undefined, as running
> un-cleanly will be new in this release?

Yes, it is a new minor annoyance that is better left this late in
the cycle.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] read-cache: read on-disk entries in byte order
  2022-09-28  4:21     ` [PATCH 2/3] read-cache: read on-disk entries in byte order Jeff King
@ 2022-09-29 11:27       ` Jeff King
  2022-09-29 15:47         ` Junio C Hamano
  0 siblings, 1 reply; 11+ messages in thread
From: Jeff King @ 2022-09-29 11:27 UTC (permalink / raw)
  To: Victoria Dye; +Cc: Junio C Hamano, git, Phillip Wood

On Wed, Sep 28, 2022 at 12:21:15AM -0400, Jeff King wrote:

> The downside is that we're copying the data an extra time. It's not very
> much data, and it's all fixed size, so the compiler should be able to do
> a reasonable job of optimizing here. But I didn't time the potential
> impact.

I timed this using "test-tool read-cache". It's kind of an artificial
benchmark, but it does isolate the code we're touching here. The results
are...not good. Here's the time to read the index of linux.git 1000
times, before and after this reordering patch:

  Benchmark 1: ./test-tool.old read-cache 1000
    Time (mean ± σ):      2.870 s ±  0.073 s    [User: 2.555 s, System: 0.315 s]
    Range (min … max):    2.789 s …  3.001 s    10 runs
  
  Benchmark 2: ./test-tool.new read-cache 1000
    Time (mean ± σ):      3.180 s ±  0.080 s    [User: 2.849 s, System: 0.331 s]
    Range (min … max):    3.092 s …  3.297 s    10 runs
  
  Summary
    './test-tool.old read-cache 1000' ran
      1.11 ± 0.04 times faster than './test-tool.new read-cache 1000'

I think that's probably the nail in the coffin for my proposed approach.
To be fair, it's only .3ms extra for a normal program which reads the
index once. That's not that big in absolute numbers. But there are
larger index files in the wild. And the improvement in simplicity and
readability is simply not that great.

-Peff

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] read-cache: read on-disk entries in byte order
  2022-09-29 11:27       ` Jeff King
@ 2022-09-29 15:47         ` Junio C Hamano
  0 siblings, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2022-09-29 15:47 UTC (permalink / raw)
  To: Jeff King; +Cc: Victoria Dye, git, Phillip Wood

Jeff King <peff@peff.net> writes:

> On Wed, Sep 28, 2022 at 12:21:15AM -0400, Jeff King wrote:
>
>> The downside is that we're copying the data an extra time. It's not very
>> much data, and it's all fixed size, so the compiler should be able to do
>> a reasonable job of optimizing here. But I didn't time the potential
>> impact.
> ...
>   Summary
>     './test-tool.old read-cache 1000' ran
>       1.11 ± 0.04 times faster than './test-tool.new read-cache 1000'
>
> I think that's probably the nail in the coffin for my proposed approach.
> To be fair, it's only .3ms extra for a normal program which reads the
> index once. That's not that big in absolute numbers. But there are
> larger index files in the wild. And the improvement in simplicity and
> readability is simply not that great.

Thanks for the due diligence.  This result may be an indication of
how efficient the existing code is, but it is a bit surprising that
one more copy of the stat_data struct makes that much difference.


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-09-29 15:48 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-27 21:11 What's cooking in git.git (Sep 2022, #08; Tue, 27) Junio C Hamano
2022-09-28  1:52 ` Victoria Dye
2022-09-28  4:18   ` vd/fix-unaligned-read-index-v4, was " Jeff King
2022-09-28  4:19     ` [PATCH 1/3] pack-bitmap: make read_be32() public Jeff King
2022-09-28  4:21     ` [PATCH 2/3] read-cache: read on-disk entries in byte order Jeff King
2022-09-29 11:27       ` Jeff King
2022-09-29 15:47         ` Junio C Hamano
2022-09-28  4:23     ` [PATCH 3/3] read-cache: use read_be32() in create_from_disk() Jeff King
2022-09-28 16:41     ` vd/fix-unaligned-read-index-v4, was Re: What's cooking in git.git (Sep 2022, #08; Tue, 27) Junio C Hamano
2022-09-28 17:01       ` Ævar Arnfjörð Bjarmason
2022-09-28 17:41         ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).