git@vger.kernel.org mailing list mirror (one of many)
* Re: SHA256 support not experimental, or?
  2023-06-30 12:20  5%             ` Son Luong Ngoc
@ 2023-06-30 16:45  5%               ` Junio C Hamano
  0 siblings, 0 replies; 122+ results
From: Junio C Hamano @ 2023-06-30 16:45 UTC (permalink / raw)
  To: Son Luong Ngoc; +Cc: Adam Majer, Patrick Steinhardt, brian m. carlson, git

Son Luong Ngoc <sluongng@gmail.com> writes:

> Build tools such as Bazel would often need to hash the content of the
> source files to build a dependency graph.  And in a FUSE setup, it would
> be ideal if the FUSE server could supply the hash via an xattr, so that
> FUSE client does not need to fetch the whole file content and only the
> metadata.

This is an unrelated tangent, but an implementation of a virtual
filesystem on top of Git's object store will be able to give such a
SHA-256 hash only by computing the hash itself, if the "hash the
content of the source files" has to be exactly SHA-256.  Using a Git
repository that uses SHA-256 would *not* help.

    $ git init --object-format sha256
    $ echo hello | git hash-object --stdin
    2cf8d83d9ee29543b34a87727421fdecb7e3f3a183d337639025de576db9ebb4
    $ echo hello | sha256sum
    5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03  -

This is because the object name used by Git is not the hash of the
content.  It is a hash of an object header (object type and byte
count) followed by its contents.

    $ printf "blob 6\0hello\n" | sha256sum
    2cf8d83d9ee29543b34a87727421fdecb7e3f3a183d337639025de576db9ebb4  -

The build systems can choose to tell the FUSE server to expose the Git
object names via xattr, but if a build system needs to see if some
contents (not in FUSE) it has on hand are the same as what is stored in
the FUSE server, it needs to use the "slightly modified SHA-256" that
matches what Git uses.  It would still be using some hash that has the
same strength as the underlying SHA-256, but it is *not* SHA-256.
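
The difference between the two hashes above can be reproduced outside
Git.  The following is a minimal Python sketch, not Git's own code: the
function names git_object_id and plain_digest are made up for this
illustration, and the only thing taken from the message is the header
layout (object type, a space, the byte count, a NUL byte, then the
contents).

```python
import hashlib


def git_object_id(content: bytes, object_type: str = "blob",
                  algorithm: str = "sha256") -> str:
    """Hash content the way Git names an object: header, NUL, contents."""
    # The header is "<type> <byte count>" followed by a single NUL byte.
    header = f"{object_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.new(algorithm, header + content).hexdigest()


def plain_digest(content: bytes, algorithm: str = "sha256") -> str:
    """Hash only the raw contents, as sha256sum does."""
    return hashlib.new(algorithm, content).hexdigest()


data = b"hello\n"
print("git object name:", git_object_id(data))
print("plain SHA-256:  ", plain_digest(data))
```

A build tool that wants to compare local contents against object names
exposed by a FUSE server would need something like git_object_id, not
plain_digest.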



* Re: SHA256 support not experimental, or?
  @ 2023-06-30 12:20  5%             ` Son Luong Ngoc
  2023-06-30 16:45  5%               ` Junio C Hamano
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2023-06-30 12:20 UTC (permalink / raw)
  To: Adam Majer; +Cc: Patrick Steinhardt, brian m. carlson, Junio C Hamano, git

Hi,

> On 30 Jun 2023, at 13:25, Adam Majer <adamm@zombino.com> wrote:...
> On 6/30/23 11:31, Patrick Steinhardt wrote:
...
> > In any case I'm fully supportive of relaxing the current warning. Except
> > for the recently discussed edge case where cloning empty repositories
> > didn't create a SHA256 repository I have found the SHA256 code to be
> > stable and working as advertised. We should caution people that many
> > services will not work with SHA256 yet though.
>
> That is exactly true. But this is also a chicken-and-egg problem. Services are not adapted for sha256 repositories because there is simply no demand for them. Only when people start using sha256 repos will some demand be generated.

FWIW, in the Bazel ecosystem where SHA256 is very popular, there has
been an increasing appetite for FUSE file system to lazily fetch contents
of a git repository.

Build tools such as Bazel would often need to hash the content of the
source files to build a dependency graph.  And in a FUSE setup, it would
be ideal if the FUSE server could supply the hash via an xattr, so that
FUSE client does not need to fetch the whole file content and only the
metadata.

Most tools in this space (Bazel, Buck2) are using SHA256 and are exploring
faster hashes such as Blake3, Aegis, and KangarooTwelve for larger file
support.  As these mature build tools gain popularity, so will the usage
of SHA256 (and newer hash algorithms).

Another point I think might help motivate different forges to
move would be switching from the object's hash to a digest (hash and
file size).  The additional file size information would help tremendously
in predicting compute resources when serving files of a repository.

So I think Git would simply need a bit more time for these related
ecosystems to reach a critical mass and help fuel the transition to a
<new-hasher>.

> - Adam

Regards,
Son Luong.

References:

- https://buck2.build/docs/rfcs/drafts/digest-kinds/#use-cases
- https://github.com/bazelbuild/bazel/pull/18784


* Re: Automatically re-running commands during an interactive rebase or post commit
  @ 2023-05-30  7:22  5% ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2023-05-30  7:22 UTC (permalink / raw)
  To: Paul Jolly; +Cc: git

Hey Paul,

On Mon, May 29, 2023 at 3:44 PM Paul Jolly <paul@myitcv.io> wrote:
>
> Hi all,
>
> I would appreciate some advice on the best way to solve the following problem.
>
...
>
> I've tried to experiment with how I might do this using git commit
> hooks. But so far, my git foo is failing me. It mainly fails because
> when doing an edit of an earlier commit via an interactive rebase,
> later changes might well conflict (in the generated file) with the
> results of the code generator having been re-run on the edited commit.
> At this point, my git rebase --continue stops until I have fixed the
> conflict. But in almost all situations, the conflict comes in the
> generated hash file. Which I fix by simply re-running the code
> generation script (I could optionally fix it by doing a git checkout
> --theirs, and then re-running the code generation script).
>
> This all feels tantalisingly close to being a perfect workflow! But I
> can't quite figure out how to make the git hooks "work" in such a way
> that doesn't require any intervention from me (except in those
> situations where there is a conflict during the rebase that is _not_
> in the code generated file and so does require my intervention).
>
> The code generation step is incredibly fast if there is nothing to do,
> and is quite fast even when there is something to do (in any case it
> can't avoid doing this work).
>
> Please can someone help nudge me in the right direction?

In general, there are 2 cases that you would want to handle:
1. Inserting a format directive between commits in a rebase that
    DOES NOT come with merge conflicts
2. The same, but the rebase DOES come with merge conflicts.

For (1), you might be interested in tools such as
- Git Absorb (a), which automatically fixes up your stack of commits with
  your current dirty changes.
- Git Branchless (b) and its "git test" feature

Both of these tools are heavily influenced by Meta's internal Phabricator
Mercurial workflow. Since the release of these tools, Meta has also
open-sourced its internal tool as Sapling SCM (c), which it touts
as git-compatible.

For (2), and if none of the tools above solve your problem,
then I recommend using git-rebase interactive with a vim macro to
generate the needed rebase todo. You can find my comment in (d)
to see what such a rebase todo list would look like.

Tools such as Restack (e) take it a step further by providing a custom Git
`sequence.editor` to programmatically generate the rebase todo for you.
This could be a bash script, or a perl script... or a custom Go binary of
your choosing. You might want to go down this route if a vim macro is
not sufficient and you require some custom logic.
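
To make the `sequence.editor` idea concrete, here is a minimal Python
sketch of such an editor.  It is not how Restack works; the names
add_exec_after_picks and rewrite_todo_file, and the command
./regenerate.sh, are placeholders invented for this example.  It relies
only on the rebase todo format ("pick <commit> <subject>" lines) and on
the todo's "exec" instruction.

```python
def add_exec_after_picks(todo: str, command: str) -> str:
    """Return the todo text with an 'exec <command>' line after every pick."""
    out = []
    for line in todo.splitlines():
        out.append(line)
        words = line.split()
        # "pick" may be abbreviated to "p" in a rebase todo list.
        if words and words[0] in ("pick", "p"):
            out.append(f"exec {command}")
    return "\n".join(out) + "\n"


def rewrite_todo_file(path: str, command: str) -> None:
    """Rewrite the todo file in place; Git passes its path to the editor."""
    with open(path, encoding="utf-8") as f:
        todo = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(add_exec_after_picks(todo, command))


sample = "pick 1111111 first change\npick 2222222 second change\n"
print(add_exec_after_picks(sample, "./regenerate.sh"), end="")
```

A script built around rewrite_todo_file(sys.argv[1], ...) could be set
as `sequence.editor`.  For this simple case `git rebase -i --exec <cmd>`
has the same effect; a custom editor is only worth it when the logic
needs to differ per commit.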

Finally, I would recommend turning on rerere.enabled (f) config to store
the conflict resolution for subsequent rebase attempts. This way, you would
only need to resolve each rebase conflict once.

(a): https://github.com/tummychow/git-absorb
(b): https://github.com/arxanas/git-branchless/wiki/Command:-git-test#fixing-formatting-and-linting-issues
(c): https://sapling-scm.com/docs/commands/absorb
(d): https://github.com/arxanas/git-branchless/discussions/45#discussioncomment-3364792
(e): https://github.com/abhinav/restack
(f): https://git-scm.com/docs/git-config#Documentation/git-config.txt-rerereenabled

>
> Many thanks,
>
>
> Paul

Cheers,
Son Luong.


* Re: [RFC PATCH] upload_pack.c: make deepen-not more tree-ish
       [not found]       ` <CAL3xRKdCkAAR0r3jyKFy+TtUi65LQcHaste=2WCqYHtwi8cUhw@mail.gmail.com>
@ 2023-02-12 14:12 12%     ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2023-02-12 14:12 UTC (permalink / raw)
  To: wansink; +Cc: git

Re-sending to the Git mailing list: setting a font in Gmail switched the
message from plain text to HTML, so it was blocked by the list.

On Sun, Feb 12, 2023 at 3:09 PM Son Luong Ngoc <sluongng@gmail.com> wrote:
>
> Hi Andrew,
>
> On Sat, Feb 11, 2023 at 11:49 PM Andrew Wansink <andy@halogix.com> wrote:
> >
> > This unlocks `git clone --shallow-exclude=<commit-sha1>`
> >
> > git-clone only accepts --shallow-exclude arguments where
> > the argument is a branch or tag because upload_pack only
> > searches deepen-not arguments for branches and tags.
> >
> > Make process_deepen_not search for commit objects if no
> > branch or tag is found then add them to the deepen_not
> > list.
> >
> > Signed-off-by: Andrew Wansink <wansink@uber.com>
> > ---
> >
> > At Uber we have a lot of patches in CI simultaneously,
> > the CI jobs will frequently clone the monorepo multiple
> > times for each patch.  They do this to calculate diffs
> > between a patch and its parent commit.
> >
>
> I used to manage a CI system that supported monorepo use cases not so long ago.
> We had several hosts (VM/bare metal) on which we spun up containers for CI to run.
>
> We maintained a bare copy of the monorepo at the host level (cron job / systemd / DaemonSet) and mounted it read-only into each of the CI containers.
>
> Each of the CI containers would attempt to clone/fetch the monorepo with `--reference-if-able ./path/to/read-only-mount/repo.git` (1)
> so that most of the needed objects were already on disk in the shared bare repo.
>
>
> +-----------+  +-----------+  +-----------+
> | container |  | container |  | container |
> +-----------+  +-----------+  +-----------+
>              \       |       /
>       (mount) \      |      /
>               +------------+                 +--------+
>               | bare-repo  | <-------------- | Remote |
>               +------------+   (git-fetch)   +--------+
>                     |
>                     | (maintain)
>                     |
>               +----------+
>               | cron-job |
>               +----------+
>
> (forgive my horrible drawing)
>
> With this setup, we no longer needed to shallow clone,
> and our git-clone in each container was simply a combination of git-ls-remote and a very lightweight git-fetch.
> In some cases, such as a job in the later stages of a CI pipeline,
> the host would have already downloaded all the needed objects into the bare copy of the repository.
> This let us skip git-fetch entirely when the CI container executed.
>
> Compared to the shallow clone approach,
> our "local cache" approach sped up cloning drastically
> while making it a lot easier for developers to interact with git history inside tests.
>
> > One optimisation in this flow is to clone only to a specific
> > depth, this may or may not work, depending on how old the
> > patch is.  In this case we have to --unshallow or discard
> > the shallow clone and fully clone the repo.
> >
> > This patch would allow us to clone to exactly the depth we
> > need to find a patch's parent commit.
>
> Hope it helps,
> Son Luong.
>
> (1): https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---reference-if-ableltrepositorygt
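
The "local cache" clone described above can be sketched as a small
Python wrapper around git-clone.  This is an illustration only: the
function names, the example URL, and the paths are hypothetical, and
the only Git feature it relies on is the `--reference-if-able` option
referenced in (1).

```python
import shlex
import subprocess
from typing import List


def build_clone_command(remote_url: str, dest: str,
                        reference_repo: str) -> List[str]:
    """Build a git-clone command that borrows objects from a local repo."""
    # --reference-if-able falls back to a normal clone (with a warning)
    # when the reference repository is missing.
    return ["git", "clone", "--reference-if-able", reference_repo,
            remote_url, dest]


def clone_with_local_cache(remote_url: str, dest: str,
                           reference_repo: str) -> None:
    """Run the clone; raises CalledProcessError if git exits non-zero."""
    subprocess.run(build_clone_command(remote_url, dest, reference_repo),
                   check=True)


# Hypothetical values: the URL and the mount path are placeholders.
cmd = build_clone_command("https://example.com/monorepo.git",
                          "workdir",
                          "/mnt/git-cache/monorepo.git")
print(shlex.join(cmd))
```

Only the command is printed here; calling clone_with_local_cache would
actually run it inside the CI container.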


* What's cooking in git.git (Jun 2022, #04; Mon, 13)
@ 2022-06-14  1:46  1% Junio C Hamano
  0 siblings, 0 replies; 122+ results
From: Junio C Hamano @ 2022-06-14  1:46 UTC (permalink / raw)
  To: git

Here are the topics that have been cooking in my tree.  Commits
prefixed with '+' are in 'next' (being in 'next' is a sign that a
topic is stable enough to be used and is a candidate to be in a
future release).  Commits prefixed with '-' are only in 'seen',
and aren't considered "accepted" at all.
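
For readers who want to process these reports mechanically, the '+'
and '-' convention can be read with a few lines of Python.  This is a
rough sketch based only on the layout visible in this message; the
function name parse_topic and the returned dictionary keys are made up
for the example.

```python
import re

# Topic header, e.g. "* topic/name (2022-06-10) 2 commits".
TOPIC_RE = re.compile(r"^\* (\S+) \((\d{4}-\d{2}-\d{2})\) (\d+) commits?$")


def parse_topic(block: str) -> dict:
    """Split one topic entry into commits in 'next' and commits only in 'seen'."""
    lines = block.splitlines()
    match = TOPIC_RE.match(lines[0])
    if match is None:
        raise ValueError(f"not a topic header: {lines[0]!r}")
    in_next, seen_only = [], []
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the commit list
        if line.startswith(" + "):
            in_next.append(line[3:])
        elif line.startswith(" - "):
            seen_only.append(line[3:])
        # other lines, such as "(merged to 'next' on ...)", are skipped
    return {"name": match.group(1), "date": match.group(2),
            "count": int(match.group(3)),
            "next": in_next, "seen": seen_only}


example = (
    "* jp/prompt-clear-before-upstream-mark (2022-06-10) 2 commits\n"
    " - git-prompt: fix expansion of branch colour codes\n"
    "  (merged to 'next' on 2022-06-08 at 201a84ad63)\n"
    " + git-prompt: make colourization consistent\n"
    "\n"
    " Bash command line prompt (in contrib/) update.\n"
)
print(parse_topic(example))
```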

I just tagged Git 2.37-rc0, after merging some topics to the
'master' branch.  For some topics, it is a day early (I usually try
to have topics cook at least 7 calendar days in 'next'), but since
tomorrow is my "offline every other Tuesday" day, I am merging them
early.  Among them is a fix for another (and hopefully the last
known) 2.36 regression.  I plan to tag -rc1 around the end of the
week, at which time we will stop merging any new topic from the
'next' branch down to 'master' until the final release that will
happen around the end of the month (https://tinyurl.com/gitCal).

Copies of the source code to Git live in many repositories, and the
following is a list of the ones I push into or their mirrors.  Some
repositories have only a subset of branches.

With maint, master, next, seen, todo:

	git://git.kernel.org/pub/scm/git/git.git/
	git://repo.or.cz/alt-git.git/
	https://kernel.googlesource.com/pub/scm/git/git/
	https://github.com/git/git/
	https://gitlab.com/git-vcs/git/

With all the integration branches and topics broken out:

	https://github.com/gitster/git/

Even though the preformatted documentation in HTML and man format
are not sources, they are published in these repositories for
convenience (replace "htmldocs" with "manpages" for the manual
pages):

	git://git.kernel.org/pub/scm/git/git-htmldocs.git/
	https://github.com/gitster/git-htmldocs.git/

Release tarballs are available at:

	https://www.kernel.org/pub/software/scm/git/

--------------------------------------------------
[Graduated to 'master']

* ab/hooks-regression-fix (2022-06-07) 2 commits
  (merged to 'next' on 2022-06-08 at c1109feb67)
 + hook API: fix v2.36.0 regression: hooks should be connected to a TTY
 + run-command: add an "ungroup" option to run_process_parallel()

 In Git 2.36 we revamped the way hooks are invoked.  One change
 that is end-user visible is that the output of a hook is no longer
 directly connected to the standard output of "git" that spawns the
 hook, which was noticed post release.  This is getting corrected.
 source: <cover-v6-0.2-00000000000-20220606T170356Z-avarab@gmail.com>


* ab/remote-free-fix (2022-06-07) 2 commits
  (merged to 'next' on 2022-06-08 at 03c3aeaeee)
 + remote.c: don't dereference NULL in freeing loop
 + remote.c: remove braces from one-statement "for"-loops

 Use-after-free (with another forget-to-free) fix.
 source: <cover-0.3-00000000000-20220607T154520Z-avarab@gmail.com>


* ds/credentials-in-url (2022-06-06) 1 commit
  (merged to 'next' on 2022-06-08 at 3db83a2012)
 + remote: create fetch.credentialsInUrl config

 The "fetch.credentialsInUrl" configuration variable controls what
 happens when a URL with embedded login credential is used.
 source: <pull.1237.v5.git.1654526176695.gitgitgadget@gmail.com>


* gc/document-config-worktree-scope (2022-06-07) 1 commit
  (merged to 'next' on 2022-06-08 at 85f62a864a)
 + config: document and test the 'worktree' scope

 Doc update.
 source: <pull.1274.git.git.1654637044966.gitgitgadget@gmail.com>


* js/wait-or-whine-can-fail (2022-06-07) 1 commit
  (merged to 'next' on 2022-06-08 at 54fe70c95d)
 + run-command: don't spam trace2_child_exit()

 We used to log an error return from wait_or_whine() as process
 termination of the waited child, which was incorrect.
 source: <50d872a057a558fa5519856b95abd048ddb514dc.1654625626.git.steadmon@google.com>


* jt/unparse-commit-upon-graft-change (2022-06-06) 1 commit
  (merged to 'next' on 2022-06-08 at 3d8de84325)
 + commit,shallow: unparse commits if grafts changed

 Updating the graft information invalidates the list of parents of
 in-core commit objects that used to be in the graft file.
 source: <20220606175437.1740447-1-jonathantanmy@google.com>


* pb/range-diff-with-submodule (2022-06-06) 1 commit
  (merged to 'next' on 2022-06-07 at e5e31590c4)
 + range-diff: show submodule changes irrespective of diff.submodule

 "git -c diff.submodule=log range-diff" did not show anything for
 submodules that changed in the ranges being compared, and
 "git -c diff.submodule=diff range-diff" did not work correctly.
 Fix this by including the "--submodule=short" output
 unconditionally to be compared.
 source: <pull.1244.v2.git.1654549153769.gitgitgadget@gmail.com>


* sn/fsmonitor-missing-clock (2022-06-07) 1 commit
  (merged to 'next' on 2022-06-08 at 812b99338c)
 + fsmonitor: query watchman with right valid json

 Sample watchman interface hook sometimes failed to produce
 correctly formatted JSON message, which has been corrected.
 source: <20220607111419.15753-1-sluongng@gmail.com>


* tb/show-ref-optim (2022-06-06) 1 commit
  (merged to 'next' on 2022-06-08 at 683a3cc261)
 + builtin/show-ref.c: avoid over-iterating with --heads, --tags

 "git show-ref --heads" (and "--tags") still iterated over all the
 refs only to discard refs outside the specified area, which has
 been corrected.
 source: <3fa6932641f18d78156bbf60b1571383f2cb5046.1654293264.git.me@ttaylorr.com>


* tl/ls-tree-oid-only (2022-06-03) 1 commit
  (merged to 'next' on 2022-06-07 at e1c1e0b25a)
 + ls-tree: test for the regression in 9c4d58ff2c3

 Add tests for a regression fixed earlier.
 source: <patch-v2-1.1-f2beb02dd29-20220603T102148Z-avarab@gmail.com>


* zh/read-cache-copy-name-entry-fix (2022-06-06) 1 commit
  (merged to 'next' on 2022-06-08 at 760f43dd19)
 + read-cache.c: reduce unnecessary cache entry name copying

 Redundant copying (with index v3 and older) and possible
 over-reading beyond the end of mmapped memory (with index v4) have
 been corrected.
 source: <pull.1249.git.1654436248249.gitgitgadget@gmail.com>

--------------------------------------------------
[New Topics]

* jc/apply-icase-tests (2022-06-13) 1 commit
 - t4141: test "git apply" with core.ignorecase

 source: <xmqqo7yw77qo.fsf@gitster.g>


* ll/curl-accept-language (2022-06-13) 2 commits
 - PREP??? give initializer to rpc_state
 - remote-curl: send Accept-Language header to server

 source: <pull.1251.v3.git.1655054421697.gitgitgadget@gmail.com>


* pb/diff-doc-raw-format (2022-06-13) 3 commits
 - diff-index.txt: update raw output format in examples
 - diff-format.txt: correct misleading wording
 - diff-format.txt: dst can be 0* SHA-1 when path is deleted, too

 source: <pull.1259.git.1655123383.gitgitgadget@gmail.com>


* rs/archive-with-internal-gzip (2022-06-13) 5 commits
 - archive-tar: use internal gzip by default
 - archive-tar: use OS_CODE 3 (Unix) for internal gzip
 - archive-tar: add internal gzip implementation
 - archive-tar: factor out write_block()
 - archive: rename archiver data field to filter_command

 source: <217a2f4d-4fc2-aaed-f5c2-1b7e134b046d@web.de>


* tl/pack-bitmap-trace (2022-06-13) 5 commits
 - bitmap: add trace2 outputs during open "bitmap" file
 - pack-bitmap.c: using error() instead of silently returning -1
 - pack-bitmap.c: make warnings support i18N when opening bitmap
 - pack-bitmap.c: rename "idx_name" to "bitmap_name"
 - pack-bitmap.c: continue looping when first MIDX bitmap is found

 source: <cover.1655018322.git.dyroneteng@gmail.com>

--------------------------------------------------
[Stalled]

* en/merge-tree (2022-02-23) 13 commits
 - git-merge-tree.txt: add a section on potentional usage mistakes
 - merge-tree: add a --allow-unrelated-histories flag
 - merge-tree: allow `ls-files -u` style info to be NUL terminated
 - merge-tree: provide easy access to `ls-files -u` style info
 - merge-tree: provide a list of which files have conflicts
 - merge-ort: provide a merge_get_conflicted_files() helper function
 - merge-tree: support including merge messages in output
 - merge-ort: split out a separate display_update_messages() function
 - merge-tree: implement real merges
 - merge-tree: add option parsing and initial shell for real merge function
 - merge-tree: move logic for existing merge into new function
 - merge-tree: rename merge_trees() to trivial_merge_trees()
 - Merge branch 'en/remerge-diff' into en/merge-trees

 A new command is introduced that takes two commits and computes a
 tree that would be contained in the resulting merge commit, if the
 histories leading to these two commits were to be merged, and is
 added as a new mode of "git merge-tree" subcommand.

 On hold.
 cf. <CABPp-BGZ7OAYRR5YKRsxJSo-C=ho+qcNAkqwkim8CkhCfCeHsA@mail.gmail.com>
 source: <pull.1122.v6.git.1645602413.gitgitgadget@gmail.com>


* bc/stash-export (2022-04-08) 4 commits
 - builtin/stash: provide a way to import stashes from a ref
 - builtin/stash: provide a way to export stashes to a ref
 - builtin/stash: factor out revision parsing into a function
 - object-name: make get_oid quietly return an error

 A mechanism to export and import stash entries to and from a normal
 commit to transfer it across repositories has been introduced.

 Expecting a reroll.
 cf. <YnL2d4Vr9Vr7W4Hj@camp.crustytoothpaste.net>
 source: <20220407215352.3491567-1-sandals@crustytoothpaste.net>


* cw/remote-object-info (2022-05-06) 11 commits
 - SQUASH??? coccicheck
 - SQUASH??? ensure that coccicheck is happy
 - SQUASH??? compilation fix
 - cat-file: add --batch-command remote-object-info command
 - cat-file: move parse_cmd and DEFAULT_FORMAT up
 - transport: add object-info fallback to fetch
 - transport: add client side capability to request object-info
 - object-info: send attribute packet regardless of object ids
 - object-store: add function to free object_info contents
 - fetch-pack: move fetch default settings
 - fetch-pack: refactor packet writing

 A client component to talk with the object-info endpoint.

 Expecting a reroll.
 source: <20220502170904.2770649-1-calvinwan@google.com>

--------------------------------------------------
[Cooking]

* ds/branch-checked-out (2022-06-13) 5 commits
 - branch: fix branch_checked_out() leaks
 - branch: use branch_checked_out() when deleting refs
 - fetch: use new branch_checked_out() and add tests
 - branch: check for bisects and rebases
 - branch: add branch_checked_out() helper

 Introduce a helper to see if a branch is already being worked on
 (hence should not be newly checked out in a working tree), which
 performs much better than the existing find_shared_symref() to
 replace many uses of the latter.

 Will merge to 'next'?
 source: <pull.1254.git.1654718942.gitgitgadget@gmail.com>


* fs/ssh-default-key-command-doc (2022-06-08) 1 commit
  (merged to 'next' on 2022-06-10 at b5cc5b6619)
 + gpg docs: explain better use of ssh.defaultKeyCommand

 Doc update.

 Will merge to 'master'.
 source: <20220608152437.126276-1-fs@gigacodes.de>


* js/ci-github-workflow-markup (2022-06-13) 3 commits
 - ci(github): also mark up compile errors
 - ci(github): use grouping also in the `win-build` job
 - ci(github): bring back the 'print test failures' step

 Recent CI update hides certain failures in test jobs, which has
 been corrected.

 Will merge to 'next'.
 source: <pull.1253.v2.git.1655125988.gitgitgadget@gmail.com>


* jt/connected-show-missing-from-which-side (2022-06-10) 1 commit
 - fetch,fetch-pack: clarify connectivity check error

 We may find an object missing after a "git fetch" stores the
 objects it obtained from the other side, but it is not necessarily
 because the remote failed to send necessary objects.  Reword the
 messages in an attempt to help users explore other possibilities
 when they hit this error.

 Expecting a reroll.
 source: <20220610195247.1177549-1-jonathantanmy@google.com>


* gc/submodule-update (2022-06-10) 8 commits
 - submodule update: remove never-used expansion
 - submodule update: stop parsing options in .sh
 - submodule update: remove -v, pass --quiet
 - submodule--helper update: use one param per type
 - submodule update: pass --require-init and --init
 - submodule update: pass options with stuck forms
 - submodule update: pass options containing "[no-]"
 - submodule update: remove intermediate parsing

 More work on "git submodule update".

 Needs review.
 source: <pull.1275.git.git.1654820781.gitgitgadget@gmail.com>


* jc/resolve-undo (2022-06-09) 1 commit
 - revision: mark blobs needed for resolve-undo as reachable

 The resolve-undo information in the index was not protected against
 GC, which has been corrected.

 Will merge to 'next'?
 source: <xmqqfskdieqz.fsf@gitster.g>


* jp/prompt-clear-before-upstream-mark (2022-06-10) 2 commits
 - git-prompt: fix expansion of branch colour codes
  (merged to 'next' on 2022-06-08 at 201a84ad63)
 + git-prompt: make colourization consistent

 Bash command line prompt (in contrib/) update.

 Will merge to 'next'.
 source: <20220609204447.32841-1-joak-pet@online.no>
 source: <20220606175022.8410-1-joak-pet@online.no>


* ab/build-gitweb (2022-06-02) 7 commits
 - Makefile: build 'gitweb' in the default target
 - gitweb/Makefile: include in top-level Makefile
 - gitweb: remove "test" and "test-installed" targets
 - gitweb/Makefile: prepare to merge into top-level Makefile
 - gitweb/Makefile: clear up and de-duplicate the gitweb.{css,js} vars
 - gitweb/Makefile: add a $(GITWEB_ALL) variable
 - gitweb/Makefile: define all .PHONY prerequisites inline

 Teach "make all" to build gitweb as well.

 Needs review.
 source: <cover-v2-0.7-00000000000-20220531T173805Z-avarab@gmail.com>


* ab/test-without-templates (2022-06-06) 7 commits
 - tests: don't assume a .git/info for .git/info/sparse-checkout
 - tests: don't assume a .git/info for .git/info/exclude
 - tests: don't assume a .git/info for .git/info/refs
 - tests: don't assume a .git/info for .git/info/attributes
 - tests: don't assume a .git/info for .git/info/grafts
 - tests: don't depend on template-created .git/branches
 - t0008: don't rely on default ".git/info/exclude"

 Tweak tests so that they still work when the "git init" template
 did not create .git/info directory.

 Will merge to 'next'?
 source: <cover-v2-0.7-00000000000-20220603T110506Z-avarab@gmail.com>


* ac/bitmap-format-doc (2022-06-10) 3 commits
 - bitmap-format.txt: add information for trailing checksum
 - bitmap-format.txt: fix some formatting issues
 - bitmap-format.txt: feed the file to asciidoc to generate html

 Adjust technical/bitmap-format to be formatted by AsciiDoc, and
 add some missing information to the documentation.

 Will merge to 'next'?
 source: <pull.1246.v3.git.1654858481.gitgitgadget@gmail.com>


* hx/unpack-streaming (2022-06-13) 6 commits
 - unpack-objects: use stream_loose_object() to unpack large objects
 - core doc: modernize core.bigFileThreshold documentation
 - object-file.c: add "stream_loose_object()" to handle large object
 - object-file.c: factor out deflate part of write_loose_object()
 - object-file.c: refactor write_loose_object() to several steps
 - unpack-objects: low memory footprint for get_data() in dry_run mode

 Allow large objects read from a packstream to be streamed into a
 loose object file straight, without having to keep it in-core as a
 whole.

 Will merge to 'next'?
 source: <cover.1654914555.git.chiyutianyi@gmail.com>


* po/rebase-preserve-merges (2022-06-06) 4 commits
  (merged to 'next' on 2022-06-10 at 471f67aebc)
 + rebase: translate a die(preserve-merges) message
 + rebase: note `preserve` merges may be a pull config option
 + rebase: help users when dying with `preserve-merges`
 + rebase.c: state preserve-merges has been removed

 Various error messages that talk about the removal of
 "--preserve-merges" in "rebase" have been strengthened, and "rebase
 --abort" learned to get out of a state that was left by an earlier
 use of the option.

 Will merge to 'master'.
 source: <pull.1242.v2.git.1654341469.gitgitgadget@gmail.com>


* tb/show-ref-count (2022-06-06) 2 commits
 - builtin/show-ref.c: limit output with `--count`
 - builtin/show-ref.c: rename `found_match` to `matches_nr`

 "git show-ref" learned to stop after emitting N refs with the new
 "--count=N" option.

 Expecting a reroll.
 cf. <xmqqczfl4ce1.fsf@gitster.g>
 source: <cover.1654552560.git.me@ttaylorr.com>


* jc/cocci-cleanup (2022-06-07) 1 commit
 - cocci: retire is_null_sha1() rule

 Remove a coccinelle rule that is no longer relevant.

 Will merge to 'next'.
 source: <xmqq7d5suoqt.fsf@gitster.g>


* ds/bundle-uri-more (2022-06-06) 6 commits
 - fetch: add 'refs/bundle/' to log.excludeDecoration
 - bundle-uri: add support for http(s):// and file://
 - fetch: add --bundle-uri option
 - bundle-uri: create basic file-copy logic
 - remote-curl: add 'get' capability
 - docs: document bundle URI standard

 The "bundle URI" topic.

 Needs review.
 source: <pull.1248.git.1654545325.gitgitgadget@gmail.com>


* jc/revert-show-parent-info (2022-05-31) 2 commits
  (merged to 'next' on 2022-06-07 at e405211ff4)
 + revert: --reference should apply only to 'revert', not 'cherry-pick'
  (merged to 'next' on 2022-05-30 at b5da52dc14)
 + revert: optionally refer to commit in the "reference" format

 "git revert" learns "--reference" option to use more human-readable
 reference to the commit it reverts in the message template it
 prepares for the user.

 Will merge to 'master'.
 source: <xmqq8rqn7buk.fsf_-_@gitster.g>


* js/bisect-in-c (2022-05-21) 15 commits
 - bisect: no longer try to clean up left-over `.git/head-name` files
 - bisect: remove Cogito-related code
 - Turn `git bisect` into a full built-in
 - bisect: teach the `bisect--helper` command to show the correct usage strings
 - bisect: move even the command-line parsing to `bisect--helper`
 - bisect--helper: return only correct exit codes in `cmd_*()`
 - bisect--helper: move the `BISECT_STATE` case to the end
 - bisect--helper: make `--bisect-state` optional
 - bisect--helper: align the sub-command order with git-bisect.sh
 - bisect--helper: using `--bisect-state` without an argument is a bug
 - bisect--helper: really retire `--bisect-autostart`
 - bisect--helper: really retire --bisect-next-check
 - bisect--helper: retire the --no-log option
 - bisect: avoid double-quoting when printing the failed command
 - bisect run: fix the error message

 Final bits of "git bisect.sh" have been rewritten in C.

 The command line parsing is reported to be still broken.
 cf. <220521.86zgjazuy4.gmgdl@evledraar.gmail.com>
 source: <pull.1132.v3.git.1653144546.gitgitgadget@gmail.com>


* cb/path-owner-check-with-sudo-plus (2022-05-12) 1 commit
 - git-compat-util: allow root to access both SUDO_UID and root owned

 "sudo git foo" used to consider a repository owned by the original
 user a safe one to access; it now also considers a repository owned
 by root a safe one, too (after all, if an attacker can craft a
 malicious repository owned by root, the box is 0wned already).

 Will merge to 'next'?
 cf. <20220519152344.ktrifm3pc42bjruh@Carlos-MacBook-Pro-2.local>
 source: <20220513010020.55361-5-carenas@gmail.com>


* gc/bare-repo-discovery (2022-06-07) 5 commits
 - setup.c: create `discovery.bare`
 - safe.directory: use git_protected_config()
 - config: read protected config with `git_protected_config()`
 - Documentation: define protected configuration
 - Documentation/git-config.txt: add SCOPES section

 Introduce a discovery.barerepository configuration variable that
 allows users to forbid discovery of bare repositories.

 Expecting a reroll.
 source: <29053d029f8ec61095a2ad557be38b1d485a158f.1654635432.git.gitgitgadget@gmail.com>


* gg/worktree-from-the-above (2022-05-20) 3 commits
 - dir: minor refactoring / clean-up
 - dir: cache git_dir's realpath
 - dir: traverse into repository

 The behavior of Git in a non-bare repository whose core.worktree
 points at a directory that has the repository as its subdirectory
 regressed in Git 2.27 days.

 Needs review.
 source: <20220520192840.8942-1-ggossdev@gmail.com>


* ar/send-email-confirm-by-default (2022-04-22) 1 commit
 - send-email: always confirm by default

 "git send-email" is changed so that by default it asks for
 confirmation before sending each message out.

 Will discard.

 I wanted to like this, and had it in the version of Git I use
 myself for daily work, but the prompting turned out to be somewhat
 distracting.

 Thoughts?
 source: <20220422083629.1404989-1-hi@alyssa.is>

^ permalink raw reply	[relevance 1%]

* What's cooking in git.git (Jun 2022, #03; Fri, 10)
@ 2022-06-11  3:39  1% Junio C Hamano
  0 siblings, 0 replies; 122+ results
From: Junio C Hamano @ 2022-06-11  3:39 UTC (permalink / raw)
  To: git

Here are the topics that have been cooking in my tree.  Commits
prefixed with '+' are in 'next' (being in 'next' is a sign that a
topic is stable enough to be used and is a candidate to be in a
future release).  Commits prefixed with '-' are only in 'seen',
and aren't considered "accepted" at all.

This cycle will complete at around the end of June
(https://tinyurl.com/gitCal); -rc0 and -rc1 are scheduled to happen
next week.

Copies of the source code to Git live in many repositories, and the
following is a list of the ones I push into or their mirrors.  Some
repositories have only a subset of branches.

With maint, master, next, seen, todo:

	git://git.kernel.org/pub/scm/git/git.git/
	git://repo.or.cz/alt-git.git/
	https://kernel.googlesource.com/pub/scm/git/git/
	https://github.com/git/git/
	https://gitlab.com/git-vcs/git/

With all the integration branches and topics broken out:

	https://github.com/gitster/git/

Even though the preformatted documentation in HTML and man format
are not sources, they are published in these repositories for
convenience (replace "htmldocs" with "manpages" for the manual
pages):

	git://git.kernel.org/pub/scm/git/git-htmldocs.git/
	https://github.com/gitster/git-htmldocs.git/

Release tarballs are available at:

	https://www.kernel.org/pub/software/scm/git/

--------------------------------------------------
[Graduated to 'master']

* ab/bug-if-bug (2022-06-02) 6 commits
  (merged to 'next' on 2022-06-03 at 25290bb7ec)
 + cache-tree.c: use bug() and BUG_if_bug()
 + receive-pack: use bug() and BUG_if_bug()
 + parse-options.c: use optbug() instead of BUG() "opts" check
 + parse-options.c: use new bug() API for optbug()
 + usage.c: add a non-fatal bug() function to go with BUG()
 + common-main.c: move non-trace2 exit() behavior out of trace2.c

 A new bug() and BUG_if_bug() API is introduced to make it easier to
 uniformly implement the "detect multiple bugs and abort in the end"
 pattern.
 source: <cover-v3-0.6-00000000000-20220602T122106Z-avarab@gmail.com>
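
 The "detect multiple bugs and abort in the end" pattern can be
 illustrated outside of C.  The following is only a rough Python
 sketch of the idea, not Git's implementation; the BugLog class, its
 method names and check_options() are hypothetical names made up for
 this illustration.

```python
class BugLog:
    """Collects non-fatal bug reports so several can be shown before aborting."""

    def __init__(self):
        self.messages = []

    def bug(self, message):
        # Record the problem and keep going, so that later checks still run.
        self.messages.append(message)

    def bug_if_bug(self, summary):
        # Abort once, at the end, and only if at least one problem was recorded.
        if self.messages:
            details = "\n".join(f"BUG: {m}" for m in self.messages)
            raise RuntimeError(f"{summary}\n{details}")


def check_options(options):
    """Report every option with an empty name, not just the first one."""
    log = BugLog()
    for index, name in enumerate(options):
        if not name:
            log.bug(f"option #{index} has an empty name")
    log.bug_if_bug("invalid option table")
```

 The point of the split is that the caller sees all the problems in
 one run instead of fixing them one abort at a time.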


* ab/env-array (2022-06-02) 2 commits
  (merged to 'next' on 2022-06-02 at e1e05318d3)
 + run-command API users: use "env" not "env_array" in comments & names
 + run-command API: rename "env_array" to "env"

 Rename .env_array member to .env in the child_process structure.
 source: <cover-v3-0.2-00000000000-20220602T090745Z-avarab@gmail.com>


* cb/buggy-gcc-12-workaround (2022-06-01) 1 commit
  (merged to 'next' on 2022-06-01 at 01e199fd58)
 + Revert -Wno-error=dangling-pointer

 With a more targeted workaround in http.c in another topic, we may
 be able to lift this blanket "GCC12 dangling-pointer warning is
 broken and unsalvageable" workaround.


* gc/zero-length-branch-config-fix (2022-06-01) 2 commits
  (merged to 'next' on 2022-06-02 at 438605f627)
 + remote.c: reject 0-length branch names
 + remote.c: don't BUG() on 0-length branch names

 A misconfigured 'branch..remote' led to a bug in configuration
 parsing.
 source: <pull.1273.git.git.1654038754.gitgitgadget@gmail.com>


* jh/builtin-fsmonitor-part3 (2022-05-26) 31 commits
  (merged to 'next' on 2022-06-02 at 3599e359b3)
 + t7527: improve implicit shutdown testing in fsmonitor--daemon
 + fsmonitor--daemon: allow --super-prefix argument
 + t7527: test Unicode NFC/NFD handling on MacOS
 + t/lib-unicode-nfc-nfd: helper prereqs for testing unicode nfc/nfd
 + t/helper/hexdump: add helper to print hexdump of stdin
 + fsmonitor: on macOS also emit NFC spelling for NFD pathname
 + t7527: test FSMonitor on case insensitive+preserving file system
 + fsmonitor: never set CE_FSMONITOR_VALID on submodules
 + t/perf/p7527: add perf test for builtin FSMonitor
 + t7527: FSMonitor tests for directory moves
 + fsmonitor: optimize processing of directory events
 + fsm-listen-darwin: shutdown daemon if worktree root is moved/renamed
 + fsm-health-win32: force shutdown daemon if worktree root moves
 + fsm-health-win32: add polling framework to monitor daemon health
 + fsmonitor--daemon: stub in health thread
 + fsmonitor--daemon: rename listener thread related variables
 + fsmonitor--daemon: prepare for adding health thread
 + fsmonitor--daemon: cd out of worktree root
 + fsm-listen-darwin: ignore FSEvents caused by xattr changes on macOS
 + unpack-trees: initialize fsmonitor_has_run_once in o->result
 + fsmonitor-settings: NTFS and FAT32 on MacOS are incompatible
 + fsmonitor-settings: remote repos on Windows are incompatible
 + fsmonitor-settings: remote repos on macOS are incompatible
 + fsmonitor-settings: stub in macOS-specific incompatibility checking
 + fsmonitor-settings: VFS for Git virtual repos are incompatible
 + fsmonitor-settings: stub in Win32-specific incompatibility checking
 + fsmonitor-settings: bare repos are incompatible with FSMonitor
 + t/helper/fsmonitor-client: create stress test
 + t7527: test FSMonitor on repos with Unicode root paths
 + fsm-listen-win32: handle shortnames
 + Merge branch 'jh/builtin-fsmonitor-part2' into jh/builtin-fsmonitor-part3

 More fsmonitor--daemon.
 source: <pull.1143.v9.git.1653601644.gitgitgadget@gmail.com>


* jy/gitweb-xhtml5 (2022-06-02) 1 commit
  (merged to 'next' on 2022-06-02 at cc6a77b48b)
 + gitweb: switch to an XHTML5 DOCTYPE

 Update the doctype written in gitweb output to xhtml5.
 source: <20220602114305.5915-1-jason@jasonyundt.email>

--------------------------------------------------
[New Topics]

* gc/document-config-worktree-scope (2022-06-07) 1 commit
  (merged to 'next' on 2022-06-08 at 85f62a864a)
 + config: document and test the 'worktree' scope

 Doc update.

 Will merge to 'master'.
 source: <pull.1274.git.git.1654637044966.gitgitgadget@gmail.com>


* ds/branch-checked-out (2022-06-08) 4 commits
 - branch: use branch_checked_out() when deleting refs
 - fetch: use new branch_checked_out() and add tests
 - branch: check for bisects and rebases
 - branch: add branch_checked_out() helper

 Introduce a helper to see if a branch is already being worked on
 (hence should not be newly checked out in a working tree), which
 performs much better than the existing find_shared_symref() to
 replace many uses of the latter.

 Will merge to 'next'?
 source: <pull.1254.git.1654718942.gitgitgadget@gmail.com>


* fs/ssh-default-key-command-doc (2022-06-08) 1 commit
  (merged to 'next' on 2022-06-10 at b5cc5b6619)
 + gpg docs: explain better use of ssh.defaultKeyCommand

 Doc update.

 Will merge to 'master'.
 source: <20220608152437.126276-1-fs@gigacodes.de>


* js/ci-github-workflow-markup (2022-06-10) 3 commits
 - ci(github): also mark up compile errors
 - ci(github): use grouping also in the `win-build` job
 - ci(github): bring back the 'print test failures' step

 Recent CI update hides certain failures in test jobs, which has
 been corrected.

 Will merge to 'next'?
 source: <pull.1253.git.1654774347.gitgitgadget@gmail.com>


* jt/connected-show-missing-from-which-side (2022-06-10) 1 commit
 - fetch,fetch-pack: clarify connectivity check error

 We may find an object missing after a "git fetch" stores the
 objects it obtained from the other side, but it is not necessarily
 because the remote failed to send necessary objects.  Reword the
 messages in an attempt to help users explore other possibilities
 when they hit this error.

 Expecting a reroll.
 source: <20220610195247.1177549-1-jonathantanmy@google.com>


* gc/submodule-update (2022-06-10) 8 commits
 - submodule update: remove never-used expansion
 - submodule update: stop parsing options in .sh
 - submodule update: remove -v, pass --quiet
 - submodule--helper update: use one param per type
 - submodule update: pass --require-init and --init
 - submodule update: pass options with stuck forms
 - submodule update: pass options containing "[no-]"
 - submodule update: remove intermediate parsing

 More work on "git submodule update".

 Needs review.
 source: <pull.1275.git.git.1654820781.gitgitgadget@gmail.com>


* jc/resolve-undo (2022-06-09) 1 commit
 - revision: mark blobs needed for resolve-undo as reachable

 The resolve-undo information in the index was not protected against
 GC, which has been corrected.

 Will merge to 'next'?
 source: <xmqqfskdieqz.fsf@gitster.g>

--------------------------------------------------
[Stalled]

* en/merge-tree (2022-02-23) 13 commits
 - git-merge-tree.txt: add a section on potentional usage mistakes
 - merge-tree: add a --allow-unrelated-histories flag
 - merge-tree: allow `ls-files -u` style info to be NUL terminated
 - merge-tree: provide easy access to `ls-files -u` style info
 - merge-tree: provide a list of which files have conflicts
 - merge-ort: provide a merge_get_conflicted_files() helper function
 - merge-tree: support including merge messages in output
 - merge-ort: split out a separate display_update_messages() function
 - merge-tree: implement real merges
 - merge-tree: add option parsing and initial shell for real merge function
 - merge-tree: move logic for existing merge into new function
 - merge-tree: rename merge_trees() to trivial_merge_trees()
 - Merge branch 'en/remerge-diff' into en/merge-trees

 A new command is introduced that takes two commits and computes a
 tree that would be contained in the resulting merge commit, if the
 histories leading to these two commits were to be merged, and is
 added as a new mode of "git merge-tree" subcommand.

 On hold.
 cf. <CABPp-BGZ7OAYRR5YKRsxJSo-C=ho+qcNAkqwkim8CkhCfCeHsA@mail.gmail.com>
 source: <pull.1122.v6.git.1645602413.gitgitgadget@gmail.com>


* bc/stash-export (2022-04-08) 4 commits
 - builtin/stash: provide a way to import stashes from a ref
 - builtin/stash: provide a way to export stashes to a ref
 - builtin/stash: factor out revision parsing into a function
 - object-name: make get_oid quietly return an error

 A mechanism to export and import stash entries to and from a normal
 commit to transfer it across repositories has been introduced.

 Expecting a reroll.
 cf. <YnL2d4Vr9Vr7W4Hj@camp.crustytoothpaste.net>
 source: <20220407215352.3491567-1-sandals@crustytoothpaste.net>


* cw/remote-object-info (2022-05-06) 11 commits
 - SQUASH??? coccicheck
 - SQUASH??? ensure that coccicheck is happy
 - SQUASH??? compilation fix
 - cat-file: add --batch-command remote-object-info command
 - cat-file: move parse_cmd and DEFAULT_FORMAT up
 - transport: add object-info fallback to fetch
 - transport: add client side capability to request object-info
 - object-info: send attribute packet regardless of object ids
 - object-store: add function to free object_info contents
 - fetch-pack: move fetch default settings
 - fetch-pack: refactor packet writing

 A client component to talk with the object-info endpoint.

 Expecting a reroll.
 source: <20220502170904.2770649-1-calvinwan@google.com>

--------------------------------------------------
[Cooking]

* pb/range-diff-with-submodule (2022-06-06) 1 commit
  (merged to 'next' on 2022-06-07 at e5e31590c4)
 + range-diff: show submodule changes irrespective of diff.submodule

 "git range-diff" did not show anything for submodules that changed
 in the ranges being compared.  Change the behaviour to include the
 "--submodule=short" output unconditionally to be compared.

 Will merge to 'master'.
 source: <pull.1244.v2.git.1654549153769.gitgitgadget@gmail.com>


* jp/prompt-clear-before-upstream-mark (2022-06-10) 2 commits
 - git-prompt: fix expansion of branch colour codes
  (merged to 'next' on 2022-06-08 at 201a84ad63)
 + git-prompt: make colourization consistent

 Bash command line prompt (in contrib/) update.

 Will merge to 'next'.
 source: <20220609204447.32841-1-joak-pet@online.no>
 source: <20220606175022.8410-1-joak-pet@online.no>


* jt/unparse-commit-upon-graft-change (2022-06-06) 1 commit
  (merged to 'next' on 2022-06-08 at 3d8de84325)
 + commit,shallow: unparse commits if grafts changed

 Updating the graft information invalidates the list of parents of
 in-core commit objects that used to be in the graft file.

 Will merge to 'master'.
 source: <20220606175437.1740447-1-jonathantanmy@google.com>


* ab/build-gitweb (2022-06-02) 7 commits
 - Makefile: build 'gitweb' in the default target
 - gitweb/Makefile: include in top-level Makefile
 - gitweb: remove "test" and "test-installed" targets
 - gitweb/Makefile: prepare to merge into top-level Makefile
 - gitweb/Makefile: clear up and de-duplicate the gitweb.{css,js} vars
 - gitweb/Makefile: add a $(GITWEB_ALL) variable
 - gitweb/Makefile: define all .PHONY prerequisites inline

 Teach "make all" to build gitweb as well.

 Needs review.
 source: <cover-v2-0.7-00000000000-20220531T173805Z-avarab@gmail.com>


* ab/test-without-templates (2022-06-06) 7 commits
 - tests: don't assume a .git/info for .git/info/sparse-checkout
 - tests: don't assume a .git/info for .git/info/exclude
 - tests: don't assume a .git/info for .git/info/refs
 - tests: don't assume a .git/info for .git/info/attributes
 - tests: don't assume a .git/info for .git/info/grafts
 - tests: don't depend on template-created .git/branches
 - t0008: don't rely on default ".git/info/exclude"

 Tweak tests so that they still work when the "git init" template
 did not create .git/info directory.

 Will merge to 'next'?
 source: <cover-v2-0.7-00000000000-20220603T110506Z-avarab@gmail.com>


* ac/bitmap-format-doc (2022-06-10) 3 commits
 - bitmap-format.txt: add information for trailing checksum
 - bitmap-format.txt: fix some formatting issues
 - bitmap-format.txt: feed the file to asciidoc to generate html

 Adjust technical/bitmap-format to be formatted by AsciiDoc, and
 add some missing information to the documentation.

 Will merge to 'next'?
 source: <pull.1246.v3.git.1654858481.gitgitgadget@gmail.com>


* hx/unpack-streaming (2022-06-10) 7 commits
 - unpack-objects: use stream_loose_object() to unpack large objects
 - core doc: modernize core.bigFileThreshold documentation
 - object-file.c: add "stream_loose_object()" to handle large object
 - object-file.c: factor out deflate part of write_loose_object()
 - object-file.c: refactor write_loose_object() to several steps
 - object-file.c: do fsync() and close() before post-write die()
 - unpack-objects: low memory footprint for get_data() in dry_run mode

 Allow large objects read from a packstream to be streamed into a
 loose object file straight, without having to keep it in-core as a
 whole.

 Will merge to 'next'?
 source: <cover.1654871915.git.chiyutianyi@gmail.com>
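
 To illustrate the idea of streaming, here is a rough Python sketch,
 not the code in this topic.  It assumes the usual loose object
 layout (a zlib-deflated "<type> <size>" header, a NUL byte, then the
 contents) and that the size is known up front; stream_blob() and its
 parameters are hypothetical names.

```python
import hashlib
import io
import zlib


def stream_blob(chunks, size, out, algo="sha1"):
    """Deflate a blob header plus contents into `out`, one chunk at a time.

    Only the current chunk is held in memory.  Returns the hex object name.
    """
    header = b"blob %d\0" % size
    hasher = hashlib.new(algo)
    deflater = zlib.compressobj()

    # The object name covers the header as well as the contents.
    hasher.update(header)
    out.write(deflater.compress(header))

    written = 0
    for chunk in chunks:
        hasher.update(chunk)
        out.write(deflater.compress(chunk))
        written += len(chunk)
    out.write(deflater.flush())

    if written != size:
        raise ValueError(f"expected {size} bytes, got {written}")
    return hasher.hexdigest()
```

 Because the object name is only known after the last chunk has been
 hashed, a real writer would have to write to a temporary file first
 and move it into place afterwards.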


* po/rebase-preserve-merges (2022-06-06) 4 commits
  (merged to 'next' on 2022-06-10 at 471f67aebc)
 + rebase: translate a die(preserve-merges) message
 + rebase: note `preserve` merges may be a pull config option
 + rebase: help users when dying with `preserve-merges`
 + rebase.c: state preserve-merges has been removed

 Various error messages that talk about the removal of
 "--preserve-merges" in "rebase" have been strengthened, and "rebase
 --abort" learned to get out of a state that was left by an earlier
 use of the option.

 Will merge to 'master'.
 source: <pull.1242.v2.git.1654341469.gitgitgadget@gmail.com>


* tb/show-ref-optim (2022-06-06) 1 commit
  (merged to 'next' on 2022-06-08 at 683a3cc261)
 + builtin/show-ref.c: avoid over-iterating with --heads, --tags

 "git show-ref --heads" (and "--tags") still iterated over all the
 refs, which has been corrected.

 Will merge to 'master'.
 source: <3fa6932641f18d78156bbf60b1571383f2cb5046.1654293264.git.me@ttaylorr.com>


* zh/read-cache-copy-name-entry-fix (2022-06-06) 1 commit
  (merged to 'next' on 2022-06-08 at 760f43dd19)
 + read-cache.c: reduce unnecessary cache entry name copying

 Redundant copying of cache entry names (with index v3 and older)
 and a possible over-read beyond the end of mmapped memory (with
 index v4) have been corrected.

 Will merge to 'master'.
 source: <pull.1249.git.1654436248249.gitgitgadget@gmail.com>


* ab/remote-free-fix (2022-06-07) 2 commits
  (merged to 'next' on 2022-06-08 at 03c3aeaeee)
 + remote.c: don't dereference NULL in freeing loop
 + remote.c: remove braces from one-statement "for"-loops

 Use-after-free (with another forget-to-free) fix.

 Will merge to 'master'.
 source: <cover-0.3-00000000000-20220607T154520Z-avarab@gmail.com>


* sn/fsmonitor-missing-clock (2022-06-07) 1 commit
  (merged to 'next' on 2022-06-08 at 812b99338c)
 + fsmonitor: query watchman with right valid json

 Sample watchman interface hook sometimes failed to produce
 correctly formatted JSON message, which has been corrected.

 Will merge to 'master'.
 source: <20220607111419.15753-1-sluongng@gmail.com>


* tb/show-ref-count (2022-06-06) 2 commits
 - builtin/show-ref.c: limit output with `--count`
 - builtin/show-ref.c: rename `found_match` to `matches_nr`

 "git show-ref" learned to stop after emitting N refs with the new
 "--count=N" option.

 Expecting a reroll.
 cf. <xmqqczfl4ce1.fsf@gitster.g>
 source: <cover.1654552560.git.me@ttaylorr.com>


* jc/cocci-cleanup (2022-06-07) 1 commit
 - cocci: retire is_null_sha1() rule

 Remove a coccinelle rule that is no longer relevant.

 Will merge to 'next'.
 source: <xmqq7d5suoqt.fsf@gitster.g>


* js/wait-or-whine-can-fail (2022-06-07) 1 commit
  (merged to 'next' on 2022-06-08 at 54fe70c95d)
 + run-command: don't spam trace2_child_exit()

 We used to log an error return from wait_or_whine() as process
 termination of the waited child, which was incorrect.

 Will merge to 'master'.
 source: <50d872a057a558fa5519856b95abd048ddb514dc.1654625626.git.steadmon@google.com>


* ds/credentials-in-url (2022-06-06) 1 commit
  (merged to 'next' on 2022-06-08 at 3db83a2012)
 + remote: create fetch.credentialsInUrl config

 The "fetch.credentialsInUrl" configuration variable controls what
 happens when a URL with embedded login credential is used.

 Will merge to 'master'.
 source: <pull.1237.v5.git.1654526176695.gitgitgadget@gmail.com>
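
 As a rough illustration of what such a check could look like, here
 is a Python sketch.  It is not the code in this topic; the mode
 names "allow", "warn" and "die", the message text and the function
 name are assumptions, and only a password embedded in the URL is
 treated as a credential.

```python
from urllib.parse import urlsplit


def check_credentials_in_url(url, mode="allow"):
    """Return a warning string, raise on "die", or return None if all is fine."""
    if mode not in ("allow", "warn", "die"):
        raise ValueError(f"unknown mode: {mode}")

    # urlsplit() exposes the "user:password@" part of the network location.
    has_password = urlsplit(url).password is not None
    if mode == "allow" or not has_password:
        return None

    # Deliberately avoid echoing the URL, which would leak the password.
    message = "URL uses plaintext credentials"
    if mode == "die":
        raise RuntimeError(f"fatal: {message}")
    return f"warning: {message}"
```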


* tl/ls-tree-oid-only (2022-06-03) 1 commit
  (merged to 'next' on 2022-06-07 at e1c1e0b25a)
 + ls-tree: test for the regression in 9c4d58ff2c3

 Add tests for a regression fixed earlier.

 Will merge to 'master'.
 source: <patch-v2-1.1-f2beb02dd29-20220603T102148Z-avarab@gmail.com>


* ds/bundle-uri-more (2022-06-06) 6 commits
 - fetch: add 'refs/bundle/' to log.excludeDecoration
 - bundle-uri: add support for http(s):// and file://
 - fetch: add --bundle-uri option
 - bundle-uri: create basic file-copy logic
 - remote-curl: add 'get' capability
 - docs: document bundle URI standard

 The "bundle URI" topic.

 Needs review.
 source: <pull.1248.git.1654545325.gitgitgadget@gmail.com>


* jc/revert-show-parent-info (2022-05-31) 2 commits
  (merged to 'next' on 2022-06-07 at e405211ff4)
 + revert: --reference should apply only to 'revert', not 'cherry-pick'
  (merged to 'next' on 2022-05-30 at b5da52dc14)
 + revert: optionally refer to commit in the "reference" format

 "git revert" learns "--reference" option to use more human-readable
 reference to the commit it reverts in the message template it
 prepares for the user.

 Will merge to 'master'.
 source: <xmqq8rqn7buk.fsf_-_@gitster.g>


* js/bisect-in-c (2022-05-21) 15 commits
 - bisect: no longer try to clean up left-over `.git/head-name` files
 - bisect: remove Cogito-related code
 - Turn `git bisect` into a full built-in
 - bisect: teach the `bisect--helper` command to show the correct usage strings
 - bisect: move even the command-line parsing to `bisect--helper`
 - bisect--helper: return only correct exit codes in `cmd_*()`
 - bisect--helper: move the `BISECT_STATE` case to the end
 - bisect--helper: make `--bisect-state` optional
 - bisect--helper: align the sub-command order with git-bisect.sh
 - bisect--helper: using `--bisect-state` without an argument is a bug
 - bisect--helper: really retire `--bisect-autostart`
 - bisect--helper: really retire --bisect-next-check
 - bisect--helper: retire the --no-log option
 - bisect: avoid double-quoting when printing the failed command
 - bisect run: fix the error message

 Final bits of "git bisect.sh" have been rewritten in C.

 The command line parsing is reported to be still broken.
 cf. <220521.86zgjazuy4.gmgdl@evledraar.gmail.com>
 source: <pull.1132.v3.git.1653144546.gitgitgadget@gmail.com>


* cb/path-owner-check-with-sudo-plus (2022-05-12) 1 commit
 - git-compat-util: allow root to access both SUDO_UID and root owned

 "sudo git foo" used to consider a repository owned by the original
 user a safe one to access; it now also considers a repository owned
 by root a safe one, too (after all, if an attacker can craft a
 malicious repository owned by root, the box is 0wned already).

 Will merge to 'next'?
 cf. <20220519152344.ktrifm3pc42bjruh@Carlos-MacBook-Pro-2.local>
 source: <20220513010020.55361-5-carenas@gmail.com>
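
 The rule described above can be sketched as follows.  This is a
 simplified Python illustration, not the C code in the patch;
 is_safe_owner() and its parameters are hypothetical names, and
 safe.directory overrides are ignored.

```python
def is_safe_owner(owner_uid, euid, environ):
    """Return True if a repository owned by `owner_uid` may be accessed."""
    if euid != 0:
        # Not running as root: the repository must be our own.
        return owner_uid == euid
    if owner_uid == 0:
        # Running as root: a root-owned repository is considered safe.
        return True
    try:
        # Under sudo, also accept a repository owned by the original user.
        return owner_uid == int(environ.get("SUDO_UID", ""))
    except ValueError:
        # SUDO_UID is missing or not a number.
        return False
```

 For example, is_safe_owner(1000, 0, {"SUDO_UID": "1000"}) and
 is_safe_owner(0, 0, {"SUDO_UID": "1000"}) both hold, while a
 repository owned by a third user does not pass.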


* ab/hooks-regression-fix (2022-06-07) 2 commits
  (merged to 'next' on 2022-06-08 at c1109feb67)
 + hook API: fix v2.36.0 regression: hooks should be connected to a TTY
 + run-command: add an "ungroup" option to run_process_parallel()

 In Git 2.36 we revamped the way hooks are invoked.  One change
 that is end-user visible is that the output of a hook is no longer
 directly connected to the standard output of "git" that spawns the
 hook, which was noticed post release.  This is getting corrected.

 Will merge to 'master'.
 source: <cover-v6-0.2-00000000000-20220606T170356Z-avarab@gmail.com>


* gc/bare-repo-discovery (2022-06-07) 5 commits
 - setup.c: create `discovery.bare`
 - safe.directory: use git_protected_config()
 - config: read protected config with `git_protected_config()`
 - Documentation: define protected configuration
 - Documentation/git-config.txt: add SCOPES section

 Introduce a discovery.barerepository configuration variable that
 allows users to forbid discovery of bare repositories.

 Expecting a reroll.
 source: <29053d029f8ec61095a2ad557be38b1d485a158f.1654635432.git.gitgitgadget@gmail.com>


* gg/worktree-from-the-above (2022-05-20) 3 commits
 - dir: minor refactoring / clean-up
 - dir: cache git_dir's realpath
 - dir: traverse into repository

 Handling of a non-bare repository whose core.worktree points at a
 directory that has the repository as its subdirectory regressed in
 Git 2.27 days.

 Needs review.
 source: <20220520192840.8942-1-ggossdev@gmail.com>


* ar/send-email-confirm-by-default (2022-04-22) 1 commit
 - send-email: always confirm by default

 "git send-email" is changed so that by default it asks for
 confirmation before sending each message out.

 Will discard.

 I wanted to like this, and had it in the version of Git I use
 myself for daily work, but the prompting turned out to be somewhat
 distracting.

 Thoughts?
 source: <20220422083629.1404989-1-hi@alyssa.is>

--------------------------------------------------
[Discarded]

* ds/rebase-update-refs (2022-06-07) 7 commits
 . rebase: add rebase.updateRefs config option
 . sequencer: implement 'update-refs' command
 . rebase: add --update-refs option
 . sequencer: add update-refs command
 . sequencer: define array with enum values
 . branch: add branch_checked_out() helper
 . log-tree: create for_each_decoration()

 "git rebase" learns "--update-refs" to update the refs that point
 at commits being rewritten so that they point at the corresponding
 commits in the rewritten history.

 Retracted for possible future redesign.
 cf. <pull.1254.git.1654718942.gitgitgadget@gmail.com>
 source: <pull.1247.v2.git.1654634569.gitgitgadget@gmail.com>


* ab/ci-github-workflow-markup (2022-05-26) 14 commits
 . CI: make the --github-workflow-markup "github" output the default
 . CI: make --github-workflow-markup ci-config, off by default
 . ci: call `finalize_test_case_output` a little later
 . ci(github): mention where the full logs can be found
 . ci(github): avoid printing test case preamble twice
 . ci(github): skip "skip" tests in --github-workflow-markup
 . ci(github): skip the logs of the successful test cases
 . ci: make it easier to find failed tests' logs in the GitHub workflow
 . ci: optionally mark up output in the GitHub workflow
 . test(junit): avoid line feeds in XML attributes
 . tests: refactor --write-junit-xml code
 . CI: stop setting FAILED_TEST_ARTIFACTS N times
 . CI: don't include "test-results/" in ci/print-test-failures.sh output
 . CI: don't "cd" in ci/print-test-failures.sh
 (this branch uses ab/ci-setup-simplify.)

 Discarded to stop "competing" with js/ci-github-workflow-markup.


* ab/ci-setup-simplify (2022-05-26) 29 commits
 . CI: make it easy to use ci/*.sh outside of CI
 . CI: don't use "set -x" in "ci/lib.sh" output
 . CI: set PYTHON_PATH setting for osx-{clang,gcc} into "$jobname" case
 . CI: set SANITIZE=leak in MAKEFLAGS directly
 . CI: set CC in MAKEFLAGS directly, don't add it to the environment
 . CI: add more variables to MAKEFLAGS, except under vs-build
 . CI: narrow down variable definitions in --build and --test
 . CI: only invoke ci/lib.sh as "steps" in main.yml
 . CI: pre-select test slice in Windows & VS tests
 . ci/run-test-slice.sh: replace shelling out with "echo"
 . CI: move "env" definitions into ci/lib.sh
 . CI: combine ci/install{,-docker}-dependencies.sh
 . CI: split up and reduce "ci/test-documentation.sh"
 . CI: invoke "make artifacts-tar" directly in windows-build
 . CI: check ignored unignored build artifacts in "win[+VS] build" too
 . CI: make ci/{lib,install-dependencies}.sh POSIX-compatible
 . CI: remove "run-build-and-tests.sh", run "make [test]" directly
 . CI: export variables via a wrapper
 . CI: consistently use "export" in ci/lib.sh
 . CI: move p4 and git-lfs variables to ci/install-dependencies.sh
 . CI: have "static-analysis" run "check-builtins", not "documentation"
 . CI: have "static-analysis" run a "make ci-static-analysis" target
 . CI: don't have "git grep" invoke a pager in tree content check
 . CI/lib.sh: stop adding leading whitespace to $MAKEFLAGS
 . CI: remove unused Azure ci/* code
 . CI: remove dead "tree skipping" code
 . CI: remove more dead Travis CI support
 . CI: make "$jobname" explicit, remove fallback
 . CI: run "set -ex" early in ci/lib.sh
 (this branch is used by ab/ci-github-workflow-markup.)

 Discarded to stop "conflicting" with js/ci-github-workflow-markup;
 good bits from the series may want to be resurrected and rebuilt on
 top of the other topics.

^ permalink raw reply	[relevance 1%]

* Re: How to watch files in a Git repository
  @ 2022-06-09  8:33  6% ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2022-06-09  8:33 UTC (permalink / raw)
  To: R. Diez; +Cc: git

Hi Diez,

> On Jun 6, 2022, at 6:04 PM, R. Diez <rdiez1999@gmail.com> wrote:
> 
> Hi all:
> 
> I would like to get a notification e-mail when certain files or directories change in a Git repository.
> 
> In the good old CVS days, you could just 'watch' a file with your favourite CVS GUI.
> 
> Some online services like GitHub offer their own notification mechanism, but I would like something generic. I am not looking for a hook solution, because the Git repositories may not be mine, so I may only have read access.
> 
> The idea is that I can set up a cron job to periodically pull a repository, and run a script to generate the e-mails from the commit history. Any new commits which match the desired branch and modify the desired files and/or directories would trigger the notifications.
> 
> I've searched the Web, but couldn't find anything straightforward.

I would encourage you to try SourceGraph’s CodeMonitoring feature (1).
You can configure a search query that targets a file path inside
a repository, and it will then send you an email when there are new
commits/diffs touching those files.

I have no affiliation with them except for being a happy end user.
In fact, I used SourceGraph extensively while studying the git/git codebase.

> 
> If there is nothing of the sort, I could write my own script in Bash or Perl. I can handle cron and sending e-mails, but I do not know much about Git's internals. Could someone provide a few pointers about how to code this? I would expect there is some command to list commits, and all files touched by a particular commit. And there would be some way to interface with Bash or Perl, which does not need parsing complicated text output from Git.
> 
> Thanks in advance,
>  rdiez

Cheers,
Son Luong

(1): https://docs.sourcegraph.com/code_monitoring
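
For the do-it-yourself route mentioned in the quoted message, a rough
Python sketch could look like the one below.  It assumes a cron job has
already run "git fetch" and remembers the last commit it reported; the
function names are made up for illustration.  Starting each commit line
with a NUL byte (%x00) keeps the parsing simple, because a path can
never contain one.

```python
import subprocess


def parse_log(output):
    """Turn `git log --name-only --format=%x00%H` output into (hash, paths) pairs."""
    commits = []
    for line in output.split("\n"):
        if line.startswith("\0"):
            # A NUL-prefixed line starts a new commit.
            commits.append((line[1:], []))
        elif line and commits:
            # Any other non-empty line is a path touched by that commit.
            commits[-1][1].append(line)
    return commits


def commits_touching(repo, old, new, paths):
    """List commits in old..new that modify any of `paths`, newest first."""
    cmd = ["git", "-C", repo, "log", "--name-only", "--format=%x00%H",
           f"{old}..{new}", "--", *paths]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_log(result.stdout)
```

A cron script would call commits_touching() with the previously seen
commit and the freshly fetched branch tip, send one mail per returned
entry, and then store the new tip for the next run.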

^ permalink raw reply	[relevance 6%]

* What's cooking in git.git (Jun 2022, #02; Tue, 7)
@ 2022-06-08  1:12  1% Junio C Hamano
  0 siblings, 0 replies; 122+ results
From: Junio C Hamano @ 2022-06-08  1:12 UTC (permalink / raw)
  To: git

Here are the topics that have been cooking in my tree.  Commits
prefixed with '+' are in 'next' (being in 'next' is a sign that a
topic is stable enough to be used and is a candidate to be in a
future release).  Commits prefixed with '-' are only in 'seen',
and aren't considered "accepted" at all.

This cycle will complete at around the end of June
(https://tinyurl.com/gitCal); we are in week #7 of the cycle.

Copies of the source code to Git live in many repositories, and the
following is a list of the ones I push into or their mirrors.  Some
repositories have only a subset of branches.

With maint, master, next, seen, todo:

	git://git.kernel.org/pub/scm/git/git.git/
	git://repo.or.cz/alt-git.git/
	https://kernel.googlesource.com/pub/scm/git/git/
	https://github.com/git/git/
	https://gitlab.com/git-vcs/git/

With all the integration branches and topics broken out:

	https://github.com/gitster/git/

Even though the preformatted documentation in HTML and man format
are not sources, they are published in these repositories for
convenience (replace "htmldocs" with "manpages" for the manual
pages):

	git://git.kernel.org/pub/scm/git/git-htmldocs.git/
	https://github.com/gitster/git-htmldocs.git/

Release tarballs are available at:

	https://www.kernel.org/pub/software/scm/git/

--------------------------------------------------
[Graduated to 'master']

* ab/plug-leak-in-revisions (2022-04-13) 28 commits
  (merged to 'next' on 2022-05-30 at 2ff85c8e71)
 + revisions API: add a TODO for diff_free(&revs->diffopt)
 + revisions API: have release_revisions() release "topo_walk_info"
 + revisions API: have release_revisions() release "date_mode"
 + revisions API: call diff_free(&revs->pruning) in revisions_release()
 + revisions API: release "reflog_info" in release revisions()
 + revisions API: clear "boundary_commits" in release_revisions()
 + revisions API: have release_revisions() release "prune_data"
 + revisions API: have release_revisions() release "grep_filter"
 + revisions API: have release_revisions() release "filter"
 + revisions API: have release_revisions() release "cmdline"
 + revisions API: have release_revisions() release "mailmap"
 + revisions API: have release_revisions() release "commits"
 + revisions API users: use release_revisions() for "prune_data" users
 + revisions API users: use release_revisions() with UNLEAK()
 + revisions API users: use release_revisions() in builtin/log.c
 + revisions API users: use release_revisions() in http-push.c
 + revisions API users: add "goto cleanup" for release_revisions()
 + stash: always have the owner of "stash_info" free it
 + revisions API users: use release_revisions() needing REV_INFO_INIT
 + revision.[ch]: document and move code declared around "init"
 + revisions API users: add straightforward release_revisions()
 + revision.[ch]: provide and start using a release_revisions()
 + cocci: add and apply free_commit_list() rules
 + format-patch: don't leak "extra_headers" or "ref_message_ids"
 + string_list API users: use string_list_init_{no,}dup
 + blame: use "goto cleanup" for cleanup_scoreboard()
 + t/helper/test-fast-rebase.c: don't leak "struct strbuf"
 + Merge branch 'ds/partial-bundle-more' into ab/plug-leak-in-revisions

 Plug the memory leaks from the trickiest API of all, the revision
 walker.
 source: <cover-v6-00.27-00000000000-20220413T195935Z-avarab@gmail.com>


* ds/bundle-uri (2022-05-16) 8 commits
  (merged to 'next' on 2022-05-25 at 43b1b9092c)
 + bundle.h: make "fd" version of read_bundle_header() public
 + remote: allow relative_url() to return an absolute url
 + remote: move relative_url()
 + http: make http_get_file() external
 + fetch-pack: move --keep=* option filling to a function
 + fetch-pack: add a deref_without_lazy_fetch_extended()
 + dir API: add a generalized path_match_flags() function
 + connect.c: refactor sending of agent & object-format
 (this branch is used by ds/bundle-uri-more.)

 Preliminary code refactoring around transport and bundle code.
 source: <pull.1233.git.1652731865.gitgitgadget@gmail.com>


* ds/object-file-unpack-loose-header-fix (2022-05-16) 1 commit
  (merged to 'next' on 2022-05-26 at b35a1d5db6)
 + object-file: convert 'switch' back to 'if'

 Coding style fix.
 source: <377be0e9-8a0f-4a86-0a66-3b08c0284dae@github.com>


* ds/sparse-sparse-checkout (2022-05-23) 10 commits
  (merged to 'next' on 2022-05-26 at e0e07693c5)
 + sparse-checkout: integrate with sparse index
 + p2000: add test for 'git sparse-checkout [add|set]'
 + sparse-index: complete partial expansion
 + sparse-index: partially expand directories
 + sparse-checkout: --no-sparse-index needs a full index
 + cache-tree: implement cache_tree_find_path()
 + sparse-index: introduce partially-sparse indexes
 + sparse-index: create expand_index()
 + t1092: stress test 'git sparse-checkout set'
 + t1092: refactor 'sparse-index contents' test

 "sparse-checkout" learns to work well with the sparse-index
 feature.
 source: <pull.1208.v3.git.1653313726.gitgitgadget@gmail.com>


* en/sparse-cone-becomes-default (2022-04-21) 9 commits
  (merged to 'next' on 2022-05-13 at c168eb55cf)
 + Documentation: some sparsity wording clarifications
 + git-sparse-checkout.txt: mark non-cone mode as deprecated
 + git-sparse-checkout.txt: flesh out pattern set sections a bit
 + git-sparse-checkout.txt: add a new EXAMPLES section
 + git-sparse-checkout.txt: shuffle some sections and mark as internal
 + git-sparse-checkout.txt: update docs for deprecation of 'init'
 + git-sparse-checkout.txt: wording updates for the cone mode default
 + sparse-checkout: make --cone the default
 + tests: stop assuming --no-cone is the default mode for sparse-checkout

 Deprecate non-cone mode of the sparse-checkout feature.
 source: <pull.1148.v3.git.1650594746.gitgitgadget@gmail.com>


* fh/transport-push-leakfix (2022-05-27) 3 commits
  (merged to 'next' on 2022-05-30 at e70a36b915)
 + transport: free local and remote refs in transport_push()
 + transport: unify return values and exit point from transport_push()
 + transport: remove unnecessary indenting in transport_push()

 Leakfix.
 source: <20220520124952.2393299-1-frantisek@hrbata.com>


* jc/all-negative-pathspec (2022-05-29) 1 commit
  (merged to 'next' on 2022-05-31 at 2d65a13996)
 + pathspec: correct an empty string used as a pathspec element

 A git subcommand like "git add -p" spawns a separate git process
 while relaying its command line arguments.  A pathspec with only
 negative elements was mistakenly passed with an empty string, which
 has been corrected.
 source: <xmqqpmjwx8so.fsf_-_@gitster.g>


* js/ci-github-workflow-markup (2022-05-21) 12 commits
  (merged to 'next' on 2022-05-30 at bd37e9e41f)
 + ci: call `finalize_test_case_output` a little later
 + ci(github): mention where the full logs can be found
 + ci: use `--github-workflow-markup` in the GitHub workflow
 + ci(github): avoid printing test case preamble twice
 + ci(github): skip the logs of the successful test cases
 + ci: optionally mark up output in the GitHub workflow
 + ci/run-build-and-tests: add some structure to the GitHub workflow output
 + ci: make it easier to find failed tests' logs in the GitHub workflow
 + ci/run-build-and-tests: take a more high-level view
 + test(junit): avoid line feeds in XML attributes
 + tests: refactor --write-junit-xml code
 + ci: fix code style

 Update the GitHub workflow support to make it quicker to get to the
 failing test.
 source: <pull.1117.v3.git.1653171536.gitgitgadget@gmail.com>


* js/scalar-diagnose (2022-05-30) 8 commits
  (merged to 'next' on 2022-05-31 at 8c878f3ac5)
 + scalar: teach `diagnose` to gather loose objects information
 + scalar: teach `diagnose` to gather packfile info
 + scalar diagnose: include disk space information
 + scalar: implement `scalar diagnose`
 + scalar: validate the optional enlistment argument
 + archive --add-virtual-file: allow paths containing colons
 + archive: optionally add "virtual" files
 + Merge branch 'rs/document-archive-prefix' into js/scalar-diagnose
 (this branch uses rs/document-archive-prefix.)

 Implementation of "scalar diagnose" subcommand.
 source: <20220528231118.3504387-1-gitster@pobox.com>


* jx/l10n-workflow-change (2022-05-26) 10 commits
  (merged to 'next' on 2022-05-26 at 252c979843)
 + l10n: Document the new l10n workflow
 + Makefile: add "po-init" rule to initialize po/XX.po
 + Makefile: add "po-update" rule to update po/XX.po
 + po/git.pot: don't check in result of "make pot"
 + po/git.pot: this is now a generated file
 + Makefile: remove duplicate and unwanted files in FOUND_SOURCE_FILES
 + i18n CI: stop allowing non-ASCII source messages in po/git.pot
 + Makefile: have "make pot" not "reset --hard"
 + Makefile: generate "po/git.pot" from stable LOCALIZED_C
 + Makefile: sort source files before feeding to xgettext

 A workflow change for translators is being proposed.
 source: <20220523152128.26380-1-worldhello.net@gmail.com>


* kl/setup-in-unreadable-worktree (2022-05-24) 1 commit
  (merged to 'next' on 2022-05-27 at 4867873678)
 + setup: don't die if realpath(3) fails on getcwd(3)

 Disable the "do not remove the directory the user started Git in"
 logic when Git cannot tell where that directory is.  Earlier we
 refused to run in such a case.
 source: <8b20840014d214023c50ee62439147f798e6f9cc.1653419993.git.kevin@kevinlocke.name>


* ns/batch-fsync (2022-04-06) 13 commits
  (merged to 'next' on 2022-05-23 at 379d8bd500)
 + core.fsyncmethod: performance tests for batch mode
 + t/perf: add iteration setup mechanism to perf-lib
 + core.fsyncmethod: tests for batch mode
 + test-lib-functions: add parsing helpers for ls-files and ls-tree
 + core.fsync: use batch mode and sync loose objects by default on Windows
 + unpack-objects: use the bulk-checkin infrastructure
 + update-index: use the bulk-checkin infrastructure
 + builtin/add: add ODB transaction around add_files_to_cache
 + cache-tree: use ODB transaction around writing a tree
 + core.fsyncmethod: batched disk flushes for loose-objects
 + bulk-checkin: rebrand plug/unplug APIs as 'odb transactions'
 + bulk-checkin: rename 'state' variable and separate 'plugged' boolean
 + Merge branch 'ns/core-fsyncmethod' into ns/batch-fsync

 Introduce a filesystem-dependent mechanism to optimize the way the
 bits for many loose object files are ensured to hit the disk
 platter.
 source: <pull.1134.v5.git.1648616734.gitgitgadget@gmail.com>


* pb/use-freebsd-12.3-in-cirrus-ci (2022-05-25) 1 commit
  (merged to 'next' on 2022-05-26 at cea1e33100)
 + ci: update Cirrus-CI image to FreeBSD 12.3

 Update the version of FreeBSD image used in Cirrus CI.
 source: <20220525125112.86954-1-levraiphilippeblain@gmail.com>


* rs/document-archive-prefix (2022-05-28) 1 commit
  (merged to 'next' on 2022-05-30 at c9e9c54880)
 + archive: improve documentation of --prefix
 (this branch is used by js/scalar-diagnose.)

 The documentation on the interaction between "--add-file" and
 "--prefix" options of "git archive" has been improved.
 source: <6ef7f836-45f6-8386-03c0-dc18b125ec67@web.de>


* tb/cruft-packs (2022-05-26) 17 commits
  (merged to 'next' on 2022-05-27 at cfa4cbd790)
 + sha1-file.c: don't freshen cruft packs
 + builtin/gc.c: conditionally avoid pruning objects via loose
 + builtin/repack.c: add cruft packs to MIDX during geometric repack
 + builtin/repack.c: use named flags for existing_packs
 + builtin/repack.c: allow configuring cruft pack generation
 + builtin/repack.c: support generating a cruft pack
 + builtin/pack-objects.c: --cruft with expiration
 + reachable: report precise timestamps from objects in cruft packs
 + reachable: add options to add_unseen_recent_objects_to_traversal
 + builtin/pack-objects.c: --cruft without expiration
 + builtin/pack-objects.c: return from create_object_entry()
 + t/helper: add 'pack-mtimes' test-tool
 + pack-mtimes: support writing pack .mtimes files
 + chunk-format.h: extract oid_version()
 + pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles'
 + pack-mtimes: support reading .mtimes files
 + Documentation/technical: add cruft-packs.txt

 A mechanism to pack unreachable objects into a "cruft pack",
 instead of ejecting them into loose form to be reclaimed later, has
 been introduced.
 source: <cover.1653088640.git.me@ttaylorr.com>


* tb/geom-repack-with-keep-and-max (2022-05-20) 3 commits
  (merged to 'next' on 2022-05-26 at 4068f4afd3)
 + builtin/repack.c: ensure that `names` is sorted
 + t7703: demonstrate object corruption with pack.packSizeLimit
 + repack: respect --keep-pack with geometric repack

 Teach "git repack --geometric" to work better with "--keep-pack" and
 avoid corrupting the repository when a pack size limit is used.
 source: <cover.1653073280.git.me@ttaylorr.com>


* tb/midx-race-in-pack-objects (2022-05-24) 4 commits
  (merged to 'next' on 2022-05-26 at b51897dfc4)
 + builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects
 + builtin/pack-objects.c: ensure included `--stdin-packs` exist
 + builtin/pack-objects.c: avoid redundant NULL check
 + pack-bitmap.c: check preferred pack validity when opening MIDX bitmap

 The multi-pack-index code did not protect the packfile it is going
 to depend on from getting removed while in use, which has been
 corrected.
 source: <cover.1653418457.git.me@ttaylorr.com>


* yw/cmake-updates (2022-05-24) 3 commits
  (merged to 'next' on 2022-05-30 at ff3184441c)
 + cmake: remove (_)UNICODE def on Windows in CMakeLists.txt
 + cmake: add pcre2 support
 + cmake: fix CMakeLists.txt on Linux

 CMake updates.
 source: <pull.1267.v2.git.git.1653374328.gitgitgadget@gmail.com>

--------------------------------------------------
[New Topics]

* jy/gitweb-xhtml5 (2022-06-02) 1 commit
  (merged to 'next' on 2022-06-02 at cc6a77b48b)
 + gitweb: switch to an XHTML5 DOCTYPE

 Update the doctype written in gitweb output to XHTML5.

 Will merge to 'master'.
 source: <20220602114305.5915-1-jason@jasonyundt.email>


* pb/range-diff-with-submodule (2022-06-06) 1 commit
  (merged to 'next' on 2022-06-07 at e5e31590c4)
 + range-diff: show submodule changes irrespective of diff.submodule

 "git range-diff" did not show anything for submodules that changed
 in the ranges being compared.  Change the behaviour to include the
 "--submodule=short" output unconditionally to be compared.

 Will merge to 'master'.
 source: <pull.1244.v2.git.1654549153769.gitgitgadget@gmail.com>


* jp/prompt-clear-before-upstream-mark (2022-06-07) 1 commit
 - git-prompt: make colourization consistent

 Bash command line prompt (in contrib/) update.

 Will merge to 'next'.
 source: <20220607115024.64724-1-joak-pet@online.no>


* jt/unparse-commit-upon-graft-change (2022-06-06) 1 commit
 - commit,shallow: unparse commits if grafts changed

 Updating the graft information invalidates the list of parents of
 in-core commit objects that used to be in the graft file.

 Will merge to 'next'.
 source: <20220606175437.1740447-1-jonathantanmy@google.com>


* ds/rebase-update-refs (2022-06-07) 7 commits
 - rebase: add rebase.updateRefs config option
 - sequencer: implement 'update-refs' command
 - rebase: add --update-refs option
 - sequencer: add update-refs command
 - sequencer: define array with enum values
 - branch: add branch_checked_out() helper
 - log-tree: create for_each_decoration()

 "git rebase" learns "--update-refs" to update the refs that point
 at commits being rewritten so that they point at the corresponding
 commits in the rewritten history.

 source: <3d7d3f656b4e93e8caa0d18d29c318ede956d1d7.1654634569.git.gitgitgadget@gmail.com>
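
 To illustrate the idea only (this is not the sequencer code), the
 effect described above can be modelled in a few lines of Python.
 The function name update_refs(), the dictionary layout and the
 sample commit names are all hypothetical.

```python
def update_refs(refs, rewritten):
    """Return a copy of refs with rewritten commits replaced.

    refs:      maps ref name -> commit it points at
    rewritten: maps old commit -> its counterpart in the rewritten history
    """
    # Refs that point at commits untouched by the rebase keep their value.
    return {name: rewritten.get(commit, commit) for name, commit in refs.items()}


if __name__ == "__main__":
    refs = {
        "refs/heads/step-1": "A",
        "refs/heads/step-2": "B",
        "refs/heads/unrelated": "X",
    }
    # A rebase rewrote A -> A' and B -> B'; X was not part of it.
    rewritten = {"A": "A'", "B": "B'"}
    for name, commit in sorted(update_refs(refs, rewritten).items()):
        print(name, commit)
```

 In this model, refs that do not point at a rewritten commit are
 left alone, which matches the wording of the description above.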


* ab/build-gitweb (2022-06-02) 7 commits
 - Makefile: build 'gitweb' in the default target
 - gitweb/Makefile: include in top-level Makefile
 - gitweb: remove "test" and "test-installed" targets
 - gitweb/Makefile: prepare to merge into top-level Makefile
 - gitweb/Makefile: clear up and de-duplicate the gitweb.{css,js} vars
 - gitweb/Makefile: add a $(GITWEB_ALL) variable
 - gitweb/Makefile: define all .PHONY prerequisites inline

 source: <cover-v2-0.7-00000000000-20220531T173805Z-avarab@gmail.com>


* ab/test-without-templates (2022-06-06) 7 commits
 - tests: don't assume a .git/info for .git/info/sparse-checkout
 - tests: don't assume a .git/info for .git/info/exclude
 - tests: don't assume a .git/info for .git/info/refs
 - tests: don't assume a .git/info for .git/info/attributes
 - tests: don't assume a .git/info for .git/info/grafts
 - tests: don't depend on template-created .git/branches
 - t0008: don't rely on default ".git/info/exclude"

 source: <cover-v2-0.7-00000000000-20220603T110506Z-avarab@gmail.com>


* ac/bitmap-format-doc (2022-06-07) 3 commits
 - bitmap-format.txt: add information for trailing checksum
 - bitmap-format.txt: fix some formatting issues
 - bitmap-format.txt: feed the file to asciidoc to generate html

 Adjust technical/bitmap-format to be formatted by AsciiDoc.

 Needs more work to really use AsciiDoc to produce true HTML.
 source: <pull.1246.v2.git.1654623814.gitgitgadget@gmail.com>


* hx/unpack-streaming (2022-06-06) 7 commits
 - unpack-objects: use stream_loose_object() to unpack large objects
 - core doc: modernize core.bigFileThreshold documentation
 - object-file.c: add "stream_loose_object()" to handle large object
 - object-file.c: factor out deflate part of write_loose_object()
 - object-file.c: refactor write_loose_object() to several steps
 - object-file.c: do fsync() and close() before post-write die()
 - unpack-objects: low memory footprint for get_data() in dry_run mode

 Allow large objects read from a packstream to be streamed into a
 loose object file straight, without having to keep it in-core as a
 whole.

 Needs rebasing on batched-fsync stuff.
 cf. <7ba4858a-d1cc-a4eb-b6d6-4c04a5dd6ce7@gmail.com>
 source: <cover-v13-0.7-00000000000-20220604T095113Z-avarab@gmail.com>
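
 As a rough sketch of what "streamed ... without having to keep it
 in-core as a whole" means, the Python below deflates an object into
 a loose-object byte stream one chunk at a time.  It assumes a SHA-1
 repository and the usual loose object layout (a "type size NUL"
 header followed by the contents, zlib-deflated).  The name
 stream_to_loose() and its parameters are made up for illustration;
 this is not a port of the C code in the topic.

```python
import hashlib
import io
import zlib


def stream_to_loose(src, size, out, obj_type=b"blob", chunk_size=8192):
    """Deflate `size` bytes from src into out, one chunk at a time.

    Returns the hex object name (SHA-1 of header plus contents).
    At most one chunk of the contents is held in memory at any time.
    """
    header = obj_type + b" " + str(size).encode("ascii") + b"\0"
    hasher = hashlib.sha1(header)
    deflater = zlib.compressobj()
    out.write(deflater.compress(header))
    remaining = size
    while remaining > 0:
        chunk = src.read(min(chunk_size, remaining))
        if not chunk:
            raise ValueError("input ended before the declared size")
        hasher.update(chunk)
        out.write(deflater.compress(chunk))
        remaining -= len(chunk)
    # Emit whatever the compressor is still buffering.
    out.write(deflater.flush())
    return hasher.hexdigest()


if __name__ == "__main__":
    payload = b"hello\n"
    sink = io.BytesIO()
    print(stream_to_loose(io.BytesIO(payload), len(payload), sink, chunk_size=4))
```

 The sketch needs the object size up front because the header is
 hashed and written before any of the contents.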


* po/rebase-preserve-merges (2022-06-06) 4 commits
 - rebase: translate a die(preserve-merges) message
 - rebase: note `preserve` merges may be a pull config option
 - rebase: help users when dying with `preserve-merges`
 - rebase.c: state preserve-merges has been removed

 Various error messages that talk about the removal of
 "--preserve-merges" in "rebase" have been strengthened, and "rebase
 --abort" learned to get out of a state that was left by an earlier
 use of the option.

 Will merge to 'next'.
 source: <pull.1242.v2.git.1654341469.gitgitgadget@gmail.com>


* tb/show-ref-optim (2022-06-06) 1 commit
 - builtin/show-ref.c: avoid over-iterating with --heads, --tags

 "git show-ref --heads" (and "--tags") still iterated over all the
 refs, which has been corrected.

 Will merge to 'next'.
 source: <3fa6932641f18d78156bbf60b1571383f2cb5046.1654293264.git.me@ttaylorr.com>


* zh/read-cache-copy-name-entry-fix (2022-06-06) 1 commit
 - read-cache.c: reduce unnecessary cache entry name copying

 Redundant copying (with index v3 and older) has been removed, and
 possible over-reading beyond the end of mmapped memory (with index
 v4) has been corrected.

 Will merge to 'next'.
 source: <pull.1249.git.1654436248249.gitgitgadget@gmail.com>


* ab/remote-free-fix (2022-06-07) 2 commits
 - remote.c: don't dereference NULL in freeing loop
 - remote.c: remove braces from one-statement "for"-loops

 Use-after-free (with another forget-to-free) fix.

 Will merge to 'next'.
 source: <cover-0.3-00000000000-20220607T154520Z-avarab@gmail.com>


* sn/fsmonitor-missing-clock (2022-06-07) 1 commit
 - fsmonitor: query watchman with right valid json

 The sample watchman interface hook sometimes failed to produce a
 correctly formatted JSON message, which has been corrected.

 Will merge to 'next'.
 source: <20220607111419.15753-1-sluongng@gmail.com>


* tb/show-ref-count (2022-06-06) 2 commits
 - builtin/show-ref.c: limit output with `--count`
 - builtin/show-ref.c: rename `found_match` to `matches_nr`

 "git show-ref" learned to stop after emitting N refs with the new
 "--count=N" option.

 Expecting a reroll.
 cf. <xmqqczfl4ce1.fsf@gitster.g>
 source: <cover.1654552560.git.me@ttaylorr.com>
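
 The point of "stop after emitting N refs" is that the underlying
 iteration ends early too, not just the output.  A minimal Python
 model of that behaviour follows; show_ref(), iterate_refs() and the
 dictionary used as a ref store are hypothetical, and the option may
 still change in the reroll.

```python
from itertools import islice


def iterate_refs(store, visited):
    """Lazily walk a ref store, recording each ref that is visited."""
    for name in sorted(store):
        visited.append(name)
        yield name, store[name]


def show_ref(store, count=None, visited=None):
    """Return up to `count` refs; with count=None, return all of them."""
    if visited is None:
        visited = []
    # islice() stops pulling from the generator once `count` is reached.
    return list(islice(iterate_refs(store, visited), count))


if __name__ == "__main__":
    store = {"refs/heads/a": "1", "refs/heads/b": "2", "refs/tags/v1": "3"}
    seen = []
    print(show_ref(store, count=2, visited=seen))
    print("visited:", seen)
```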


* jc/cocci-cleanup (2022-06-07) 1 commit
 - cocci: retire is_null_sha1() rule

 Remove a coccinelle rule that is no longer relevant.

 source: <xmqq7d5suoqt.fsf@gitster.g>

--------------------------------------------------
[Stalled]

* en/merge-tree (2022-02-23) 13 commits
 - git-merge-tree.txt: add a section on potentional usage mistakes
 - merge-tree: add a --allow-unrelated-histories flag
 - merge-tree: allow `ls-files -u` style info to be NUL terminated
 - merge-tree: provide easy access to `ls-files -u` style info
 - merge-tree: provide a list of which files have conflicts
 - merge-ort: provide a merge_get_conflicted_files() helper function
 - merge-tree: support including merge messages in output
 - merge-ort: split out a separate display_update_messages() function
 - merge-tree: implement real merges
 - merge-tree: add option parsing and initial shell for real merge function
 - merge-tree: move logic for existing merge into new function
 - merge-tree: rename merge_trees() to trivial_merge_trees()
 - Merge branch 'en/remerge-diff' into en/merge-trees

 A new command is introduced that takes two commits and computes a
 tree that would be contained in the resulting merge commit, if the
 histories leading to these two commits were to be merged, and is
 added as a new mode of "git merge-tree" subcommand.

 On hold.
 cf. <CABPp-BGZ7OAYRR5YKRsxJSo-C=ho+qcNAkqwkim8CkhCfCeHsA@mail.gmail.com>
 source: <pull.1122.v6.git.1645602413.gitgitgadget@gmail.com>


* ab/ci-github-workflow-markup (2022-05-26) 14 commits
 . CI: make the --github-workflow-markup "github" output the default
 . CI: make --github-workflow-markup ci-config, off by default
 . ci: call `finalize_test_case_output` a little later
 . ci(github): mention where the full logs can be found
 . ci(github): avoid printing test case preamble twice
 . ci(github): skip "skip" tests in --github-workflow-markup
 . ci(github): skip the logs of the successful test cases
 . ci: make it easier to find failed tests' logs in the GitHub workflow
 . ci: optionally mark up output in the GitHub workflow
 . test(junit): avoid line feeds in XML attributes
 . tests: refactor --write-junit-xml code
 . CI: stop setting FAILED_TEST_ARTIFACTS N times
 . CI: don't include "test-results/" in ci/print-test-failures.sh output
 . CI: don't "cd" in ci/print-test-failures.sh
 (this branch uses ab/ci-setup-simplify.)

 Build a moral equivalent of js/ci-github-workflow-markup on top of
 ab/ci-setup-simplify.


* ab/ci-setup-simplify (2022-05-26) 29 commits
 . CI: make it easy to use ci/*.sh outside of CI
 . CI: don't use "set -x" in "ci/lib.sh" output
 . CI: set PYTHON_PATH setting for osx-{clang,gcc} into "$jobname" case
 . CI: set SANITIZE=leak in MAKEFLAGS directly
 . CI: set CC in MAKEFLAGS directly, don't add it to the environment
 . CI: add more variables to MAKEFLAGS, except under vs-build
 . CI: narrow down variable definitions in --build and --test
 . CI: only invoke ci/lib.sh as "steps" in main.yml
 . CI: pre-select test slice in Windows & VS tests
 . ci/run-test-slice.sh: replace shelling out with "echo"
 . CI: move "env" definitions into ci/lib.sh
 . CI: combine ci/install{,-docker}-dependencies.sh
 . CI: split up and reduce "ci/test-documentation.sh"
 . CI: invoke "make artifacts-tar" directly in windows-build
 . CI: check ignored unignored build artifacts in "win[+VS] build" too
 . CI: make ci/{lib,install-dependencies}.sh POSIX-compatible
 . CI: remove "run-build-and-tests.sh", run "make [test]" directly
 . CI: export variables via a wrapper
 . CI: consistently use "export" in ci/lib.sh
 . CI: move p4 and git-lfs variables to ci/install-dependencies.sh
 . CI: have "static-analysis" run "check-builtins", not "documentation"
 . CI: have "static-analysis" run a "make ci-static-analysis" target
 . CI: don't have "git grep" invoke a pager in tree content check
 . CI/lib.sh: stop adding leading whitespace to $MAKEFLAGS
 . CI: remove unused Azure ci/* code
 . CI: remove dead "tree skipping" code
 . CI: remove more dead Travis CI support
 . CI: make "$jobname" explicit, remove fallback
 . CI: run "set -ex" early in ci/lib.sh
 (this branch is used by ab/ci-github-workflow-markup.)

 Drive more actions done in CI via the Makefile instead of shell
 commands sprinkled in .github/workflows/main.yml
 source: <cover-v6-00.29-00000000000-20220525T094123Z-avarab@gmail.com>


* bc/stash-export (2022-04-08) 4 commits
 - builtin/stash: provide a way to import stashes from a ref
 - builtin/stash: provide a way to export stashes to a ref
 - builtin/stash: factor out revision parsing into a function
 - object-name: make get_oid quietly return an error

 A mechanism to export and import stash entries to and from a normal
 commit to transfer it across repositories has been introduced.

 Expecting a reroll.
 cf. <YnL2d4Vr9Vr7W4Hj@camp.crustytoothpaste.net>
 source: <20220407215352.3491567-1-sandals@crustytoothpaste.net>


* cw/remote-object-info (2022-05-06) 11 commits
 - SQUASH??? coccicheck
 - SQUASH??? ensure that coccicheck is happy
 - SQUASH??? compilation fix
 - cat-file: add --batch-command remote-object-info command
 - cat-file: move parse_cmd and DEFAULT_FORMAT up
 - transport: add object-info fallback to fetch
 - transport: add client side capability to request object-info
 - object-info: send attribute packet regardless of object ids
 - object-store: add function to free object_info contents
 - fetch-pack: move fetch default settings
 - fetch-pack: refactor packet writing

 A client component to talk with the object-info endpoint.

 Expecting a reroll.
 source: <20220502170904.2770649-1-calvinwan@google.com>

--------------------------------------------------
[Cooking]

* js/wait-or-whine-can-fail (2022-06-07) 1 commit
 - run-command: don't spam trace2_child_exit()

 We used to log an error return from wait_or_whine() as process
 termination of the waited child, which was incorrect.

 Will merge to 'next'.
 source: <50d872a057a558fa5519856b95abd048ddb514dc.1654625626.git.steadmon@google.com>


* ds/credentials-in-url (2022-06-06) 1 commit
 - remote: create fetch.credentialsInUrl config

 The "fetch.credentialsInUrl" configuration variable controls what
 happens when a URL with embedded login credentials is used.

 Will merge to 'next'.
 source: <pull.1237.v5.git.1654526176695.gitgitgadget@gmail.com>
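
 For readers wondering what "a URL with embedded login credential"
 looks like in practice, here is a small Python sketch of such a
 check.  The three policy names ("allow", "warn", "die"), the
 message text and both function names are assumptions made for the
 example; consult the patch for what the variable really accepts.

```python
from urllib.parse import urlsplit


def has_embedded_password(url):
    """True if url has the form scheme://user:password@host/..."""
    return urlsplit(url).password is not None


def check_url(url, policy="allow"):
    """Apply a hypothetical allow/warn/die policy to url."""
    if policy == "allow" or not has_embedded_password(url):
        return None
    message = "URL contains embedded credentials"
    if policy == "die":
        raise ValueError(message)
    return "warning: " + message


if __name__ == "__main__":
    for url in ("https://user:secret@example.com/repo.git",
                "https://user@example.com/repo.git"):
        print(url, "->", check_url(url, "warn"))
```

 A URL that carries only a user name is left alone in this sketch;
 only a "user:password" pair triggers the policy.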


* ab/bug-if-bug (2022-06-02) 6 commits
  (merged to 'next' on 2022-06-03 at 25290bb7ec)
 + cache-tree.c: use bug() and BUG_if_bug()
 + receive-pack: use bug() and BUG_if_bug()
 + parse-options.c: use optbug() instead of BUG() "opts" check
 + parse-options.c: use new bug() API for optbug()
 + usage.c: add a non-fatal bug() function to go with BUG()
 + common-main.c: move non-trace2 exit() behavior out of trace2.c

 A new bug() and BUG_if_bug() API is introduced to make it easier to
 uniformly follow the "detect multiple bugs and abort at the end"
 pattern.

 Will merge to 'master'.
 source: <cover-v3-0.6-00000000000-20220602T122106Z-avarab@gmail.com>
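
 The pattern itself is language-neutral, so a short Python model may
 help those who have not seen it.  The real API is C; the class
 BugLog, its methods and check_options() below are hypothetical
 stand-ins that only mimic the "report several, abort once" shape.

```python
import sys


class BugLog:
    """Record non-fatal bug reports and abort once at the end."""

    def __init__(self):
        self.count = 0

    def bug(self, message):
        """Report a problem but keep going, so later ones are found too."""
        self.count += 1
        print("error: " + message, file=sys.stderr)

    def abort_if_bug(self, summary):
        """Raise if bug() was called at least once on this log."""
        if self.count:
            raise RuntimeError("BUG: %s (%d reported)" % (summary, self.count))


def check_options(options, log):
    """Flag every option that lacks a name instead of stopping at the first."""
    for index, name in enumerate(options):
        if not name:
            log.bug("option %d has no name" % index)


if __name__ == "__main__":
    log = BugLog()
    check_options(["verbose", "", "quiet", ""], log)
    try:
        log.abort_if_bug("invalid option table")
    except RuntimeError as err:
        print(err)
```

 The benefit over aborting at the first problem is that a developer
 sees every broken entry in one run.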


* cb/buggy-gcc-12-workaround (2022-06-01) 1 commit
  (merged to 'next' on 2022-06-01 at 01e199fd58)
 + Revert -Wno-error=dangling-pointer

 With a more targeted workaround in http.c in another topic, we may
 be able to lift this blanket "GCC12 dangling-pointer warning is
 broken and unsalvageable" workaround.

 Will merge to 'master'.


* gc/zero-length-branch-config-fix (2022-06-01) 2 commits
  (merged to 'next' on 2022-06-02 at 438605f627)
 + remote.c: reject 0-length branch names
 + remote.c: don't BUG() on 0-length branch names

 A misconfigured 'branch..remote' led to a bug in configuration
 parsing.

 Will merge to 'master'.
 source: <pull.1273.git.git.1654038754.gitgitgadget@gmail.com>
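
 A key such as 'branch..remote' has an empty string where the branch
 name should be.  The Python sketch below shows one way a parser can
 notice that; parse_branch_key() is a hypothetical helper, and
 returning None is just a choice made for the example, not a claim
 about how remote.c reports the problem.

```python
def parse_branch_key(key):
    """Split 'branch.NAME.VARIABLE' into (NAME, VARIABLE).

    Returns None for keys outside the branch section and for a
    zero-length NAME, as in 'branch..remote'.
    """
    prefix = "branch."
    if not key.startswith(prefix):
        return None
    # The branch name may itself contain dots, so split at the last one.
    name, dot, variable = key[len(prefix):].rpartition(".")
    if not dot or not name or not variable:
        return None
    return name, variable


if __name__ == "__main__":
    for key in ("branch.main.remote", "branch.topic/a.b.merge", "branch..remote"):
        print(key, "->", parse_branch_key(key))
```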


* tl/ls-tree-oid-only (2022-06-03) 1 commit
  (merged to 'next' on 2022-06-07 at e1c1e0b25a)
 + ls-tree: test for the regression in 9c4d58ff2c3

 Add tests for a regression fixed earlier.

 Will merge to 'master'.
 source: <patch-v2-1.1-f2beb02dd29-20220603T102148Z-avarab@gmail.com>


* ds/bundle-uri-more (2022-06-06) 6 commits
 - fetch: add 'refs/bundle/' to log.excludeDecoration
 - bundle-uri: add support for http(s):// and file://
 - fetch: add --bundle-uri option
 - bundle-uri: create basic file-copy logic
 - remote-curl: add 'get' capability
 - docs: document bundle URI standard

 source: <pull.1248.git.1654545325.gitgitgadget@gmail.com>


* jc/revert-show-parent-info (2022-05-31) 2 commits
  (merged to 'next' on 2022-06-07 at e405211ff4)
 + revert: --reference should apply only to 'revert', not 'cherry-pick'
  (merged to 'next' on 2022-05-30 at b5da52dc14)
 + revert: optionally refer to commit in the "reference" format

 "git revert" learns "--reference" option to use more human-readable
 reference to the commit it reverts in the message template it
 prepares for the user.

 Will merge to 'master'.
 source: <xmqq8rqn7buk.fsf_-_@gitster.g>


* js/bisect-in-c (2022-05-21) 15 commits
 - bisect: no longer try to clean up left-over `.git/head-name` files
 - bisect: remove Cogito-related code
 - Turn `git bisect` into a full built-in
 - bisect: teach the `bisect--helper` command to show the correct usage strings
 - bisect: move even the command-line parsing to `bisect--helper`
 - bisect--helper: return only correct exit codes in `cmd_*()`
 - bisect--helper: move the `BISECT_STATE` case to the end
 - bisect--helper: make `--bisect-state` optional
 - bisect--helper: align the sub-command order with git-bisect.sh
 - bisect--helper: using `--bisect-state` without an argument is a bug
 - bisect--helper: really retire `--bisect-autostart`
 - bisect--helper: really retire --bisect-next-check
 - bisect--helper: retire the --no-log option
 - bisect: avoid double-quoting when printing the failed command
 - bisect run: fix the error message

 Final bits of "git-bisect.sh" have been rewritten in C.

 The command line parsing is reported to be still broken.
 cf. <220521.86zgjazuy4.gmgdl@evledraar.gmail.com>
 source: <pull.1132.v3.git.1653144546.gitgitgadget@gmail.com>


* cb/path-owner-check-with-sudo-plus (2022-05-12) 1 commit
 - git-compat-util: allow root to access both SUDO_UID and root owned

 "sudo git foo" used to consider a repository owned by the original
 user a safe one to access; it now also considers a repository owned
 by root a safe one, too (after all, if an attacker can craft a
 malicious repository owned by root, the box is 0wned already).

 What's our take on this one?  IIRC, the last time we discussed,
 Carlo was hesitant to push this step forward?
 cf. <20220519152344.ktrifm3pc42bjruh@Carlos-MacBook-Pro-2.local>
 source: <20220513010020.55361-5-carenas@gmail.com>
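
 To make the rule under discussion concrete, here it is as a small
 Python predicate.  This only restates the description above; the
 function name and parameters are hypothetical, and the real check
 lives in C in git-compat-util.

```python
def is_safe_owner(repo_uid, euid, sudo_uid=None):
    """Decide whether a repository owned by repo_uid is safe to access.

    euid is the effective user id of the running process; sudo_uid is
    the integer value of SUDO_UID, or None when it is not set.
    """
    if euid == 0 and sudo_uid is not None:
        # Running as root via sudo: trust the invoking user and,
        # with this topic, root itself.
        return repo_uid in (sudo_uid, 0)
    return repo_uid == euid


if __name__ == "__main__":
    print(is_safe_owner(repo_uid=1000, euid=0, sudo_uid=1000))
    print(is_safe_owner(repo_uid=0, euid=0, sudo_uid=1000))
    print(is_safe_owner(repo_uid=1001, euid=0, sudo_uid=1000))
```

 Without SUDO_UID the sketch falls back to requiring that the owner
 matches the effective user.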


* ab/hooks-regression-fix (2022-06-07) 2 commits
 - hook API: fix v2.36.0 regression: hooks should be connected to a TTY
 - run-command: add an "ungroup" option to run_process_parallel()

 In Git 2.36 we revamped the way hooks are invoked.  One change
 that is end-user visible is that the output of a hook is no longer
 directly connected to the standard output of "git" that spawns the
 hook, which was noticed post release.  This is getting corrected.

 Will merge to 'next'.
 source: <cover-v6-0.2-00000000000-20220606T170356Z-avarab@gmail.com>


* gc/bare-repo-discovery (2022-06-07) 5 commits
 - setup.c: create `discovery.bare`
 - safe.directory: use git_protected_config()
 - config: read protected config with `git_protected_config()`
 - Documentation: define protected configuration
 - Documentation/git-config.txt: add SCOPES section

 Introduce a discovery.barerepository configuration variable that
 allows users to forbid discovery of bare repositories.

 Expecting a reroll.
 source: <29053d029f8ec61095a2ad557be38b1d485a158f.1654635432.git.gitgitgadget@gmail.com>


* gg/worktree-from-the-above (2022-05-20) 3 commits
 - dir: minor refactoring / clean-up
 - dir: cache git_dir's realpath
 - dir: traverse into repository

 In a non-bare repository, the behavior of Git when core.worktree
 points at a directory that has the repository as its subdirectory
 regressed in Git 2.27 days.

 Needs review.
 source: <20220520192840.8942-1-ggossdev@gmail.com>


* ar/send-email-confirm-by-default (2022-04-22) 1 commit
 - send-email: always confirm by default

 "git send-email" is changed so that by default it asks for
 confirmation before sending each message out.

 Will discard.

 I wanted to like this, and had it in the version of Git I use
 myself for daily work, but the prompting turned out to be somewhat
 distracting.

 Thoughts?
 source: <20220422083629.1404989-1-hi@alyssa.is>


* ab/env-array (2022-06-02) 2 commits
  (merged to 'next' on 2022-06-02 at e1e05318d3)
 + run-command API users: use "env" not "env_array" in comments & names
 + run-command API: rename "env_array" to "env"

 Rename .env_array member to .env in the child_process structure.

 Will merge to 'master'.
 source: <cover-v3-0.2-00000000000-20220602T090745Z-avarab@gmail.com>


* jh/builtin-fsmonitor-part3 (2022-05-26) 31 commits
  (merged to 'next' on 2022-06-02 at 3599e359b3)
 + t7527: improve implicit shutdown testing in fsmonitor--daemon
 + fsmonitor--daemon: allow --super-prefix argument
 + t7527: test Unicode NFC/NFD handling on MacOS
 + t/lib-unicode-nfc-nfd: helper prereqs for testing unicode nfc/nfd
 + t/helper/hexdump: add helper to print hexdump of stdin
 + fsmonitor: on macOS also emit NFC spelling for NFD pathname
 + t7527: test FSMonitor on case insensitive+preserving file system
 + fsmonitor: never set CE_FSMONITOR_VALID on submodules
 + t/perf/p7527: add perf test for builtin FSMonitor
 + t7527: FSMonitor tests for directory moves
 + fsmonitor: optimize processing of directory events
 + fsm-listen-darwin: shutdown daemon if worktree root is moved/renamed
 + fsm-health-win32: force shutdown daemon if worktree root moves
 + fsm-health-win32: add polling framework to monitor daemon health
 + fsmonitor--daemon: stub in health thread
 + fsmonitor--daemon: rename listener thread related variables
 + fsmonitor--daemon: prepare for adding health thread
 + fsmonitor--daemon: cd out of worktree root
 + fsm-listen-darwin: ignore FSEvents caused by xattr changes on macOS
 + unpack-trees: initialize fsmonitor_has_run_once in o->result
 + fsmonitor-settings: NTFS and FAT32 on MacOS are incompatible
 + fsmonitor-settings: remote repos on Windows are incompatible
 + fsmonitor-settings: remote repos on macOS are incompatible
 + fsmonitor-settings: stub in macOS-specific incompatibility checking
 + fsmonitor-settings: VFS for Git virtual repos are incompatible
 + fsmonitor-settings: stub in Win32-specific incompatibility checking
 + fsmonitor-settings: bare repos are incompatible with FSMonitor
 + t/helper/fsmonitor-client: create stress test
 + t7527: test FSMonitor on repos with Unicode root paths
 + fsm-listen-win32: handle shortnames
 + Merge branch 'jh/builtin-fsmonitor-part2' into jh/builtin-fsmonitor-part3

 More fsmonitor--daemon.

 Will merge to 'master'.
 source: <pull.1143.v9.git.1653601644.gitgitgadget@gmail.com>

--------------------------------------------------
[Discarded]

* jx/uniq-source-list (2022-05-25) 1 commit
 . Makefile: dedup git-ls-files output to prevent duplicate targets

 Build fix.

 Will discard.
 No longer needed with the updated jx/l10n-workflow-change.
 source: <20220526021540.2812-1-worldhello.net@gmail.com>


* et/xdiff-indirection (2022-02-17) 1 commit
 . xdiff: provide indirection to git functions

 Insert a layer of preprocessor macros for common functions in xdiff
 codebase.

 Will discard, as it has been stalled for way too long.
 cf. <xmqqbkyudb8n.fsf@gitster.g>
 source: <20220217225408.GB7@edef91d97c94>


* dl/prompt-pick-fix (2022-03-25) 1 commit
 . git-prompt: fix sequencer/todo detection

 Fix shell prompt script (in contrib/) for those who set
 rebase.abbreviateCommands; we failed to recognize that we were in a
 multi-step cherry-pick session.

 Will discard, as it has been stalled for way too long.
 cf. <xmqqwngdzque.fsf@gitster.g>
 source: <20220325145301.3370-1-danny0838@gmail.com>


* es/superproject-aware-submodules (2022-03-09) 3 commits
 . rev-parse: short-circuit superproject worktree when config unset
 . introduce submodule.hasSuperproject record
 . t7400-submodule-basic: modernize inspect() helper

 A configuration variable in a repository tells if it is (or is not)
 a submodule of a superproject.

 Will discard, as it has been stalled for way too long.
 cf. <kl6l4k45s7cb.fsf@chooglen-macbookpro.roam.corp.google.com>
 source: <20220310004423.2627181-1-emilyshaffer@google.com>


* sg/build-gitweb (2022-05-25) 1 commit
 . Makefile: build 'gitweb' in the default target

 "make all" should but didn't build "gitweb".

 Will discard.
 cf. <220526.86k0a96sv2.gmgdl@evledraar.gmail.com>
 cf. <Yo8y3AHWa3PChLwd@coredump.intra.peff.net>
 source: <20220525205651.825669-1-szeder.dev@gmail.com>

^ permalink raw reply	[relevance 1%]

* Re: [PATCH v2] fsmonitor: query watchman with right valid json
  2022-06-07 11:14 21%     ` [PATCH v2] " Son Luong Ngoc
  2022-06-07 14:39  0%       ` Ævar Arnfjörð Bjarmason
@ 2022-06-07 17:00  6%       ` Junio C Hamano
  1 sibling, 0 replies; 122+ results
From: Junio C Hamano @ 2022-06-07 17:00 UTC (permalink / raw)
  To: Son Luong Ngoc; +Cc: git, Ævar Arnfjörð Bjarmason

Son Luong Ngoc <sluongng@gmail.com> writes:

> In rare circumstances where the current git index does not carry the
> last_update_token, the fsmonitor v2 hook will be invoked with an
> empty string which would cause the final rendered json to be invalid.
>
>   ["query", "/path/to/my/git/repository/", {
>           "since": ,
>           "fields": ["name"],
>           "expression": ["not", ["dirname", ".git"]]
>   }]
>
> Which will left user with the following error message

"left" -> "leave" (or "give")

>   > git status
>   failed to parse command from stdin: line 2, column 13, position 67: unexpected token near ','
>   Watchman: command returned no output.
>   Falling back to scanning...
>
> Hide the "since" field in json query when "last_update_token" is empty.
>
> Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

It looked more like Helped-by to me, but I dunno.

> Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
> ---
>  templates/hooks--fsmonitor-watchman.sample | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/templates/hooks--fsmonitor-watchman.sample b/templates/hooks--fsmonitor-watchman.sample
> index 14ed0aa42d..23e856f5de 100755
> --- a/templates/hooks--fsmonitor-watchman.sample
> +++ b/templates/hooks--fsmonitor-watchman.sample
> @@ -86,12 +86,13 @@ sub watchman_query {
>  	# recency index to select candidate nodes and "fields" to limit the
>  	# output to file names only. Then we're using the "expression" term to
>  	# further constrain the results.
> +	my $last_update_line = "";
>  	if (substr($last_update_token, 0, 1) eq "c") {
>  		$last_update_token = "\"$last_update_token\"";
> +		$last_update_line = qq[\n"since": $last_update_token,];
>  	}
>  	my $query = <<"	END";
> -		["query", "$git_work_tree", {
> -			"since": $last_update_token,
> +		["query", "$git_work_tree", {$last_update_line
>  			"fields": ["name"],
>  			"expression": ["not", ["dirname", ".git"]]
>  		}]

OK.  Compared to v1, this looks much more reasonable.

This is totally unrelated to the "hide invalid since" topic, but I
wonder if $git_work_tree needs a bit more careful quoting.  It comes
directly from get_working_dir() but can it contain say a double quote
character, to make the resulting string in the variable $query not
quite well formed?
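
To illustrate the quoting concern, here is a minimal sketch in Python
(not the hook's Perl).  The function name build_watchman_query() is
made up for this example.  It lets a JSON encoder do the escaping, so
a work tree path containing a double quote still yields well-formed
JSON, and it leaves "since" out when there is no clock token, as the
patch does.

```python
import json


def build_watchman_query(work_tree: str, last_update_token: str) -> str:
    """Return the query as JSON text; 'since' is omitted without a clock token."""
    params = {}
    # Mirror the hook's check: only tokens starting with "c" are passed on.
    if last_update_token.startswith("c"):
        params["since"] = last_update_token
    params["fields"] = ["name"]
    params["expression"] = ["not", ["dirname", ".git"]]
    # json.dumps() escapes quotes and backslashes in the path for us.
    return json.dumps(["query", work_tree, params])


print(build_watchman_query('/tmp/odd"name', ""))
print(build_watchman_query("/path/to/repo", "c:1654600000:1234:1:5"))
```

The clock token in the second call is a placeholder, not a value taken
from a real Watchman session.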

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v2] fsmonitor: query watchman with right valid json
  2022-06-07 11:14 21%     ` [PATCH v2] " Son Luong Ngoc
@ 2022-06-07 14:39  0%       ` Ævar Arnfjörð Bjarmason
  2022-06-07 17:00  6%       ` Junio C Hamano
  1 sibling, 0 replies; 122+ results
From: Ævar Arnfjörð Bjarmason @ 2022-06-07 14:39 UTC (permalink / raw)
  To: Son Luong Ngoc; +Cc: git


On Tue, Jun 07 2022, Son Luong Ngoc wrote:

> In rare circumstances where the current git index does not carry the
> last_update_token, the fsmonitor v2 hook will be invoked with an
> empty string which would cause the final rendered json to be invalid.
>
>   ["query", "/path/to/my/git/repository/", {
>           "since": ,
>           "fields": ["name"],
>           "expression": ["not", ["dirname", ".git"]]
>   }]
>
> Which will leave the user with the following error message
>
>   > git status
>   failed to parse command from stdin: line 2, column 13, position 67: unexpected token near ','
>   Watchman: command returned no output.
>   Falling back to scanning...
>
> Hide the "since" field in json query when "last_update_token" is empty.
>
> Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>

Thanks for the quick turnaround.

>  templates/hooks--fsmonitor-watchman.sample | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/templates/hooks--fsmonitor-watchman.sample b/templates/hooks--fsmonitor-watchman.sample
> index 14ed0aa42d..23e856f5de 100755
> --- a/templates/hooks--fsmonitor-watchman.sample
> +++ b/templates/hooks--fsmonitor-watchman.sample
> @@ -86,12 +86,13 @@ sub watchman_query {
>  	# recency index to select candidate nodes and "fields" to limit the
>  	# output to file names only. Then we're using the "expression" term to
>  	# further constrain the results.
> +	my $last_update_line = "";
>  	if (substr($last_update_token, 0, 1) eq "c") {
>  		$last_update_token = "\"$last_update_token\"";
> +		$last_update_line = qq[\n"since": $last_update_token,];
>  	}

This LGTM, just a note...

>  	my $query = <<"	END";
> -		["query", "$git_work_tree", {
> -			"since": $last_update_token,
> +		["query", "$git_work_tree", {$last_update_line

...doesn't really need a re-roll, but doesn't this trade "we don't have
too many \n" for not indenting the query properly anymore?

I think skipping both is fine, but between the two I think having
indenting is better than having a redundant \n some of the time.

FWIW you could just add the variable on its own line, and then do this
instead:

	(my $query = <<"        END") =~ s/(?<=\n)\t*\n//s; 

To post-hoc fix the extra \n in this case :)

But I think this is also fine as-is, thanks!
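
For illustration only, the same post-hoc cleanup can be sketched in
Python.  render_query() is a hypothetical helper, and it ignores the
path-quoting question entirely.  The optional line sits on its own
line, indented like its neighbours, and the first line consisting only
of tabs is removed afterwards.

```python
import re


def render_query(work_tree: str, since_line: str) -> str:
    # The optional line gets its own, properly indented, line.
    text = (
        f'["query", "{work_tree}", {{\n'
        f'\t{since_line}\n'
        f'\t"fields": ["name"],\n'
        f'\t"expression": ["not", ["dirname", ".git"]]\n'
        f'}}]\n'
    )
    # Drop the first tabs-only line, i.e. the empty placeholder (if any).
    return re.sub(r"(?<=\n)\t*\n", "", text, count=1)


print(render_query("/path/to/repo", ""))
print(render_query("/path/to/repo", '"since": "c:1:2",'))
```

When the placeholder is non-empty, no tabs-only line exists and the
substitution leaves the text alone.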

^ permalink raw reply	[relevance 0%]

* [PATCH v2] fsmonitor: query watchman with right valid json
  2022-06-07 10:56  6%   ` Son Luong Ngoc
@ 2022-06-07 11:14 21%     ` Son Luong Ngoc
  2022-06-07 14:39  0%       ` Ævar Arnfjörð Bjarmason
  2022-06-07 17:00  6%       ` Junio C Hamano
  0 siblings, 2 replies; 122+ results
From: Son Luong Ngoc @ 2022-06-07 11:14 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Ævar Arnfjörð Bjarmason

In rare circumstances where the current git index does not carry the
last_update_token, the fsmonitor v2 hook will be invoked with an
empty string which would cause the final rendered json to be invalid.

  ["query", "/path/to/my/git/repository/", {
          "since": ,
          "fields": ["name"],
          "expression": ["not", ["dirname", ".git"]]
  }]

Which will leave the user with the following error message

  > git status
  failed to parse command from stdin: line 2, column 13, position 67: unexpected token near ','
  Watchman: command returned no output.
  Falling back to scanning...

Hide the "since" field in json query when "last_update_token" is empty.

Co-authored-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
---
 templates/hooks--fsmonitor-watchman.sample | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/templates/hooks--fsmonitor-watchman.sample b/templates/hooks--fsmonitor-watchman.sample
index 14ed0aa42d..23e856f5de 100755
--- a/templates/hooks--fsmonitor-watchman.sample
+++ b/templates/hooks--fsmonitor-watchman.sample
@@ -86,12 +86,13 @@ sub watchman_query {
 	# recency index to select candidate nodes and "fields" to limit the
 	# output to file names only. Then we're using the "expression" term to
 	# further constrain the results.
+	my $last_update_line = "";
 	if (substr($last_update_token, 0, 1) eq "c") {
 		$last_update_token = "\"$last_update_token\"";
+		$last_update_line = qq[\n"since": $last_update_token,];
 	}
 	my $query = <<"	END";
-		["query", "$git_work_tree", {
-			"since": $last_update_token,
+		["query", "$git_work_tree", {$last_update_line
 			"fields": ["name"],
 			"expression": ["not", ["dirname", ".git"]]
 		}]
-- 
2.36.1.476.g0c4daa206d


^ permalink raw reply related	[relevance 21%]

* Re: [PATCH] fsmonitor: query watchman with right valid json
  2022-06-07  8:40  0% ` Ævar Arnfjörð Bjarmason
@ 2022-06-07 10:56  6%   ` Son Luong Ngoc
  2022-06-07 11:14 21%     ` [PATCH v2] " Son Luong Ngoc
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2022-06-07 10:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

Hi Ævar,

On Tue, Jun 7, 2022 at 10:42 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
>
> On Tue, Jun 07 2022, Son Luong Ngoc wrote:
>
> > In rare circumstances where the current git index does not carry the
> > last_update_token, the fsmonitor v2 hook will be invoked with an
> > empty string which would cause the final rendered json to be invalid.
> >
> >   ["query", "/path/to/my/git/repository/", {
> >           "since": ,
> >           "fields": ["name"],
> >           "expression": ["not", ["dirname", ".git"]]
> >   }]
> >
> > Which will leave the user with the following error message
> >
> >   > git status
> >   failed to parse command from stdin: line 2, column 13, position 67: unexpected token near ','
> >   Watchman: command returned no output.
> >   Falling back to scanning...
> >
> > Hide the "since" field in json query when "last_update_token" is empty.
> >
> > Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
> > ---
> >  templates/hooks--fsmonitor-watchman.sample | 21 +++++++++++++--------
> >  1 file changed, 13 insertions(+), 8 deletions(-)
> >
> > diff --git a/templates/hooks--fsmonitor-watchman.sample b/templates/hooks--fsmonitor-watchman.sample
> > index 14ed0aa42d..b4ee86dfc4 100755
> > --- a/templates/hooks--fsmonitor-watchman.sample
> > +++ b/templates/hooks--fsmonitor-watchman.sample
> > @@ -79,6 +79,12 @@ sub watchman_query {
> >       or die "open2() failed: $!\n" .
> >       "Falling back to scanning...\n";
> >
> > +     my $query = <<" END";
> > +             ["query", "$git_work_tree", {
> > +                     "fields": ["name"],
> > +                     "expression": ["not", ["dirname", ".git"]]
> > +             }]
> > +     END
>
> Wouldn't a more minimal & obvious patch here be....
>
> >       # In the query expression below we're asking for names of files that
> >       # changed since $last_update_token but not from the .git folder.
> >       #
> > @@ -87,15 +93,14 @@ sub watchman_query {
> >       # output to file names only. Then we're using the "expression" term to
> >       # further constrain the results.
> >       if (substr($last_update_token, 0, 1) eq "c") {
> > -             $last_update_token = "\"$last_update_token\"";
>
> To just change this to be:
>
>         # same as now:
>         $last_update_token = "\"$last_update_token\"";
>         $last_update_line = qq["since": $last_update_token,];
>
> Of course having declared the new $last_update_line variable earlier, then:
>

Yup, I think this is a sensible suggestion.
I will fixup and send a V2 shortly.

> > +             $query = <<"            END";
> > +                     ["query", "$git_work_tree", {
> > +                             "since": "$last_update_token",
> > +                             "fields": ["name"],
> > +                             "expression": ["not", ["dirname", ".git"]]
> > +                     }]
> > +             END
> >       }
> > -     my $query = <<" END";
> > -             ["query", "$git_work_tree", {
> > -                     "since": $last_update_token,
>
> Just change this line to:
>
>         $last_update_line
>
> I.e. you don't need to duplicate the whole query just to omit/include a
> single line in it, or am I missing something?
>
> (This suggestion *would* include a redundant line, but I'm assuming
> JSON/watchman deals with that just fine...).

I think we can remove that redundant line by adding '\n' before
$last_update_line.
I will include this in the next version.

Thanks,
Son Luong.

^ permalink raw reply	[relevance 6%]

* Re: [PATCH] fsmonitor: query watchman with right valid json
  2022-06-07  7:54 20% [PATCH] fsmonitor: query watchman with right valid json Son Luong Ngoc
@ 2022-06-07  8:40  0% ` Ævar Arnfjörð Bjarmason
  2022-06-07 10:56  6%   ` Son Luong Ngoc
  0 siblings, 1 reply; 122+ results
From: Ævar Arnfjörð Bjarmason @ 2022-06-07  8:40 UTC (permalink / raw)
  To: Son Luong Ngoc; +Cc: git


On Tue, Jun 07 2022, Son Luong Ngoc wrote:

> In rare circumstances where the current git index does not carry the
> last_update_token, the fsmonitor v2 hook will be invoked with an
> empty string which would cause the final rendered json to be invalid.
>
>   ["query", "/path/to/my/git/repository/", {
>           "since": ,
>           "fields": ["name"],
>           "expression": ["not", ["dirname", ".git"]]
>   }]
>
> Which will leave the user with the following error message
>
>   > git status
>   failed to parse command from stdin: line 2, column 13, position 67: unexpected token near ','
>   Watchman: command returned no output.
>   Falling back to scanning...
>
> Hide the "since" field in json query when "last_update_token" is empty.
>
> Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
> ---
>  templates/hooks--fsmonitor-watchman.sample | 21 +++++++++++++--------
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/templates/hooks--fsmonitor-watchman.sample b/templates/hooks--fsmonitor-watchman.sample
> index 14ed0aa42d..b4ee86dfc4 100755
> --- a/templates/hooks--fsmonitor-watchman.sample
> +++ b/templates/hooks--fsmonitor-watchman.sample
> @@ -79,6 +79,12 @@ sub watchman_query {
>  	or die "open2() failed: $!\n" .
>  	"Falling back to scanning...\n";
>  
> +	my $query = <<"	END";
> +		["query", "$git_work_tree", {
> +			"fields": ["name"],
> +			"expression": ["not", ["dirname", ".git"]]
> +		}]
> +	END

Wouldn't a more minimal & obvious patch here be....

>  	# In the query expression below we're asking for names of files that
>  	# changed since $last_update_token but not from the .git folder.
>  	#
> @@ -87,15 +93,14 @@ sub watchman_query {
>  	# output to file names only. Then we're using the "expression" term to
>  	# further constrain the results.
>  	if (substr($last_update_token, 0, 1) eq "c") {
> -		$last_update_token = "\"$last_update_token\"";

To just change this to be:

	# same as now:
	$last_update_token = "\"$last_update_token\"";
        $last_update_line = qq["since": $last_update_token,];

Of course having declared the new $last_update_line variable earlier, then:

> +		$query = <<"		END";
> +			["query", "$git_work_tree", {
> +				"since": "$last_update_token",
> +				"fields": ["name"],
> +				"expression": ["not", ["dirname", ".git"]]
> +			}]
> +		END
>  	}
> -	my $query = <<"	END";
> -		["query", "$git_work_tree", {
> -			"since": $last_update_token,

Just change this line to:

	$last_update_line

I.e. you don't need to duplicate the whole query just to omit/include a
single line in it, or am I missing something?

(This suggestion *would* include a redundant line, but I'm assuming
JSON/watchman deals with that just fine...).

^ permalink raw reply	[relevance 0%]

* [PATCH] fsmonitor: query watchman with right valid json
@ 2022-06-07  7:54 20% Son Luong Ngoc
  2022-06-07  8:40  0% ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2022-06-07  7:54 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc

In rare circumstances where the current git index does not carry the
last_update_token, the fsmonitor v2 hook will be invoked with an
empty string which would cause the final rendered json to be invalid.

  ["query", "/path/to/my/git/repository/", {
          "since": ,
          "fields": ["name"],
          "expression": ["not", ["dirname", ".git"]]
  }]

Which will leave the user with the following error message

  > git status
  failed to parse command from stdin: line 2, column 13, position 67: unexpected token near ','
  Watchman: command returned no output.
  Falling back to scanning...

Hide the "since" field in json query when "last_update_token" is empty.

Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
---
 templates/hooks--fsmonitor-watchman.sample | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/templates/hooks--fsmonitor-watchman.sample b/templates/hooks--fsmonitor-watchman.sample
index 14ed0aa42d..b4ee86dfc4 100755
--- a/templates/hooks--fsmonitor-watchman.sample
+++ b/templates/hooks--fsmonitor-watchman.sample
@@ -79,6 +79,12 @@ sub watchman_query {
 	or die "open2() failed: $!\n" .
 	"Falling back to scanning...\n";
 
+	my $query = <<"	END";
+		["query", "$git_work_tree", {
+			"fields": ["name"],
+			"expression": ["not", ["dirname", ".git"]]
+		}]
+	END
 	# In the query expression below we're asking for names of files that
 	# changed since $last_update_token but not from the .git folder.
 	#
@@ -87,15 +93,14 @@ sub watchman_query {
 	# output to file names only. Then we're using the "expression" term to
 	# further constrain the results.
 	if (substr($last_update_token, 0, 1) eq "c") {
-		$last_update_token = "\"$last_update_token\"";
+		$query = <<"		END";
+			["query", "$git_work_tree", {
+				"since": "$last_update_token",
+				"fields": ["name"],
+				"expression": ["not", ["dirname", ".git"]]
+			}]
+		END
 	}
-	my $query = <<"	END";
-		["query", "$git_work_tree", {
-			"since": $last_update_token,
-			"fields": ["name"],
-			"expression": ["not", ["dirname", ".git"]]
-		}]
-	END
 
 	# Uncomment for debugging the watchman query
 	# open (my $fh, ">", ".git/watchman-query.json");
-- 
2.36.1.231.g6924ef9a07


^ permalink raw reply related	[relevance 20%]

* Antw: [EXT] Re: Filtering commits after filtering the tree
  2021-12-31 23:48  5%   ` Elijah Newren
@ 2022-01-03  9:26  0%     ` Ulrich Windl
  0 siblings, 0 replies; 122+ results
From: Ulrich Windl @ 2022-01-03  9:26 UTC (permalink / raw)
  To: Elijah Newren, sluongng; +Cc: git

>>> Elijah Newren <newren@gmail.com> schrieb am 01.01.2022 um 00:48 in Nachricht
<CABPp-BHnpKZ8LJzd_NL_6TGe7U3A2xPDPuvBkDQ68iTH_un6=A@mail.gmail.com>:
> On Fri, Dec 31, 2021 at 2:27 AM Son Luong Ngoc <sluongng@gmail.com> wrote:
>>
>> Hi Ulrich,
>>
>> On Thu, Dec 30, 2021 at 12:28 PM Ulrich Windl
>> <Ulrich.Windl@rz.uni-regensburg.de> wrote:
>> >
>> > Hi guys!
>> >
>> >
>> > As I know there are really smart ones around, please don't laugh how I helped myself with this problem:
>> > https://stackoverflow.com/q/70505903/6607497 
>> > I'm sure you wouldn't have wasted hours with rebasing interactively...
>> >
>> >
>> > Feel free to comment either on the list or at SO (comment or improved answer).
>>
>> You probably want to try git-filter-repo (1)
>> while using `--message-callback` as documented in (2)
> 
> In particular, you'd get most of the way there with a simple
> 
>    git filter-repo --path my-module/
> 
> That will remove all files not under my-module/ from the repository,
> AND remove all commits that become empty due to removing all the other
> files.
> 
> 
> If you had commits which both touched my-module/ and also made
> reference to other files outside of my-module/, then you may also want
> to clean those up.  If that's something you can write code to do
> (perhaps because the commit messages were regular, or you are an
> expert at parsing and rewriting natural language programmatically),
> then the --message-callback suggested by Son could help you out.  That
> sounds difficult to me, because I don't know how to even identify such
> commits without having a human being read every single one.
> 
> But it sounded to me like most of the commit messages you didn't want
> were ones that just touched paths outside of your selected module, in
> which case the simple path filtering I suggested above would clear
> those all out for you.

Yes, as I had a changelog type of file I had many commits describing changes in changelog that refer to files that are no longer part of the repository (commits for the files themselves had vanished already).
(I know changelogs are a bad idea)

I'll try your proposal next time. Writing custom Python filters is too much for me at the moment...

Thanks and kind Regards (and a happy new year),
Ulrich





^ permalink raw reply	[relevance 0%]

* Re: Filtering commits after filtering the tree
  2021-12-30 13:19  6% ` Son Luong Ngoc
@ 2021-12-31 23:48  5%   ` Elijah Newren
  2022-01-03  9:26  0%     ` Antw: [EXT] " Ulrich Windl
  0 siblings, 1 reply; 122+ results
From: Elijah Newren @ 2021-12-31 23:48 UTC (permalink / raw)
  To: Son Luong Ngoc; +Cc: Ulrich Windl, git

On Fri, Dec 31, 2021 at 2:27 AM Son Luong Ngoc <sluongng@gmail.com> wrote:
>
> Hi Ulrich,
>
> On Thu, Dec 30, 2021 at 12:28 PM Ulrich Windl
> <Ulrich.Windl@rz.uni-regensburg.de> wrote:
> >
> > Hi guys!
> >
> >
> > As  I know there are really smart ones around, please don't laugh how I helped myself with this problem:
> > https://stackoverflow.com/q/70505903/6607497
> > I'm sure you wouldn't have wasted hours with rebasing interactively...
> >
> >
> > Feel free to comment either on the list or at SO (comment or improved answer).
>
> You probably want to try git-filter-repo (1)
> while using `--message-callback` as documented in (2)

In particular, you'd get most of the way there with a simple

   git filter-repo --path my-module/

That will remove all files not under my-module/ from the repository,
AND remove all commits that become empty due to removing all the other
files.


If you had commits which both touched my-module/ and also made
reference to other files outside of my-module/, then you may also want
to clean those up.  If that's something you can write code to do
(perhaps because the commit messages were regular, or you are an
expert at parsing and rewriting natural language programmatically),
then the --message-callback suggested by Son could help you out.  That
sounds difficult to me, because I don't know how to even identify such
commits without having a human being read every single one.

But it sounded to me like most of the commit messages you didn't want
were ones that just touched paths outside of your selected module, in
which case the simple path filtering I suggested above would clear
those all out for you.
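
As a rough sketch of what such a message callback might look like, and
only for the easy case where the messages are regular: the function
below drops changelog-style lines that name a path outside my-module/.
The function name, the KEEP_PREFIX constant and the path regex are all
assumptions made for this example; real commit messages will need
something smarter.

```python
import re

KEEP_PREFIX = b"my-module/"   # the directory kept by --path above


def rewrite_message(message: bytes) -> bytes:
    """Drop lines that mention only paths outside KEEP_PREFIX."""
    kept = []
    for line in message.split(b"\n"):
        # Very rough idea of "a path": word characters with a slash in them.
        paths = re.findall(rb"\b[\w.-]+/[\w./-]+", line)
        if paths and not any(p.startswith(KEEP_PREFIX) for p in paths):
            continue   # line only talks about files that are gone
        kept.append(line)
    return b"\n".join(kept)


sample = (
    b"Update changelog\n\n"
    b"* my-module/main.c: add feature\n"
    b"* other/file.c: fix typo\n"
)
print(rewrite_message(sample).decode())
```

As far as I recall from the filter-repo documentation, the body given
to --message-callback receives the commit message as bytes in a
variable called "message" and must return bytes, so the loop above
would have to be adapted into such a body; check the CALLBACKS section
before relying on that.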

^ permalink raw reply	[relevance 5%]

* Re: Filtering commits after filtering the tree
  @ 2021-12-30 13:19  6% ` Son Luong Ngoc
  2021-12-31 23:48  5%   ` Elijah Newren
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2021-12-30 13:19 UTC (permalink / raw)
  To: Ulrich Windl; +Cc: git

Hi Ulrich,

On Thu, Dec 30, 2021 at 12:28 PM Ulrich Windl
<Ulrich.Windl@rz.uni-regensburg.de> wrote:
>
> Hi guys!
>
>
> As  I know there are really smart ones around, please don't laugh how I helped myself with this problem:
> https://stackoverflow.com/q/70505903/6607497
> I'm sure you wouldn't have wasted hours with rebasing interactively...
>
>
> Feel free to comment either on the list or at SO (comment or improved answer).

You probably want to try git-filter-repo (1)
while using `--message-callback` as documented in (2)

>
>
> Regards,
> Ulrich
>
>

Hope it helps,
Son Luong.

(1) https://github.com/newren/git-filter-repo
(2) https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#CALLBACKS

^ permalink raw reply	[relevance 6%]

* Re: [Summit topic] Increasing diversity & inclusion (transition to `main`, etc)
  @ 2021-10-21 12:55  6%   ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2021-10-21 12:55 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Hi,

On Thu, Oct 21, 2021 at 1:57 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> This session was led by Johannes "Dscho" Schindelin. Supporting cast:
> brian "Bmc" carlson, Jeff "Peff" King, Taylor Blau, CB Bailey, Ævar
> Arnfjörð "Avarab" Bjarmason, Jonathan "Jrnieder" Nieder, Derrick Stolee,
> Lessley Dennington, Glen Choo, Philip Oakley, Victoria Dye, and Jonathan
> "Jonathantanmy" Tan.
>
> Notes:
>

...

>
>  5.  Git the community is still overwhelmingly male, white - what can we do?
>

...

>
>  19. Stolee: When we moved from MS to GH, we received quick feedback that we
>      weren’t communicating well - too direct and unemotional. Maybe Git
>      community communicates that way, but that’s not how most people interact;
>      that makes me think that our “efficient and effective” communication is
>      actually too aggressive, and easily interpreted as attacks on
>      contributors. Basically… let’s all lighten up? :)
>
>  20. Taylor: Yep, my “talking to GitHubbers at GitHub” voice is different from
>      my “talking to Gitters on Git list” voice. New contributors, are we on the
>      right track here?
>

...

>
>  34. Avarab: I think it’s a good thing to work on; we need to be really careful
>      about what guidelines we pick and choose. Need to ensure an easy path for
>      new contributors so they don’t need to read hours of documentation for a
>      typo fix. Plus we need to ensure that this doc is accessible for folks who
>      have different first language than English.
>
>  35. Bmc: on git-lfs we have a contributor with very little English, so when we
>      did the review I’d offer an alternative text, and we would work together.
>      That process was useful to come up with readable documentation in a
>      helpful way. That is, proposing a solution instead of pointing out the
>      problem and saying “fix it” can help a lot in scenarios like this.
>
>  36. Dscho: Yep, this is important and will help us be more accessible to
>      contributors whose English is not super top notch Cambridge exam :)

Yes, thanks for mentioning the non-English speaking community.

I have been an avid reader of the Git mailing list over the past years and can't
help but notice that contributions from folks working at Alibaba (China) have been
taking a lot more iterations to get to final review than usual.

I would recommend, on top of having a guideline document, having a
Vale check (1) set up as a commit-msg hook and running it as part of
GitGitGadget CI to help folks shorten the feedback loop in some basic cases.

Cheers,
Son Luong.

(1): https://docs.errata.ai/vale/styles

^ permalink raw reply	[relevance 6%]

* Re: [Summit topic] Crazy (and not so crazy) ideas
  @ 2021-10-21 12:30  6%   ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2021-10-21 12:30 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Hi,

On Thu, Oct 21, 2021 at 1:56 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> This session was led by Elijah Newren. Supporting cast: Johannes "Dscho"
> Schindelin, Jonathan Tan, Jonathan "jrnieder" Nieder, brian m. carlson,
> Jeff "Peff" King, Ævar Arnfjörð Bjarmason, Emily Shaffer, CB Bailey,
> Taylor Blau, and Philip Oakley.
>
> Notes:
>

...

>
> * Biggest idea: there are a lot of people who version control things via
>   tarballs or .zip files per version. This prevents history from
>   compressing well. Some people check in those compressed files into Git
>   for purposes of history.
>

...

>
>    * Old suggestion of a “blob-tree” type that allows storing a single
>      index entry that corresponds to multiple trees and blobs in the
>      background, possibly.
>
>    * One long-term dream (inspired by Avery Pennarun’s “bup” tool) is to
>      store large binary files in a tree-structured way that can store
>      common regions as deltas, improve random access, parallelized
>      hashing. Involves a consistent way to split the file into stable
>      pieces, like --rsyncable uses (based on a rolling hash being zero).
>
>    * Peff: you can do that at the object model layer or at the storage
>      layer. The latter is less invasive.
>
>    * jrnieder: The benefits of blobtree are greater at the object model
>      layer --- e.g. not having to transmit chunks over the wire that you
>      already have. I think the main obstacle has been that the benefits
>      haven’t been enough to be worth the complexity. If that changes, we
>      can imagine bundling it with some other object format changes, e.g.
>      putting blob sizes in tree objects, and rolling it out as a new
>      object-format.
>

I think this was implemented as 'Blob Ref' in Yandex's vcs named Arc.
I was suggesting this to Gitlab folks earlier (1) as a possible solution to
large file storage.
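
For readers unfamiliar with the "rolling hash being zero" idea from the
notes above, here is a minimal sketch of content-defined chunking.  It
uses a plain rolling byte sum, and the WINDOW and DIVISOR values are
arbitrary choices for the example; it does not claim to match what bup,
Arc or any Git proposal actually does.

```python
import random

WINDOW = 16    # bytes covered by the rolling sum (illustrative value)
DIVISOR = 64   # cut where sum % DIVISOR == 0, so chunks average ~64 bytes


def split_chunks(data: bytes, window: int = WINDOW, divisor: int = DIVISOR) -> list:
    """Cut after each byte where the sum of the last `window` bytes is 0 mod `divisor`."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]   # slide the window forward
        # Only consider a boundary once the window is full.
        if i >= window - 1 and rolling % divisor == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])       # trailing piece without a boundary
    return chunks


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4096))
edited = b"X" + original                  # one byte inserted at the front

a = split_chunks(original)
b = split_chunks(edited)
print(len(a), len(b), a[-3:] == b[-3:])
```

Because each boundary depends only on the bytes inside the window, an
edit near the start of a file shifts the boundaries but does not change
the later chunks, which is what lets unchanged regions be stored or
transferred only once.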

Very glad to hear that it was brought up during the summit.

Cheers,
Son Luong.

(1): https://gitlab.com/gitlab-org/git/-/issues/93

^ permalink raw reply	[relevance 6%]

* t5607 fail with GIT_TEST_FAIL_PREREQS enabled
@ 2021-08-11 13:02  5% Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2021-08-11 13:02 UTC (permalink / raw)
  To: git; +Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano

Hi folks,

Our internal CI spotted a failing test when building from the 'next' and
'master' branches:

    git/t% GIT_TEST_FAIL_PREREQS=1 ./t5607-clone-bundle.sh
    ok 1 - setup
    ok 2 - "verify" needs a worktree
    ok 3 - annotated tags can be excluded by rev-list options
    ok 4 - die if bundle file cannot be created
    ok 5 - bundle --stdin
    ok 6 - bundle --stdin <rev-list options>
    ok 7 - empty bundle file is rejected
    not ok 8 - ridiculously long subject in boundary
    #
    #               >file4 &&
    #               test_tick &&
    #               git add file4 &&
    #               printf "%01200d\n" 0 | git commit -F - &&
    #               test_commit fifth &&
    #               git bundle create long-subject-bundle.bdl HEAD^..HEAD &&
    #               cat >expect <<-EOF &&
    #               $(git rev-parse main) HEAD
    #               EOF
    #               git bundle list-heads long-subject-bundle.bdl >actual &&
    #               test_cmp expect actual &&
    #
    #               git fetch long-subject-bundle.bdl &&
    #
    #               if ! test_have_prereq SHA1
    #               then
    #                       echo "@object-format=sha256"
    #               fi >expect &&
    #               cat >>expect <<-EOF &&
    #               -$(git log --pretty=format:"%H %s" -1 HEAD^)
    #               $(git rev-parse HEAD) HEAD
    #               EOF
    #
    #               if test_have_prereq SHA1
    #               then
    #                       head -n 3 long-subject-bundle.bdl
    #               else
    #                       head -n 4 long-subject-bundle.bdl
    #               fi | grep -v "^#" >actual &&
    #
    #               test_cmp expect actual
    #
    ok 9 - prerequisites with an empty commit message
    ok 10 - failed bundle creation does not leave cruft
    ok 11 - fetch SHA-1 from bundle
    ok 12 - git bundle uses expected default format
    ok 13 - git bundle v3 has expected contents
    ok 14 - git bundle v3 rejects unknown capabilities
    # failed 1 among 14 test(s)
    1..14
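
One possible explanation, offered only as a sketch: test 8 branches on
"! test_have_prereq SHA1".  If GIT_TEST_FAIL_PREREQS makes every
prerequisite query report "missing", the negated check turns true even in
a SHA-1 repository, and the expected output gains a line the bundle does
not contain.  The function and variable names below (have_prereq_stub,
FAIL_PREREQS_STUB, expect_header_lines) are hypothetical stand-ins, not
the real test-lib code.

```shell
#!/bin/sh
# Hypothetical stand-in for test_have_prereq; NOT the real test-lib code.
# Assumption: in "fail prereqs" mode every prerequisite query fails.
have_prereq_stub () {
	if test -n "$FAIL_PREREQS_STUB"
	then
		return 1	# forced failure, regardless of the real state
	fi
	return 0	# pretend the SHA1 prerequisite is really satisfied
}

# Mirrors the shape of the failing test: a negated prerequisite check
# decides whether an extra header line is expected.
expect_header_lines () {
	if ! have_prereq_stub SHA1
	then
		echo "@object-format=sha256"
	fi
}

FAIL_PREREQS_STUB=
normal=$(expect_header_lines)

FAIL_PREREQS_STUB=1
forced=$(expect_header_lines)

echo "normal run expects: '$normal'"
echo "forced run expects: '$forced'"
```

In the forced run the extra header is expected although nothing about the
repository changed, which would make the final comparison fail.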

Cheers,
Son Luong.

^ permalink raw reply	[relevance 5%]

* Re: [PATCH v2] packfile: freshen the mtime of packfile by configuration
  @ 2021-07-15  8:23  5%         ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2021-07-15  8:23 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Martin Fick, Taylor Blau, Sun Chao, Taylor Blau,
	Sun Chao via GitGitGadget, git

Hi folks,

On Wed, Jul 14, 2021 at 10:03 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> *nod*
>
> FWIW at an ex-job I helped systems administrators who'd produced such a
> broken backup-via-rsync create a hybrid version as an interim
> solution. I.e. it would sync the objects via git transport, and do an
> rsync on a whitelist (or blacklist), so pickup config, but exclude
> objects.
>
> "Hybrid" because it was in a state of needing to deal with manual
> tweaking of config.
>
> But usually someone who's needing to thoroughly solve this backup
> problem will inevitably end up with wanting to drive everything that's
> not in the object or refstore from some external system, i.e. have
> config be generated from puppet, a database etc., ditto for alternates
> etc.
>
> But even if you can't get to that point (or don't want to) I'd say aim
> for the hybrid system.

FWIW, we are running our repo on top of a somewhat flickery DRBD setup and
we decided to use both

  `git clone --upload-pack 'git -c transfer.hiderefs="!refs" upload-pack' --mirror`

and

  `tar`

to create 2 separate snapshots for backup in parallel (full backup,
not incremental).

In case of recovery (manual), we first rely on the git snapshot, and if
there are any missing objects/refs, we try to get them from the tarball.
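
A minimal sketch of the two-snapshot idea, under stated assumptions: the
source repository is a throwaway one created in a temporary directory,
the paths are made up for illustration, the two snapshots are taken one
after the other rather than in parallel, and the --upload-pack override
shown above is left out to keep the example small.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)

# Throwaway source repository standing in for the real server-side repo.
git init -q "$work/src"
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
	commit -q --allow-empty -m "initial"

# Snapshot 1: git-native full copy of all refs and reachable objects.
git clone -q --mirror "$work/src" "$work/mirror.git"

# Snapshot 2: plain tarball of the repository directory.
tar -C "$work" -cf "$work/src.tar" src

# Sanity check on the git-native snapshot before relying on it.
git -C "$work/mirror.git" fsck >/dev/null 2>&1
commits=$(git -C "$work/mirror.git" rev-list --all | wc -l)
echo "mirror holds $commits commit(s); tarball at $work/src.tar"
```

The fsck step is what makes the git-native snapshot the first choice
during recovery: it can be verified, while the tarball is only a fallback.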

>
> This isn't some purely theoretical concern b.t.w., the system using
> rsync like this was producing repos that wouldn't fsck all the time, and
> it wasn't such a busy site.
>
> I suspect (but haven't tried) that for someone who can't easily change
> their backup solution they'd get most of the benefits of git-native
> transport by having their "rsync" sync refs, then objects, not the other
> way around. Glob order dictates that most backup systems will do
> objects, then refs (which will of course, at that point, refer to
> nonexisting objects).
>
> It's still not safe, you'll still be subject to races, but probably a
> lot better in practice.

I would love to see some guidance in the official documentation on best
practices for handling git data on the server side.

Is git-clone + git-bundle the go-to solution?
Should tar/rsync be avoided completely, or is there a trade-off?

Thanks,
Son Luong.

^ permalink raw reply	[relevance 5%]

* Re: [PATCH] pull: abort if --ff-only is given and fast-forwarding is impossible
  2021-07-14 15:22  5%   ` Elijah Newren
@ 2021-07-14 17:31  0%     ` Felipe Contreras
  0 siblings, 0 replies; 122+ results
From: Felipe Contreras @ 2021-07-14 17:31 UTC (permalink / raw)
  To: Elijah Newren, Son Luong Ngoc
  Cc: Alex Henrie, git, Phillip Wood, Ævar Arnfjörð,
	Junio C Hamano, Felipe Contreras

Elijah Newren wrote:
> On Wed, Jul 14, 2021 at 1:37 AM Son Luong Ngoc <sluongng@gmail.com> wrote:
> > I am out of the loop in this thread but I have been seeing strange behaviors
> > with pull.rebase=true in the 'next' branch and also in the 'master'
> > branch in recent days.
> 
> I'm not surprised it happens with recent versions, but I'd expect this
> to have happened with older versions too.  Is this not reproducible
> with git-2.32.0 or older git versions?

I already provided an accurate target [1].

> >   > git version
> >   git version 2.32.0.432.gabb21c7263
> >   > git config -l | grep pull
> >   pull.rebase=true
> >   pull.ff=false
> 
> So, you have conflicting configuration options set.  pull.ff=false
> maps to --no-ff which is documented to create a merge.
> pull.rebase=true maps to --rebase which says to run a rebase.
> 
> You probably want to drop one of these.

`pull.ff` will be honored by `git pull --merge`.

[1] https://lore.kernel.org/git/60eeff69293fb_10e52087a@natae.notmuch/

-- 
Felipe Contreras

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] pull: abort if --ff-only is given and fast-forwarding is impossible
  2021-07-14  8:37  5% ` Son Luong Ngoc
@ 2021-07-14 15:22  5%   ` Elijah Newren
  2021-07-14 17:31  0%     ` Felipe Contreras
  0 siblings, 1 reply; 122+ results
From: Elijah Newren @ 2021-07-14 15:22 UTC (permalink / raw)
  To: Son Luong Ngoc
  Cc: Alex Henrie, git, Phillip Wood, Ævar Arnfjörð,
	Junio C Hamano, Felipe Contreras

On Wed, Jul 14, 2021 at 1:37 AM Son Luong Ngoc <sluongng@gmail.com> wrote:
>
> Hi folks,
>
> I am out of the loop in this thread but I have been seeing strange behaviors
> with pull.rebase=true in the 'next' branch and also in the 'master'
> branch in recent days.

I'm not surprised it happens with recent versions, but I'd expect this
to have happened with older versions too.  Is this not reproducible
with git-2.32.0 or older git versions?

>   > git version
>   git version 2.32.0.432.gabb21c7263
>   > git config -l | grep pull
>   pull.rebase=true
>   pull.ff=false

So, you have conflicting configuration options set.  pull.ff=false
maps to --no-ff which is documented to create a merge.
pull.rebase=true maps to --rebase which says to run a rebase.

You probably want to drop one of these.
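
As a rough sketch of how one might spot and resolve this combination in a
repository's local config (the temporary repository and the choice to
keep pull.rebase are assumptions made for the example, not a
recommendation):

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
git init -q "$repo"

# Reproduce the reported configuration in a scratch repository.
git -C "$repo" config pull.rebase true
git -C "$repo" config pull.ff false

rebase=$(git -C "$repo" config --local --get pull.rebase || true)
ff=$(git -C "$repo" config --local --get pull.ff || true)

conflict=no
if test "$rebase" = true && test "$ff" = false
then
	conflict=yes
	# Keep the rebase behaviour and drop the merge-only setting.
	git -C "$repo" config --local --unset pull.ff
fi

remaining=$(git -C "$repo" config --local --get pull.ff || true)
echo "conflict=$conflict pull.ff='$remaining'"
```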

> But a git pull would still run fast-forward.
> Some of our users (including myself) rely on disabling fast-forward to omit the
> per-file change log summary after each git-pull
>
>   Updating 245f278cb729..5e8d960db7b3
>   Fast-forward
>    some/file/dir.ext         | 44 ++++++++++++++++++++++++++++++++++++++++++++
>    another/file/dir.ext     |  6 +++---
>   2 files changed, 47 insertions(+), 3 deletions(-)
>
> In a big, fast moving monorepo, this summary is a lot of noise and
> switching to pull.rebase=true used to be the way to turn it off. If the
> change is intended for next version release, is there a workaround for
> this?

Thanks for the report.  This particular commit has not yet been picked
up, not even in seen.  But it's a good example of how conflicting
configuration really ought to result in an error rather than randomly
picking one to trump, and suggests why we should complete the patch.

However, since I'm commenting on this and the stat information appears
to be important to you, note that there are also merge.stat and
rebase.stat configuration variables for controlling whether those are
shown at the end of merge and rebase operations.
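
For illustration only, turning both off in a scratch repository could
look like this (the temporary repository is an assumption; in practice
the settings would go into the real repository or the global config):

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
git init -q "$repo"

# Suppress the diffstat at the end of merges and rebases.
git -C "$repo" config merge.stat false
git -C "$repo" config rebase.stat false

merge_stat=$(git -C "$repo" config --local --get merge.stat)
rebase_stat=$(git -C "$repo" config --local --get rebase.stat)
echo "merge.stat=$merge_stat rebase.stat=$rebase_stat"
```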

Hope that helps,
Elijah

^ permalink raw reply	[relevance 5%]

* Re: [PATCH] pull: abort if --ff-only is given and fast-forwarding is impossible
  @ 2021-07-14  8:37  5% ` Son Luong Ngoc
  2021-07-14 15:22  5%   ` Elijah Newren
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2021-07-14  8:37 UTC (permalink / raw)
  To: Alex Henrie
  Cc: git, phillip.wood123, avarab, Junio C Hamano, felipe.contreras,
	Elijah Newren

Hi folks,

I am out of the loop in this thread but I have been seeing strange behaviors
with pull.rebase=true in the 'next' branch and also in the 'master'
branch in recent days.

  > git version
  git version 2.32.0.432.gabb21c7263
  > git config -l | grep pull
  pull.rebase=true
  pull.ff=false

But a git pull would still run fast-forward.
Some of our users (including myself) rely on disabling fast-forward to omit the
per-file change log summary after each git-pull:

  Updating 245f278cb729..5e8d960db7b3
  Fast-forward
   some/file/dir.ext         | 44 ++++++++++++++++++++++++++++++++++++++++++++
   another/file/dir.ext     |  6 +++---
  2 files changed, 47 insertions(+), 3 deletions(-)

In a big, fast moving monorepo, this summary is a lot of noise and
switching to pull.rebase=true used to be the way to turn it off. If the
change is intended for next version release, is there a workaround for
this?

Cheers,
Son Luong

^ permalink raw reply	[relevance 5%]

* Re: Pain points in Git's patch flow
  2021-04-21 10:19  0%     ` Ævar Arnfjörð Bjarmason
@ 2021-04-28  7:21  0%       ` Eric Wong
  0 siblings, 0 replies; 122+ results
From: Eric Wong @ 2021-04-28  7:21 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Son Luong Ngoc, Jonathan Nieder, git, Raxel Gutierrez, mricon,
	patchwork, Junio C Hamano, Taylor Blau, Emily Shaffer

Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> On Mon, Apr 19 2021, Eric Wong wrote:
> > Son Luong Ngoc <sluongng@gmail.com> wrote:
> >> [...]
> >> 3. Issue with archive:
> >> 
> >> - I don't find the ML archive trivial for newcomers.  It took me a bit
> >>   of time to realize: 'Oh if I scroll to bottom and find the "Thread 
> >>   overview" then I can navigate a mailing thread a lot easier'.
> >
> > (I'm the maintainer of public-inbox, the archival software you
> > seem to be referring to).
> >
> > I'm not sure how to make "Thread overview" easier to find
> > without cluttering the display near the top.  Maybe I'll try
> > aria labels in the Subject: link...
> 
> I'd say the bare-bones style of it is probably jarring to most users
> today. I had to check if the site even had any CSS at all.
> 
> I.e. I think a more intuitive UI to users today would probably be some
> collapsible side-bar on the left of the screen, which would have a
> threaded view. The "Archives are clonable" would probably belong in some
> "help" tab in such a UI.

The plan is to support read-only JMAP, so it's a stable API that
users can build their own displays on top of (of course, NNTP
and IMAP support already exists).

I can't make drastic UI changes such as a sidebar without
breaking things for users who like the current UI.  I only know
about GNOME3 and Digg because they made drastic UI changes that
angered their existing userbase.

The current UI is designed for a terminal with w3m|lynx since
it's the lowest common denominator.  Graphics drivers/stacks
seem to be the most frequently broken thing on GNU/Linux systems, so
it's important users can find patches/configs/help easily with a
text-only browser in order to get graphics working.

^ permalink raw reply	[relevance 0%]

* Re: Pain points in Git's patch flow
  2021-04-19  2:57  4%   ` Eric Wong
  2021-04-21 10:19  0%     ` Ævar Arnfjörð Bjarmason
@ 2021-04-28  7:05  0%     ` Eric Wong
  1 sibling, 0 replies; 122+ results
From: Eric Wong @ 2021-04-28  7:05 UTC (permalink / raw)
  To: Son Luong Ngoc
  Cc: Jonathan Nieder, git, Raxel Gutierrez, mricon, patchwork,
	Junio C Hamano, Taylor Blau, Emily Shaffer

Eric Wong <e@80x24.org> wrote:
> Son Luong Ngoc <sluongng@gmail.com> wrote:
> > 3. Issue with archive:
> > 
> > - I don't find the ML archive trivial for newcomers.  It took me a bit
> >   of time to realize: 'Oh if I scroll to bottom and find the "Thread 
> >   overview" then I can navigate a mailing thread a lot easier'.
> 
> (I'm the maintainer of public-inbox, the archival software you
> seem to be referring to).
> 
> I'm not sure how to make "Thread overview" easier to find
> without cluttering the display near the top.  Maybe I'll try
> aria labels in the Subject: link...

I think I made [thread overview] easier-to-find without adding
more clutter:

	https://public-inbox.org/meta/20210428065522.12795-1-e@80x24.org/

Not sure about title attribute or aria labels (my version of w3m
doesn't support that, yet).  Anyways, an example of it
deployed:

	https://public-inbox.org/git/20210419025754.GA26065@dcvr/

^ permalink raw reply	[relevance 0%]

* Re: Pain points in Git's patch flow
  2021-04-19  2:57  4%   ` Eric Wong
@ 2021-04-21 10:19  0%     ` Ævar Arnfjörð Bjarmason
  2021-04-28  7:21  0%       ` Eric Wong
  2021-04-28  7:05  0%     ` Eric Wong
  1 sibling, 1 reply; 122+ results
From: Ævar Arnfjörð Bjarmason @ 2021-04-21 10:19 UTC (permalink / raw)
  To: Eric Wong
  Cc: Son Luong Ngoc, Jonathan Nieder, git, Raxel Gutierrez, mricon,
	patchwork, Junio C Hamano, Taylor Blau, Emily Shaffer


On Mon, Apr 19 2021, Eric Wong wrote:

> Son Luong Ngoc <sluongng@gmail.com> wrote:
>> [...]
>> 3. Issue with archive:
>> 
>> - I don't find the ML archive trivial for newcomers.  It took me a bit
>>   of time to realize: 'Oh if I scroll to bottom and find the "Thread 
>>   overview" then I can navigate a mailing thread a lot easier'.
>
> (I'm the maintainer of public-inbox, the archival software you
> seem to be referring to).
>
> I'm not sure how to make "Thread overview" easier to find
> without cluttering the display near the top.  Maybe I'll try
> aria labels in the Subject: link...

I'd say the bare-bones style of it is probably jarring to most users
today. I had to check if the site even had any CSS at all.

I.e. I think a more intuitive UI to users today would probably be some
collapsible side-bar on the left of the screen, which would have a
threaded view. The "Archives are clonable" would probably belong in some
"help" tab in such a UI.

^ permalink raw reply	[relevance 0%]

* Re: Pain points in PRs [was: Re: RFC: Moving git-gui development to GitHub]
  2021-04-20  7:49  5%               ` Son Luong Ngoc
@ 2021-04-20 20:17  5%                 ` Junio C Hamano
  0 siblings, 0 replies; 122+ results
From: Junio C Hamano @ 2021-04-20 20:17 UTC (permalink / raw)
  To: Son Luong Ngoc; +Cc: SZEDER Gábor, Elijah Newren, Git Mailing List

Son Luong Ngoc <sluongng@gmail.com> writes:

> On Mon, Apr 19, 2021 at 02:52:16PM -0700, Junio C Hamano wrote:
>> 
>> Interesting.
>> 
>> I recently had a similar experience with Gerrit, where a patch I
>> have seen quite a few times on Gerrit at $WORK had embarrassing
>> syntactic issues I did not discover until it hit the public mailing
>> list.  It may be different from reviewer to reviewer, but at least
>> to me, e-mailed workflow forces me to apply the patch to my tree
>> before I can say anything non-trivially intelligent about it and
>> once applied to the tree, it actually lets me play with the code
>> (like, say, asking the compiler to give its opinion on it).
>> 
>
> I think this is very much the point of having a good CI pipeline:
>   - Apply patches into tree
>   - Compile
>   - Run relevant tests

It is true that CI can spot -Wdecl-after-stmt, but CI only covers
just one part of what is needed while I do my reviews.  It would
also be doable with a web interface to look at all the places that
functions modified by the patch are referred to, and to check if the
change makes sense in the context of the entire tree.  It would also
be doable with a web interface to look at the evolution of the code
being changed.  There are some things, like building and using for
everyday life, running the built binary under debuggers, etc., that
may be harder to do with a web interface, but I am sure many things
would become doable given enough time and effort.  However.

> Yes, having context beyond the diff is very important for Code Review.
> This is why I strongly recommend SourceGraph usages to folks I know.
> ...
> So I guess modern tooling is available for these use cases, but it is
> fragmented and subject to personal workflow.

My point in the message you are responding to was that I can do all
that is necessary locally, with my favorite toolset, once I apply a
patch to my tree.  The only thing that Gerrit allowed me to skip in
my recent adventure was to download the patch and apply it to a newly
created topic branch locally in my tree, before I can start doing
some of the things (e.g. "look at the patch, examine with larger
context as needed", "grep for the symbols at the same revision in
paths that are not touched by the patch") that were needed to review.
And while I know I shouldn't blame the tool for this, it did
mislead me into a false sense of "I've reviewed this change well enough",
when I haven't.

By the way, I've been playing with "b4 am" and it's been a pleasant
experience so far.

Thanks.

^ permalink raw reply	[relevance 5%]

* Re: Pain points in PRs [was: Re: RFC: Moving git-gui development to GitHub]
  @ 2021-04-20  7:49  5%               ` Son Luong Ngoc
  2021-04-20 20:17  5%                 ` Junio C Hamano
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2021-04-20  7:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: SZEDER Gábor, Elijah Newren, Git Mailing List

Hi Junio,

On Mon, Apr 19, 2021 at 02:52:16PM -0700, Junio C Hamano wrote:
> 
> Interesting.
> 
> I recently had a similar experience with Gerrit, where a patch I
> have seen quite a few times on Gerrit at $WORK had embarrassing
> syntactic issues I did not discover until it hit the public mailing
> list.  It may be different from reviewer to reviewer, but at least
> to me, e-mailed workflow forces me to apply the patch to my tree
> before I can say anything non-trivially intelligent about it and
> once applied to the tree, it actually lets me play with the code
> (like, say, asking the compiler to give its opinion on it).
> 

I think this is very much the point of having a good CI pipeline:
  - Apply patches into tree
  - Compile
  - Run relevant tests
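
The three steps can be sketched on a toy repository as follows.  Everything
here is an assumption made for illustration: the hello.sh script, the
syntax check standing in for a real build, and the single assertion
standing in for a real test suite.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git config user.name demo
git config user.email demo@example.com

# Baseline tree that the patch will be applied to.
printf '#!/bin/sh\necho hello\n' >hello.sh
git add hello.sh
git commit -q -m "add hello.sh"
base=$(git rev-parse HEAD)

# A contributor's change, exported as a patch and then removed from the tree.
printf '#!/bin/sh\necho "hello, world"\n' >hello.sh
git commit -q -am "hello: greet the world"
git format-patch -1 -o "$work/patches" >/dev/null
git reset -q --hard "$base"

# Step 1: apply patches into tree.
git am "$work"/patches/*.patch >/dev/null

# Step 2: "compile" (a shell syntax check stands in for the build).
sh -n hello.sh

# Step 3: run relevant tests.
result=$(sh hello.sh)
test "$result" = "hello, world"
applied=$(git rev-list --count HEAD)
echo "patch applied and verified ($applied commits in tree)"
```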

I'm not sure about GitHub PRs, but GitLab's MR workflow also provides a
merge queue implementation (Merge Train) coupled with CI to ensure the
merge result is accurately verified against tests.

What might be missing from (most) CI services is a bisect pipeline that
helps us identify the culprit commit that broke the tests, but that could be
engineered.

> The experience I had with Gerrit at $WORK gave me side-to-side diff
> with context with arbitrary on-demand width, even with per-word
> differences highlighted, and it may be wonderful that I can get all
> of these _without_ having to apply the patch myself, but what it
> gave me stopped there.  There are a lot more things that need to
> happen beyond looking at what changed in the context of the files
> during a review, from grepping in the tree for functions and
> variables used in the patch to see their uses in other parts of the
> system that the patch does not touch, to make various trial merges
> to different topics that are in flight, and Gerrit didn't help me an
> iota, but still gave me a (false) impression that I _did_ review the
> patch fully, when I only have scraped its surface, and the worst
> part of the story was that the UI felt so nice that I didn't even
> realize that I was doing a lot more shoddy job in reviewing than
> what I usually do to e-mailed patches.
> 

Yes, having context beyond the diff is very important for Code Review.
This is why I strongly recommend Sourcegraph to folks I know.

  > https://sourcegraph.com/github.com/git/git/-/blob/builtin/repack.c#L61:13
  > https://sourcegraph.com/github.com/git/git/-/commit/9218c6a40c37023a1f434222d501218cf8157857#diff-01ec5e99d04fb7ba9753f219ab638469R64

(I have no affiliation with SourceGraph, just really enjoy their product)

A modern code search service like Sourcegraph could help decorate diffs
with relevant code intelligence, such as finding references and definitions,
and assist with the Code Review process.

Afaik, Sourcegraph has been building more integrations with GitHub and
GitLab; not too sure about Gerrit (but I'm sure it's not out of reach given
their GraphQL API).

So I guess modern tooling is available for these use cases, but it is
fragmented and subject to personal workflow.

Regards,
Son Luong.

^ permalink raw reply	[relevance 5%]

* Re: Pain points in Git's patch flow
  2021-04-15 15:45  3% ` Son Luong Ngoc
@ 2021-04-19  2:57  4%   ` Eric Wong
  2021-04-21 10:19  0%     ` Ævar Arnfjörð Bjarmason
  2021-04-28  7:05  0%     ` Eric Wong
  0 siblings, 2 replies; 122+ results
From: Eric Wong @ 2021-04-19  2:57 UTC (permalink / raw)
  To: Son Luong Ngoc
  Cc: Jonathan Nieder, git, Raxel Gutierrez, mricon, patchwork,
	Junio C Hamano, Taylor Blau, Emily Shaffer

Son Luong Ngoc <sluongng@gmail.com> wrote:
> Hi there,
> 
> I'm not a regular contributor but I have started to subscribe to the
> Git's Mailing List recently.  So I thought it might be worth sharing my
> personal view on this.
> 
> After writing all of the below, I realize that I have written quite a
> rant, some of which some might consider off topic.  For that, I want to
> apologize beforehand.

Thanks for the feedback, some points below.

>  Tue, Apr 13, 2021 at 11:13:26PM -0700, Jonathan Nieder wrote:
> > Hi,
> > 
> ...
> > 
> > Those four are important in my everyday life.  Questions:
> > 
> >  1. What pain points in the patch flow for git.git are important to
> >     you?
> 
> There are several points I want to highlight:
> 
> 1. Issue about reading the Mailing List:
> 
> - Subscribing to Git's Mailing List is not trivial:
>   It takes a lot of time to setup the email subscription.  I remember
>   having to google through a few documents to get my subscription
>   working.
> 
> - And even after having subscribed, I was bombarded with a set
>   of spam emails that was sent to the mailing list address.  These spams
>   range anywhere from absurd to disguising themselves as legitimate
>   users trying to contact you about a new shiny tech product.

Note that subscription is totally optional.

Gmail's mail filters probably aren't very good, perhaps
SpamAssassin or similar filters can be added locally to improve
things for you.

Spam filtering is a complex topic and Google's monopolistic
power probably doesn't inspire them to do better.

> 2. Issue about joining the conversation in the Mailing List:
> 
> - Setting up an email client to reply to the Mailing List was definitely
>   not trivial.  It's not trivial to send a reply without subscribing to
>   the ML (i.e. using a header provided by one of the archives).
>   The list does not accept HTML emails, which many clients
>   use as the default format.  Getting the formatting to work for line
>   wrapping is also a challenge, depending on the client that you use.

The spam (and phishing) problem would be worse if HTML mail were
accepted.  Obfuscation/misdirection techniques used by spammers
and phishers aren't available in plain-text.

It's also more expensive to filter + archive HTML mail due to
decoding and size overheads, which makes it more expensive for
others to mirror/fork things.

> - It's a bit intimidating to ask 'trivial questions' about the patch and
>   create 'noise' in the ML.

I'm sorry you feel that way.  I understand the Internet and its
persistence (especially with mail archives :x) can have a
chilling effect on people.  I think the way to balance things is
to allow/encourage anonymity or pseudonyms, but some folks here
might disagree with me for copyright reasons.  OTOH, don't ask,
don't tell :)

(I am not speaking as a representative of the git project)

> 3. Issue with archive:
> 
> - I don't find the ML archive trivial for newcomers.  It took me a bit
>   of time to realize: 'Oh if I scroll to bottom and find the "Thread 
>   overview" then I can navigate a mailing thread a lot easier'.

(I'm the maintainer of public-inbox, the archival software you
seem to be referring to).

I'm not sure how to make "Thread overview" easier to find
without cluttering the display near the top.  Maybe I'll try
aria labels in the Subject: link...

> - The lack of labeling / categorization that I can filter on while browsing
>   through the archive makes the 'browse' experience quite
>   unpleasant.  Search is one way to do it, but a newcomer would not be
>   knowledgeable enough to craft a search query to get the archive view just
>   right.  Perhaps a way to provide a curated set of categories would be
>   nice.

Perhaps TODO files/comments in the source tree are acceptable;
or a regularly-posted mail similar to "What's cooking".

Having a centralized website/tracker would give too much power
and influence to people/orgs who run the site.  It would likely
either require network access or require learning more software
to synchronize.

> - Lost track of issues / discussion:
>   A quick example would be me searching for Git's zstd support
>   recently with 
> 
>   > https://lore.kernel.org/git/?q=zstandard 
> 
>   and got next to no relevant result.  However if I were to query
> 
>   > 'https://lore.kernel.org/git/?q=zstd'
> 
>   then a very relevant thread from Peff appeared.  I think this could be
>   avoided if the search in ML archive do more than just matching exact
>   text.

I'm planning to support Xapian synonyms for that, but haven't
gotten around to making it configurable+reproducible by admins.
Everything in public-inbox is designed to be reproducible+forkable.

> 4. Lack of way to run test suite / CI:
> 
>   It would be nice if we could discuss patches while having CI results as
>   part of the conversation.  Right now I mostly see that we have to
>   manually run benchmarks/tests and paste the results.
> 
>   But for folks who don't have a dev environment ready at hand (newcomers,
>   or during travel with only phone access), it would be nice to
>   have a way to run tests without a dev environment.

Fwiw, the GCC Farm project gives ssh accounts for all free
software contributors, not just gcc hackers: https://cfarm.tetaneutral.net
Perhaps there's other similar services, too.

Slow down and enjoy travel :)  There's very little in free
software urgent enough to require constant attention.  Email is
well-suited for asynchronous work, and nobody should expect
instant replies.  The always-on nature of the modern Internet
and smartphones increases stress and dangerous situations; so I
hope free software hackers aren't contributing to that.

>   This was mostly solved by the work spent on GitHub's
>   Actions workflow.  But if we are discussing the pure patch flow, this
>   is a miss.
> 
> >  2. What tricks do you use to get by with those existing pain points?
> 
> For (1):
> - I had to invest a lot of time into setting up a set of Gmail search
>   filters.  They move mails with topics that I'm interested in into a special
>   tag and the rest into the archive.  I regularly check if anything
>   interesting went to the archive by accident.
> 
> For (2):
> - I had to set up Mutt + Tmux to have a compatible experience sending
>   replies like this one.

Fwiw, git-send-email works for non-patch mails, too.  I don't
want a monoculture around mutt or any particular clients, either.
(I've never used tmux and don't see why it's necessary, here).

Anyways, thanks again for the feedback.

^ permalink raw reply	[relevance 4%]

* Re: Pain points in Git's patch flow
  @ 2021-04-15 15:45  3% ` Son Luong Ngoc
  2021-04-19  2:57  4%   ` Eric Wong
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2021-04-15 15:45 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, Raxel Gutierrez, mricon, patchwork, Junio C Hamano,
	Taylor Blau, Emily Shaffer

Hi there,

I'm not a regular contributor but I have started to subscribe to the
Git's Mailing List recently.  So I thought it might be worth sharing my
personal view on this.

After writing all of the below, I realize that I have written quite a
rant, some of which some might consider off topic.  For that, I want to
apologize beforehand.

 Tue, Apr 13, 2021 at 11:13:26PM -0700, Jonathan Nieder wrote:
> Hi,
> 
...
> 
> Those four are important in my everyday life.  Questions:
> 
>  1. What pain points in the patch flow for git.git are important to
>     you?

There are several points I want to highlight:

1. Issue about reading the Mailing List:

- Subscribing to Git's Mailing List is not trivial:
  It takes a lot of time to setup the email subscription.  I remember
  having to google through a few documents to get my subscription
  working.

- And even after having subscribed, I was bombarded with a set
  of spam emails that was sent to the mailing list address.  These spams
  range anywhere from absurd to disguising themselves as legitimate
  users trying to contact you about a new shiny tech product.

2. Issue about joining the conversation in the Mailing List:

- Setting up an email client to reply to the Mailing List was definitely
  not trivial.  It's not trivial to send a reply without subscribing to
  the ML (i.e. using a header provided by one of the archives).
  The list does not accept HTML emails, which many clients
  use as the default format.  Getting the formatting to work for line
  wrapping is also a challenge, depending on the client that you use.

- It's a bit intimidating to ask 'trivial questions' about the patch and
  create 'noise' in the ML.

3. Issue with archive:

- I don't find the ML archive trivial for newcomers.  It took me a bit
  of time to realize: 'Oh if I scroll to bottom and find the "Thread 
  overview" then I can navigate a mailing thread a lot easier'.

- The lack of labeling / categorization that I can filter on while browsing
  through the archive makes the 'browse' experience quite
  unpleasant.  Search is one way to do it, but a newcomer would not be
  knowledgeable enough to craft a search query to get the archive view just
  right.  Perhaps a way to provide a curated set of categories would be
  nice.

- Lost track of issues / discussion:
  A quick example would be me searching for Git's zstd support
  recently with 

  > https://lore.kernel.org/git/?q=zstandard 

  and got next to no relevant result.  However if I were to query

  > 'https://lore.kernel.org/git/?q=zstd'

  then a very relevant thread from Peff appeared.  I think this could be
  avoided if the search in the ML archive did more than just match exact
  text.

4. Lack of way to run test suite / CI:

  It would be nice if we could discuss patches while having CI results as
  part of the conversation.  Right now I mostly see that we have to
  manually run benchmarks/tests and paste the results.

  But for folks who don't have a dev environment ready at hand (newcomers,
  or during travel with only phone access), it would be nice to
  have a way to run tests without a dev environment.

  This was mostly solved by the work spent on GitHub's
  Actions workflow.  But if we are discussing the pure patch flow, this
  is a miss.

>  2. What tricks do you use to get by with those existing pain points?

For (1):
- I had to invest a lot of time into setting up a set of Gmail search
  filters.  They move mails with topics that I'm interested in into a special
  tag and the rest into the archive.  I regularly check if anything
  interesting went to the archive by accident.

For (2):
- I had to set up Mutt + Tmux to have a compatible experience sending
  replies like this one.

- All the patches I have submitted were through
  > https://github.com/gitgitgadget/git/pulls
  and it was not directly trivial to get permission to send email from a
  PR.

For (3):
- Spending time reading git blame / git log / commit messages helps
  identify the keywords I need to refine my search results in the ML
  archive.  This requires some commitment and is a barrier to entry for
  newcomers.

- Using a service like GitHub Search or Sourcegraph helped a lot in terms
  of navigating through the commit messages / git blame.

For (4):
- I leverage both Github action and a patch that added Gitlab CI to run
  the test suite.

>  3. Do you think patchwork goes in a direction that is likely to help
>     with these?
>
>  4. What other tools would you like to see that could help?

With all that said, I don't know if patchwork will solve the problems
above.  I do understand that the current patch workflow comes with a
certain set of advantages, and adopting another tool will most likely be
a trade-off.

Personally I have been spending more and more time reading through
git.git via the Sourcegraph web UI, and I would love for its search
feature to be extended to search the mailing list from a relevant
commit, if possible.  I have also tried both GitHub's Codespaces and
Microsoft's DevContainer to set up an opinionated IDE with predefined
tasks that help execute the test suite.  I think these tools (or
their competitors such as GitPod) are quite ideal for quickly onboarding
new contributors onto a history-rich codebase such as git.git.

Perhaps they could come configured with a set of sane defaults,
including editor extensions that would handle email config for
first-time users.

As for code review and issue tracking tooling, I don't think there is
a perfect solution.  Any solution (GitHub PR, GitLab MR, Gerrit,
Phabricator) would come with its own set of tradeoffs.  I like the
prospect of Patchwork improving the patch workflow though.  Perhaps I
will give it a try.

> 
> Thanks,
> Jonathan

Thanks,
Son Luong.

^ permalink raw reply	[relevance 3%]

* Re: [PATCH] t: annotate !PTHREADS tests with !FAIL_PREREQS
  2021-03-17 22:47  4%       ` [PATCH] t: annotate !PTHREADS tests with !FAIL_PREREQS Jeff King
@ 2021-03-18 21:17  0%         ` Junio C Hamano
  0 siblings, 0 replies; 122+ results
From: Junio C Hamano @ 2021-03-18 21:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Son Luong Ngoc, Taylor Blau, git, avarab, jonathantanmy

Jeff King <peff@peff.net> writes:

>> So I think the FAIL_PREREQS mode should probably be treating negated
>> prereqs differently (and always pretending that yes, we have them).
>> 
>> I hadn't investigated the t7810 case yet, but looking at it now, it
>> seems to be the exact same thing.
>
> It looks like the problem is indeed somewhat widespread, and there is a
> magic prereq already to skip such tests.
>
> I do still think that this is a fundamental failing of the FAIL_PREREQS
> mode, but it probably makes sense to annotate these tests in the
> meantime (I don't plan on looking further into it myself).

The README file in t/ directory claims that this "is useful for
discovering issues with the tests where say a later test implicitly
depends on an optional earlier test." but apparently it does not
work well with these negated prerequisites.  Its implementation
probably should force a safe bypass of the whole test_have_prereq()
etc. done in test_skip by hooking into test_verify_prereq and
overwriting any non-empty test_prereq with a single hardcoded
PRETEND_FAIL_PREREQ prerequisite that is never satisfied, or
something.

> Another rough edge I noticed: if you set GIT_TEST_HTTPD or
> GIT_TEST_GIT_DAEMON to "yes" in your config.mak, these play quite badly
> with GIT_TEST_FAIL_PREREQS. We think NOT_ROOT is not satisfied, so
> refuse to start httpd, and then complain that the setup fails (and the
> point of "yes" for those values is to loudly complain when setup fails,
> rather than quietly skipping the tests).

... and I think this would also be gone, as the NOT_ROOT test is
done with test_have_prereq that we wouldn't be mucking with if we
limit the FAIL_PREREQS only to tweak the test_expect_* prereqs.

In short, the biggest mistake in the current FAIL_PREREQS design is
to hook into test_have_prereq while the stated objective only needs
to futz with the prerequisite given to the test_expect_* functions,
I would think.
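
To make the distinction concrete, here is a rough sketch.  It is not
the actual test-lib code: have_prereq, should_run_test and
FAIL_PREREQS_MODE are made-up names used only for illustration.  The
plain query keeps answering truthfully, and only the decision made on
behalf of a test_expect_*-style call is forced to "skip".

```sh
#!/bin/sh
# Illustrative only: hypothetical names, not the real test-lib functions.

# Prerequisites this imaginary build really satisfies.
satisfied=" PTHREADS NOT_ROOT "

# Plain query; deliberately ignores FAIL_PREREQS_MODE.
have_prereq () {
	case "$satisfied" in
	*" $1 "*) return 0 ;;
	esac
	return 1
}

# Decision made for a test_expect_*-style call: $1 is its comma-separated
# prerequisite list, possibly with negated entries such as !PTHREADS.
should_run_test () {
	test -z "$1" && return 0
	# In "fail prereqs" mode, skip every test that names a prerequisite,
	# whether the prerequisite is negated or not.
	test -n "$FAIL_PREREQS_MODE" && return 1
	for p in $(printf '%s\n' "$1" | tr ',' ' ')
	do
		case "$p" in
		\!*)	if have_prereq "${p#?}"; then return 1; fi ;;
		*)	if ! have_prereq "$p"; then return 1; fi ;;
		esac
	done
	return 0
}

FAIL_PREREQS_MODE=1
have_prereq NOT_ROOT && echo "NOT_ROOT is still reported truthfully"
should_run_test '!PTHREADS' || echo "negated-prereq test is skipped, not run"
```

With something along these lines, a setup step that asks about NOT_ROOT
directly would not be affected by the mode, while a test marked
!PTHREADS would be skipped instead of being run against a build that
does have pthreads.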

> -- >8 --
> Subject: [PATCH] t: annotate !PTHREADS tests with !FAIL_PREREQS
>
> Some tests in t5300 and t7810 expect us to complain about a "--threads"
> argument when Git is compiled without pthread support. Running these
> under GIT_TEST_FAIL_PREREQS produces a confusing failure: we pretend to
> the tests that there is no pthread support, so they expect the warning,
> but of course the actual build is perfectly happy to respect the
> --threads argument.
>
> We never noticed before the recent a926c4b904 (tests: remove most uses
> of C_LOCALE_OUTPUT, 2021-02-11), because the tests also were marked as
> requiring the C_LOCALE_OUTPUT prerequisite. Which means they'd never
> have run in FAIL_PREREQS mode, since it would always pretend that the
> locale prereq was not satisfied.
>
> These tests can't possibly work in this mode; it is a mismatch between
> what the tests expect and what the build was told to do. So let's just
> mark them to be skipped, using the special prereq introduced by
> dfe1a17df9 (tests: add a special setup where prerequisites fail,
> 2019-05-13).
>
> Reported-by: Son Luong Ngoc <sluongng@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/t5300-pack-object.sh | 6 ++++--
>  t/t7810-grep.sh        | 3 ++-
>  2 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
> index d586fdc7a9..e830a37a38 100755
> --- a/t/t5300-pack-object.sh
> +++ b/t/t5300-pack-object.sh
> @@ -427,7 +427,8 @@ test_expect_success 'index-pack --strict <pack> works in non-repo' '
>  	test_path_is_file foo.idx
>  '
>  
> -test_expect_success !PTHREADS 'index-pack --threads=N or pack.threads=N warns when no pthreads' '
> +test_expect_success !PTHREADS,!FAIL_PREREQS \
> +	'index-pack --threads=N or pack.threads=N warns when no pthreads' '
>  	test_must_fail git index-pack --threads=2 2>err &&
>  	grep ^warning: err >warnings &&
>  	test_line_count = 1 warnings &&
> @@ -445,7 +446,8 @@ test_expect_success !PTHREADS 'index-pack --threads=N or pack.threads=N warns wh
>  	grep -F "no threads support, ignoring pack.threads" err
>  '
>  
> -test_expect_success !PTHREADS 'pack-objects --threads=N or pack.threads=N warns when no pthreads' '
> +test_expect_success !PTHREADS,!FAIL_PREREQS \
> +	'pack-objects --threads=N or pack.threads=N warns when no pthreads' '
>  	git pack-objects --threads=2 --stdout --all </dev/null >/dev/null 2>err &&
>  	grep ^warning: err >warnings &&
>  	test_line_count = 1 warnings &&
> diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
> index edfaa9a6d1..5830733f3d 100755
> --- a/t/t7810-grep.sh
> +++ b/t/t7810-grep.sh
> @@ -969,7 +969,8 @@ do
>  	"
>  done
>  
> -test_expect_success !PTHREADS 'grep --threads=N or pack.threads=N warns when no pthreads' '
> +test_expect_success !PTHREADS,!FAIL_PREREQS \
> +	'grep --threads=N or pack.threads=N warns when no pthreads' '
>  	git grep --threads=2 Hello hello_world 2>err &&
>  	grep ^warning: err >warnings &&
>  	test_line_count = 1 warnings &&

^ permalink raw reply	[relevance 0%]

* [PATCH] t: annotate !PTHREADS tests with !FAIL_PREREQS
  @ 2021-03-17 22:47  4%       ` Jeff King
  2021-03-18 21:17  0%         ` Junio C Hamano
  0 siblings, 1 reply; 122+ results
From: Jeff King @ 2021-03-17 22:47 UTC (permalink / raw)
  To: Son Luong Ngoc; +Cc: Taylor Blau, git, avarab, jonathantanmy, gitster

On Wed, Mar 17, 2021 at 01:54:25PM -0400, Jeff King wrote:

> -test_expect_success !PTHREADS 'pack-objects --threads=N or pack.threads=N warns when no pthreads' '
> +test_expect_success !PTHREADS,IGNORE_FAIL_PREREQS \
> +	'pack-objects --threads=N or pack.threads=N warns when no pthreads' '
>  	git pack-objects --threads=2 --stdout --all </dev/null >/dev/null 2>err &&
>  	grep ^warning: err >warnings &&
>  	test_line_count = 1 warnings &&
> 
> but I think this points to a failing of the FAIL_PREREQS mode. It is
> generally OK to say "skip this test by pretending you do not have a
> prereq satisfied" (and that is the point: to see if skipping a test
> confuses later tests). But given a negated prereq here, it is not OK to
> say "run this test that we usually wouldn't", because it is almost
> certainly going to be mismatched with the actual build.
> 
> So I think the FAIL_PREREQS mode should probably be treating negated
> prereqs differently (and always pretending that yes, we have them).
> 
> I hadn't investigated the t7810 case yet, but looking at it now, it
> seems to be the exact same thing.

It looks like the problem is indeed somewhat widespread, and there is a
magic prereq already to skip such tests.

I do still think that this is a fundamental failing of the FAIL_PREREQS
mode, but it probably makes sense to annotate these tests in the
meantime (I don't plan on looking further into it myself).

Another rough edge I noticed: if you set GIT_TEST_HTTPD or
GIT_TEST_GIT_DAEMON to "yes" in your config.mak, these play quite badly
with GIT_TEST_FAIL_PREREQS. We think NOT_ROOT is not satisfied, so
refuse to start httpd, and then complain that the setup fails (and the
point of "yes" for those values is to loudly complain when setup fails,
rather than quietly skipping the tests).

-- >8 --
Subject: [PATCH] t: annotate !PTHREADS tests with !FAIL_PREREQS

Some tests in t5300 and t7810 expect us to complain about a "--threads"
argument when Git is compiled without pthread support. Running these
under GIT_TEST_FAIL_PREREQS produces a confusing failure: we pretend to
the tests that there is no pthread support, so they expect the warning,
but of course the actual build is perfectly happy to respect the
--threads argument.

We never noticed before the recent a926c4b904 (tests: remove most uses
of C_LOCALE_OUTPUT, 2021-02-11), because the tests also were marked as
requiring the C_LOCALE_OUTPUT prerequisite. Which means they'd never
have run in FAIL_PREREQS mode, since it would always pretend that the
locale prereq was not satisfied.

These tests can't possibly work in this mode; it is a mismatch between
what the tests expect and what the build was told to do. So let's just
mark them to be skipped, using the special prereq introduced by
dfe1a17df9 (tests: add a special setup where prerequisites fail,
2019-05-13).

Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5300-pack-object.sh | 6 ++++--
 t/t7810-grep.sh        | 3 ++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index d586fdc7a9..e830a37a38 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -427,7 +427,8 @@ test_expect_success 'index-pack --strict <pack> works in non-repo' '
 	test_path_is_file foo.idx
 '
 
-test_expect_success !PTHREADS 'index-pack --threads=N or pack.threads=N warns when no pthreads' '
+test_expect_success !PTHREADS,!FAIL_PREREQS \
+	'index-pack --threads=N or pack.threads=N warns when no pthreads' '
 	test_must_fail git index-pack --threads=2 2>err &&
 	grep ^warning: err >warnings &&
 	test_line_count = 1 warnings &&
@@ -445,7 +446,8 @@ test_expect_success !PTHREADS 'index-pack --threads=N or pack.threads=N warns wh
 	grep -F "no threads support, ignoring pack.threads" err
 '
 
-test_expect_success !PTHREADS 'pack-objects --threads=N or pack.threads=N warns when no pthreads' '
+test_expect_success !PTHREADS,!FAIL_PREREQS \
+	'pack-objects --threads=N or pack.threads=N warns when no pthreads' '
 	git pack-objects --threads=2 --stdout --all </dev/null >/dev/null 2>err &&
 	grep ^warning: err >warnings &&
 	test_line_count = 1 warnings &&
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index edfaa9a6d1..5830733f3d 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -969,7 +969,8 @@ do
 	"
 done
 
-test_expect_success !PTHREADS 'grep --threads=N or pack.threads=N warns when no pthreads' '
+test_expect_success !PTHREADS,!FAIL_PREREQS \
+	'grep --threads=N or pack.threads=N warns when no pthreads' '
 	git grep --threads=2 Hello hello_world 2>err &&
 	grep ^warning: err >warnings &&
 	test_line_count = 1 warnings &&
-- 
2.31.0.559.g509d4a088b


^ permalink raw reply related	[relevance 4%]

* Re: Tests failed with GIT_TEST_FAIL_PREREQS and/or GIT_TEST_PROTOCOL_VERSION
  @ 2021-03-17 13:38  5%   ` Son Luong Ngoc
    0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2021-03-17 13:38 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, avarab, jonathantanmy, gitster

Hi Taylor,

On Tue, Mar 16, 2021 at 09:52:47AM -0400, Taylor Blau wrote:
> Hi,
> 
> Is it possible that your bisection script doesn't report success
> properly? Bisecting the same range (v2.30.0..v2.31.0) with
> 
>     $ cat run.sh
>     #!/bin/sh
>     false
> 
> does say that my 3b1ca60f8f (ewah/ewah_bitmap.c: avoid open-coding
> ALLOC_GROW(), 2020-12-08) is the first bad commit.

You are spot on.  It was a busy day and I only had a few minutes to
look at our internal pipeline of the test suite.  I guess I was doing
something along the lines of:

      $ git bisect start HEAD v2.30.0
      $ git bisect run 'cd t && GIT_TEST_PROTOCOL_VERSION=1 ./t5606-clone-options.sh'

Which did indeed error out and point to your commit.
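
For the record, "git bisect run" takes a command followed by its
arguments, so a compound "cd ... && ..." line is probably better
wrapped in "sh -c" (or put into a script).  Below is a small
self-contained sketch in a throwaway repository; the file name, the
commit subjects and the "bug" are all made up for illustration.

```sh
#!/bin/sh
# Throwaway repository with five commits; commit "c4" introduces the "bug"
# (the number stored in the file reaches 4).  All names are made up.
repo=$(mktemp -d) && cd "$repo" && git init -q . || exit 1
for i in 1 2 3 4 5
do
	echo "$i" >file &&
	git add file &&
	git -c user.name=Demo -c user.email=demo@example.com \
		commit -q -m "c$i" || exit 1
done
expected=$(git rev-parse HEAD~1)	# the commit with subject "c4"

# bad = HEAD, good = the root commit
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)" >/dev/null

# Wrap the compound command so that bisect sees one command ("sh") plus
# its arguments.  Exit status 0 marks a commit good, 125 skips it, and
# any other status from 1 to 127 marks it bad.
git bisect run sh -c 'cd . && test "$(cat file)" -lt 4' >/dev/null

first_bad=$(git rev-parse refs/bisect/bad)
git log -1 --format=%s "$first_bad"	# prints "c4"
git bisect reset >/dev/null 2>&1
```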

> 
> Thanks,
> Taylor

I have properly re-run the bisection in a './test.sh' bash script and
here are the suspicious commits:

1. For t7810 and t5300 failing when GIT_TEST_FAIL_PREREQS=1:

    a926c4b904bdc339568c2898af955cdc61b31542 is the first bad commit
    commit a926c4b904bdc339568c2898af955cdc61b31542
    Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
    Date:   Thu Feb 11 02:53:51 2021 +0100

        tests: remove most uses of C_LOCALE_OUTPUT

        As a follow-up to d162b25f956 (tests: remove support for
        GIT_TEST_GETTEXT_POISON, 2021-01-20) remove those uses of the now
        always true C_LOCALE_OUTPUT prerequisite from those tests which
        declare it as an argument to test_expect_{success,failure}.

        Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
        Signed-off-by: Junio C Hamano <gitster@pobox.com>

2. For failing t5606 while 'GIT_TEST_PROTOCOL_VERSION=1' was used:

    4f37d45706514a4b3d0259d26f719678a0cf3521 is the first bad commit
    commit 4f37d45706514a4b3d0259d26f719678a0cf3521
    Author: Jonathan Tan <jonathantanmy@google.com>
    Date:   Fri Feb 5 12:48:49 2021 -0800

        clone: respect remote unborn HEAD

        Teach Git to use the "unborn" feature introduced in a previous patch as
        follows: Git will always send the "unborn" argument if it is supported
        by the server. During "git clone", if cloning an empty repository, Git
        will use the new information to determine the local branch to create. In
        all other cases, Git will ignore it.

        Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
        Signed-off-by: Junio C Hamano <gitster@pobox.com>

     Documentation/config/init.txt |  2 +-
     builtin/clone.c               | 16 ++++++++++++++--
     connect.c                     | 28 ++++++++++++++++++++++++++--
     t/t5606-clone-options.sh      |  8 +++++---
     t/t5702-protocol-v2.sh        | 25 +++++++++++++++++++++++++
     transport.h                   |  8 ++++++++
     6 files changed, 79 insertions(+), 8 deletions(-)


Thanks,
Son Luong.

^ permalink raw reply	[relevance 5%]

* Tests failed with GIT_TEST_FAIL_PREREQS and/or GIT_TEST_PROTOCOL_VERSION
@ 2021-03-16  9:45  5% Son Luong Ngoc
    0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2021-03-16  9:45 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau

Hi folks,

Running the test suite with GIT_TEST_FAIL_PREREQS=1 on master (and
next) seems to result in some failures:

  Test Summary Report
  -------------------
  t5300-pack-object.sh (Wstat: 256 Tests: 46 Failed: 2)
  Failed tests: 35-36
  Non-zero exit status: 1
  t7810-grep.sh (Wstat: 256 Tests: 229 Failed: 1)
  Failed test: 160
  Non-zero exit status: 1
  Files=924, Tests=22400, 422 wallclock secs (12.57 usr 2.52 sys + 601.02 cusr 1047.02 csys = 1663.13 CPU)
  Result: FAIL

A quick git-bisect run seems to point back to this commit:

3b1ca60f8f317b483c8c1805ab500ff2b014cbec is the first bad commit
commit 3b1ca60f8f317b483c8c1805ab500ff2b014cbec
Author: Taylor Blau <me@ttaylorr.com>
Date:   Tue Dec 8 17:03:14 2020 -0500

    ewah/ewah_bitmap.c: avoid open-coding ALLOC_GROW()

    'ewah/ewah_bitmap.c:buffer_grow()' is responsible for growing the buffer
    used to store the bits of an EWAH bitmap. It is essentially doing the
    same task as the 'ALLOC_GROW()' macro, so use that instead.

    This simplifies the callers of 'buffer_grow()', who no longer have to
    ask for a specific size, but rather specify how much of the buffer they
    need. They also no longer need to guard 'buffer_grow()' behind an if
    statement, since 'ALLOC_GROW()' (and, by extension, 'buffer_grow()') is
    a noop if the buffer is already large enough.

    But, the most significant change is that this fixes a bug when calling
    buffer_grow() with both 'alloc_size' and 'new_size' set to 1. In this
    case, truncating integer math will leave the new size set to 1, causing
    the buffer to never grow.

    Instead, let alloc_nr() handle this, which asks for '(new_size + 16) * 3
    / 2' instead of 'new_size * 3 / 2'.

    Signed-off-by: Taylor Blau <me@ttaylorr.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>

 ewah/ewah_bitmap.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)
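
(As an aside, the integer truncation described in that commit message
is easy to check with shell arithmetic.  The function names below are
only stand-ins for the two growth formulas quoted above.)

```sh
#!/bin/sh
# Stand-ins for the two growth formulas quoted in the commit message.
old_growth () { echo $(( $1 * 3 / 2 )); }		# new_size * 3 / 2
alloc_nr_growth () { echo $(( ($1 + 16) * 3 / 2 )); }	# (new_size + 16) * 3 / 2

echo "old formula:      1 -> $(old_growth 1)"		# 1 -> 1, never grows
echo "alloc_nr formula: 1 -> $(alloc_nr_growth 1)"	# 1 -> 25
```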

I also found that a test is failing with protocol v1 set
(GIT_TEST_PROTOCOL_VERSION=1):

  Test Summary Report
  -------------------
  t5606-clone-options.sh (Wstat: 256 Tests: 15 Failed: 1)
  Failed test: 14
  Non-zero exit status: 1
  Files=924, Tests=22852, 568 wallclock secs (12.69 usr 2.73 sys + 842.87 cusr 1322.91 csys = 2181.20 CPU)
  Result: FAIL

git-bisect tells me that this was caused by the same commit.

Regards,
Son Luong.

^ permalink raw reply	[relevance 5%]

* [PATCH v2 0/2] Maintenance: add pack-refs task
  @ 2021-02-09 13:42  3% ` Derrick Stolee via GitGitGadget
  0 siblings, 0 replies; 122+ results
From: Derrick Stolee via GitGitGadget @ 2021-02-09 13:42 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Eric Sunshine, Derrick Stolee, Derrick Stolee

This patch series adds a new pack-refs task to the maintenance builtin. This
operation already happens within git gc (and hence the gc task) but it is
easy to extract. Packing refs does not delete any data; it only collects
loose refs into a combined file. This makes things faster in subtle ways,
especially when a command needs to iterate through refs (tags in particular).
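
A quick way to see the effect in a throwaway repository is sketched
below. It calls git pack-refs --all --prune directly, which is the
command the range-diff below says the new task runs, and it assumes the
default files-based ref storage. The branch names are arbitrary.

```sh
#!/bin/sh
# Throwaway repository with one commit and a few extra branches.
repo=$(mktemp -d) && cd "$repo" && git init -q . || exit 1
git -c user.name=Demo -c user.email=demo@example.com \
	commit -q --allow-empty -m initial || exit 1
for n in 1 2 3
do
	git branch -f "to-pack/$n" HEAD || exit 1
done

# Count loose branch refs before and after packing.
loose_before=$(find .git/refs/heads -type f | wc -l)
git pack-refs --all --prune
loose_after=$(find .git/refs/heads -type f | wc -l)

# Every branch should now be listed in the single packed-refs file.
packed=$(grep -c ' refs/heads/' .git/packed-refs)

echo "loose refs before: $loose_before, after: $loose_after, packed: $packed"
```

The branches remain fully usable afterwards; only their storage moves
from one file per ref to a line in packed-refs.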

Credit for inspiring this goes to Suolong, who asked for this to be added to
Scalar [1]. I've been waiting instead to add it directly to Git and its
background maintenance. Now is the time!

[1] https://github.com/microsoft/scalar/issues/382

I chose to add it to the incremental maintenance strategy at a weekly
cadence. I'm not sure there is significant value to the difference between
weekly and daily. It just seems to me that weekly is often enough. Feel free
to correct me if you have a different opinion.

My hope is that this patch series could be used as an example for further
extracting tasks out of the gc task and making them be full maintenance
tasks. Doing more of these extractions could be a good project for a new
contributor.

One thing that is not implemented in this series is a notion of the behavior
for the pack-refs task during git maintenance run --auto. This could be
added in the future, but I wanted to focus on getting this behavior into the
incremental maintenance schedule.


Updates in V2
=============

 * Fixed doc typo. Thanks, Eric!
 * Updated commit messages to make it clear that the 'pack-refs' step will
   still happen within the 'gc' task.
 * Updated the test to check that we run the correct subcommand.
 * maintenance_task_pack_refs() uses MAYBE_UNUSED on its parameter.

Thanks, -Stolee

Cc: gitster@pobox.com Cc: sluongng@gmail.com Cc: martin.agren@gmail.com Cc:
sunshine@sunshineco.com

Derrick Stolee (2):
  maintenance: add pack-refs task
  maintenance: incremental strategy runs pack-refs weekly

 Documentation/config/maintenance.txt |  5 +++--
 Documentation/git-maintenance.txt    |  6 ++++++
 builtin/gc.c                         | 23 +++++++++++++++++++----
 t/t7900-maintenance.sh               | 26 ++++++++++++++++++++++++++
 4 files changed, 54 insertions(+), 6 deletions(-)


base-commit: fb7fa4a1fd273f22efcafdd13c7f897814fd1eb9
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-871%2Fderrickstolee%2Fmaintenance%2Fpack-refs-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-871/derrickstolee/maintenance/pack-refs-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/871

Range-diff vs v1:

 1:  33b7a74af4eb ! 1:  bedaeb548b06 maintenance: add pack-refs task
     @@ Commit message
          by terminal prompts to show when a detatched HEAD is pointing to an
          existing tag, so having it be slow causes significant delays for users.
      
     -    Add a new 'pack-refs' maintenance task. This is already a sub-step of
     -    the 'gc' task, but users could run this at other intervals if they are
     -    interested. Also, if users opt-in to the default background maintenance
     -    schedule, then the 'gc' task is disabled.
     +    Add a new 'pack-refs' maintenance task. It runs 'git pack-refs --all
     +    --prune' to move loose refs into a packed form. For now, that is the
     +    packed-refs file, but could adjust to other file formats in the future.
     +
     +    This is the first of several sub-tasks of the 'gc' task that could be
     +    extracted to their own tasks. In this process, we should not change the
     +    behavior of the 'gc' task since that remains the default way to keep
     +    repositories maintained. Creating a new task for one of these sub-tasks
     +    only provides more customization options for those choosing to not use
     +    the 'gc' task. It is certainly possible to have both the 'gc' and
     +    'pack-refs' tasks enabled and run regularly. While they may repeat
     +    effort, they do not conflict in a destructive way.
     +
     +    The 'auto_condition' function pointer is left NULL for now. We could
     +    extend this in the future to have a condition check if pack-refs should
     +    be run during 'git maintenance run --auto'.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
     @@ Documentation/git-maintenance.txt: incremental-repack::
      +pack-refs::
      +	The `pack-refs` task collects the loose reference files and
      +	collects them into a single file. This speeds up operations that
     -+	need to iterate across many refereences. See linkgit:git-pack-refs[1]
     ++	need to iterate across many references. See linkgit:git-pack-refs[1]
      +	for more information.
      +
       OPTIONS
     @@ builtin/gc.c: static void gc_config(void)
       }
       
      +struct maintenance_run_opts;
     -+static int maintenance_task_pack_refs(struct maintenance_run_opts *opts)
     ++static int maintenance_task_pack_refs(MAYBE_UNUSED struct maintenance_run_opts *opts)
      +{
      +	struct strvec pack_refs_cmd = STRVEC_INIT;
      +	strvec_pushl(&pack_refs_cmd, "pack-refs", "--all", "--prune", NULL);
     @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
      +	do
      +		git branch -f to-pack/$n HEAD || return 1
      +	done &&
     -+	git maintenance run --task=pack-refs &&
     ++	GIT_TRACE2_EVENT="$(pwd)/pack-refs.txt" \
     ++		git maintenance run --task=pack-refs &&
      +	ls .git/refs/heads/ >after &&
     -+	test_must_be_empty after
     ++	test_must_be_empty after &&
     ++	test_subcommand git pack-refs --all --prune <pack-refs.txt
      +'
      +
       test_expect_success '--auto and --schedule incompatible' '
 2:  8012d2dc1420 = 2:  c38fc9a4170e maintenance: incremental strategy runs pack-refs weekly

-- 
gitgitgadget

^ permalink raw reply	[relevance 3%]

* [PATCH v7 0/4] Maintenance IV: Platform-specific background maintenance
  2020-12-09 19:28  4% ` [PATCH v6 " Derrick Stolee via GitGitGadget
@ 2021-01-05 13:08  3%   ` Derrick Stolee via GitGitGadget
  0 siblings, 0 replies; 122+ results
From: Derrick Stolee via GitGitGadget @ 2021-01-05 13:08 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Derrick Stolee, Derrick Stolee

This is based on ds/maintenance-part-3.

After sitting with the background maintenance as it has been cooking, I
wanted to come back around and implement the background maintenance for
Windows. However, I noticed that there were some things bothering me with
background maintenance on my macOS machine. These are detailed in PATCH 3,
but the tl;dr is that 'cron' is not recommended by Apple and instead
'launchd' satisfies our needs.

This series implements the background scheduling so git maintenance
(start|stop) works on those platforms. I've been operating with these
schedules for a while now without the problems described in the patches.

There is a particularly annoying case about console windows popping up on
Windows, but PATCH 4 describes a plan to get around that.


Update in V7
============

 * I had included an "encoding" string in the XML file for schtasks based on
   an example using UTF-8. The cross-platform tests then complained (in
   xmllint) because they wrote in ASCII instead. However, actually testing
   the situation on Windows (see [1]) against the real schtasks finds that
   it doesn't like that encoding string. I removed it entirely, and
   everything seems happier.

 * I squashed Eric's two commits making the tests better. He remains a
   co-author and I kept his Helped-by. I had to rearrange the commit message
   a bit to point out the care he took for the cross-platform tests without
   referring to the test doing the wrong thing.

[1] https://github.com/microsoft/git/pull/304

Thanks, -Stolee

cc: jrnieder@gmail.com cc: jonathantanmy@google.com cc: sluongng@gmail.com
cc: Đoàn Trần Công Danh congdanhqx@gmail.com cc: Martin Ågren
martin.agren@gmail.com cc: Eric Sunshine sunshine@sunshineco.com cc: Derrick
Stolee stolee@gmail.com

Derrick Stolee (4):
  maintenance: extract platform-specific scheduling
  maintenance: include 'cron' details in docs
  maintenance: use launchctl on macOS
  maintenance: use Windows scheduled tasks

 Documentation/git-maintenance.txt | 116 ++++++++
 builtin/gc.c                      | 422 ++++++++++++++++++++++++++++--
 t/t7900-maintenance.sh            | 104 +++++++-
 t/test-lib.sh                     |   7 +-
 4 files changed, 615 insertions(+), 34 deletions(-)


base-commit: 0016b618182f642771dc589cf0090289f9fe1b4f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-776%2Fderrickstolee%2Fmaintenance%2FmacOS-v7
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-776/derrickstolee/maintenance/macOS-v7
Pull-Request: https://github.com/gitgitgadget/git/pull/776

Range-diff vs v6:

 1:  4807342b001 = 1:  4807342b001 maintenance: extract platform-specific scheduling
 2:  7cc70a8fe7b = 2:  7cc70a8fe7b maintenance: include 'cron' details in docs
 3:  cd015a5cbd7 ! 3:  3576c7aa54e maintenance: use launchctl on macOS
     @@ Commit message
          the XML format. This is useful for any system that might contain
          the tool, so use it whenever it is available.
      
     +    We strive to make these tests work on all platforms, but Windows caused
     +    some headaches. In particular, the value of getuid() called by the C
     +    code is not guaranteed to be the same as `$(id -u)` invoked by a test.
     +    This is because `git.exe` is a native Windows program, whereas the
     +    utility programs run by the test script mostly utilize the MSYS2 runtime,
     +    which emulates a POSIX-like environment. Since the purpose of the test
     +    is to check that the input to the hook is well-formed, the actual user
     +    ID is immaterial, thus we can work around the problem by making the the
     +    test UID-agnostic. Another subtle issue is the $HOME environment
     +    variable being a Windows-style path instead of a Unix-style path. We can
     +    be more flexible here instead of expecting exact path matches.
     +
     +    Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
          Co-authored-by: Eric Sunshine <sunshine@sunshineco.com>
          Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     @@ t/t7900-maintenance.sh: test_expect_success 'start preserves existing schedule'
       	grep "Important information!" cron.txt
       '
       
     -+test_expect_success !MINGW 'start and stop macOS maintenance' '
     -+	uid=$(id -u) &&
     ++test_expect_success 'start and stop macOS maintenance' '
     ++	# ensure $HOME can be compared against hook arguments on all platforms
     ++	pfx=$(cd "$HOME" && pwd) &&
      +
      +	write_script print-args <<-\EOF &&
     -+	echo $* >>args
     ++	echo $* | sed "s:gui/[0-9][0-9]*:gui/[UID]:" >>args
      +	EOF
      +
      +	rm -f args &&
     @@ t/t7900-maintenance.sh: test_expect_success 'start preserves existing schedule'
      +	EOF
      +	test_cmp expect actual &&
      +
     -+	rm expect &&
     ++	rm -f expect &&
      +	for frequency in hourly daily weekly
      +	do
     -+		PLIST="$HOME/Library/LaunchAgents/org.git-scm.git.$frequency.plist" &&
     ++		PLIST="$pfx/Library/LaunchAgents/org.git-scm.git.$frequency.plist" &&
      +		test_xmllint "$PLIST" &&
      +		grep schedule=$frequency "$PLIST" &&
     -+		echo "bootout gui/$uid $PLIST" >>expect &&
     -+		echo "bootstrap gui/$uid $PLIST" >>expect || return 1
     ++		echo "bootout gui/[UID] $PLIST" >>expect &&
     ++		echo "bootstrap gui/[UID] $PLIST" >>expect || return 1
      +	done &&
      +	test_cmp expect args &&
      +
     @@ t/t7900-maintenance.sh: test_expect_success 'start preserves existing schedule'
      +	# stop does not unregister the repo
      +	git config --get --global maintenance.repo "$(pwd)" &&
      +
     -+	printf "bootout gui/$uid $HOME/Library/LaunchAgents/org.git-scm.git.%s.plist\n" \
     ++	printf "bootout gui/[UID] $pfx/Library/LaunchAgents/org.git-scm.git.%s.plist\n" \
      +		hourly daily weekly >expect &&
      +	test_cmp expect args &&
      +	ls "$HOME/Library/LaunchAgents" >actual &&
 4:  6ad4a6b98c6 ! 4:  68f5013dee3 maintenance: use Windows scheduled tasks
     @@ Documentation/git-maintenance.txt: To create more advanced customizations to you
       Part of the linkgit:git[1] suite
      
       ## builtin/gc.c ##
     +@@ builtin/gc.c: static int launchctl_schedule_plist(const char *exec_path, enum schedule_priorit
     + 		die(_("failed to create directories for '%s'"), filename);
     + 	plist = xfopen(filename, "w");
     + 
     +-	preamble = "<?xml version=\"1.0\" encoding=\"US-ASCII\"?>\n"
     ++	preamble = "<?xml version=\"1.0\"?>\n"
     + 		   "<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">\n"
     + 		   "<plist version=\"1.0\">"
     + 		   "<dict>\n"
      @@ builtin/gc.c: static int launchctl_update_schedule(int run_maintenance, int fd, const char *cm
       		return launchctl_remove_plists(cmd);
       }
     @@ builtin/gc.c: static int launchctl_update_schedule(int run_maintenance, int fd,
      +	char *name = schtasks_task_name(frequency);
      +	struct strbuf tfilename = STRBUF_INIT;
      +
     -+	strbuf_addf(&tfilename, "schedule_%s_XXXXXX", frequency);
     ++	strbuf_addf(&tfilename, "%s/schedule_%s_XXXXXX",
     ++		    get_git_common_dir(), frequency);
      +	tfile = xmks_tempfile(tfilename.buf);
      +	strbuf_release(&tfilename);
      +
      +	if (!fdopen_tempfile(tfile, "w"))
      +		die(_("failed to create temp xml file"));
      +
     -+	xml = "<?xml version=\"1.0\" encoding=\"US-ASCII\"?>\n"
     ++	xml = "<?xml version=\"1.0\" ?>\n"
      +	      "<Task version=\"1.4\" xmlns=\"http://schemas.microsoft.com/windows/2004/02/mit/task\">\n"
      +	      "<Triggers>\n"
      +	      "<CalendarTrigger>\n";
     @@ builtin/gc.c: static int update_background_schedule(int enable)
       	else
      
       ## t/t7900-maintenance.sh ##
     -@@ t/t7900-maintenance.sh: test_expect_success !MINGW 'start and stop macOS maintenance' '
     +@@ t/t7900-maintenance.sh: test_expect_success 'start and stop macOS maintenance' '
       	test_line_count = 0 actual
       '
       
     @@ t/t7900-maintenance.sh: test_expect_success !MINGW 'start and stop macOS mainten
      +	EOF
      +
      +	rm -f args &&
     -+	GIT_TEST_MAINT_SCHEDULER="schtasks:./print-args" GIT_TRACE2_PERF=1 git maintenance start &&
     ++	GIT_TEST_MAINT_SCHEDULER="schtasks:./print-args" git maintenance start &&
      +
      +	# start registers the repo
      +	git config --get --global maintenance.repo "$(pwd)" &&
     @@ t/t7900-maintenance.sh: test_expect_success !MINGW 'start and stop macOS mainten
      +	for frequency in hourly daily weekly
      +	do
      +		grep "/create /tn Git Maintenance ($frequency) /f /xml" args &&
     -+		file=$(ls schedule_$frequency*.xml) &&
     -+		test_xmllint "$file" &&
     -+		grep "encoding=.US-ASCII." "$file" || return 1
     ++		file=$(ls .git/schedule_${frequency}*.xml) &&
     ++		test_xmllint "$file" || return 1
      +	done &&
      +
      +	rm -f args &&
     @@ t/t7900-maintenance.sh: test_expect_success !MINGW 'start and stop macOS mainten
      +	# stop does not unregister the repo
      +	git config --get --global maintenance.repo "$(pwd)" &&
      +
     -+	rm expect &&
      +	printf "/delete /tn Git Maintenance (%s) /f\n" \
      +		hourly daily weekly >expect &&
      +	test_cmp expect args

-- 
gitgitgadget


* [PATCH v6 0/4] Maintenance IV: Platform-specific background maintenance
  @ 2020-12-09 19:28  4% ` Derrick Stolee via GitGitGadget
  2021-01-05 13:08  3%   ` [PATCH v7 " Derrick Stolee via GitGitGadget
  0 siblings, 1 reply; 122+ results
From: Derrick Stolee via GitGitGadget @ 2020-12-09 19:28 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Derrick Stolee

This is based on ds/maintenance-part-3.

After sitting with the background maintenance as it has been cooking, I
wanted to come back around and implement the background maintenance for
Windows. However, I noticed that there were some things bothering me with
background maintenance on my macOS machine. These are detailed in PATCH 3,
but the tl;dr is that 'cron' is not recommended by Apple and instead
'launchd' satisfies our needs.

This series implements the background scheduling so git maintenance
(start|stop) works on those platforms. I've been operating with these
schedules for a while now without the problems described in the patches.

There is a particularly annoying case about console windows popping up on
Windows, but PATCH 4 describes a plan to get around that.


Update in V6
============

 * The Windows platform uses the tempfile API a bit better, including using
   the frequency in the filename to make the test simpler.
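
To illustrate the naming scheme only, here is a minimal standalone sketch
that puts the frequency ahead of the random suffix. It assumes POSIX
mkstemp(3) in place of Git's tempfile API; the helper names and the 'dir'
parameter are hypothetical and not part of the series.

```c
#define _POSIX_C_SOURCE 200809L /* for mkstemp(3) */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the series): create a temporary file
 * named "<dir>/schedule_<frequency>_XXXXXX", where mkstemp(3) replaces
 * the trailing X's with random characters.
 *
 * The final name is written into 'path' (capacity 'size'). Returns an
 * open file descriptor, or -1 if the name does not fit, the frequency
 * contains a '/', or mkstemp(3) fails.
 */
int make_schedule_tempfile(const char *dir, const char *frequency,
			   char *path, size_t size)
{
	int len;

	assert(dir && frequency && path);

	/* Sanity check added for this sketch: keep the name in 'dir'. */
	if (strchr(frequency, '/'))
		return -1;

	len = snprintf(path, size, "%s/schedule_%s_XXXXXX", dir, frequency);
	if (len < 0 || (size_t)len >= size)
		return -1;

	return mkstemp(path);
}

/* Close and delete a file created by make_schedule_tempfile(). */
void discard_schedule_tempfile(int fd, const char *path)
{
	close(fd);
	unlink(path);
}
```

Because the frequency is a fixed prefix of the name, a test can find the
file for a given frequency with a glob such as schedule_hourly* instead of
having to guess the random part.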

Thanks, -Stolee

cc: jrnieder@gmail.com
cc: jonathantanmy@google.com
cc: sluongng@gmail.com
cc: Đoàn Trần Công Danh <congdanhqx@gmail.com>
cc: Martin Ågren <martin.agren@gmail.com>
cc: Eric Sunshine <sunshine@sunshineco.com>
cc: Derrick Stolee <stolee@gmail.com>

Derrick Stolee (4):
  maintenance: extract platform-specific scheduling
  maintenance: include 'cron' details in docs
  maintenance: use launchctl on macOS
  maintenance: use Windows scheduled tasks

 Documentation/git-maintenance.txt | 116 ++++++++
 builtin/gc.c                      | 421 ++++++++++++++++++++++++++++--
 t/t7900-maintenance.sh            | 105 +++++++-
 t/test-lib.sh                     |   7 +-
 4 files changed, 615 insertions(+), 34 deletions(-)


base-commit: 0016b618182f642771dc589cf0090289f9fe1b4f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-776%2Fderrickstolee%2Fmaintenance%2FmacOS-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-776/derrickstolee/maintenance/macOS-v6
Pull-Request: https://github.com/gitgitgadget/git/pull/776

Range-diff vs v5:

 1:  4807342b001 = 1:  4807342b001 maintenance: extract platform-specific scheduling
 2:  7cc70a8fe7b = 2:  7cc70a8fe7b maintenance: include 'cron' details in docs
 3:  cd015a5cbd7 = 3:  cd015a5cbd7 maintenance: use launchctl on macOS
 4:  ac9a28bea39 ! 4:  6ad4a6b98c6 maintenance: use Windows scheduled tasks
     @@ Commit message
          by Git is valid when xmllint exists on the system.
      
          Since we use a temporary file for the XML files sent to 'schtasks', we
     -    must copy the file to a predictable filename. Use the number of lines in
     -    the 'args' file to provide a filename for xmllint. Instead of an exact
     -    match on the 'args' file, we 'grep' for the arguments other than the
     -    filename.
     +    prefix the random characters with the frequency so it is easier to
     +    examine the proper file during tests. Instead of an exact match on the
     +    'args' file, we 'grep' for the arguments other than the filename.
      
          There is a deficiency in the current design. Windows has two kinds of
          applications: GUI applications that start by "winmain()" and console
     @@ builtin/gc.c: static int launchctl_update_schedule(int run_maintenance, int fd,
      +	struct tempfile *tfile;
      +	const char *frequency = get_frequency(schedule);
      +	char *name = schtasks_task_name(frequency);
     ++	struct strbuf tfilename = STRBUF_INIT;
      +
     -+	tfile = xmks_tempfile("schedule_XXXXXX");
     -+	if (!tfile || !fdopen_tempfile(tfile, "w"))
     ++	strbuf_addf(&tfilename, "schedule_%s_XXXXXX", frequency);
     ++	tfile = xmks_tempfile(tfilename.buf);
     ++	strbuf_release(&tfilename);
     ++
     ++	if (!fdopen_tempfile(tfile, "w"))
      +		die(_("failed to create temp xml file"));
      +
      +	xml = "<?xml version=\"1.0\" encoding=\"US-ASCII\"?>\n"
     @@ builtin/gc.c: static int launchctl_update_schedule(int run_maintenance, int fd,
      +	      "</Task>\n";
      +	fprintf(tfile->fp, xml, exec_path, exec_path, frequency);
      +	strvec_split(&child.args, cmd);
     -+	strvec_pushl(&child.args, "/create", "/tn", name, "/f", "/xml", tfile->filename.buf, NULL);
     ++	strvec_pushl(&child.args, "/create", "/tn", name, "/f", "/xml",
     ++				  get_tempfile_path(tfile), NULL);
      +	close_tempfile_gently(tfile);
      +
      +	child.no_stdout = 1;
     @@ t/t7900-maintenance.sh: test_expect_success !MINGW 'start and stop macOS mainten
      +		*) shift ;;
      +		esac
      +	done
     -+	lines=$(wc -l args | awk "{print \$1;}")
     -+	test -z "$xmlfile" || cp "$xmlfile" "schedule-$lines.xml"
     ++	test -z "$xmlfile" || cp "$xmlfile" "$xmlfile.xml"
      +	EOF
      +
      +	rm -f args &&
     -+	GIT_TEST_MAINT_SCHEDULER="schtasks:./print-args" git maintenance start &&
     ++	GIT_TEST_MAINT_SCHEDULER="schtasks:./print-args" GIT_TRACE2_PERF=1 git maintenance start &&
      +
      +	# start registers the repo
      +	git config --get --global maintenance.repo "$(pwd)" &&
      +
      +	for frequency in hourly daily weekly
      +	do
     -+		grep "/create /tn Git Maintenance ($frequency) /f /xml" args \
     -+			|| return 1
     -+	done &&
     -+
     -+	for i in 1 2 3
     -+	do
     -+		test_xmllint "schedule-$i.xml" &&
     -+		grep "encoding=.US-ASCII." "schedule-$i.xml" || return 1
     ++		grep "/create /tn Git Maintenance ($frequency) /f /xml" args &&
     ++		file=$(ls schedule_$frequency*.xml) &&
     ++		test_xmllint "$file" &&
     ++		grep "encoding=.US-ASCII." "$file" || return 1
      +	done &&
      +
      +	rm -f args &&

-- 
gitgitgadget


* [PATCH v4 0/4] Maintenance IV: Platform-specific background maintenance
  2020-11-13 14:00  3% ` [PATCH v3 " Derrick Stolee via GitGitGadget
@ 2020-11-17 21:13  2%   ` Derrick Stolee via GitGitGadget
  0 siblings, 0 replies; 122+ results
From: Derrick Stolee via GitGitGadget @ 2020-11-17 21:13 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Derrick Stolee, Derrick Stolee

This is based on ds/maintenance-part-3.

After sitting with the background maintenance as it has been cooking, I
wanted to come back around and implement the background maintenance for
Windows. However, I noticed that there were some things bothering me with
background maintenance on my macOS machine. These are detailed in PATCH 3,
but the tl;dr is that 'cron' is not recommended by Apple and instead
'launchd' satisfies our needs.

This series implements the background scheduling so git maintenance
(start|stop) works on those platforms. I've been operating with these
schedules for a while now without the problems described in the patches.

There is a particularly annoying case about console windows popping up on
Windows, but PATCH 4 describes a plan to get around that.

Updates in V4
=============

 * Eric did an excellent job providing a patch that cleans up several parts
   of my series. The most impressive is his mechanism for testing the
   platform-specific Git logic in a way that is (mostly) platform-agnostic.

 * Windows doesn't have the 'id' command, so we cannot run the macOS
   platform test on Windows.
   
 * I noticed far too late that while my example XML files had been edited
   with UTF-8 encoding, Git is actually writing them as US-ASCII. Somehow 
   xmllint and launchd are not complaining, but schtasks does complain.
   Unfortunately, I cannot find a way to catch this problem other than to
   install my tip version on all three platforms and go through the entire 
   git maintenance start process, and double-check that the processes are
   running on the hour.
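
The platform-agnostic testing mechanism mentioned in the first item relies
on GIT_TEST_MAINT_SCHEDULER being parsed as "<scheduler>:<command>". A
minimal standalone sketch of that split follows, assuming plain libc
instead of Git's helpers. The name split_scheduler_spec is hypothetical
and is not a function from the series.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the series): split a
 * "<scheduler>:<command>" string at the first ':'.
 *
 * On success, returns a malloc'd copy of 'spec' in which the ':' has
 * been replaced by NUL; '*scheduler' points at the start of that copy
 * and '*cmd' points just past the separator. The caller frees the
 * returned buffer once both strings are no longer needed. Returns NULL
 * if 'spec' contains no ':' or if the allocation fails.
 */
char *split_scheduler_spec(const char *spec,
			   const char **scheduler,
			   const char **cmd)
{
	char *copy, *sep;
	size_t len;

	assert(spec && scheduler && cmd);

	len = strlen(spec) + 1;
	copy = malloc(len);
	if (!copy)
		return NULL;
	memcpy(copy, spec, len);

	sep = strchr(copy, ':');
	if (!sep) {
		free(copy);
		return NULL;
	}

	*sep = '\0';
	*scheduler = copy;
	*cmd = sep + 1;
	return copy;
}
```

Given "crontab:test-tool crontab cron.txt", this yields the scheduler
"crontab" and the command "test-tool crontab cron.txt". Only the first ':'
is treated as the separator, so the command itself may contain colons.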
   
   

Here is a diff from the tip of v3 + Eric's patch to the tip of v4:

diff --git a/builtin/gc.c b/builtin/gc.c
index 955d4b3baf..1a3725429c 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1642,13 +1642,13 @@ static int launchctl_schedule_plist(const char *exec_path, enum schedule_priorit
         break;
     }
     fprintf(plist, "</array>\n</dict>\n</plist>\n");
+    fclose(plist);

     /* bootout might fail if not already running, so ignore */
     launchctl_boot_plist(0, filename, cmd);
     if (launchctl_boot_plist(1, filename, cmd))
         die(_("failed to bootstrap service %s"), filename);

-    fclose(plist);
     free(filename);
     free(name);
     return 0;
@@ -1707,25 +1707,27 @@ static int schtasks_schedule_task(const char *exec_path, enum schedule_priority
     int result;
     struct child_process child = CHILD_PROCESS_INIT;
     const char *xml;
-    char *xmlpath, *tempDir;
-    FILE *xmlfp;
+    char *xmlpath;
+    struct tempfile *tfile;
     const char *frequency = get_frequency(schedule);
     char *name = schtasks_task_name(frequency);

-    tempDir = xstrfmt("%s/temp", the_repository->objects->odb->path);
-    xmlpath =  xstrfmt("%s/schedule-%s.xml", tempDir, frequency);
-    safe_create_leading_directories(xmlpath);
-    xmlfp = xfopen(xmlpath, "w");
+    xmlpath =  xstrfmt("%s/schedule-%s.xml",
+               the_repository->objects->odb->path,
+               frequency);
+    tfile = create_tempfile(xmlpath);
+    if (!tfile || !fdopen_tempfile(tfile, "w"))
+        die(_("failed to create '%s'"), xmlpath);

-    xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
+    xml = "<?xml version=\"1.0\" encoding=\"US-ASCII\"?>\n"
           "<Task version=\"1.4\" xmlns=\"http://schemas.microsoft.com/windows/2004/02/mit/task\">\n"
           "<Triggers>\n"
           "<CalendarTrigger>\n";
-    fputs(xml, xmlfp);
+    fputs(xml, tfile->fp);

     switch (schedule) {
     case SCHEDULE_HOURLY:
-        fprintf(xmlfp,
+        fprintf(tfile->fp,
             "<StartBoundary>2020-01-01T01:00:00</StartBoundary>\n"
             "<Enabled>true</Enabled>\n"
             "<ScheduleByDay>\n"
@@ -1739,7 +1741,7 @@ static int schtasks_schedule_task(const char *exec_path, enum schedule_priority
         break;

     case SCHEDULE_DAILY:
-        fprintf(xmlfp,
+        fprintf(tfile->fp,
             "<StartBoundary>2020-01-01T00:00:00</StartBoundary>\n"
             "<Enabled>true</Enabled>\n"
             "<ScheduleByWeek>\n"
@@ -1756,7 +1758,7 @@ static int schtasks_schedule_task(const char *exec_path, enum schedule_priority
         break;

     case SCHEDULE_WEEKLY:
-        fprintf(xmlfp,
+        fprintf(tfile->fp,
             "<StartBoundary>2020-01-01T00:00:00</StartBoundary>\n"
             "<Enabled>true</Enabled>\n"
             "<ScheduleByWeek>\n"
@@ -1771,7 +1773,7 @@ static int schtasks_schedule_task(const char *exec_path, enum schedule_priority
         break;
     }

-    xml=  "</CalendarTrigger>\n"
+    xml = "</CalendarTrigger>\n"
           "</Triggers>\n"
           "<Principals>\n"
           "<Principal id=\"Author\">\n"
@@ -1795,11 +1797,10 @@ static int schtasks_schedule_task(const char *exec_path, enum schedule_priority
           "</Exec>\n"
           "</Actions>\n"
           "</Task>\n";
-    fprintf(xmlfp, xml, exec_path, exec_path, frequency);
-    fclose(xmlfp);
-
+    fprintf(tfile->fp, xml, exec_path, exec_path, frequency);
     strvec_split(&child.args, cmd);
     strvec_pushl(&child.args, "/create", "/tn", name, "/f", "/xml", xmlpath, NULL);
+    close_tempfile_gently(tfile);

     child.no_stdout = 1;
     child.no_stderr = 1;
@@ -1808,8 +1809,7 @@ static int schtasks_schedule_task(const char *exec_path, enum schedule_priority
         die(_("failed to start schtasks"));
     result = finish_command(&child);

-    unlink(xmlpath);
-    rmdir(tempDir);
+    delete_tempfile(&tfile);
     free(xmlpath);
     free(name);
     return result;
@@ -1850,9 +1850,8 @@ static int crontab_update_schedule(int run_maintenance, int fd, const char *cmd)
     crontab_list.out = dup(fd);
     crontab_list.git_cmd = 0;

-    if (start_command(&crontab_list)) {
+    if (start_command(&crontab_list))
         return error(_("failed to run 'crontab -l'; your system might not support 'cron'"));
-    }

     /* Ignore exit code, as an empty crontab will return error. */
     finish_command(&crontab_list);
@@ -1868,9 +1867,8 @@ static int crontab_update_schedule(int run_maintenance, int fd, const char *cmd)
     crontab_edit.in = -1;
     crontab_edit.git_cmd = 0;

-    if (start_command(&crontab_edit)) {
+    if (start_command(&crontab_edit))
         return error(_("failed to run 'crontab'; your system might not support 'cron'"));
-    }

     cron_in = fdopen(crontab_edit.in, "w");
     if (!cron_in) {
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index e92946c10a..a26ff22541 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -408,7 +408,7 @@ test_expect_success 'start preserves existing schedule' '
     grep "Important information!" cron.txt
 '

-test_expect_success 'start and stop macOS maintenance' '
+test_expect_success !MINGW 'start and stop macOS maintenance' '
     uid=$(id -u) &&

     write_script print-args <<-\EOF &&
@@ -421,7 +421,6 @@ test_expect_success 'start and stop macOS maintenance' '
     # start registers the repo
     git config --get --global maintenance.repo "$(pwd)" &&

-    # ~/Library/LaunchAgents
     ls "$HOME/Library/LaunchAgents" >actual &&
     cat >expect <<-\EOF &&
     org.git-scm.git.daily.plist
@@ -468,12 +467,12 @@ test_expect_success 'start and stop Windows maintenance' '
     EOF

     rm -f args &&
-    GIT_TEST_MAINT_SCHEDULER="schtasks:/bin/sh print-args" git maintenance start &&
+    GIT_TEST_MAINT_SCHEDULER="schtasks:./print-args" git maintenance start &&

     # start registers the repo
     git config --get --global maintenance.repo "$(pwd)" &&

-    printf "/create /tn Git Maintenance (%s) /f /xml .git/objects/temp/schedule-%s.xml\n" \
+    printf "/create /tn Git Maintenance (%s) /f /xml .git/objects/schedule-%s.xml\n" \
         hourly hourly daily daily weekly weekly >expect &&
     test_cmp expect args &&

@@ -483,7 +482,7 @@ test_expect_success 'start and stop Windows maintenance' '
     done &&

     rm -f args &&
-    GIT_TEST_MAINT_SCHEDULER="schtasks:/bin/sh print-args" git maintenance stop &&
+    GIT_TEST_MAINT_SCHEDULER="schtasks:./print-args" git maintenance stop &&

     # stop does not unregister the repo
     git config --get --global maintenance.repo "$(pwd)" &&
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 4a60d1ed76..ddbeee1f5e 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1704,7 +1704,8 @@ test_lazy_prereq REBASE_P '
 '

 # Ensure that no test accidentally triggers a Git command
-# that runs 'crontab', affecting a user's cron schedule.
-# Tests that verify the cron integration must set this locally
+# that runs the actual maintenance scheduler, affecting a user's
+# system permanently.
+# Tests that verify the scheduler integration must set this locally
 # to avoid errors.
-GIT_TEST_CRONTAB="exit 1"
+GIT_TEST_MAINT_SCHEDULER="none:exit 1"

Thanks, -Stolee

cc: jrnieder@gmail.com
cc: jonathantanmy@google.com
cc: sluongng@gmail.com
cc: Derrick Stolee <stolee@gmail.com>
cc: Đoàn Trần Công Danh <congdanhqx@gmail.com>
cc: Martin Ågren <martin.agren@gmail.com>
cc: Eric Sunshine <sunshine@sunshineco.com>

Derrick Stolee (4):
  maintenance: extract platform-specific scheduling
  maintenance: include 'cron' details in docs
  maintenance: use launchctl on macOS
  maintenance: use Windows scheduled tasks

 Documentation/git-maintenance.txt | 116 ++++++++
 builtin/gc.c                      | 421 ++++++++++++++++++++++++++++--
 t/t7900-maintenance.sh            | 106 +++++++-
 t/test-lib.sh                     |   7 +-
 4 files changed, 616 insertions(+), 34 deletions(-)


base-commit: 0016b618182f642771dc589cf0090289f9fe1b4f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-776%2Fderrickstolee%2Fmaintenance%2FmacOS-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-776/derrickstolee/maintenance/macOS-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/776

Range-diff vs v3:

 1:  d35f1aa162 ! 1:  4807342b00 maintenance: extract platform-specific scheduling
     @@ Commit message
          swapped at compile time with new implementations on specialized
          platforms.
      
     +    As we add this generality, rename GIT_TEST_CRONTAB to
     +    GIT_TEST_MAINT_SCHEDULER. Further, this variable is now parsed as
     +    "<scheduler>:<command>" so we can test platform-specific scheduling
     +    logic even when not on the correct platform. By specifying the
     +    <scheduler> in this string, we will be able to test all three sets of
     +    Git logic from a Linux machine.
     +
     +    Co-authored-by: Eric Sunshine <sunshine@sunshineco.com>
     +    Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## builtin/gc.c ##
     @@ builtin/gc.c: static int maintenance_unregister(void)
       #define END_LINE "# END GIT MAINTENANCE SCHEDULE"
       
      -static int update_background_schedule(int run_maintenance)
     -+static int platform_update_schedule(int run_maintenance, int fd)
     ++static int crontab_update_schedule(int run_maintenance, int fd, const char *cmd)
       {
       	int result = 0;
       	int in_old_region = 0;
     -@@ builtin/gc.c: static int update_background_schedule(int run_maintenance)
     + 	struct child_process crontab_list = CHILD_PROCESS_INIT;
     + 	struct child_process crontab_edit = CHILD_PROCESS_INIT;
       	FILE *cron_list, *cron_in;
     - 	const char *crontab_name;
     +-	const char *crontab_name;
       	struct strbuf line = STRBUF_INIT;
      -	struct lock_file lk;
      -	char *lock_path = xstrfmt("%s/schedule", the_repository->objects->odb->path);
     --
     + 
      -	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0)
      -		return error(_("another process is scheduling background maintenance"));
     - 
     - 	crontab_name = getenv("GIT_TEST_CRONTAB");
     - 	if (!crontab_name)
     -@@ builtin/gc.c: static int update_background_schedule(int run_maintenance)
     - 	strvec_split(&crontab_list.args, crontab_name);
     +-
     +-	crontab_name = getenv("GIT_TEST_CRONTAB");
     +-	if (!crontab_name)
     +-		crontab_name = "crontab";
     +-
     +-	strvec_split(&crontab_list.args, crontab_name);
     ++	strvec_split(&crontab_list.args, cmd);
       	strvec_push(&crontab_list.args, "-l");
       	crontab_list.in = -1;
      -	crontab_list.out = dup(lk.tempfile->fd);
      +	crontab_list.out = dup(fd);
       	crontab_list.git_cmd = 0;
       
     - 	if (start_command(&crontab_list)) {
     +-	if (start_command(&crontab_list)) {
      -		result = error(_("failed to run 'crontab -l'; your system might not support 'cron'"));
      -		goto cleanup;
     +-	}
     ++	if (start_command(&crontab_list))
      +		return error(_("failed to run 'crontab -l'; your system might not support 'cron'"));
     - 	}
       
       	/* Ignore exit code, as an empty crontab will return error. */
     + 	finish_command(&crontab_list);
      @@ builtin/gc.c: static int update_background_schedule(int run_maintenance)
       	 * Read from the .lock file, filtering out the old
       	 * schedule while appending the new schedule.
     @@ builtin/gc.c: static int update_background_schedule(int run_maintenance)
      +	cron_list = fdopen(fd, "r");
       	rewind(cron_list);
       
     - 	strvec_split(&crontab_edit.args, crontab_name);
     -@@ builtin/gc.c: static int update_background_schedule(int run_maintenance)
     +-	strvec_split(&crontab_edit.args, crontab_name);
     ++	strvec_split(&crontab_edit.args, cmd);
     + 	crontab_edit.in = -1;
       	crontab_edit.git_cmd = 0;
       
     - 	if (start_command(&crontab_edit)) {
     +-	if (start_command(&crontab_edit)) {
      -		result = error(_("failed to run 'crontab'; your system might not support 'cron'"));
      -		goto cleanup;
     +-	}
     ++	if (start_command(&crontab_edit))
      +		return error(_("failed to run 'crontab'; your system might not support 'cron'"));
     - 	}
       
       	cron_in = fdopen(crontab_edit.in, "w");
     + 	if (!cron_in) {
      @@ builtin/gc.c: static int update_background_schedule(int run_maintenance)
       	close(crontab_edit.in);
       
     @@ builtin/gc.c: static int update_background_schedule(int run_maintenance)
      +	if (finish_command(&crontab_edit))
       		result = error(_("'crontab' died"));
      -		goto cleanup;
     --	}
     --	fclose(cron_list);
      +	else
      +		fclose(cron_list);
      +	return result;
      +}
      +
     -+static int update_background_schedule(int run_maintenance)
     ++static const char platform_scheduler[] = "crontab";
     ++
     ++static int update_background_schedule(int enable)
      +{
      +	int result;
     ++	const char *scheduler = platform_scheduler;
     ++	const char *cmd = scheduler;
     ++	char *testing;
      +	struct lock_file lk;
      +	char *lock_path = xstrfmt("%s/schedule", the_repository->objects->odb->path);
      +
     ++	testing = xstrdup_or_null(getenv("GIT_TEST_MAINT_SCHEDULER"));
     ++	if (testing) {
     ++		char *sep = strchr(testing, ':');
     ++		if (!sep)
     ++			die("GIT_TEST_MAINT_SCHEDULER unparseable: %s", testing);
     ++		*sep = '\0';
     ++		scheduler = testing;
     ++		cmd = sep + 1;
     + 	}
     +-	fclose(cron_list);
     + 
     +-cleanup:
      +	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0)
      +		return error(_("another process is scheduling background maintenance"));
      +
     -+	result = platform_update_schedule(run_maintenance, lk.tempfile->fd);
     - 
     --cleanup:
     ++	if (!strcmp(scheduler, "crontab"))
     ++		result = crontab_update_schedule(enable, lk.tempfile->fd, cmd);
     ++	else
     ++		die("unknown background scheduler: %s", scheduler);
     ++
       	rollback_lock_file(&lk);
     ++	free(testing);
       	return result;
       }
     + 
     +
     + ## t/t7900-maintenance.sh ##
     +@@ t/t7900-maintenance.sh: test_expect_success 'register and unregister' '
     + '
     + 
     + test_expect_success 'start from empty cron table' '
     +-	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance start &&
     ++	GIT_TEST_MAINT_SCHEDULER="crontab:test-tool crontab cron.txt" git maintenance start &&
     + 
     + 	# start registers the repo
     + 	git config --get --global maintenance.repo "$(pwd)" &&
     +@@ t/t7900-maintenance.sh: test_expect_success 'start from empty cron table' '
     + '
     + 
     + test_expect_success 'stop from existing schedule' '
     +-	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance stop &&
     ++	GIT_TEST_MAINT_SCHEDULER="crontab:test-tool crontab cron.txt" git maintenance stop &&
     + 
     + 	# stop does not unregister the repo
     + 	git config --get --global maintenance.repo "$(pwd)" &&
     + 
     + 	# Operation is idempotent
     +-	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance stop &&
     ++	GIT_TEST_MAINT_SCHEDULER="crontab:test-tool crontab cron.txt" git maintenance stop &&
     + 	test_must_be_empty cron.txt
     + '
     + 
     + test_expect_success 'start preserves existing schedule' '
     + 	echo "Important information!" >cron.txt &&
     +-	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance start &&
     ++	GIT_TEST_MAINT_SCHEDULER="crontab:test-tool crontab cron.txt" git maintenance start &&
     + 	grep "Important information!" cron.txt
     + '
     + 
     +
     + ## t/test-lib.sh ##
     +@@ t/test-lib.sh: test_lazy_prereq REBASE_P '
     + '
     + 
     + # Ensure that no test accidentally triggers a Git command
     +-# that runs 'crontab', affecting a user's cron schedule.
     +-# Tests that verify the cron integration must set this locally
     ++# that runs the actual maintenance scheduler, affecting a user's
     ++# system permanently.
     ++# Tests that verify the scheduler integration must set this locally
     + # to avoid errors.
     +-GIT_TEST_CRONTAB="exit 1"
     ++GIT_TEST_MAINT_SCHEDULER="none:exit 1"
 2:  0dfe53092e ! 2:  99170df462 maintenance: include 'cron' details in docs
     @@ Documentation/git-maintenance.txt: Further, the `git gc` command should not be c
      +---------------------------------------
      +
      +The standard mechanism for scheduling background tasks on POSIX systems
     -+is cron(8). This tool executes commands based on a given schedule. The
     ++is `cron`. This tool executes commands based on a given schedule. The
      +current list of user-scheduled tasks can be found by running `crontab -l`.
      +The schedule written by `git maintenance start` is similar to this:
      +
     @@ Documentation/git-maintenance.txt: Further, the `git gc` command should not be c
      +Any modifications within this region will be completely deleted by
      +`git maintenance stop` or overwritten by `git maintenance start`.
      +
     -+The `crontab` entry specifies the full path of the `git` executable to
     -+ensure that the executed `git` command is the same one with which
     -+`git maintenance start` was issued independent of `PATH`. If the same user
     -+runs `git maintenance start` with multiple Git executables, then only the
     -+latest executable is used.
     ++The `<path>` string is loaded to specifically use the location for the
     ++`git` executable used in the `git maintenance start` command. This allows
     ++for multiple versions to be compatible. However, if the same user runs
     ++`git maintenance start` with multiple Git executables, then only the
     ++latest executable will be used.
      +
      +These commands use `git for-each-repo --config=maintenance.repo` to run
      +`git maintenance run --schedule=<frequency>` on each repository listed in
      +the multi-valued `maintenance.repo` config option. These are typically
     -+loaded from the user-specific global config. The `git maintenance` process
     -+then determines which maintenance tasks are configured to run on each
     -+repository with each `<frequency>` using the `maintenance.<task>.schedule`
     -+config options. These values are loaded from the global or repository
     -+config values.
     ++loaded from the user-specific global config located at `~/.gitconfig`.
     ++The `git maintenance` process then determines which maintenance tasks
     ++are configured to run on each repository with each `<frequency>` using
     ++the `maintenance.<task>.schedule` config options. These values are loaded
     ++from the global or repository config values.
      +
      +If the config values are insufficient to achieve your desired background
      +maintenance schedule, then you can create your own schedule. If you run
      +`crontab -e`, then an editor will load with your user-specific `cron`
      +schedule. In that editor, you can add your own schedule lines. You could
      +start by adapting the default schedule listed earlier, or you could read
     -+the crontab(5) documentation for advanced scheduling techniques. Please
     -+do use the full path and `--exec-path` techniques from the default
     -+schedule to ensure you are executing the correct binaries in your
     -+schedule.
     ++https://man7.org/linux/man-pages/man5/crontab.5.html[the `crontab` documentation]
     ++for advanced scheduling techniques. Please do use the full path and
     ++`--exec-path` techniques from the default schedule to ensure you are
     ++executing the correct binaries in your schedule.
      +
       
       GIT
 3:  1629bcfcf8 ! 3:  ed0a0011fb maintenance: use launchctl on macOS
     @@ Commit message
          plist file. We also need to 'bootout' a task before the 'bootstrap'
          subcommand will succeed, if such a task already exists.
      
     +    The need for a user id requires us to run 'id -u' which works on
     +    POSIX systems but not Windows. The test therefore has a prerequisite
     +    that we are not on Windows. The cross-platform logic still allows us to
     +    test the macOS logic on a Linux machine.
     +
          We can verify the commands that were run by 'git maintenance start'
          and 'git maintenance stop' by injecting a script that writes the
     -    command-line arguments into GIT_TEST_CRONTAB.
     +    command-line arguments into GIT_TEST_MAINT_SCHEDULER.
      
          An earlier version of this patch accidentally had an opening
          "<dict>" tag when it should have had a closing "</dict>" tag. This
          was caught during manual testing with actual 'launchctl' commands,
          but we do not want to update developers' tasks when running tests.
          It appears that macOS includes the "xmllint" tool which can verify
     -    the XML format, so call it from the macOS-specific tests to ensure
     -    the .plist files are well-formatted.
     +    the XML format. This is useful for any system that might contain
     +    the tool, so use it whenever it is available.
      
     -    Helped-by: Eric Sunshine <sunshine@sunshineco.com>
     +    Co-authored-by: Eric Sunshine <sunshine@sunshineco.com>
     +    Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/git-maintenance.txt ##
     -@@ Documentation/git-maintenance.txt: schedule to ensure you are executing the correct binaries in your
     - schedule.
     +@@ Documentation/git-maintenance.txt: for advanced scheduling techniques. Please do use the full path and
     + executing the correct binaries in your schedule.
       
       
      +BACKGROUND MAINTENANCE ON MACOS SYSTEMS
     @@ builtin/gc.c: static int maintenance_unregister(void)
       	return run_command(&config_unset);
       }
       
     -+#if defined(__APPLE__)
     ++static const char *get_frequency(enum schedule_priority schedule)
     ++{
     ++	switch (schedule) {
     ++	case SCHEDULE_HOURLY:
     ++		return "hourly";
     ++	case SCHEDULE_DAILY:
     ++		return "daily";
     ++	case SCHEDULE_WEEKLY:
     ++		return "weekly";
     ++	default:
     ++		BUG("invalid schedule %d", schedule);
     ++	}
     ++}
      +
     -+static char *get_service_name(const char *frequency)
     ++static char *launchctl_service_name(const char *frequency)
      +{
      +	struct strbuf label = STRBUF_INIT;
      +	strbuf_addf(&label, "org.git-scm.git.%s", frequency);
      +	return strbuf_detach(&label, NULL);
      +}
      +
     -+static char *get_service_filename(const char *name)
     ++static char *launchctl_service_filename(const char *name)
      +{
      +	char *expanded;
      +	struct strbuf filename = STRBUF_INIT;
     @@ builtin/gc.c: static int maintenance_unregister(void)
      +	return expanded;
      +}
      +
     -+static const char *get_frequency(enum schedule_priority schedule)
     ++static char *launchctl_get_uid(void)
      +{
     -+	switch (schedule) {
     -+	case SCHEDULE_HOURLY:
     -+		return "hourly";
     -+	case SCHEDULE_DAILY:
     -+		return "daily";
     -+	case SCHEDULE_WEEKLY:
     -+		return "weekly";
     -+	default:
     -+		BUG("invalid schedule %d", schedule);
     -+	}
     ++	return xstrfmt("gui/%d", getuid());
      +}
      +
     -+static char *get_uid(void)
     -+{
     -+	struct strbuf output = STRBUF_INIT;
     -+	struct child_process id = CHILD_PROCESS_INIT;
     -+
     -+	strvec_pushl(&id.args, "/usr/bin/id", "-u", NULL);
     -+	if (capture_command(&id, &output, 0))
     -+		die(_("failed to discover user id"));
     -+
     -+	strbuf_trim_trailing_newline(&output);
     -+	return strbuf_detach(&output, NULL);
     -+}
     -+
     -+static int boot_plist(int enable, const char *filename)
     ++static int launchctl_boot_plist(int enable, const char *filename, const char *cmd)
      +{
      +	int result;
      +	struct child_process child = CHILD_PROCESS_INIT;
     -+	char *uid = get_uid();
     -+	const char *launchctl = getenv("GIT_TEST_CRONTAB");
     -+	if (!launchctl)
     -+		launchctl = "/bin/launchctl";
     -+
     -+	strvec_split(&child.args, launchctl);
     ++	char *uid = launchctl_get_uid();
      +
     ++	strvec_split(&child.args, cmd);
      +	if (enable)
      +		strvec_push(&child.args, "bootstrap");
      +	else
      +		strvec_push(&child.args, "bootout");
     -+	strvec_pushf(&child.args, "gui/%s", uid);
     ++	strvec_push(&child.args, uid);
      +	strvec_push(&child.args, filename);
      +
      +	child.no_stderr = 1;
     @@ builtin/gc.c: static int maintenance_unregister(void)
      +	return result;
      +}
      +
     -+static int remove_plist(enum schedule_priority schedule)
     ++static int launchctl_remove_plist(enum schedule_priority schedule, const char *cmd)
      +{
      +	const char *frequency = get_frequency(schedule);
     -+	char *name = get_service_name(frequency);
     -+	char *filename = get_service_filename(name);
     -+	int result = boot_plist(0, filename);
     ++	char *name = launchctl_service_name(frequency);
     ++	char *filename = launchctl_service_filename(name);
     ++	int result = launchctl_boot_plist(0, filename, cmd);
      +	unlink(filename);
      +	free(filename);
      +	free(name);
      +	return result;
      +}
      +
     -+static int remove_plists(void)
     ++static int launchctl_remove_plists(const char *cmd)
      +{
     -+	return remove_plist(SCHEDULE_HOURLY) ||
     -+		remove_plist(SCHEDULE_DAILY) ||
     -+		remove_plist(SCHEDULE_WEEKLY);
     ++	return launchctl_remove_plist(SCHEDULE_HOURLY, cmd) ||
     ++		launchctl_remove_plist(SCHEDULE_DAILY, cmd) ||
     ++		launchctl_remove_plist(SCHEDULE_WEEKLY, cmd);
      +}
      +
     -+static int schedule_plist(const char *exec_path, enum schedule_priority schedule)
     ++static int launchctl_schedule_plist(const char *exec_path, enum schedule_priority schedule, const char *cmd)
      +{
      +	FILE *plist;
      +	int i;
      +	const char *preamble, *repeat;
      +	const char *frequency = get_frequency(schedule);
     -+	char *name = get_service_name(frequency);
     -+	char *filename = get_service_filename(name);
     ++	char *name = launchctl_service_name(frequency);
     ++	char *filename = launchctl_service_filename(name);
      +
      +	if (safe_create_leading_directories(filename))
      +		die(_("failed to create directories for '%s'"), filename);
      +	plist = xfopen(filename, "w");
      +
     -+	preamble = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
     ++	preamble = "<?xml version=\"1.0\" encoding=\"US-ASCII\"?>\n"
      +		   "<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">\n"
      +		   "<plist version=\"1.0\">"
      +		   "<dict>\n"
     @@ builtin/gc.c: static int maintenance_unregister(void)
      +		break;
      +	}
      +	fprintf(plist, "</array>\n</dict>\n</plist>\n");
     ++	fclose(plist);
      +
      +	/* bootout might fail if not already running, so ignore */
     -+	boot_plist(0, filename);
     -+	if (boot_plist(1, filename))
     ++	launchctl_boot_plist(0, filename, cmd);
     ++	if (launchctl_boot_plist(1, filename, cmd))
      +		die(_("failed to bootstrap service %s"), filename);
      +
     -+	fclose(plist);
      +	free(filename);
      +	free(name);
      +	return 0;
      +}
      +
     -+static int add_plists(void)
     ++static int launchctl_add_plists(const char *cmd)
      +{
      +	const char *exec_path = git_exec_path();
      +
     -+	return schedule_plist(exec_path, SCHEDULE_HOURLY) ||
     -+		schedule_plist(exec_path, SCHEDULE_DAILY) ||
     -+		schedule_plist(exec_path, SCHEDULE_WEEKLY);
     ++	return launchctl_schedule_plist(exec_path, SCHEDULE_HOURLY, cmd) ||
     ++		launchctl_schedule_plist(exec_path, SCHEDULE_DAILY, cmd) ||
     ++		launchctl_schedule_plist(exec_path, SCHEDULE_WEEKLY, cmd);
      +}
      +
     -+static int platform_update_schedule(int run_maintenance, int fd)
     ++static int launchctl_update_schedule(int run_maintenance, int fd, const char *cmd)
      +{
      +	if (run_maintenance)
     -+		return add_plists();
     ++		return launchctl_add_plists(cmd);
      +	else
     -+		return remove_plists();
     ++		return launchctl_remove_plists(cmd);
      +}
     -+#else
     ++
       #define BEGIN_LINE "# BEGIN GIT MAINTENANCE SCHEDULE"
       #define END_LINE "# END GIT MAINTENANCE SCHEDULE"
       
     -@@ builtin/gc.c: static int platform_update_schedule(int run_maintenance, int fd)
     - 		fclose(cron_list);
     +@@ builtin/gc.c: static int crontab_update_schedule(int run_maintenance, int fd, const char *cmd)
       	return result;
       }
     + 
     ++#if defined(__APPLE__)
     ++static const char platform_scheduler[] = "launchctl";
     ++#else
     + static const char platform_scheduler[] = "crontab";
      +#endif
       
     - static int update_background_schedule(int run_maintenance)
     + static int update_background_schedule(int enable)
       {
     +@@ builtin/gc.c: static int update_background_schedule(int enable)
     + 	if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0)
     + 		return error(_("another process is scheduling background maintenance"));
     + 
     +-	if (!strcmp(scheduler, "crontab"))
     ++	if (!strcmp(scheduler, "launchctl"))
     ++		result = launchctl_update_schedule(enable, lk.tempfile->fd, cmd);
     ++	else if (!strcmp(scheduler, "crontab"))
     + 		result = crontab_update_schedule(enable, lk.tempfile->fd, cmd);
     + 	else
     + 		die("unknown background scheduler: %s", scheduler);
      
       ## t/t7900-maintenance.sh ##
     -@@ t/t7900-maintenance.sh: test_expect_success 'register and unregister' '
     - 	test_cmp before actual
     - '
     - 
     --test_expect_success 'start from empty cron table' '
     -+test_expect_success !MACOS_MAINTENANCE 'start from empty cron table' '
     - 	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance start &&
     - 
     - 	# start registers the repo
     -@@ t/t7900-maintenance.sh: test_expect_success 'start from empty cron table' '
     - 	grep "for-each-repo --config=maintenance.repo maintenance run --schedule=weekly" cron.txt
     - '
     - 
     --test_expect_success 'stop from existing schedule' '
     -+test_expect_success !MACOS_MAINTENANCE 'stop from existing schedule' '
     - 	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance stop &&
     +@@ t/t7900-maintenance.sh: test_description='git maintenance builtin'
     + GIT_TEST_COMMIT_GRAPH=0
     + GIT_TEST_MULTI_PACK_INDEX=0
       
     - 	# stop does not unregister the repo
     -@@ t/t7900-maintenance.sh: test_expect_success 'stop from existing schedule' '
     - 	test_must_be_empty cron.txt
     - '
     - 
     --test_expect_success 'start preserves existing schedule' '
     -+test_expect_success !MACOS_MAINTENANCE 'start preserves existing schedule' '
     - 	echo "Important information!" >cron.txt &&
     - 	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance start &&
     ++test_lazy_prereq XMLLINT '
     ++	xmllint --version
     ++'
     ++
     ++test_xmllint () {
     ++	if test_have_prereq XMLLINT
     ++	then
     ++		xmllint --noout "$@"
     ++	else
     ++		true
     ++	fi
     ++}
     ++
     + test_expect_success 'help text' '
     + 	test_expect_code 129 git maintenance -h 2>err &&
     + 	test_i18ngrep "usage: git maintenance <subcommand>" err &&
     +@@ t/t7900-maintenance.sh: test_expect_success 'start preserves existing schedule' '
       	grep "Important information!" cron.txt
       '
       
     -+test_expect_success MACOS_MAINTENANCE 'start and stop macOS maintenance' '
     -+	write_script print-args "#!/bin/sh\necho \$* >>args" &&
     ++test_expect_success !MINGW 'start and stop macOS maintenance' '
     ++	uid=$(id -u) &&
     ++
     ++	write_script print-args <<-\EOF &&
     ++	echo $* >>args
     ++	EOF
      +
      +	rm -f args &&
     -+	GIT_TEST_CRONTAB="./print-args" git maintenance start &&
     ++	GIT_TEST_MAINT_SCHEDULER=launchctl:./print-args git maintenance start &&
      +
      +	# start registers the repo
      +	git config --get --global maintenance.repo "$(pwd)" &&
      +
     -+	# ~/Library/LaunchAgents
      +	ls "$HOME/Library/LaunchAgents" >actual &&
      +	cat >expect <<-\EOF &&
      +	org.git-scm.git.daily.plist
     @@ t/t7900-maintenance.sh: test_expect_success 'stop from existing schedule' '
      +	for frequency in hourly daily weekly
      +	do
      +		PLIST="$HOME/Library/LaunchAgents/org.git-scm.git.$frequency.plist" &&
     -+		xmllint --noout "$PLIST" &&
     ++		test_xmllint "$PLIST" &&
      +		grep schedule=$frequency "$PLIST" &&
     -+		echo "bootout gui/$UID $PLIST" >>expect &&
     -+		echo "bootstrap gui/$UID $PLIST" >>expect || return 1
     ++		echo "bootout gui/$uid $PLIST" >>expect &&
     ++		echo "bootstrap gui/$uid $PLIST" >>expect || return 1
      +	done &&
      +	test_cmp expect args &&
      +
      +	rm -f args &&
     -+	GIT_TEST_CRONTAB="./print-args" git maintenance stop &&
     ++	GIT_TEST_MAINT_SCHEDULER=launchctl:./print-args git maintenance stop &&
      +
      +	# stop does not unregister the repo
      +	git config --get --global maintenance.repo "$(pwd)" &&
      +
     -+	printf "bootout gui/$UID $HOME/Library/LaunchAgents/org.git-scm.git.%s.plist\n" \
     ++	printf "bootout gui/$uid $HOME/Library/LaunchAgents/org.git-scm.git.%s.plist\n" \
      +		hourly daily weekly >expect &&
      +	test_cmp expect args &&
      +	ls "$HOME/Library/LaunchAgents" >actual &&
     @@ t/t7900-maintenance.sh: test_expect_success 'stop from existing schedule' '
       test_expect_success 'register preserves existing strategy' '
       	git config maintenance.strategy none &&
       	git maintenance register &&
     -
     - ## t/test-lib.sh ##
     -@@ t/test-lib.sh: test_lazy_prereq REBASE_P '
     - 	test -z "$GIT_TEST_SKIP_REBASE_P"
     - '
     - 
     -+test_lazy_prereq MACOS_MAINTENANCE '
     -+	launchctl list
     -+'
     -+
     - # Ensure that no test accidentally triggers a Git command
     - # that runs 'crontab', affecting a user's cron schedule.
     - # Tests that verify the cron integration must set this locally
 4:  ed7a61978f ! 4:  b8d86fb983 maintenance: use Windows scheduled tasks
     @@ Commit message
          logged in, and more fields are populated with the current username and
          SID at run-time by 'schtasks'.
      
     +    Since the GIT_TEST_MAINT_SCHEDULER environment variable allows us to
     +    specify 'schtasks' as the scheduler, we can test the Windows-specific
     +    logic on a macOS platform. Thus, add a check that the XML file written
     +    by Git is valid when xmllint exists on the system.
     +
          There is a deficiency in the current design. Windows has two kinds of
          applications: GUI applications that start by "winmain()" and console
          applications that start by "main()". Console applications are attached
     @@ Commit message
          short term. In the long term, we can consider creating this GUI
          shim application within core Git, perhaps in contrib/.
      
     -    Helped-by: Eric Sunshine <sunshine@sunshineco.com>
     +    Co-authored-by: Eric Sunshine <sunshine@sunshineco.com>
     +    Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/git-maintenance.txt ##
     @@ Documentation/git-maintenance.txt: To create more advanced customizations to you
       Part of the linkgit:git[1] suite
      
       ## builtin/gc.c ##
     -@@ builtin/gc.c: static int platform_update_schedule(int run_maintenance, int fd)
     - 	else
     - 		return remove_plists();
     +@@ builtin/gc.c: static int launchctl_update_schedule(int run_maintenance, int fd, const char *cm
     + 		return launchctl_remove_plists(cmd);
       }
     -+
     -+#elif defined(GIT_WINDOWS_NATIVE)
     -+
     -+static const char *get_frequency(enum schedule_priority schedule)
     -+{
     -+	switch (schedule) {
     -+	case SCHEDULE_HOURLY:
     -+		return "hourly";
     -+	case SCHEDULE_DAILY:
     -+		return "daily";
     -+	case SCHEDULE_WEEKLY:
     -+		return "weekly";
     -+	default:
     -+		BUG("invalid schedule %d", schedule);
     -+	}
     -+}
     -+
     -+static char *get_task_name(const char *frequency)
     + 
     ++static char *schtasks_task_name(const char *frequency)
      +{
      +	struct strbuf label = STRBUF_INIT;
      +	strbuf_addf(&label, "Git Maintenance (%s)", frequency);
      +	return strbuf_detach(&label, NULL);
      +}
      +
     -+static int remove_task(enum schedule_priority schedule)
     ++static int schtasks_remove_task(enum schedule_priority schedule, const char *cmd)
      +{
      +	int result;
      +	struct strvec args = STRVEC_INIT;
      +	const char *frequency = get_frequency(schedule);
     -+	char *name = get_task_name(frequency);
     -+	const char *schtasks = getenv("GIT_TEST_CRONTAB");
     -+	if (!schtasks)
     -+		schtasks = "schtasks";
     ++	char *name = schtasks_task_name(frequency);
      +
     -+	strvec_split(&args, schtasks);
     ++	strvec_split(&args, cmd);
      +	strvec_pushl(&args, "/delete", "/tn", name, "/f", NULL);
      +
      +	result = run_command_v_opt(args.v, 0);
     @@ builtin/gc.c: static int platform_update_schedule(int run_maintenance, int fd)
      +	return result;
      +}
      +
     -+static int remove_scheduled_tasks(void)
     ++static int schtasks_remove_tasks(const char *cmd)
      +{
     -+	return remove_task(SCHEDULE_HOURLY) ||
     -+		remove_task(SCHEDULE_DAILY) ||
     -+		remove_task(SCHEDULE_WEEKLY);
     ++	return schtasks_remove_task(SCHEDULE_HOURLY, cmd) ||
     ++		schtasks_remove_task(SCHEDULE_DAILY, cmd) ||
     ++		schtasks_remove_task(SCHEDULE_WEEKLY, cmd);
      +}
      +
     -+static int schedule_task(const char *exec_path, enum schedule_priority schedule)
     ++static int schtasks_schedule_task(const char *exec_path, enum schedule_priority schedule, const char *cmd)
      +{
      +	int result;
      +	struct child_process child = CHILD_PROCESS_INIT;
     -+	const char *xml, *schtasks;
     -+	char *xmlpath, *tempDir;
     -+	FILE *xmlfp;
     ++	const char *xml;
     ++	char *xmlpath;
     ++	struct tempfile *tfile;
      +	const char *frequency = get_frequency(schedule);
     -+	char *name = get_task_name(frequency);
     ++	char *name = schtasks_task_name(frequency);
      +
     -+	tempDir = xstrfmt("%s/temp", the_repository->objects->odb->path);
     -+	xmlpath =  xstrfmt("%s/schedule-%s.xml", tempDir, frequency);
     -+	safe_create_leading_directories(xmlpath);
     -+	xmlfp = xfopen(xmlpath, "w");
     ++	xmlpath =  xstrfmt("%s/schedule-%s.xml",
     ++			   the_repository->objects->odb->path,
     ++			   frequency);
     ++	tfile = create_tempfile(xmlpath);
     ++	if (!tfile || !fdopen_tempfile(tfile, "w"))
     ++		die(_("failed to create '%s'"), xmlpath);
      +
     -+	xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n"
     ++	xml = "<?xml version=\"1.0\" encoding=\"US-ASCII\"?>\n"
      +	      "<Task version=\"1.4\" xmlns=\"http://schemas.microsoft.com/windows/2004/02/mit/task\">\n"
      +	      "<Triggers>\n"
      +	      "<CalendarTrigger>\n";
     -+	fprintf(xmlfp, xml);
     ++	fputs(xml, tfile->fp);
      +
      +	switch (schedule) {
      +	case SCHEDULE_HOURLY:
     -+		fprintf(xmlfp,
     ++		fprintf(tfile->fp,
      +			"<StartBoundary>2020-01-01T01:00:00</StartBoundary>\n"
      +			"<Enabled>true</Enabled>\n"
      +			"<ScheduleByDay>\n"
     @@ builtin/gc.c: static int platform_update_schedule(int run_maintenance, int fd)
      +		break;
      +
      +	case SCHEDULE_DAILY:
     -+		fprintf(xmlfp,
     ++		fprintf(tfile->fp,
      +			"<StartBoundary>2020-01-01T00:00:00</StartBoundary>\n"
      +			"<Enabled>true</Enabled>\n"
      +			"<ScheduleByWeek>\n"
     @@ builtin/gc.c: static int platform_update_schedule(int run_maintenance, int fd)
      +		break;
      +
      +	case SCHEDULE_WEEKLY:
     -+		fprintf(xmlfp,
     ++		fprintf(tfile->fp,
      +			"<StartBoundary>2020-01-01T00:00:00</StartBoundary>\n"
      +			"<Enabled>true</Enabled>\n"
      +			"<ScheduleByWeek>\n"
     @@ builtin/gc.c: static int platform_update_schedule(int run_maintenance, int fd)
      +		break;
      +	}
      +
     -+	xml=  "</CalendarTrigger>\n"
     ++	xml = "</CalendarTrigger>\n"
      +	      "</Triggers>\n"
      +	      "<Principals>\n"
      +	      "<Principal id=\"Author\">\n"
     @@ builtin/gc.c: static int platform_update_schedule(int run_maintenance, int fd)
      +	      "</Exec>\n"
      +	      "</Actions>\n"
      +	      "</Task>\n";
     -+	fprintf(xmlfp, xml, exec_path, exec_path, frequency);
     -+	fclose(xmlfp);
     -+
     -+	schtasks = getenv("GIT_TEST_CRONTAB");
     -+	if (!schtasks)
     -+		schtasks = "schtasks";
     -+	strvec_split(&child.args, schtasks);
     ++	fprintf(tfile->fp, xml, exec_path, exec_path, frequency);
     ++	strvec_split(&child.args, cmd);
      +	strvec_pushl(&child.args, "/create", "/tn", name, "/f", "/xml", xmlpath, NULL);
     ++	close_tempfile_gently(tfile);
      +
      +	child.no_stdout = 1;
      +	child.no_stderr = 1;
     @@ builtin/gc.c: static int platform_update_schedule(int run_maintenance, int fd)
      +		die(_("failed to start schtasks"));
      +	result = finish_command(&child);
      +
     -+	unlink(xmlpath);
     -+	rmdir(tempDir);
     ++	delete_tempfile(&tfile);
      +	free(xmlpath);
      +	free(name);
      +	return result;
      +}
      +
     -+static int add_scheduled_tasks(void)
     ++static int schtasks_schedule_tasks(const char *cmd)
      +{
      +	const char *exec_path = git_exec_path();
      +
     -+	return schedule_task(exec_path, SCHEDULE_HOURLY) ||
     -+		schedule_task(exec_path, SCHEDULE_DAILY) ||
     -+		schedule_task(exec_path, SCHEDULE_WEEKLY);
     ++	return schtasks_schedule_task(exec_path, SCHEDULE_HOURLY, cmd) ||
     ++		schtasks_schedule_task(exec_path, SCHEDULE_DAILY, cmd) ||
     ++		schtasks_schedule_task(exec_path, SCHEDULE_WEEKLY, cmd);
      +}
      +
     -+static int platform_update_schedule(int run_maintenance, int fd)
     ++static int schtasks_update_schedule(int run_maintenance, int fd, const char *cmd)
      +{
      +	if (run_maintenance)
     -+		return add_scheduled_tasks();
     ++		return schtasks_schedule_tasks(cmd);
      +	else
     -+		return remove_scheduled_tasks();
     ++		return schtasks_remove_tasks(cmd);
      +}
      +
     - #else
       #define BEGIN_LINE "# BEGIN GIT MAINTENANCE SCHEDULE"
       #define END_LINE "# END GIT MAINTENANCE SCHEDULE"
     -
     - ## t/t7900-maintenance.sh ##
     -@@ t/t7900-maintenance.sh: test_expect_success 'register and unregister' '
     - 	test_cmp before actual
     - '
     - 
     --test_expect_success !MACOS_MAINTENANCE 'start from empty cron table' '
     -+test_expect_success !MACOS_MAINTENANCE,!MINGW 'start from empty cron table' '
     - 	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance start &&
     - 
     - 	# start registers the repo
     -@@ t/t7900-maintenance.sh: test_expect_success !MACOS_MAINTENANCE 'start from empty cron table' '
     - 	grep "for-each-repo --config=maintenance.repo maintenance run --schedule=weekly" cron.txt
     - '
       
     --test_expect_success !MACOS_MAINTENANCE 'stop from existing schedule' '
     -+test_expect_success !MACOS_MAINTENANCE,!MINGW 'stop from existing schedule' '
     - 	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance stop &&
     +@@ builtin/gc.c: static int crontab_update_schedule(int run_maintenance, int fd, const char *cmd)
       
     - 	# stop does not unregister the repo
     -@@ t/t7900-maintenance.sh: test_expect_success !MACOS_MAINTENANCE 'stop from existing schedule' '
     - 	test_must_be_empty cron.txt
     - '
     + #if defined(__APPLE__)
     + static const char platform_scheduler[] = "launchctl";
     ++#elif defined(GIT_WINDOWS_NATIVE)
     ++static const char platform_scheduler[] = "schtasks";
     + #else
     + static const char platform_scheduler[] = "crontab";
     + #endif
     +@@ builtin/gc.c: static int update_background_schedule(int enable)
       
     --test_expect_success !MACOS_MAINTENANCE 'start preserves existing schedule' '
     -+test_expect_success !MACOS_MAINTENANCE,!MINGW 'start preserves existing schedule' '
     - 	echo "Important information!" >cron.txt &&
     - 	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance start &&
     - 	grep "Important information!" cron.txt
     -@@ t/t7900-maintenance.sh: test_expect_success MACOS_MAINTENANCE 'start and stop macOS maintenance' '
     + 	if (!strcmp(scheduler, "launchctl"))
     + 		result = launchctl_update_schedule(enable, lk.tempfile->fd, cmd);
     ++	else if (!strcmp(scheduler, "schtasks"))
     ++		result = schtasks_update_schedule(enable, lk.tempfile->fd, cmd);
     + 	else if (!strcmp(scheduler, "crontab"))
     + 		result = crontab_update_schedule(enable, lk.tempfile->fd, cmd);
     + 	else
     +
     + ## t/t7900-maintenance.sh ##
     +@@ t/t7900-maintenance.sh: test_expect_success !MINGW 'start and stop macOS maintenance' '
       	test_line_count = 0 actual
       '
       
     -+test_expect_success MINGW 'start and stop Windows maintenance' '
     ++test_expect_success 'start and stop Windows maintenance' '
      +	write_script print-args <<-\EOF &&
      +	echo $* >>args
     ++	while test $# -gt 0
     ++	do
     ++		case "$1" in
     ++		/xml) shift; xmlfile=$1; break ;;
     ++		*) shift ;;
     ++		esac
     ++	done
     ++	test -z "$xmlfile" || cp "$xmlfile" .
      +	EOF
      +
      +	rm -f args &&
     -+	GIT_TEST_CRONTAB="/bin/sh print-args" git maintenance start &&
     ++	GIT_TEST_MAINT_SCHEDULER="schtasks:./print-args" git maintenance start &&
      +
      +	# start registers the repo
      +	git config --get --global maintenance.repo "$(pwd)" &&
      +
     -+	printf "/create /tn Git Maintenance (%s) /f /xml .git/objects/temp/schedule-%s.xml\n" \
     ++	printf "/create /tn Git Maintenance (%s) /f /xml .git/objects/schedule-%s.xml\n" \
      +		hourly hourly daily daily weekly weekly >expect &&
      +	test_cmp expect args &&
      +
     ++	for frequency in hourly daily weekly
     ++	do
     ++		test_xmllint "schedule-$frequency.xml"
     ++	done &&
     ++
      +	rm -f args &&
     -+	GIT_TEST_CRONTAB="/bin/sh print-args" git maintenance stop &&
     ++	GIT_TEST_MAINT_SCHEDULER="schtasks:./print-args" git maintenance stop &&
      +
      +	# stop does not unregister the repo
      +	git config --get --global maintenance.repo "$(pwd)" &&

-- 
gitgitgadget


* [PATCH v3 0/4] Maintenance IV: Platform-specific background maintenance
  @ 2020-11-13 14:00  3% ` Derrick Stolee via GitGitGadget
  2020-11-17 21:13  2%   ` [PATCH v4 " Derrick Stolee via GitGitGadget
  0 siblings, 1 reply; 122+ results
From: Derrick Stolee via GitGitGadget @ 2020-11-13 14:00 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Derrick Stolee, Derrick Stolee

This is based on ds/maintenance-part-3.

After sitting with the background maintenance as it has been cooking, I
wanted to come back around and implement the background maintenance for
Windows. However, I noticed that there were some things bothering me with
background maintenance on my macOS machine. These are detailed in PATCH 3,
but the tl;dr is that 'cron' is not recommended by Apple and instead
'launchd' satisfies our needs.

This series implements the background scheduling so git maintenance
(start|stop) works on those platforms. I've been operating with these
schedules for a while now without the problems described in the patches.

There is a particularly annoying case about console windows popping up on
Windows, but PATCH 4 describes a plan to get around that.

Updates in V3
=============

 * This actually includes the feedback responses I had intended for v2.
   Sorry about that!

 * One major change is the use of a 'struct child_process' instead of just
   run_command_v_opt() so we can suppress error messages from the schedule
   helpers. We will rely on exit code and present our own error messages, as
   necessary.

 * Some doc and test fixes.
   
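To illustrate the 'struct child_process' change above: the idea is to
silence the scheduler helper's own output and act only on its exit code. A
minimal shell sketch of that pattern follows. The run_quietly function and
the 'false' stand-in are hypothetical names for illustration, not code from
this series.

```shell
# Hypothetical helper: run a command with its output discarded, so that
# only the exit code is visible to the caller.
run_quietly () {
	"$@" >/dev/null 2>&1
}

# 'false' stands in for a failing scheduler helper (for example a
# 'launchctl bootout' on a service that is not loaded).
if ! run_quietly false
then
	# The caller decides what, if anything, to tell the user.
	echo "failed to update schedule" >&2
fi
```

The caller stays in control of the wording, and can ignore an expected
failure entirely by not checking the exit code.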

Updates in V2
=============

 * This is a faster turnaround for a v2 than I would normally like, but Eric
   inspired extra documentation about how to customize background schedules.

 * New extensions to git-maintenance.txt include guidelines for inspecting
   what git maintenance start does and how to customize beyond that. This
   adds a new PATCH 2 that documents 'cron' on non-macOS, non-Windows
   systems.

 * Several improvements, especially in the tests, are included.

 * While testing manually, I noticed that I had mistakenly written an
   opening <dict> tag instead of a closing </dict> tag in the hourly format
   on macOS. The xmllint tool can verify the XML format of a file, which
   catches the bug. This seems like a good approach since the test is
   macOS-only. Does anyone have concerns about adding this dependency?
   
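As a rough illustration of the xmllint check described above, and of one way
to keep the dependency optional, the following shell sketch validates a file
only when xmllint is installed. The check_xml function name and the sample
files are hypothetical; 'xmllint --noout' is the only real tool invocation
assumed.

```shell
# Hypothetical helper: verify well-formedness only if xmllint is installed.
check_xml () {
	if command -v xmllint >/dev/null 2>&1
	then
		# --noout suppresses the re-serialized document, so only
		# parse errors are reported.
		xmllint --noout "$@"
	else
		# Without the tool we cannot check, so do not fail.
		true
	fi
}

# A well-formed fragment, and one that repeats the mistake of writing an
# opening <dict> where a closing </dict> was intended.
printf '<plist><dict></dict></plist>\n' >good.xml
printf '<plist><dict><dict></plist>\n' >bad.xml

check_xml good.xml && echo "good.xml accepted"
```

With xmllint present, 'check_xml bad.xml' exits non-zero because the tags do
not match. Without it, both files pass, which trades coverage for
portability.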

Thanks, -Stolee

cc: jrnieder@gmail.com, jonathantanmy@google.com, sluongng@gmail.com
cc: Derrick Stolee <stolee@gmail.com>
cc: Đoàn Trần Công Danh <congdanhqx@gmail.com>
cc: Martin Ågren <martin.agren@gmail.com>
cc: Eric Sunshine <sunshine@sunshineco.com>

Derrick Stolee (4):
  maintenance: extract platform-specific scheduling
  maintenance: include 'cron' details in docs
  maintenance: use launchctl on macOS
  maintenance: use Windows scheduled tasks

 Documentation/git-maintenance.txt | 116 +++++++++
 builtin/gc.c                      | 417 ++++++++++++++++++++++++++++--
 t/t7900-maintenance.sh            |  75 +++++-
 t/test-lib.sh                     |   4 +
 4 files changed, 592 insertions(+), 20 deletions(-)


base-commit: 0016b618182f642771dc589cf0090289f9fe1b4f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-776%2Fderrickstolee%2Fmaintenance%2FmacOS-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-776/derrickstolee/maintenance/macOS-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/776

Range-diff vs v2:

 1:  d35f1aa162 = 1:  d35f1aa162 maintenance: extract platform-specific scheduling
 2:  709a173720 ! 2:  0dfe53092e maintenance: include 'cron' details in docs
     @@ Commit message
          baseline can provide a way forward for users who have never worked with
          cron schedules.
      
     +    Helped-by: Eric Sunshine <sunshine@sunshineco.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/git-maintenance.txt ##
     @@ Documentation/git-maintenance.txt: Further, the `git gc` command should not be c
      +---------------------------------------
      +
      +The standard mechanism for scheduling background tasks on POSIX systems
     -+is `cron`. This tool executes commands based on a given schedule. The
     ++is cron(8). This tool executes commands based on a given schedule. The
      +current list of user-scheduled tasks can be found by running `crontab -l`.
      +The schedule written by `git maintenance start` is similar to this:
      +
     @@ Documentation/git-maintenance.txt: Further, the `git gc` command should not be c
      +Any modifications within this region will be completely deleted by
      +`git maintenance stop` or overwritten by `git maintenance start`.
      +
     -+The `<path>` string is loaded to specifically use the location for the
     -+`git` executable used in the `git maintenance start` command. This allows
     -+for multiple versions to be compatible. However, if the same user runs
     -+`git maintenance start` with multiple Git executables, then only the
     -+latest executable will be used.
     ++The `crontab` entry specifies the full path of the `git` executable to
     ++ensure that the executed `git` command is the same one with which
     ++`git maintenance start` was issued independent of `PATH`. If the same user
     ++runs `git maintenance start` with multiple Git executables, then only the
     ++latest executable is used.
      +
      +These commands use `git for-each-repo --config=maintenance.repo` to run
      +`git maintenance run --schedule=<frequency>` on each repository listed in
      +the multi-valued `maintenance.repo` config option. These are typically
     -+loaded from the user-specific global config located at `~/.gitconfig`.
     -+The `git maintenance` process then determines which maintenance tasks
     -+are configured to run on each repository with each `<frequency>` using
     -+the `maintenance.<task>.schedule` config options. These values are loaded
     -+from the global or repository config values.
     ++loaded from the user-specific global config. The `git maintenance` process
     ++then determines which maintenance tasks are configured to run on each
     ++repository with each `<frequency>` using the `maintenance.<task>.schedule`
     ++config options. These values are loaded from the global or repository
     ++config values.
      +
      +If the config values are insufficient to achieve your desired background
      +maintenance schedule, then you can create your own schedule. If you run
      +`crontab -e`, then an editor will load with your user-specific `cron`
      +schedule. In that editor, you can add your own schedule lines. You could
      +start by adapting the default schedule listed earlier, or you could read
     -+https://man7.org/linux/man-pages/man5/crontab.5.html[the `crontab` documentation]
     -+for advanced scheduling techniques. Please do use the full path and
     -+`--exec-path` techniques from the default schedule to ensure you are
     -+executing the correct binaries in your schedule.
     ++the crontab(5) documentation for advanced scheduling techniques. Please
     ++do use the full path and `--exec-path` techniques from the default
     ++schedule to ensure you are executing the correct binaries in your
     ++schedule.
      +
       
       GIT
 3:  0fafd75d10 ! 3:  1629bcfcf8 maintenance: use launchctl on macOS
     @@ Commit message
          of macOS 10.11, which was released in September 2015. Before that
          release the 'launchctl load' subcommand was recommended. The best
          source of information on this transition I have seen is available
     -    at [2].
     +    at [2]. The current design does not preclude a future version that
     +    detects the available features of 'launchctl' to use the older
     +    commands. However, it is best to rely on the newest version since
     +    Apple might completely remove the deprecated version on short
     +    notice.
      
          [2] https://babodee.wordpress.com/2016/04/09/launchctl-2-0-syntax/
      
     @@ Commit message
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/git-maintenance.txt ##
     -@@ Documentation/git-maintenance.txt: for advanced scheduling techniques. Please do use the full path and
     - executing the correct binaries in your schedule.
     +@@ Documentation/git-maintenance.txt: schedule to ensure you are executing the correct binaries in your
     + schedule.
       
       
      +BACKGROUND MAINTENANCE ON MACOS SYSTEMS
      +---------------------------------------
      +
      +While macOS technically supports `cron`, using `crontab -e` requires
     -+elevated privileges and the executed process do not have a full user
     ++elevated privileges and the executed process does not have a full user
      +context. Without a full user context, Git and its credential helpers
      +cannot access stored credentials, so some maintenance tasks are not
      +functional.
      +
      +Instead, `git maintenance start` interacts with the `launchctl` tool,
     -+which is the recommended way to
     -+https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/ScheduledJobs.html[schedule timed jobs in macOS].
     -+
     -+Scheduling maintenance through `git maintenance (start|stop)` requires
     -+some `launchctl` features available only in macOS 10.11 or later.
     ++which is the recommended way to schedule timed jobs in macOS. Scheduling
     ++maintenance through `git maintenance (start|stop)` requires some
     ++`launchctl` features available only in macOS 10.11 or later.
      +
      +Your user-specific scheduled tasks are stored as XML-formatted `.plist`
      +files in `~/Library/LaunchAgents/`. You can see the currently-registered
      +tasks using the following command:
      +
      +-----------------------------------------------------------------------
     -+$ ls ~/Library/LaunchAgents/ | grep org.git-scm.git
     ++$ ls ~/Library/LaunchAgents/org.git-scm.git*
      +org.git-scm.git.daily.plist
      +org.git-scm.git.hourly.plist
      +org.git-scm.git.weekly.plist
     @@ Documentation/git-maintenance.txt: for advanced scheduling techniques. Please do
      +and delete the `.plist` files.
      +
      +To create more advanced customizations to your background tasks, see
     -+https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/CreatingLaunchdJobs.html#//apple_ref/doc/uid/TP40001762-104142[the `launchctl` documentation]
     -+for more information.
     ++launchctl.plist(5) for more information.
      +
      +
       GIT
     @@ builtin/gc.c: static int maintenance_unregister(void)
      +	return strbuf_detach(&output, NULL);
      +}
      +
     -+static int bootout(const char *filename)
     ++static int boot_plist(int enable, const char *filename)
      +{
      +	int result;
     -+	struct strvec args = STRVEC_INIT;
     ++	struct child_process child = CHILD_PROCESS_INIT;
      +	char *uid = get_uid();
      +	const char *launchctl = getenv("GIT_TEST_CRONTAB");
      +	if (!launchctl)
      +		launchctl = "/bin/launchctl";
      +
     -+	strvec_split(&args, launchctl);
     -+	strvec_push(&args, "bootout");
     -+	strvec_pushf(&args, "gui/%s", uid);
     -+	strvec_push(&args, filename);
     ++	strvec_split(&child.args, launchctl);
      +
     -+	result = run_command_v_opt(args.v, 0);
     ++	if (enable)
     ++		strvec_push(&child.args, "bootstrap");
     ++	else
     ++		strvec_push(&child.args, "bootout");
     ++	strvec_pushf(&child.args, "gui/%s", uid);
     ++	strvec_push(&child.args, filename);
      +
     -+	strvec_clear(&args);
     -+	free(uid);
     -+	return result;
     -+}
     ++	child.no_stderr = 1;
     ++	child.no_stdout = 1;
      +
     -+static int bootstrap(const char *filename)
     -+{
     -+	int result;
     -+	struct strvec args = STRVEC_INIT;
     -+	char *uid = get_uid();
     -+	const char *launchctl = getenv("GIT_TEST_CRONTAB");
     -+	if (!launchctl)
     -+		launchctl = "/bin/launchctl";
     ++	if (start_command(&child))
     ++		die(_("failed to start launchctl"));
      +
     -+	strvec_split(&args, launchctl);
     -+	strvec_push(&args, "bootstrap");
     -+	strvec_pushf(&args, "gui/%s", uid);
     -+	strvec_push(&args, filename);
     ++	result = finish_command(&child);
      +
     -+	result = run_command_v_opt(args.v, 0);
     -+
     -+	strvec_clear(&args);
      +	free(uid);
      +	return result;
      +}
     @@ builtin/gc.c: static int maintenance_unregister(void)
      +	const char *frequency = get_frequency(schedule);
      +	char *name = get_service_name(frequency);
      +	char *filename = get_service_filename(name);
     -+	int result = bootout(filename);
     ++	int result = boot_plist(0, filename);
     ++	unlink(filename);
      +	free(filename);
      +	free(name);
      +	return result;
     @@ builtin/gc.c: static int maintenance_unregister(void)
      +
      +	if (safe_create_leading_directories(filename))
      +		die(_("failed to create directories for '%s'"), filename);
     -+	plist = fopen(filename, "w");
     -+
     -+	if (!plist)
     -+		die(_("failed to open '%s'"), filename);
     ++	plist = xfopen(filename, "w");
      +
      +	preamble = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
      +		   "<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">\n"
     @@ builtin/gc.c: static int maintenance_unregister(void)
      +	fprintf(plist, "</array>\n</dict>\n</plist>\n");
      +
      +	/* bootout might fail if not already running, so ignore */
     -+	bootout(filename);
     -+	if (bootstrap(filename))
     ++	boot_plist(0, filename);
     ++	if (boot_plist(1, filename))
      +		die(_("failed to bootstrap service %s"), filename);
      +
      +	fclose(plist);
     @@ t/t7900-maintenance.sh: test_expect_success 'stop from existing schedule' '
       '
       
      +test_expect_success MACOS_MAINTENANCE 'start and stop macOS maintenance' '
     -+	echo "#!/bin/sh\necho \$@ >>args" >print-args &&
     -+	chmod a+x print-args &&
     ++	write_script print-args "#!/bin/sh\necho \$* >>args" &&
      +
      +	rm -f args &&
      +	GIT_TEST_CRONTAB="./print-args" git maintenance start &&
     @@ t/t7900-maintenance.sh: test_expect_success 'stop from existing schedule' '
      +	for frequency in hourly daily weekly
      +	do
      +		PLIST="$HOME/Library/LaunchAgents/org.git-scm.git.$frequency.plist" &&
     -+		xmllint "$PLIST" >/dev/null &&
     ++		xmllint --noout "$PLIST" &&
      +		grep schedule=$frequency "$PLIST" &&
      +		echo "bootout gui/$UID $PLIST" >>expect &&
      +		echo "bootstrap gui/$UID $PLIST" >>expect || return 1
     @@ t/t7900-maintenance.sh: test_expect_success 'stop from existing schedule' '
      +	test_cmp expect args &&
      +
      +	rm -f args &&
     -+	GIT_TEST_CRONTAB="./print-args"  git maintenance stop &&
     ++	GIT_TEST_CRONTAB="./print-args" git maintenance stop &&
      +
      +	# stop does not unregister the repo
      +	git config --get --global maintenance.repo "$(pwd)" &&
      +
     -+	# stop does not remove plist files, but boots them out
     -+	rm expect &&
     -+	for frequency in hourly daily weekly
     -+	do
     -+		PLIST="$HOME/Library/LaunchAgents/org.git-scm.git.$frequency.plist" &&
     -+		grep schedule=$frequency "$PLIST" &&
     -+		echo "bootout gui/$UID $PLIST" >>expect || return 1
     -+	done &&
     -+	test_cmp expect args
     ++	printf "bootout gui/$UID $HOME/Library/LaunchAgents/org.git-scm.git.%s.plist\n" \
     ++		hourly daily weekly >expect &&
     ++	test_cmp expect args &&
     ++	ls "$HOME/Library/LaunchAgents" >actual &&
     ++	test_line_count = 0 actual
      +'
      +
       test_expect_success 'register preserves existing strategy' '
 4:  84eb44de31 ! 4:  ed7a61978f maintenance: use Windows scheduled tasks
     @@ Commit message
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/git-maintenance.txt ##
     -@@ Documentation/git-maintenance.txt: https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSy
     - for more information.
     +@@ Documentation/git-maintenance.txt: To create more advanced customizations to your background tasks, see
     + launchctl.plist(5) for more information.
       
       
      +BACKGROUND MAINTENANCE ON WINDOWS SYSTEMS
     @@ builtin/gc.c: static int platform_update_schedule(int run_maintenance, int fd)
      +static int schedule_task(const char *exec_path, enum schedule_priority schedule)
      +{
      +	int result;
     -+	struct strvec args = STRVEC_INIT;
     ++	struct child_process child = CHILD_PROCESS_INIT;
      +	const char *xml, *schtasks;
     -+	char *xmlpath;
     ++	char *xmlpath, *tempDir;
      +	FILE *xmlfp;
      +	const char *frequency = get_frequency(schedule);
      +	char *name = get_task_name(frequency);
      +
     -+	xmlpath =  xstrfmt("%s/schedule-%s.xml",
     -+			   the_repository->objects->odb->path,
     -+			   frequency);
     -+	xmlfp = fopen(xmlpath, "w");
     -+	if (!xmlfp)
     -+		die(_("failed to open '%s'"), xmlpath);
     ++	tempDir = xstrfmt("%s/temp", the_repository->objects->odb->path);
     ++	xmlpath =  xstrfmt("%s/schedule-%s.xml", tempDir, frequency);
     ++	safe_create_leading_directories(xmlpath);
     ++	xmlfp = xfopen(xmlpath, "w");
      +
      +	xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>\n"
      +	      "<Task version=\"1.4\" xmlns=\"http://schemas.microsoft.com/windows/2004/02/mit/task\">\n"
     @@ builtin/gc.c: static int platform_update_schedule(int run_maintenance, int fd)
      +	schtasks = getenv("GIT_TEST_CRONTAB");
      +	if (!schtasks)
      +		schtasks = "schtasks";
     -+	strvec_split(&args, schtasks);
     -+	strvec_pushl(&args, "/create", "/tn", name, "/f", "/xml", xmlpath, NULL);
     ++	strvec_split(&child.args, schtasks);
     ++	strvec_pushl(&child.args, "/create", "/tn", name, "/f", "/xml", xmlpath, NULL);
      +
     -+	result = run_command_v_opt(args.v, 0);
     ++	child.no_stdout = 1;
     ++	child.no_stderr = 1;
     ++
     ++	if (start_command(&child))
     ++		die(_("failed to start schtasks"));
     ++	result = finish_command(&child);
      +
     -+	strvec_clear(&args);
      +	unlink(xmlpath);
     ++	rmdir(tempDir);
      +	free(xmlpath);
      +	free(name);
      +	return result;
     @@ t/t7900-maintenance.sh: test_expect_success !MACOS_MAINTENANCE 'stop from existi
       	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance start &&
       	grep "Important information!" cron.txt
      @@ t/t7900-maintenance.sh: test_expect_success MACOS_MAINTENANCE 'start and stop macOS maintenance' '
     - 	test_cmp expect args
     + 	test_line_count = 0 actual
       '
       
      +test_expect_success MINGW 'start and stop Windows maintenance' '
     @@ t/t7900-maintenance.sh: test_expect_success MACOS_MAINTENANCE 'start and stop ma
      +	# start registers the repo
      +	git config --get --global maintenance.repo "$(pwd)" &&
      +
     -+	for frequency in hourly daily weekly
     -+	do
     -+		printf "/create /tn Git Maintenance (%s) /f /xml .git/objects/schedule-%s.xml\n" \
     -+			$frequency $frequency
     -+	done >expect &&
     ++	printf "/create /tn Git Maintenance (%s) /f /xml .git/objects/temp/schedule-%s.xml\n" \
     ++		hourly hourly daily daily weekly weekly >expect &&
      +	test_cmp expect args &&
      +
      +	rm -f args &&

-- 
gitgitgadget
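
For readers skimming the range-diff above: the merged boot_plist() helper
differs between its two modes only in the launchctl subcommand it passes.
The following is a minimal sketch of that argument layout, not the code
from the patch. The name launchctl_args() and its buffer parameters are
hypothetical, and the sketch assumes the default "/bin/launchctl" path
rather than the GIT_TEST_CRONTAB override.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): fill argv[0..3], plus a
 * NULL terminator, with the launchctl invocation for one plist file.
 * domain_buf receives the "gui/<uid>" domain string.
 */
static void launchctl_args(int enable, const char *uid, const char *filename,
			   char *domain_buf, size_t domain_len,
			   const char *argv[5])
{
	snprintf(domain_buf, domain_len, "gui/%s", uid);

	argv[0] = "/bin/launchctl";
	/* enable != 0 registers the job; enable == 0 removes it */
	argv[1] = enable ? "bootstrap" : "bootout";
	argv[2] = domain_buf;
	argv[3] = filename;
	argv[4] = NULL;
}
```

This matches the shape the test checks for: "bootout gui/$UID $PLIST"
followed by "bootstrap gui/$UID $PLIST" for each frequency.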

^ permalink raw reply	[relevance 3%]

* [PATCH v4 7/8] maintenance: auto-size incremental-repack batch
  @ 2020-09-25 12:33  3%       ` Derrick Stolee via GitGitGadget
  0 siblings, 0 replies; 122+ results
From: Derrick Stolee via GitGitGadget @ 2020-09-25 12:33 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Jonathan Tan,
	Derrick Stolee, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When repacking during the 'incremental-repack' task, we use the
--batch-size option in 'git multi-pack-index repack'. The initial setting
used --batch-size=0 to repack everything into a single pack-file. This is
not sustainable for a large repository. The amount of work required is
also likely to use too many system resources for a background job.

Update the 'incremental-repack' task by dynamically computing a
--batch-size option based on the current pack-file structure.

The dynamic default size is computed with this idea in mind for a client
repository that was cloned from a very large remote: there is likely one
"big" pack-file that was created at clone time. Thus, do not try
repacking it as it is likely packed efficiently by the server.

Instead, we select the second-largest pack-file, and create a batch size
that is one larger than that pack-file. If there are three or more
pack-files, then this guarantees that at least two will be combined into
a new pack-file.
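
As a rough illustration of the rule above, here is a minimal sketch that
computes the same value over a plain array of pack sizes. The function
name auto_batch_size() and the array interface are assumptions made for
this example; the patch itself walks the repository's pack list.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return one more than the second-largest entry
 * in pack_sizes[], capped at INT32_MAX (the "two gigabyte" limit).
 */
static int64_t auto_batch_size(const int64_t *pack_sizes, size_t nr)
{
	int64_t max_size = 0;
	int64_t second_largest = 0;
	int64_t result;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (pack_sizes[i] > max_size) {
			/* new largest; the old largest becomes second */
			second_largest = max_size;
			max_size = pack_sizes[i];
		} else if (pack_sizes[i] > second_largest) {
			second_largest = pack_sizes[i];
		}
	}

	result = second_largest + 1;
	if (result > INT32_MAX)
		result = INT32_MAX;
	return result;
}
```

With pack sizes of 1000, 40, 30 and 10 bytes, the sketch returns 41: the
1000-byte "clone" pack is left out, while the 40-byte pack fits under the
batch size.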

Of course, this means that the second-largest pack-file size is likely
to grow over time and may eventually surpass the initially-cloned
pack-file. Recall that the pack-file batch is selected in a greedy
manner: the packs are considered from oldest to newest and are selected
if they have size smaller than the batch size until the total selected
size is larger than the batch size. Thus, that oldest "clone" pack will
be first to repack after the new data creates a pack larger than that.
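
The greedy selection described above can be sketched as follows. This is
a simplification written from the description in this message, not a
copy of the multi-pack-index code, which may differ in detail. The name
select_batch() and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper. pack_sizes[] is ordered oldest to newest.
 * selected[i] is set to 1 for each pack chosen for the batch and to 0
 * otherwise. Returns the number of packs selected.
 */
static size_t select_batch(const int64_t *pack_sizes, size_t nr,
			   int64_t batch_size, int *selected)
{
	int64_t total = 0;
	size_t i, count = 0;

	for (i = 0; i < nr; i++)
		selected[i] = 0;

	/* stop as soon as the selected total reaches the batch size */
	for (i = 0; i < nr && total < batch_size; i++) {
		if (pack_sizes[i] >= batch_size)
			continue;	/* too large on its own: skip it */
		selected[i] = 1;
		total += pack_sizes[i];
		count++;
	}
	return count;
}
```

For packs of 1000, 40, 30 and 10 bytes (oldest first) and a batch size of
41, only the 40- and 30-byte packs are selected. Once a newer pack grows
past the clone pack, say sizes 1000, 1200 and 50 with a batch size of
1001, the oldest 1000-byte pack is the first one selected.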

We also want to place some limits on how large these pack-files become,
in order to bound the amount of time spent repacking. A maximum
batch-size of two gigabytes means that large repositories will never be
packed into a single pack-file using this job, but also that repack is
rather expensive. This is a trade-off that is valuable to have if the
maintenance is being run automatically or in the background. Users who
truly want to optimize for space and performance (and are willing to pay
the upfront cost of a full repack) can use the 'gc' task to do so.

Create a test for this two gigabyte limit by creating an EXPENSIVE test
that generates two pack-files of roughly 2.5 gigabytes in size, then
performs an incremental repack. Check that the --batch-size argument in
the subcommand uses the hard-coded maximum.

Helped-by: Chris Torek <chris.torek@gmail.com>
Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c           | 43 +++++++++++++++++++++++++++++++++++++++++-
 t/t7900-maintenance.sh | 36 +++++++++++++++++++++++++++++++++--
 2 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 5f877b097a..8d22361fa9 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1035,6 +1035,46 @@ static int multi_pack_index_expire(struct maintenance_run_opts *opts)
 	return 0;
 }
 
+#define TWO_GIGABYTES (INT32_MAX)
+
+static off_t get_auto_pack_size(void)
+{
+	/*
+	 * The "auto" value is special: we optimize for
+	 * one large pack-file (i.e. from a clone) and
+	 * expect the rest to be small and they can be
+	 * repacked quickly.
+	 *
+	 * The strategy we select here is to select a
+	 * size that is one more than the second largest
+	 * pack-file. This ensures that we will repack
+	 * at least two packs if there are three or more
+	 * packs.
+	 */
+	off_t max_size = 0;
+	off_t second_largest_size = 0;
+	off_t result_size;
+	struct packed_git *p;
+	struct repository *r = the_repository;
+
+	reprepare_packed_git(r);
+	for (p = get_all_packs(r); p; p = p->next) {
+		if (p->pack_size > max_size) {
+			second_largest_size = max_size;
+			max_size = p->pack_size;
+		} else if (p->pack_size > second_largest_size)
+			second_largest_size = p->pack_size;
+	}
+
+	result_size = second_largest_size + 1;
+
+	/* But limit ourselves to a batch size of 2g */
+	if (result_size > TWO_GIGABYTES)
+		result_size = TWO_GIGABYTES;
+
+	return result_size;
+}
+
 static int multi_pack_index_repack(struct maintenance_run_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -1045,7 +1085,8 @@ static int multi_pack_index_repack(struct maintenance_run_opts *opts)
 	if (opts->quiet)
 		strvec_push(&child.args, "--no-progress");
 
-	strvec_push(&child.args, "--batch-size=0");
+	strvec_pushf(&child.args, "--batch-size=%"PRIuMAX,
+				  (uintmax_t)get_auto_pack_size());
 
 	close_object_store(the_repository->objects);
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index a2db2291b0..9e6ea23f35 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -181,10 +181,42 @@ test_expect_success 'incremental-repack task' '
 	test_line_count = 4 packs-between &&
 
 	# the job deletes the two old packs, and does not write
-	# a new one because only one pack remains.
+	# a new one because the batch size is not high enough to
+	# pack the largest pack-file.
 	git maintenance run --task=incremental-repack &&
 	ls .git/objects/pack/*.pack >packs-after &&
-	test_line_count = 1 packs-after
+	test_line_count = 2 packs-after
+'
+
+test_expect_success EXPENSIVE 'incremental-repack 2g limit' '
+	for i in $(test_seq 1 5)
+	do
+		test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) >>big ||
+		return 1
+	done &&
+	git add big &&
+	git commit -m "Add big file (1)" &&
+
+	# ensure any possible loose objects are in a pack-file
+	git maintenance run --task=loose-objects &&
+
+	rm big &&
+	for i in $(test_seq 6 10)
+	do
+		test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) >>big ||
+		return 1
+	done &&
+	git add big &&
+	git commit -m "Add big file (2)" &&
+
+	# ensure any possible loose objects are in a pack-file
+	git maintenance run --task=loose-objects &&
+
+	# Now run the incremental-repack task and check the batch-size
+	GIT_TRACE2_EVENT="$(pwd)/run-2g.txt" git maintenance run \
+		--task=incremental-repack 2>/dev/null &&
+	test_subcommand git multi-pack-index repack \
+		 --no-progress --batch-size=2147483647 <run-2g.txt
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[relevance 3%]

* [PATCH v5 00/11] Maintenance I: Command, gc and commit-graph tasks
  2020-09-04 13:09  4% ` [PATCH v4 " Derrick Stolee via GitGitGadget
@ 2020-09-17 18:11  4%   ` Derrick Stolee via GitGitGadget
  0 siblings, 0 replies; 122+ results
From: Derrick Stolee via GitGitGadget @ 2020-09-17 18:11 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee

This series is based on jc/no-update-fetch-head.

This patch series contains 11 patches that were going to be part of v4 of
ds/maintenance [1], but the discussion has gotten really long. To help focus
the conversation, I'm splitting out the portions that create and test the
'maintenance' builtin from the additional tasks (prefetch, loose-objects,
incremental-repack) that can be brought in later.

[1] 
https://lore.kernel.org/git/pull.671.git.1594131695.gitgitgadget@gmail.com/

As mentioned before, git gc already plays the role of maintaining Git
repositories. It has accumulated several smaller pieces in its long history,
including:

 1. Repacking all reachable objects into one pack-file (and deleting
    unreachable objects).
 2. Packing refs.
 3. Expiring reflogs.
 4. Clearing rerere logs.
 5. Updating the commit-graph file.
 6. Pruning worktrees.

While expiring reflogs, clearing rerere logs, and deleting unreachable
objects are suitable under the guise of "garbage collection", packing refs
and updating the commit-graph file are not as obviously fitting. Further,
these operations are "all or nothing" in that they rewrite almost all
repository data, which does not perform well at extremely large scales.
These operations can also be disruptive to foreground Git commands when git
gc --auto triggers during routine use.

This series does not intend to change what git gc does, but instead create
new choices for automatic maintenance activities, of which git gc remains
the only one enabled by default.

The new maintenance tasks are:

 * 'commit-graph' : write and verify a single layer of an incremental
   commit-graph.
 * 'loose-objects' : prune packed loose objects, then create a new pack from
   a batch of loose objects.
 * 'pack-files' : expire redundant packs from the multi-pack-index, then
   repack using the multi-pack-index's incremental repack strategy.
 * 'prefetch' : fetch from each remote, storing the refs in 'refs/prefetch/
   /'.

The only included tasks are the 'gc' and 'commit-graph' tasks. The rest will
follow in a follow-up series. Including the 'commit-graph' task here allows
us to build and test features like config settings and the --task= 
command-line argument.

These tasks are all disabled by default, but can be enabled with config
options or run explicitly using "git maintenance run --task=". There are
additional config options to allow customizing the conditions for which the
tasks run during the '--auto' option.

Because 'gc' is implemented as a maintenance task, the most dramatic change
of this series is to convert the 'git gc --auto' calls into 'git maintenance
run --auto' calls at the end of some Git commands. By default, the only
change is that 'git gc --auto' will be run below an additional 'git
maintenance' process.

The 'git maintenance' builtin has a 'run' subcommand so it can be extended
later with subcommands that manage background maintenance, such as 'start'
or 'stop'. These are not the subject of this series, as it is important to
focus on the maintenance activities themselves.

Update in v5
============

A segfault when running just "git maintenance" is fixed.
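
The fix is visible in the range-diff for patch 1: the usage check now
also fires when no subcommand is given, so argv[1] is never compared
while it is NULL. A minimal sketch of that guard follows. The function
maintenance_exit_code() and the CODE_* names are hypothetical stand-ins
that return the exit code the tests expect instead of exiting.

```c
#include <string.h>

enum { CODE_OK = 0, CODE_FATAL = 128, CODE_USAGE = 129 };

/* Hypothetical stand-in for the top-level subcommand dispatch. */
static int maintenance_exit_code(int argc, const char **argv)
{
	/*
	 * Check argc first: with no subcommand, argv[1] is NULL, and
	 * passing NULL to strcmp() is undefined behavior.
	 */
	if (argc < 2 ||
	    (argc == 2 && !strcmp(argv[1], "-h")))
		return CODE_USAGE;

	if (!strcmp(argv[1], "run"))
		return CODE_OK;

	/* anything else is reported as an invalid subcommand */
	return CODE_FATAL;
}
```

Without the `argc < 2` clause, a bare "git maintenance" would fall
through to the comparison against "run" with a NULL argv[1].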

Updates since v3
================

 * Two commit-message typos are fixed.

Thanks for all of the review!

Updates since v2
================

 * Based on jc/no-update-fetch-head instead of jk/strvec.
   
   
 * I realized that the other maintenance subcommands should not accept the
   options for the 'run' subcommand, so I reorganized the option parsing
   into that subcommand. This makes the range-diff noisier than it would
   have been otherwise.
   
   
 * While updating the parsing, I also updated the usage strings.
   
   
 * The "verify, then delete and rewrite on failure" logic is removed. I'll
   consider bringing this back with a way to test the behavior in a later
   patch series.
   
   
 * Other feedback from Jonathan Tan is applied.
   
   

Updates since v1 (of this series)
=================================

 * Documentation fixes.
   
   
 * The builtin code had some slight tweaks in PATCH 1.
   
   

UPDATES since v3 of [1]
=======================

 * The biggest change here is the use of "test_subcommand", based on
   Jonathan Nieder's approach. This requires having the exact command-line
   figured out, which now requires spelling out all --no-[quiet|progress]
   options. I also added a bunch of "2>/dev/null" checks because of the
   isatty(2) calls. Without that, the behavior will change depending on
   whether the test is run with -x/-v or without.
   
   
 * The option parsing has changed to use a local struct and pass that struct
   to the helper methods. This is instead of having a global singleton.
   
   

Thanks, -Stolee

Cc: sandals@crustytoothpaste.net, steadmon@google.com, jrnieder@gmail.com,
peff@peff.net, congdanhqx@gmail.com, phillip.wood123@gmail.com,
emilyshaffer@google.com, sluongng@gmail.com, jonathantanmy@google.com

Derrick Stolee (11):
  maintenance: create basic maintenance runner
  maintenance: add --quiet option
  maintenance: replace run_auto_gc()
  maintenance: initialize task array
  maintenance: add commit-graph task
  maintenance: add --task option
  maintenance: take a lock on the objects directory
  maintenance: create maintenance.<task>.enabled config
  maintenance: use pointers to check --auto
  maintenance: add auto condition for commit-graph task
  maintenance: add trace2 regions for task execution

 .gitignore                           |   1 +
 Documentation/config.txt             |   2 +
 Documentation/config/maintenance.txt |  16 ++
 Documentation/fetch-options.txt      |   6 +-
 Documentation/git-clone.txt          |   6 +-
 Documentation/git-maintenance.txt    |  79 +++++++
 builtin.h                            |   1 +
 builtin/am.c                         |   2 +-
 builtin/commit.c                     |   2 +-
 builtin/fetch.c                      |   6 +-
 builtin/gc.c                         | 337 +++++++++++++++++++++++++++
 builtin/merge.c                      |   2 +-
 builtin/rebase.c                     |   4 +-
 command-list.txt                     |   1 +
 commit-graph.c                       |   8 +-
 commit-graph.h                       |   1 +
 git.c                                |   1 +
 object.h                             |   1 +
 run-command.c                        |  16 +-
 run-command.h                        |   2 +-
 t/t5510-fetch.sh                     |   2 +-
 t/t5514-fetch-multiple.sh            |   2 +-
 t/t7900-maintenance.sh               |  65 ++++++
 t/test-lib-functions.sh              |  33 +++
 24 files changed, 568 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/config/maintenance.txt
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh


base-commit: 887952b8c680626f4721cb5fa57704478801aca4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-695%2Fderrickstolee%2Fmaintenance%2Fbuiltin-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-695/derrickstolee/maintenance/builtin-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/695

Range-diff vs v4:

  1:  aa961af387 !  1:  00a0d63508 maintenance: create basic maintenance runner
     @@ Commit message
      
          Helped-by: Jonathan Nieder <jrnieder@gmail.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## .gitignore ##
      @@
     @@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
      +
      +int cmd_maintenance(int argc, const char **argv, const char *prefix)
      +{
     -+	if (argc == 2 && !strcmp(argv[1], "-h"))
     ++	if (argc < 2 ||
     ++	    (argc == 2 && !strcmp(argv[1], "-h")))
      +		usage(builtin_maintenance_usage);
      +
      +	if (!strcmp(argv[1], "run"))
     @@ t/t7900-maintenance.sh (new)
      +	test_expect_code 129 git maintenance -h 2>err &&
      +	test_i18ngrep "usage: git maintenance run" err &&
      +	test_expect_code 128 git maintenance barf 2>err &&
     -+	test_i18ngrep "invalid subcommand: barf" err
     ++	test_i18ngrep "invalid subcommand: barf" err &&
     ++	test_expect_code 129 git maintenance 2>err &&
     ++	test_i18ngrep "usage: git maintenance" err
      +'
      +
      +test_expect_success 'run [--auto]' '
  2:  5386d8a628 !  2:  52eb937f49 maintenance: add --quiet option
     @@ Commit message
          Pipe the option to the 'git gc' child process.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## Documentation/git-maintenance.txt ##
      @@ Documentation/git-maintenance.txt: OPTIONS
     @@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char
      
       ## t/t7900-maintenance.sh ##
      @@ t/t7900-maintenance.sh: test_expect_success 'help text' '
     - 	test_i18ngrep "invalid subcommand: barf" err
     + 	test_i18ngrep "usage: git maintenance" err
       '
       
      -test_expect_success 'run [--auto]' '
  3:  e28b332df4 !  3:  3cbdeeafb5 maintenance: replace run_auto_gc()
     @@ Commit message
          documentation to include these options at the same time.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## Documentation/fetch-options.txt ##
      @@ Documentation/fetch-options.txt: ifndef::git-pull[]
  4:  82326c1c38 !  4:  1d2f2731bd maintenance: initialize task array
     @@ Commit message
          Helped-by: Taylor Blau <me@ttaylorr.com>
          Helped-by: Junio C Hamano <gitster@pobox.com>
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## builtin/gc.c ##
      @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_run_opts *opts)
  5:  06984a42bf !  5:  8204ebbf83 maintenance: add commit-graph task
     @@ Commit message
          argument when writing the commit-graph.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## Documentation/git-maintenance.txt ##
      @@ Documentation/git-maintenance.txt: run::
  6:  69298aee24 !  6:  91b8555c9e maintenance: add --task option
     @@ Commit message
          member should be ignored if --task=<task> appears.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## Documentation/git-maintenance.txt ##
      @@ Documentation/git-maintenance.txt: SUBCOMMANDS
  7:  c57998a1c8 !  7:  1a0a3eebb8 maintenance: take a lock on the objects directory
     @@ Commit message
          loop between 'git fetch' and 'git maintenance run --auto'.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## builtin/gc.c ##
      @@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
  8:  dc2a092366 !  8:  713207b4a1 maintenance: create maintenance.<task>.enabled config
     @@ Commit message
          tasks (or turn off the 'gc' task).
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## Documentation/config.txt ##
      @@ Documentation/config.txt: include::config/mailinfo.txt[]
  9:  f5f1e85b03 !  9:  d424cda058 maintenance: use pointers to check --auto
     @@ Commit message
          gc.autoDetach as a maintenance.autoDetach config option.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## builtin/gc.c ##
      @@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_run_opts *opts)
 10:  8f84b03c46 ! 10:  7a2a4e1e52 maintenance: add auto condition for commit-graph task
     @@ Commit message
          run every time.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## Documentation/config/maintenance.txt ##
      @@ Documentation/config/maintenance.txt: maintenance.<task>.enabled::
 11:  2d6414d89b ! 11:  20a74abd96 maintenance: add trace2 regions for task execution
     @@ Commit message
          maintenance: add trace2 regions for task execution
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
     +    Signed-off-by: Junio C Hamano <gitster@pobox.com>
      
       ## builtin/gc.c ##
      @@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)

-- 
gitgitgadget

^ permalink raw reply	[relevance 4%]

* Re: Caching Git Pull
       [not found]     ` <70DB3786-CB8E-4D82-9774-439AB2A79A8D@gmail.com>
@ 2020-09-14  8:39  6%   ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-09-14  8:39 UTC (permalink / raw)
  To: Benson Muite; +Cc: git, Jonathan Tan

Note: resend with plain-text format

Hi Benson,

> On Sep 14, 2020, at 07:27, Benson Muite <benson_muite@emailplus.org> wrote:
> 
> Hi,
> 
> Is there some way I can add functionality for caching git pull to allow continuation of a partially complete pull from a git repository to a local machine. 

I believe there has been some recent work in this direction with the packfile-uri feature [1] [2], where a packfile can be uploaded to a
CDN and then advertised by the hosting remote, so that clients that have enabled the feature can download the bulk of the clone
via the CDN instead.

[1]: https://github.com/git/git/blob/master/Documentation/technical/packfile-uri.txt
[2]: https://public-inbox.org/git/cover.1591821067.git.jonathantanmy@google.com/

However, I don't think any major Git hosting provider (GitHub, Bitbucket, GitLab, etc.) has started using this feature.

> As an example the command wget -c allows continuation of a partially complete download. This would be very helpful for large commits which fail with:
> 
> fatal: the remote end hung up unexpectedly
> fatal: early EOF
> 
> Regards,
> Benson

I will cc Jonathan Tan (the author) to discuss the path toward a resumable git clone.

Cheers,
Son Luong.

^ permalink raw reply	[relevance 6%]

* [PATCH v4 00/11] Maintenance I: Command, gc and commit-graph tasks
  @ 2020-09-04 13:09  4% ` Derrick Stolee via GitGitGadget
  2020-09-17 18:11  4%   ` [PATCH v5 " Derrick Stolee via GitGitGadget
  0 siblings, 1 reply; 122+ results
From: Derrick Stolee via GitGitGadget @ 2020-09-04 13:09 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, Derrick Stolee

This series is based on jc/no-update-fetch-head.

This patch series contains 11 patches that were going to be part of v4 of
ds/maintenance [1], but the discussion has gotten really long. To help focus
the conversation, I'm splitting out the portions that create and test the
'maintenance' builtin from the additional tasks (prefetch, loose-objects,
incremental-repack) that can be brought in later.

[1] 
https://lore.kernel.org/git/pull.671.git.1594131695.gitgitgadget@gmail.com/

As mentioned before, git gc already plays the role of maintaining Git
repositories. It has accumulated several smaller pieces in its long history,
including:

 1. Repacking all reachable objects into one pack-file (and deleting
    unreachable objects).
 2. Packing refs.
 3. Expiring reflogs.
 4. Clearing rerere logs.
 5. Updating the commit-graph file.
 6. Pruning worktrees.

While expiring reflogs, clearing rerere logs, and deleting unreachable
objects are suitable under the guise of "garbage collection", packing refs
and updating the commit-graph file are not as obviously fitting. Further,
these operations are "all or nothing" in that they rewrite almost all
repository data, which does not perform well at extremely large scales.
These operations can also be disruptive to foreground Git commands when git
gc --auto triggers during routine use.

This series does not intend to change what git gc does, but instead creates
new choices for automatic maintenance activities, of which git gc remains
the only one enabled by default.

The new maintenance tasks are:

 * 'commit-graph' : write and verify a single layer of an incremental
   commit-graph.
 * 'loose-objects' : prune packed loose objects, then create a new pack from
   a batch of loose objects.
 * 'pack-files' : expire redundant packs from the multi-pack-index, then
   repack using the multi-pack-index's incremental repack strategy.
 * 'prefetch' : fetch from each remote, storing the refs in
   'refs/prefetch/<remote>/'.

The only included tasks are the 'gc' and 'commit-graph' tasks. The rest will
arrive in a follow-up series. Including the 'commit-graph' task here allows
us to build and test features like config settings and the --task=<task>
command-line argument.

These tasks are all disabled by default, but can be enabled with config
options or run explicitly using "git maintenance run --task=<task>". There are
additional config options to allow customizing the conditions under which the
tasks run with the '--auto' option.

Because 'gc' is implemented as a maintenance task, the most dramatic change
of this series is to convert the 'git gc --auto' calls into 'git maintenance
run --auto' calls at the end of some Git commands. By default, the only
change is that 'git gc --auto' will be run below an additional 'git
maintenance' process.

The 'git maintenance' builtin has a 'run' subcommand so it can be extended
later with subcommands that manage background maintenance, such as 'start'
or 'stop'. These are not the subject of this series, as it is important to
focus on the maintenance activities themselves.

Updates since v3
================

 * Two commit-message typos are fixed.

Thanks for all of the review!

Updates since v2
================

 * Based on jc/no-update-fetch-head instead of jk/strvec.
   
   
 * I realized that the other maintenance subcommands should not accept the
   options for the 'run' subcommand, so I reorganized the option parsing
   into that subcommand. This makes the range-diff noisier than it would
   have been otherwise.
   
   
 * While updating the parsing, I also updated the usage strings.
   
   
 * The "verify, then delete and rewrite on failure" logic is removed. I'll
   consider bringing this back with a way to test the behavior in a later
   patch series.
   
   
 * Other feedback from Jonathan Tan is applied.
   
   

Updates since v1 (of this series)
=================================

 * Documentation fixes.
   
   
 * The builtin code had some slight tweaks in PATCH 1.
   
   

UPDATES since v3 of [1]
=======================

 * The biggest change here is the use of "test_subcommand", based on
   Jonathan Nieder's approach. This requires having the exact command-line
   figured out, which now requires spelling out all --no-[quiet|progress]
   options. I also added a bunch of "2>/dev/null" checks because of the
   isatty(2) calls. Without that, the behavior will change depending on
   whether the test is run with -x/-v or without.
   
   
 * The option parsing has changed to use a local struct and pass that struct
   to the helper methods. This is instead of having a global singleton.
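
For readers unfamiliar with the "test_subcommand" idea mentioned above, here
is a simplified shell sketch of checking a trace2 event log for an exact child
command line. It assumes each child process is logged with its arguments as a
JSON array such as ["git","prune-packed","--quiet"]. The helper name
saw_subcommand is hypothetical, and this is not the helper that the series
adds to t/test-lib-functions.sh.

```shell
# Usage: saw_subcommand <trace-file> <arg>...
# Succeeds if <trace-file> contains the arguments as one JSON array.
# Because the whole array must match, every option has to be spelled
# out exactly as the child process received it.
saw_subcommand() {
	file=$1
	shift
	# Quote each argument and join with commas: "git","gc","--auto"
	expr=$(printf '"%s",' "$@")
	expr=${expr%,}
	# Fixed-string search for the bracketed array.
	grep -qF -- "[$expr]" "$file"
}
```

A test would point GIT_TRACE2_EVENT at a file, run the command under test
with stderr redirected, and then call the helper on that file.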
   
   

Thanks, -Stolee

Cc: sandals@crustytoothpaste.net, steadmon@google.com, jrnieder@gmail.com,
peff@peff.net, congdanhqx@gmail.com, phillip.wood123@gmail.com,
emilyshaffer@google.com, sluongng@gmail.com, jonathantanmy@google.com

Derrick Stolee (11):
  maintenance: create basic maintenance runner
  maintenance: add --quiet option
  maintenance: replace run_auto_gc()
  maintenance: initialize task array
  maintenance: add commit-graph task
  maintenance: add --task option
  maintenance: take a lock on the objects directory
  maintenance: create maintenance.<task>.enabled config
  maintenance: use pointers to check --auto
  maintenance: add auto condition for commit-graph task
  maintenance: add trace2 regions for task execution

 .gitignore                           |   1 +
 Documentation/config.txt             |   2 +
 Documentation/config/maintenance.txt |  16 ++
 Documentation/fetch-options.txt      |   6 +-
 Documentation/git-clone.txt          |   6 +-
 Documentation/git-maintenance.txt    |  79 +++++++
 builtin.h                            |   1 +
 builtin/am.c                         |   2 +-
 builtin/commit.c                     |   2 +-
 builtin/fetch.c                      |   6 +-
 builtin/gc.c                         | 336 +++++++++++++++++++++++++++
 builtin/merge.c                      |   2 +-
 builtin/rebase.c                     |   4 +-
 command-list.txt                     |   1 +
 commit-graph.c                       |   8 +-
 commit-graph.h                       |   1 +
 git.c                                |   1 +
 object.h                             |   1 +
 run-command.c                        |  16 +-
 run-command.h                        |   2 +-
 t/t5510-fetch.sh                     |   2 +-
 t/t5514-fetch-multiple.sh            |   2 +-
 t/t7900-maintenance.sh               |  63 +++++
 t/test-lib-functions.sh              |  33 +++
 24 files changed, 565 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/config/maintenance.txt
 create mode 100644 Documentation/git-maintenance.txt
 create mode 100755 t/t7900-maintenance.sh


base-commit: 887952b8c680626f4721cb5fa57704478801aca4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-695%2Fderrickstolee%2Fmaintenance%2Fbuiltin-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-695/derrickstolee/maintenance/builtin-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/695

Range-diff vs v3:

  1:  aa961af387 =  1:  aa961af387 maintenance: create basic maintenance runner
  2:  5386d8a628 =  2:  5386d8a628 maintenance: add --quiet option
  3:  e28b332df4 =  3:  e28b332df4 maintenance: replace run_auto_gc()
  4:  82326c1c38 =  4:  82326c1c38 maintenance: initialize task array
  5:  06984a42bf =  5:  06984a42bf maintenance: add commit-graph task
  6:  69298aee24 =  6:  69298aee24 maintenance: add --task option
  7:  3d513acdd8 !  7:  c57998a1c8 maintenance: take a lock on the objects directory
     @@ Commit message
          lock is never committed, since it does not represent meaningful data.
          Instead, it is only a placeholder.
      
     -    If the lock file already exists, then fail with a warning. If '--auto'
     -    is specified, then instead no warning is shown and no tasks are attempted.
     -    This will become very important later when we implement the 'prefetch'
     -    task, as this is our stop-gap from creating a recursive process loop
     -    between 'git fetch' and 'git maintenance run --auto'.
     +    If the lock file already exists, then no maintenance tasks are
     +    attempted. This will become very important later when we implement the
     +    'prefetch' task, as this is our stop-gap from creating a recursive process
     +    loop between 'git fetch' and 'git maintenance run --auto'.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
  8:  712f5f2d8e =  8:  dc2a092366 maintenance: create maintenance.<task>.enabled config
  9:  69d3b48fd4 =  9:  f5f1e85b03 maintenance: use pointers to check --auto
 10:  4c3115fe35 ! 10:  8f84b03c46 maintenance: add auto condition for commit-graph task
     @@ Commit message
          option.
      
          To compute the count, use a depth-first search starting at each ref, and
     -    leaving markers using the PARENT1 flag. If this count reaches the limit,
     +    leaving markers using the SEEN flag. If this count reaches the limit,
          then terminate early and start the task. Otherwise, this operation will
          peel every ref and parse the commit it points to. If these are all in
          the commit-graph, then this is typically a very fast operation. Users
 11:  652a8eac57 = 11:  2d6414d89b maintenance: add trace2 regions for task execution

-- 
gitgitgadget

^ permalink raw reply	[relevance 4%]

* [PATCH v3 0/6] [RFC] Maintenance III: background maintenance
  @ 2020-08-28 15:45  2% ` Derrick Stolee via GitGitGadget
  0 siblings, 0 replies; 122+ results
From: Derrick Stolee via GitGitGadget @ 2020-08-28 15:45 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Derrick Stolee

This is based on v3 of Part II (ds/maintenance-part-2) [1].

[1] 
https://lore.kernel.org/git/pull.696.v3.git.1598380599.gitgitgadget@gmail.com/

This RFC is intended to show how I hope to integrate true background
maintenance into Git. As opposed to my original RFC [2], this entirely
integrates with cron (through crontab [-e|-l]) to launch maintenance
commands in the background.

[2] 
https://lore.kernel.org/git/pull.597.git.1585946894.gitgitgadget@gmail.com/

Some preliminary work is done to allow a new --schedule option that tells
the command which tasks to run based on a maintenance.<task>.schedule config
option. The timing is not enforced by Git, but instead is expected to be
provided as a hint from a cron schedule.
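
As a rough illustration of how a --schedule=<frequency> argument could be
matched against each task's maintenance.<task>.schedule value, here is a
minimal shell sketch of the ordering 'hourly' > 'daily' > 'weekly' described
in the range-diff below. The function names schedule_priority and
task_should_run are hypothetical and exist only for this example. The series
itself implements the comparison in C in builtin/gc.c, and this sketch
matches names case-sensitively for brevity.

```shell
# Map a frequency name to a numeric priority; unknown or empty names
# map to 0, meaning "no schedule".
schedule_priority() {
	case "$1" in
	hourly) echo 3 ;;
	daily)  echo 2 ;;
	weekly) echo 1 ;;
	*)      echo 0 ;;
	esac
}

# Usage: task_should_run <task-schedule> <requested-schedule>
# Succeeds if a task configured with <task-schedule> should run when
# maintenance is invoked with --schedule=<requested-schedule>, that is,
# when the task is at least as frequent as the request.
task_should_run() {
	task=$(schedule_priority "$1")
	requested=$(schedule_priority "$2")
	[ "$requested" -gt 0 ] && [ "$task" -ge "$requested" ]
}
```

With this ordering, a task configured as 'hourly' also runs under
--schedule=daily and --schedule=weekly, while a 'weekly' task runs only under
--schedule=weekly.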

A new for-each-repo builtin runs Git commands on every repo in a given list.
Currently, the list is stored as a config setting, allowing a new 
maintenance.repos config list to store the repositories registered for
background maintenance. Others may want to add a --file=<file> option for
their own workflows, but I focused on making this as simple as possible for
now.

The updates to the git maintenance builtin include new register/unregister 
subcommands and start/stop subcommands. The register subcommand initializes
the config while the start subcommand does everything register does plus 
update the cron table. The unregister and stop commands reverse this
process.

The very last patch is entirely optional. It sets a recommended schedule
based on my own experience with very large repositories. I'm open to other
suggestions, but these are ones that I think work well and don't cause a
"rewrite the world" scenario like running nightly 'gc' would do.

I've been testing this scenario on my macOS laptop and my Linux machine for a
while. I have modified my cron task to provide logging via trace2 so I can
see what's happening. A future direction here would be to add some
maintenance logs to the repository so we can track what is happening and
diagnose whether the maintenance strategy is working on real repos.

Note: git maintenance (start|stop) only works on machines with cron by
design. The proper thing to do on Windows will come later. Perhaps this
command should be marked as unavailable on Windows somehow, or should at
least give a better error than "cron may not be available on your system". I
did find that this message is helpful sometimes: macOS worker agents for CI
builds typically do not have cron available.

Updates since RFC v2
====================

 * Update the cron schedule with three lines saying "run hourly except at
   midnight", "run daily except on first day of week", and "run weekly".
   This avoids parallel processes competing for the object database lock.
   
   
 * Update the --schedule=<frequency> and 'maintenance.<task>.schedule' config
   options. This is reflected in the recommended schedule at the end.
   
   
 * Drop the *.lastRun config option. It was going to trash config files but
   it is also not needed by the new cron schedule.
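
To make the three-line cron schedule from the first item above concrete, the
following shell sketch maps an hour (0-23) and a cron day of week (0-6, with
0 as the first day) to the single --schedule value that would fire at the top
of that hour. The function name cron_frequency is hypothetical and is not
part of the series.

```shell
# Usage: cron_frequency <hour> <day-of-week>
# Prints the one frequency whose cron line matches at minute 0:
#   0 1-23 * * *    -> hourly
#   0 0    * * 1-6  -> daily
#   0 0    * * 0    -> weekly
cron_frequency() {
	hour=$1
	dow=$2
	if [ "$hour" -ne 0 ]; then
		echo hourly
	elif [ "$dow" -eq 0 ]; then
		echo weekly
	else
		echo daily
	fi
}
```

Exactly one frequency comes out for any given hour, so the three cron lines
never start maintenance runs at the same time.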
   
   

I expect this to be my final RFC version before restarting the thread with a
v1 next week. Please throw any and all critique at the plan here!

Updates since RFC v1
====================

 * Some fallout from rewriting the option parsing in "Maintenance I"
   
   
 * This applies cleanly on v3 of "Maintenance II"
   
   
 * Several helpful feedback items from Đoàn Trần Công Danh are applied.
   
   
 * There is an unresolved comment around the use of approxidate("now").
   These calls are untouched from v1.
   
   

Thanks, -Stolee

Cc: sandals@crustytoothpaste.net, steadmon@google.com, jrnieder@gmail.com,
peff@peff.net, congdanhqx@gmail.com, phillip.wood123@gmail.com,
emilyshaffer@google.com, sluongng@gmail.com, jonathantanmy@google.com

Derrick Stolee (6):
  maintenance: optionally skip --auto process
  maintenance: add --schedule option and config
  for-each-repo: run subcommands on configured repos
  maintenance: add [un]register subcommands
  maintenance: add start/stop subcommands
  maintenance: recommended schedule in register/start

 .gitignore                           |   1 +
 Documentation/config/maintenance.txt |  10 +
 Documentation/git-for-each-repo.txt  |  59 ++++++
 Documentation/git-maintenance.txt    |  44 +++-
 Makefile                             |   2 +
 builtin.h                            |   1 +
 builtin/for-each-repo.c              |  58 ++++++
 builtin/gc.c                         | 292 ++++++++++++++++++++++++++-
 command-list.txt                     |   1 +
 git.c                                |   1 +
 run-command.c                        |   6 +
 t/helper/test-crontab.c              |  35 ++++
 t/helper/test-tool.c                 |   1 +
 t/helper/test-tool.h                 |   1 +
 t/t0068-for-each-repo.sh             |  30 +++
 t/t7900-maintenance.sh               | 114 ++++++++++-
 t/test-lib.sh                        |   6 +
 17 files changed, 654 insertions(+), 8 deletions(-)
 create mode 100644 Documentation/git-for-each-repo.txt
 create mode 100644 builtin/for-each-repo.c
 create mode 100644 t/helper/test-crontab.c
 create mode 100755 t/t0068-for-each-repo.sh


base-commit: e9bb32f53ade2067f773bfe6e5c13ed1a5d694a6
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-680%2Fderrickstolee%2Fmaintenance%2Fscheduled-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-680/derrickstolee/maintenance/scheduled-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/680

Range-diff vs v2:

 1:  5fdd8188b1 = 1:  5fdd8188b1 maintenance: optionally skip --auto process
 2:  e3ef0b9bea < -:  ---------- maintenance: store the "last run" time in config
 3:  c728c57d85 ! 2:  41a067894d maintenance: add --scheduled option and config
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    maintenance: add --scheduled option and config
     +    maintenance: add --schedule option and config
      
          A user may want to run certain maintenance tasks based on frequency, not
          conditions given in the repository. For example, the user may want to
          perform a 'prefetch' task every hour, or 'gc' task every day. To assist,
     -    update the 'git maintenance run --scheduled' command to check the config
     -    for the last run of that task and add a number of seconds. The task
     -    would then run only if the current time is beyond that minimum
     -    timestamp.
     +    update the 'git maintenance run' command to include a
     +    '--schedule=<frequency>' option. The allowed frequencies are 'hourly',
     +    'daily', and 'weekly'. These values are also allowed in a new config
     +    value 'maintenance.<task>.schedule'.
      
     -    Add a '--scheduled' option to 'git maintenance run' to only run tasks
     -    that have had enough time pass since their last run. This is done for
     -    each enabled task by checking if the current timestamp is at least as
     -    large as the sum of 'maintenance.<task>.lastRun' and
     -    'maintenance.<task>.schedule' in the Git config. This second value is
     -    new to this commit, storing a number of seconds intended between runs.
     +    The 'git maintenance run --schedule=<frequency>' checks the '*.schedule'
     +    config value for each enabled task to see if the configured frequency is
     +    at least as frequent as the frequency from the '--schedule' argument. We
     +    use the following order, for full clarity:
      
     -    A user could then set up an hourly maintenance run with the following
     -    cron table:
     +            'hourly' > 'daily' > 'weekly'
      
     -      0 * * * * git -C <repo> maintenance run --scheduled
     +    Use new 'enum schedule_priority' to track these values numerically.
      
     -    Then, the user could configure the repository with the following config
     -    values:
     +    The following cron table would run the scheduled tasks with the correct
     +    frequencies:
      
     -      maintenance.prefetch.schedule  3000
     -      maintenance.gc.schedule       86000
     +      0 1-23 * * *    git -C <repo> maintenance run --scheduled=hourly
     +      0 0    * * 1-6  git -C <repo> maintenance run --scheduled=daily
     +      0 0    * * 0    git -C <repo> maintenance run --scheduled=weekly
      
     -    These numbers are slightly lower than one hour and one day (in seconds).
     -    The cron schedule will enforce the hourly run rate, but we can use these
     -    schedules to ensure the 'gc' task runs once a day. The error is given
     -    because the *.lastRun config option is specified at the _start_ of the
     -    task run. Otherwise, a slow task run could shift the "daily" job of 'gc'
     -    from a 10:00pm run to 11:00pm run, or later.
     +    This cron schedule will run --scheduled=hourly every hour except at
     +    midnight. This avoids a concurrent run with the --scheduled=daily that
     +    runs at midnight every day except the first day of the week. This avoids
     +    a concurrent run with the --scheduled=weekly that runs at midnight on
     +    the first day of the week. Since --scheduled=daily also runs the
     +    'hourly' tasks and --scheduled=weekly runs the 'hourly' and 'daily'
     +    tasks, we will still see all tasks run with the proper frequencies.
      
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/config/maintenance.txt ##
     -@@ Documentation/config/maintenance.txt: maintenance.<task>.lastRun::
     - 	`<task>` is run. It stores a timestamp representing the most-recent
     - 	run of the `<task>`.
     +@@ Documentation/config/maintenance.txt: maintenance.<task>.enabled::
     + 	`--task` option exists. By default, only `maintenance.gc.enabled`
     + 	is true.
       
      +maintenance.<task>.schedule::
      +	This config option controls whether or not the given `<task>` runs
     -+	during a `git maintenance run --scheduled` command. If the option
     -+	is an integer value `S`, then the `<task>` is run when the current
     -+	time is `S` seconds after the timestamp stored in
     -+	`maintenance.<task>.lastRun`. If the option has no value or a
     -+	non-integer value, then the task will never run with the `--scheduled`
     -+	option.
     ++	during a `git maintenance run --schedule=<frequency>` command. The
     ++	value must be one of "hourly", "daily", or "weekly".
      +
       maintenance.commit-graph.auto::
       	This integer config option controls how often the `commit-graph` task
     @@ Documentation/git-maintenance.txt: OPTIONS
       	in the `gc.auto` config setting, or when the number of pack-files
      -	exceeds the `gc.autoPackLimit` config setting.
      +	exceeds the `gc.autoPackLimit` config setting. Not compatible with
     -+	the `--scheduled` option.
     ++	the `--schedule` option.
      +
     -+--scheduled::
     ++--schedule::
      +	When combined with the `run` subcommand, run maintenance tasks
      +	only if certain time conditions are met, as specified by the
      +	`maintenance.<task>.schedule` config value for each `<task>`.
     @@ Documentation/git-maintenance.txt: OPTIONS
      
       ## builtin/gc.c ##
      @@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
     + 	return 0;
       }
       
     - static const char * const builtin_maintenance_run_usage[] = {
     +-static const char * const builtin_maintenance_run_usage[] = {
      -	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>]"),
     -+	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>] [--scheduled]"),
     ++static const char *const builtin_maintenance_run_usage[] = {
     ++	N_("git maintenance run [--auto] [--[no-]quiet] [--task=<task>] [--schedule]"),
       	NULL
       };
       
     ++enum schedule_priority {
     ++	SCHEDULE_NONE = 0,
     ++	SCHEDULE_WEEKLY = 1,
     ++	SCHEDULE_DAILY = 2,
     ++	SCHEDULE_HOURLY = 3,
     ++};
     ++
     ++static enum schedule_priority parse_schedule(const char *value)
     ++{
     ++	if (!value)
     ++		return SCHEDULE_NONE;
     ++	if (!strcasecmp(value, "hourly"))
     ++		return SCHEDULE_HOURLY;
     ++	if (!strcasecmp(value, "daily"))
     ++		return SCHEDULE_DAILY;
     ++	if (!strcasecmp(value, "weekly"))
     ++		return SCHEDULE_WEEKLY;
     ++	return SCHEDULE_NONE;
     ++}
     ++
     ++static int maintenance_opt_schedule(const struct option *opt, const char *arg,
     ++				    int unset)
     ++{
     ++	enum schedule_priority *priority = opt->value;
     ++
     ++	if (unset)
     ++		die(_("--no-schedule is not allowed"));
     ++
     ++	*priority = parse_schedule(arg);
     ++
     ++	if (!*priority)
     ++		die(_("unrecognized --schedule argument '%s'"), arg);
     ++
     ++	return 0;
     ++}
     ++
       struct maintenance_run_opts {
       	int auto_flag;
     -+	int scheduled;
       	int quiet;
     ++	enum schedule_priority schedule;
       };
       
     + /* Remember to update object flag allocation in object.h */
      @@ builtin/gc.c: struct maintenance_task {
     - 	const char *name;
     - 	maintenance_task_fn *fn;
       	maintenance_auto_fn *auto_condition;
     --	unsigned enabled:1;
     -+	unsigned enabled:1,
     -+		 scheduled:1;
     + 	unsigned enabled:1;
       
     ++	enum schedule_priority schedule;
     ++
       	/* -1 if not selected. */
       	int selected_order;
     + };
      @@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
     - 		     !tasks[i].auto_condition()))
       			continue;
       
     -+		if (opts->scheduled && !tasks[i].scheduled)
     + 		if (opts->auto_flag &&
     +-		    (!tasks[i].auto_condition ||
     +-		     !tasks[i].auto_condition()))
     ++		    (!tasks[i].auto_condition || !tasks[i].auto_condition()))
      +			continue;
      +
     - 		update_last_run(&tasks[i]);
     ++		if (opts->schedule && tasks[i].schedule < opts->schedule)
     + 			continue;
       
       		trace2_region_enter("maintenance", tasks[i].name, r);
     -@@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts)
     - 	return result;
     - }
     - 
     -+static void fill_schedule_info(struct maintenance_task *task,
     -+			       const char *config_name,
     -+			       timestamp_t schedule_delay)
     -+{
     -+	timestamp_t now = approxidate("now");
     -+	char *value = NULL;
     -+	struct strbuf last_run = STRBUF_INIT;
     -+	int64_t previous_run;
     -+
     -+	strbuf_addf(&last_run, "maintenance.%s.lastrun", task->name);
     -+
     -+	if (git_config_get_string(last_run.buf, &value))
     -+		task->scheduled = 1;
     -+	else {
     -+		previous_run = git_config_int64(last_run.buf, value);
     -+		if (now >= previous_run + schedule_delay)
     -+			task->scheduled = 1;
     -+	}
     -+
     -+	free(value);
     -+	strbuf_release(&last_run);
     -+}
     -+
     - static void initialize_task_config(void)
     - {
     - 	int i;
      @@ builtin/gc.c: static void initialize_task_config(void)
       
       	for (i = 0; i < TASK__COUNT; i++) {
     @@ builtin/gc.c: static void initialize_task_config(void)
      +			    tasks[i].name);
      +
      +		if (!git_config_get_string(config_name.buf, &config_str)) {
     -+			timestamp_t schedule_delay = git_config_int64(
     -+							config_name.buf,
     -+							config_str);
     -+			fill_schedule_info(&tasks[i],
     -+						config_name.buf,
     -+						schedule_delay);
     ++			tasks[i].schedule = parse_schedule(config_str);
      +			free(config_str);
      +		}
       	}
     @@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char
       	struct option builtin_maintenance_run_options[] = {
       		OPT_BOOL(0, "auto", &opts.auto_flag,
       			 N_("run tasks based on the state of the repository")),
     -+		OPT_BOOL(0, "scheduled", &opts.scheduled,
     -+			 N_("run tasks based on time intervals")),
     ++		OPT_CALLBACK(0, "schedule", &opts.schedule, N_("frequency"),
     ++			     N_("run tasks based on frequency"),
     ++			     maintenance_opt_schedule),
       		OPT_BOOL(0, "quiet", &opts.quiet,
       			 N_("do not report progress or other information over stderr")),
       		OPT_CALLBACK_F(0, "task", NULL, N_("task"),
     @@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char
       			     builtin_maintenance_run_usage,
       			     PARSE_OPT_STOP_AT_NON_OPTION);
       
     -+	if (opts.auto_flag + opts.scheduled > 1)
     -+		die(_("use at most one of the --auto and --scheduled options"));
     ++	if (opts.auto_flag && opts.schedule)
     ++		die(_("use at most one of --auto and --schedule=<frequency>"));
      +
       	if (argc != 0)
       		usage_with_options(builtin_maintenance_run_usage,
     @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
       	done
       '
       
     -+test_expect_success '--auto and --scheduled incompatible' '
     -+	test_must_fail git maintenance run --auto --scheduled 2>err &&
     ++test_expect_success '--auto and --schedule incompatible' '
     ++	test_must_fail git maintenance run --auto --schedule=daily 2>err &&
      +	test_i18ngrep "at most one" err
      +'
      +
     - test_expect_success 'tasks update maintenance.<task>.lastRun' '
     - 	git config --unset maintenance.commit-graph.lastrun &&
     - 	GIT_TRACE2_EVENT="$(pwd)/run.txt" \
     -@@ t/t7900-maintenance.sh: test_expect_success 'tasks update maintenance.<task>.lastRun' '
     - 	test_cmp_config 1595000000 maintenance.commit-graph.lastrun
     - '
     - 
     -+test_expect_success '--scheduled with specific time' '
     -+	git config maintenance.commit-graph.schedule 100 &&
     -+	GIT_TRACE2_EVENT="$(pwd)/too-soon.txt" \
     -+		GIT_TEST_DATE_NOW=1595000099 \
     -+		git maintenance run --scheduled 2>/dev/null &&
     ++test_expect_success 'invalid --schedule value' '
     ++	test_must_fail git maintenance run --schedule=annually 2>err &&
     ++	test_i18ngrep "unrecognized --schedule" err
     ++'
     ++
     ++test_expect_success '--schedule inheritance weekly -> daily -> hourly' '
     ++	git config maintenance.loose-objects.enabled true &&
     ++	git config maintenance.loose-objects.schedule hourly &&
     ++	git config maintenance.commit-graph.enabled true &&
     ++	git config maintenance.commit-graph.schedule daily &&
     ++	git config maintenance.incremental-repack.enabled true &&
     ++	git config maintenance.incremental-repack.schedule weekly &&
     ++
     ++	GIT_TRACE2_EVENT="$(pwd)/hourly.txt" \
     ++		git maintenance run --schedule=hourly 2>/dev/null &&
     ++	test_subcommand git prune-packed --quiet <hourly.txt &&
      +	test_subcommand ! git commit-graph write --split --reachable \
     -+		--no-progress <too-soon.txt &&
     -+	GIT_TRACE2_EVENT="$(pwd)/long-enough.txt" \
     -+		GIT_TEST_DATE_NOW=1595000100 \
     -+		git maintenance run --scheduled 2>/dev/null &&
     ++		--no-progress <hourly.txt &&
     ++	test_subcommand ! git multi-pack-index write --no-progress <hourly.txt &&
     ++
     ++	GIT_TRACE2_EVENT="$(pwd)/daily.txt" \
     ++		git maintenance run --schedule=daily 2>/dev/null &&
     ++	test_subcommand git prune-packed --quiet <daily.txt &&
     ++	test_subcommand git commit-graph write --split --reachable \
     ++		--no-progress <daily.txt &&
     ++	test_subcommand ! git multi-pack-index write --no-progress <daily.txt &&
     ++
     ++	GIT_TRACE2_EVENT="$(pwd)/weekly.txt" \
     ++		git maintenance run --schedule=weekly 2>/dev/null &&
     ++	test_subcommand git prune-packed --quiet <weekly.txt &&
      +	test_subcommand git commit-graph write --split --reachable \
     -+		--no-progress <long-enough.txt &&
     -+	test_cmp_config 1595000100 maintenance.commit-graph.lastrun
     ++		--no-progress <weekly.txt &&
     ++	test_subcommand git multi-pack-index write --no-progress <weekly.txt
      +'
      +
       test_done
 4:  0314258c5c = 3:  b29b68614b for-each-repo: run subcommands on configured repos
 5:  c0ce1267a9 ! 4:  fc741fab5a maintenance: add [un]register subcommands
     @@ t/t7900-maintenance.sh: GIT_TEST_MULTI_PACK_INDEX=0
       	test_expect_code 128 git maintenance barf 2>err &&
       	test_i18ngrep "invalid subcommand: barf" err
       '
     -@@ t/t7900-maintenance.sh: test_expect_success '--scheduled with specific time' '
     - 	test_cmp_config 1595000100 maintenance.commit-graph.lastrun
     +@@ t/t7900-maintenance.sh: test_expect_success '--schedule inheritance weekly -> daily -> hourly' '
     + 	test_subcommand git multi-pack-index write --no-progress <weekly.txt
       '
       
      +test_expect_success 'register and unregister' '
 6:  8a7c34035a ! 5:  e9672c6a6c maintenance: add start/stop subcommands
     @@ Commit message
          maintenance using 'cron', when available. This integration is as simple
          as I could make it, barring some implementation complications.
      
     -    For now, the background maintenance is scheduled to run hourly via the
     -    following cron table row (ignore line breaks):
     +    The schedule is laid out as follows:
      
     -            0 * * * * $p/git --exec-path=$p
     -                    for-each-repo --config=maintenance.repo
     -                    maintenance run --scheduled
     +      0 1-23 * * *   $cmd maintenance run --schedule=hourly
     +      0 0    * * 1-6 $cmd maintenance run --schedule=daily
     +      0 0    * * 0   $cmd maintenance run --schedule=weekly
      
     -    Future extensions may want to add more complex schedules or some form of
     -    logging. For now, hourly runs seem frequent enough to satisfy the needs
     -    of tasks like 'prefetch' without being so frequent that users would
     -    complain about many no-op commands.
     +    where $cmd is a properly-qualified 'git for-each-repo' execution:
      
     -    Here, "$p" is a placeholder for the path to the current Git executable.
     -    This is critical for systems with multiple versions of Git.
     -    Specifically, macOS has a system version at '/usr/bin/git' while the
     -    version that users can install resides at '/usr/local/bin/git' (symlinked
     -    to '/usr/local/libexec/git-core/git'). This will also use your
     -    locally-built version if you build and run this in your development
     +    $cmd=$path/git --exec-path=$path for-each-repo --config=maintenance.repo
     +
     +    where $path points to the location of the Git executable running 'git
     +    maintenance start'. This is critical for systems with multiple versions
     +    of Git. Specifically, macOS has a system version at '/usr/bin/git' while
     +    the version that users can install resides at '/usr/local/bin/git'
     +    (symlinked to '/usr/local/libexec/git-core/git'). This will also use
     +    your locally-built version if you build and run this in your development
          environment without installing first.
      
     +    This conditional schedule avoids having cron launch multiple 'git
     +    for-each-repo' commands in parallel. Such parallel commands would likely
     +    lead to the 'hourly' and 'daily' tasks competing over the object
     +    database lock. This could lead to some tasks never being run! Since
     +    the --schedule=<frequency> argument will run all tasks with _at least_
     +    the given frequency, the daily runs will also run the hourly tasks.
     +    Similarly, the weekly runs will also run the daily and hourly tasks.
     +
          The GIT_TEST_CRONTAB environment variable is not intended for users to
          edit, but instead as a way to mock the 'crontab [-l]' command. This
          variable is set in test-lib.sh to avoid a future test from accidentally
     @@ builtin/gc.c: static int maintenance_unregister(void)
      +	}
      +
      +	if (run_maintenance) {
     ++		struct strbuf line_format = STRBUF_INIT;
      +		const char *exec_path = git_exec_path();
      +
     -+		fprintf(cron_in, "\n%s\n", BEGIN_LINE);
     -+		fprintf(cron_in, "# The following schedule was created by Git\n");
     ++		fprintf(cron_in, "%s\n", BEGIN_LINE);
     ++		fprintf(cron_in,
     ++			"# The following schedule was created by Git\n");
      +		fprintf(cron_in, "# Any edits made in this region might be\n");
     -+		fprintf(cron_in, "# replaced in the future by a Git command.\n\n");
     -+
      +		fprintf(cron_in,
     -+			"0 * * * * \"%s/git\" --exec-path=\"%s\" for-each-repo --config=maintenance.repo maintenance run --scheduled\n",
     -+			exec_path, exec_path);
     ++			"# replaced in the future by a Git command.\n\n");
     ++
     ++		strbuf_addf(&line_format,
     ++			    "%%s %%s * * %%s \"%s/git\" --exec-path=\"%s\" for-each-repo --config=maintenance.repo maintenance run --schedule=%%s\n",
     ++			    exec_path, exec_path);
     ++		fprintf(cron_in, line_format.buf, "0", "1-23", "*", "hourly");
     ++		fprintf(cron_in, line_format.buf, "0", "0", "1-6", "daily");
     ++		fprintf(cron_in, line_format.buf, "0", "0", "0", "weekly");
     ++		strbuf_release(&line_format);
      +
      +		fprintf(cron_in, "\n%s\n", END_LINE);
      +	}
     @@ t/t7900-maintenance.sh: test_expect_success 'register and unregister' '
      +	# start registers the repo
      +	git config --get --global maintenance.repo "$(pwd)" &&
      +
     -+	grep "for-each-repo --config=maintenance.repo maintenance run --scheduled" cron.txt
     ++	grep "for-each-repo --config=maintenance.repo maintenance run --schedule=daily" cron.txt &&
     ++	grep "for-each-repo --config=maintenance.repo maintenance run --schedule=hourly" cron.txt &&
     ++	grep "for-each-repo --config=maintenance.repo maintenance run --schedule=weekly" cron.txt
      +'
      +
      +test_expect_success 'stop from existing schedule' '
     @@ t/t7900-maintenance.sh: test_expect_success 'register and unregister' '
      +	# stop does not unregister the repo
      +	git config --get --global maintenance.repo "$(pwd)" &&
      +
     -+	# The newline is preserved
     -+	echo >empty &&
     -+	test_cmp empty cron.txt &&
     -+
      +	# Operation is idempotent
      +	GIT_TEST_CRONTAB="test-tool crontab cron.txt" git maintenance stop &&
     -+	test_cmp empty cron.txt
     ++	test_must_be_empty cron.txt
      +'
      +
      +test_expect_success 'start preserves existing schedule' '
 7:  9ecabeb055 ! 6:  62e8db8b2a maintenance: recommended schedule in register/start
     @@ Commit message
          repository. It does not specify what maintenance should occur or how
          often.
      
     -    If a user sets any 'maintenance.<task>.scheduled' config value, then
     +    If a user sets any 'maintenance.<task>.schedule' config value, then
          they have chosen a specific schedule for themselves and Git should
          respect that.
      
     @@ Commit message
          schedule we use in Scalar and VFS for Git for very large repositories
          using the GVFS protocol. While the schedule works in that environment,
          it is possible that "normal" Git repositories could benefit from
     -    something more obvious (such as running 'gc' once a day). However, this
     +    something more obvious (such as running 'gc' weekly). However, this
          patch gives us a place to start a conversation on what we should
          recommend. For my purposes, Scalar will set these config values so we
          can always differ from core Git's recommendations.
     @@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char
      +	prefix = config_name.len;
      +
      +	for (i = 0; !found && i < TASK__COUNT; i++) {
     -+		int value;
     ++		char *value;
      +
      +		strbuf_setlen(&config_name, prefix);
      +		strbuf_addf(&config_name, "%s.schedule", tasks[i].name);
      +
     -+		if (!git_config_get_int(config_name.buf, &value))
     ++		if (!git_config_get_string(config_name.buf, &value)) {
      +			found = 1;
     ++			FREE_AND_NULL(value);
     ++		}
      +	}
      +
      +	strbuf_release(&config_name);
     @@ builtin/gc.c: static int maintenance_run(int argc, const char **argv, const char
      +	git_config_set("maintenance.gc.enabled", "false");
      +
      +	git_config_set("maintenance.prefetch.enabled", "true");
     -+	git_config_set("maintenance.prefetch.schedule", "3500");
     ++	git_config_set("maintenance.prefetch.schedule", "hourly");
      +
      +	git_config_set("maintenance.commit-graph.enabled", "true");
     -+	git_config_set("maintenance.commit-graph.schedule", "3500");
     ++	git_config_set("maintenance.commit-graph.schedule", "hourly");
      +
      +	git_config_set("maintenance.loose-objects.enabled", "true");
     -+	git_config_set("maintenance.loose-objects.schedule", "86000");
     ++	git_config_set("maintenance.loose-objects.schedule", "daily");
      +
      +	git_config_set("maintenance.incremental-repack.enabled", "true");
     -+	git_config_set("maintenance.incremental-repack.schedule", "86000");
     ++	git_config_set("maintenance.incremental-repack.schedule", "daily");
      +}
      +
       static int maintenance_register(void)
     @@ builtin/gc.c: static int maintenance_register(void)
       	if (!the_repository || !the_repository->gitdir)
       		return 0;
       
     -+	if (has_schedule_config())
     ++	if (!has_schedule_config())
      +		set_recommended_schedule();
      +
       	config_get.git_cmd = 1;
     @@ builtin/gc.c: static int maintenance_register(void)
      
       ## t/t7900-maintenance.sh ##
      @@ t/t7900-maintenance.sh: test_expect_success 'register and unregister' '
     + 	git config --global --add maintenance.repo /existing1 &&
       	git config --global --add maintenance.repo /existing2 &&
       	git config --global --get-all maintenance.repo >before &&
     ++
     ++	# We still have maintenance.<task>.schedule config set,
     ++	# so this does not update the local schedule
     ++	git maintenance register &&
     ++	test_must_fail git config maintenance.auto &&
     ++
     ++	# Clear previous maintenance.<task>.schedule values
     ++	for task in loose-objects commit-graph incremental-repack
     ++	do
     ++		git config --unset maintenance.$task.schedule || return 1
     ++	done &&
       	git maintenance register &&
      +	test_cmp_config false maintenance.auto &&
      +	test_cmp_config false maintenance.gc.enabled &&
      +	test_cmp_config true maintenance.prefetch.enabled &&
     -+	test_cmp_config 3500 maintenance.commit-graph.schedule &&
     -+	test_cmp_config 86000 maintenance.incremental-repack.schedule &&
     ++	test_cmp_config hourly maintenance.commit-graph.schedule &&
     ++	test_cmp_config daily maintenance.incremental-repack.schedule &&
       	git config --global --get-all maintenance.repo >actual &&
       	cp before after &&
       	pwd >>after &&
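
The '--schedule inheritance weekly -> daily -> hourly' test in the range-diff above relies on one rule: a run at a given frequency also runs every task configured with a more frequent schedule. A minimal sketch of that rule follows. The names schedule_freq, parse_freq() and task_is_due() are hypothetical and chosen for illustration; they are not taken from the patch.

```c
#include <string.h>

/*
 * Hypothetical enum for illustration only.
 * A larger value means the task is due more often.
 */
enum schedule_freq {
	FREQ_NONE = 0,
	FREQ_WEEKLY = 1,
	FREQ_DAILY = 2,
	FREQ_HOURLY = 3
};

/* Map a schedule string to a frequency; anything unknown is FREQ_NONE. */
static enum schedule_freq parse_freq(const char *value)
{
	if (!value)
		return FREQ_NONE;
	if (!strcmp(value, "hourly"))
		return FREQ_HOURLY;
	if (!strcmp(value, "daily"))
		return FREQ_DAILY;
	if (!strcmp(value, "weekly"))
		return FREQ_WEEKLY;
	return FREQ_NONE;
}

/*
 * A task is due when it is configured to run at least as often
 * as the requested run, so a weekly run picks up daily and
 * hourly tasks as well.
 */
static int task_is_due(const char *task_schedule, const char *requested)
{
	enum schedule_freq task = parse_freq(task_schedule);
	enum schedule_freq run = parse_freq(requested);

	return task != FREQ_NONE && run != FREQ_NONE && task >= run;
}
```

With the configuration from the test (loose-objects hourly, commit-graph daily, incremental-repack weekly), task_is_due("daily", "hourly") is false while task_is_due("hourly", "weekly") is true, which matches the subcommands the test expects in each trace file.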

-- 
gitgitgadget


* Re: [PATCH v3 0/8] Maintenance II: prefetch, loose-objects, incremental-repack tasks
    2020-08-25 18:36  3%     ` [PATCH v3 7/8] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
@ 2020-08-26 15:15  6%     ` Son Luong Ngoc
    2 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-08-26 15:15 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, Jeff King, congdanhqx,
	phillip.wood123, Emily Shaffer, Jonathan Tan, Derrick Stolee

Hi Derrick,

> On Aug 25, 2020, at 20:36, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
> 

...

> 
> Updates since v2
> ================
> 
> * Dropped "fetch: optionally allow disabling FETCH_HEAD update"
> 
> 
> * A lot of fallout from the change in the option parsing in v3 of
>   Maintenance II.
> 
> 
> * Dropped the "verify, and delete and rewrite on failure" logic from the
>   incremental-repack task. This might be added again later after it can be
>   tested more thoroughly.

Perhaps I missed some conversation related to this change, but
why was this verify-rewrite strategy dropped?

Is the problem that this strategy was created to solve no longer a concern?

I feel it would be much better to add it and then remove it in a separate commit.
That way we can follow the reasoning behind these decisions via the commit messages.

> 

...

> 
> -- 
> gitgitgadget

Thanks,
Son Luong.


* [PATCH v3 7/8] maintenance: auto-size incremental-repack batch
  @ 2020-08-25 18:36  3%     ` Derrick Stolee via GitGitGadget
  2020-08-26 15:15  6%     ` [PATCH v3 0/8] Maintenance II: prefetch, loose-objects, incremental-repack tasks Son Luong Ngoc
    2 siblings, 0 replies; 122+ results
From: Derrick Stolee via GitGitGadget @ 2020-08-25 18:36 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When repacking during the 'incremental-repack' task, we use the
--batch-size option in 'git multi-pack-index repack'. The initial setting
used --batch-size=0 to repack everything into a single pack-file. This is
not sustainable for a large repository. The amount of work required is
also likely to use too many system resources for a background job.

Update the 'incremental-repack' task by dynamically computing a
--batch-size option based on the current pack-file structure.

The dynamic default size is computed with this idea in mind for a client
repository that was cloned from a very large remote: there is likely one
"big" pack-file that was created at clone time. Thus, do not try
repacking it as it is likely packed efficiently by the server.

Instead, we select the second-largest pack-file, and create a batch size
that is one larger than that pack-file. If there are three or more
pack-files, then this guarantees that at least two will be combined into
a new pack-file.

Of course, this means that the second-largest pack-file size is likely
to grow over time and may eventually surpass the initially-cloned
pack-file. Recall that the pack-file batch is selected in a greedy
manner: the packs are considered from oldest to newest and are selected
if they have size smaller than the batch size until the total selected
size is larger than the batch size. Thus, that oldest "clone" pack will
be first to repack after the new data creates a pack larger than that.
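
The greedy selection described in the paragraph above can be sketched as follows. This is only an illustration of the description, not the code in the multi-pack-index builtin: the function name select_batch() and the plain array of sizes are assumptions, and the special meaning of --batch-size=0 (repack everything) is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Greedy batch selection, assuming 'sizes' lists the pack sizes
 * from oldest to newest. Sets include[i] to 1 for each selected
 * pack and returns how many packs were selected.
 */
static size_t select_batch(const uint64_t *sizes, size_t n,
			   uint64_t batch_size, unsigned char *include)
{
	uint64_t total = 0;
	size_t selected = 0;
	size_t i;

	for (i = 0; i < n; i++)
		include[i] = 0;

	/* Stop once the selected packs reach the batch size. */
	for (i = 0; i < n && total < batch_size; i++) {
		if (sizes[i] >= batch_size)
			continue; /* too large, e.g. the clone pack */
		include[i] = 1;
		total += sizes[i];
		selected++;
	}

	return selected;
}
```

For sizes {1000, 10, 20, 30} and a batch size of 31 (second-largest plus one), the three small packs are selected and the 1000-byte pack is left alone. With only two packs, {100, 50} and a batch size of 51, a single pack qualifies, so there is nothing to combine.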

We also want to place some limits on how large these pack-files become,
in order to bound the amount of time spent repacking. A maximum
batch-size of two gigabytes means that large repositories will never be
packed into a single pack-file using this job, but also that repack is
rather expensive. This is a trade-off that is valuable to have if the
maintenance is being run automatically or in the background. Users who
truly want to optimize for space and performance (and are willing to pay
the upfront cost of a full repack) can use the 'gc' task to do so.

Create a test for this two gigabyte limit by creating an EXPENSIVE test
that generates two pack-files of roughly 2.5 gigabytes in size, then
performs an incremental repack. Check that the --batch-size argument in
the subcommand uses the hard-coded maximum.
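
As a worked example of why the test expects --batch-size=2147483647, here is a standalone sketch of the same second-largest-plus-one computation over a plain array. The names auto_batch_size() and BATCH_LIMIT are made up for this illustration, and the pack sizes used below are the raw sizes of the generated files; the real pack-files will differ by some overhead.

```c
#include <stddef.h>
#include <stdint.h>

/* Same value as the hard-coded maximum: INT32_MAX bytes. */
#define BATCH_LIMIT ((uint64_t)INT32_MAX)

/*
 * Return one more than the second-largest size, capped at
 * BATCH_LIMIT. With fewer than two packs the result is 1.
 */
static uint64_t auto_batch_size(const uint64_t *sizes, size_t n)
{
	uint64_t largest = 0, second = 0, result;
	size_t i;

	for (i = 0; i < n; i++) {
		if (sizes[i] > largest) {
			second = largest;
			largest = sizes[i];
		} else if (sizes[i] > second) {
			second = sizes[i];
		}
	}

	result = second + 1;
	return result > BATCH_LIMIT ? BATCH_LIMIT : result;
}
```

Each commit in the test adds 5 * (512 * 1024 * 1024 + 1) = 2684354565 bytes of random data. With two packs of about that size, the second-largest size plus one exceeds INT32_MAX, so the result is clamped to 2147483647.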

Helped-by: Chris Torek <chris.torek@gmail.com>
Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c           | 43 +++++++++++++++++++++++++++++++++++++++++-
 t/t7900-maintenance.sh | 36 +++++++++++++++++++++++++++++++++--
 2 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index fbf84996fa..e043403400 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1036,6 +1036,46 @@ static int multi_pack_index_expire(struct maintenance_run_opts *opts)
 	return 0;
 }
 
+#define TWO_GIGABYTES (INT32_MAX)
+
+static off_t get_auto_pack_size(void)
+{
+	/*
+	 * The "auto" value is special: we optimize for
+	 * one large pack-file (i.e. from a clone) and
+	 * expect the rest to be small and they can be
+	 * repacked quickly.
+	 *
+	 * The strategy we select here is to select a
+	 * size that is one more than the second largest
+	 * pack-file. This ensures that we will repack
+	 * at least two packs if there are three or more
+	 * packs.
+	 */
+	off_t max_size = 0;
+	off_t second_largest_size = 0;
+	off_t result_size;
+	struct packed_git *p;
+	struct repository *r = the_repository;
+
+	reprepare_packed_git(r);
+	for (p = get_all_packs(r); p; p = p->next) {
+		if (p->pack_size > max_size) {
+			second_largest_size = max_size;
+			max_size = p->pack_size;
+		} else if (p->pack_size > second_largest_size)
+			second_largest_size = p->pack_size;
+	}
+
+	result_size = second_largest_size + 1;
+
+	/* But limit ourselves to a batch size of 2g */
+	if (result_size > TWO_GIGABYTES)
+		result_size = TWO_GIGABYTES;
+
+	return result_size;
+}
+
 static int multi_pack_index_repack(struct maintenance_run_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -1046,7 +1086,8 @@ static int multi_pack_index_repack(struct maintenance_run_opts *opts)
 	if (opts->quiet)
 		strvec_push(&child.args, "--no-progress");
 
-	strvec_push(&child.args, "--batch-size=0");
+	strvec_pushf(&child.args, "--batch-size=%"PRIuMAX,
+				  (uintmax_t)get_auto_pack_size());
 
 	close_object_store(the_repository->objects);
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index dde28cf837..5c08afc19a 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -182,10 +182,42 @@ test_expect_success 'incremental-repack task' '
 	test_line_count = 4 packs-between &&
 
 	# the job deletes the two old packs, and does not write
-	# a new one because only one pack remains.
+	# a new one because the batch size is not high enough to
+	# pack the largest pack-file.
 	git maintenance run --task=incremental-repack &&
 	ls .git/objects/pack/*.pack >packs-after &&
-	test_line_count = 1 packs-after
+	test_line_count = 2 packs-after
+'
+
+test_expect_success EXPENSIVE 'incremental-repack 2g limit' '
+	for i in $(test_seq 1 5)
+	do
+		test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) >>big ||
+		return 1
+	done &&
+	git add big &&
+	git commit -m "Add big file (1)" &&
+
+	# ensure any possible loose objects are in a pack-file
+	git maintenance run --task=loose-objects &&
+
+	rm big &&
+	for i in $(test_seq 6 10)
+	do
+		test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) >>big ||
+		return 1
+	done &&
+	git add big &&
+	git commit -m "Add big file (2)" &&
+
+	# ensure any possible loose objects are in a pack-file
+	git maintenance run --task=loose-objects &&
+
+	# Now run the incremental-repack task and check the batch-size
+	GIT_TRACE2_EVENT="$(pwd)/run-2g.txt" git maintenance run \
+		--task=incremental-repack 2>/dev/null &&
+	test_subcommand git multi-pack-index repack \
+		 --no-progress --batch-size=2147483647 <run-2g.txt
 '
 
 test_done
-- 
gitgitgadget



* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  @ 2020-08-25  7:55  6% ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-08-25  7:55 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee

Hi Taylor,

Thanks for working on this.

> On Aug 25, 2020, at 04:01, Taylor Blau <me@ttaylorr.com> wrote:
> 
> In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
> learned to remove a multi-pack-index file if it added or removed a pack
> from the object store.
> 
> This mechanism is a little over-eager, since it is only necessary to
> drop a MIDX if 'git repack' removes a pack that the MIDX references.
> Adding a pack outside of the MIDX does not require invalidating the
> MIDX, and likewise for removing a pack the MIDX does not know about.

I wonder if it's worth triggering write_midx_file() to update the MIDX instead of
just removing it?

That is already the direction we are taking in the 'maintenance' patch series
whenever the multi-pack-index file is deemed invalid.

Or perhaps we could check the 'core.multiPackIndex' value (which recently became
'true' by default) to determine whether we should remove the MIDX or rewrite
it?

> 
> Teach 'git repack' to check for this by loading the MIDX, and checking
> whether the to-be-removed pack is known to the MIDX. This requires a
> slightly odd alteration to a test in t5319, which is explained with a
> comment.
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>

Cheers,
Son Luong.


* [PATCH v2 8/9] maintenance: auto-size incremental-repack batch
  @ 2020-08-18 14:25  3%   ` Derrick Stolee via GitGitGadget
    1 sibling, 0 replies; 122+ results
From: Derrick Stolee via GitGitGadget @ 2020-08-18 14:25 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When repacking during the 'incremental-repack' task, we use the
--batch-size option in 'git multi-pack-index repack'. The initial setting
used --batch-size=0 to repack everything into a single pack-file. This is
not sustainable for a large repository. The amount of work required is
also likely to use too many system resources for a background job.

Update the 'incremental-repack' task by dynamically computing a
--batch-size option based on the current pack-file structure.

The dynamic default size is computed with this idea in mind for a client
repository that was cloned from a very large remote: there is likely one
"big" pack-file that was created at clone time. Thus, do not try
repacking it as it is likely packed efficiently by the server.

Instead, we select the second-largest pack-file, and create a batch size
that is one larger than that pack-file. If there are three or more
pack-files, then this guarantees that at least two will be combined into
a new pack-file.

Of course, this means that the second-largest pack-file size is likely
to grow over time and may eventually surpass the initially-cloned
pack-file. Recall that the pack-file batch is selected in a greedy
manner: the packs are considered from oldest to newest and are selected
if they have size smaller than the batch size until the total selected
size is larger than the batch size. Thus, that oldest "clone" pack will
be first to repack after the new data creates a pack larger than that.

We also want to place some limits on how large these pack-files become,
in order to bound the amount of time spent repacking. A maximum
batch-size of two gigabytes means that large repositories will never be
packed into a single pack-file using this job, but also that repack is
rather expensive. This is a trade-off that is valuable to have if the
maintenance is being run automatically or in the background. Users who
truly want to optimize for space and performance (and are willing to pay
the upfront cost of a full repack) can use the 'gc' task to do so.

Create a test for this two gigabyte limit by creating an EXPENSIVE test
that generates two pack-files of roughly 2.5 gigabytes in size, then
performs an incremental repack. Check that the --batch-size argument in
the subcommand uses the hard-coded maximum.

Helped-by: Chris Torek <chris.torek@gmail.com>
Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c           | 43 +++++++++++++++++++++++++++++++++++++++++-
 t/t7900-maintenance.sh | 36 +++++++++++++++++++++++++++++++++--
 2 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 9aabef44a8..d2f3e27e54 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1093,6 +1093,46 @@ static int multi_pack_index_expire(struct maintenance_opts *opts)
 	return 0;
 }
 
+#define TWO_GIGABYTES (INT32_MAX)
+
+static off_t get_auto_pack_size(void)
+{
+	/*
+	 * The "auto" value is special: we optimize for
+	 * one large pack-file (i.e. from a clone) and
+	 * expect the rest to be small and they can be
+	 * repacked quickly.
+	 *
+	 * The strategy we select here is to select a
+	 * size that is one more than the second largest
+	 * pack-file. This ensures that we will repack
+	 * at least two packs if there are three or more
+	 * packs.
+	 */
+	off_t max_size = 0;
+	off_t second_largest_size = 0;
+	off_t result_size;
+	struct packed_git *p;
+	struct repository *r = the_repository;
+
+	reprepare_packed_git(r);
+	for (p = get_all_packs(r); p; p = p->next) {
+		if (p->pack_size > max_size) {
+			second_largest_size = max_size;
+			max_size = p->pack_size;
+		} else if (p->pack_size > second_largest_size)
+			second_largest_size = p->pack_size;
+	}
+
+	result_size = second_largest_size + 1;
+
+	/* But limit ourselves to a batch size of 2g */
+	if (result_size > TWO_GIGABYTES)
+		result_size = TWO_GIGABYTES;
+
+	return result_size;
+}
+
 static int multi_pack_index_repack(struct maintenance_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -1103,7 +1143,8 @@ static int multi_pack_index_repack(struct maintenance_opts *opts)
 	if (opts->quiet)
 		strvec_push(&child.args, "--no-progress");
 
-	strvec_push(&child.args, "--batch-size=0");
+	strvec_pushf(&child.args, "--batch-size=%"PRIuMAX,
+				  (uintmax_t)get_auto_pack_size());
 
 	close_object_store(the_repository->objects);
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index dde28cf837..5c08afc19a 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -182,10 +182,42 @@ test_expect_success 'incremental-repack task' '
 	test_line_count = 4 packs-between &&
 
 	# the job deletes the two old packs, and does not write
-	# a new one because only one pack remains.
+	# a new one because the batch size is not high enough to
+	# pack the largest pack-file.
 	git maintenance run --task=incremental-repack &&
 	ls .git/objects/pack/*.pack >packs-after &&
-	test_line_count = 1 packs-after
+	test_line_count = 2 packs-after
+'
+
+test_expect_success EXPENSIVE 'incremental-repack 2g limit' '
+	for i in $(test_seq 1 5)
+	do
+		test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) >>big ||
+		return 1
+	done &&
+	git add big &&
+	git commit -m "Add big file (1)" &&
+
+	# ensure any possible loose objects are in a pack-file
+	git maintenance run --task=loose-objects &&
+
+	rm big &&
+	for i in $(test_seq 6 10)
+	do
+		test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) >>big ||
+		return 1
+	done &&
+	git add big &&
+	git commit -m "Add big file (2)" &&
+
+	# ensure any possible loose objects are in a pack-file
+	git maintenance run --task=loose-objects &&
+
+	# Now run the incremental-repack task and check the batch-size
+	GIT_TRACE2_EVENT="$(pwd)/run-2g.txt" git maintenance run \
+		--task=incremental-repack 2>/dev/null &&
+	test_subcommand git multi-pack-index repack \
+		 --no-progress --batch-size=2147483647 <run-2g.txt
 '
 
 test_done
-- 
gitgitgadget



* Re: [PATCH 8/9] maintenance: auto-size incremental-repack batch
  2020-08-06 17:02 11%   ` Son Luong Ngoc
@ 2020-08-06 18:13  0%     ` Derrick Stolee
  0 siblings, 0 replies; 122+ results
From: Derrick Stolee @ 2020-08-06 18:13 UTC (permalink / raw)
  To: Son Luong Ngoc, Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, Jeff King, congdanhqx,
	phillip.wood123, Emily Shaffer, Jonathan Tan, Derrick Stolee,
	Derrick Stolee

On 8/6/2020 1:02 PM, Son Luong Ngoc wrote:
> Hi Derrick,
> 
>> On Aug 6, 2020, at 18:30, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> When repacking during the 'incremental-repack' task, we use the
>> --batch-size option in 'git multi-pack-index repack'. The initial setting
>> used --batch-size=0 to repack everything into a single pack-file. This is
>> not sustainable for a large repository. The amount of work required is
>> also likely to use too many system resources for a background job.
>>
>> Update the 'incremental-repack' task by dynamically computing a
>> --batch-size option based on the current pack-file structure.
>>
>> The dynamic default size is computed with this idea in mind for a client
>> repository that was cloned from a very large remote: there is likely one
>> "big" pack-file that was created at clone time. Thus, do not try
>> repacking it as it is likely packed efficiently by the server.
>>
>> Instead, we select the second-largest pack-file, and create a batch size
>> that is one larger than that pack-file. If there are three or more
>> pack-files, then this guarantees that at least two will be combined into
>> a new pack-file.
> 
> I have been using this strategy with git-care.sh [1] with great success.
> However, it is worth noting that there are still edge cases where I observed the
> pack count keep increasing because using '--batch-size=<second-biggest-pack>+1'
> did not result in any repacking.
> In one case, I observed a local copy go up to 160+ packs without being able
> to repack.
> 
> I have been considering a strategy of falling back to the '(3rd biggest
> pack size) + 1', then the 4th, 5th and so on, when the midx repack call results in a no-op,
> as that was how I fixed my repo when the edge case happened.
> 
> Such a strategy would require a way for midx repack to signal when a no-op happens,
> so something like 'git multi-pack-index repack --batch-size=123456 --exit-code' would
> be very desirable.
> 
>>
>> Of course, this means that the second-largest pack-file size is likely
>> to grow over time and may eventually surpass the initially-cloned
>> pack-file. Recall that the pack-file batch is selected in a greedy
>> manner: the packs are considered from oldest to newest and are selected
>> if they have size smaller than the batch size until the total selected
>> size is larger than the batch size. Thus, that oldest "clone" pack will
>> be first to repack after the new data creates a pack larger than that.
>>
>> We also want to place some limits on how large these pack-files become,
>> in order to bound the amount of time spent repacking. A maximum
>> batch-size of two gigabytes means that large repositories will never be
>> packed into a single pack-file using this job, but also that repack is
>> rather expensive. This is a trade-off that is valuable to have if the
>> maintenance is being run automatically or in the background. Users who
>> truly want to optimize for space and performance (and are willing to pay
>> the upfront cost of a full repack) can use the 'gc' task to do so.
>>
>> Create a test for this two gigabyte limit by creating an EXPENSIVE test
>> that generates two pack-files of roughly 2.5 gigabytes in size, then
>> performs an incremental repack. Check that the --batch-size argument in
>> the subcommand uses the hard-coded maximum.
>>
>> Helped-by: Chris Torek <chris.torek@gmail.com>
>> Reported-by: Son Luong Ngoc <sluongng@gmail.com>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> 
> Generally, I have found working with '--batch-size' to be a bit unpredictable.
> I wonder if we could tweak the behavior somewhat so that it's more consistent
> to use and test?

Thanks for continuing to test with this model. My experience has been limited
to the --batch-size=2g option on repos that typically have >15gb of commit
and tree data in their pack directory.

One bit of unpredictability that we've seen is that the --batch-size uses
the "expected size" of a pack-file to select if it should be repacked. This
only tracks the objects that are referenced in that pack, so when we have
new packs that were "un-thinned" by duplicating delta-bases, those affect
the expected size of a repack.
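
To make the greedy selection from the quoted commit message concrete,
here is a rough sketch. It only illustrates the description and is not
the code in midx.c: the name select_batch(), the plain array of sizes
(assumed to be ordered oldest to newest) and the 'include' output array
are assumptions made for this example. The sizes stand in for the
"expected size" of each pack, and the exact stopping boundary (total at
least the batch size) is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative sketch only (hypothetical helper, not Git's code).
 *
 * 'sizes' holds the expected size of each pack, ordered from oldest
 * to newest. A pack is selected when its size is smaller than
 * 'batch_size'; the scan stops once the selected total is at least
 * 'batch_size'. include[i] is set to 1 for each selected pack.
 *
 * Returns the number of selected packs.
 */
size_t select_batch(const uint64_t *sizes, size_t nr,
		    uint64_t batch_size, int *include)
{
	uint64_t total = 0;
	size_t selected = 0;
	size_t i;

	assert(nr == 0 || (sizes && include));

	for (i = 0; i < nr; i++)
		include[i] = 0;

	for (i = 0; i < nr && total < batch_size; i++) {
		if (sizes[i] >= batch_size)
			continue; /* too large for this batch */
		include[i] = 1;
		total += sizes[i];
		selected++;
	}

	return selected;
}
```

With sizes {100, 5, 3, 4} and a batch size of 6, the first pack is
skipped, the packs of size 5 and 3 are selected (total 8), and the last
pack is never considered. If the expected size differs from the on-disk
size, the set of selected packs can change, which is one way the
behavior becomes hard to predict.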

Here is a great reason to have this series be split. While part I stabilizes,
we can take the time to re-evaluate this strategy. It might require updating
the multi-pack-index builtin itself.

One thing to think about is to focus on the different possible sizes of a
repository. If the entire pack-directory is small, then we might as well
repack everything. (What should the limit be? 2gb? configurable?) In the
case of a larger directory, should we just use the --batch-size logic with
that limit, or should we create a new option for which packs to repack?

For example, the recent simulation [1] of downloading fetch packs and
running this maintenance did see extra space that could be recovered with
the current logic. However, what if we could specify "repack everything
except the largest pack" exactly? That would do exactly what this task
is _intending_, as long as that resulting pack-file does not exceed the
2gb maximum by too much.

[1] https://lore.kernel.org/git/d50fbb33-9be3-1c48-2277-8bf894df734f@gmail.com/
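
As a rough illustration of the "repack everything except the largest
pack" idea, the size of such a batch could be computed as below. Nothing
like this exists in the series: size_without_largest() is a hypothetical
name, and the plain array of pack sizes is an assumption for the sake of
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: total size of all packs except the largest
 * one. If 'largest' is non-NULL and there is at least one pack, it
 * receives the index of the pack that would be left alone.
 * Returns 0 when there are no packs.
 */
uint64_t size_without_largest(const uint64_t *sizes, size_t nr,
			      size_t *largest)
{
	uint64_t total = 0;
	size_t max_i = 0;
	size_t i;

	assert(nr == 0 || sizes);

	if (!nr)
		return 0;

	for (i = 0; i < nr; i++) {
		total += sizes[i];
		if (sizes[i] > sizes[max_i])
			max_i = i;
	}

	if (largest)
		*largest = max_i;
	return total - sizes[max_i];
}
```

A caller could compare the returned total against the 2gb maximum to
decide whether the resulting pack-file would be acceptable.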

I will think more on this. I'm open to alternate strategies.

Thanks,
-Stolee

^ permalink raw reply	[relevance 0%]

* [PATCH 0/9] Maintenance II: prefetch, loose-objects, incremental-repack tasks
@ 2020-08-06 16:30  1% Derrick Stolee via GitGitGadget
  2020-08-06 16:30  3% ` [PATCH 8/9] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
    0 siblings, 2 replies; 122+ results
From: Derrick Stolee via GitGitGadget @ 2020-08-06 16:30 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee

This series is based on part I [2].

This patch series contains 9 patches that were going to be part of v4 of
ds/maintenance [1], but the discussion has gotten really long. To help, I'm
splitting out the portions that create and test the 'maintenance' builtin
from the additional tasks (prefetch, loose-objects, incremental-repack) that
can be brought in later.

[1] https://lore.kernel.org/git/pull.671.git.1594131695.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/pull.695.git.1596728921.gitgitgadget@gmail.com/

As detailed in [2], the 'git maintenance run' subcommand will run certain
tasks based on config options or the --task= arguments. The --auto option
indicates to the task to only run based on some internal check that there
has been "enough" change in that domain to merit the work. In the case of
the 'gc' task, this also reduces the amount of work done. 

The new maintenance tasks in this series are:

 * 'loose-objects' : prune packed loose objects, then create a new pack from
   a batch of loose objects.
 * 'incremental-repack' : expire redundant packs from the multi-pack-index, then
   repack using the multi-pack-index's incremental repack strategy.
 * 'prefetch' : fetch from each remote, storing the refs in
   'refs/prefetch/<remote>/'.

These tasks are all disabled by default, but can be enabled with config
options or run explicitly using "git maintenance run --task=". 

Since [2] replaced the 'git gc --auto' calls with 'git maintenance run
--auto' at the end of some Git commands, users could replace the 'gc' task
with these lighter-weight changes for foreground maintenance.

The 'git maintenance' builtin has a 'run' subcommand so it can be extended
later with subcommands that manage background maintenance, such as 'start'
or 'stop'. These are not the subject of this series, as it is important to
focus on the maintenance activities themselves. I have a WIP series for this
available at [3].

[3] https://github.com/gitgitgadget/git/pull/680

UPDATES since v3 of [1]
=======================

 * The biggest change here is the use of "test_subcommand", based on
   Jonathan Nieder's approach. This requires having the exact command-line
   figured out, which now requires spelling out all --no-[quiet|progress]
   options. I also added a bunch of "2>/dev/null" checks because of the
   isatty(2) calls. Without that, the behavior will change depending on
   whether the test is run with -x/-v or without.
   
   
 * The 0x7FFF/0x7FFFFFFF constant problem is fixed with an EXPENSIVE test
   that verifies it.
   
   
 * The option parsing has changed to use a local struct and pass that struct
   to the helper methods. This is instead of having a global singleton.
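
As a quick check on the constant fix mentioned above: 0x7FFF is only
32767, while 0x7FFFFFFF equals INT32_MAX (2147483647), one byte short of
two gibibytes. The helper name below is made up for illustration and is
not part of the series.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper (hypothetical name): clamp a size to INT32_MAX. */
uint64_t clamp_to_two_gigabytes(uint64_t size)
{
	const uint64_t limit = INT32_MAX; /* 0x7FFFFFFF == 2147483647 */

	/* The intended limit is 2^31 - 1, not 0x7FFF (2^15 - 1 == 32767). */
	assert(limit == 0x7FFFFFFF);

	return size > limit ? limit : size;
}
```

Any size above the limit, such as a 2.5 gigabyte pack, comes back as
2147483647, which is the value the EXPENSIVE test expects to see in
--batch-size.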
   
   

Thanks, -Stolee

Here is the range-diff from the v3 of [1].

 1:  12fe73bb72 <  -:  ---------- maintenance: create basic maintenance runner
 2:  6e533e43d7 <  -:  ---------- maintenance: add --quiet option
 3:  c4674fc211 <  -:  ---------- maintenance: replace run_auto_gc()
 4:  b9332c1318 <  -:  ---------- maintenance: initialize task array
 5:  a4d9836bed <  -:  ---------- maintenance: add commit-graph task
 6:  dafb0d9bbc <  -:  ---------- maintenance: add --task option
 7:  1b00524da3 <  -:  ---------- maintenance: take a lock on the objects directory
 8:  0e94e04dcd =  1:  83401c5200 fetch: optionally allow disabling FETCH_HEAD update
 9:  9e38ade15c !  2:  85118ed5f1 maintenance: add prefetch task
    @@ Documentation/git-maintenance.txt: since it will not expire `.graph` files that

      ## builtin/gc.c ##
     @@
    - #include "blob.h"
      #include "tree.h"
      #include "promisor-remote.h"
    + #include "refs.h"
     +#include "remote.h"

      #define FAILED_RUN "failed to run %s"

    -@@ builtin/gc.c: static int maintenance_task_commit_graph(void)
    +@@ builtin/gc.c: static int maintenance_task_commit_graph(struct maintenance_opts *opts)
          return 1;
      }

    -+static int fetch_remote(const char *remote)
    ++static int fetch_remote(const char *remote, struct maintenance_opts *opts)
     +{
     +    struct child_process child = CHILD_PROCESS_INIT;
     +
    @@ builtin/gc.c: static int maintenance_task_commit_graph(void)
     +    strvec_pushl(&child.args, "fetch", remote, "--prune", "--no-tags",
     +             "--no-write-fetch-head", "--refmap=", NULL);
     +
    -+    strvec_pushf(&child.args, "+refs/heads/*:refs/prefetch/%s/*", remote);
    -+
    -+    if (opts.quiet)
    ++    if (opts->quiet)
     +        strvec_push(&child.args, "--quiet");
     +
    ++    strvec_pushf(&child.args, "+refs/heads/*:refs/prefetch/%s/*", remote);
    ++
     +    return !!run_command(&child);
     +}
     +
    @@ builtin/gc.c: static int maintenance_task_commit_graph(void)
     +    return 0;
     +}
     +
    -+static int maintenance_task_prefetch(void)
    ++static int maintenance_task_prefetch(struct maintenance_opts *opts)
     +{
     +    int result = 0;
     +    struct string_list_item *item;
    @@ builtin/gc.c: static int maintenance_task_commit_graph(void)
     +    for (item = remotes.items;
     +         item && item < remotes.items + remotes.nr;
     +         item++)
    -+        result |= fetch_remote(item->string);
    ++        result |= fetch_remote(item->string, opts);
     +
     +cleanup:
     +    string_list_clear(&remotes, 0);
     +    return result;
     +}
     +
    - static int maintenance_task_gc(void)
    + static int maintenance_task_gc(struct maintenance_opts *opts)
      {
          struct child_process child = CHILD_PROCESS_INIT;
     @@ builtin/gc.c: struct maintenance_task {
    @@ t/t7900-maintenance.sh: test_expect_success 'run --task duplicate' '
     +    git -C clone2 switch -c two &&
     +    test_commit -C clone1 one &&
     +    test_commit -C clone2 two &&
    -+    GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch &&
    -+    grep ",\"fetch\",\"remote1\"" run-prefetch.txt &&
    -+    grep ",\"fetch\",\"remote2\"" run-prefetch.txt &&
    ++    GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch 2>/dev/null &&
    ++    fetchargs="--prune --no-tags --no-write-fetch-head --refmap= --quiet" &&
    ++    test_subcommand git fetch remote1 $fetchargs +refs/heads/\\*:refs/prefetch/remote1/\\* <run-prefetch.txt &&
    ++    test_subcommand git fetch remote2 $fetchargs +refs/heads/\\*:refs/prefetch/remote2/\\* <run-prefetch.txt &&
     +    test_path_is_missing .git/refs/remotes &&
     +    test_cmp clone1/.git/refs/heads/one .git/refs/prefetch/remote1/one &&
     +    test_cmp clone2/.git/refs/heads/two .git/refs/prefetch/remote2/two &&
10:  0128fdfd1a !  3:  621375a3c9 maintenance: add loose-objects task
    @@ Documentation/git-maintenance.txt: gc::
      --auto::

      ## builtin/gc.c ##
    -@@ builtin/gc.c: static int maintenance_task_gc(void)
    +@@ builtin/gc.c: static int maintenance_task_gc(struct maintenance_opts *opts)
          return run_command(&child);
      }

    -+static int prune_packed(void)
    ++static int prune_packed(struct maintenance_opts *opts)
     +{
     +    struct child_process child = CHILD_PROCESS_INIT;
     +
     +    child.git_cmd = 1;
     +    strvec_push(&child.args, "prune-packed");
     +
    -+    if (opts.quiet)
    ++    if (opts->quiet)
     +        strvec_push(&child.args, "--quiet");
     +
     +    return !!run_command(&child);
    @@ builtin/gc.c: static int maintenance_task_gc(void)
     +    return ++(d->count) > d->batch_size;
     +}
     +
    -+static int pack_loose(void)
    ++static int pack_loose(struct maintenance_opts *opts)
     +{
     +    struct repository *r = the_repository;
     +    int result = 0;
    @@ builtin/gc.c: static int maintenance_task_gc(void)
     +    pack_proc.git_cmd = 1;
     +
     +    strvec_push(&pack_proc.args, "pack-objects");
    -+    if (opts.quiet)
    ++    if (opts->quiet)
     +        strvec_push(&pack_proc.args, "--quiet");
     +    strvec_pushf(&pack_proc.args, "%s/pack/loose", r->objects->odb->path);
     +
    @@ builtin/gc.c: static int maintenance_task_gc(void)
     +    return result;
     +}
     +
    -+static int maintenance_task_loose_objects(void)
    ++static int maintenance_task_loose_objects(struct maintenance_opts *opts)
     +{
    -+    return prune_packed() || pack_loose();
    ++    return prune_packed(opts) || pack_loose(opts);
     +}
     +
    - typedef int maintenance_task_fn(void);
    + typedef int maintenance_task_fn(struct maintenance_opts *opts);

    - struct maintenance_task {
    + /*
     @@ builtin/gc.c: struct maintenance_task {

      enum maintenance_task_label {
17:  6ac3a58f2f !  4:  e787403ea7 maintenance: create auto condition for loose-objects
    @@ builtin/gc.c: static struct maintenance_task tasks[] = {
              maintenance_task_loose_objects,
     +        loose_object_auto_condition,
          },
    -     [TASK_INCREMENTAL_REPACK] = {
    -         "incremental-repack",
    +     [TASK_GC] = {
    +         "gc",

      ## t/t7900-maintenance.sh ##
     @@ t/t7900-maintenance.sh: test_expect_success 'loose-objects task' '
    @@ t/t7900-maintenance.sh: test_expect_success 'loose-objects task' '
     +    git repack -adk &&
     +    GIT_TRACE2_EVENT="$(pwd)/trace-lo1.txt" \
     +        git -c maintenance.loose-objects.auto=1 maintenance \
    -+        run --auto --task=loose-objects &&
    -+    ! grep "\"prune-packed\"" trace-lo1.txt &&
    ++        run --auto --task=loose-objects 2>/dev/null &&
    ++    test_subcommand ! git prune-packed --quiet <trace-lo1.txt &&
     +    for i in 1 2
     +    do
     +        printf data-A-$i | git hash-object -t blob --stdin -w &&
     +        GIT_TRACE2_EVENT="$(pwd)/trace-loA-$i" \
     +            git -c maintenance.loose-objects.auto=2 \
    -+            maintenance run --auto --task=loose-objects &&
    -+        ! grep "\"prune-packed\"" trace-loA-$i &&
    ++            maintenance run --auto --task=loose-objects 2>/dev/null &&
    ++        test_subcommand ! git prune-packed --quiet <trace-loA-$i &&
     +        printf data-B-$i | git hash-object -t blob --stdin -w &&
     +        GIT_TRACE2_EVENT="$(pwd)/trace-loB-$i" \
     +            git -c maintenance.loose-objects.auto=2 \
    -+            maintenance run --auto --task=loose-objects &&
    -+        grep "\"prune-packed\"" trace-loB-$i &&
    ++            maintenance run --auto --task=loose-objects 2>/dev/null &&
    ++        test_subcommand git prune-packed --quiet <trace-loB-$i &&
     +        GIT_TRACE2_EVENT="$(pwd)/trace-loC-$i" \
     +            git -c maintenance.loose-objects.auto=2 \
    -+            maintenance run --auto --task=loose-objects &&
    -+        grep "\"prune-packed\"" trace-loC-$i || return 1
    ++            maintenance run --auto --task=loose-objects 2>/dev/null &&
    ++        test_subcommand git prune-packed --quiet <trace-loC-$i || return 1
     +    done
     +'
     +
    - test_expect_success 'incremental-repack task' '
    -     packDir=.git/objects/pack &&
    -     for i in $(test_seq 1 5)
    + test_done
11:  c2baf6e119 =  5:  37e59b1a8d midx: enable core.multiPackIndex by default
19:  9b4cef7635 =  6:  aba087f663 midx: use start_delayed_progress()
12:  00f47c4848 !  7:  68727c555b maintenance: add incremental-repack task
    @@ Documentation/git-maintenance.txt: loose-objects::

      ## builtin/gc.c ##
     @@
    - #include "tree.h"
      #include "promisor-remote.h"
    + #include "refs.h"
      #include "remote.h"
     +#include "midx.h"

      #define FAILED_RUN "failed to run %s"

    -@@ builtin/gc.c: static int maintenance_task_loose_objects(void)
    -     return prune_packed() || pack_loose();
    +@@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts *opts)
    +     return prune_packed(opts) || pack_loose(opts);
      }

    -+static int multi_pack_index_write(void)
    ++static int multi_pack_index_write(struct maintenance_opts *opts)
     +{
     +    struct child_process child = CHILD_PROCESS_INIT;
     +
     +    child.git_cmd = 1;
     +    strvec_pushl(&child.args, "multi-pack-index", "write", NULL);
     +
    -+    if (opts.quiet)
    ++    if (opts->quiet)
     +        strvec_push(&child.args, "--no-progress");
     +
     +    if (run_command(&child))
    @@ builtin/gc.c: static int maintenance_task_loose_objects(void)
     +    return 0;
     +}
     +
    -+static int rewrite_multi_pack_index(void)
    ++static int rewrite_multi_pack_index(struct maintenance_opts *opts)
     +{
     +    struct repository *r = the_repository;
     +    char *midx_name = get_midx_filename(r->objects->odb->path);
    @@ builtin/gc.c: static int maintenance_task_loose_objects(void)
     +    unlink(midx_name);
     +    free(midx_name);
     +
    -+    return multi_pack_index_write();
    ++    return multi_pack_index_write(opts);
     +}
     +
    -+static int multi_pack_index_verify(const char *message)
    ++static int multi_pack_index_verify(struct maintenance_opts *opts,
    ++                   const char *message)
     +{
     +    struct child_process child = CHILD_PROCESS_INIT;
     +
     +    child.git_cmd = 1;
     +    strvec_pushl(&child.args, "multi-pack-index", "verify", NULL);
     +
    -+    if (opts.quiet)
    ++    if (opts->quiet)
     +        strvec_push(&child.args, "--no-progress");
     +
     +    if (run_command(&child)) {
    @@ builtin/gc.c: static int maintenance_task_loose_objects(void)
     +    return 0;
     +}
     +
    -+static int multi_pack_index_expire(void)
    ++static int multi_pack_index_expire(struct maintenance_opts *opts)
     +{
     +    struct child_process child = CHILD_PROCESS_INIT;
     +
     +    child.git_cmd = 1;
     +    strvec_pushl(&child.args, "multi-pack-index", "expire", NULL);
     +
    -+    if (opts.quiet)
    ++    if (opts->quiet)
     +        strvec_push(&child.args, "--no-progress");
     +
     +    close_object_store(the_repository->objects);
    @@ builtin/gc.c: static int maintenance_task_loose_objects(void)
     +    return 0;
     +}
     +
    -+static int multi_pack_index_repack(void)
    ++static int multi_pack_index_repack(struct maintenance_opts *opts)
     +{
     +    struct child_process child = CHILD_PROCESS_INIT;
     +
     +    child.git_cmd = 1;
     +    strvec_pushl(&child.args, "multi-pack-index", "repack", NULL);
     +
    -+    if (opts.quiet)
    ++    if (opts->quiet)
     +        strvec_push(&child.args, "--no-progress");
     +
     +    strvec_push(&child.args, "--batch-size=0");
    @@ builtin/gc.c: static int maintenance_task_loose_objects(void)
     +    return 0;
     +}
     +
    -+static int maintenance_task_incremental_repack(void)
    ++static int maintenance_task_incremental_repack(struct maintenance_opts *opts)
     +{
     +    prepare_repo_settings(the_repository);
     +    if (!the_repository->settings.core_multi_pack_index) {
    @@ builtin/gc.c: static int maintenance_task_loose_objects(void)
     +        return 0;
     +    }
     +
    -+    if (multi_pack_index_write())
    ++    if (multi_pack_index_write(opts))
     +        return 1;
    -+    if (multi_pack_index_verify("after initial write"))
    -+        return rewrite_multi_pack_index();
    -+    if (multi_pack_index_expire())
    ++    if (multi_pack_index_verify(opts, "after initial write"))
    ++        return rewrite_multi_pack_index(opts);
    ++    if (multi_pack_index_expire(opts))
     +        return 1;
    -+    if (multi_pack_index_verify("after expire step"))
    -+        return !!rewrite_multi_pack_index();
    -+    if (multi_pack_index_repack())
    ++    if (multi_pack_index_verify(opts, "after expire step"))
    ++        return !!rewrite_multi_pack_index(opts);
    ++    if (multi_pack_index_repack(opts))
     +        return 1;
    -+    if (multi_pack_index_verify("after repack step"))
    -+        return !!rewrite_multi_pack_index();
    ++    if (multi_pack_index_verify(opts, "after repack step"))
    ++        return !!rewrite_multi_pack_index(opts);
     +    return 0;
     +}
     +
    - typedef int maintenance_task_fn(void);
    + typedef int maintenance_task_fn(struct maintenance_opts *opts);

    - struct maintenance_task {
    + /*
     @@ builtin/gc.c: struct maintenance_task {
      enum maintenance_task_label {
          TASK_PREFETCH,
    @@ builtin/gc.c: struct maintenance_task {
          TASK_COMMIT_GRAPH,

     @@ builtin/gc.c: static struct maintenance_task tasks[] = {
    -         "loose-objects",
              maintenance_task_loose_objects,
    +         loose_object_auto_condition,
          },
     +    [TASK_INCREMENTAL_REPACK] = {
     +        "incremental-repack",
    @@ t/t7900-maintenance.sh: test_description='git maintenance builtin'

      test_expect_success 'help text' '
          test_expect_code 129 git maintenance -h 2>err &&
    -@@ t/t7900-maintenance.sh: test_expect_success 'loose-objects task' '
    -     test_cmp packs-between packs-after
    +@@ t/t7900-maintenance.sh: test_expect_success 'maintenance.loose-objects.auto' '
    +     done
      '

     +test_expect_success 'incremental-repack task' '
13:  ef2a231956 !  8:  c3487fb8e3 maintenance: auto-size incremental-repack batch
    @@ Commit message
         truly want to optimize for space and performance (and are willing to pay
         the upfront cost of a full repack) can use the 'gc' task to do so.

    +    Create a test for this two gigabyte limit by creating an EXPENSIVE test
    +    that generates two pack-files of roughly 2.5 gigabytes in size, then
    +    performs an incremental repack. Check that the --batch-size argument in
    +    the subcommand uses the hard-coded maximum.
    +
    +    Helped-by: Chris Torek <chris.torek@gmail.com>
         Reported-by: Son Luong Ngoc <sluongng@gmail.com>
         Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

      ## builtin/gc.c ##
    -@@ builtin/gc.c: static int multi_pack_index_expire(void)
    +@@ builtin/gc.c: static int multi_pack_index_expire(struct maintenance_opts *opts)
          return 0;
      }

    -+#define TWO_GIGABYTES (0x7FFF)
    ++#define TWO_GIGABYTES (INT32_MAX)
     +
     +static off_t get_auto_pack_size(void)
     +{
    @@ builtin/gc.c: static int multi_pack_index_expire(void)
     +    return result_size;
     +}
     +
    - static int multi_pack_index_repack(void)
    + static int multi_pack_index_repack(struct maintenance_opts *opts)
      {
          struct child_process child = CHILD_PROCESS_INIT;
    -@@ builtin/gc.c: static int multi_pack_index_repack(void)
    -     if (opts.quiet)
    +@@ builtin/gc.c: static int multi_pack_index_repack(struct maintenance_opts *opts)
    +     if (opts->quiet)
              strvec_push(&child.args, "--no-progress");

     -    strvec_push(&child.args, "--batch-size=0");
    @@ t/t7900-maintenance.sh: test_expect_success 'incremental-repack task' '
          ls .git/objects/pack/*.pack >packs-after &&
     -    test_line_count = 1 packs-after
     +    test_line_count = 2 packs-after
    ++'
    ++
    ++test_expect_success EXPENSIVE 'incremental-repack 2g limit' '
    ++    for i in $(test_seq 1 5)
    ++    do
    ++        test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) >>big ||
    ++        return 1
    ++    done &&
    ++    git add big &&
    ++    git commit -m "Add big file (1)" &&
    ++
    ++    # ensure any possible loose objects are in a pack-file
    ++    git maintenance run --task=loose-objects &&
    ++
    ++    rm big &&
    ++    for i in $(test_seq 6 10)
    ++    do
    ++        test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) >>big ||
    ++        return 1
    ++    done &&
    ++    git add big &&
    ++    git commit -m "Add big file (2)" &&
    ++
    ++    # ensure any possible loose objects are in a pack-file
    ++    git maintenance run --task=loose-objects &&
    ++
    ++    # Now run the incremental-repack task and check the batch-size
    ++    GIT_TRACE2_EVENT="$(pwd)/run-2g.txt" git maintenance run \
    ++        --task=incremental-repack 2>/dev/null &&
    ++    test_subcommand git multi-pack-index repack \
    ++         --no-progress --batch-size=2147483647 <run-2g.txt
      '

      test_done
14:  99840c4b8f <  -:  ---------- maintenance: create maintenance.<task>.enabled config
15:  a087c63572 <  -:  ---------- maintenance: use pointers to check --auto
16:  ef3a854508 <  -:  ---------- maintenance: add auto condition for commit-graph task
18:  801b262d1c !  9:  407c123c51 maintenance: add incremental-repack auto condition
    @@ Documentation/config/maintenance.txt: maintenance.loose-objects.auto::

      ## builtin/gc.c ##
     @@
    + #include "refs.h"
      #include "remote.h"
      #include "midx.h"
    - #include "refs.h"
     +#include "object-store.h"

      #define FAILED_RUN "failed to run %s"

    -@@ builtin/gc.c: static int maintenance_task_loose_objects(void)
    -     return prune_packed() || pack_loose();
    +@@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_opts *opts)
    +     return prune_packed(opts) || pack_loose(opts);
      }

     +static int incremental_repack_auto_condition(void)
    @@ builtin/gc.c: static int maintenance_task_loose_objects(void)
     +    return count >= incremental_repack_auto_limit;
     +}
     +
    - static int multi_pack_index_write(void)
    + static int multi_pack_index_write(struct maintenance_opts *opts)
      {
          struct child_process child = CHILD_PROCESS_INIT;
     @@ builtin/gc.c: static struct maintenance_task tasks[] = {
    @@ builtin/gc.c: static struct maintenance_task tasks[] = {

      ## t/t7900-maintenance.sh ##
     @@ t/t7900-maintenance.sh: test_expect_success 'incremental-repack task' '
    -     test_line_count = 2 packs-after
    + '
    + 
    + test_expect_success EXPENSIVE 'incremental-repack 2g limit' '
    ++
    +     for i in $(test_seq 1 5)
    +     do
    +         test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) >>big ||
    +@@ t/t7900-maintenance.sh: test_expect_success EXPENSIVE 'incremental-repack 2g limit' '
    +          --no-progress --batch-size=2147483647 <run-2g.txt
      '

     +test_expect_success 'maintenance.incremental-repack.auto' '
     +    git repack -adk &&
     +    git config core.multiPackIndex true &&
     +    git multi-pack-index write &&
    -+    GIT_TRACE2_EVENT=1 git -c maintenance.incremental-repack.auto=1 \
    -+        maintenance run --auto --task=incremental-repack >out &&
    -+    ! grep "\"multi-pack-index\"" out &&
    ++    GIT_TRACE2_EVENT="$(pwd)/midx-init.txt" git \
    ++        -c maintenance.incremental-repack.auto=1 \
    ++        maintenance run --auto --task=incremental-repack 2>/dev/null &&
    ++    test_subcommand ! git multi-pack-index write --no-progress <midx-init.txt &&
     +    for i in 1 2
     +    do
     +        test_commit A-$i &&
    @@ t/t7900-maintenance.sh: test_expect_success 'incremental-repack task' '
     +        EOF
     +        GIT_TRACE2_EVENT=$(pwd)/trace-A-$i git \
     +            -c maintenance.incremental-repack.auto=2 \
    -+            maintenance run --auto --task=incremental-repack &&
    -+        ! grep "\"multi-pack-index\"" trace-A-$i &&
    ++            maintenance run --auto --task=incremental-repack 2>/dev/null &&
    ++        test_subcommand ! git multi-pack-index write --no-progress <trace-A-$i &&
     +        test_commit B-$i &&
     +        git pack-objects --revs .git/objects/pack/pack <<-\EOF &&
     +        HEAD
    @@ t/t7900-maintenance.sh: test_expect_success 'incremental-repack task' '
     +        EOF
     +        GIT_TRACE2_EVENT=$(pwd)/trace-B-$i git \
     +            -c maintenance.incremental-repack.auto=2 \
    -+            maintenance run --auto --task=incremental-repack >out &&
    -+        grep "\"multi-pack-index\"" trace-B-$i >/dev/null || return 1
    ++            maintenance run --auto --task=incremental-repack 2>/dev/null &&
    ++        test_subcommand git multi-pack-index write --no-progress <trace-B-$i || return 1
     +    done
     +'
     +
20:  39eb83ad1e <  -:  ---------- maintenance: add trace2 regions for task execution

Derrick Stolee (8):
  maintenance: add prefetch task
  maintenance: add loose-objects task
  maintenance: create auto condition for loose-objects
  midx: enable core.multiPackIndex by default
  midx: use start_delayed_progress()
  maintenance: add incremental-repack task
  maintenance: auto-size incremental-repack batch
  maintenance: add incremental-repack auto condition

Junio C Hamano (1):
  fetch: optionally allow disabling FETCH_HEAD update

 Documentation/config/core.txt        |   4 +-
 Documentation/config/fetch.txt       |   7 +
 Documentation/config/maintenance.txt |  18 ++
 Documentation/fetch-options.txt      |  10 +
 Documentation/git-maintenance.txt    |  41 +++
 builtin/fetch.c                      |  19 +-
 builtin/gc.c                         | 364 +++++++++++++++++++++++++++
 builtin/pull.c                       |   3 +-
 midx.c                               |  23 +-
 midx.h                               |   1 +
 repo-settings.c                      |   6 +
 repository.h                         |   2 +
 t/t5319-multi-pack-index.sh          |  15 +-
 t/t5510-fetch.sh                     |  39 ++-
 t/t5521-pull-options.sh              |  16 ++
 t/t7900-maintenance.sh               | 191 ++++++++++++++
 16 files changed, 730 insertions(+), 29 deletions(-)


base-commit: a5d19148460decaf08e0e6293e996d42ff3f2d32
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-696%2Fderrickstolee%2Fmaintenance%2Fgc-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-696/derrickstolee/maintenance/gc-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/696
-- 
gitgitgadget

^ permalink raw reply	[relevance 1%]

* [PATCH 8/9] maintenance: auto-size incremental-repack batch
  2020-08-06 16:30  1% [PATCH 0/9] Maintenance II: prefetch, loose-objects, incremental-repack tasks Derrick Stolee via GitGitGadget
@ 2020-08-06 16:30  3% ` Derrick Stolee via GitGitGadget
  2020-08-06 17:02 11%   ` Son Luong Ngoc
    1 sibling, 1 reply; 122+ results
From: Derrick Stolee via GitGitGadget @ 2020-08-06 16:30 UTC (permalink / raw)
  To: git
  Cc: sandals, steadmon, jrnieder, peff, congdanhqx, phillip.wood123,
	emilyshaffer, sluongng, jonathantanmy, Derrick Stolee,
	Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When repacking during the 'incremental-repack' task, we use the
--batch-size option in 'git multi-pack-index repack'. The initial setting
used --batch-size=0 to repack everything into a single pack-file. This is
not sustainable for a large repository. The amount of work required is
also likely to use too many system resources for a background job.

Update the 'incremental-repack' task by dynamically computing a
--batch-size option based on the current pack-file structure.

The dynamic default size is computed with this idea in mind for a client
repository that was cloned from a very large remote: there is likely one
"big" pack-file that was created at clone time. Thus, do not try
repacking it as it is likely packed efficiently by the server.

Instead, we select the second-largest pack-file, and create a batch size
that is one larger than that pack-file. If there are three or more
pack-files, then this guarantees that at least two will be combined into
a new pack-file.

Of course, this means that the second-largest pack-file size is likely
to grow over time and may eventually surpass the initially-cloned
pack-file. Recall that the pack-file batch is selected in a greedy
manner: the packs are considered from oldest to newest and are selected
if they have size smaller than the batch size until the total selected
size is larger than the batch size. Thus, that oldest "clone" pack will
be first to repack after the new data creates a pack larger than that.

We also want to place some limits on how large these pack-files become,
in order to bound the amount of time spent repacking. A maximum
batch-size of two gigabytes means that large repositories will never be
packed into a single pack-file using this job, but also that repack is
rather expensive. This is a trade-off that is valuable to have if the
maintenance is being run automatically or in the background. Users who
truly want to optimize for space and performance (and are willing to pay
the upfront cost of a full repack) can use the 'gc' task to do so.

Create a test for this two gigabyte limit by creating an EXPENSIVE test
that generates two pack-files of roughly 2.5 gigabytes in size, then
performs an incremental repack. Check that the --batch-size argument in
the subcommand uses the hard-coded maximum.

Helped-by: Chris Torek <chris.torek@gmail.com>
Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
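A small worked example of the sizing rule. This mirrors the loop in
get_auto_pack_size() below but takes a plain array instead of walking
the packed_git list; the name auto_batch_size() and the array input are
assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of the sizing rule on a plain array (hypothetical helper):
 * return one more than the second-largest size, capped at INT32_MAX.
 */
uint64_t auto_batch_size(const uint64_t *sizes, size_t nr)
{
	uint64_t max_size = 0;
	uint64_t second_largest = 0;
	uint64_t result;
	size_t i;

	assert(nr == 0 || sizes);

	for (i = 0; i < nr; i++) {
		if (sizes[i] > max_size) {
			second_largest = max_size;
			max_size = sizes[i];
		} else if (sizes[i] > second_largest) {
			second_largest = sizes[i];
		}
	}

	result = second_largest + 1;

	/* Limit the batch size to just under two gigabytes. */
	if (result > INT32_MAX)
		result = INT32_MAX;
	return result;
}
```

For pack sizes {1000, 20, 30, 5} this yields 31, so the pack of size
1000 is never eligible while the smaller packs are. For two packs of
roughly 3 gigabytes each, the result is capped at 2147483647.
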
 builtin/gc.c           | 43 +++++++++++++++++++++++++++++++++++++++++-
 t/t7900-maintenance.sh | 36 +++++++++++++++++++++++++++++++++--
 2 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 35c6d7ce82..c09bc1381c 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1094,6 +1094,46 @@ static int multi_pack_index_expire(struct maintenance_opts *opts)
 	return 0;
 }
 
+#define TWO_GIGABYTES (INT32_MAX)
+
+static off_t get_auto_pack_size(void)
+{
+	/*
+	 * The "auto" value is special: we optimize for
+	 * one large pack-file (i.e. from a clone) and
+	 * expect the rest to be small and they can be
+	 * repacked quickly.
+	 *
+	 * The strategy we select here is to select a
+	 * size that is one more than the second largest
+	 * pack-file. This ensures that we will repack
+	 * at least two packs if there are three or more
+	 * packs.
+	 */
+	off_t max_size = 0;
+	off_t second_largest_size = 0;
+	off_t result_size;
+	struct packed_git *p;
+	struct repository *r = the_repository;
+
+	reprepare_packed_git(r);
+	for (p = get_all_packs(r); p; p = p->next) {
+		if (p->pack_size > max_size) {
+			second_largest_size = max_size;
+			max_size = p->pack_size;
+		} else if (p->pack_size > second_largest_size)
+			second_largest_size = p->pack_size;
+	}
+
+	result_size = second_largest_size + 1;
+
+	/* But limit ourselves to a batch size of 2g */
+	if (result_size > TWO_GIGABYTES)
+		result_size = TWO_GIGABYTES;
+
+	return result_size;
+}
+
 static int multi_pack_index_repack(struct maintenance_opts *opts)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -1104,7 +1144,8 @@ static int multi_pack_index_repack(struct maintenance_opts *opts)
 	if (opts->quiet)
 		strvec_push(&child.args, "--no-progress");
 
-	strvec_push(&child.args, "--batch-size=0");
+	strvec_pushf(&child.args, "--batch-size=%"PRIuMAX,
+				  (uintmax_t)get_auto_pack_size());
 
 	close_object_store(the_repository->objects);
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index be19ac7623..1c5f44f2b3 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -179,10 +179,42 @@ test_expect_success 'incremental-repack task' '
 	test_line_count = 4 packs-between &&
 
 	# the job deletes the two old packs, and does not write
-	# a new one because only one pack remains.
+	# a new one because the batch size is not high enough to
+	# pack the largest pack-file.
 	git maintenance run --task=incremental-repack &&
 	ls .git/objects/pack/*.pack >packs-after &&
-	test_line_count = 1 packs-after
+	test_line_count = 2 packs-after
+'
+
+test_expect_success EXPENSIVE 'incremental-repack 2g limit' '
+	for i in $(test_seq 1 5)
+	do
+		test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) >>big ||
+		return 1
+	done &&
+	git add big &&
+	git commit -m "Add big file (1)" &&
+
+	# ensure any possible loose objects are in a pack-file
+	git maintenance run --task=loose-objects &&
+
+	rm big &&
+	for i in $(test_seq 6 10)
+	do
+		test-tool genrandom foo$i $((512 * 1024 * 1024 + 1)) >>big ||
+		return 1
+	done &&
+	git add big &&
+	git commit -m "Add big file (2)" &&
+
+	# ensure any possible loose objects are in a pack-file
+	git maintenance run --task=loose-objects &&
+
+	# Now run the incremental-repack task and check the batch-size
+	GIT_TRACE2_EVENT="$(pwd)/run-2g.txt" git maintenance run \
+		--task=incremental-repack 2>/dev/null &&
+	test_subcommand git multi-pack-index repack \
+		 --no-progress --batch-size=2147483647 <run-2g.txt
 '
 
 test_done
-- 
gitgitgadget
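
The size computation in get_auto_pack_size() can be tried outside of Git
with a minimal standalone sketch. It assumes the pack sizes are already
available as a plain array; the names auto_batch_size and BATCH_SIZE_CAP
are made up for this illustration and are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative cap mirroring the patch's TWO_GIGABYTES (2^31 - 1 bytes). */
#define BATCH_SIZE_CAP ((uint64_t)INT32_MAX)

/*
 * Return one more than the second-largest pack size, capped at
 * BATCH_SIZE_CAP. With zero or one pack the second-largest size is
 * treated as 0, so the result is 1.
 */
static uint64_t auto_batch_size(const uint64_t *pack_sizes, size_t nr)
{
	uint64_t largest = 0, second = 0, result;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (pack_sizes[i] > largest) {
			/* the old maximum becomes the runner-up */
			second = largest;
			largest = pack_sizes[i];
		} else if (pack_sizes[i] > second) {
			second = pack_sizes[i];
		}
	}

	result = second + 1;
	if (result > BATCH_SIZE_CAP)
		result = BATCH_SIZE_CAP;
	return result;
}
```

For packs of 1000, 10 and 20 bytes this yields 21, which leaves the
1000-byte "clone" pack out of the batch. Once the second-largest pack
exceeds the cap, the result is pinned at 2147483647, the value the
EXPENSIVE test looks for.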



* Re: [PATCH 8/9] maintenance: auto-size incremental-repack batch
  2020-08-06 16:30  3% ` [PATCH 8/9] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
@ 2020-08-06 17:02 11%   ` Son Luong Ngoc
  2020-08-06 18:13  0%     ` Derrick Stolee
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2020-08-06 17:02 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, sandals, steadmon, jrnieder, Jeff King, congdanhqx,
	phillip.wood123, Emily Shaffer, Jonathan Tan, Derrick Stolee,
	Derrick Stolee

Hi Derrick,

> On Aug 6, 2020, at 18:30, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
> 
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> When repacking during the 'incremental-repack' task, we use the
> --batch-size option in 'git multi-pack-index repack'. The initial setting
> used --batch-size=0 to repack everything into a single pack-file. This is
> not sustainable for a large repository. The amount of work required is
> also likely to use too many system resources for a background job.
> 
> Update the 'incremental-repack' task by dynamically computing a
> --batch-size option based on the current pack-file structure.
> 
> The dynamic default size is computed with this idea in mind for a client
> repository that was cloned from a very large remote: there is likely one
> "big" pack-file that was created at clone time. Thus, do not try
> repacking it as it is likely packed efficiently by the server.
> 
> Instead, we select the second-largest pack-file, and create a batch size
> that is one larger than that pack-file. If there are three or more
> pack-files, then this guarantees that at least two will be combined into
> a new pack-file.

I have been using this strategy in git-care.sh [1] with great success.
However, it is worth noting that there are still edge cases where I observed
the pack count keep increasing because using
'--batch-size=<second-biggest-pack>+1' did not result in any repacking.
In one case, I observed a local copy go up to 160+ packs without being able
to repack.

I have been considering a strategy of falling back to the '(3rd biggest
pack size) + 1', then the 4th, the 5th and so on... whenever the midx repack
call results in a no-op, as that was how I fixed my repo when the edge case
happened.

Such a strategy would require a way for midx repack to signal when a no-op
happens, so something like
'git multi-pack-index repack --batch-size=123456 --exit-code' would
be very desirable.

> 
> Of course, this means that the second-largest pack-file size is likely
> to grow over time and may eventually surpass the initially-cloned
> pack-file. Recall that the pack-file batch is selected in a greedy
> manner: the packs are considered from oldest to newest and are selected
> if they have size smaller than the batch size until the total selected
> size is larger than the batch size. Thus, that oldest "clone" pack will
> be first to repack after the new data creates a pack larger than that.
> 
> We also want to place some limits on how large these pack-files become,
> in order to bound the amount of time spent repacking. A maximum
> batch-size of two gigabytes means that large repositories will never be
> packed into a single pack-file using this job, but also that repack is
> rather expensive. This is a trade-off that is valuable to have if the
> maintenance is being run automatically or in the background. Users who
> truly want to optimize for space and performance (and are willing to pay
> the upfront cost of a full repack) can use the 'gc' task to do so.
> 
> Create a test for this two gigabyte limit by creating an EXPENSIVE test
> that generates two pack-files of roughly 2.5 gigabytes in size, then
> performs an incremental repack. Check that the --batch-size argument in
> the subcommand uses the hard-coded maximum.
> 
> Helped-by: Chris Torek <chris.torek@gmail.com>
> Reported-by: Son Luong Ngoc <sluongng@gmail.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>

Generally, I have found working with '--batch-size' to be a bit unpredictable.
I wonder if we could tweak the behavior somewhat so that it's more consistent
to use and test?

Thanks a lot for making this happen.
I hope this patch makes it into a stable release soon.

Cheers,
Son Luong.

[1]: https://github.com/sluongng/git-care
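
The edge case and the fallback idea described above can be modelled with
a small sketch. It rests on several assumptions that are not taken from
Git's source: packs are listed oldest to newest, selection uses an
"expected size" that may be smaller than the on-disk size, and a batch is
a no-op unless at least two packs are picked and their total reaches the
batch size. All names (pack_info, select_batch, fallback_batch_size) are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pack description; both fields are illustrative. */
struct pack_info {
	uint64_t size;          /* on-disk size in bytes */
	uint64_t expected_size; /* estimated size of still-referenced objects */
};

/*
 * Simplified selection model: walk packs oldest to newest, pick those
 * whose expected size is below batch_size, and stop once the running
 * total reaches batch_size. Returns the number of packs picked, or 0
 * when the repack would be a no-op.
 */
static size_t select_batch(const struct pack_info *packs, size_t nr,
			   uint64_t batch_size)
{
	uint64_t total = 0;
	size_t picked = 0, i;

	for (i = 0; i < nr && total < batch_size; i++) {
		if (packs[i].expected_size >= batch_size)
			continue;
		total += packs[i].expected_size;
		picked++;
	}
	if (picked < 2 || total < batch_size)
		return 0;
	return picked;
}

/* Largest on-disk size strictly below limit, or 0 if there is none. */
static uint64_t largest_below(const struct pack_info *packs, size_t nr,
			      uint64_t limit)
{
	uint64_t best = 0;
	size_t i;

	for (i = 0; i < nr; i++)
		if (packs[i].size < limit && packs[i].size > best)
			best = packs[i].size;
	return best;
}

/*
 * Try (2nd largest size) + 1, then (3rd largest) + 1, and so on, until
 * a batch size is found that selects something. Returns 0 if no
 * candidate works. Equal sizes are treated as one candidate.
 */
static uint64_t fallback_batch_size(const struct pack_info *packs, size_t nr)
{
	uint64_t candidate = largest_below(packs, nr, UINT64_MAX);

	while (candidate) {
		candidate = largest_below(packs, nr, candidate);
		if (candidate && select_batch(packs, nr, candidate + 1))
			return candidate + 1;
	}
	return 0;
}
```

In this model, a pack whose on-disk size is large but whose expected size
is small can make '(second-biggest) + 1' unreachable, so nothing gets
repacked. The fallback loop then settles on a smaller batch size that
does select at least two packs.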


* Re: [PATCH] commit-graph: add verify changed paths option
  @ 2020-07-31 19:31  6%       ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-07-31 19:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, Son Luong Ngoc via GitGitGadget, git

Note: re-sending to the mailing list because I forgot to turn on Plain Text
format. (sorry for the noise)

Hi Peff, Taylor, Junio and Christian,

Thanks a lot for the valuable feedback.
This is exactly what I was hoping for by sending out the patch early!

> On Jul 31, 2020, at 21:14, Jeff King <peff@peff.net> wrote:
> 
> On Fri, Jul 31, 2020 at 02:09:56PM -0400, Taylor Blau wrote:
> 
>>> Is a single boolean flag sufficient? If you have incrementals, you might
>>> have some slices with this chunk and some without. What should the
>>> boolean be in that case?
>> 
>> I think you'd really want to know which layers do and don't have
>> filters. It might be even more interesting to have a tool like what 'git
>> show-index' is to '*.idx' files, maybe something like 'git show-graph'
>> or 'git show-commit-graph'. Its output would be one line per commit that
>> shows:
>> 
>>  - what layer in the chain it's located at
>>  - its graph_pos
>>  - its generation number
>>  - whether or not it has a Bloom filter
>>  - ???
>> 
>> That would be a useful tool for debugging anyway, even outside of the
>> test suite. It would be even better if we could replace the test-tool
>> with it.
> 
> Yeah, that was exactly what I had in mind, except that I'd make it a
> sub-command of "git commit-graph" ("show" or perhaps "dump").

I loved Junio's initial suggestion and the follow-up here.
I was thinking of something like 'git commit-graph verify --verbose' but
now I agree that a separate sub-command such as 'show' might
better communicate the purpose.

I will stick with my poor man's bash/golang script for now to invalidate
the commit-graph (chain or no-chain) as it does the job just fine.

Let me see if I have the capacity to implement a 'show' sub-command
afterwards. ^_^!

> 
> -Peff

Cheers,
Son Luong.


* Re: [PATCH] commit-graph: add verify changed paths option
  2020-07-31 18:02  0% ` Jeff King
@ 2020-07-31 18:09  0%   ` Taylor Blau
    0 siblings, 1 reply; 122+ results
From: Taylor Blau @ 2020-07-31 18:09 UTC (permalink / raw)
  To: Jeff King; +Cc: Son Luong Ngoc via GitGitGadget, git, Son Luong Ngoc

On Fri, Jul 31, 2020 at 02:02:35PM -0400, Jeff King wrote:
> On Fri, Jul 31, 2020 at 07:49:25AM +0000, Son Luong Ngoc via GitGitGadget wrote:
>
> > From: Son Luong Ngoc <sluongng@gmail.com>
> >
> > Add '--has-changed-paths' option to 'git commit-graph verify' subcommand
> > to validate whether the commit-graph was written with '--changed-paths'
> > option.
>
> Is a single boolean flag sufficient? If you have incrementals, you might
> have some slices with this chunk and some without. What should the
> boolean be in that case?

I think you'd really want to know which layers do and don't have
filters. It might be even more interesting to have a tool like what 'git
show-index' is to '*.idx' files, maybe something like 'git show-graph'
or 'git show-commit-graph'. Its output would be one line per commit that
shows:

  - what layer in the chain it's located at
  - its graph_pos
  - its generation number
  - whether or not it has a Bloom filter
  - ???

That would be a useful tool for debugging anyway, even outside of the
test suite. It would be even better if we could replace the test-tool
with it.

On an unrelated note; this patch is broken as-is, since it will only
report that Bloom filters exist if the top-most graph has them. I have a
patch to fix this that I have been meaning to send out for most of this
week. I'll try to get to it shortly.

> I thought we had some way of reporting the number of commits covered by
> filters, but I can't seem to find it.

I don't recall having anything like that.

> Our "test-tool read-graph" can report on whether there's a bloom filter
> chunk, but I think it also doesn't distinguish between different slices
> (and anyway, it wouldn't be suitable for tools that don't rely on an
> actual built git.git directory).
>
> -Peff
Thanks,
Taylor
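
The point about layers can be illustrated with a short sketch that walks
a whole chain instead of inspecting only the tip. The struct below is a
pared-down, hypothetical stand-in for a commit-graph layer; the field and
function names are assumptions made for this example, not Git's own.

```c
#include <stddef.h>

/* Hypothetical, minimal stand-in for one layer of a split commit-graph. */
struct graph_layer {
	int has_bloom_index;            /* Bloom Index chunk present? */
	int has_bloom_data;             /* Bloom Data chunk present? */
	const struct graph_layer *base; /* next layer down, or NULL */
};

enum bloom_coverage {
	BLOOM_NONE, /* no layer carries Bloom filter chunks */
	BLOOM_SOME, /* some layers do, some do not */
	BLOOM_ALL,  /* every layer does */
};

/* Classify a chain by checking every layer, not just the top-most one. */
static enum bloom_coverage chain_bloom_coverage(const struct graph_layer *tip)
{
	size_t layers = 0, with_filters = 0;
	const struct graph_layer *g;

	for (g = tip; g; g = g->base) {
		layers++;
		if (g->has_bloom_index && g->has_bloom_data)
			with_filters++;
	}

	if (!with_filters)
		return BLOOM_NONE;
	return with_filters == layers ? BLOOM_ALL : BLOOM_SOME;
}
```

A three-way result avoids the ambiguity of a single boolean when only
part of the chain was written with '--changed-paths'.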


* Re: [PATCH] commit-graph: add verify changed paths option
  2020-07-31 17:14  0% ` Junio C Hamano
@ 2020-07-31 18:06  0%   ` Taylor Blau
  0 siblings, 0 replies; 122+ results
From: Taylor Blau @ 2020-07-31 18:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Son Luong Ngoc via GitGitGadget, git, Son Luong Ngoc

On Fri, Jul 31, 2020 at 10:14:39AM -0700, Junio C Hamano wrote:
> "Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Son Luong Ngoc <sluongng@gmail.com>
> >
> > Add '--has-changed-paths' option to 'git commit-graph verify' subcommand
> > to validate whether the commit-graph was written with '--changed-paths'
> > option.
>
> The implementation seems to be only about "does this section exist?"
> and not "does this section have healthy/uncorrupted data?", which
> feels a bit strange for "verify".  Instead of setting ourselves up
> to having to add "--has-this-section" and "--has-that-section" every
> time a new kind of data is added to the system, how about giving the
> verify command an option to list all the sections found in the file,
> or a separate "git commit-graph list-sections" subcommand?

Completely agreed. When I suggested that Son work on this, I more had in
mind something like 'git commit-graph verify --changed-paths' to mean
"verify the integrity of the commit-graph(s), including regenerating
changed-path Bloom filters and making sure they match".

If you are just curious whether or not the section exists, I'd rather
write a script to look for the 'BIDX' or 'BDAT' chunk IDs. That said, if
they're spread across incremental, maybe it makes more sense to extend
the commit-graph test tool.

I dunno.

Thanks,
Taylor
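
Looking for the 'BIDX' and 'BDAT' chunk IDs could be sketched as below.
This assumes the documented commit-graph layout: an 8-byte header
("CGPH", version, hash version, chunk count, base graph count) followed
by a chunk lookup table of 12-byte entries (4-byte ID, 8-byte offset)
with one terminating entry. It works on a buffer already read into
memory and handles a single file only, not a chain; the function names
are invented for this example.

```c
#include <stddef.h>
#include <string.h>

#define GRAPH_HEADER_SIZE 8 /* signature, versions, chunk and base counts */
#define CHUNK_ENTRY_SIZE 12 /* 4-byte chunk ID + 8-byte offset */

/* Return 1 if the chunk lookup table lists the given 4-byte chunk ID. */
static int has_chunk(const unsigned char *buf, size_t len, const char *id)
{
	size_t nr_chunks, i;

	if (len < GRAPH_HEADER_SIZE || memcmp(buf, "CGPH", 4))
		return 0;

	nr_chunks = buf[6];
	/* the table holds nr_chunks entries plus one terminating entry */
	if (len < GRAPH_HEADER_SIZE + (nr_chunks + 1) * CHUNK_ENTRY_SIZE)
		return 0;

	for (i = 0; i < nr_chunks; i++) {
		const unsigned char *entry =
			buf + GRAPH_HEADER_SIZE + i * CHUNK_ENTRY_SIZE;
		if (!memcmp(entry, id, 4))
			return 1;
	}
	return 0;
}

/* Both Bloom chunks must be present for changed-path filters to exist. */
static int has_changed_path_chunks(const unsigned char *buf, size_t len)
{
	return has_chunk(buf, len, "BIDX") && has_chunk(buf, len, "BDAT");
}
```

For a split commit-graph the same check would have to be repeated for
every file in the chain, which is the limitation raised above.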


* Re: [PATCH] commit-graph: add verify changed paths option
  2020-07-31  7:49  8% [PATCH] commit-graph: add verify changed paths option Son Luong Ngoc via GitGitGadget
  2020-07-31 16:21  0% ` Christian Couder
  2020-07-31 17:14  0% ` Junio C Hamano
@ 2020-07-31 18:02  0% ` Jeff King
  2020-07-31 18:09  0%   ` Taylor Blau
  2 siblings, 1 reply; 122+ results
From: Jeff King @ 2020-07-31 18:02 UTC (permalink / raw)
  To: Son Luong Ngoc via GitGitGadget; +Cc: git, Son Luong Ngoc

On Fri, Jul 31, 2020 at 07:49:25AM +0000, Son Luong Ngoc via GitGitGadget wrote:

> From: Son Luong Ngoc <sluongng@gmail.com>
> 
> Add '--has-changed-paths' option to 'git commit-graph verify' subcommand
> to validate whether the commit-graph was written with '--changed-paths'
> option.

Is a single boolean flag sufficient? If you have incrementals, you might
have some slices with this chunk and some without. What should the
boolean be in that case?

I thought we had some way of reporting the number of commits covered by
filters, but I can't seem to find it.

Our "test-tool read-graph" can report on whether there's a bloom filter
chunk, but I think it also doesn't distinguish between different slices
(and anyway, it wouldn't be suitable for tools that don't rely on an
actual built git.git directory).

-Peff


* Re: [PATCH] commit-graph: add verify changed paths option
  2020-07-31  7:49  8% [PATCH] commit-graph: add verify changed paths option Son Luong Ngoc via GitGitGadget
  2020-07-31 16:21  0% ` Christian Couder
@ 2020-07-31 17:14  0% ` Junio C Hamano
  2020-07-31 18:06  0%   ` Taylor Blau
  2020-07-31 18:02  0% ` Jeff King
  2 siblings, 1 reply; 122+ results
From: Junio C Hamano @ 2020-07-31 17:14 UTC (permalink / raw)
  To: Son Luong Ngoc via GitGitGadget; +Cc: git, Son Luong Ngoc

"Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Son Luong Ngoc <sluongng@gmail.com>
>
> Add '--has-changed-paths' option to 'git commit-graph verify' subcommand
> to validate whether the commit-graph was written with '--changed-paths'
> option.

The implementation seems to be only about "does this section exist?"
and not "does this section have healthy/uncorrupted data?", which
feels a bit strange for "verify".  Instead of setting ourselves up
to having to add "--has-this-section" and "--has-that-section" every
time a new kind of data is added to the system, how about giving the
verify command an option to list all the sections found in the file,
or a separate "git commit-graph list-sections" subcommand?



* Re: [PATCH] commit-graph: add verify changed paths option
  2020-07-31  7:49  8% [PATCH] commit-graph: add verify changed paths option Son Luong Ngoc via GitGitGadget
@ 2020-07-31 16:21  0% ` Christian Couder
  2020-07-31 17:14  0% ` Junio C Hamano
  2020-07-31 18:02  0% ` Jeff King
  2 siblings, 0 replies; 122+ results
From: Christian Couder @ 2020-07-31 16:21 UTC (permalink / raw)
  To: Son Luong Ngoc via GitGitGadget; +Cc: git, Son Luong Ngoc

On Fri, Jul 31, 2020 at 9:52 AM Son Luong Ngoc via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Son Luong Ngoc <sluongng@gmail.com>
>
> Add '--has-changed-paths' option to 'git commit-graph verify' subcommand
> to validate whether the commit-graph was written with '--changed-paths'
> option.
>
> Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>

[...]

>     It's probably going to take me a bit more time to write up some tests
>     for this,

It would need some documentation too.

> so I want to send it out first for comments.

[...]

> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 16c9f6101a..ce8a7cbe90 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -18,7 +18,8 @@ static char const * const builtin_commit_graph_usage[] = {
>  };
>
>  static const char * const builtin_commit_graph_verify_usage[] = {
> -       N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
> +       N_("git commit-graph verify [--object-dir <objdir>] [--shallow] "
> +           "[--has-changed-paths] [--[no-]progress]"),
>         NULL
>  };
>
> @@ -37,6 +38,7 @@ static struct opts_commit_graph {
>         int append;
>         int split;
>         int shallow;
> +       int has_changed_paths;
>         int progress;
>         int enable_changed_paths;
>  } opts;
> @@ -71,12 +73,14 @@ static int graph_verify(int argc, const char **argv)
>         int open_ok;
>         int fd;
>         struct stat st;
> -       int flags = 0;
> +       enum commit_graph_verify_flags flags = 0;
>
>         static struct option builtin_commit_graph_verify_options[] = {
>                 OPT_STRING(0, "object-dir", &opts.obj_dir,
>                            N_("dir"),
>                            N_("The object directory to store the graph")),
> +               OPT_BOOL(0, "has-changed-paths", &opts.has_changed_paths,
> +                        N_("verify that the commit-graph includes changed paths")),
>                 OPT_BOOL(0, "shallow", &opts.shallow,
>                          N_("if the commit-graph is split, only verify the tip file")),
>                 OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
> @@ -94,8 +98,10 @@ static int graph_verify(int argc, const char **argv)
>                 opts.obj_dir = get_object_directory();
>         if (opts.shallow)
>                 flags |= COMMIT_GRAPH_VERIFY_SHALLOW;
> +       if (opts.has_changed_paths)
> +               flags |= COMMIT_GRAPH_VERIFY_CHANGED_PATHS;

I wonder if OPT_BIT() could be used instead of OPT_BOOL() above to
directly set the above flag, as the 'has_changed_paths' field in
'struct opts_commit_graph' seems to be used only for the purpose of
setting this flag.

>         if (opts.progress)
> -               flags |= COMMIT_GRAPH_WRITE_PROGRESS;
> +               flags |= COMMIT_GRAPH_VERIFY_PROGRESS;

Does this change belong to this patch? I think it would deserve an
explanation in the commit message if that's the case.

>         odb = find_odb(the_repository, opts.obj_dir);
>         graph_name = get_commit_graph_filename(odb);
> diff --git a/commit-graph.c b/commit-graph.c
> index 1af68c297d..d83f5a2325 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -250,7 +250,7 @@ struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st,
>         return ret;
>  }
>
> -static int verify_commit_graph_lite(struct commit_graph *g)
> +static int verify_commit_graph_lite(struct commit_graph *g, int verify_changed_path)

[...]

> -int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
> +int verify_commit_graph(struct repository *r,
> +                       struct commit_graph *g,
> +                       enum commit_graph_verify_flags flags)

It seems to me that it would be more coherent to have both
verify_commit_graph() and verify_commit_graph_lite() accept an 'enum
commit_graph_verify_flags flags' argument.

Right now the "has_changed_paths" option is first an int, then it's
converted to a flag and then to an int again before being passed to
verify_commit_graph_lite(). It would be simpler if it could be a flag
all along.

Thanks,
Christian.
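
The "flag all along" shape suggested above might look like the following
standalone sketch. The enum values mirror the ones in the patch, but the
struct and function names are simplified stand-ins chosen for this
example, and the option-parsing side (OPT_BIT) is left out because it
depends on Git's internal headers.

```c
/* Flag values mirror the patch; the names here are shortened stand-ins. */
enum verify_flags {
	VERIFY_SHALLOW       = (1 << 0),
	VERIFY_CHANGED_PATHS = (1 << 1),
	VERIFY_PROGRESS      = (1 << 2),
};

/* Hypothetical, minimal view of a loaded commit-graph. */
struct graph {
	int has_bloom_index;
	int has_bloom_data;
};

/* The "lite" check takes the same flags type as its caller. */
static int verify_lite(const struct graph *g, enum verify_flags flags)
{
	if (flags & VERIFY_CHANGED_PATHS) {
		if (!g->has_bloom_index || !g->has_bloom_data)
			return 1; /* a required Bloom chunk is missing */
	}
	return 0;
}

/* The full check forwards the flags without converting them to an int. */
static int verify(const struct graph *g, enum verify_flags flags)
{
	return verify_lite(g, flags);
}
```

Keeping one type from option parsing down to the lowest-level check means
a new verify flag does not need another parameter on verify_lite().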


* [PATCH] commit-graph: add verify changed paths option
@ 2020-07-31  7:49  8% Son Luong Ngoc via GitGitGadget
  2020-07-31 16:21  0% ` Christian Couder
                   ` (2 more replies)
  0 siblings, 3 replies; 122+ results
From: Son Luong Ngoc via GitGitGadget @ 2020-07-31  7:49 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Son Luong Ngoc

From: Son Luong Ngoc <sluongng@gmail.com>

Add '--has-changed-paths' option to 'git commit-graph verify' subcommand
to validate whether the commit-graph was written with '--changed-paths'
option.

Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
---
    Commit-Graph: Verify bloom filter
    
    When I was working on git-care(1) and Gitaly(2), I needed to check
    whether a commit-graph (split or non-split) was built with Bloom
    filters. This is needed especially when a repository's primary
    commit-graph write strategy is '--split' and the bottom layers of the
    chain might rarely (or never) be re-written, so Bloom filters are never
    applied to the graph.
    
    Provide users with a straightforward way to validate the existence of
    Bloom filter chunks, saving them from having to read the commit-graph
    manually as shown in (1) and (2).
    
    References:
    
     1. https://github.com/sluongng/git-care/commit/d0feaa381ea3ec7b0e617c6596ad6e3cf16b884a
     2. https://gitlab.com/sluongng/gitaly/-/commit/78dba8b73e720b11500482b19b755346ec853025
    
    
    ------------------------------------------------------------------------
    
    It's probably going to take me a bit more time to write up some tests
    for this, so I want to send it out first for comments.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-687%2Fsluongng%2Fsluongngoc%2Fverify-bloom-filter-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-687/sluongng/sluongngoc/verify-bloom-filter-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/687

 builtin/commit-graph.c | 12 +++++++++---
 commit-graph.c         | 22 +++++++++++++++++-----
 commit-graph.h         | 12 +++++++++---
 3 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 16c9f6101a..ce8a7cbe90 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -18,7 +18,8 @@ static char const * const builtin_commit_graph_usage[] = {
 };
 
 static const char * const builtin_commit_graph_verify_usage[] = {
-	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] [--[no-]progress]"),
+	N_("git commit-graph verify [--object-dir <objdir>] [--shallow] "
+	    "[--has-changed-paths] [--[no-]progress]"),
 	NULL
 };
 
@@ -37,6 +38,7 @@ static struct opts_commit_graph {
 	int append;
 	int split;
 	int shallow;
+	int has_changed_paths;
 	int progress;
 	int enable_changed_paths;
 } opts;
@@ -71,12 +73,14 @@ static int graph_verify(int argc, const char **argv)
 	int open_ok;
 	int fd;
 	struct stat st;
-	int flags = 0;
+	enum commit_graph_verify_flags flags = 0;
 
 	static struct option builtin_commit_graph_verify_options[] = {
 		OPT_STRING(0, "object-dir", &opts.obj_dir,
 			   N_("dir"),
 			   N_("The object directory to store the graph")),
+		OPT_BOOL(0, "has-changed-paths", &opts.has_changed_paths,
+			 N_("verify that the commit-graph includes changed paths")),
 		OPT_BOOL(0, "shallow", &opts.shallow,
 			 N_("if the commit-graph is split, only verify the tip file")),
 		OPT_BOOL(0, "progress", &opts.progress, N_("force progress reporting")),
@@ -94,8 +98,10 @@ static int graph_verify(int argc, const char **argv)
 		opts.obj_dir = get_object_directory();
 	if (opts.shallow)
 		flags |= COMMIT_GRAPH_VERIFY_SHALLOW;
+	if (opts.has_changed_paths)
+		flags |= COMMIT_GRAPH_VERIFY_CHANGED_PATHS;
 	if (opts.progress)
-		flags |= COMMIT_GRAPH_WRITE_PROGRESS;
+		flags |= COMMIT_GRAPH_VERIFY_PROGRESS;
 
 	odb = find_odb(the_repository, opts.obj_dir);
 	graph_name = get_commit_graph_filename(odb);
diff --git a/commit-graph.c b/commit-graph.c
index 1af68c297d..d83f5a2325 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -250,7 +250,7 @@ struct commit_graph *load_commit_graph_one_fd_st(int fd, struct stat *st,
 	return ret;
 }
 
-static int verify_commit_graph_lite(struct commit_graph *g)
+static int verify_commit_graph_lite(struct commit_graph *g, int verify_changed_path)
 {
 	/*
 	 * Basic validation shared between parse_commit_graph()
@@ -276,6 +276,16 @@ static int verify_commit_graph_lite(struct commit_graph *g)
 		error("commit-graph is missing the Commit Data chunk");
 		return 1;
 	}
+	if (verify_changed_path) {
+		if (!g->chunk_bloom_indexes) {
+			error("commit-graph is missing Bloom Index chunk");
+			return 1;
+		}
+		if (!g->chunk_bloom_data) {
+			error("commit-graph is missing Bloom Data chunk");
+			return 1;
+		}
+	}
 
 	return 0;
 }
@@ -439,7 +449,7 @@ struct commit_graph *parse_commit_graph(void *graph_map, size_t graph_size)
 
 	hashcpy(graph->oid.hash, graph->data + graph->data_len - graph->hash_len);
 
-	if (verify_commit_graph_lite(graph))
+	if (verify_commit_graph_lite(graph, 0))
 		goto free_and_return;
 
 	return graph;
@@ -2216,7 +2226,9 @@ static void graph_report(const char *fmt, ...)
 #define GENERATION_ZERO_EXISTS 1
 #define GENERATION_NUMBER_EXISTS 2
 
-int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
+int verify_commit_graph(struct repository *r,
+			struct commit_graph *g,
+			enum commit_graph_verify_flags flags)
 {
 	uint32_t i, cur_fanout_pos = 0;
 	struct object_id prev_oid, cur_oid, checksum;
@@ -2231,7 +2243,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 		return 1;
 	}
 
-	verify_commit_graph_error = verify_commit_graph_lite(g);
+	verify_commit_graph_error = verify_commit_graph_lite(g, flags & COMMIT_GRAPH_VERIFY_CHANGED_PATHS);
 	if (verify_commit_graph_error)
 		return verify_commit_graph_error;
 
@@ -2284,7 +2296,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags)
 	if (verify_commit_graph_error & ~VERIFY_COMMIT_GRAPH_ERROR_HASH)
 		return verify_commit_graph_error;
 
-	if (flags & COMMIT_GRAPH_WRITE_PROGRESS)
+	if (flags & COMMIT_GRAPH_VERIFY_PROGRESS)
 		progress = start_progress(_("Verifying commits in commit graph"),
 					g->num_commits);
 
diff --git a/commit-graph.h b/commit-graph.h
index 28f89cdf3e..29c01b5000 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -94,6 +94,12 @@ enum commit_graph_write_flags {
 	COMMIT_GRAPH_WRITE_BLOOM_FILTERS = (1 << 3),
 };
 
+enum commit_graph_verify_flags {
+	COMMIT_GRAPH_VERIFY_SHALLOW       = (1 << 0),
+	COMMIT_GRAPH_VERIFY_CHANGED_PATHS = (1 << 1),
+	COMMIT_GRAPH_VERIFY_PROGRESS      = (1 << 2),
+};
+
 enum commit_graph_split_flags {
 	COMMIT_GRAPH_SPLIT_UNSPECIFIED      = 0,
 	COMMIT_GRAPH_SPLIT_MERGE_PROHIBITED = 1,
@@ -122,9 +128,9 @@ int write_commit_graph(struct object_directory *odb,
 		       enum commit_graph_write_flags flags,
 		       const struct split_commit_graph_opts *split_opts);
 
-#define COMMIT_GRAPH_VERIFY_SHALLOW	(1 << 0)
-
-int verify_commit_graph(struct repository *r, struct commit_graph *g, int flags);
+int verify_commit_graph(struct repository *r,
+			struct commit_graph *g,
+			enum commit_graph_verify_flags flags);
 
 void close_commit_graph(struct raw_object_store *);
 void free_commit_graph(struct commit_graph *);

base-commit: 47ae905ffb98cc4d4fd90083da6bc8dab55d9ecc
-- 
gitgitgadget


* [PATCH v3 13/20] maintenance: auto-size incremental-repack batch
  @ 2020-07-30 22:24  4%     ` Derrick Stolee via GitGitGadget
  0 siblings, 0 replies; 122+ results
From: Derrick Stolee via GitGitGadget @ 2020-07-30 22:24 UTC (permalink / raw)
  To: git
  Cc: Johannes.Schindelin, sandals, steadmon, jrnieder, peff,
	congdanhqx, phillip.wood123, emilyshaffer, sluongng,
	jonathantanmy, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When repacking during the 'incremental-repack' task, we use the
--batch-size option in 'git multi-pack-index repack'. The initial setting
used --batch-size=0 to repack everything into a single pack-file. This is
not sustainable for a large repository. The amount of work required is
also likely to use too many system resources for a background job.

Update the 'incremental-repack' task by dynamically computing a
--batch-size option based on the current pack-file structure.

The dynamic default size is computed with this idea in mind for a client
repository that was cloned from a very large remote: there is likely one
"big" pack-file that was created at clone time. Thus, do not try
repacking it as it is likely packed efficiently by the server.

Instead, we select the second-largest pack-file, and create a batch size
that is one larger than that pack-file. If there are three or more
pack-files, then this guarantees that at least two will be combined into
a new pack-file.

Of course, this means that the second-largest pack-file size is likely
to grow over time and may eventually surpass the initially-cloned
pack-file. Recall that the pack-file batch is selected in a greedy
manner: the packs are considered from oldest to newest and are selected
if they have size smaller than the batch size until the total selected
size is larger than the batch size. Thus, that oldest "clone" pack will
be first to repack after the new data creates a pack larger than that.

We also want to place some limits on how large these pack-files become,
in order to bound the amount of time spent repacking. A maximum
batch-size of two gigabytes means that large repositories will never be
packed into a single pack-file using this job, but also that repack is
rather expensive. This is a trade-off that is valuable to have if the
maintenance is being run automatically or in the background. Users who
truly want to optimize for space and performance (and are willing to pay
the upfront cost of a full repack) can use the 'gc' task to do so.

Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c           | 43 +++++++++++++++++++++++++++++++++++++++++-
 t/t7900-maintenance.sh |  5 +++--
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 99ab1f5e9d..d94eb3e6ad 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -988,6 +988,46 @@ static int multi_pack_index_expire(void)
 	return 0;
 }
 
+#define TWO_GIGABYTES (INT32_MAX)
+
+static off_t get_auto_pack_size(void)
+{
+	/*
+	 * The "auto" value is special: we optimize for
+	 * one large pack-file (i.e. from a clone) and
+	 * expect the rest to be small and they can be
+	 * repacked quickly.
+	 *
+	 * The strategy we select here is to select a
+	 * size that is one more than the second largest
+	 * pack-file. This ensures that we will repack
+	 * at least two packs if there are three or more
+	 * packs.
+	 */
+	off_t max_size = 0;
+	off_t second_largest_size = 0;
+	off_t result_size;
+	struct packed_git *p;
+	struct repository *r = the_repository;
+
+	reprepare_packed_git(r);
+	for (p = get_all_packs(r); p; p = p->next) {
+		if (p->pack_size > max_size) {
+			second_largest_size = max_size;
+			max_size = p->pack_size;
+		} else if (p->pack_size > second_largest_size)
+			second_largest_size = p->pack_size;
+	}
+
+	result_size = second_largest_size + 1;
+
+	/* But limit ourselves to a batch size of 2g */
+	if (result_size > TWO_GIGABYTES)
+		result_size = TWO_GIGABYTES;
+
+	return result_size;
+}
+
 static int multi_pack_index_repack(void)
 {
 	struct child_process child = CHILD_PROCESS_INIT;
@@ -998,7 +1038,8 @@ static int multi_pack_index_repack(void)
 	if (opts.quiet)
 		strvec_push(&child.args, "--no-progress");
 
-	strvec_push(&child.args, "--batch-size=0");
+	strvec_pushf(&child.args, "--batch-size=%"PRIuMAX,
+				  (uintmax_t)get_auto_pack_size());
 
 	close_object_store(the_repository->objects);
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 0cc59adb21..4e9c1dfa0f 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -138,10 +138,11 @@ test_expect_success 'incremental-repack task' '
 	test_line_count = 4 packs-between &&
 
 	# the job deletes the two old packs, and does not write
-	# a new one because only one pack remains.
+	# a new one because the batch size is not high enough to
+	# pack the largest pack-file.
 	git maintenance run --task=incremental-repack &&
 	ls .git/objects/pack/*.pack >packs-after &&
-	test_line_count = 1 packs-after
+	test_line_count = 2 packs-after
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[relevance 4%]

* [PATCH v2 11/18] maintenance: auto-size incremental-repack batch
  @ 2020-07-23 17:56  3%   ` Derrick Stolee via GitGitGadget
    1 sibling, 0 replies; 122+ results
From: Derrick Stolee via GitGitGadget @ 2020-07-23 17:56 UTC (permalink / raw)
  To: git
  Cc: Johannes.Schindelin, sandals, steadmon, jrnieder, peff,
	congdanhqx, phillip.wood123, emilyshaffer, sluongng,
	jonathantanmy, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When repacking during the 'incremental-repack' task, we use the
--batch-size option in 'git multi-pack-index repack'. The initial setting
used --batch-size=0 to repack everything into a single pack-file. This is
not sustainable for a large repository. The amount of work required is
also likely to use too many system resources for a background job.

Update the 'incremental-repack' task by dynamically computing a
--batch-size option based on the current pack-file structure.

The dynamic default size is computed with this idea in mind for a client
repository that was cloned from a very large remote: there is likely one
"big" pack-file that was created at clone time. Thus, do not try
repacking it as it is likely packed efficiently by the server.

Instead, we select the second-largest pack-file, and create a batch size
that is one larger than that pack-file. If there are three or more
pack-files, then this guarantees that at least two will be combined into
a new pack-file.
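
As a rough illustration outside the patch itself, the size rule described
above can be sketched in shell. The helper name auto_batch_size and its
stdin format (one pack size in bytes per line) are assumptions made for
this sketch; the cap mirrors the TWO_GIGABYTES value in the diff below.

```shell
# Hypothetical helper: read pack sizes (bytes, one per line) on stdin and
# print "second-largest size + 1", capped at 2147483647 bytes.
auto_batch_size() {
	max=0
	second=0
	while read -r size
	do
		if [ "$size" -gt "$max" ]
		then
			# New largest pack: the old largest becomes second largest.
			second=$max
			max=$size
		elif [ "$size" -gt "$second" ]
		then
			second=$size
		fi
	done
	result=$((second + 1))
	# Limit the batch size to 2^31 - 1 bytes.
	if [ "$result" -gt 2147483647 ]
	then
		result=2147483647
	fi
	echo "$result"
}

# Possible usage inside a repository (assumes at least one pack exists):
#   for p in .git/objects/pack/*.pack; do wc -c <"$p"; done | auto_batch_size
```

With sizes 1000, 50 and 30, the largest pack is left alone and the result
is 51, which is large enough to admit the two smaller packs.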

Of course, this means that the second-largest pack-file size is likely
to grow over time and may eventually surpass the initially-cloned
pack-file. Recall that the pack-file batch is selected in a greedy
manner: the packs are considered from oldest to newest and are selected
if they have size smaller than the batch size until the total selected
size is larger than the batch size. Thus, that oldest "clone" pack will
be first to repack after the new data creates a pack larger than that.
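
The greedy selection recalled above can be sketched the same way. This
follows only the description in this message, not the multi-pack-index
code, which may estimate sizes differently and may require a minimum
number of packs. The helper name select_batch and the "<size> <name>"
input format are assumptions made for this sketch.

```shell
# Hypothetical helper: $1 is the batch size; stdin lists "<size> <name>"
# pairs, oldest pack first. Prints the names of the packs selected.
select_batch() {
	batch=$1
	total=0
	while read -r size name
	do
		# Stop once the selected packs add up to the batch size.
		if [ "$total" -ge "$batch" ]
		then
			break
		fi
		# Only packs smaller than the batch size are candidates.
		if [ "$size" -lt "$batch" ]
		then
			echo "$name"
			total=$((total + size))
		fi
	done
}
```

For packs "1000 clone", "50 a", "30 b", "20 c" (oldest first) and a batch
size of 51, the clone pack is skipped, "a" and "b" are selected, and the
loop stops before "c" because the running total has reached the batch
size.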

We also want to place some limits on how large these pack-files become,
in order to bound the amount of time spent repacking. A maximum
batch-size of two gigabytes means that large repositories will never be
packed into a single pack-file using this job, but also that repack is
rather expensive. This is a trade-off that is valuable to have if the
maintenance is being run automatically or in the background. Users who
truly want to optimize for space and performance (and are willing to pay
the upfront cost of a full repack) can use the 'gc' task to do so.

Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c           | 48 +++++++++++++++++++++++++++++++++++++++++-
 t/t7900-maintenance.sh |  5 +++--
 2 files changed, 50 insertions(+), 3 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index eb4b01c104..889d97afe7 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1021,19 +1021,65 @@ static int multi_pack_index_expire(void)
 	return result;
 }
 
+#define TWO_GIGABYTES (2147483647)
+#define UNSET_BATCH_SIZE ((unsigned long)-1)
+
+static off_t get_auto_pack_size(void)
+{
+	/*
+	 * The "auto" value is special: we optimize for
+	 * one large pack-file (i.e. from a clone) and
+	 * expect the rest to be small and they can be
+	 * repacked quickly.
+	 *
+	 * The strategy we select here is to select a
+	 * size that is one more than the second largest
+	 * pack-file. This ensures that we will repack
+	 * at least two packs if there are three or more
+	 * packs.
+	 */
+	off_t max_size = 0;
+	off_t second_largest_size = 0;
+	off_t result_size;
+	struct packed_git *p;
+	struct repository *r = the_repository;
+
+	reprepare_packed_git(r);
+	for (p = get_all_packs(r); p; p = p->next) {
+		if (p->pack_size > max_size) {
+			second_largest_size = max_size;
+			max_size = p->pack_size;
+		} else if (p->pack_size > second_largest_size)
+			second_largest_size = p->pack_size;
+	}
+
+	result_size = second_largest_size + 1;
+
+	/* But limit ourselves to a batch size of 2g */
+	if (result_size > TWO_GIGABYTES)
+		result_size = TWO_GIGABYTES;
+
+	return result_size;
+}
+
 static int multi_pack_index_repack(void)
 {
 	int result;
 	struct argv_array cmd = ARGV_ARRAY_INIT;
+	struct strbuf batch_arg = STRBUF_INIT;
+
 	argv_array_pushl(&cmd, "multi-pack-index", "repack", NULL);
 
 	if (opts.quiet)
 		argv_array_push(&cmd, "--no-progress");
 
-	argv_array_push(&cmd, "--batch-size=0");
+	strbuf_addf(&batch_arg, "--batch-size=%"PRIuMAX,
+		    (uintmax_t)get_auto_pack_size());
+	argv_array_push(&cmd, batch_arg.buf);
 
 	close_object_store(the_repository->objects);
 	result = run_command_v_opt(cmd.argv, RUN_GIT_CMD);
+	strbuf_release(&batch_arg);
 
 	if (result && multi_pack_index_verify()) {
 		warning(_("multi-pack-index verify failed after repack"));
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 3ec813979a..ab5c961eb9 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -134,10 +134,11 @@ test_expect_success 'incremental-repack task' '
 	test_line_count = 4 packs-between &&
 
 	# the job deletes the two old packs, and does not write
-	# a new one because only one pack remains.
+	# a new one because the batch size is not high enough to
+	# pack the largest pack-file.
 	git maintenance run --task=incremental-repack &&
 	ls .git/objects/pack/*.pack >packs-after &&
-	test_line_count = 1 packs-after
+	test_line_count = 2 packs-after
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[relevance 3%]

* Re: Pushing tag from a partial clone
  2020-07-20 13:47 13%   ` Son Luong Ngoc
@ 2020-07-20 17:54  0%     ` Jonathan Tan
  0 siblings, 0 replies; 122+ results
From: Jonathan Tan @ 2020-07-20 17:54 UTC (permalink / raw)
  To: sluongng; +Cc: stolee, git, jonathantanmy

> I just freshly compiled from 'next' branch:
> 
> 	> git version
> 	git version 2.28.0.rc1.139.gd6b33fda9d
> 
> And the problem is still occurring:
> 	> mkdir scalar
> 	> cd scalar
> 	> git init
> 	Initialized empty Git repository in /Users/sluongngoc/work/booking/core/scalar/.git/
> 	# use my own fork here so that i have push permission
> 	> git remote add origin git@github.com:sluongng/scalar.git
> 	> git sparse-checkout init --cone
> 	> git fetch --filter=tree:0 --no-tags --prune origin 4ba6c1c090e6e5a413e3ac2fc094205bd78f761e
> 	remote: Enumerating objects: 2553, done.
> 	remote: Total 2553 (delta 0), reused 0 (delta 0), pack-reused 2553
> 	Receiving objects: 100% (2553/2553), 957.85 KiB | 1.06 MiB/s, done.
> 	Resolving deltas: 100% (74/74), done.
> 	From github.com:sluongng/scalar
> 	 * branch            4ba6c1c090e6e5a413e3ac2fc094205bd78f761e -> FETCH_HEAD
> 	> git tag -a test-tag -m 'test tag message' 4ba6c1c090e6e5a413e3ac2fc094205bd78f761e
> 	> git push origin refs/tags/test-tag:refs/tags/test-tag
> 	...<download start>

Thanks for the reproduction steps. Is 4ba6c1c advertised as a ref by the
remote? If not, what is probably happening is that the client doesn't
realize that the server already has 4ba6c1c, so the client needs to
fetch 4ba6c1c's objects to send it to the server.
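
One way to answer the question above is to compare the commit against
the refs the remote advertises. This is only a sketch: the helper name
is_advertised is hypothetical, and it assumes input in the format
printed by 'git ls-remote' (object id, a tab, then the ref name).

```shell
# Hypothetical helper: succeed if object id $1 appears as the first field
# of ls-remote style output read from stdin.
is_advertised() {
	oid=$1
	while read -r advertised refname
	do
		if [ "$advertised" = "$oid" ]
		then
			return 0
		fi
	done
	return 1
}

# Possible usage (needs access to the remote):
#   git ls-remote origin |
#   is_advertised 4ba6c1c090e6e5a413e3ac2fc094205bd78f761e &&
#   echo "advertised"
```

If the helper fails, no ref on the remote points directly at that commit,
which would be consistent with the explanation above.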

I am planning to see if I can add batch prefetching to pack-objects to
reduce the severity of similar situations (just one batch prefetch instead
of many one-by-one fetches), although that would work better with a blob
filter instead of a tree filter.

^ permalink raw reply	[relevance 0%]

* Re: Pushing tag from a partial clone
  2020-07-20 12:18  0% ` Derrick Stolee
@ 2020-07-20 13:47 13%   ` Son Luong Ngoc
  2020-07-20 17:54  0%     ` Jonathan Tan
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2020-07-20 13:47 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Jonathan Tan

Hi,

> On Jul 20, 2020, at 14:18, Derrick Stolee <stolee@gmail.com> wrote:
> 
> On 7/20/2020 7:44 AM, Son Luong Ngoc wrote:
>> Hi folks,
>> 
>> At $day_job, we are trying to push tags to a repo from a partial clone copy.
>> However it seems like this push would require the partial clone copy to download more objects?
>> Is this intended?
>> 
>> Reproduce:
>> 
>> 	mkdir repo && cd repo
>> 	git init
>> 	git remote add origin git@domain.com:path/repo.git
>> 	git fetch --filter=tree:0 --no-tags --prune origin <commit-id>
>> 	git sparse-checkout init --cone
>> 	git checkout --force <commit-id>
>> 	git tag -a sluongng-test -m "Test push from partial clone"
>> 	git push HEAD:refs/tags/sluongng-test
>> 	<git starts to download objects>
>> 
>> Ideally we would like to be able to push tag from a shallow + partial clone repo without 
>> having to download extra objects if possible.
>> We would like to keep the required repo to the absolute minimum.
>> 	git fetch --depth 1 --filter=tree:0 --no-tags --prune origin <commit-id>
>> 
>> Creating and pushing tags should not require local repo to have trees/blobs in it?
>> 
>> Git version: 2.27.0
> 
> Could you try this again with 2.28.0-rc1? I think Jonathan
> Tan added the "no-fetch" flag in more places since 2.27.0,
> and this might already be fixed.

I just freshly compiled from 'next' branch:

	> git version
	git version 2.28.0.rc1.139.gd6b33fda9d

And the problem is still occurring:
	> mkdir scalar
	> cd scalar
	> git init
	Initialized empty Git repository in /Users/sluongngoc/work/booking/core/scalar/.git/
	# use my own fork here so that i have push permission
	> git remote add origin git@github.com:sluongng/scalar.git
	> git sparse-checkout init --cone
	> git fetch --filter=tree:0 --no-tags --prune origin 4ba6c1c090e6e5a413e3ac2fc094205bd78f761e
	remote: Enumerating objects: 2553, done.
	remote: Total 2553 (delta 0), reused 0 (delta 0), pack-reused 2553
	Receiving objects: 100% (2553/2553), 957.85 KiB | 1.06 MiB/s, done.
	Resolving deltas: 100% (74/74), done.
	From github.com:sluongng/scalar
	 * branch            4ba6c1c090e6e5a413e3ac2fc094205bd78f761e -> FETCH_HEAD
	> git tag -a test-tag -m 'test tag message' 4ba6c1c090e6e5a413e3ac2fc094205bd78f761e
	> git push origin refs/tags/test-tag:refs/tags/test-tag
	...<download start>

> 
> Thanks,
> -Stolee
> 

Thanks,
Son Luong.

^ permalink raw reply	[relevance 13%]

* Re: Pushing tag from a partial clone
  2020-07-20 11:44 13% Pushing tag from a partial clone Son Luong Ngoc
@ 2020-07-20 12:18  0% ` Derrick Stolee
  2020-07-20 13:47 13%   ` Son Luong Ngoc
  0 siblings, 1 reply; 122+ results
From: Derrick Stolee @ 2020-07-20 12:18 UTC (permalink / raw)
  To: Son Luong Ngoc, git, Jonathan Tan

On 7/20/2020 7:44 AM, Son Luong Ngoc wrote:
> Hi folks,
> 
> At $day_job, we are trying to push tags to a repo from a partial clone copy.
> However it seems like this push would require the partial clone copy to download more objects?
> Is this intended?
> 
> Reproduce:
> 
> 	mkdir repo && cd repo
> 	git init
> 	git remote add origin git@domain.com:path/repo.git
> 	git fetch --filter=tree:0 --no-tags --prune origin <commit-id>
> 	git sparse-checkout init --cone
> 	git checkout --force <commit-id>
> 	git tag -a sluongng-test -m "Test push from partial clone"
> 	git push HEAD:refs/tags/sluongng-test
> 	<git starts to download objects>
> 
> Ideally we would like to be able to push tag from a shallow + partial clone repo without 
> having to download extra objects if possible.
> We would like to keep the required repo to the absolute minimum.
> 	git fetch --depth 1 --filter=tree:0 --no-tags --prune origin <commit-id>
> 
> Creating and pushing tags should not require local repo to have trees/blobs in it?
> 
> Git version: 2.27.0

Could you try this again with 2.28.0-rc1? I think Jonathan
Tan added the "no-fetch" flag in more places since 2.27.0,
and this might already be fixed.

Thanks,
-Stolee


^ permalink raw reply	[relevance 0%]

* Pushing tag from a partial clone
@ 2020-07-20 11:44 13% Son Luong Ngoc
  2020-07-20 12:18  0% ` Derrick Stolee
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2020-07-20 11:44 UTC (permalink / raw)
  To: git

Hi folks,

At $day_job, we are trying to push tags to a repo from a partial clone copy.
However it seems like this push would require the partial clone copy to download more objects?
Is this intended?

Reproduce:

	mkdir repo && cd repo
	git init
	git remote add origin git@domain.com:path/repo.git
	git fetch --filter=tree:0 --no-tags --prune origin <commit-id>
	git sparse-checkout init --cone
	git checkout --force <commit-id>
	git tag -a sluongng-test -m "Test push from partial clone"
	git push HEAD:refs/tags/sluongng-test
	<git starts to download objects>

Ideally we would like to be able to push tag from a shallow + partial clone repo without 
having to download extra objects if possible.
We would like to keep the required repo to the absolute minimum.
	git fetch --depth 1 --filter=tree:0 --no-tags --prune origin <commit-id>

Creating and pushing tags should not require local repo to have trees/blobs in it?

Git version: 2.27.0

Cheers,
Son Luong.

^ permalink raw reply	[relevance 13%]

* Re: [PATCH 00/21] Maintenance builtin, allowing 'gc --auto' customization
@ 2020-07-13  6:18  6% Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-07-13  6:18 UTC (permalink / raw)
  To: gitgitgadget
  Cc: Johannes.Schindelin, congdanhqx, derrickstolee, git, jrnieder,
	peff, phillip.wood123, sandals, steadmon

Hi Derrick,

> This is a second attempt at redesigning Git's repository maintenance
> patterns. The first attempt [1] included a way to run jobs in the background
> using a long-lived process; that idea was rejected and is not included in
> this series. A future series will use the OS to handle scheduling tasks.
>
> [1]
> https://lore.kernel.org/git/pull.597.git.1585946894.gitgitgadget@gmail.com/
>
> As mentioned before, git gc already plays the role of maintaining Git
> repositories. It has accumulated several smaller pieces in its long history,
> including:
>
>  1. Repacking all reachable objects into one pack-file (and deleting
>     unreachable objects).
>  2. Packing refs.
>  3. Expiring reflogs.
>  4. Clearing rerere logs.
>  5. Updating the commit-graph file.

It's worth mentioning 'git worktree prune' as well.

>
> While expiring reflogs, clearing rerere logs, and deleting unreachable
> objects are suitable under the guise of "garbage collection", packing refs
> and updating the commit-graph file are not as obviously fitting. Further,
> these operations are "all or nothing" in that they rewrite almost all
> repository data, which does not perform well at extremely large scales.
> These operations can also be disruptive to foreground Git commands when git
> gc --auto triggers during routine use.
>
> This series does not intend to change what git gc does, but instead create
> new choices for automatic maintenance activities, of which git gc remains
> the only one enabled by default.
>
> The new maintenance tasks are:
>
>  * 'commit-graph' : write and verify a single layer of an incremental
>    commit-graph.
>  * 'loose-objects' : prune packed loose objects, then create a new pack from
>    a batch of loose objects.
>  * 'pack-files' : expire redundant packs from the multi-pack-index, then
>    repack using the multi-pack-index's incremental repack strategy.
>  * 'fetch' : fetch from each remote, storing the refs in 'refs/hidden/<remote>/'.

As some of the previous discussions [1] have raised, I think 'prefetch' would
communicate the refs' purpose better than just 'hidden'.
In fact, I would suggest naming the task 'prefetch' instead, just to avoid
potential confusion between 'git fetch' and 'git maintenance fetch'.

[1]: https://lore.kernel.org/git/xmqqeet1y8wy.fsf@gitster.c.googlers.com/

>
> These tasks are all disabled by default, but can be enabled with config
> options or run explicitly using "git maintenance run --task=<task>". There are
> additional config options to allow customizing the conditions for which the
> tasks run during the '--auto' option. ('fetch' will never run with the
> '--auto' option.)
>
>  Because 'gc' is implemented as a maintenance task, the most dramatic change
> of this series is to convert the 'git gc --auto' calls into 'git maintenance
> run --auto' calls at the end of some Git commands. By default, the only
> change is that 'git gc --auto' will be run below an additional 'git
> maintenance' process.
>
> The 'git maintenance' builtin has a 'run' subcommand so it can be extended
> later with subcommands that manage background maintenance, such as 'start',
> 'stop', 'pause', or 'schedule'. These are not the subject of this series, as
> it is important to focus on the maintenance activities themselves.
>
> An expert user could set up scheduled background maintenance themselves with
> the current series. I have the following crontab data set up to run
> maintenance on an hourly basis:
>
> 0 * * * * git -C /<path-to-repo> maintenance run --no-quiet >>/<path-to-repo>/.git/maintenance.log

Perhaps the logging should be included inside the maintenance command instead
of relying on the append here?
Given that we have 'gc.log', I would imagine 'maintenance.log' is not
too far-fetched?

>
> My config includes all tasks except the 'gc' task. The hourly run is
> over-aggressive, but is sufficient for testing. I'll replace it with daily
> when I feel satisfied.
>
> Hopefully this direction is seen as a positive one. My goal was to add more
> options for expert users, along with the flexibility to create background
> maintenance via the OS in a later series.
>
> OUTLINE
> =======
>
> Patches 1-4 remove some references to the_repository in builtin/gc.c before
> we start depending on code in that builtin.
>
> Patches 5-7 create the 'git maintenance run' builtin and subcommand as a
> simple shim over 'git gc' and replaces calls to 'git gc --auto' from other
> commands.
>
> Patches 8-15 create new maintenance tasks. These are the same tasks sent in
> the previous RFC.
>
> Patches 16-21 create more customization through config and perform other
> polish items.
>
> FUTURE WORK
> ===========
>
>  * Add 'start', 'stop', and 'schedule' subcommands to initialize the
>    commands run in the background.
>
>
>  * Split the 'gc' builtin into smaller maintenance tasks that are enabled by
>    default, but might have different '--auto' conditions and more config
>    options.
>
>
>  * Replace config like 'gc.writeCommitGraph' and 'fetch.writeCommitGraph'
>    with use of the 'commit-graph' task.
>
>
>
> Thanks, -Stolee

Thanks,
Son Luong.

^ permalink raw reply	[relevance 6%]

* Re: [PATCH 00/21] Maintenance builtin, allowing 'gc --auto' customization
  @ 2020-07-10 19:30 11%             ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-07-10 19:30 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: Derrick Stolee, Jeff King, Derrick Stolee via GitGitGadget, git,
	Johannes.Schindelin, sandals, steadmon, jrnieder, congdanhqx,
	phillip.wood123, Derrick Stolee



> On Jul 10, 2020, at 20:46, Emily Shaffer <emilyshaffer@google.com> wrote:
> 
> On Thu, Jul 09, 2020 at 07:45:47PM -0400, Derrick Stolee wrote:
>> 
>> On 7/9/2020 7:16 PM, Jeff King wrote:
>>> On Thu, Jul 09, 2020 at 08:43:48AM -0400, Derrick Stolee wrote:
>>> 
>>>>>> Is it infeasible to ask for 'git maintenance' to learn something like
>>>>>> '--on /<path-to-repo> --on /<path-to-second-repo>'? Or better yet, some
>>>>>> config like "maintenance.targetRepo = /<path-to-repo>"?
>>>> 
>>>> Sorry that I missed this comment on my first reply.
>>>> 
>>>> The intention is that this cron entry will be simpler after I follow up
>>>> with the "background" part of maintenance. The idea is to use global
>>>> or system config to register a list of repositories that want background
>>>> maintenance and have cron execute something like "git maintenance run --all-repos"
>>>> to spawn "git -C <repo> maintenance run --scheduled" for all repos in
>>>> the config.
>>>> 
>>>> For now, this manual setup does end up a bit cluttered if you have a
>>>> lot of repos to maintain.
>>> 
>>> I think it might be useful to have a general command to run a subcommand
>>> in a bunch of repositories. Something like:
>>> 
>>>  git for-each-repo --recurse /path/to/repos git maintenance ...
>>> 
>>> which would root around in /path/to/repos for any git-dirs and run "git
>>> --git-dir=$GIT_DIR maintenance ..." on each of them.
>>> 
>>> And/or:
>>> 
>>>  git for-each-repo --config maintenance.repos git maintenance ...
>>> 
>>> which would pull the set of repos from the named config variable instead
>>> of looking around the filesystem.
>> 
>> Yes! This! That's a good way to make something generic that solves
>> the problem at hand, but might also have other applications! Most
>> excellent.
> 
> I'm glad I wasn't the only one super geeked when I read this idea. I'd
> use the heck out of this in my .bashrc too. Sounds awesome. I actually
> had a short-lived fling last year with a script to summarize my
> uncommitted changes in all repos at the beginning of every session
> (dropped because it became one more thing to gloss over) and could have
> really used this command.

I was planning to build a CLI tool that helps manage maintenance of multiple
repos, like what was just described here.
My experience with my poor-man-scalar [1] bash script is that, with multiple
repositories, the process count can get out of control quite quickly, and there
are probably other issues that I have not thought of or encountered...

There is definitely a need to keep all the repos updated with pre-fetch 
and updated commit-graph, while staying compact / garbage free.
Having this in Git does simplify a lot of daily operations for end users.
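
To make the process-count concern concrete, here is a minimal sketch of
running one command per repository strictly one at a time. The helper
name for_each_repo is hypothetical, and 'maintenance.repos' is only the
config name floated earlier in this thread, not an existing setting.

```shell
# Hypothetical helper: run "$@ <repo>" once per repository path read from
# stdin, sequentially, so at most one child process runs at a time.
for_each_repo() {
	while read -r repo
	do
		# Redirect stdin so the command cannot consume the repo list.
		"$@" "$repo" </dev/null || echo "failed: $repo" >&2
	done
}

# Possible usage, assuming the repo list lives in a multi-valued config:
#   git config --global --get-all maintenance.repos |
#   for_each_repo sh -c 'git -C "$0" maintenance run'
```

Running the repositories sequentially trades total run time for a
bounded number of processes; a failure in one repository is reported and
the loop moves on to the next.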

> 
>> 
>>> You could use either as a one-liner in the crontab (depending on which
>>> is easier with your repo layout).
>> 
>> The hope is that we can have such a clean layout. I'm particularly
>> fond of the config option because users may want to opt-in to
>> background maintenance only on some repos, even if they put them
>> in a consistent location.
>> 
>> In the _far_ future, we might even want to add a repo to this
>> "maintenance.repos" list during 'git init' and 'git clone' so
>> this is automatic. It then becomes opt-out at that point, which
>> is why I say the _far, far_ future.
> 
> Oh, I like this idea a lot. Then I can do something silly like
> 
>  alias reproclone="git clone --no-maintenance"
> 
> and get the benefits on everything else that I plan to be using
> frequently.

This is starting to remind me of automatic updates in some popular operating
systems, where the download/install/cleanup of updates for multiple software
components is managed by a single tool.

I wonder if this is the path git should take in the 'new world' that Junio mentioned. [2]

But I am also super geeked reading this. :)

> 
> - Emily

Regards,
Son Luong.

[1]: https://github.com/sluongng/git-care
[2]: https://lore.kernel.org/git/xmqqmu48y7rw.fsf@gitster.c.googlers.com/

^ permalink raw reply	[relevance 11%]

* [PATCH 15/21] maintenance: auto-size pack-files batch
  @ 2020-07-07 14:21  3% ` Derrick Stolee via GitGitGadget
      2 siblings, 0 replies; 122+ results
From: Derrick Stolee via GitGitGadget @ 2020-07-07 14:21 UTC (permalink / raw)
  To: git
  Cc: Johannes.Schindelin, sandals, steadmon, jrnieder, peff,
	congdanhqx, phillip.wood123, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When repacking during the 'pack-files' job, we use the --batch-size
option in 'git multi-pack-index repack'. The initial setting used
--batch-size=0 to repack everything into a single pack-file. This is not
sustainable for a large repository. The amount of work required is also
likely to use too many system resources for a background job.

Update the 'pack-files' maintenance task by dynamically computing a
--batch-size option based on the current pack-file structure.

The dynamic default size is computed with this idea in mind for a client
repository that was cloned from a very large remote: there is likely one
"big" pack-file that was created at clone time. Thus, do not try
repacking it as it is likely packed efficiently by the server.

Instead, we select the second-largest pack-file, and create a batch size
that is one larger than that pack-file. If there are three or more
pack-files, then this guarantees that at least two will be combined into
a new pack-file.

Of course, this means that the second-largest pack-file size is likely
to grow over time and may eventually surpass the initially-cloned
pack-file. Recall that the pack-file batch is selected in a greedy
manner: the packs are considered from oldest to newest and are selected
if they have size smaller than the batch size until the total selected
size is larger than the batch size. Thus, that oldest "clone" pack will
be first to repack after the new data creates a pack larger than that.

We also want to place some limits on how large these pack-files become,
in order to bound the amount of time spent repacking. A maximum
batch-size of two gigabytes means that large repositories will never be
packed into a single pack-file using this job, but also that repack is
rather expensive. This is a trade-off that is valuable to have if the
maintenance is being run automatically or in the background. Users who
truly want to optimize for space and performance (and are willing to pay
the upfront cost of a full repack) can use the 'gc' task to do so.

Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/gc.c           | 47 +++++++++++++++++++++++++++++++++++++++++-
 t/t7900-maintenance.sh |  5 +++--
 2 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 259b0475c0..582219156a 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1032,20 +1032,65 @@ static int multi_pack_index_expire(struct repository *r)
 	return result;
 }
 
+#define TWO_GIGABYTES (2147483647)
+#define UNSET_BATCH_SIZE ((unsigned long)-1)
+
+static off_t get_auto_pack_size(struct repository *r)
+{
+	/*
+	 * The "auto" value is special: we optimize for
+	 * one large pack-file (i.e. from a clone) and
+	 * expect the rest to be small and they can be
+	 * repacked quickly.
+	 *
+	 * The strategy we select here is to select a
+	 * size that is one more than the second largest
+	 * pack-file. This ensures that we will repack
+	 * at least two packs if there are three or more
+	 * packs.
+	 */
+	off_t max_size = 0;
+	off_t second_largest_size = 0;
+	off_t result_size;
+	struct packed_git *p;
+
+	reprepare_packed_git(r);
+	for (p = get_all_packs(r); p; p = p->next) {
+		if (p->pack_size > max_size) {
+			second_largest_size = max_size;
+			max_size = p->pack_size;
+		} else if (p->pack_size > second_largest_size)
+			second_largest_size = p->pack_size;
+	}
+
+	result_size = second_largest_size + 1;
+
+	/* But limit ourselves to a batch size of 2g */
+	if (result_size > TWO_GIGABYTES)
+		result_size = TWO_GIGABYTES;
+
+	return result_size;
+}
+
 static int multi_pack_index_repack(struct repository *r)
 {
 	int result;
 	struct argv_array cmd = ARGV_ARRAY_INIT;
+	struct strbuf batch_arg = STRBUF_INIT;
+
 	argv_array_pushl(&cmd, "-C", r->worktree,
 			 "multi-pack-index", "repack", NULL);
 
 	if (opts.quiet)
 		argv_array_push(&cmd, "--no-progress");
 
-	argv_array_push(&cmd, "--batch-size=0");
+	strbuf_addf(&batch_arg, "--batch-size=%"PRIuMAX,
+			    (uintmax_t)get_auto_pack_size(r));
+	argv_array_push(&cmd, batch_arg.buf);
 
 	close_object_store(r->objects);
 	result = run_command_v_opt(cmd.argv, RUN_GIT_CMD);
+	strbuf_release(&batch_arg);
 
 	if (result && multi_pack_index_verify(r)) {
 		warning(_("multi-pack-index verify failed after repack"));
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index a6be8456aa..43d32c131b 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -134,10 +134,11 @@ test_expect_success 'pack-files task' '
 	test_line_count = 4 packs-between &&
 
 	# the job deletes the two old packs, and does not write
-	# a new one because only one pack remains.
+	# a new one because the batch size is not high enough to
+	# pack the largest pack-file.
 	git maintenance run --task=pack-files &&
 	ls .git/objects/pack/*.pack >packs-after &&
-	test_line_count = 1 packs-after
+	test_line_count = 2 packs-after
 '
 
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[relevance 3%]

* Re: [PATCH 3/3] commit-graph: respect 'core.useBloomFilters'
@ 2020-07-01  9:58  5% Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-07-01  9:58 UTC (permalink / raw)
  To: peff; +Cc: dstolee, git, me

Hi folks,

On Tue, 30 Jun 2020 15:33:40 -0400, Jeff King wrote:

> > > It might even be worth considering whether "changed paths" needs more
> > > context (or would if we add new features in the future). On a "git
> > > commit-graph write" command-line it is perfectly clear, but would
> > > core.commitGraphChangedPaths be worth it? It's definitely more specific,
> > > but it's also way more ugly. ;)
> >
> > Here's a third option: what about 'graph.readChangedPaths'? I think that
> > Stolee and I discussed a new top-level 'graph' section, since we now
> > have a few commit-graph-related configuration variables in 'core'.
>
> Yes, I like that even better. Probably "graph" is sufficiently specific
> within Git's context, though I guess it _could_ bring to mind "git log
> --graph". So many overloaded terms. :)

I would suggest using 'commitgraph.readChangedPaths' as I was planning on
implementing the same config in [1] but never got around to it.

From an end-user perspective, not server admin, 'graph' is very much
correlated to 'git log --graph'.

Using 'commitgraph' instead of 'core' could also help us enable more config
down the line that mirrors the current options in 'git commit-graph write'.

I.e. something like 'commitgraph.writeSplit' might be desirable to tune the
behavior of 'gc.writeCommitGraph' to use '--split=replace' strategy.

---

@Taylor: Thanks a lot for implementing this.

On Tue, 30 Jun 2020 13:17:36 -0400, Taylor Blau wrote:

> We're planning on using these patches as part of a two-phase roll-out of
> changed-path Bloom filters, where the first phase conditions whether or
> not repositories *write* Bloom filters, and the second phase (controlled
> via the new 'core.useBloomFilters') controls whether repositories *read*
> their Bloom filters.

Could you elaborate a bit more on the 'two-phase roll-out' mentioned here?

I was looking for a way to verify whether a commit-graph chain has been
written with Bloom filter (and force it to rewrite if not) but there seems
to be no straightforward way?

Do we need to implement a flag in 'git commit-graph verify' to check
for Bloom filter existence?

[1]: https://github.com/gitgitgadget/git/pull/633

Regards,
Son Luong.

^ permalink raw reply	[relevance 5%]

* Re: [PATCH 04/10] sparse-checkout: allow in-tree definitions
  @ 2020-06-18  8:18  6%           ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-06-18  8:18 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Derrick Stolee, Junio C Hamano, Derrick Stolee via GitGitGadget,
	Git Mailing List, newren, Jeff King, Taylor Blau, Jonathan Nieder,
	Derrick Stolee

Hi,

On Wed, Jun 17, 2020 at 04:07:01PM -0700, Elijah Newren wrote:
> 
> Son pointed out that mercurial has a 'sparse' extension that has some
> possible ideas of things we could do here; see
> https://lore.kernel.org/git/CABPp-BGLBmWXrmPsTogyBFMgwYbHjN39oWbU=qDWroU1_fJaoQ@mail.gmail.com/
> for some further discussion.

I just want to note that you can find the latest version of FB's 'sparse'
extension here[1], and the tests for the 'profile' feature can be found here[2].

Another relevant source of reading could be Google's Narrow extension for
Mercurial[3].

Cheers,
Son Luong.

[1]: https://github.com/facebookexperimental/eden/blob/master/eden/scm/edenscm/hgext/sparse.py
[2]: https://github.com/facebookexperimental/eden/blob/master/eden/scm/tests/test-sparse-profiles.t
[3]: https://bitbucket.org/Google/narrowhg/src/cb51d673e9c5820fc3da86a67f7e74b789820b4f/tests/test-merge.t#lines-63

^ permalink raw reply	[relevance 6%]

* Sparse checkout and recorded dependencies between directories (Was: Re: [PATCH 0/2] Sparse checkout status)
  2020-06-17 17:58  5%   ` Son Luong Ngoc
@ 2020-06-17 22:36  4%     ` Elijah Newren
  0 siblings, 0 replies; 122+ results
From: Elijah Newren @ 2020-06-17 22:36 UTC (permalink / raw)
  To: Son Luong Ngoc; +Cc: Derrick Stolee, Git Mailing List

Hi Son,

On Wed, Jun 17, 2020 at 10:58 AM Son Luong Ngoc <sluongng@gmail.com> wrote:
>
> Hi Elijah,
>
> On Wed, Jun 17, 2020 at 09:48:22AM -0700, Elijah Newren wrote:
> >
> > An aside, though, since you linked to the in-tree sparse-checkout
> > definitions: When I reviewed that series, the possibility of merge
> > conflicts and not knowing what sparse-checkout should have checked out
> > when the in-tree definitions themselves were in a conflicted state
> > seemed to me to be a pretty tough sticking point.  I'm hoping someone
> > has a clever solution, but I still don't yet.  Do you?
>
> I am no clever person, but I often take great pleasure in reading up
> works of smarter people. One of which is the Google's and Facebook's Mercurial
> extension sets that they opensourced a while ago to support large repos.
>
> The test suite for FB's 'sparse' extension[1] may address your concerns?
>
> The 'sparse' extension defines the sparse checkout definition of a
> working repository. It supports '--enable-profile' which take in definition
> files ('.sparse'). These profiles are often checked into the root dir
> of the repo.
>
> [1]: https://bitbucket.org/facebook/hg-experimental/src/05ed5d06b353aca69551f3773f56a99994a1a6bf/tests/test-sparse-profiles.t#lines-115

Ooh, interesting; thanks for the link.  It provides an idea, though
I'm not completely sure how it maps to our implementation.  The test
file says that during a merge you get "unioned files".  It's not fully
clear what union means, especially when the files have both includes
and excludes.  For example, does the union of matches mean a union of
includes and an intersection of excludes?  Also, digging a bit
further, it appears mercurial requires all includes to be before all
excludes[2].  But git's pattern specification used in
.git/info/sparse-checkout (taken from .gitignore rules) allows
includes and excludes to be arbitrarily interspersed, so what is an
appropriate union in our case?  (Can we sidestep this question by
limiting the in-tree sparsity definitions to cone mode only, which
then only have includes in the form of directory names, since that'd
allow easy "unioning"?)
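
To make the cone-mode "unioning" idea concrete, here is a minimal
sketch. It assumes a hypothetical in-tree definition format with one
directory name per line (blank lines and '#' comments ignored); none of
this is an existing Git file format or API.

```python
def parse_cone_dirs(text: str) -> set:
    """Parse a hypothetical cone-mode definition: one directory per line."""
    dirs = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name = line.strip("/")
        if name:
            dirs.add(name)
    return dirs


def union_cone(ours: str, theirs: str) -> list:
    """Union two include-only definitions, dropping redundant subdirectories."""
    merged = parse_cone_dirs(ours) | parse_cone_dirs(theirs)
    # A directory is redundant when one of its parents is already included.
    return sorted(
        d for d in merged
        if not any(d.startswith(p + "/") for p in merged if p != d)
    )
```

Because there are only includes, the result does not depend on which
side is "ours", which is what makes the cone-mode restriction attractive
here.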

A little more digging suggests that mercurial also only allows sparse
definitions to be read from commits, not from the working tree[3].
That seems bad to me; it's too much of a pain for users who want to
edit and test changes.  Sure, if their first commit is bad they could
`git commit --amend` after the fact, but I don't like forcing them
through that workflow.  (This is perhaps especially true if they're
trying to fix the definition during a rebase; they shouldn't have to
commit first to get a corrected sparsity definition, especially as
that can easily mess up rebase state.)

However, although I don't like reading sparsity definition from
commits rather than the working tree, it probably did have an
advantage in that it made it easier for mercurial folks to notice the
union idea: since they only get sparsity patterns from revisions, they
are kind of forced into thinking about getting them from both parents
and then "doing a union".  Anyway, following that logic, it'd be
tempting to say that we limit the in-tree definitions to cone mode,
and then if any of the definitions have conflicts then we just load
stages 2 and 3 of the file and union them.  But...what if stages 2 and
3 also have conflict markers in them (either because of recursive
merges or the more involved rename/rename(2to1) cases)?  How do we
ensure a well defined "union" of values?
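
The "load stages 2 and 3 and union them" idea, including the awkward
case where a stage itself contains conflict markers, could look roughly
like the sketch below. It assumes the same hypothetical
one-directory-per-line format and deliberately does not pick a policy
for the nested-conflict case; it only detects it and returns None.

```python
from typing import Optional, Set

# Line prefixes Git writes for conflict markers (including diff3 style).
CONFLICT_PREFIXES = ("<<<<<<<", "|||||||", "=======", ">>>>>>>")


def has_conflict_markers(text: str) -> bool:
    """True if any line starts with a conflict marker."""
    return any(line.startswith(CONFLICT_PREFIXES) for line in text.splitlines())


def union_stages(stage2: str, stage3: str) -> Optional[Set[str]]:
    """Union the directories from both sides, or None if either is conflicted."""
    if has_conflict_markers(stage2) or has_conflict_markers(stage3):
        return None  # no well-defined union; the caller must choose a fallback

    def dirs(text: str) -> Set[str]:
        return {l.strip().strip("/") for l in text.splitlines() if l.strip()}

    return dirs(stage2) | dirs(stage3)
```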

I guess a similar question is what if users, while editing, fill the
sparse definition file with syntax errors -- and maybe even commit it.
Do we sparsify down to nothing? Expand out to everything? Ignore the
lines that don't otherwise parse and just use the rest?  Something
else?

The one other thing I noticed of interest from mercurial's sparsify
was that it apparently suffers from the same problems we used to in
git < 2.27.0: inability to update sparsity definitions when there are
any dirty changes[4].  That was a huge pain point; I'm glad we're not
stuck with that anymore.


Anyway, the mercurial link certainly provides some ideas even if it
doesn't answer all the questions.  Thanks for pointing it out.


Elijah


[2] https://fossies.org/linux/mercurial/mercurial/sparse.py#l_59
[3] https://fossies.org/linux/mercurial/mercurial/sparse.py#l_123
[4] https://fossies.org/linux/mercurial/mercurial/sparse.py#l_485
     https://fossies.org/linux/mercurial/mercurial/sparse.py#l_526

^ permalink raw reply	[relevance 4%]

* Re: [PATCH 0/2] Sparse checkout status
  2020-06-17 16:48  4% ` Elijah Newren
@ 2020-06-17 17:58  5%   ` Son Luong Ngoc
  2020-06-17 22:36  4%     ` Sparse checkout and recorded dependencies between directories (Was: Re: [PATCH 0/2] Sparse checkout status) Elijah Newren
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2020-06-17 17:58 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Johannes Schindelin via GitGitGadget, Derrick Stolee,
	Git Mailing List

Hi Elijah,

On Wed, Jun 17, 2020 at 09:48:22AM -0700, Elijah Newren wrote:
> 
> Well, there is `git sparse-checkout list`, assuming users know they
> are in a sparse-checkout, but the whole point of my suggested change
> is that they sometimes don't.

Ah, that's true.
This was added recently and often slips my mind.

> 
> This surprises me; I considered performance while writing it and kept
> it simple on that basis.  In particular:
>   * This does not cause any reading or writing of any extra files; it
> is done solely with information that is already loaded.
>   * If users aren't in a sparse-checkout, their performance overhead
> is a single if-check, which I doubt anyone can measure.
>   * If they are in a sparse-checkout, then they'd get one extra loop
> over files in the index to check the SKIP_WORKTREE bit.
> 
> In which cases would performance implications be a concern?  For a
> very simple point of reference, in a sparse-checkout of the linux
> kernel (using --cone mode and only selecting the drivers/ directory),
> I see the following timings for 'git status' in a clean checkout:
> 
> Without my change:
> [newren@tiger linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup
> 1 'git status'
> Benchmark #1: git status
>   Time (mean ± σ):      78.8 ms ±   2.8 ms    [User: 48.9 ms, System: 76.9 ms]
>   Range (min … max):    74.0 ms …  84.0 ms    38 runs
> 
> With my change:
> [newren@eenie linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup
> 1 'git status'
> Benchmark #1: git status
>   Time (mean ± σ):      79.8 ms ±   2.7 ms    [User: 49.3 ms, System: 77.7 ms]
>   Range (min … max):    74.8 ms …  84.5 ms    37 runs
> 
> I know the linux kernel is tiny compared to repos like Windows or
> Office, but the relative scaling considerations are identical: it's
> one extra loop through the cached entries checking a bit for each
> entry.  If people are worried about the "extra loop", I could find an
> existing loop to modify and just add an extra if-block in it so that
> we have the same number of loops.  (I'm doubtful that'd actually help,
> but if the concern is an extra loop, it'd certainly avoid that.)
> Anyway, if you've got more information about it being too costly, I'm
> happy to listen.  Otherwise, the overhead seems pretty small to me and
> it's only paid by those who would benefit from the information.
> 
> However, all that said, I have good news: Peff already implemented the
> flag users can use to avoid this extra output, and did so back in
> September of 2009.  It's called "--porcelain".  Automated commands
> should already be using it, and if they aren't, they are what needs
> fixing -- not the long form status output.

When I wrote my initial reaction, the idea of having more than just a
percentage reported back stuck in my mind, specifically with using the
in-tree checkout that I mentioned.

But yeah, that's something down the line to address, you are absolutely
correct that the current patch has no performance impact. Thanks for the
reminder about '--porcelain'.

> 
> I think having a 'git sparse-checkout status' would be a fine
> subcommand, and output like the above -- possibly also including other
> bits Stolee or I mentioned elsewhere in this thread -- would be cool
> and would be helpful; it'd complement what I'm doing here quite
> nicely.
> 
> But you're solving a related problem rather than the one I was
> focusing on, and you have left the issue I was focusing on
> unaddressed.  In particular, if users forgot that they sparsified in
> the first place, how are they going to know to run `git
> sparse-checkout status [--all]`?
> 
> I think having a simple line of output in `git status` would remind
> them.  With that reminder, they could today then go run 'git
> sparse-checkout list' or 'gvfs health' (as Stolee mentioned he uses
> internally) or './sparsify --info' (as I use internally) to get more
> info.  In the future we could provide additional things for them as
> well, such as your 'git sparse-checkout status'.
> 

I do concede that this point could be a separate problem set and addressed
separately in the future.

> 
> An aside, though, since you linked to the in-tree sparse-checkout
> definitions: When I reviewed that series, the possibility of merge
> conflicts and not knowing what sparse-checkout should have checked out
> when the in-tree definitions themselves were in a conflicted state
> seemed to me to be a pretty tough sticking point.  I'm hoping someone
> has a clever solution, but I still don't yet.  Do you?

I am no clever person, but I often take great pleasure in reading the
work of smarter people. One example is the set of Mercurial extensions that
Google and Facebook open-sourced a while ago to support large repos.

The test suite for FB's 'sparse' extension[1] may address your concerns?

The 'sparse' extension defines the sparse checkout definition of a
working repository. It supports '--enable-profile', which takes in definition
files ('.sparse'). These profiles are often checked into the root dir
of the repo.

> 
> Thanks,
> Elijah

Regards,
Son Luong.

[1]: https://bitbucket.org/facebook/hg-experimental/src/05ed5d06b353aca69551f3773f56a99994a1a6bf/tests/test-sparse-profiles.t#lines-115


^ permalink raw reply	[relevance 5%]

* Re: [PATCH 0/2] Sparse checkout status
  2020-06-17  7:40  5% [PATCH 0/2] Sparse checkout status Son Luong Ngoc
@ 2020-06-17 16:48  4% ` Elijah Newren
  2020-06-17 17:58  5%   ` Son Luong Ngoc
  0 siblings, 1 reply; 122+ results
From: Elijah Newren @ 2020-06-17 16:48 UTC (permalink / raw)
  To: Son Luong Ngoc
  Cc: Johannes Schindelin via GitGitGadget, Derrick Stolee,
	Git Mailing List

Hi Son,

Thanks for the feedback.

On Wed, Jun 17, 2020 at 12:40 AM Son Luong Ngoc <sluongng@gmail.com> wrote:
>
> Hi Elijah,
>
> > Some of the feedback of folks trying out sparse-checkouts at $dayjob is that
> > sparse checkouts can sometimes be disorienting; users can forget that they
> > had a sparse-checkout and then wonder where files went.
>
> I agree with this observation: that the current 'git sparse-checkout' experience
> could be a bit 'lost' for end users, who may or may not be familiar
> with git's 'arcane magic'.
>
> Currently the only way to verify what's going on is to either run
> 'tree <repo-root-dir>'
> or 'cat .git/info/sparse-checkout' (human-readable but not easy).

Well, there is `git sparse-checkout list`, assuming users know they
are in a sparse-checkout, but the whole point of my suggested change
is that they sometimes don't.

> > This series adds some output to 'git status' and modifies git-prompt slightly as an attempt
> > to help.
>
> This is a great idea but I suggest to put a config/flag to let users
> enable/disable this.
>
> Git status is often utilized in automated commands (IDE, shell prompt,
> etc...) and there may be
> performance implications down the line not being able to skip this bit
> of information out.

This surprises me; I considered performance while writing it and kept
it simple on that basis.  In particular:
  * This does not cause any reading or writing of any extra files; it
is done solely with information that is already loaded.
  * If users aren't in a sparse-checkout, their performance overhead
is a single if-check, which I doubt anyone can measure.
  * If they are in a sparse-checkout, then they'd get one extra loop
over files in the index to check the SKIP_WORKTREE bit.
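
The cost argument in the list above can be illustrated with a small
model. This is only a sketch: IndexEntry is a hypothetical stand-in for
an in-memory index entry, and reporting the percentage of tracked files
present is an assumption about what the status line would show.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexEntry:
    """Hypothetical stand-in for an already-loaded index entry."""
    path: str
    skip_worktree: bool


def percent_present(entries: List[IndexEntry]) -> Optional[int]:
    """Single pass over the index; None means 'not sparse, print nothing'."""
    if not entries:
        return None
    # The one extra loop: test a single bit per entry, no file I/O.
    skipped = sum(1 for e in entries if e.skip_worktree)
    if skipped == 0:
        return None
    return 100 * (len(entries) - skipped) // len(entries)
```

The work is linear in the number of index entries and touches nothing
outside memory, which is consistent with the timings reported below
being within noise of each other.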

In which cases would performance implications be a concern?  For a
very simple point of reference, in a sparse-checkout of the linux
kernel (using --cone mode and only selecting the drivers/ directory),
I see the following timings for 'git status' in a clean checkout:

Without my change:
[newren@tiger linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup
1 'git status'
Benchmark #1: git status
  Time (mean ± σ):      78.8 ms ±   2.8 ms    [User: 48.9 ms, System: 76.9 ms]
  Range (min … max):    74.0 ms …  84.0 ms    38 runs

With my change:
[newren@eenie linux-stable (hwmon-updates|SPARSE)]$ hyperfine --warmup
1 'git status'
Benchmark #1: git status
  Time (mean ± σ):      79.8 ms ±   2.7 ms    [User: 49.3 ms, System: 77.7 ms]
  Range (min … max):    74.8 ms …  84.5 ms    37 runs

I know the linux kernel is tiny compared to repos like Windows or
Office, but the relative scaling considerations are identical: it's
one extra loop through the cached entries checking a bit for each
entry.  If people are worried about the "extra loop", I could find an
existing loop to modify and just add an extra if-block in it so that
we have the same number of loops.  (I'm doubtful that'd actually help,
but if the concern is an extra loop, it'd certainly avoid that.)
Anyway, if you've got more information about it being too costly, I'm
happy to listen.  Otherwise, the overhead seems pretty small to me and
it's only paid by those who would benefit from the information.

However, all that said, I have good news: Peff already implemented the
flag users can use to avoid this extra output, and did so back in
September of 2009.  It's called "--porcelain".  Automated commands
should already be using it, and if they aren't, they are what needs
fixing -- not the long form status output.

> > For reference, I suspect that in repositories that are large enough that
> > people always use sparse-checkouts (e.g. windows or office repos), that this
> > isn't a problem. But when the repository is approximately
> > linux-kernel-sized, then it is reasonable for some folks to have a full
> > checkout. sparse-checkouts, however, can provide various build system and
> > IDE performance improvements, so we have a split of users who have
> > sparse-checkouts and those who have full checkouts. It's easy for users who
> > are bridging in between the two worlds or just trying out sparse-checkouts
> > for the first time to get confused.
>
> One of our users noted that the experience is improved when combining
> 'git worktree' with sparse-checkout.
> That way you get the correct sparsity for the topic that you are working on.
>
> In a way, the current sparse-checkout experience is similar to a user
> running 'git checkout <rev>' directly
> instead of checking out a branch.
> It does not feel tangible and reproducible.
>
> I was hoping that these concerns will be addressed once the In-Tree
> Sparse-Checkout Definition RFC[1] patch landed.
> We should then be able to print out which Definition File(s) (we often
> call it manifests) were used,
> and ideally, only the top most file(s) in the inheritance tree.
>
> So the ideal experience, in my mind, is something of this sort:
>
>     git sc init --cone
>
>     # assuming a inherited from b and c
>     git sc add --in-tree manifest-dir/module-a.manifest
>     git sc add --in-tree manifest-dir/module-d.manifest
>
>     git sc status
>         Your sparse checkout includes following definition(s):
>         (1) manifest-dir/module-a.manifest
>         (2) manifest-dir/module-d.manifest
>
>     git sc status --all
>         Your sparse checkout includes following definition(s):
>         (1) manifest-dir/module-a.manifest
>         (2) manifest-dir/module-d.manifest
>         (3) manifest-dir/module-b.manifest (included by 1)
>         (4) manifest-dir/module-c.manifest (included by 1)

I think having a 'git sparse-checkout status' would be a fine
subcommand, and output like the above -- possibly also including other
bits Stolee or I mentioned elsewhere in this thread -- would be cool
and would be helpful; it'd complement what I'm doing here quite
nicely.

But you're solving a related problem rather than the one I was
focusing on, and you have left the issue I was focusing on
unaddressed.  In particular, if users forgot that they sparsified in
the first place, how are they going to know to run `git
sparse-checkout status [--all]`?

I think having a simple line of output in `git status` would remind
them.  With that reminder, they could today then go run 'git
sparse-checkout list' or 'gvfs health' (as Stolee mentioned he uses
internally) or './sparsify --info' (as I use internally) to get more
info.  In the future we could provide additional things for them as
well, such as your 'git sparse-checkout status'.


An aside, though, since you linked to the in-tree sparse-checkout
definitions: When I reviewed that series, the possibility of merge
conflicts and not knowing what sparse-checkout should have checked out
when the in-tree definitions themselves were in a conflicted state
seemed to me to be a pretty tough sticking point.  I'm hoping someone
has a clever solution, but I still don't yet.  Do you?

Thanks,
Elijah

^ permalink raw reply	[relevance 4%]

* Re: [PATCH 0/2] Sparse checkout status
@ 2020-06-17  7:40  5% Son Luong Ngoc
  2020-06-17 16:48  4% ` Elijah Newren
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2020-06-17  7:40 UTC (permalink / raw)
  To: gitgitgadget; +Cc: dstolee, git, newren

Hi Elijah,

> Some of the feedback of folks trying out sparse-checkouts at $dayjob is that
> sparse checkouts can sometimes be disorienting; users can forget that they
> had a sparse-checkout and then wonder where files went.

I agree with this observation: that the current 'git sparse-checkout' experience
could be a bit 'lost' for end users, who may or may not be familiar
with git's 'arcane magic'.

Currently the only way to verify what's going on is to either run
'tree <repo-root-dir>'
or 'cat .git/info/sparse-checkout' (human-readable but not easy).

> This series adds some output to 'git status' and modifies git-prompt slightly as an attempt
> to help.

This is a great idea but I suggest to put a config/flag to let users
enable/disable this.

Git status is often utilized in automated commands (IDE, shell prompt,
etc...) and there may be
performance implications down the line if this bit of information
cannot be skipped.

> For reference, I suspect that in repositories that are large enough that
> people always use sparse-checkouts (e.g. windows or office repos), that this
> isn't a problem. But when the repository is approximately
> linux-kernel-sized, then it is reasonable for some folks to have a full
> checkout. sparse-checkouts, however, can provide various build system and
> IDE performance improvements, so we have a split of users who have
> sparse-checkouts and those who have full checkouts. It's easy for users who
> are bridging in between the two worlds or just trying out sparse-checkouts
> for the first time to get confused.

One of our users noted that the experience is improved when combining
'git worktree' with sparse-checkout.
That way you get the correct sparsity for the topic that you are working on.

In a way, the current sparse-checkout experience is similar to a user
running 'git checkout <rev>' directly
instead of checking out a branch.
It does not feel tangible and reproducible.

I was hoping that these concerns will be addressed once the In-Tree
Sparse-Checkout Definition RFC[1] patch landed.
We should then be able to print out which Definition File(s) (we often
call it manifests) were used,
and ideally, only the top most file(s) in the inheritance tree.

So the ideal experience, in my mind, is something of this sort:

    git sc init --cone

    # assuming module-a inherits from module-b and module-c
    git sc add --in-tree manifest-dir/module-a.manifest
    git sc add --in-tree manifest-dir/module-d.manifest

    git sc status
        Your sparse checkout includes following definition(s):
        (1) manifest-dir/module-a.manifest
        (2) manifest-dir/module-d.manifest

    git sc status --all
        Your sparse checkout includes following definition(s):
        (1) manifest-dir/module-a.manifest
        (2) manifest-dir/module-d.manifest
        (3) manifest-dir/module-b.manifest (included by 1)
        (4) manifest-dir/module-c.manifest (included by 1)
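
To show how the '--all' listing above could be derived, here is a small
sketch. Everything in it is hypothetical: 'git sc status' does not
exist, and the includes mapping stands in for whatever the in-tree
definition files would declare.

```python
def resolve(selected, includes):
    """Return (manifest, included_by) pairs: selected first, then includes.

    included_by is the 1-based position of the manifest that pulled the
    entry in, or None for manifests the user selected directly.
    """
    order = [(name, None) for name in selected]
    seen = set(selected)
    i = 0
    while i < len(order):
        name, _ = order[i]
        for child in includes.get(name, []):
            if child not in seen:  # also guards against include cycles
                seen.add(child)
                order.append((child, i + 1))
        i += 1
    return order


def format_status(selected, includes, show_all=False):
    """Render the listing; inherited manifests only appear with show_all."""
    lines = ["Your sparse checkout includes following definition(s):"]
    for n, (name, parent) in enumerate(resolve(selected, includes), start=1):
        if parent is None:
            lines.append(f"({n}) {name}")
        elif show_all:
            lines.append(f"({n}) {name} (included by {parent})")
    return "\n".join(lines)
```

Listing the directly selected manifests first keeps their numbering
stable between the short and the '--all' output.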

I have a feeling that the current file skipped percentage prompt is
not that useful or actionable to end-users,
and they would still end up feeling lost/disoriented at the end.

Thanks,
Son Luong.

[1]: https://lore.kernel.org/git/pull.627.git.1588857462.gitgitgadget@gmail.com/T/#u

^ permalink raw reply	[relevance 5%]

* Re: [RFC PATCH v1 1/6] stash: mark `i_tree' in reset_tree() const
@ 2020-06-13 22:03  6% Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-06-13 22:03 UTC (permalink / raw)
  To: alban.gruin; +Cc: Johannes.Schindelin, git, gitster, t.gummerer

Hi Alban,

Thanks for working on this.

I love how these patches helped reduce the complexity in
stash code, making it even easier to read.

On Tue, May 5, 2020 at 12:56 PM Alban Gruin <alban.gruin@gmail.com> wrote:

> As reset_tree() does not change the value pointed by `i_tree', and that
> it will be provided with `the_hash_algo->empty_tree' which is a
> constant, it is changed to be a pointer to a constant.

Small nit here:
This commit message took me three re-reads to understand that the 'it'(s) here
refer to `i_tree` instead of `reset_tree()`.

Perhaps it would be better to rephrase it a little:

  In reset_tree(), the value pointed by `i_tree' is not modified. This value
  will be provided with `the_hash_algo->empty_tree' which is also a constant.

  Changed 'i_tree' to be a pointer to a constant.

Just a suggestion :-/
>
> Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
> ---
>  builtin/stash.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/stash.c b/builtin/stash.c
> index 0c52a3b849..9baa8b379e 100644
> --- a/builtin/stash.c
> +++ b/builtin/stash.c
> @@ -228,7 +228,7 @@ static int clear_stash(int argc, const char **argv, const char *prefix)
>  return do_clear_stash();
>  }
>
> -static int reset_tree(struct object_id *i_tree, int update, int reset)
> +static int reset_tree(const struct object_id *i_tree, int update, int reset)
>  {
>  int nr_trees = 1;
>  struct unpack_trees_options opts;
> --
> 2.26.2

^ permalink raw reply	[relevance 6%]

* Re: [ANNOUNCE] Git v2.27.0-rc1
@ 2020-05-20 20:57  6% Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-05-20 20:57 UTC (permalink / raw)
  To: gitster; +Cc: git-packagers, git, linux-kernel, shouryashukla.oo, tmz

Hi folks,

> Shourya Shukla (4):
>       submodule--helper.c: Rename 'cb_foreach' to 'foreach_cb'
>       gitfaq: files in .gitignore are tracked
>       gitfaq: fetching and pulling a repository
>       submodule: port subcommand 'set-url' from shell to C

Could you please review the minor fix in
https://public-inbox.org/git/20200519045301.GY24220@pobox.com/
It helps the backward compatibility for packaging on CentOS6.

Thanks,
Son Luong.

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v7 1/4] gitfaq: files in .gitignore are tracked
  2020-05-15 17:32  6% [PATCH v7 1/4] gitfaq: files in .gitignore are tracked Son Luong Ngoc
@ 2020-05-19  4:53  4% ` Todd Zullinger
  0 siblings, 0 replies; 122+ results
From: Todd Zullinger @ 2020-05-19  4:53 UTC (permalink / raw)
  To: Son Luong Ngoc; +Cc: shouryashukla.oo, git, gitster, newren, sandals

Hi,

Son Luong Ngoc wrote:
> Hey folks,
> 
>> Add issue in 'Common Issues' section which addresses the problem of
>> Git tracking files/paths mentioned in '.gitignore'.
>>
>> Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
>> ---
>>  Documentation/gitfaq.txt | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
>> index 1cf83df118..11d9bac859 100644
>> --- a/Documentation/gitfaq.txt
>> +++ b/Documentation/gitfaq.txt
>> @@ -223,6 +223,16 @@ a file checked into the repository which is a template or set of defaults which
>>  can then be copied alongside and modified as appropriate.  This second, modified
>>  file is usually ignored to prevent accidentally committing it.
>>
>> +[[files-in-.gitignore-are-tracked]]
> 
> This does not work for older xmlto(centos6) for whatever reason.
> ```
> # make doc
> ...
> # xmlto -m manpage-normal.xsl  -m manpage-bold-literal.xsl -m
> manpage-base-url.xsl man gitfaq.xml
> xmlto: /<git-dir>/Documentation/gitfaq.xml does not validate (status 3)
> xmlto: Fix document syntax or use --skip-validation option
> /<git-dir>/Documentation/gitfaq.xml:3: element refentry: validity
> error : Element refentry content does not follow the DTD, expecting
> (beginpage? , indexterm* , refentryinfo? , refmeta? , (remark | link |
> olink | ulink)* , refnamediv+ , refsynopsisdiv? , (refsect1+ |
> refsection+)), got (refmeta refnamediv refsynopsisdiv refsect1
> refsect1 refsect1 refsect1 variablelist refsect1 refsect1 )
> ```

I ran into this as well.  I _think_ this is an asciidoc
issue (but it could be further up the doc tools chain).  On
CentOS 6, the asciidoc version is 8.4.5.  From earlier in
the make output:

    make[1]: Leaving directory `/builddir/build/BUILD/git-2.27.0.rc0'
    rm -f gitfaq.html+ gitfaq.html && \
	    asciidoc  -f asciidoc.conf -amanversion=2.27.0.rc0 -amanmanual='Git Manual' -amansource='Git' -b xhtml11 -d manpage -o gitfaq.html+ gitfaq.txt && \
	    mv gitfaq.html+ gitfaq.html
    WARNING: gitfaq.txt: line 245: missing [[files-in-.gitignore-are-tracked]] section

Dropping the "." from the anchor name works around the
failure, which seems like a reasonable thing to do.  With
the age of asciidoc and CentOS 6 approaching end-of-life
this November, we wouldn't want to spend too much effort to
work around issues there.  But this seems like an easy way
to allow the documentation to continue to build on such old
platforms.
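
For anyone who wants to catch this class of problem before it reaches
an old toolchain, a check along these lines might help. It is only a
sketch: it assumes, based solely on the failure above, that a period
inside a "[[...]]" anchor is what trips asciidoc-8.4.5, and the function
names are made up for illustration.

```python
import re

# An anchor on a line of its own, e.g. [[files-in-.gitignore-are-tracked]]
ANCHOR_LINE = re.compile(r"^\[\[([^\]]+)\]\]\s*$")


def dotted_anchors(text):
    """Return (line_number, anchor) for anchors that contain a period."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = ANCHOR_LINE.match(line)
        if match and "." in match.group(1):
            found.append((number, match.group(1)))
    return found


def strip_anchor_dots(text):
    """Drop periods from anchor names, leaving all other text untouched."""
    return re.sub(
        r"^\[\[([^\]]+)\]\]",
        lambda m: "[[" + m.group(1).replace(".", "") + "]]",
        text,
        flags=re.MULTILINE,
    )
```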

There do appear to be other issues with the asciidoc's
parsing of the anchors, as some of the others are either not
included in the xml and html or are not quite in the place
they should be.  I didn't see an obvious reason for that,
but I didn't spend all that long looking over gitfaq.txt
because I imagine there are plenty of minor issues with
asciidoc-8.4.5.

I thought this change in asciidoc might have fixed the
issue:

    https://github.com/asciidoc/asciidoc/commit/4ceeb32

But I didn't have any luck after applying that to
asciidoc-8.4.5.

Anyway, here's the quick work-around in patch form.

-- >8 --
Subject: [PATCH] gitfaq: avoid validation error with older asciidoc

When building with asciidoc-8.4.5 (as found on CentOS/Red Hat 6), the
period in the "[[files-in-.gitignore-are-tracked]]" anchor is not
properly parsed as a section:

  WARNING: gitfaq.txt: line 245: missing [[files-in-.gitignore-are-tracked]] section

The resulting XML file fails to validate with xmlto:

    xmlto: /git/Documentation/gitfaq.xml does not validate (status 3)
    xmlto: Fix document syntax or use --skip-validation option
     /git/Documentation/gitfaq.xml:3: element refentry: validity error :
     Element refentry content does not follow the DTD, expecting
     (beginpage? , indexterm* , refentryinfo? , refmeta? , (remark | link
     | olink | ulink)* , refnamediv+ , refsynopsisdiv? , (refsect1+ |
     refsection+)), got (refmeta refnamediv refsynopsisdiv refsect1
     refsect1 refsect1 refsect1 variablelist refsect1 refsect1 )
    Document /git/Documentation/gitfaq.xml does not validate

Let's avoid breaking users of platforms which ship an old version of
asciidoc, since the cost to do so is quite low.

Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Todd Zullinger <tmz@pobox.com>
---
 Documentation/gitfaq.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
index 370d62dae4..9cd7a592ac 100644
--- a/Documentation/gitfaq.txt
+++ b/Documentation/gitfaq.txt
@@ -223,7 +223,7 @@ a file checked into the repository which is a template or set of defaults which
 can then be copied alongside and modified as appropriate.  This second, modified
 file is usually ignored to prevent accidentally committing it.
 
-[[files-in-.gitignore-are-tracked]]
+[[files-in-gitignore-are-tracked]]
 I asked Git to ignore various files, yet they are still tracked::
 	A `gitignore` file ensures that certain file(s) which are not
 	tracked by Git remain untracked.  However, sometimes particular
-- 
2.26.1

-- >8 --

-- 
Todd

^ permalink raw reply related	[relevance 4%]

* Re: [PATCH v7 1/4] gitfaq: files in .gitignore are tracked
@ 2020-05-15 17:32  6% Son Luong Ngoc
  2020-05-19  4:53  4% ` Todd Zullinger
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2020-05-15 17:32 UTC (permalink / raw)
  To: shouryashukla.oo; +Cc: git, gitster, newren, sandals

Hey folks,

> Add issue in 'Common Issues' section which addresses the problem of
> Git tracking files/paths mentioned in '.gitignore'.
>
> Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
> ---
>  Documentation/gitfaq.txt | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/Documentation/gitfaq.txt b/Documentation/gitfaq.txt
> index 1cf83df118..11d9bac859 100644
> --- a/Documentation/gitfaq.txt
> +++ b/Documentation/gitfaq.txt
> @@ -223,6 +223,16 @@ a file checked into the repository which is a template or set of defaults which
>  can then be copied alongside and modified as appropriate.  This second, modified
>  file is usually ignored to prevent accidentally committing it.
>
> +[[files-in-.gitignore-are-tracked]]

This does not work with the older xmlto on CentOS 6, for whatever reason.
```
# make doc
...
# xmlto -m manpage-normal.xsl  -m manpage-bold-literal.xsl -m
manpage-base-url.xsl man gitfaq.xml
xmlto: /<git-dir>/Documentation/gitfaq.xml does not validate (status 3)
xmlto: Fix document syntax or use --skip-validation option
/<git-dir>/Documentation/gitfaq.xml:3: element refentry: validity
error : Element refentry content does not follow the DTD, expecting
(beginpage? , indexterm* , refentryinfo? , refmeta? , (remark | link |
olink | ulink)* , refnamediv+ , refsynopsisdiv? , (refsect1+ |
refsection+)), got (refmeta refnamediv refsynopsisdiv refsect1
refsect1 refsect1 refsect1 variablelist refsect1 refsect1 )
```

The build went fine on CentOS 7 and CentOS 8, though.

I ran a quick sed to work around the problem temporarily:
```
sed -i 's/files-in-\.gitignore/files-in-gitignore/g' Documentation/gitfaq.txt
```

But I suggest simply removing the period from this anchor name.

> +I asked Git to ignore various files, yet they are still tracked::
> + A `gitignore` file ensures that certain file(s) which are not
> + tracked by Git remain untracked.  However, sometimes particular
> + file(s) may have been tracked before adding them into the
> + `.gitignore`, hence they still remain tracked.  To untrack and
> + ignore files/patterns, use `git rm --cached <file/pattern>`
> + and add a pattern to `.gitignore` that matches the <file>.
> + See linkgit:gitignore[5] for details.
> +
>  Hooks
>  -----
>
> --
> 2.26.2

Cheers,
Son Luong.

^ permalink raw reply	[relevance 6%]

* [PATCH v4 1/2] midx: teach "git multi-pack-index repack" honor "git repack" configurations
  2020-05-10 16:07  4%     ` [PATCH v4 0/2] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
@ 2020-05-10 16:07  7%       ` Son Luong Ngoc via GitGitGadget
  2020-05-10 16:07  4%       ` [PATCH v4 2/2] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
  1 sibling, 0 replies; 122+ results
From: Son Luong Ngoc via GitGitGadget @ 2020-05-10 16:07 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Son Luong Ngoc

From: Son Luong Ngoc <sluongng@gmail.com>

When the "repack" subcommand of "git multi-pack-index" command
creates new packfile(s), it does not call the "git repack"
command but instead directly calls the "git pack-objects"
command, and the configuration variables meant for the "git
repack" command, like "repack.usedeltabaseoffset", are ignored.

Check the configuration variables used by "git repack" ourselves
in "git multi-pack-index" and pass the corresponding options to
underlying "git pack-objects".

Note that `repack.writeBitmaps` configuration is ignored, as the
pack bitmap facility is useful only with a single packfile.

Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
---
 midx.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/midx.c b/midx.c
index 9a61d3b37d9..d2a43bd1a38 100644
--- a/midx.c
+++ b/midx.c
@@ -1370,6 +1370,14 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	struct strbuf base_name = STRBUF_INIT;
 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
 
+	/*
+	 * When updating the default for these configuration
+	 * variables in builtin/repack.c, these must be adjusted
+	 * to match.
+	 */
+	int delta_base_offset = 1;
+	int use_delta_islands = 0;
+
 	if (!m)
 		return 0;
 
@@ -1381,12 +1389,20 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	} else if (fill_included_packs_all(m, include_pack))
 		goto cleanup;
 
+	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
+	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
+
 	argv_array_push(&cmd.args, "pack-objects");
 
 	strbuf_addstr(&base_name, object_dir);
 	strbuf_addstr(&base_name, "/pack/pack");
 	argv_array_push(&cmd.args, base_name.buf);
 
+	if (delta_base_offset)
+		argv_array_push(&cmd.args, "--delta-base-offset");
+	if (use_delta_islands)
+		argv_array_push(&cmd.args, "--delta-islands");
+
 	if (flags & MIDX_PROGRESS)
 		argv_array_push(&cmd.args, "--progress");
 	else
-- 
gitgitgadget


^ permalink raw reply related	[relevance 7%]

* [PATCH v4 2/2] multi-pack-index: respect repack.packKeptObjects=false
  2020-05-10 16:07  4%     ` [PATCH v4 0/2] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
  2020-05-10 16:07  7%       ` [PATCH v4 1/2] midx: teach "git multi-pack-index repack" honor "git repack" configurations Son Luong Ngoc via GitGitGadget
@ 2020-05-10 16:07  4%       ` Derrick Stolee via GitGitGadget
  1 sibling, 0 replies; 122+ results
From: Derrick Stolee via GitGitGadget @ 2020-05-10 16:07 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When selecting a batch of pack-files to repack in the "git
multi-pack-index repack" command, Git should respect the
repack.packKeptObjects config option. When false, this option says that
the pack-files with an associated ".keep" file should not be repacked.
This config value is "false" by default.

There are two cases for selecting a batch of objects. The first is the
case where the input batch-size is zero, which specifies "repack
everything". The second is with a non-zero batch size, which selects
pack-files using a greedy selection criterion. Both of these cases are
updated and tested.

Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-multi-pack-index.txt |  3 +++
 midx.c                                 | 26 ++++++++++++++++++++-----
 t/t5319-multi-pack-index.sh            | 27 ++++++++++++++++++++++++++
 3 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 642d9ac5b72..0c6619493c1 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -56,6 +56,9 @@ repack::
 	file is created, rewrite the multi-pack-index to reference the
 	new pack-file. A later run of 'git multi-pack-index expire' will
 	delete the pack-files that were part of this batch.
++
+If `repack.packKeptObjects` is `false`, then any pack-files with an
+associated `.keep` file will not be selected for the batch to repack.
 
 
 EXAMPLES
diff --git a/midx.c b/midx.c
index d2a43bd1a38..6d1584ca51d 100644
--- a/midx.c
+++ b/midx.c
@@ -1293,15 +1293,26 @@ static int compare_by_mtime(const void *a_, const void *b_)
 	return 0;
 }
 
-static int fill_included_packs_all(struct multi_pack_index *m,
+static int fill_included_packs_all(struct repository *r,
+				   struct multi_pack_index *m,
 				   unsigned char *include_pack)
 {
-	uint32_t i;
+	uint32_t i, count = 0;
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
+
+	for (i = 0; i < m->num_packs; i++) {
+		if (prepare_midx_pack(r, m, i))
+			continue;
+		if (!pack_kept_objects && m->packs[i]->pack_keep)
+			continue;
 
-	for (i = 0; i < m->num_packs; i++)
 		include_pack[i] = 1;
+		count++;
+	}
 
-	return m->num_packs < 2;
+	return count < 2;
 }
 
 static int fill_included_packs_batch(struct repository *r,
@@ -1312,6 +1323,9 @@ static int fill_included_packs_batch(struct repository *r,
 	uint32_t i, packs_to_repack;
 	size_t total_size;
 	struct repack_info *pack_info = xcalloc(m->num_packs, sizeof(struct repack_info));
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
 
 	for (i = 0; i < m->num_packs; i++) {
 		pack_info[i].pack_int_id = i;
@@ -1338,6 +1352,8 @@ static int fill_included_packs_batch(struct repository *r,
 
 		if (!p)
 			continue;
+		if (!pack_kept_objects && p->pack_keep)
+			continue;
 		if (open_pack_index(p) || !p->num_objects)
 			continue;
 
@@ -1386,7 +1402,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	if (batch_size) {
 		if (fill_included_packs_batch(r, m, include_pack, batch_size))
 			goto cleanup;
-	} else if (fill_included_packs_all(m, include_pack))
+	} else if (fill_included_packs_all(r, m, include_pack))
 		goto cleanup;
 
 	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 030a7222b2a..7214cab36c0 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -538,6 +538,33 @@ test_expect_success 'repack with minimum size does not alter existing packs' '
 	)
 '
 
+test_expect_success 'repack respects repack.packKeptObjects=false' '
+	test_when_finished rm -f dup/.git/objects/pack/*keep &&
+	(
+		cd dup &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&
+		test_line_count = 5 keep-list &&
+		for keep in $(cat keep-list)
+		do
+			touch $keep || return 1
+		done &&
+		git multi-pack-index repack --batch-size=0 &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list &&
+		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | sed -n 3p) &&
+		BATCH_SIZE=$((THIRD_SMALLEST_SIZE + 1)) &&
+		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list
+	)
+'
+
 test_expect_success 'repack creates a new pack' '
 	(
 		cd dup &&
-- 
gitgitgadget

^ permalink raw reply related	[relevance 4%]

* [PATCH v4 0/2] midx: apply gitconfig to midx repack
  2020-05-09 14:24  7%   ` [PATCH v3 0/3] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
                       ` (2 preceding siblings ...)
  2020-05-09 14:24  6%     ` [PATCH v3 3/3] Ensured t5319 follows arith expansion guideline Son Luong Ngoc via GitGitGadget
@ 2020-05-10 16:07  4%     ` Son Luong Ngoc via GitGitGadget
  2020-05-10 16:07  7%       ` [PATCH v4 1/2] midx: teach "git multi-pack-index repack" honor "git repack" configurations Son Luong Ngoc via GitGitGadget
  2020-05-10 16:07  4%       ` [PATCH v4 2/2] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
  3 siblings, 2 replies; 122+ results
From: Son Luong Ngoc via GitGitGadget @ 2020-05-10 16:07 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc

Midx repack has largely been used in Microsoft Scalar on the client side to
optimize a repository that holds multiple packs. However, when I tried to
apply it on the server side, I realized that certain features were lacking
compared to git repack. Most of these features are highly desirable on the
server side to create the most optimized pack possible.

One example is delta_base_offset: comparing a midx repack with and without
it, the resulting pack is roughly 3% smaller (9446096 vs. 9152512 as
reported by du below).

> du objects/pack/*pack
14536   objects/pack/pack-08a017b424534c88191addda1aa5dd6f24bf7a29.pack
9435280 objects/pack/pack-8829c53ad1dca02e7311f8e5b404962ab242e8f1.pack

Latest 2.26.2 (without delta_base_offset)
> git multi-pack-index write
> git multi-pack-index repack
> git multi-pack-index expire
> du objects/pack/*pack
9446096 objects/pack/pack-366c75e2c2f987b9836d3bf0bf5e4a54b6975036.pack

With delta_base_offset
> git version
git version 2.26.2.672.g232c24e857.dirty
> git multi-pack-index write
> git multi-pack-index repack
> git multi-pack-index expire
> du objects/pack/*pack
9152512 objects/pack/pack-3bc8c1ec496ab95d26875f8367ff6807081e9e7d.pack

Note that repack.writeBitmaps configuration is ignored, as the pack bitmap
facility is useful only with a single packfile.
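
The option mapping described in this series can be modeled in a few lines
of shell. This is only a sketch: midx_repack_flags is a hypothetical
helper, not a Git command, and its two arguments stand in for the
repack.useDeltaBaseOffset and repack.useDeltaIslands booleans, with
defaults mirroring those in patch 1/2 (offset on, islands off).

```sh
#!/bin/sh
# Hypothetical helper (not part of Git): print the extra pack-objects
# options implied by the two repack.* booleans.
#   $1: repack.useDeltaBaseOffset (default "true", as in the patch)
#   $2: repack.useDeltaIslands    (default "false", as in the patch)
midx_repack_flags () {
	delta_base_offset=${1:-true}
	use_delta_islands=${2:-false}
	flags=
	if test "$delta_base_offset" = true
	then
		flags="$flags --delta-base-offset"
	fi
	if test "$use_delta_islands" = true
	then
		flags="$flags --delta-islands"
	fi
	# Strip the leading space left by the first append.
	echo "${flags# }"
}

midx_repack_flags             # defaults: offset on, islands off
midx_repack_flags false true  # both settings flipped
```

In a real repository the two arguments could plausibly come from
`git config --type=bool --get repack.useDeltaBaseOffset` and the matching
repack.useDeltaIslands lookup; the sketch takes them as parameters so it
runs without a repository.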

Derrick Stolee's following patch will address repack.packKeptObjects 
support.
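
As a rough illustration of the selection rule in that patch, assuming
packs and their ".keep" markers sit side by side in one directory:
repackable_packs is a hypothetical helper, and its second argument
stands in for repack.packKeptObjects (false by default).

```sh
#!/bin/sh
# Hypothetical helper (not part of Git): list the packs in a directory
# that a repack batch may include.  A pack with a sibling ".keep" file is
# skipped unless $2 (standing in for repack.packKeptObjects) is "true".
repackable_packs () {
	pack_dir=$1
	pack_kept_objects=${2:-false}
	for pack in "$pack_dir"/*.pack
	do
		test -f "$pack" || continue	# the glob matched nothing
		if test "$pack_kept_objects" != true &&
		   test -f "${pack%.pack}.keep"
		then
			continue
		fi
		echo "$pack"
	done
}

# Demo on a scratch directory with two packs, one of them kept.
demo=$(mktemp -d) &&
touch "$demo/pack-a.pack" "$demo/pack-b.pack" "$demo/pack-b.keep" &&
repackable_packs "$demo"
rm -rf "$demo"
```

Only pack-a.pack is listed here. In the patch itself,
fill_included_packs_all() returns nonzero when fewer than two packs
remain after this filtering, so nothing is repacked in that case.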

Derrick Stolee (1):
  multi-pack-index: respect repack.packKeptObjects=false

Son Luong Ngoc (1):
  midx: teach "git multi-pack-index repack" honor "git repack"
    configurations

 Documentation/git-multi-pack-index.txt |  3 ++
 midx.c                                 | 42 +++++++++++++++++++++++---
 t/t5319-multi-pack-index.sh            | 27 +++++++++++++++++
 3 files changed, 67 insertions(+), 5 deletions(-)


base-commit: b994622632154fc3b17fb40a38819ad954a5fb88
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-626%2Fsluongng%2Fsluongngoc%2Fmidx-config-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-626/sluongng/sluongngoc/midx-config-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/626

Range-diff vs v3:

 1:  a925307d4c5 ! 1:  a8f75e34e5b midx: teach "git multi-pack-index repack" honor "git repack" configurations
     @@ Metadata
       ## Commit message ##
          midx: teach "git multi-pack-index repack" honor "git repack" configurations
      
     -    Previously, when the "repack" subcommand of "git multi-pack-index" command
     -    creates new packfile(s), it does not call the "git repack" command but
     -    instead directly calls the "git pack-objects" command, and the
     -    configuration variables meant for the "git repack" command, like
     -    "repack.usedaeltabaseoffset", are ignored.
     +    When the "repack" subcommand of "git multi-pack-index" command
     +    creates new packfile(s), it does not call the "git repack"
     +    command but instead directly calls the "git pack-objects"
     +    command, and the configuration variables meant for the "git
     +    repack" command, like "repack.usedaeltabaseoffset", are ignored.
      
     -    This patch ensured "git multi-pack-index" checks the configuration
     -    variables used by "git repack" and passes the corresponding options to
     -    the underlying "git pack-objects" command.
     +    Check the configuration variables used by "git repack" ourselves
     +    in "git multi-index-pack" and pass the corresponding options to
     +    underlying "git pack-objects".
      
          Note that `repack.writeBitmaps` configuration is ignored, as the
          pack bitmap facility is useful only with a single packfile.
     @@ Commit message
      
       ## midx.c ##
      @@ midx.c: int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
     - 	struct child_process cmd = CHILD_PROCESS_INIT;
       	struct strbuf base_name = STRBUF_INIT;
       	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
     + 
     ++	/*
     ++	 * When updating the default for these configuration
     ++	 * variables in builtin/repack.c, these must be adjusted
     ++	 * to match.
     ++	 */
      +	int delta_base_offset = 1;
      +	int use_delta_islands = 0;
     - 
     ++
       	if (!m)
       		return 0;
     + 
      @@ midx.c: int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
       	} else if (fill_included_packs_all(m, include_pack))
       		goto cleanup;
 2:  988697dd512 ! 2:  192fc785382 multi-pack-index: respect repack.packKeptObjects=false
     @@ t/t5319-multi-pack-index.sh: test_expect_success 'repack with minimum size does
      +		ls .git/objects/pack/*idx >idx-list &&
      +		test_line_count = 5 idx-list &&
      +		ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&
     ++		test_line_count = 5 keep-list &&
      +		for keep in $(cat keep-list)
      +		do
      +			touch $keep || return 1
     @@ t/t5319-multi-pack-index.sh: test_expect_success 'repack with minimum size does
      +		test_line_count = 5 idx-list &&
      +		test-tool read-midx .git/objects | grep idx >midx-list &&
      +		test_line_count = 5 midx-list &&
     -+		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
     -+		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
     ++		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | sed -n 3p) &&
     ++		BATCH_SIZE=$((THIRD_SMALLEST_SIZE + 1)) &&
      +		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
      +		ls .git/objects/pack/*idx >idx-list &&
      +		test_line_count = 5 idx-list &&
 3:  efeb3d7d132 < -:  ----------- Ensured t5319 follows arith expansion guideline

-- 
gitgitgadget

^ permalink raw reply	[relevance 4%]

* Re: [PATCH v3 2/3] multi-pack-index: respect repack.packKeptObjects=false
  @ 2020-05-10 15:52  6%             ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-05-10 15:52 UTC (permalink / raw)
  To: Đoàn Trần Công Danh
  Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, git,
	Derrick Stolee

Hi,

Thanks Danh and Junio for the testing improvement suggestions.
These are the points I will adopt in the next version:

- Remove the 3rd patch and keep the removal of dollar sign locally
  inside `repack respects repack.packKeptObjects=false`.

- Change `head -n 3 | tail -n 1` to `sed -n 3p`

- Apply test_line_count on keep-list for failing fast (before touch)
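
A minimal sketch of the second point, with made-up sizes: third_smallest
is a hypothetical helper showing that `sed -n 3p` picks the same line as
the longer head/tail pipeline.

```sh
#!/bin/sh
# Hypothetical helper: print the third-smallest number read from stdin.
third_smallest () {
	sort -n | sed -n 3p
}

# The five sizes are invented for illustration.
printf '%s\n' 40 10 50 20 30 | third_smallest
# The pipeline being replaced selects the same line.
printf '%s\n' 40 10 50 20 30 | sort -n | head -n 3 | tail -n 1
```

Both commands print 30, the third entry of the sorted list.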

Cheers,
Son Luong.

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v3 1/3] midx: teach "git multi-pack-index repack" honor "git repack" configurations
  2020-05-09 16:51  0%       ` Junio C Hamano
@ 2020-05-10 14:27  6%         ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-05-10 14:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Son Luong Ngoc via GitGitGadget, git

On Sat, May 09, 2020 at 09:51:08AM -0700, Junio C Hamano wrote:
> "Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > From: Son Luong Ngoc <sluongng@gmail.com>
> >
> > Previously, when the "repack" subcommand of "git multi-pack-index" command
> > creates new packfile(s), it does not call the "git repack" command but
> > instead directly calls the "git pack-objects" command, and the
> > configuration variables meant for the "git repack" command, like
> > "repack.usedeltabaseoffset", are ignored.
> 
> When we talk about the current state of the code (i.e. before
> applying this patch), we do not say "previously".  It's not like you
> are complaining about a recent breakage, e.g. "previously X worked
> like this but since change Y, it instead works like that, which
> breaks Z".
> 
> > This patch ensured "git multi-pack-index" checks the configuration
> > variables used by "git repack" and passes the corresponding options to
> > the underlying "git pack-objects" command.
> 
> We write this part in imperative mood, as if we are giving an order
> to the codebase to "become like so".  We do not give an observation
> about the patch or the author ("This patch does X, this patch also
> does Y", "I do X, I do Y").
> 
> Taking these two together, perhaps like:
> 
>     When the "repack" subcommand of "git multi-pack-index" command
>     creates new packfile(s), it does not call the "git repack"
>     command but instead directly calls the "git pack-objects"
>     command, and the configuration variables meant for the "git
>     repack" command, like "repack.usedeltabaseoffset", are ignored.
> 
>     Check the configuration variables used by "git repack" ourselves
>     in "git multi-pack-index" and pass the corresponding options to
>     underlying "git pack-objects".

Thanks for this. It will take me a bit to adjust to this style of
writing, but I do find it a lot clearer and more practical.
Will update in the next version.

> 
> > Note that `repack.writeBitmaps` configuration is ignored, as the
> > pack bitmap facility is useful only with a single packfile.
> 
> Good.
> 
> > +	int delta_base_offset = 1;
> > +	int use_delta_islands = 0;
> 
> These give the default values for two configurations and over there
> builtin/repack.c has these lines:
> 
>     17	static int delta_base_offset = 1;
>     18	static int pack_kept_objects = -1;
>     19	static int write_bitmaps = -1;
>     20	static int use_delta_islands;
>     21	static char *packdir, *packtmp;
> 
> When somebody is tempted to update these to change the default used
> by "git repack", it should be easy to notice that such a change must
> be accompanied by a matching change to the lines you are introducing
> in this patch, or we'll be out of sync.
> 
> The easiest way to avoid such a problem may be to stop bypassing
> "git repack" and calling "pack-objects" ourselves.  That is the
> reason why the configuration variables honored by "git repack" are
> ignored in this codepath in the first place.  But that is not the
> approach we are taking, so we need a reasonable way to tell those
> who update this file and builtin/repack.c to make matching changes.
> At the very least, perhaps we should give a comment above these two
> lines in this file, e.g.
> 
> 	/*
> 	 * when updating the default for these configuration
> 	 * variables in builtin/repack.c, these must be adjusted
> 	 * to match.
> 	 */
> 	int delta_base_offset = 1;
> 	int use_delta_islands = 0;
> 
> or something like that.

Will add the comment in the next version.

> 
> With that, the rest of the patch makes sense.
> 
> Thanks.

Cheers,
Son Luong

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v3 3/3] Ensured t5319 follows arith expansion guideline
  2020-05-09 14:24  6%     ` [PATCH v3 3/3] Ensured t5319 follows arith expansion guideline Son Luong Ngoc via GitGitGadget
@ 2020-05-09 16:55  0%       ` Junio C Hamano
  0 siblings, 0 replies; 122+ results
From: Junio C Hamano @ 2020-05-09 16:55 UTC (permalink / raw)
  To: Son Luong Ngoc via GitGitGadget; +Cc: git, Son Luong Ngoc

"Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Son Luong Ngoc <sluongng@gmail.com>
>
> As the old versions of dash are deprecated, the dollar sign inside
> arithmetic expansion is no longer needed.
> This ensures t5319 follows the coding guideline updated
> in 'jk/arith-expansion-coding-guidelines' 6d4bf5813cd2c1a3b93fd4f0b231733f82133cce.

That does not match my understanding of the guideline.  By removing
the "dollar required" rule and not adding a new "dollar forbidden"
rule, we pretty much declared that "we do not care much either way"
[*1*].

Even if we cared, "Once it _is_ in the tree, it's not really worth
the patch noise to go and fix it up." rule from the guidelines
applies here.
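
A minimal sketch of why either spelling is acceptable, assuming a POSIX
shell that accepts both forms (bash and current dash do); the value 1024
is made up.

```sh
#!/bin/sh
# Invented value standing in for a pack size.
THIRD_SMALLEST_SIZE=1024

# Same arithmetic expansion, with and without the dollar sign.
with_dollar=$(($THIRD_SMALLEST_SIZE + 1))
without_dollar=$((THIRD_SMALLEST_SIZE + 1))

echo "$with_dollar $without_dollar"
```

Both variables end up as 1025, so the choice between the two forms is
purely one of style.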

Thanks.


[Reference]

*1* https://lore.kernel.org/git/20200505210741.GB645290@coredump.intra.peff.net/

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v3 1/3] midx: teach "git multi-pack-index repack" honor "git repack" configurations
  2020-05-09 14:24  7%     ` [PATCH v3 1/3] midx: teach "git multi-pack-index repack" honor "git repack" configurations Son Luong Ngoc via GitGitGadget
@ 2020-05-09 16:51  0%       ` Junio C Hamano
  2020-05-10 14:27  6%         ` Son Luong Ngoc
  0 siblings, 1 reply; 122+ results
From: Junio C Hamano @ 2020-05-09 16:51 UTC (permalink / raw)
  To: Son Luong Ngoc via GitGitGadget; +Cc: git, Son Luong Ngoc

"Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Son Luong Ngoc <sluongng@gmail.com>
>
> Previously, when the "repack" subcommand of "git multi-pack-index" command
> creates new packfile(s), it does not call the "git repack" command but
> instead directly calls the "git pack-objects" command, and the
> configuration variables meant for the "git repack" command, like
> "repack.usedeltabaseoffset", are ignored.

When we talk about the current state of the code (i.e. before
applying this patch), we do not say "previously".  It's not like you
are complaining about a recent breakage, e.g. "previously X worked
like this but since change Y, it instead works like that, which
breaks Z".

> This patch ensured "git multi-pack-index" checks the configuration
> variables used by "git repack" and passes the corresponding options to
> the underlying "git pack-objects" command.

We write this part in imperative mood, as if we are giving an order
to the codebase to "become like so".  We do not give an observation
about the patch or the author ("This patch does X, this patch also
does Y", "I do X, I do Y").

Taking these two together, perhaps like:

    When the "repack" subcommand of "git multi-pack-index" command
    creates new packfile(s), it does not call the "git repack"
    command but instead directly calls the "git pack-objects"
    command, and the configuration variables meant for the "git
    repack" command, like "repack.usedeltabaseoffset", are ignored.

    Check the configuration variables used by "git repack" ourselves
    in "git multi-pack-index" and pass the corresponding options to
    underlying "git pack-objects".

> Note that `repack.writeBitmaps` configuration is ignored, as the
> pack bitmap facility is useful only with a single packfile.

Good.

> +	int delta_base_offset = 1;
> +	int use_delta_islands = 0;

These give the default values for two configurations and over there
builtin/repack.c has these lines:

    17	static int delta_base_offset = 1;
    18	static int pack_kept_objects = -1;
    19	static int write_bitmaps = -1;
    20	static int use_delta_islands;
    21	static char *packdir, *packtmp;

When somebody is tempted to update these to change the default used
by "git repack", it should be easy to notice that such a change must
be accompanied by a matching change to the lines you are introducing
in this patch, or we'll be out of sync.

The easiest way to avoid such a problem may be to stop bypassing
"git repack" and calling "pack-objects" ourselves.  That is the
reason why the configuration variables honored by "git repack" are
ignored in this codepath in the first place.  But that is not the
approach we are taking, so we need a reasonable way to tell those
who update this file and builtin/repack.c to make matching changes.
At the very least, perhaps we should give a comment above these two
lines in this file, e.g.

	/*
	 * when updating the default for these configuration
	 * variables in builtin/repack.c, these must be adjusted
	 * to match.
	 */
	int delta_base_offset = 1;
	int use_delta_islands = 0;

or something like that.

With that, the rest of the patch makes sense.

Thanks.

^ permalink raw reply	[relevance 0%]

* [PATCH v3 1/3] midx: teach "git multi-pack-index repack" honor "git repack" configurations
  2020-05-09 14:24  7%   ` [PATCH v3 0/3] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
@ 2020-05-09 14:24  7%     ` Son Luong Ngoc via GitGitGadget
  2020-05-09 16:51  0%       ` Junio C Hamano
  2020-05-09 14:24  4%     ` [PATCH v3 2/3] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 122+ results
From: Son Luong Ngoc via GitGitGadget @ 2020-05-09 14:24 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Son Luong Ngoc

From: Son Luong Ngoc <sluongng@gmail.com>

Previously, when the "repack" subcommand of "git multi-pack-index" command
creates new packfile(s), it does not call the "git repack" command but
instead directly calls the "git pack-objects" command, and the
configuration variables meant for the "git repack" command, like
"repack.usedeltabaseoffset", are ignored.

This patch ensured "git multi-pack-index" checks the configuration
variables used by "git repack" and passes the corresponding options to
the underlying "git pack-objects" command.

Note that `repack.writeBitmaps` configuration is ignored, as the
pack bitmap facility is useful only with a single packfile.

Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
---
 midx.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/midx.c b/midx.c
index 9a61d3b37d9..1e76be56826 100644
--- a/midx.c
+++ b/midx.c
@@ -1369,6 +1369,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	struct strbuf base_name = STRBUF_INIT;
 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
+	int delta_base_offset = 1;
+	int use_delta_islands = 0;
 
 	if (!m)
 		return 0;
@@ -1381,12 +1383,20 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	} else if (fill_included_packs_all(m, include_pack))
 		goto cleanup;
 
+	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
+	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
+
 	argv_array_push(&cmd.args, "pack-objects");
 
 	strbuf_addstr(&base_name, object_dir);
 	strbuf_addstr(&base_name, "/pack/pack");
 	argv_array_push(&cmd.args, base_name.buf);
 
+	if (delta_base_offset)
+		argv_array_push(&cmd.args, "--delta-base-offset");
+	if (use_delta_islands)
+		argv_array_push(&cmd.args, "--delta-islands");
+
 	if (flags & MIDX_PROGRESS)
 		argv_array_push(&cmd.args, "--progress");
 	else
-- 
gitgitgadget


^ permalink raw reply related	[relevance 7%]

* [PATCH v3 0/3] midx: apply gitconfig to midx repack
  2020-05-06  9:43  6% ` [PATCH v2 0/2] " Son Luong Ngoc via GitGitGadget
  2020-05-06  9:43  7%   ` [PATCH v2 1/2] " Son Luong Ngoc via GitGitGadget
  2020-05-06  9:43  4%   ` [PATCH v2 2/2] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
@ 2020-05-09 14:24  7%   ` Son Luong Ngoc via GitGitGadget
  2020-05-09 14:24  7%     ` [PATCH v3 1/3] midx: teach "git multi-pack-index repack" honor "git repack" configurations Son Luong Ngoc via GitGitGadget
                       ` (3 more replies)
  2 siblings, 4 replies; 122+ results
From: Son Luong Ngoc via GitGitGadget @ 2020-05-09 14:24 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc

Midx repack has largely been used in Microsoft Scalar on the client side to
optimize a repository that holds multiple packs. However, when I tried to
apply it on the server side, I realized that certain features were lacking
compared to git repack. Most of these features are highly desirable on the
server side to create the most optimized pack possible.

One example is delta_base_offset: comparing a midx repack with and without
it, the resulting pack is roughly 3% smaller (9446096 vs. 9152512 as
reported by du below).

> du objects/pack/*pack
14536   objects/pack/pack-08a017b424534c88191addda1aa5dd6f24bf7a29.pack
9435280 objects/pack/pack-8829c53ad1dca02e7311f8e5b404962ab242e8f1.pack

Latest 2.26.2 (without delta_base_offset)
> git multi-pack-index write
> git multi-pack-index repack
> git multi-pack-index expire
> du objects/pack/*pack
9446096 objects/pack/pack-366c75e2c2f987b9836d3bf0bf5e4a54b6975036.pack

With delta_base_offset
> git version
git version 2.26.2.672.g232c24e857.dirty
> git multi-pack-index write
> git multi-pack-index repack
> git multi-pack-index expire
> du objects/pack/*pack
9152512 objects/pack/pack-3bc8c1ec496ab95d26875f8367ff6807081e9e7d.pack

Note that repack.writeBitmaps configuration is ignored, as the pack bitmap
facility is useful only with a single packfile.

Derrick Stolee's following patch will address repack.packKeptObjects 
support.

Derrick Stolee (1):
  multi-pack-index: respect repack.packKeptObjects=false

Son Luong Ngoc (2):
  midx: teach "git multi-pack-index repack" honor "git repack"
    configurations
  Ensured t5319 follows arith expansion guideline

 Documentation/git-multi-pack-index.txt |  3 ++
 midx.c                                 | 36 ++++++++++++---
 t/t5319-multi-pack-index.sh            | 62 ++++++++++++++++++--------
 3 files changed, 78 insertions(+), 23 deletions(-)


base-commit: b994622632154fc3b17fb40a38819ad954a5fb88
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-626%2Fsluongng%2Fsluongngoc%2Fmidx-config-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-626/sluongng/sluongngoc/midx-config-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/626

Range-diff vs v2:

 1:  21c648cc486 ! 1:  a925307d4c5 midx: apply gitconfig to midx repack
     @@ Metadata
      Author: Son Luong Ngoc <sluongng@gmail.com>
      
       ## Commit message ##
     -    midx: apply gitconfig to midx repack
     +    midx: teach "git multi-pack-index repack" honor "git repack" configurations
      
     -    Multi-Pack-Index repack is an incremental, repack solutions
     -    that allows user to consolidate multiple packfiles in a non-disruptive
     -    way. However the new packfile could be created without some of the
     -    capabilities of a packfile that is created by calling `git repack`.
     +    Previously, when the "repack" subcommand of "git multi-pack-index" command
     +    creates new packfile(s), it does not call the "git repack" command but
     +    instead directly calls the "git pack-objects" command, and the
     +    configuration variables meant for the "git repack" command, like
     +    "repack.usedaeltabaseoffset", are ignored.
      
     -    This is because with `git repack`, there are configuration that would
     -    enable different flags to be passed down to `git pack-objects` plumbing.
     +    This patch ensured "git multi-pack-index" checks the configuration
     +    variables used by "git repack" and passes the corresponding options to
     +    the underlying "git pack-objects" command.
      
     -    In this patch, I applies those flags into `git multi-pack-index repack`
     -    so that it respect the `repack.*` config series.
     -
     -    Note:
     -    - `repack.packKeptObjects` will be addressed by Derrick Stolee in
     -    the following patch
     -    - `repack.writeBitmaps` when `--batch-size=0` was NOT adopted here as it
     -    requires `--all` to be passed onto `git pack-objects`, which is very
     -    slow. I think it would be nice to have this in a future patch.
     +    Note that `repack.writeBitmaps` configuration is ignored, as the
     +    pack bitmap facility is useful only with a single packfile.
      
          Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
      
     @@ midx.c: int midx_repack(struct repository *r, const char *object_dir, size_t bat
       	struct strbuf base_name = STRBUF_INIT;
       	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
      +	int delta_base_offset = 1;
     -+	int use_delta_islands;
     ++	int use_delta_islands = 0;
       
       	if (!m)
       		return 0;
 2:  3d7b334f5c6 = 2:  988697dd512 multi-pack-index: respect repack.packKeptObjects=false
 -:  ----------- > 3:  efeb3d7d132 Ensured t5319 follows arith expansion guideline

-- 
gitgitgadget

^ permalink raw reply	[relevance 7%]

* [PATCH v3 3/3] Ensured t5319 follows arith expansion guideline
  2020-05-09 14:24  7%   ` [PATCH v3 0/3] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
  2020-05-09 14:24  7%     ` [PATCH v3 1/3] midx: teach "git multi-pack-index repack" honor "git repack" configurations Son Luong Ngoc via GitGitGadget
  2020-05-09 14:24  4%     ` [PATCH v3 2/3] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
@ 2020-05-09 14:24  6%     ` Son Luong Ngoc via GitGitGadget
  2020-05-09 16:55  0%       ` Junio C Hamano
  2020-05-10 16:07  4%     ` [PATCH v4 0/2] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
  3 siblings, 1 reply; 122+ results
From: Son Luong Ngoc via GitGitGadget @ 2020-05-09 14:24 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Son Luong Ngoc

From: Son Luong Ngoc <sluongng@gmail.com>

As the old versions of dash are deprecated, the dollar sign inside
arithmetic expansion is no longer needed.
This ensures t5319 follows the coding guideline updated
in 'jk/arith-expansion-coding-guidelines' 6d4bf5813cd2c1a3b93fd4f0b231733f82133cce.
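
For illustration only (not part of the commit): a minimal sketch of the
two spellings, assuming a POSIX shell. The lowercase variable names are
made up for this example; the values mirror the MIDX_HEADER_SIZE,
MIDX_NUM_CHUNKS and MIDX_CHUNK_LOOKUP_WIDTH constants in t5319.

```shell
#!/bin/sh
# Illustrative variables; the names are not the ones used in t5319.
header_size=12
num_chunks=5
lookup_width=12

# Older spelling: explicit dollar signs inside $(( ... )).
with_dollar=$(($header_size + $num_chunks * $lookup_width))

# Guideline spelling: bare variable names inside $(( ... )).
without_dollar=$((header_size + num_chunks * lookup_width))

# Both spellings evaluate to the same number.
echo "$with_dollar $without_dollar"
```

Both expressions evaluate to 72. A command substitution nested inside
the arithmetic, such as $(test_oid packnameoff), still keeps its
leading dollar sign, since it is not a variable reference.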

Reported-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
---
 t/t5319-multi-pack-index.sh | 38 ++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 67afe1bb8d9..065f48747f3 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -62,8 +62,8 @@ generate_objects () {
 	} >wide_delta_$iii &&
 	{
 		test-tool genrandom "foo"$i 100 &&
-		test-tool genrandom "foo"$(( $i + 1 )) 100 &&
-		test-tool genrandom "foo"$(( $i + 2 )) 100
+		test-tool genrandom "foo"$(( i + 1 )) 100 &&
+		test-tool genrandom "foo"$(( i + 2 )) 100
 	} >deep_delta_$iii &&
 	{
 		echo $iii &&
@@ -251,21 +251,21 @@ MIDX_BYTE_OID_VERSION=5
 MIDX_BYTE_CHUNK_COUNT=6
 MIDX_HEADER_SIZE=12
 MIDX_BYTE_CHUNK_ID=$MIDX_HEADER_SIZE
-MIDX_BYTE_CHUNK_OFFSET=$(($MIDX_HEADER_SIZE + 4))
+MIDX_BYTE_CHUNK_OFFSET=$((MIDX_HEADER_SIZE + 4))
 MIDX_NUM_CHUNKS=5
 MIDX_CHUNK_LOOKUP_WIDTH=12
-MIDX_OFFSET_PACKNAMES=$(($MIDX_HEADER_SIZE + \
-			 $MIDX_NUM_CHUNKS * $MIDX_CHUNK_LOOKUP_WIDTH))
-MIDX_BYTE_PACKNAME_ORDER=$(($MIDX_OFFSET_PACKNAMES + 2))
-MIDX_OFFSET_OID_FANOUT=$(($MIDX_OFFSET_PACKNAMES + $(test_oid packnameoff)))
+MIDX_OFFSET_PACKNAMES=$((MIDX_HEADER_SIZE + \
+			 MIDX_NUM_CHUNKS * MIDX_CHUNK_LOOKUP_WIDTH))
+MIDX_BYTE_PACKNAME_ORDER=$((MIDX_OFFSET_PACKNAMES + 2))
+MIDX_OFFSET_OID_FANOUT=$((MIDX_OFFSET_PACKNAMES + $(test_oid packnameoff)))
 MIDX_OID_FANOUT_WIDTH=4
-MIDX_BYTE_OID_FANOUT_ORDER=$((MIDX_OFFSET_OID_FANOUT + 250 * $MIDX_OID_FANOUT_WIDTH + $(test_oid fanoutoff)))
-MIDX_OFFSET_OID_LOOKUP=$(($MIDX_OFFSET_OID_FANOUT + 256 * $MIDX_OID_FANOUT_WIDTH))
-MIDX_BYTE_OID_LOOKUP=$(($MIDX_OFFSET_OID_LOOKUP + 16 * $HASH_LEN))
-MIDX_OFFSET_OBJECT_OFFSETS=$(($MIDX_OFFSET_OID_LOOKUP + $NUM_OBJECTS * $HASH_LEN))
+MIDX_BYTE_OID_FANOUT_ORDER=$((MIDX_OFFSET_OID_FANOUT + 250 * MIDX_OID_FANOUT_WIDTH + $(test_oid fanoutoff)))
+MIDX_OFFSET_OID_LOOKUP=$((MIDX_OFFSET_OID_FANOUT + 256 * MIDX_OID_FANOUT_WIDTH))
+MIDX_BYTE_OID_LOOKUP=$((MIDX_OFFSET_OID_LOOKUP + 16 * HASH_LEN))
+MIDX_OFFSET_OBJECT_OFFSETS=$((MIDX_OFFSET_OID_LOOKUP + NUM_OBJECTS * HASH_LEN))
 MIDX_OFFSET_WIDTH=8
-MIDX_BYTE_PACK_INT_ID=$(($MIDX_OFFSET_OBJECT_OFFSETS + 16 * $MIDX_OFFSET_WIDTH + 2))
-MIDX_BYTE_OFFSET=$(($MIDX_OFFSET_OBJECT_OFFSETS + 16 * $MIDX_OFFSET_WIDTH + 6))
+MIDX_BYTE_PACK_INT_ID=$((MIDX_OFFSET_OBJECT_OFFSETS + 16 * MIDX_OFFSET_WIDTH + 2))
+MIDX_BYTE_OFFSET=$((MIDX_OFFSET_OBJECT_OFFSETS + 16 * MIDX_OFFSET_WIDTH + 6))
 
 test_expect_success 'verify bad version' '
 	corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
@@ -417,10 +417,10 @@ test_expect_success 'verify multi-pack-index with 64-bit offsets' '
 
 NUM_OBJECTS=63
 MIDX_OFFSET_OID_FANOUT=$((MIDX_OFFSET_PACKNAMES + 54))
-MIDX_OFFSET_OID_LOOKUP=$((MIDX_OFFSET_OID_FANOUT + 256 * $MIDX_OID_FANOUT_WIDTH))
-MIDX_OFFSET_OBJECT_OFFSETS=$(($MIDX_OFFSET_OID_LOOKUP + $NUM_OBJECTS * $HASH_LEN))
-MIDX_OFFSET_LARGE_OFFSETS=$(($MIDX_OFFSET_OBJECT_OFFSETS + $NUM_OBJECTS * $MIDX_OFFSET_WIDTH))
-MIDX_BYTE_LARGE_OFFSET=$(($MIDX_OFFSET_LARGE_OFFSETS + 3))
+MIDX_OFFSET_OID_LOOKUP=$((MIDX_OFFSET_OID_FANOUT + 256 * MIDX_OID_FANOUT_WIDTH))
+MIDX_OFFSET_OBJECT_OFFSETS=$((MIDX_OFFSET_OID_LOOKUP + NUM_OBJECTS * HASH_LEN))
+MIDX_OFFSET_LARGE_OFFSETS=$((MIDX_OFFSET_OBJECT_OFFSETS + NUM_OBJECTS * MIDX_OFFSET_WIDTH))
+MIDX_BYTE_LARGE_OFFSET=$((MIDX_OFFSET_LARGE_OFFSETS + 3))
 
 test_expect_success 'verify incorrect 64-bit offset' '
 	corrupt_midx_and_verify $MIDX_BYTE_LARGE_OFFSET "\07" objects64 \
@@ -555,7 +555,7 @@ test_expect_success 'repack respects repack.packKeptObjects=false' '
 		test-tool read-midx .git/objects | grep idx >midx-list &&
 		test_line_count = 5 midx-list &&
 		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
-		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
+		BATCH_SIZE=$((THIRD_SMALLEST_SIZE + 1)) &&
 		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
 		ls .git/objects/pack/*idx >idx-list &&
 		test_line_count = 5 idx-list &&
@@ -570,7 +570,7 @@ test_expect_success 'repack creates a new pack' '
 		ls .git/objects/pack/*idx >idx-list &&
 		test_line_count = 5 idx-list &&
 		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
-		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
+		BATCH_SIZE=$((THIRD_SMALLEST_SIZE + 1)) &&
 		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
 		ls .git/objects/pack/*idx >idx-list &&
 		test_line_count = 6 idx-list &&
-- 
gitgitgadget

^ permalink raw reply related	[relevance 6%]

* [PATCH v3 2/3] multi-pack-index: respect repack.packKeptObjects=false
  2020-05-09 14:24  7%   ` [PATCH v3 0/3] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
  2020-05-09 14:24  7%     ` [PATCH v3 1/3] midx: teach "git multi-pack-index repack" honor "git repack" configurations Son Luong Ngoc via GitGitGadget
@ 2020-05-09 14:24  4%     ` Derrick Stolee via GitGitGadget
    2020-05-09 14:24  6%     ` [PATCH v3 3/3] Ensured t5319 follows arith expansion guideline Son Luong Ngoc via GitGitGadget
  2020-05-10 16:07  4%     ` [PATCH v4 0/2] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
  3 siblings, 1 reply; 122+ results
From: Derrick Stolee via GitGitGadget @ 2020-05-09 14:24 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When selecting a batch of pack-files to repack in the "git
multi-pack-index repack" command, Git should respect the
repack.packKeptObjects config option. When false, this option says that
the pack-files with an associated ".keep" file should not be repacked.
This config value is "false" by default.

There are two cases for selecting a batch of objects. The first is the
case where the input batch-size is zero, which specifies "repack
everything". The second is with a non-zero batch size, which selects
pack-files using a greedy selection criteria. Both of these cases are
updated and tested.
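
For illustration only (not part of the commit): a rough shell sketch of
the ".keep" filter described above. The helper name select_packs and
its arguments are hypothetical; the real selection happens in C in
midx.c and, for a non-zero batch size, also weighs pack sizes, which
this sketch does not attempt.

```shell
#!/bin/sh
# Hypothetical helper, not part of Git: print the packs in a directory
# that would be eligible for a repack batch. A pack with a sibling
# ".keep" file is skipped unless the second argument is "true"
# (standing in for repack.packKeptObjects).
select_packs () {
	pack_dir=$1
	pack_kept_objects=${2:-false}
	for pack in "$pack_dir"/*.pack
	do
		# An unmatched glob stays literal; skip it.
		test -f "$pack" || continue
		keep=${pack%.pack}.keep
		if test "$pack_kept_objects" != true && test -f "$keep"
		then
			continue
		fi
		echo "$pack"
	done
}

# Demo: three packs, one of them marked with a .keep file.
demo_dir=$(mktemp -d)
touch "$demo_dir/pack-a.pack" "$demo_dir/pack-b.pack" "$demo_dir/pack-c.pack"
touch "$demo_dir/pack-b.keep"
select_packs "$demo_dir"        # lists pack-a and pack-c only
select_packs "$demo_dir" true   # lists all three packs
rm -rf "$demo_dir"
```

In the batch-size=0 case the patch additionally counts the selected
packs and treats fewer than two as "nothing to repack".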

Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-multi-pack-index.txt |  3 +++
 midx.c                                 | 26 +++++++++++++++++++++-----
 t/t5319-multi-pack-index.sh            | 26 ++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 642d9ac5b72..0c6619493c1 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -56,6 +56,9 @@ repack::
 	file is created, rewrite the multi-pack-index to reference the
 	new pack-file. A later run of 'git multi-pack-index expire' will
 	delete the pack-files that were part of this batch.
++
+If `repack.packKeptObjects` is `false`, then any pack-files with an
+associated `.keep` file will not be selected for the batch to repack.
 
 
 EXAMPLES
diff --git a/midx.c b/midx.c
index 1e76be56826..9b14d915db1 100644
--- a/midx.c
+++ b/midx.c
@@ -1293,15 +1293,26 @@ static int compare_by_mtime(const void *a_, const void *b_)
 	return 0;
 }
 
-static int fill_included_packs_all(struct multi_pack_index *m,
+static int fill_included_packs_all(struct repository *r,
+				   struct multi_pack_index *m,
 				   unsigned char *include_pack)
 {
-	uint32_t i;
+	uint32_t i, count = 0;
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
+
+	for (i = 0; i < m->num_packs; i++) {
+		if (prepare_midx_pack(r, m, i))
+			continue;
+		if (!pack_kept_objects && m->packs[i]->pack_keep)
+			continue;
 
-	for (i = 0; i < m->num_packs; i++)
 		include_pack[i] = 1;
+		count++;
+	}
 
-	return m->num_packs < 2;
+	return count < 2;
 }
 
 static int fill_included_packs_batch(struct repository *r,
@@ -1312,6 +1323,9 @@ static int fill_included_packs_batch(struct repository *r,
 	uint32_t i, packs_to_repack;
 	size_t total_size;
 	struct repack_info *pack_info = xcalloc(m->num_packs, sizeof(struct repack_info));
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
 
 	for (i = 0; i < m->num_packs; i++) {
 		pack_info[i].pack_int_id = i;
@@ -1338,6 +1352,8 @@ static int fill_included_packs_batch(struct repository *r,
 
 		if (!p)
 			continue;
+		if (!pack_kept_objects && p->pack_keep)
+			continue;
 		if (open_pack_index(p) || !p->num_objects)
 			continue;
 
@@ -1380,7 +1396,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	if (batch_size) {
 		if (fill_included_packs_batch(r, m, include_pack, batch_size))
 			goto cleanup;
-	} else if (fill_included_packs_all(m, include_pack))
+	} else if (fill_included_packs_all(r, m, include_pack))
 		goto cleanup;
 
 	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 030a7222b2a..67afe1bb8d9 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -538,6 +538,32 @@ test_expect_success 'repack with minimum size does not alter existing packs' '
 	)
 '
 
+test_expect_success 'repack respects repack.packKeptObjects=false' '
+	test_when_finished rm -f dup/.git/objects/pack/*keep &&
+	(
+		cd dup &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&
+		for keep in $(cat keep-list)
+		do
+			touch $keep || return 1
+		done &&
+		git multi-pack-index repack --batch-size=0 &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list &&
+		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
+		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
+		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list
+	)
+'
+
 test_expect_success 'repack creates a new pack' '
 	(
 		cd dup &&
-- 
gitgitgadget


^ permalink raw reply related	[relevance 4%]

* Re: [PATCH v2 1/2] midx: apply gitconfig to midx repack
  2020-05-06 17:03  0%     ` Junio C Hamano
@ 2020-05-07  7:29  6%       ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-05-07  7:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Son Luong Ngoc via GitGitGadget, git, Derrick Stolee

Hi Junio,

Thanks for the feedback.

> On May 6, 2020, at 19:03, Junio C Hamano <gitster@pobox.com> wrote:

...
> We write this part in imperative mood, as if
> we are giving an order to the codebase to "become like so".  We do
> not say "I do X, I do Y".

This is great feedback.
I will try to include all of your suggestions and edit the message
before submitting V3.

>> Note:
>> - `repack.packKeptObjects` will be addressed by Derrick Stolee in
>> the following patch
> 
> This definitely does not belong to the commit log message.  It would
> make a helpful note meant for the reviewers if written below the
> three-dash line, though.

Duly noted.

> Do we need to worry about the configuration variables understood by
> the "git pack-objects" command to get in the way, by the way?
> "pack.packsizelimit" may cause "git repack" to produce more than one
> packfile, and if this codepath wants to avoid it (I do not know if
> that is the case), it may have to override it from the command line,
> for example.

I don't think we want to avoid the packsizelimit here.
The point of repacking with midx is to help
end users consolidate multiple packfiles in a non-disruptive way.

If you wish to put a constraint (i.e. packsizelimit, packKeptObjects) on this process,
you should be able to.

>> Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
>> ---
>> midx.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>> 
>> diff --git a/midx.c b/midx.c
>> index 9a61d3b37d9..3348f8e569b 100644
>> --- a/midx.c
>> +++ b/midx.c
>> @@ -1369,6 +1369,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>> 	struct child_process cmd = CHILD_PROCESS_INIT;
>> 	struct strbuf base_name = STRBUF_INIT;
>> 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
>> +	int delta_base_offset = 1;
> 
> By default we use delta-base-offset, so if repo_config_get_bool()
> did not see the repack.usedeltabaseoffset configuration defined in
> any configuration file, we still want to see 1 after it returns.
> 
>> +	int use_delta_islands;
> 
> What is the reason why it is safe to leave this uninitialized?  Did
> you mean 
> 
> 	int use_delta_islands = 0;
> 
> here?

I think I totally misread how repo_config_get_bool() is supposed to work.
Your comment here made me re-read it and things got a lot clearer.

Will set the default value to 0 in next version.

> Thanks.

Much appreciated,
Son Luong.

^ permalink raw reply	[relevance 6%]

* Re: [PATCH v2 1/2] midx: apply gitconfig to midx repack
  2020-05-06  9:43  7%   ` [PATCH v2 1/2] " Son Luong Ngoc via GitGitGadget
  2020-05-06 12:03  0%     ` Derrick Stolee
@ 2020-05-06 17:03  0%     ` Junio C Hamano
  2020-05-07  7:29  6%       ` Son Luong Ngoc
  1 sibling, 1 reply; 122+ results
From: Junio C Hamano @ 2020-05-06 17:03 UTC (permalink / raw)
  To: Son Luong Ngoc via GitGitGadget; +Cc: git, Son Luong Ngoc, Derrick Stolee

"Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Multi-Pack-Index repack is an incremental, repack solutions
> that allows user to consolidate multiple packfiles in a non-disruptive
> way. However the new packfile could be created without some of the
> capabilities of a packfile that is created by calling `git repack`.

It may be clear to you who wrote the patch, but it is quite unclear
to readers how `repack` gets into the picture.  The first sentence
talks about the "git multi-pack-index repack" subcommand.  Unless
you mention that that "git multi-pack-index repack" subcommand calls
"git repack" under the hood in order to create a new packfile, the
second paragraph can be read as if you are pointing out a problem if
the user did

	$ git multi-pack-index repack
	$ git repack

and the explicit "repack" initiated by the user may create a
packfile that is somehow incompatible with what the previous repack
wanted to do, or something like that.

> This is because with `git repack`, there are configuration that would
> enable different flags to be passed down to `git pack-objects` plumbing.

And this does not help to clear the possible confusion, either.

I think all of the above is clearer if you rewrite the above
(including the title) like so:

    midx: teach "git multi-pack-index repack" honor "git repack" configuration

    When the "repack" subcommand of "git multi-pack-index" command
    creates new packfile(s), it does not call the "git repack"
    command but instead directly calls the "git pack-objects"
    command, and the configuration variables meant for the "git
    repack" command, like "repack.usedeltabaseoffset", are ignored.

Now the problem description is behind us, let's see the description
of proposed solution.  We write this part in imperative mood, as if
we are giving an order to the codebase to "become like so".  We do
not say "I do X, I do Y".

> In this patch, I applies those flags into `git multi-pack-index repack`
> so that it respect the `repack.*` config series.

    Check the configuration variables used by "git repack" ourselves
    and pass the corresponding options to underlying "git pack-objects"
    in this codepath.


> Note:
> - `repack.packKeptObjects` will be addressed by Derrick Stolee in
> the following patch

This definitely does not belong to the commit log message.  It would
make a helpful note meant for the reviewers if written below the
three-dash line, though.

> - `repack.writeBitmaps` when `--batch-size=0` was NOT adopted here as it
> requires `--all` to be passed onto `git pack-objects`, which is very
> slow. I think it would be nice to have this in a future patch.

The phrasing makes it hard to grok.  Do you want to say that the
repack.writeBitmaps configuration variable is ignored?

I think Derrick gave you the reason why bitmaps is not compatible
with midx in general, and that would be a better rationale to record
why the configuration is ignored.  Perhaps like

    Note that `repack.writeBitmaps` configuration is ignored, as the
    pack bitmap facility is useful only with a single packfile.

or something like that?

Do we need to worry about the configuration variables understood by
the "git pack-objects" command to get in the way, by the way?
"pack.packsizelimit" may cause "git repack" to produce more than one
packfile, and if this codepath wants to avoid it (I do not know if
that is the case), it may have to override it from the command line,
for example.

> Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
> ---
>  midx.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/midx.c b/midx.c
> index 9a61d3b37d9..3348f8e569b 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -1369,6 +1369,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>  	struct child_process cmd = CHILD_PROCESS_INIT;
>  	struct strbuf base_name = STRBUF_INIT;
>  	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
> +	int delta_base_offset = 1;

By default we use delta-base-offset, so if repo_config_get_bool()
did not see the repack.usedeltabaseoffset configuration defined in
any configuration file, we still want to see 1 after it returns.

> +	int use_delta_islands;

What is the reason why it is safe to leave this uninitialized?  Did
you mean 

	int use_delta_islands = 0;

here?

> @@ -1381,12 +1383,20 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>  	} else if (fill_included_packs_all(m, include_pack))
>  		goto cleanup;
>  
> +	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
> +	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
> +
>  	argv_array_push(&cmd.args, "pack-objects");
>  
>  	strbuf_addstr(&base_name, object_dir);
>  	strbuf_addstr(&base_name, "/pack/pack");
>  	argv_array_push(&cmd.args, base_name.buf);
>  
> +	if (delta_base_offset)
> +		argv_array_push(&cmd.args, "--delta-base-offset");
> +	if (use_delta_islands)
> +		argv_array_push(&cmd.args, "--delta-islands");
> +

These look like good changes.
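
To spell out the "default first, then look up" pattern from the two
comments above, here is a minimal shell sketch. Everything in it is a
stand-in: config_get_bool and pack_objects_args are hypothetical
helpers reading a plain "key=value" file, not Git's real config API,
and only the literal value "true" is treated as true. A shell variable
that is never set is merely empty rather than garbage as in C, so this
mirrors only the ordering, not the undefined behaviour.

```shell
#!/bin/sh
# Hypothetical stand-in for a config lookup (not Git's API): read
# "key=value" lines from file $1 and look up key $2. Print the value
# and return 0 when the key exists; print nothing and return 1 when
# it does not, so the caller's default survives.
config_get_bool () {
	awk -F= -v key="$2" '
		$1 == key { value = $2; found = 1 }
		END { if (found) print value; exit !found }
	' "$1"
}

# Build the pack-objects argument list the way the patch does.
pack_objects_args () {
	config_file=$1
	# Defaults are assigned before the lookup.
	delta_base_offset=true
	use_delta_islands=false

	if value=$(config_get_bool "$config_file" repack.usedeltabaseoffset)
	then
		delta_base_offset=$value
	fi
	if value=$(config_get_bool "$config_file" repack.usedeltaislands)
	then
		use_delta_islands=$value
	fi

	args="pack-objects"
	if test "$delta_base_offset" = true
	then
		args="$args --delta-base-offset"
	fi
	if test "$use_delta_islands" = true
	then
		args="$args --delta-islands"
	fi
	echo "$args"
}

# Demo: an empty config keeps the defaults; explicit values override.
config=$(mktemp)
pack_objects_args "$config"
printf '%s\n' 'repack.usedeltabaseoffset=false' \
	'repack.usedeltaislands=true' >"$config"
pack_objects_args "$config"
rm -f "$config"
```

With the empty file the sketch prints "pack-objects --delta-base-offset";
with both keys set as shown it prints "pack-objects --delta-islands".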

>  	if (flags & MIDX_PROGRESS)
>  		argv_array_push(&cmd.args, "--progress");
>  	else

Thanks.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 1/2] midx: apply gitconfig to midx repack
  2020-05-06  9:43  7%   ` [PATCH v2 1/2] " Son Luong Ngoc via GitGitGadget
@ 2020-05-06 12:03  0%     ` Derrick Stolee
  2020-05-06 17:03  0%     ` Junio C Hamano
  1 sibling, 0 replies; 122+ results
From: Derrick Stolee @ 2020-05-06 12:03 UTC (permalink / raw)
  To: Son Luong Ngoc via GitGitGadget, git; +Cc: Son Luong Ngoc

On 5/6/2020 5:43 AM, Son Luong Ngoc via GitGitGadget wrote:
> From: Son Luong Ngoc <sluongng@gmail.com>
...
> - `repack.writeBitmaps` when `--batch-size=0` was NOT adopted here as it
> requires `--all` to be passed onto `git pack-objects`, which is very
> slow. I think it would be nice to have this in a future patch.

Just my two cents here: the reachability bitmaps are really tied to the
idea of a single pack right now. To create bitmaps, I would currently
suggest using the 'git repack' builtin with the proper options. That
command deletes the multi-pack-index, unfortunately, but it also produces
a single pack and deletes the others (when creating bitmaps).

You are right that the `--all` option required to pack-objects is not
appropriate to add inside `git multi-pack-index repack` as that changes
the pattern. It requires loading all reachable objects, even if they are
not already in packs covered by the multi-pack-index. This at minimum
violates expectations with the --batch-size argument.

Integrating reachability bitmaps more closely with the multi-pack-index
is certainly on our radar, but is a large endeavor.

This new patch looks good to me.

Thanks,
-Stolee

^ permalink raw reply	[relevance 0%]

* [PATCH v2 1/2] midx: apply gitconfig to midx repack
  2020-05-06  9:43  6% ` [PATCH v2 0/2] " Son Luong Ngoc via GitGitGadget
@ 2020-05-06  9:43  7%   ` Son Luong Ngoc via GitGitGadget
  2020-05-06 12:03  0%     ` Derrick Stolee
  2020-05-06 17:03  0%     ` Junio C Hamano
  2020-05-06  9:43  4%   ` [PATCH v2 2/2] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
  2020-05-09 14:24  7%   ` [PATCH v3 0/3] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
  2 siblings, 2 replies; 122+ results
From: Son Luong Ngoc via GitGitGadget @ 2020-05-06  9:43 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Son Luong Ngoc

From: Son Luong Ngoc <sluongng@gmail.com>

Multi-Pack-Index repack is an incremental, repack solutions
that allows user to consolidate multiple packfiles in a non-disruptive
way. However the new packfile could be created without some of the
capabilities of a packfile that is created by calling `git repack`.

This is because with `git repack`, there are configuration that would
enable different flags to be passed down to `git pack-objects` plumbing.

In this patch, I applies those flags into `git multi-pack-index repack`
so that it respect the `repack.*` config series.

Note:
- `repack.packKeptObjects` will be addressed by Derrick Stolee in
the following patch
- `repack.writeBitmaps` when `--batch-size=0` was NOT adopted here as it
requires `--all` to be passed onto `git pack-objects`, which is very
slow. I think it would be nice to have this in a future patch.

Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
---
 midx.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/midx.c b/midx.c
index 9a61d3b37d9..3348f8e569b 100644
--- a/midx.c
+++ b/midx.c
@@ -1369,6 +1369,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	struct strbuf base_name = STRBUF_INIT;
 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
+	int delta_base_offset = 1;
+	int use_delta_islands;
 
 	if (!m)
 		return 0;
@@ -1381,12 +1383,20 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	} else if (fill_included_packs_all(m, include_pack))
 		goto cleanup;
 
+	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
+	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
+
 	argv_array_push(&cmd.args, "pack-objects");
 
 	strbuf_addstr(&base_name, object_dir);
 	strbuf_addstr(&base_name, "/pack/pack");
 	argv_array_push(&cmd.args, base_name.buf);
 
+	if (delta_base_offset)
+		argv_array_push(&cmd.args, "--delta-base-offset");
+	if (use_delta_islands)
+		argv_array_push(&cmd.args, "--delta-islands");
+
 	if (flags & MIDX_PROGRESS)
 		argv_array_push(&cmd.args, "--progress");
 	else
-- 
gitgitgadget


^ permalink raw reply related	[relevance 7%]

* [PATCH v2 0/2] midx: apply gitconfig to midx repack
  2020-05-05 13:06  7% [PATCH] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
  2020-05-05 13:50  3% ` Derrick Stolee
@ 2020-05-06  9:43  6% ` Son Luong Ngoc via GitGitGadget
  2020-05-06  9:43  7%   ` [PATCH v2 1/2] " Son Luong Ngoc via GitGitGadget
                     ` (2 more replies)
  1 sibling, 3 replies; 122+ results
From: Son Luong Ngoc via GitGitGadget @ 2020-05-06  9:43 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc

Midx repack has largely been used in Microsoft Scalar on the client side to
optimize the state of a repository with multiple packs. However, when I tried
to apply it on the server side, I realized that certain features were lacking
compared to git repack. Most of these features are highly desirable on the
server side to create the most optimized pack possible.

One example is delta_base_offset: comparing midx repacks with and without
delta_base_offset, we can observe a significant size difference.

> du objects/pack/*pack
14536   objects/pack/pack-08a017b424534c88191addda1aa5dd6f24bf7a29.pack
9435280 objects/pack/pack-8829c53ad1dca02e7311f8e5b404962ab242e8f1.pack

Latest 2.26.2 (without delta_base_offset)
> git multi-pack-index write
> git multi-pack-index repack
> git multi-pack-index expire
> du objects/pack/*pack
9446096 objects/pack/pack-366c75e2c2f987b9836d3bf0bf5e4a54b6975036.pack

With delta_base_offset
> git version
git version 2.26.2.672.g232c24e857.dirty
> git multi-pack-index write
> git multi-pack-index repack
> git multi-pack-index expire
> du objects/pack/*pack
9152512 objects/pack/pack-3bc8c1ec496ab95d26875f8367ff6807081e9e7d.pack

In this patch, I am intentionally leaving out repack.writeBitmaps, as I see
that it might need some updates to pack-objects to improve the performance.

Derrick Stolee's following patch will address repack.packKeptObjects support.

Derrick Stolee (1):
  multi-pack-index: respect repack.packKeptObjects=false

Son Luong Ngoc (1):
  midx: apply gitconfig to midx repack

 Documentation/git-multi-pack-index.txt |  3 +++
 midx.c                                 | 36 ++++++++++++++++++++++----
 t/t5319-multi-pack-index.sh            | 26 +++++++++++++++++++
 3 files changed, 60 insertions(+), 5 deletions(-)


base-commit: b34789c0b0d3b137f0bb516b417bd8d75e0cb306
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-626%2Fsluongng%2Fsluongngoc%2Fmidx-config-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-626/sluongng/sluongngoc/midx-config-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/626

Range-diff vs v1:

 1:  215c882a503 ! 1:  21c648cc486 midx: apply gitconfig to midx repack
     @@ Commit message
          In this patch, I applies those flags into `git multi-pack-index repack`
          so that it respect the `repack.*` config series.
      
     -    Note: I left out `repack.packKeptObjects` intentionally as I dont think
     -    its relevant to midx repack use case.
     +    Note:
     +    - `repack.packKeptObjects` will be addressed by Derrick Stolee in
     +    the following patch
     +    - `repack.writeBitmaps` when `--batch-size=0` was NOT adopted here as it
     +    requires `--all` to be passed onto `git pack-objects`, which is very
     +    slow. I think it would be nice to have this in a future patch.
      
          Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
      
       ## midx.c ##
     -@@ midx.c: static int fill_included_packs_batch(struct repository *r,
     - 	return 0;
     - }
     +@@ midx.c: int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
     + 	struct child_process cmd = CHILD_PROCESS_INIT;
     + 	struct strbuf base_name = STRBUF_INIT;
     + 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
     ++	int delta_base_offset = 1;
     ++	int use_delta_islands;
       
     -+static int delta_base_offset = 1;
     -+static int write_bitmaps = -1;
     -+static int use_delta_islands;
     -+
     - int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
     - {
     - 	int result = 0;
     + 	if (!m)
     + 		return 0;
      @@ midx.c: int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
       	} else if (fill_included_packs_all(m, include_pack))
       		goto cleanup;
       
     -+  git_config_get_bool("repack.usedeltabaseoffset", &delta_base_offset);
     -+  git_config_get_bool("repack.writebitmaps", &write_bitmaps);
     -+  git_config_get_bool("repack.usedeltaislands", &use_delta_islands);
     ++	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
     ++	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
      +
       	argv_array_push(&cmd.args, "pack-objects");
       
     @@ midx.c: int midx_repack(struct repository *r, const char *object_dir, size_t bat
      +		argv_array_push(&cmd.args, "--delta-base-offset");
      +	if (use_delta_islands)
      +		argv_array_push(&cmd.args, "--delta-islands");
     -+	if (write_bitmaps > 0)
     -+		argv_array_push(&cmd.args, "--write-bitmap-index");
     -+	else if (write_bitmaps < 0)
     -+		argv_array_push(&cmd.args, "--write-bitmap-index-quiet");
      +
       	if (flags & MIDX_PROGRESS)
       		argv_array_push(&cmd.args, "--progress");
 -:  ----------- > 2:  3d7b334f5c6 multi-pack-index: respect repack.packKeptObjects=false

-- 
gitgitgadget

^ permalink raw reply	[relevance 6%]

* [PATCH v2 2/2] multi-pack-index: respect repack.packKeptObjects=false
  2020-05-06  9:43  6% ` [PATCH v2 0/2] " Son Luong Ngoc via GitGitGadget
  2020-05-06  9:43  7%   ` [PATCH v2 1/2] " Son Luong Ngoc via GitGitGadget
@ 2020-05-06  9:43  4%   ` Derrick Stolee via GitGitGadget
  2020-05-09 14:24  7%   ` [PATCH v3 0/3] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
  2 siblings, 0 replies; 122+ results
From: Derrick Stolee via GitGitGadget @ 2020-05-06  9:43 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When selecting a batch of pack-files to repack in the "git
multi-pack-index repack" command, Git should respect the
repack.packKeptObjects config option. When false, this option says that
the pack-files with an associated ".keep" file should not be repacked.
This config value is "false" by default.

There are two cases for selecting a batch of objects. The first is the
case where the input batch-size is zero, which specifies "repack
everything". The second is with a non-zero batch size, which selects
pack-files using a greedy selection criteria. Both of these cases are
updated and tested.

Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-multi-pack-index.txt |  3 +++
 midx.c                                 | 26 +++++++++++++++++++++-----
 t/t5319-multi-pack-index.sh            | 26 ++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 642d9ac5b72..0c6619493c1 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -56,6 +56,9 @@ repack::
 	file is created, rewrite the multi-pack-index to reference the
 	new pack-file. A later run of 'git multi-pack-index expire' will
 	delete the pack-files that were part of this batch.
++
+If `repack.packKeptObjects` is `false`, then any pack-files with an
+associated `.keep` file will not be selected for the batch to repack.
 
 
 EXAMPLES
diff --git a/midx.c b/midx.c
index 3348f8e569b..b8a52740832 100644
--- a/midx.c
+++ b/midx.c
@@ -1293,15 +1293,26 @@ static int compare_by_mtime(const void *a_, const void *b_)
 	return 0;
 }
 
-static int fill_included_packs_all(struct multi_pack_index *m,
+static int fill_included_packs_all(struct repository *r,
+				   struct multi_pack_index *m,
 				   unsigned char *include_pack)
 {
-	uint32_t i;
+	uint32_t i, count = 0;
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
+
+	for (i = 0; i < m->num_packs; i++) {
+		if (prepare_midx_pack(r, m, i))
+			continue;
+		if (!pack_kept_objects && m->packs[i]->pack_keep)
+			continue;
 
-	for (i = 0; i < m->num_packs; i++)
 		include_pack[i] = 1;
+		count++;
+	}
 
-	return m->num_packs < 2;
+	return count < 2;
 }
 
 static int fill_included_packs_batch(struct repository *r,
@@ -1312,6 +1323,9 @@ static int fill_included_packs_batch(struct repository *r,
 	uint32_t i, packs_to_repack;
 	size_t total_size;
 	struct repack_info *pack_info = xcalloc(m->num_packs, sizeof(struct repack_info));
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
 
 	for (i = 0; i < m->num_packs; i++) {
 		pack_info[i].pack_int_id = i;
@@ -1338,6 +1352,8 @@ static int fill_included_packs_batch(struct repository *r,
 
 		if (!p)
 			continue;
+		if (!pack_kept_objects && p->pack_keep)
+			continue;
 		if (open_pack_index(p) || !p->num_objects)
 			continue;
 
@@ -1380,7 +1396,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	if (batch_size) {
 		if (fill_included_packs_batch(r, m, include_pack, batch_size))
 			goto cleanup;
-	} else if (fill_included_packs_all(m, include_pack))
+	} else if (fill_included_packs_all(r, m, include_pack))
 		goto cleanup;
 
 	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 030a7222b2a..67afe1bb8d9 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -538,6 +538,32 @@ test_expect_success 'repack with minimum size does not alter existing packs' '
 	)
 '
 
+test_expect_success 'repack respects repack.packKeptObjects=false' '
+	test_when_finished rm -f dup/.git/objects/pack/*keep &&
+	(
+		cd dup &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&
+		for keep in $(cat keep-list)
+		do
+			touch $keep || return 1
+		done &&
+		git multi-pack-index repack --batch-size=0 &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list &&
+		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
+		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
+		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list
+	)
+'
+
 test_expect_success 'repack creates a new pack' '
 	(
 		cd dup &&
-- 
gitgitgadget


* Re: [PATCH] midx: apply gitconfig to midx repack
  2020-05-05 16:03  5%   ` Son Luong Ngoc
@ 2020-05-06  8:56  6%     ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-05-06  8:56 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Son Luong Ngoc via GitGitGadget, git

Hi,

> On May 5, 2020, at 18:03, Son Luong Ngoc <sluongng@gmail.com> wrote:
>> On May 5, 2020, at 15:50, Derrick Stolee <stolee@gmail.com> wrote:
>>> +	if (write_bitmaps > 0)
>>> +		argv_array_push(&cmd.args, "--write-bitmap-index");
>>> +	else if (write_bitmaps < 0)
>>> +		argv_array_push(&cmd.args, "--write-bitmap-index-quiet");
>> 
>> These make less sense. Unless --batch-size=0 and there are no .keep
>> packs (with the patch below) I'm not sure we _can_ write bitmap indexes
>> here. The pack-file is not necessarily closed under reachability. Or,
>> will supplying these arguments to 'git pack-objects' actually do that
>> closure?
>> 
>> I would be happy to special-case these options to the "--batch-size=0"
>> situation and otherwise ignore them. This then gets into enough
>> complication that we should update the documentation as in the patch
>> below.
> 
> You make a great point here. 
> I completely missed this as I have been largely testing with repacking only 2 packs,
> effectively with --batch-size=0.
> 
> I think having the bitmaps index is highly desirable in `--batch-size=0` case.
> I will try to include that in V2 (with Documentation).

Hmm, I just realized that the pack-objects side checks whether `--all` is being passed.

	if (batch_size == 0) {
		argv_array_push(&cmd.args, "--all");
		if (write_bitmaps > 0)
			argv_array_push(&cmd.args, "--write-bitmap-index");
		else if (write_bitmaps < 0)
			argv_array_push(&cmd.args, "--write-bitmap-index-quiet");
	}

If I do something like this, the midx repack becomes tremendously slow, as I think pack-objects
needs to scan all revs (fed from midx) and all refs.
Perhaps a special exception needs to be made on the pack-objects side to trust that midx is feeding it
everything there is?

I think adding `write_bitmaps` support is a bit beyond me for now, so I will settle for
the delta configs and Derrick's patch in V2 (sending it later today).
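
To make the special-casing discussed above concrete, here is a minimal standalone sketch of the flag selection. It is a model only, not git code: `midx_repack_flags`, `MAX_FLAGS` and the `skipped_kept_packs` parameter are hypothetical names, and the rule "only write bitmaps when --batch-size=0 and no .keep pack was left out" is my reading of Derrick's comment, not something that has been tested against pack-objects.

```c
#include <stddef.h>

#define MAX_FLAGS 8

/*
 * Hypothetical helper (not part of git): collect the extra pack-objects
 * flags for a midx repack into out[] and return how many were added.
 *
 * delta_base_offset, use_delta_islands and write_bitmaps mirror the
 * repack.* config values from the patch.  skipped_kept_packs is an
 * assumed input that is non-zero when a .keep pack was excluded.
 */
size_t midx_repack_flags(size_t batch_size, int delta_base_offset,
			 int use_delta_islands, int write_bitmaps,
			 int skipped_kept_packs,
			 const char *out[MAX_FLAGS])
{
	size_t n = 0;

	if (delta_base_offset)
		out[n++] = "--delta-base-offset";
	if (use_delta_islands)
		out[n++] = "--delta-islands";

	/*
	 * Bitmap flags are only considered when everything is repacked,
	 * since a partial batch is not necessarily closed under
	 * reachability.
	 */
	if (batch_size == 0 && !skipped_kept_packs) {
		out[n++] = "--all";
		if (write_bitmaps > 0)
			out[n++] = "--write-bitmap-index";
		else if (write_bitmaps < 0)
			out[n++] = "--write-bitmap-index-quiet";
	}

	return n;
}
```

With a non-zero batch size the sketch only ever emits the two delta flags, which is what V2 settles for. The `--all` branch is the part that was slow in my testing.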

>> Thanks,
>> -Stolee
> 
> Cheers,
> Son Luong
> 



* Re: [PATCH] midx: apply gitconfig to midx repack
  2020-05-05 13:50  3% ` Derrick Stolee
@ 2020-05-05 16:03  5%   ` Son Luong Ngoc
  2020-05-06  8:56  6%     ` Son Luong Ngoc
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2020-05-05 16:03 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Son Luong Ngoc via GitGitGadget, git

Hi Derrick,

Thanks for a swift and comprehensive review.

> On May 5, 2020, at 15:50, Derrick Stolee <stolee@gmail.com> wrote:
> 
> In the scenario where there is a .keep pack _and_ it is small enough to get
> picked up by the batch size, the 'git multi-pack-index repack' command will
> create a new pack containing its objects (and objects from other packs) but
> the 'git multi-pack-index expire' command will not delete the pack with .keep.
> 
> The good news is that after the first repack, the objects in the pack are
> in a newer pack, so the multi-pack-index will not repack those objects from
> that pack multiple times. However, this may be unintended behavior for the
> user that specified the .keep pack.

Yup, I experienced exactly this when testing midx repack/expire
with the biggest pack file marked with `.keep`.
Luckily, the storage size bump from duplicated objects was not noticeable in my case.
You described the situation precisely.

> I think the right thing to do to respect "repack.packKeptObjects = false" is
> to ignore the packs when selecting the batch of objects. Instead of asking
> you to do this, I added a patch below. Please take it into your v2, if you
> don't mind.

Gladly.
This should help me a lot for re-rolling V2.

>> +static int delta_base_offset = 1;
>> +static int write_bitmaps = -1;
>> +static int use_delta_islands;
>> +
> 
> Why not make these local to the midx_repack method?

No practical reason, except that I shamelessly lifted those from builtin/repack.c.
I was a bit confused that `git repack` houses this logic in the builtin file,
while midx has this logic in midx.c instead of builtin/multi-pack-index.c.

I will make them local in V2.

>> int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
>> {
>> 	int result = 0;
>> @@ -1381,12 +1385,25 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>> 	} else if (fill_included_packs_all(m, include_pack))
>> 		goto cleanup;
>> 
>> +  git_config_get_bool("repack.usedeltabaseoffset", &delta_base_offset);
>> +  git_config_get_bool("repack.writebitmaps", &write_bitmaps);
>> +  git_config_get_bool("repack.usedeltaislands", &use_delta_islands);
>> +
> 
> It looks like you have some spacing issues here. Perhaps use tabs?

Rookie mistake on my part. Will fix it in V2.

>> +	if (write_bitmaps > 0)
>> +		argv_array_push(&cmd.args, "--write-bitmap-index");
>> +	else if (write_bitmaps < 0)
>> +		argv_array_push(&cmd.args, "--write-bitmap-index-quiet");
> 
> These make less sense. Unless --batch-size=0 and there are no .keep
> packs (with the patch below) I'm not sure we _can_ write bitmap indexes
> here. The pack-file is not necessarily closed under reachability. Or,
> will supplying these arguments to 'git pack-objects' actually do that
> closure?
> 
> I would be happy to special-case these options to the "--batch-size=0"
> situation and otherwise ignore them. This then gets into enough
> complication that we should update the documentation as in the patch
> below.

You make a great point here. 
I completely missed this as I have been largely testing with repacking only 2 packs,
effectively with --batch-size=0.

I think having the bitmaps index is highly desirable in `--batch-size=0` case.
I will try to include that in V2 (with Documentation).

> At minimum, it would be good to have some tests that exercise these
> code paths so we know they are behaving correctly.

I will read through the current tests for repack and midx.
Hopefully I will have something for V2. (^_^ !)

> Thanks,
> -Stolee

Cheers,
Son Luong



* Re: [PATCH] midx: apply gitconfig to midx repack
  2020-05-05 13:06  7% [PATCH] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
@ 2020-05-05 13:50  3% ` Derrick Stolee
  2020-05-05 16:03  5%   ` Son Luong Ngoc
  2020-05-06  9:43  6% ` [PATCH v2 0/2] " Son Luong Ngoc via GitGitGadget
  1 sibling, 1 reply; 122+ results
From: Derrick Stolee @ 2020-05-05 13:50 UTC (permalink / raw)
  To: Son Luong Ngoc via GitGitGadget, git; +Cc: Son Luong Ngoc

On 5/5/2020 9:06 AM, Son Luong Ngoc via GitGitGadget wrote:
> From: Son Luong Ngoc <sluongng@gmail.com>
> 
> Multi-Pack-Index repack is an incremental repack solution
> that allows users to consolidate multiple packfiles in a non-disruptive
> way. However, the new packfile could be created without some of the
> capabilities of a packfile that is created by calling `git repack`.
> 
> This is because `git repack` reads configuration that causes
> different flags to be passed down to the `git pack-objects` plumbing.
> 
> In this patch, I apply those flags to `git multi-pack-index repack`
> so that it respects the `repack.*` config series.

This is a good idea! The fact that these are specified by 'git repack'
and not 'git pack-objects' makes intervention here necessary.

However, I don't think that all of these will apply properly.

> Note: I left out `repack.packKeptObjects` intentionally as I don't think
> it's relevant to the midx repack use case.

I think it would be good to add this, but in a different way.

> Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
> ---
>     midx: apply gitconfig to midx repack
>     
>     Midx repack has largely been used in Microsoft Scalar on the client side
>     to optimize a repository that has multiple packs. However, when I tried to
>     apply this on the server side, I realized that certain
>     features were lacking compared to git repack. Most of these features
>     are highly desirable on the server side to create the most optimized
>     pack possible.
>     
>     One example is delta_base_offset: comparing a midx repack
>     with and without delta_base_offset, we can observe significant size
>     differences.
>     
>     > du objects/pack/*pack
>     14536   objects/pack/pack-08a017b424534c88191addda1aa5dd6f24bf7a29.pack
>     9435280 objects/pack/pack-8829c53ad1dca02e7311f8e5b404962ab242e8f1.pack
>     
>     Latest 2.26.2 (without delta_base_offset)
>     > git multi-pack-index write
>     > git multi-pack-index repack
>     > git multi-pack-index expire
>     > du objects/pack/*pack
>     9446096 objects/pack/pack-366c75e2c2f987b9836d3bf0bf5e4a54b6975036.pack
>     
>     With delta_base_offset
>     > git version
>     git version 2.26.2.672.g232c24e857.dirty
>     > git multi-pack-index write
>     > git multi-pack-index repack
>     > git multi-pack-index expire
>     > du objects/pack/*pack
>     9152512 objects/pack/pack-3bc8c1ec496ab95d26875f8367ff6807081e9e7d.pack
>     
>     In this patch, I am intentionally leaving out repack.packKeptObjects as I
>     don't think it's very relevant to the midx repack use case:
>     
>      * One could always exclude the biggest packs with the --batch-size option
>        
>        
>      * The use case for excluding non-biggest packs is rather rare (unless you
>        want to have a special pack with only commits and trees being
>        excluded from repack to serve partial clone better?)
>        
>        
>     
>     Please let me know if anyone thinks that we should include that option
>     for the sake of completeness.

In the scenario where there is a .keep pack _and_ it is small enough to get
picked up by the batch size, the 'git multi-pack-index repack' command will
create a new pack containing its objects (and objects from other packs) but
the 'git multi-pack-index expire' command will not delete the pack with .keep.

The good news is that after the first repack, the objects in the pack are
in a newer pack, so the multi-pack-index will not repack those objects from
that pack multiple times. However, this may be unintended behavior for the
user that specified the .keep pack.

I think the right thing to do to respect "repack.packKeptObjects = false" is
to ignore the packs when selecting the batch of objects. Instead of asking
you to do this, I added a patch below. Please take it into your v2, if you
don't mind.
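
As a rough illustration of what "ignore the packs when selecting the batch" means for the --batch-size=0 case, here is a simplified standalone model. It is only a sketch: `struct model_pack` and `select_packs_all` are made-up names standing in for the real midx structures, and the model does not open or validate any pack.

```c
#include <stdint.h>

/* Simplified stand-in for git's pack structure; only the field used here. */
struct model_pack {
	int pack_keep;	/* non-zero when a matching .keep file exists */
};

/*
 * Mark every pack that may be repacked in include_pack[] and return the
 * number of packs selected.  Packs with a .keep file are skipped unless
 * pack_kept_objects (modelled on repack.packKeptObjects) is set.
 */
uint32_t select_packs_all(const struct model_pack *packs, uint32_t num_packs,
			  int pack_kept_objects, unsigned char *include_pack)
{
	uint32_t i, count = 0;

	for (i = 0; i < num_packs; i++) {
		include_pack[i] = 0;
		if (!pack_kept_objects && packs[i].pack_keep)
			continue;
		include_pack[i] = 1;
		count++;
	}

	return count;
}
```

A caller would treat a result below 2 as "nothing to repack", since consolidating fewer than two packs gains nothing.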
 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-626%2Fsluongng%2Fsluongngoc%2Fmidx-config-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-626/sluongng/sluongngoc/midx-config-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/626
> 
>  midx.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/midx.c b/midx.c
> index 9a61d3b37d9..88f16594268 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -1361,6 +1361,10 @@ static int fill_included_packs_batch(struct repository *r,
>  	return 0;
>  }
>  
> +static int delta_base_offset = 1;
> +static int write_bitmaps = -1;
> +static int use_delta_islands;
> +

Why not make these local to the midx_repack method?

>  int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
>  {
>  	int result = 0;
> @@ -1381,12 +1385,25 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>  	} else if (fill_included_packs_all(m, include_pack))
>  		goto cleanup;
>  
> +  git_config_get_bool("repack.usedeltabaseoffset", &delta_base_offset);
> +  git_config_get_bool("repack.writebitmaps", &write_bitmaps);
> +  git_config_get_bool("repack.usedeltaislands", &use_delta_islands);
> +

It looks like you have some spacing issues here. Perhaps use tabs?

>  	argv_array_push(&cmd.args, "pack-objects");
>  
>  	strbuf_addstr(&base_name, object_dir);
>  	strbuf_addstr(&base_name, "/pack/pack");
>  	argv_array_push(&cmd.args, base_name.buf);
>  
> +	if (delta_base_offset)
> +		argv_array_push(&cmd.args, "--delta-base-offset");
> +	if (use_delta_islands)
> +		argv_array_push(&cmd.args, "--delta-islands");

These two probably make sense.

> +	if (write_bitmaps > 0)
> +		argv_array_push(&cmd.args, "--write-bitmap-index");
> +	else if (write_bitmaps < 0)
> +		argv_array_push(&cmd.args, "--write-bitmap-index-quiet");

These make less sense. Unless --batch-size=0 and there are no .keep
packs (with the patch below) I'm not sure we _can_ write bitmap indexes
here. The pack-file is not necessarily closed under reachability. Or,
will supplying these arguments to 'git pack-objects' actually do that
closure?

I would be happy to special-case these options to the "--batch-size=0"
situation and otherwise ignore them. This then gets into enough
complication that we should update the documentation as in the patch
below.

At minimum, it would be good to have some tests that exercise these
code paths so we know they are behaving correctly.

Thanks,
-Stolee


-- >8 --
From 8a115191cbf21c553675a235c8c678affbca609b Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Tue, 5 May 2020 09:37:50 -0400
Subject: [PATCH] multi-pack-index: respect repack.packKeptObjects=false

When selecting a batch of pack-files to repack in the "git
multi-pack-index repack" command, Git should respect the
repack.packKeptObjects config option. When false, this option says that
the pack-files with an associated ".keep" file should not be repacked.
This config value is "false" by default.

There are two cases for selecting a batch of objects. The first is the
case where the input batch-size is zero, which specifies "repack
everything". The second is with a non-zero batch size, which selects
pack-files using a greedy selection criteria. Both of these cases are
updated and tested.

Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-multi-pack-index.txt |  3 +++
 midx.c                                 | 26 +++++++++++++++++++++-----
 t/t5319-multi-pack-index.sh            | 26 ++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 642d9ac5b7..0c6619493c 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -56,6 +56,9 @@ repack::
 	file is created, rewrite the multi-pack-index to reference the
 	new pack-file. A later run of 'git multi-pack-index expire' will
 	delete the pack-files that were part of this batch.
++
+If `repack.packKeptObjects` is `false`, then any pack-files with an
+associated `.keep` file will not be selected for the batch to repack.
 
 
 EXAMPLES
diff --git a/midx.c b/midx.c
index 1527e464a7..d055bf3cd3 100644
--- a/midx.c
+++ b/midx.c
@@ -1280,15 +1280,26 @@ static int compare_by_mtime(const void *a_, const void *b_)
 	return 0;
 }
 
-static int fill_included_packs_all(struct multi_pack_index *m,
+static int fill_included_packs_all(struct repository *r,
+				   struct multi_pack_index *m,
 				   unsigned char *include_pack)
 {
-	uint32_t i;
+	uint32_t i, count = 0;
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
+
+	for (i = 0; i < m->num_packs; i++) {
+		if (prepare_midx_pack(r, m, i))
+			continue;
+		if (!pack_kept_objects && m->packs[i]->pack_keep)
+			continue;
 
-	for (i = 0; i < m->num_packs; i++)
 		include_pack[i] = 1;
+		count++;
+	}
 
-	return m->num_packs < 2;
+	return count < 2;
 }
 
 static int fill_included_packs_batch(struct repository *r,
@@ -1299,6 +1310,9 @@ static int fill_included_packs_batch(struct repository *r,
 	uint32_t i, packs_to_repack;
 	size_t total_size;
 	struct repack_info *pack_info = xcalloc(m->num_packs, sizeof(struct repack_info));
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
 
 	for (i = 0; i < m->num_packs; i++) {
 		pack_info[i].pack_int_id = i;
@@ -1325,6 +1339,8 @@ static int fill_included_packs_batch(struct repository *r,
 
 		if (!p)
 			continue;
+		if (!pack_kept_objects && p->pack_keep)
+			continue;
 		if (open_pack_index(p) || !p->num_objects)
 			continue;
 
@@ -1365,7 +1381,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	if (batch_size) {
 		if (fill_included_packs_batch(r, m, include_pack, batch_size))
 			goto cleanup;
-	} else if (fill_included_packs_all(m, include_pack))
+	} else if (fill_included_packs_all(r, m, include_pack))
 		goto cleanup;
 
 	argv_array_push(&cmd.args, "pack-objects");
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 43a7a66c9d..b2fece5d3d 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -533,6 +533,32 @@ test_expect_success 'repack with minimum size does not alter existing packs' '
 	)
 '
 
+test_expect_success 'repack respects repack.packKeptObjects=false' '
+	test_when_finished rm -f dup/.git/objects/pack/*keep &&
+	(
+		cd dup &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&
+		for keep in $(cat keep-list)
+		do
+			touch $keep || return 1
+		done &&
+		git multi-pack-index repack --batch-size=0 &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list &&
+		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
+		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
+		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list
+	)
+'
+
 test_expect_success 'repack creates a new pack' '
 	(
 		cd dup &&
-- 
2.26.2.vfs.1.2




* [PATCH] midx: apply gitconfig to midx repack
@ 2020-05-05 13:06  7% Son Luong Ngoc via GitGitGadget
  2020-05-05 13:50  3% ` Derrick Stolee
  2020-05-06  9:43  6% ` [PATCH v2 0/2] " Son Luong Ngoc via GitGitGadget
  0 siblings, 2 replies; 122+ results
From: Son Luong Ngoc via GitGitGadget @ 2020-05-05 13:06 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Son Luong Ngoc

From: Son Luong Ngoc <sluongng@gmail.com>

Multi-Pack-Index repack is an incremental repack solution
that allows users to consolidate multiple packfiles in a non-disruptive
way. However, the new packfile could be created without some of the
capabilities of a packfile that is created by calling `git repack`.

This is because `git repack` reads configuration that causes
different flags to be passed down to the `git pack-objects` plumbing.

In this patch, I apply those flags to `git multi-pack-index repack`
so that it respects the `repack.*` config series.

Note: I left out `repack.packKeptObjects` intentionally as I don't think
it's relevant to the midx repack use case.

Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
---
    midx: apply gitconfig to midx repack
    
    Midx repack has largely been used in Microsoft Scalar on the client side
    to optimize a repository that has multiple packs. However, when I tried to
    apply this on the server side, I realized that certain
    features were lacking compared to git repack. Most of these features
    are highly desirable on the server side to create the most optimized
    pack possible.
    
    One example is delta_base_offset: comparing a midx repack
    with and without delta_base_offset, we can observe significant size
    differences.
    
    > du objects/pack/*pack
    14536   objects/pack/pack-08a017b424534c88191addda1aa5dd6f24bf7a29.pack
    9435280 objects/pack/pack-8829c53ad1dca02e7311f8e5b404962ab242e8f1.pack
    
    Latest 2.26.2 (without delta_base_offset)
    > git multi-pack-index write
    > git multi-pack-index repack
    > git multi-pack-index expire
    > du objects/pack/*pack
    9446096 objects/pack/pack-366c75e2c2f987b9836d3bf0bf5e4a54b6975036.pack
    
    With delta_base_offset
    > git version
    git version 2.26.2.672.g232c24e857.dirty
    > git multi-pack-index write
    > git multi-pack-index repack
    > git multi-pack-index expire
    > du objects/pack/*pack
    9152512 objects/pack/pack-3bc8c1ec496ab95d26875f8367ff6807081e9e7d.pack
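
    For scale, the two `du` figures above differ by 293584 blocks
    (9446096 vs 9152512), which is roughly a 3.1% reduction. A small
    sketch of that arithmetic, where `size_reduction_percent` is just an
    illustrative helper name and the block unit of `du` is left as
    reported:

```c
/*
 * Relative size reduction in percent when going from `before` to
 * `after`.  Both arguments must use the same unit; the unit itself
 * cancels out.
 */
double size_reduction_percent(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

    Calling `size_reduction_percent(9446096, 9152512)` gives a value
    just above 3.1.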
    
    In this patch, I am intentionally leaving out repack.packKeptObjects as I
    don't think it's very relevant to the midx repack use case:
    
     * One could always exclude the biggest packs with the --batch-size option
       
       
     * The use case for excluding non-biggest packs is rather rare (unless you
       want to have a special pack with only commits and trees being
       excluded from repack to serve partial clone better?)
       
       
    
    Please let me know if anyone thinks that we should include that option
    for the sake of completeness.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-626%2Fsluongng%2Fsluongngoc%2Fmidx-config-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-626/sluongng/sluongngoc/midx-config-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/626

 midx.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/midx.c b/midx.c
index 9a61d3b37d9..88f16594268 100644
--- a/midx.c
+++ b/midx.c
@@ -1361,6 +1361,10 @@ static int fill_included_packs_batch(struct repository *r,
 	return 0;
 }
 
+static int delta_base_offset = 1;
+static int write_bitmaps = -1;
+static int use_delta_islands;
+
 int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
 {
 	int result = 0;
@@ -1381,12 +1385,25 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	} else if (fill_included_packs_all(m, include_pack))
 		goto cleanup;
 
+  git_config_get_bool("repack.usedeltabaseoffset", &delta_base_offset);
+  git_config_get_bool("repack.writebitmaps", &write_bitmaps);
+  git_config_get_bool("repack.usedeltaislands", &use_delta_islands);
+
 	argv_array_push(&cmd.args, "pack-objects");
 
 	strbuf_addstr(&base_name, object_dir);
 	strbuf_addstr(&base_name, "/pack/pack");
 	argv_array_push(&cmd.args, base_name.buf);
 
+	if (delta_base_offset)
+		argv_array_push(&cmd.args, "--delta-base-offset");
+	if (use_delta_islands)
+		argv_array_push(&cmd.args, "--delta-islands");
+	if (write_bitmaps > 0)
+		argv_array_push(&cmd.args, "--write-bitmap-index");
+	else if (write_bitmaps < 0)
+		argv_array_push(&cmd.args, "--write-bitmap-index-quiet");
+
 	if (flags & MIDX_PROGRESS)
 		argv_array_push(&cmd.args, "--progress");
 	else

base-commit: b34789c0b0d3b137f0bb516b417bd8d75e0cb306
-- 
gitgitgadget


* Re: [PATCH 05/15] run-job: implement pack-files job
@ 2020-05-02  7:56  6% Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-05-02  7:56 UTC (permalink / raw)
  To: gitgitgadget; +Cc: dstolee, git, jrnieder, peff, stolee

Hi Derrick,

Sorry for another late reply on this RFC.
This time it is a question about the multi-pack-index repack process.

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> diff --git a/builtin/run-job.c b/builtin/run-job.c
> index cecf9058c51..d3543f7ccb9 100644
> --- a/builtin/run-job.c
> +++ b/builtin/run-job.c

...

> +static int multi_pack_index_repack(void)
> +{
> + int result;
> + struct argv_array cmd = ARGV_ARRAY_INIT;
> + argv_array_pushl(&cmd, "multi-pack-index", "repack",
> + "--no-progress", "--batch-size=0", NULL);
> + result = run_command_v_opt(cmd.argv, RUN_GIT_CMD);
> +
> + if (result && multi_pack_index_verify()) {
> + warning(_("multi-pack-index verify failed after repack"));
> + result = rewrite_multi_pack_index();
> + }

It's a bit inconsistent that write() and expire() do not include
verify() within them but repack() does. What makes repack() different?

> +
> + return result;
> +}
> +
> +static int run_pack_files_job(void)
> +{
> + if (multi_pack_index_write()) {
> + error(_("failed to write multi-pack-index"));
> + return 1;
> + }
> +
> + if (multi_pack_index_verify()) {
> + warning(_("multi-pack-index verify failed after initial write"));
> + return rewrite_multi_pack_index();
> + }
> +
> + if (multi_pack_index_expire()) {
> + error(_("multi-pack-index expire failed"));
> + return 1;
> + }

Why expire *before* repack and not after?

I thought `core.multiPackIndex=true` would prevent old pack files from
being used, so expiring immediately after repack is safe? (On that
note, are users required to set this config prior to running the job?)

If expiring immediately after repack()+verify() is not safe, then should
we have a minimum allowed interval set? (Although I would prefer to
make expire() safe.)

> +
> + if (multi_pack_index_verify()) {
> + warning(_("multi-pack-index verify failed after expire"));
> + return rewrite_multi_pack_index();
> + }
> +
> + if (multi_pack_index_repack()) {
> + error(_("multi-pack-index repack failed"));
> + return 1;
> + }

Again, I just think the decision to include verify() inside repack()
made this part a bit inconsistent.

> +
> + return 0;
> +}
> +

Cheers,
Son Luong


* Re: Seg Fault on git fetch with fetch.negotiationAlgorithm=skipping
  2020-05-01  8:13 12%   ` Son Luong Ngoc
@ 2020-05-01 22:13  0%     ` Taylor Blau
  0 siblings, 0 replies; 122+ results
From: Taylor Blau @ 2020-05-01 22:13 UTC (permalink / raw)
  To: Son Luong Ngoc; +Cc: Taylor Blau, git, Junio C Hamano

On Fri, May 01, 2020 at 10:13:43AM +0200, Son Luong Ngoc wrote:
> Hi Taylor,
>
> > On Apr 30, 2020, at 22:20, Taylor Blau <me@ttaylorr.com> wrote:
> >
> > Are you able to share a core file? If not, it would be very helpful for
> > you to 'git fast-export --anonymize ...' and see if you can reproduce
> > the problem on an anonymized copy of your repository.
>
> I played a bit with --anonymize yesterday, but export/import is slow and I still need to review
> the history to see whether it could be released.
>
> It might take more than just a few days.

That's OK. Please let us know when you have it by responding in the
sub-thread here, and I will be happy to take a look.

> > I can speculate about the cause of the crash from your strace above, but
> > a core file would be more helpful.
>
> I have added perf trace2 in https://gist.github.com/sluongng/e48327cc911c617ed2ef8578acc2ea34#file-perf-trace2-L52
> (git version is 2.26.2)
>
> I think this trace is a lot clearer than the strace log.
>
> > Thanks,
> > Taylor
>
> Cheers,
> Son Luong.

Thanks,
Taylor

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] t0000: disable GIT_TEST_FAIL_PREREQS in sub-tests
  2020-04-28  8:14  5% ` [PATCH] t0000: disable GIT_TEST_FAIL_PREREQS in sub-tests Jeff King
@ 2020-05-01 20:56  0%   ` Johannes Schindelin
  0 siblings, 0 replies; 122+ results
From: Johannes Schindelin @ 2020-05-01 20:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Son Luong Ngoc, git, Junio C Hamano

Hi,

On Tue, 28 Apr 2020, Jeff King wrote:

> On Tue, Apr 28, 2020 at 08:52:34AM +0200, Son Luong Ngoc wrote:
>
> > Running t0000 with GIT_TEST_FAIL_PREREQS=true is failing.
> >
> > > GIT_TEST_FAIL_PREREQS=true ./t0000-basic.sh
> > t/./t0000-basic.sh:836: error: not ok 45 - lazy prereqs do not turn off tracing
> > #
> > #               run_sub_test_lib_test lazy-prereq-and-tracing 'lazy prereqs and -x' -v -x <<-\EOF &&
> > #               test_lazy_prereq LAZY true
> > #
> > #               test_expect_success lazy 'test_have_prereq LAZY && echo trace'
> > #
> > #               test_done
> > #               EOF
> > #
> > #               grep 'echo trace' lazy-prereq-and-tracing/err
>
> I think the patch below is the right fix.
>
> -- >8 --
> Subject: [PATCH] t0000: disable GIT_TEST_FAIL_PREREQS in sub-tests
>
> The test added by 477dcaddb6 (tests: do not let lazy prereqs inside
> `test_expect_*` turn off tracing, 2020-03-26) runs a sub-test script
> that traces a test with a lazy prereq, like:
>
>   test_have_prereq LAZY && echo trace
>
> That won't work if GIT_TEST_FAIL_PREREQS is set in the environment,
> because our have_prereq will report failure, and we won't run the echo
> at all.
>
> We could work around this by avoiding the &&-chain, but we can
> fix this and any future tests at once by unsetting that variable for our
> sub-tests. These are meant to be controlled environments where we test
> the test-suite itself; the outer test snippet should be in charge of the
> sub-test environment, not whatever mode the user happens to be running
> in.

Thanks for fixing a bug I introduced! The fix looks good to me.

Thank you,
Dscho

>
> Reported-by: Son Luong Ngoc <sluongng@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/t0000-basic.sh | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
> index b859721620..f58f3deaa8 100755
> --- a/t/t0000-basic.sh
> +++ b/t/t0000-basic.sh
> @@ -98,6 +98,7 @@ _run_sub_test_lib_test_common () {
>  		export TEST_DIRECTORY &&
>  		TEST_OUTPUT_DIRECTORY=$(pwd) &&
>  		export TEST_OUTPUT_DIRECTORY &&
> +		sane_unset GIT_TEST_FAIL_PREREQS &&
>  		if test -z "$neg"
>  		then
>  			./"$name.sh" "$@" >out 2>err
> --
> 2.26.2.827.g3c1233342b
>
>

^ permalink raw reply	[relevance 0%]

* Re: Seg Fault on git fetch with fetch.negotiationAlgorithm=skipping
  2020-04-30 20:20  0% ` Taylor Blau
@ 2020-05-01  8:13 12%   ` Son Luong Ngoc
  2020-05-01 22:13  0%     ` Taylor Blau
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2020-05-01  8:13 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano

Hi Taylor,

> On Apr 30, 2020, at 22:20, Taylor Blau <me@ttaylorr.com> wrote:
> 
> Are you able to share a core file? If not, it would be very helpful for
> you to 'git fast-export --anonymize ...' and see if you can reproduce
> the problem on an anonymized copy of your repository.

I played a bit with --anonymize yesterday but export/import is slow and I still need to review
the history to see whether it could be released.

It might take more time than just a few days.

> I can speculate about the cause of the crash from your strace above, but
> a core file would be more helpful.

I have added perf trace2 in https://gist.github.com/sluongng/e48327cc911c617ed2ef8578acc2ea34#file-perf-trace2-L52
(git version is 2.26.2)

I think this trace is a lot clearer than the strace log.

> Thanks,
> Taylor

Cheers,
Son Luong.

^ permalink raw reply	[relevance 12%]

* Re: Seg Fault on git fetch with fetch.negotiationAlgorithm=skipping
  2020-04-30 14:55 12% Seg Fault on git fetch with fetch.negotiationAlgorithm=skipping Son Luong Ngoc
@ 2020-04-30 20:20  0% ` Taylor Blau
  2020-05-01  8:13 12%   ` Son Luong Ngoc
  0 siblings, 1 reply; 122+ results
From: Taylor Blau @ 2020-04-30 20:20 UTC (permalink / raw)
  To: Son Luong Ngoc; +Cc: git, Junio C Hamano

Hi Son,

On Thu, Apr 30, 2020 at 04:55:07PM +0200, Son Luong Ngoc wrote:
> Hi folks,
>
> We recently encountered a segfault during a git fetch.
> The strace output can be found at https://gist.github.com/sluongng/e48327cc911c617ed2ef8578acc2ea34
>
> The root cause was having `fetch.negotiationAlgorithm=skipping` set.
> The repo is about the size of linux.git, with a few NULL commits that we have been using `fsck.skipList`
> on both server and client side to skip.

Are you able to share a core file? If not, it would be very helpful for
you to 'git fast-export --anonymize ...' and see if you can reproduce
the problem on an anonymized copy of your repository.

I can speculate about the cause of the crash from your strace above, but
a core file would be more helpful.

> Is this an edge case for the new algorithm?
>
> Cheers,
> Son Luong.

Thanks,
Taylor

^ permalink raw reply	[relevance 0%]

* Re: [PATCH 06/15] run-job: auto-size or use custom pack-files batch
@ 2020-04-30 16:48  6% Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-04-30 16:48 UTC (permalink / raw)
  To: gitgitgadget; +Cc: dstolee, git, jrnieder, peff, stolee

Hi Derrick,

I have been reviewing these jobs' mechanics closely and have some questions:

> The dynamic default size is computed with this idea in mind for
> a client repository that was cloned from a very large remote: there
> is likely one "big" pack-file that was created at clone time. Thus,
> do not try repacking it as it is likely packed efficiently by the
> server. Instead, try packing the other pack-files into a single
> pack-file.
>
> The size is then computed as follows:
>
> batch size = total size - max pack size

Could you please elaborate on why this is the best value?
In practice I have been testing this out with the following:

> % cat debug.sh
> #!/bin/bash
>
> temp=$(du -cb .git/objects/pack/*.pack)
>
> total_size=$(echo "$temp" | grep total | awk '{print $1}')
> echo total_size
> echo $total_size
>
> biggest_pack=$(echo "$temp" | sort -n | tail -2 | head -1 | awk '{print $1}')
> echo biggest pack
> echo $biggest_pack
>
> batch_size=$(expr $total_size - $biggest_pack)
> echo batch size
> echo $batch_size

If you were to run

> git multi-pack-index repack --batch-size=$(./debug.sh | tail -1)

then nothing would be repacked.
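
To make the arithmetic of "batch size = total size - max pack size"
concrete, here is a minimal sketch with made-up pack sizes. The numbers
are hypothetical and only the subtraction is modelled, not how
`git multi-pack-index repack` goes on to select packs.

```shell
# Hypothetical pack sizes in bytes: one big clone-time pack and three
# small packs from later fetches.
sizes="1000000 4000 2500 1500"

total=0
max=0
for s in $sizes
do
	total=$((total + s))
	if [ "$s" -gt "$max" ]
	then
		max=$s
	fi
done

# batch size = total size - max pack size
batch_size=$((total - max))
echo "$batch_size"
```

With these sizes the batch size equals the combined size of the three
small packs.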

Instead, I have had a lot more success with the following

> # Get the 2nd biggest pack size (in bytes) + 1
> $(( $(du -b .git/objects/pack/*pack | sort -n | tail -2 | head -1 | awk '{print $1}') + 1 ))
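
The same pipeline can be tried without a repository. The sketch below
feeds it made-up sizes in place of the `du -b` output; the numbers are
hypothetical.

```shell
# Hypothetical pack sizes in bytes, one per line.
sizes='1000000
4000
2500
1500'

# Sort ascending, keep the last two lines, then take the first of those:
# that is the second-biggest size.
second_biggest=$(printf '%s\n' "$sizes" | sort -n | tail -n 2 | head -n 1)
batch_size=$((second_biggest + 1))
echo "$batch_size"
```

The result is one byte more than the second-biggest pack, and well below
the size of the biggest one.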

I think you also used this approach in t5319 when you used the 3rd
biggest pack size

> test_expect_success 'repack creates a new pack' '
> (
> cd dup &&
> ls .git/objects/pack/*idx >idx-list &&
> test_line_count = 5 idx-list &&
> THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
> BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
> git multi-pack-index repack --batch-size=$BATCH_SIZE &&
> ls .git/objects/pack/*idx >idx-list &&
> test_line_count = 6 idx-list &&
> test-tool read-midx .git/objects | grep idx >midx-list &&
> test_line_count = 6 midx-list
> )
> '

Looking forward to a re-roll of this RFC.

Cheers,
Son Luong.

^ permalink raw reply	[relevance 6%]

* Seg Fault on git fetch with fetch.negotiationAlgorithm=skipping
@ 2020-04-30 14:55 12% Son Luong Ngoc
  2020-04-30 20:20  0% ` Taylor Blau
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2020-04-30 14:55 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Hi folks,

We recently encountered a segfault during a git fetch.
The strace output can be found at https://gist.github.com/sluongng/e48327cc911c617ed2ef8578acc2ea34

The root cause was having `fetch.negotiationAlgorithm=skipping` set.
The repo is about the size of linux.git, with a few NULL commits that we have been using `fsck.skipList`
on both server and client side to skip.

Is this an edge case for the new algorithm?

Cheers,
Son Luong.

^ permalink raw reply	[relevance 12%]

* Re: [DRAFT] What's cooking in git.git
  2020-04-28  5:10  6% [DRAFT] What's cooking in git.git Son Luong Ngoc
@ 2020-04-28 17:07  6% ` Junio C Hamano
  0 siblings, 0 replies; 122+ results
From: Junio C Hamano @ 2020-04-28 17:07 UTC (permalink / raw)
  To: Son Luong Ngoc; +Cc: git, jonathantanmy, peff

Son Luong Ngoc <sluongng@gmail.com> writes:

>> * jt/v2-fetch-nego-fix (2020-04-27) 3 commits
>>  - fetch-pack: in protocol v2, reset in_vain upon ACK
>>  - fetch-pack: in protocol v2, in_vain only after ACK
>>  - fetch-pack: return enum from process_acks()
>
> Would it be ok to just proceed with the fix and skip the revert?
> Or do we intend to revert 'jn/demote-proto2-from-default' after the
> fix has landed into 'master'?

The demote patch hasn't even hit 'master'.  

My preference is to merge the demotion down to 'master' and 'maint'
while merging down this fix to 'next' and to 'master'.

And immediately revert the demotion on 'master', which will make the
tip of 'master' with v2 as the default, with "this" fix.

That way, those who want to help us polish the code further for the
next release would use v2 as default with the proposed fix for this
breakage and can hunt for other breakages in v2, while those on the
maintenance track (and v2.26.3 JNieder wants to see happen soon)
would revert to the original protocol as default.






^ permalink raw reply	[relevance 6%]

* Re: Git Stash brake splitIndex
  2020-04-28 13:57  5% ` Christian Couder
@ 2020-04-28 14:17  6%   ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-04-28 14:17 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Junio C Hamano

Hi

> On Apr 28, 2020, at 15:57, Christian Couder <christian.couder@gmail.com> wrote:
> 
> It looks like it should be `git commit -a -m 'add a'`
I tried to reproduce with `splitIndex.sharedIndexExpire=1.day.ago` and everything works.

It seems like the config `splitIndex.sharedIndexExpire=now` causes the sharedindex to be deleted too early?

Cheers,
Son Luong.


^ permalink raw reply	[relevance 6%]

* Re: Git Stash brake splitIndex
  2020-04-28 13:19  5% Git Stash brake splitIndex Son Luong Ngoc
@ 2020-04-28 13:57  5% ` Christian Couder
  2020-04-28 14:17  6%   ` Son Luong Ngoc
  0 siblings, 1 reply; 122+ results
From: Christian Couder @ 2020-04-28 13:57 UTC (permalink / raw)
  To: Son Luong Ngoc; +Cc: git, Junio C Hamano

Hi,

On Tue, Apr 28, 2020 at 3:21 PM Son Luong Ngoc <sluongng@gmail.com> wrote:

> I am on git version 2.26.2.526.g744177e7f7 (latest next)
> When you do a git stash while using splitIndex, it seems like the index will get corrupted
>
> Using configs:
> core.splitindex=true
> splitindex.maxpercentchange=50
> splitindex.sharedindexexpire=now
>
> Reproduce steps:
>
> mkdir repo
> cd repo && git init
> echo a > a
> commit -a -m 'add a'

It looks like it should be `git commit -a -m 'add a'`

When I try to reproduce the steps using git version 2.26.2.526.g744177e7f7:

mkdir repo
cd repo && git init
git config core.splitindex true
git config splitindex.maxpercentchange 50
git config splitindex.sharedindexexpire now
echo a > a
git commit -a -m 'add a'

I get a segfault then:

Program received signal SIGSEGV, Segmentation fault.
0x00005555556ccf89 in ewah_each_bit (self=0x0, callback=0x5555557af648
<replace_entry>, payload=0x555555b47480 <the_index>)
    at ewah/ewah_bitmap.c:252
252             while (pointer < self->buffer_size) {
(gdb) bt
#0  0x00005555556ccf89 in ewah_each_bit (self=0x0,
callback=0x5555557af648 <replace_entry>, payload=0x555555b47480
<the_index>)
    at ewah/ewah_bitmap.c:252
#1  0x00005555557af90c in merge_base_index (istate=0x555555b47480
<the_index>) at split-index.c:162
#2  0x0000555555748b9f in read_index_from (istate=0x555555b47480
<the_index>, path=0x555555b4cd80 ".git/index",
    gitdir=0x555555b4ac20 ".git") at read-cache.c:2335
#3  0x000055555576faf3 in repo_read_index (repo=0x555555b33f20
<the_repo>) at repository.c:271
#4  0x000055555559b91b in prepare_to_commit (index_file=0x555555b4d760
"/tmp/git/repo/.git/index.lock", prefix=0x0,
    current_head=0x0, s=0x555555b04420 <s>,
author_ident=0x7fffffffd810) at builtin/commit.c:927
#5  0x000055555559d6a4 in cmd_commit (argc=0, argv=0x7fffffffdcf0,
prefix=0x0) at builtin/commit.c:1595
#6  0x0000555555570fda in run_builtin (p=0x555555af1218
<commands+504>, argc=4, argv=0x7fffffffdcf0) at git.c:447
#7  0x000055555557134b in handle_builtin (argc=4, argv=0x7fffffffdcf0)
at git.c:672
#8  0x0000555555571610 in run_argv (argcp=0x7fffffffdb9c,
argv=0x7fffffffdb90) at git.c:739
#9  0x0000555555571aba in cmd_main (argc=4, argv=0x7fffffffdcf0) at git.c:870
#10 0x0000555555641bc4 in main (argc=5, argv=0x7fffffffdce8) at common-main.c:52

It looks like merge_base_index() in split-index.c is calling
ewah_each_bit(si->replace_bitmap, replace_entry, istate) when
si->replace_bitmap is NULL.

^ permalink raw reply	[relevance 5%]

* Git Stash brake splitIndex
@ 2020-04-28 13:19  5% Son Luong Ngoc
  2020-04-28 13:57  5% ` Christian Couder
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2020-04-28 13:19 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Hi folks,

I am on git version 2.26.2.526.g744177e7f7 (latest next)
When you do a git stash while using splitIndex, it seems like the index will get corrupted

Using configs:
core.splitindex=true
splitindex.maxpercentchange=50
splitindex.sharedindexexpire=now

Reproduce steps:

mkdir repo
cd repo && git init
echo a > a
commit -a -m 'add a'
echo b > b
git add b
git stash
Saved working directory and index state WIP on master: 1955b62 add a
fatal: .git/sharedindex.8ddd8dad6ccb4858f27d4ff20f4d8bf6654441e0: index file open failed: No such file or directory

Some traces:
master ~/test/git/repo> GIT_TRACE2=1 GIT_TRACE2_NESTING=5 git stash
15:18:15.442295 common-main.c:48                  version 2.26.2.526.g744177e7f7
15:18:15.442914 common-main.c:49                  start git stash
15:18:15.443284 repository.c:134                  worktree /Users/sluongngoc/test/git/repo
15:18:15.443491 git.c:442                         cmd_name stash (stash)
15:18:15.448579 run-command.c:735                 child_start[0] git update-index --ignore-skip-worktree-entries -z --add --remove --stdin
15:18:15.455972 common-main.c:48                  version 2.26.2.526.g744177e7f7
15:18:15.456514 common-main.c:49                  start /Users/sluongngoc/libexec/git-core/git update-index --ignore-skip-worktree-entries -z --add --remove --stdin
15:18:15.456788 repository.c:134                  worktree /Users/sluongngoc/test/git/repo
15:18:15.456927 git.c:442                         cmd_name update-index (stash/update-index)
15:18:15.458444 git.c:672                         exit elapsed:0.004021 code:0
15:18:15.458457 trace2/tr2_tgt_normal.c:123       atexit elapsed:0.004039 code:0
15:18:15.458774 run-command.c:990                 child_exit[0] pid:1813 code:0 elapsed:0.010169
Saved working directory and index state WIP on master: 1955b62 add a
15:18:15.461082 run-command.c:735                 child_start[1] git reset --hard -q --no-recurse-submodules
15:18:15.467260 common-main.c:48                  version 2.26.2.526.g744177e7f7
15:18:15.467553 common-main.c:49                  start /Users/sluongngoc/libexec/git-core/git reset --hard -q --no-recurse-submodules
15:18:15.467931 repository.c:134                  worktree /Users/sluongngoc/test/git/repo
15:18:15.468071 git.c:442                         cmd_name reset (stash/reset)
15:18:15.468555 usage.c:64                        error .git/sharedindex.8ddd8dad6ccb4858f27d4ff20f4d8bf6654441e0: index file open failed: No such file or directory
fatal: .git/sharedindex.8ddd8dad6ccb4858f27d4ff20f4d8bf6654441e0: index file open failed: No such file or directory
15:18:15.468587 usage.c:68                        exit elapsed:0.002714 code:128
15:18:15.468595 trace2/tr2_tgt_normal.c:123       atexit elapsed:0.002726 code:128
15:18:15.468889 run-command.c:990                 child_exit[1] pid:1814 code:128 elapsed:0.007797
15:18:15.468930 git.c:672                         exit elapsed:0.028400 code:1
15:18:15.468947 trace2/tr2_tgt_normal.c:123       atexit elapsed:0.028418 code:1
exit 1

Cheers,
Son Luong.

^ permalink raw reply	[relevance 5%]

* [PATCH] t0000: disable GIT_TEST_FAIL_PREREQS in sub-tests
  2020-04-28  6:52  6% t0000 failed Son Luong Ngoc
@ 2020-04-28  8:14  5% ` Jeff King
  2020-05-01 20:56  0%   ` Johannes Schindelin
  0 siblings, 1 reply; 122+ results
From: Jeff King @ 2020-04-28  8:14 UTC (permalink / raw)
  To: Son Luong Ngoc; +Cc: git, Junio C Hamano, Johannes Schindelin

On Tue, Apr 28, 2020 at 08:52:34AM +0200, Son Luong Ngoc wrote:

> Running t0000 with GIT_TEST_FAIL_PREREQS=true is failing.
> 
> > GIT_TEST_FAIL_PREREQS=true ./t0000-basic.sh
> t/./t0000-basic.sh:836: error: not ok 45 - lazy prereqs do not turn off tracing
> #
> #               run_sub_test_lib_test lazy-prereq-and-tracing 'lazy prereqs and -x' -v -x <<-\EOF &&
> #               test_lazy_prereq LAZY true
> #
> #               test_expect_success lazy 'test_have_prereq LAZY && echo trace'
> #
> #               test_done
> #               EOF
> #
> #               grep 'echo trace' lazy-prereq-and-tracing/err

I think the patch below is the right fix.

-- >8 --
Subject: [PATCH] t0000: disable GIT_TEST_FAIL_PREREQS in sub-tests

The test added by 477dcaddb6 (tests: do not let lazy prereqs inside
`test_expect_*` turn off tracing, 2020-03-26) runs a sub-test script
that traces a test with a lazy prereq, like:

  test_have_prereq LAZY && echo trace

That won't work if GIT_TEST_FAIL_PREREQS is set in the environment,
because our have_prereq will report failure, and we won't run the echo
at all.

We could work around this by avoiding the &&-chain, but we can
fix this and any future tests at once by unsetting that variable for our
sub-tests. These are meant to be controlled environments where we test
the test-suite itself; the outer test snippet should be in charge of the
sub-test environment, not whatever mode the user happens to be running
in.

Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 t/t0000-basic.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
index b859721620..f58f3deaa8 100755
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -98,6 +98,7 @@ _run_sub_test_lib_test_common () {
 		export TEST_DIRECTORY &&
 		TEST_OUTPUT_DIRECTORY=$(pwd) &&
 		export TEST_OUTPUT_DIRECTORY &&
+		sane_unset GIT_TEST_FAIL_PREREQS &&
 		if test -z "$neg"
 		then
 			./"$name.sh" "$@" >out 2>err
-- 
2.26.2.827.g3c1233342b
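
As a rough illustration of the failure mode described in the commit
message, the sketch below uses a made-up stand-in, `fake_have_prereq`,
that fails whenever GIT_TEST_FAIL_PREREQS is set. It is not the real
`test_have_prereq` from the test library, only a model of the &&-chain
being short-circuited.

```shell
# Hypothetical stand-in: succeeds only when the variable is unset or empty.
fake_have_prereq () {
	test -z "$GIT_TEST_FAIL_PREREQS"
}

# Same shape as the traced snippet: the echo runs only if the prereq
# check succeeds.
run_snippet () {
	fake_have_prereq && echo trace
}

GIT_TEST_FAIL_PREREQS=true
export GIT_TEST_FAIL_PREREQS

# With the variable inherited from the environment, nothing is echoed.
with_var=$(run_snippet) || true

# Unsetting it inside the sub-environment restores the echo.
without_var=$(unset GIT_TEST_FAIL_PREREQS; run_snippet)

echo "with variable:    '$with_var'"
echo "without variable: '$without_var'"
```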


^ permalink raw reply related	[relevance 5%]

* t0000 failed
@ 2020-04-28  6:52  6% Son Luong Ngoc
  2020-04-28  8:14  5% ` [PATCH] t0000: disable GIT_TEST_FAIL_PREREQS in sub-tests Jeff King
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2020-04-28  6:52 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Junio C Hamano, Johannes Schindelin

Hey folks,

Running t0000 with GIT_TEST_FAIL_PREREQS=true is failing.

> GIT_TEST_FAIL_PREREQS=true ./t0000-basic.sh
t/./t0000-basic.sh:836: error: not ok 45 - lazy prereqs do not turn off tracing
#
#               run_sub_test_lib_test lazy-prereq-and-tracing 'lazy prereqs and -x' -v -x <<-\EOF &&
#               test_lazy_prereq LAZY true
#
#               test_expect_success lazy 'test_have_prereq LAZY && echo trace'
#
#               test_done
#               EOF
#
#               grep 'echo trace' lazy-prereq-and-tracing/err
#

This was added recently with
https://public-inbox.org/git/f35830c0eba216b7b4f144409e302a87ff8b5c06.1585236929.git.gitgitgadget@gmail.com/
Is this intended?

Cheers,
Son Luong.

^ permalink raw reply	[relevance 6%]

* Re: [DRAFT] What's cooking in git.git
@ 2020-04-28  5:10  6% Son Luong Ngoc
  2020-04-28 17:07  6% ` Junio C Hamano
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2020-04-28  5:10 UTC (permalink / raw)
  To: gitster; +Cc: git, jonathantanmy, peff

Hi,

On Mon, Apr 27, 2020 at 06:17:37PM -0700, Junio C Hamano wrote:
> * jn/demote-proto2-from-default (2020-04-22) 1 commit
>   (merged to 'next' on 2020-04-22 at 1a5e0b221a)
>  + Revert "fetch: default to protocol version 2"
>
>  Those fetching over protocol v2 from linux-next and other kernel
>  repositories are reporting that v2 often fetches way more than
>  needed.
>
>  Will merge to 'master'.
...
> * jt/v2-fetch-nego-fix (2020-04-27) 3 commits
>  - fetch-pack: in protocol v2, reset in_vain upon ACK
>  - fetch-pack: in protocol v2, in_vain only after ACK
>  - fetch-pack: return enum from process_acks()

Would it be ok to just proceed with the fix and skip the revert?
Or do we intend to revert 'jn/demote-proto2-from-default' after the
fix has landed into 'master'?

Cheers,
Son Luong.

^ permalink raw reply	[relevance 6%]

* Re: Git pull stuck when Trace2 target set to Unix Stream Socket
  2020-04-13 17:19 12%   ` Son Luong Ngoc
@ 2020-04-13 17:23  0%     ` Taylor Blau
  0 siblings, 0 replies; 122+ results
From: Taylor Blau @ 2020-04-13 17:23 UTC (permalink / raw)
  To: Son Luong Ngoc; +Cc: Taylor Blau, git, Jeff.Hostetler

On Mon, Apr 13, 2020 at 07:19:03PM +0200, Son Luong Ngoc wrote:
> Hi Taylor,
>
> Thanks for the swift reply.
>
> > On Apr 13, 2020, at 18:00, Taylor Blau <me@ttaylorr.com> wrote:
> > I doubt that this is important (for a reason that I'll point out below),
> > but it looks like your invocation here is malformed with the trailing
> > pipe character.
> >
> > Did you mean to redirect the output of rm away? If so, '2>&1 >/dev/null'
> > will do what you want.
> It was an emailing mistake. I meant to write
> > rm /tmp/git_trace.sock || true

For what it's worth, 'rm -f' would suffice, too, but it doesn't matter.

> so that the command is reproducible on repeated runs.
> I must have deleted the remaining part by mistake.
>
> > Odd. From my memory, trace2 will give up trying to connect to the socket
> > (disabling itself and optionally printing a warning) if 'socket(2)' or
> > 'connect(2)' set the error bit. My guess above is that you don't have a
> > listening socket (because your shell is waiting for you to close the
> > '|'), so there's no connection to be made.
> There is definitely still a connection, as I can still receive more events after interrupting the stuck git command with Ctrl-C.
>
> > Odd. What version of Git are you using? Your description makes it
> > sound like it may be a bug, so I'd be curious to hear Jeff's
> > interpretation of things, too.
> 2.26.0 built from Master git/git
>
> For more info, I have created a paste to demonstrate the bug:
> https://gist.github.com/sluongng/e14563e4ce3cc9545781ecd5a95169f6
> In it, I run `git pull origin` and `git version` on a relatively stale local copy of https://gitlab.com/gitlab-org/gitlab.git.
>
> You can get more information from the trace in that paste.
> I have annotated the moment at which it got stuck with the phrase `It stucks HERE`, so look for that.

Hmm. It sounds like maybe there is a bug. If Jeff doesn't have time to
take a look, I'll try to figure out what's going on here.

> Cheers,
> Son Luong.

Thanks,
Taylor

^ permalink raw reply	[relevance 0%]

* Re: Git pull stuck when Trace2 target set to Unix Stream Socket
  @ 2020-04-13 17:19 12%   ` Son Luong Ngoc
  2020-04-13 17:23  0%     ` Taylor Blau
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2020-04-13 17:19 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff.Hostetler

Hi Taylor,

Thanks for the swift reply.

> On Apr 13, 2020, at 18:00, Taylor Blau <me@ttaylorr.com> wrote:
> I doubt that this is important (for a reason that I'll point out below),
> but it looks like your invocation here is malformed with the trailing
> pipe character.
> 
> Did you mean to redirect the output of rm away? If so, '2>&1 >/dev/null'
> will do what you want.
It was an emailing mistake. I meant to write
> rm /tmp/git_trace.sock || true
so that the command is reproducible on repeated runs.
I must have deleted the remaining part by mistake.

> Odd. From my memory, trace2 will give up trying to connect to the socket
> (disabling itself and optionally printing a warning) if 'socket(2)' or
> 'connect(2)' set the error bit. My guess above is that you don't have a
> listening socket (because your shell is waiting for you to close the
> '|'), so there's no connection to be made.
There is definitely still a connection, as I can still receive more events after interrupting the stuck git command with Ctrl-C.

> Odd. What version of Git are you using? Your description makes it
> sound like it may be a bug, so I'd be curious to hear Jeff's
> interpretation of things, too.
2.26.0 built from Master git/git

For more info, I have created a paste to demonstrate the bug:
https://gist.github.com/sluongng/e14563e4ce3cc9545781ecd5a95169f6
In it, I run `git pull origin` and `git version` on a relatively stale local copy of https://gitlab.com/gitlab-org/gitlab.git.

You can get more information from the trace in that paste.
I have annotated the moment at which it got stuck with the phrase `It stucks HERE`, so look for that.

Cheers,
Son Luong.


^ permalink raw reply	[relevance 12%]

* Re: [PATCH 03/15] run-job: implement fetch job
@ 2020-04-13 13:15  5% Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-04-13 13:15 UTC (permalink / raw)
  To: gitgitgadget; +Cc: dstolee, git, jrnieder, peff, stolee

Hi Derrick,

First of all, thanks a ton for upstreaming this.
Despite multiple complaints about re-implementing cron in git,
I see this as a huge improvement to git UX and it is a very welcome change.

> 3. By adding a new refspec "+refs/heads/*:refs/hidden/<remote>/*"
>    we can ensure that we actually load the new values somewhere in
>    our refspace while not updating refs/heads or refs/remotes. By
>    storing these refs here, the commit-graph job will update the
>    commit-graph with the commits from these hidden refs.
Ideally I think we want to let users configure which refs they want to
prefetch, with the default behavior being prefetching all heads
available from the remote.
Using Facebook's Mercurial extension
[RemoteFileLog](https://www.mercurial-scm.org/repo/hg/file/tip/hgext/remotefilelog/__init__.py#l31)
as a UX reference,
users should only prefetch the refs that they actually care about.

> 1. One downside of the refs/hidden pattern is that 'git log' will
>    decorate commits with twice as many refs if they appear at a
>    remote ref (<remote>/<ref> _and_ refs/hidden/<remote>/<ref>). Is
>    there an easy way to exclude a refspace from decorations? Should
>    we make refs/hidden/* a "special" refspace that is excluded from
>    decorations?
In git-log, there is
[--decorate-refs-exclude](https://git-scm.com/docs/git-log#Documentation/git-log.txt---decorate-refs-excludeltpatterngt)
which I think we can move into git-config as
`log.decorate-refs-exclude`?
If you let the `prefetch refs` be configurable as I suggested above, I
think it makes sense to have the git-log exclusions be configurable
as well.

Cheers,
Son Luong.

^ permalink raw reply	[relevance 5%]

* Git pull stuck when Trace2 target set to Unix Stream Socket
@ 2020-04-13 12:05  6% Son Luong Ngoc
    0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2020-04-13 12:05 UTC (permalink / raw)
  To: git; +Cc: Jeff.Hostetler

Hey folks,

I am trying to write a simple git trace2 event collector and I noticed
that when git does a git pull with trace events being sent to a unix
stream socket, the entire operation hangs.

Reproduce as follows:
```
cd git/git
git config trace2.eventTarget af_unix:stream:/tmp/git_trace.sock
git config trace2.eventBrief false
(rm /tmp/git_trace.sock | ) &&  nc -lkU /tmp/git_trace.sock

# In a different terminal
git pull # Pull gets stuck and never completes
```

This does not happen when you set eventBrief to true
```
git config trace2.eventBrief true
```

It is worth noting that if eventTarget is a file instead of a socket,
everything works fine.
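
For comparison, the file-target case can be sketched with the
`GIT_TRACE2_EVENT` environment variable pointing at an absolute path
instead of the config setting used above. The temporary file and the
choice of `git version` as the traced command are assumptions made for
this illustration, and the block does nothing if git is not on PATH.

```shell
# Temporary file that receives the trace2 events.
trace_file=$(mktemp)

if command -v git >/dev/null 2>&1
then
	# An absolute path makes trace2 append its events to that file.
	GIT_TRACE2_EVENT="$trace_file" git version
fi

# Count the event lines that were written (one JSON event per line).
wc -l <"$trace_file"
```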

Cheers,
Son Luong.

^ permalink raw reply	[relevance 6%]

* Re: Broken Git-diff on master
  2020-03-18 20:41  6% Broken Git-diff on master Son Luong Ngoc
@ 2020-03-18 20:52  6% ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-03-18 20:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano

Ah, how embarrassing, I forgot about --no-index.
Please ignore, sorry for the noise :(

> On Mar 18, 2020, at 21:41, Son Luong Ngoc <sluongng@gmail.com> wrote:
> 
> Hey folks,
> 
> I am testing out the latest changes in master be8661a3286c67a5d4088f4226cbd7f8b76544b0 and observe the following
> 
>> mkdir test
>> cd test
>> echo a > a
>> echo b > b
>> git diff a b | cat
> diff --git a/a b/b
> index 7898192..6178079 100644
> --- a/a
> +++ b/b
> @@ -1 +1 @@
> -a
> +b
>> git init
>> git diff a b | cat
>> GIT_TRACE2_PERF=1 git diff a b | cat
> 21:38:36.615653 common-main.c:48             | d0 | main                     | version      |     |           |           |              | 2.26.0.rc2.27.gbe8661a328
> 21:38:36.616258 common-main.c:49             | d0 | main                     | start        |     |  0.004075 |           |              | git diff a b
> 21:38:36.616307 git.c:440                    | d0 | main                     | cmd_name     |     |           |           |              | diff (diff)
> 21:38:36.616696 repository.c:130             | d0 | main                     | def_repo     | r1  |           |           |              | worktree:/Users/sluongngoc/work/some-dir/test
> 21:38:36.617589 read-cache.c:2303            | d0 | main                     | region_enter | r1  |  0.005408 |           | index        | label:do_read_index .git/index
> 21:38:36.617615 read-cache.c:2308            | d0 | main                     | region_leave | r1  |  0.005435 |  0.000027 | index        | label:do_read_index .git/index
> 21:38:36.617656 git.c:674                    | d0 | main                     | exit         |     |  0.005476 |           |              | code:0
> 21:38:36.617668 trace2/tr2_tgt_perf.c:213    | d0 | main                     | atexit       |     |  0.005489 |           |              | code:0
> 
>> git version
> git version 2.26.0.rc2.27.gbe8661a328
> 
> I think git-diff is broken. Hope that this gets addressed before 2.26.0 comes out.
> 
> Cheers,
> Son Luong.


^ permalink raw reply	[relevance 6%]

* Broken Git-diff on master
@ 2020-03-18 20:41  6% Son Luong Ngoc
  2020-03-18 20:52  6% ` Son Luong Ngoc
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2020-03-18 20:41 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Son Luong Ngoc

Hey folks,

I am testing out the latest changes in master be8661a3286c67a5d4088f4226cbd7f8b76544b0 and observe the following

> mkdir test
> cd test
> echo a > a
> echo b > b
> git diff a b | cat
diff --git a/a b/b
index 7898192..6178079 100644
--- a/a
+++ b/b
@@ -1 +1 @@
-a
+b
> git init
> git diff a b | cat
> GIT_TRACE2_PERF=1 git diff a b | cat
21:38:36.615653 common-main.c:48             | d0 | main                     | version      |     |           |           |              | 2.26.0.rc2.27.gbe8661a328
21:38:36.616258 common-main.c:49             | d0 | main                     | start        |     |  0.004075 |           |              | git diff a b
21:38:36.616307 git.c:440                    | d0 | main                     | cmd_name     |     |           |           |              | diff (diff)
21:38:36.616696 repository.c:130             | d0 | main                     | def_repo     | r1  |           |           |              | worktree:/Users/sluongngoc/work/some-dir/test
21:38:36.617589 read-cache.c:2303            | d0 | main                     | region_enter | r1  |  0.005408 |           | index        | label:do_read_index .git/index
21:38:36.617615 read-cache.c:2308            | d0 | main                     | region_leave | r1  |  0.005435 |  0.000027 | index        | label:do_read_index .git/index
21:38:36.617656 git.c:674                    | d0 | main                     | exit         |     |  0.005476 |           |              | code:0
21:38:36.617668 trace2/tr2_tgt_perf.c:213    | d0 | main                     | atexit       |     |  0.005489 |           |              | code:0

> git version
git version 2.26.0.rc2.27.gbe8661a328

I think git-diff is broken. I hope this gets addressed before 2.26.0 comes out.
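
A possible explanation, offered as an assumption rather than a diagnosis: once `git init` has run, `git diff a b` with both paths inside the work tree is documented to treat them as pathspecs compared against the index, and untracked files produce no output in that mode. `--no-index` asks for a plain file-to-file comparison instead. The helper name compare_untracked below is made up for illustration.

```shell
# Sketch: compare two untracked files inside a freshly initialised repository.
# compare_untracked is a hypothetical helper, not a Git command.
compare_untracked() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        echo a >a
        echo b >b
        git init -q .
        # Pathspec mode: a and b are untracked, so there is nothing to
        # compare against the index.
        echo "pathspec lines: $(git diff a b | wc -l | tr -d ' ')"
        # --no-index compares the two files on disk directly
        # (git exits with status 1 when they differ).
        echo "no-index lines: $(git diff --no-index a b | wc -l | tr -d ' ')"
    )
    rm -rf "$dir"
}

compare_untracked
```

If this reading is right, the empty output after `git init` is the pathspec form at work, and `git diff --no-index a b` would restore the earlier output.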

Cheers,
Son Luong.


* Re: [External] Git Rebase: test failing with GIT_TEST_STASH_USE_BUILTIN=false
  2020-03-01  9:59  5%     ` [External] " Son Luong Ngoc
@ 2020-03-01 10:40  5%       ` Son Luong Ngoc
  0 siblings, 0 replies; 122+ results
From: Son Luong Ngoc @ 2020-03-01 10:40 UTC (permalink / raw)
  To: Son Luong Ngoc
  Cc: Junio C Hamano, git, Johannes Schindelin, Thomas Gummerer,
	Paul-Sebastian Ungureanu

A similar strategy was used for t3904, which has been failing with this flag since 2.25.0:

$ cat run-test.sh
#!/bin/bash

make -j8 prefix=/usr all
(
  cd ./t && GIT_TEST_STASH_USE_BUILTIN=false ./t3904-stash-patch.sh
)
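
A hedged aside on the script above: `git bisect run` treats exit status 0 as good, 125 as "skip this commit", and any other status from 1 to 127 as bad. The sketch below keeps a failed build from being counted as a failed test. The helper name run_bisect_step is hypothetical and not part of Git.

```shell
# Sketch of a bisect step that separates build failures from test failures.
# run_bisect_step is a made-up name for illustration.
run_bisect_step() {
    build_cmd=$1
    test_cmd=$2
    # 125 tells `git bisect run` that this commit cannot be tested.
    sh -c "$build_cmd" || return 125
    # Any status from 1 to 127 other than 125 marks the commit as bad.
    sh -c "$test_cmd" || return 1
    # 0 marks the commit as good.
    return 0
}

# Possible use inside run-test.sh (commented out so the sketch has no side effects):
# run_bisect_step "make -j8 prefix=/usr all" \
#     "cd t && GIT_TEST_STASH_USE_BUILTIN=false ./t3904-stash-patch.sh"
# exit $?
```

With this shape, commits that do not build are skipped by the bisection and not blamed for the test failure.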

$ git bisect log
# bad: [2d2118b814c11f509e1aa76cb07110f7231668dc] The seventh batch for 2.26
# good: [d9589d4051537c387b70dc76e430c61b4c85a86d] Git 2.22.2
git bisect start 'HEAD' 'v2.22.2'
# good: [22dd22dce050f042b3eec165440966186691db42] Merge branch 'wb/fsmonitor-bitmap-fix'
git bisect good 22dd22dce050f042b3eec165440966186691db42
# good: [6514ad40a1a3cf80b2c25e3318dbf0252599fb8d] Merge branch 'ra/t5150-depends-on-perl'
git bisect good 6514ad40a1a3cf80b2c25e3318dbf0252599fb8d
# bad: [f52ab33616ee1d241f2292f1c1e47ba84a263523] Merge branch 'bc/hash-independent-tests-part-7'
git bisect bad f52ab33616ee1d241f2292f1c1e47ba84a263523
# good: [4d924528d8bfe947abfc54ee9bd3892ab509c8cd] Revert "Merge branch 'ra/rebase-i-more-options'"
git bisect good 4d924528d8bfe947abfc54ee9bd3892ab509c8cd
# good: [f0940743facd619f251009e0307d8d6452cc582e] Merge branch 'js/builtin-add-i-cmds'
git bisect good f0940743facd619f251009e0307d8d6452cc582e
# good: [381e8e9de142b636e4a25b6df113d70168e21a34] Merge branch 'dl/test-must-fail-fixes'
git bisect good 381e8e9de142b636e4a25b6df113d70168e21a34
# bad: [d0e70cd32e95df3be2250536f9089c858a298874] Merge branch 'am/checkout-file-and-ref-ref-ambiguity'
git bisect bad d0e70cd32e95df3be2250536f9089c858a298874
# bad: [94ac3c31f730ab278e1373a942fb4503829f4279] terminal: make the code of disable_echo() reusable
git bisect bad 94ac3c31f730ab278e1373a942fb4503829f4279
# bad: [52628f94fc35f57f0b3c54e4f849e490bfa44449] built-in add -p: implement the "checkout" patch modes
git bisect bad 52628f94fc35f57f0b3c54e4f849e490bfa44449
# good: [36bae1dc0ee777aa529dd955f2e619281265f262] built-in add -p: implement the "stash" and "reset" patch modes
git bisect good 36bae1dc0ee777aa529dd955f2e619281265f262
# bad: [6610e4628ac12396efc20201fe85d67591bed247] built-in stash: use the built-in `git add -p` if so configured
git bisect bad 6610e4628ac12396efc20201fe85d67591bed247
# bad: [90a6bb98d11a664f729dbb86c90d9c7a38ea825a] legacy stash -p: respect the add.interactive.usebuiltin setting
git bisect bad 90a6bb98d11a664f729dbb86c90d9c7a38ea825a
# first bad commit: [90a6bb98d11a664f729dbb86c90d9c7a38ea825a] legacy stash -p: respect the add.interactive.usebuiltin setting

This was merged in https://github.com/gitgitgadget/git/commit/9a5315edfdf662c4d9bf444ebc297bc802fa5e04
The author was Johannes Schindelin.

Thanks,
Son Luong.

> On Mar 1, 2020, at 10:59, Son Luong Ngoc <sluongng@gmail.com> wrote:
> 
> (This is a resend because git@vger.kernel.org blocked HTML content; sorry for the noise.)
> (following up on https://public-inbox.org/git/xmqq36ayob9a.fsf@gitster-ct.c.googlers.com/T/#t )
> 
> Hi folks,
> 
> I ran a simple git bisect on this to try to figure out what's wrong:
> 
> $ cat run-test.sh
> #!/bin/bash
> 
> make -j8 prefix=/usr all
> (
>  cd ./t
>  GIT_TEST_STASH_USE_BUILTIN=false ./t3903-stash.sh --run='103'
> )
> $ git bisect start master v2.22.2
> $ git bisect run ./run-test.sh
> ...
> $ git bisect log
> (b932f6a5e8...)|BISECTING ~/work/booking/git/git> git bisect log
> # bad: [2d2118b814c11f509e1aa76cb07110f7231668dc] The seventh batch for 2.26
> # good: [d9589d4051537c387b70dc76e430c61b4c85a86d] Git 2.22.2
> git bisect start 'HEAD' 'v2.22.2'
> # bad: [22dd22dce050f042b3eec165440966186691db42] Merge branch 'wb/fsmonitor-bitmap-fix'
> git bisect bad 22dd22dce050f042b3eec165440966186691db42
> # bad: [fa9e7934c780bc804a09bfc88a93825096b3155e] Merge branch 'bm/repository-layout-typofix'
> git bisect bad fa9e7934c780bc804a09bfc88a93825096b3155e
> # good: [3a94cb31d52f061c315b00bfc005f1b1c42ac92d] bin-wrappers: append `.exe` to target paths if necessary
> git bisect good 3a94cb31d52f061c315b00bfc005f1b1c42ac92d
> # bad: [7b70d46ca410f9d37045558329c3143570d47ba6] Merge branch 'bb/grep-pcre2-bug-message-fix'
> git bisect bad 7b70d46ca410f9d37045558329c3143570d47ba6
> # good: [d60dc1a0b3829f3c4d69696f43f1c178c0701cdb] Merge branch 'ew/repack-with-bitmaps-by-default'
> git bisect good d60dc1a0b3829f3c4d69696f43f1c178c0701cdb
> # good: [43ba21cb574ee3f9a1acf4580868982f4c883ac6] Merge branch 'tg/range-diff-output-update'
> git bisect good 43ba21cb574ee3f9a1acf4580868982f4c883ac6
> # good: [080af915a3ee4d9511dc288b29143b9958ac0adc] Merge branch 'mt/dir-iterator-updates'
> git bisect good 080af915a3ee4d9511dc288b29143b9958ac0adc
> # bad: [75ce48674889df6a2bb493fb5d6bef0ef60ca7ae] Merge branch 'di/readme-markup-fix'
> git bisect bad 75ce48674889df6a2bb493fb5d6bef0ef60ca7ae
> # good: [984da7f8d2589b53cca7c920e597eab30d4c1b36] Merge branch 'sr/gpg-interface-stop-at-the-end'
> git bisect good 984da7f8d2589b53cca7c920e597eab30d4c1b36
> # bad: [f8aee8576ac5e01fa993c80b5b888af214c03758] Merge branch 'tg/stash-keep-index-with-removed-paths'
> git bisect bad f8aee8576ac5e01fa993c80b5b888af214c03758
> # good: [b932f6a5e8cdbb33eff4563fdfb1eae9ebf70a65] stash: fix handling removed files with --keep-index
> git bisect good b932f6a5e8cdbb33eff4563fdfb1eae9ebf70a65
> # first bad commit: [f8aee8576ac5e01fa993c80b5b888af214c03758] Merge branch 'tg/stash-keep-index-with-removed-paths'
> 
> This pinpoints the failure to the moment the test was added in https://github.com/git/git/commit/b932f6a5e8cdbb33eff4563fdfb1eae9ebf70a65 by @t.gummerer.
> 
> I would appreciate it if we could either deprecate the GIT_TEST_STASH_USE_BUILTIN flag entirely or wrap the test in an 'if' so that it is skipped automatically when the flag is set to false.
> 
> Thanks,
> Son Luong.
> 
>> On Feb 25, 2020, at 17:57, Junio C Hamano <gitster@pobox.com> wrote:
>> 
>> Son Luong Ngoc <son.luong@booking.com> writes:
>> 
>>> I have been trying to build Git from source and have noticed that
>>> some tests have been failing since 2.25 with the flag
>>> "GIT_TEST_STASH_USE_BUILTIN=false".
>>> 
>>> I think t3903.103 started to fail in 2.25 (rebase related), and
>>> t3904 may also be failing on current master.
>>> 
>>> Is "GIT_TEST_STASH_USE_BUILTIN=false" still being tested, or are
>>> we deprecating this flag entirely?
>> 
>> In the longer term, when "git stash" gains new features that did not
>> exist in the original scripted version, tests that observe how these
>> features work would start failing when using the scripted version.
>> 
>> I picked some people from "git shortlog --no-merges builtin/stash.c"
>> and placed them on the CC line---perhaps they may know more.  It
>> happens that Johannes is also familiar with "rebase", which you
>> said is involved in the test failure, so I'd imagine he would be the
>> best person to ask.
>> 
>> Thanks for a report.
>> 
>> 
> 
> 



* Re: [External] Git Rebase: test failing with GIT_TEST_STASH_USE_BUILTIN=false
       [not found]       ` <710DB9BA-D134-48E7-8CAB-B8816FED8AB8@booking.com>
@ 2020-03-01  9:59  5%     ` Son Luong Ngoc
  2020-03-01 10:40  5%       ` Son Luong Ngoc
  0 siblings, 1 reply; 122+ results
From: Son Luong Ngoc @ 2020-03-01  9:59 UTC (permalink / raw)
  To: Son Luong Ngoc
  Cc: Junio C Hamano, git, Johannes Schindelin, Thomas Gummerer,
	Paul-Sebastian Ungureanu, Son Luong Ngoc

(This is a resend because git@vger.kernel.org blocked HTML content; sorry for the noise.)
(following up on https://public-inbox.org/git/xmqq36ayob9a.fsf@gitster-ct.c.googlers.com/T/#t )

Hi folks,

I ran a simple git bisect on this to try to figure out what's wrong:

$ cat run-test.sh
#!/bin/bash

make -j8 prefix=/usr all
(
  cd ./t
  GIT_TEST_STASH_USE_BUILTIN=false ./t3903-stash.sh --run='103'
)
$ git bisect start master v2.22.2
$ git bisect run ./run-test.sh
...
$ git bisect log
(b932f6a5e8...)|BISECTING ~/work/booking/git/git> git bisect log
# bad: [2d2118b814c11f509e1aa76cb07110f7231668dc] The seventh batch for 2.26
# good: [d9589d4051537c387b70dc76e430c61b4c85a86d] Git 2.22.2
git bisect start 'HEAD' 'v2.22.2'
# bad: [22dd22dce050f042b3eec165440966186691db42] Merge branch 'wb/fsmonitor-bitmap-fix'
git bisect bad 22dd22dce050f042b3eec165440966186691db42
# bad: [fa9e7934c780bc804a09bfc88a93825096b3155e] Merge branch 'bm/repository-layout-typofix'
git bisect bad fa9e7934c780bc804a09bfc88a93825096b3155e
# good: [3a94cb31d52f061c315b00bfc005f1b1c42ac92d] bin-wrappers: append `.exe` to target paths if necessary
git bisect good 3a94cb31d52f061c315b00bfc005f1b1c42ac92d
# bad: [7b70d46ca410f9d37045558329c3143570d47ba6] Merge branch 'bb/grep-pcre2-bug-message-fix'
git bisect bad 7b70d46ca410f9d37045558329c3143570d47ba6
# good: [d60dc1a0b3829f3c4d69696f43f1c178c0701cdb] Merge branch 'ew/repack-with-bitmaps-by-default'
git bisect good d60dc1a0b3829f3c4d69696f43f1c178c0701cdb
# good: [43ba21cb574ee3f9a1acf4580868982f4c883ac6] Merge branch 'tg/range-diff-output-update'
git bisect good 43ba21cb574ee3f9a1acf4580868982f4c883ac6
# good: [080af915a3ee4d9511dc288b29143b9958ac0adc] Merge branch 'mt/dir-iterator-updates'
git bisect good 080af915a3ee4d9511dc288b29143b9958ac0adc
# bad: [75ce48674889df6a2bb493fb5d6bef0ef60ca7ae] Merge branch 'di/readme-markup-fix'
git bisect bad 75ce48674889df6a2bb493fb5d6bef0ef60ca7ae
# good: [984da7f8d2589b53cca7c920e597eab30d4c1b36] Merge branch 'sr/gpg-interface-stop-at-the-end'
git bisect good 984da7f8d2589b53cca7c920e597eab30d4c1b36
# bad: [f8aee8576ac5e01fa993c80b5b888af214c03758] Merge branch 'tg/stash-keep-index-with-removed-paths'
git bisect bad f8aee8576ac5e01fa993c80b5b888af214c03758
# good: [b932f6a5e8cdbb33eff4563fdfb1eae9ebf70a65] stash: fix handling removed files with --keep-index
git bisect good b932f6a5e8cdbb33eff4563fdfb1eae9ebf70a65
# first bad commit: [f8aee8576ac5e01fa993c80b5b888af214c03758] Merge branch 'tg/stash-keep-index-with-removed-paths'

This pinpoints the failure to the moment the test was added in https://github.com/git/git/commit/b932f6a5e8cdbb33eff4563fdfb1eae9ebf70a65 by @t.gummerer.

I would appreciate it if we could either deprecate the GIT_TEST_STASH_USE_BUILTIN flag entirely or wrap the test in an 'if' so that it is skipped automatically when the flag is set to false.
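
To make the second option concrete, here is a rough sketch in plain shell. It does not use Git's test-lib.sh; both helper names (builtin_stash_enabled, run_if_builtin_stash) are hypothetical, and the set of values treated as "false" is an assumption.

```shell
# Hypothetical helpers, not part of Git's test-lib.sh.
builtin_stash_enabled() {
    # Assumption: an unset variable means the builtin stash is in use.
    case "${GIT_TEST_STASH_USE_BUILTIN:-true}" in
    false|no|off|0) return 1 ;;
    *) return 0 ;;
    esac
}

# Run the given command only when the builtin stash is in use;
# otherwise report the test as skipped.
run_if_builtin_stash() {
    title=$1
    shift
    if builtin_stash_enabled
    then
        "$@"
    else
        echo "skipped: $title (GIT_TEST_STASH_USE_BUILTIN is false)"
    fi
}
```

In the real test suite the same effect would more likely be expressed through a test prerequisite, but that wiring is left out here.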

Thanks,
Son Luong.

> On Feb 25, 2020, at 17:57, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Son Luong Ngoc <son.luong@booking.com> writes:
> 
>> I have been trying to build Git from source and have noticed that
>> some tests have been failing since 2.25 with the flag
>> "GIT_TEST_STASH_USE_BUILTIN=false".
>> 
>> I think t3903.103 started to fail in 2.25 (rebase related), and
>> t3904 may also be failing on current master.
>> 
>> Is "GIT_TEST_STASH_USE_BUILTIN=false" still being tested, or are
>> we deprecating this flag entirely?
> 
> In the longer term, when "git stash" gains new features that did not
> exist in the original scripted version, tests that observe how these
> features work would start failing when using the scripted version.
> 
> I picked some people from "git shortlog --no-merges builtin/stash.c"
> and placed them on the CC line---perhaps they may know more.  It
> happens that Johannes is also familiar with "rebase", which you
> said is involved in the test failure, so I'd imagine he would be the
> best person to ask.
> 
> Thanks for a report.
> 
> 




Results 1-122 of 122 | reverse | options above
-- pct% links below jump to the message on this page, permalinks otherwise --
2019-10-23 20:13     RFC: Moving git-gui development to GitHub Pratyush Yadav
2019-10-24 19:46     ` Elijah Newren
2019-10-26 18:25       ` Jakub Narebski
2019-10-30  6:21         ` Elijah Newren
2019-11-20 12:19           ` Birger Skogeng Pedersen
2019-11-20 17:13             ` Elijah Newren
2021-04-19 20:33               ` Pain points in PRs [was: Re: RFC: Moving git-gui development to GitHub] SZEDER Gábor
2021-04-19 21:52                 ` Junio C Hamano
2021-04-20  7:49  5%               ` Son Luong Ngoc
2021-04-20 20:17  5%                 ` Junio C Hamano
2020-02-24  8:33     Git Rebase: test failing with GIT_TEST_STASH_USE_BUILTIN=false Son Luong Ngoc
2020-02-25 16:57     ` Junio C Hamano
     [not found]       ` <710DB9BA-D134-48E7-8CAB-B8816FED8AB8@booking.com>
2020-03-01  9:59  5%     ` [External] " Son Luong Ngoc
2020-03-01 10:40  5%       ` Son Luong Ngoc
2020-03-18 20:41  6% Broken Git-diff on master Son Luong Ngoc
2020-03-18 20:52  6% ` Son Luong Ngoc
2020-04-13 12:05  6% Git pull stuck when Trace2 target set to Unix Stream Socket Son Luong Ngoc
2020-04-13 16:00     ` Taylor Blau
2020-04-13 17:19 12%   ` Son Luong Ngoc
2020-04-13 17:23  0%     ` Taylor Blau
2020-04-13 13:15  5% [PATCH 03/15] run-job: implement fetch job Son Luong Ngoc
2020-04-28  5:10  6% [DRAFT] What's cooking in git.git Son Luong Ngoc
2020-04-28 17:07  6% ` Junio C Hamano
2020-04-28  6:52  6% t0000 failed Son Luong Ngoc
2020-04-28  8:14  5% ` [PATCH] t0000: disable GIT_TEST_FAIL_PREREQS in sub-tests Jeff King
2020-05-01 20:56  0%   ` Johannes Schindelin
2020-04-28 13:19  5% Git Stash brake splitIndex Son Luong Ngoc
2020-04-28 13:57  5% ` Christian Couder
2020-04-28 14:17  6%   ` Son Luong Ngoc
2020-04-30 14:55 12% Seg Fault on git fetch with fetch.negotiationAlgorithm=skipping Son Luong Ngoc
2020-04-30 20:20  0% ` Taylor Blau
2020-05-01  8:13 12%   ` Son Luong Ngoc
2020-05-01 22:13  0%     ` Taylor Blau
2020-04-30 16:48  6% [PATCH 06/15] run-job: auto-size or use custom pack-files batch Son Luong Ngoc
2020-05-02  7:56  6% [PATCH 05/15] run-job: implement pack-files job Son Luong Ngoc
2020-05-05 13:06  7% [PATCH] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
2020-05-05 13:50  3% ` Derrick Stolee
2020-05-05 16:03  5%   ` Son Luong Ngoc
2020-05-06  8:56  6%     ` Son Luong Ngoc
2020-05-06  9:43  6% ` [PATCH v2 0/2] " Son Luong Ngoc via GitGitGadget
2020-05-06  9:43  7%   ` [PATCH v2 1/2] " Son Luong Ngoc via GitGitGadget
2020-05-06 12:03  0%     ` Derrick Stolee
2020-05-06 17:03  0%     ` Junio C Hamano
2020-05-07  7:29  6%       ` Son Luong Ngoc
2020-05-06  9:43  4%   ` [PATCH v2 2/2] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
2020-05-09 14:24  7%   ` [PATCH v3 0/3] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
2020-05-09 14:24  7%     ` [PATCH v3 1/3] midx: teach "git multi-pack-index repack" honor "git repack" configurations Son Luong Ngoc via GitGitGadget
2020-05-09 16:51  0%       ` Junio C Hamano
2020-05-10 14:27  6%         ` Son Luong Ngoc
2020-05-09 14:24  4%     ` [PATCH v3 2/3] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
2020-05-09 16:11           ` Đoàn Trần Công Danh
2020-05-09 17:33             ` Junio C Hamano
2020-05-10  6:38               ` Đoàn Trần Công Danh
2020-05-10 15:52  6%             ` Son Luong Ngoc
2020-05-09 14:24  6%     ` [PATCH v3 3/3] Ensured t5319 follows arith expansion guideline Son Luong Ngoc via GitGitGadget
2020-05-09 16:55  0%       ` Junio C Hamano
2020-05-10 16:07  4%     ` [PATCH v4 0/2] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
2020-05-10 16:07  7%       ` [PATCH v4 1/2] midx: teach "git multi-pack-index repack" honor "git repack" configurations Son Luong Ngoc via GitGitGadget
2020-05-10 16:07  4%       ` [PATCH v4 2/2] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
2020-05-07 13:17     [PATCH 00/10] [RFC] In-tree sparse-checkout definitions Derrick Stolee via GitGitGadget
2020-05-07 13:17     ` [PATCH 04/10] sparse-checkout: allow in-tree definitions Derrick Stolee via GitGitGadget
2020-05-07 22:58       ` Junio C Hamano
2020-05-08 15:40         ` Derrick Stolee
2020-05-20 17:52           ` Elijah Newren
2020-06-17 23:07             ` Elijah Newren
2020-06-18  8:18  6%           ` Son Luong Ngoc
2020-05-15 17:32  6% [PATCH v7 1/4] gitfaq: files in .gitignore are tracked Son Luong Ngoc
2020-05-19  4:53  4% ` Todd Zullinger
2020-05-20 20:57  6% [ANNOUNCE] Git v2.27.0-rc1 Son Luong Ngoc
2020-06-13 22:03  6% [RFC PATCH v1 1/6] stash: mark `i_tree' in reset_tree() const Son Luong Ngoc
2020-06-17  7:40  5% [PATCH 0/2] Sparse checkout status Son Luong Ngoc
2020-06-17 16:48  4% ` Elijah Newren
2020-06-17 17:58  5%   ` Son Luong Ngoc
2020-06-17 22:36  4%     ` Sparse checkout and recorded dependencies between directories (Was: Re: [PATCH 0/2] Sparse checkout status) Elijah Newren
2020-07-01  9:58  5% [PATCH 3/3] commit-graph: respect 'core.useBloomFilters' Son Luong Ngoc
2020-07-07 14:21     [PATCH 00/21] Maintenance builtin, allowing 'gc --auto' customization Derrick Stolee via GitGitGadget
2020-07-07 14:21  3% ` [PATCH 15/21] maintenance: auto-size pack-files batch Derrick Stolee via GitGitGadget
2020-07-08 23:57     ` [PATCH 00/21] Maintenance builtin, allowing 'gc --auto' customization Emily Shaffer
2020-07-09 11:21       ` Derrick Stolee
2020-07-09 12:43         ` Derrick Stolee
2020-07-09 23:16           ` Jeff King
2020-07-09 23:45             ` Derrick Stolee
2020-07-10 18:46               ` Emily Shaffer
2020-07-10 19:30 11%             ` Son Luong Ngoc
2020-07-23 17:56     ` [PATCH v2 00/18] " Derrick Stolee via GitGitGadget
2020-07-23 17:56  3%   ` [PATCH v2 11/18] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-07-30 22:24       ` [PATCH v3 00/20] Maintenance builtin, allowing 'gc --auto' customization Derrick Stolee via GitGitGadget
2020-07-30 22:24  4%     ` [PATCH v3 13/20] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-07-13  6:18  6% [PATCH 00/21] Maintenance builtin, allowing 'gc --auto' customization Son Luong Ngoc
2020-07-20 11:44 13% Pushing tag from a partial clone Son Luong Ngoc
2020-07-20 12:18  0% ` Derrick Stolee
2020-07-20 13:47 13%   ` Son Luong Ngoc
2020-07-20 17:54  0%     ` Jonathan Tan
2020-07-31  7:49  8% [PATCH] commit-graph: add verify changed paths option Son Luong Ngoc via GitGitGadget
2020-07-31 16:21  0% ` Christian Couder
2020-07-31 17:14  0% ` Junio C Hamano
2020-07-31 18:06  0%   ` Taylor Blau
2020-07-31 18:02  0% ` Jeff King
2020-07-31 18:09  0%   ` Taylor Blau
2020-07-31 19:14         ` Jeff King
2020-07-31 19:31  6%       ` Son Luong Ngoc
2020-08-06 16:30  1% [PATCH 0/9] Maintenance II: prefetch, loose-objects, incremental-repack tasks Derrick Stolee via GitGitGadget
2020-08-06 16:30  3% ` [PATCH 8/9] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-08-06 17:02 11%   ` Son Luong Ngoc
2020-08-06 18:13  0%     ` Derrick Stolee
2020-08-18 14:25     ` [PATCH v2 0/9] Maintenance II: prefetch, loose-objects, incremental-repack tasks Derrick Stolee via GitGitGadget
2020-08-18 14:25  3%   ` [PATCH v2 8/9] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-08-25 18:36       ` [PATCH v3 0/8] Maintenance II: prefetch, loose-objects, incremental-repack tasks Derrick Stolee via GitGitGadget
2020-08-25 18:36  3%     ` [PATCH v3 7/8] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-08-26 15:15  6%     ` [PATCH v3 0/8] Maintenance II: prefetch, loose-objects, incremental-repack tasks Son Luong Ngoc
2020-09-25 12:33         ` [PATCH v4 " Derrick Stolee via GitGitGadget
2020-09-25 12:33  3%       ` [PATCH v4 7/8] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-08-25  2:01     [PATCH] builtin/repack.c: invalidate MIDX only when necessary Taylor Blau
2020-08-25  7:55  6% ` Son Luong Ngoc
2020-08-25 18:33     [PATCH v3 00/11] Maintenance I: Command, gc and commit-graph tasks Derrick Stolee via GitGitGadget
2020-09-04 13:09  4% ` [PATCH v4 " Derrick Stolee via GitGitGadget
2020-09-17 18:11  4%   ` [PATCH v5 " Derrick Stolee via GitGitGadget
2020-08-25 18:39     [PATCH v2 0/7] [RFC] Maintenance III: background maintenance Derrick Stolee via GitGitGadget
2020-08-28 15:45  2% ` [PATCH v3 0/6] " Derrick Stolee via GitGitGadget
2020-09-14  5:27     Caching Git Pull Benson Muite
     [not found]     ` <70DB3786-CB8E-4D82-9774-439AB2A79A8D@gmail.com>
2020-09-14  8:39  6%   ` Son Luong Ngoc
2020-11-04 20:06     [PATCH v2 0/4] Maintenance IV: Platform-specific background maintenance Derrick Stolee via GitGitGadget
2020-11-13 14:00  3% ` [PATCH v3 " Derrick Stolee via GitGitGadget
2020-11-17 21:13  2%   ` [PATCH v4 " Derrick Stolee via GitGitGadget
2020-11-24  4:16     [PATCH v5 " Derrick Stolee via GitGitGadget
2020-12-09 19:28  4% ` [PATCH v6 " Derrick Stolee via GitGitGadget
2021-01-05 13:08  3%   ` [PATCH v7 " Derrick Stolee via GitGitGadget
2021-02-08 14:52     [PATCH 0/2] Maintenance: add pack-refs task Derrick Stolee via GitGitGadget
2021-02-09 13:42  3% ` [PATCH v2 " Derrick Stolee via GitGitGadget
2021-03-16  9:45  5% Tests failed with GIT_TEST_FAIL_PREREQS and/or GIT_TEST_PROTOCOL_VERSION Son Luong Ngoc
2021-03-16 13:52     ` Taylor Blau
2021-03-17 13:38  5%   ` Son Luong Ngoc
2021-03-17 17:54         ` Jeff King
2021-03-17 22:47  4%       ` [PATCH] t: annotate !PTHREADS tests with !FAIL_PREREQS Jeff King
2021-03-18 21:17  0%         ` Junio C Hamano
2021-04-14  6:13     Pain points in Git's patch flow Jonathan Nieder
2021-04-15 15:45  3% ` Son Luong Ngoc
2021-04-19  2:57  4%   ` Eric Wong
2021-04-21 10:19  0%     ` Ævar Arnfjörð Bjarmason
2021-04-28  7:21  0%       ` Eric Wong
2021-04-28  7:05  0%     ` Eric Wong
2021-07-10 19:01     [PATCH] packfile: enhance the mtime of packfile by idx file Sun Chao via GitGitGadget
2021-07-14 17:04     ` [PATCH v2] packfile: freshen the mtime of packfile by configuration Taylor Blau
2021-07-14 18:19       ` Ævar Arnfjörð Bjarmason
2021-07-14 19:11         ` Martin Fick
2021-07-14 19:41           ` Ævar Arnfjörð Bjarmason
2021-07-15  8:23  5%         ` Son Luong Ngoc
2021-07-11  1:26     [PATCH] pull: abort if --ff-only is given and fast-forwarding is impossible Alex Henrie
2021-07-14  8:37  5% ` Son Luong Ngoc
2021-07-14 15:22  5%   ` Elijah Newren
2021-07-14 17:31  0%     ` Felipe Contreras
2021-08-11 13:02  5% t5607 fail with GIT_TEST_FAIL_PREREQS enabled Son Luong Ngoc
2021-10-21 11:55     Notes from the Git Contributors' Summit 2021, virtual, Oct 19/20 Johannes Schindelin
2021-10-21 11:55     ` [Summit topic] Crazy (and not so crazy) ideas Johannes Schindelin
2021-10-21 12:30  6%   ` Son Luong Ngoc
2021-10-21 11:56     ` [Summit topic] Increasing diversity & inclusion (transition to `main`, etc) Johannes Schindelin
2021-10-21 12:55  6%   ` Son Luong Ngoc
2021-12-29  9:43     Filtering commits after filtering the tree Ulrich Windl
2021-12-30 13:19  6% ` Son Luong Ngoc
2021-12-31 23:48  5%   ` Elijah Newren
2022-01-03  9:26  0%     ` Antw: [EXT] " Ulrich Windl
2022-06-06 16:04     How to watch files in a Git repository R. Diez
2022-06-09  8:33  6% ` Son Luong Ngoc
2022-06-07  7:54 20% [PATCH] fsmonitor: query watchman with right valid json Son Luong Ngoc
2022-06-07  8:40  0% ` Ævar Arnfjörð Bjarmason
2022-06-07 10:56  6%   ` Son Luong Ngoc
2022-06-07 11:14 21%     ` [PATCH v2] " Son Luong Ngoc
2022-06-07 14:39  0%       ` Ævar Arnfjörð Bjarmason
2022-06-07 17:00  6%       ` Junio C Hamano
2022-06-08  1:12  1% What's cooking in git.git (Jun 2022, #02; Tue, 7) Junio C Hamano
2022-06-11  3:39  1% What's cooking in git.git (Jun 2022, #03; Fri, 10) Junio C Hamano
2022-06-14  1:46  1% What's cooking in git.git (Jun 2022, #04; Mon, 13) Junio C Hamano
2023-02-10 21:31     Subject: [RFC PATCH] upload_pack.c: make deepen-not more tree-ish Andrew Wansink
2023-02-11 22:23     ` Andrew Wansink
     [not found]       ` <CAL3xRKdCkAAR0r3jyKFy+TtUi65LQcHaste=2WCqYHtwi8cUhw@mail.gmail.com>
2023-02-12 14:12 12%     ` Son Luong Ngoc
2023-05-29 13:38     Automatically re-running commands during an interactive rebase or post commit Paul Jolly
2023-05-30  7:22  5% ` Son Luong Ngoc
2023-06-28 16:28     SHA256 support not experimental, or? Adam Majer
2023-06-29  5:59     ` Junio C Hamano
2023-06-29 21:17       ` brian m. carlson
2023-06-29 22:22         ` Junio C Hamano
2023-06-30  1:21           ` brian m. carlson
2023-06-30  9:31             ` Patrick Steinhardt
2023-06-30 11:25               ` Adam Majer
2023-06-30 12:20  5%             ` Son Luong Ngoc
2023-06-30 16:45  5%               ` Junio C Hamano
